The data is presented to the neural network in different ways for different neural networks that I'm testing. However, that isn't really what matters in determining a model's likelihood of overfitting. What matters most is the number of input nodes, output nodes, hidden layers, and nodes in each hidden layer.
This sounds strange to me. Surely what matters most is the data you train it on, don't you agree? The most important goal of the neural net is that it actually does what it is supposed to do, in this case predict the future. The actual technical design, the number of nodes, backpropagation and so on, can only help it fit the data faster and better.
Yes, of course the specific data you train the neural network on matters to whether or not it overfits. However, when predicting bitcoin prices there is really only one available set of training data, so that factor is essentially fixed here. We're always going to be training on historic bitcoin price data, so what determines whether or not the model overfits that data is mostly its structure: more complex structures can represent more complex functions.
So when I say that the neural network is not complex enough to overfit this data, I mean that its structure cannot represent the data closely enough to overfit it.
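To make the structure-versus-overfitting point concrete, here is a minimal sketch (not the author's model) that trains a small and a large MLP on the same noisy toy data. scikit-learn, the layer sizes, and the synthetic regression task are all illustrative assumptions.

```python
# A minimal sketch comparing a small and a large MLP on the same noisy
# 1-D regression task. Layer sizes and noise level are illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=200)  # true signal + noise

X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

small = MLPRegressor(hidden_layer_sizes=(4,), max_iter=5000, random_state=0)
large = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=5000, random_state=0)

for name, model in [("small", small), ("large", large)]:
    model.fit(X_train, y_train)
    print(name,
          "train R^2:", round(model.score(X_train, y_train), 3),
          "test  R^2:", round(model.score(X_test, y_test), 3))

# The large network typically scores higher on the training set but no
# better (often worse) on the held-out set: the overfitting signature.
# The small network simply lacks the capacity to memorize the noise.
```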
Surely even a very simple network can overfit the training data set, as opposed to the working (held-out) data set? (There may be a language issue here, so perhaps I misunderstand.)
I think you understood correctly. A simple neural network cannot overfit a very large and complicated set of training data. People here seem to assume that a neural network of any size and structure can represent any function, but that is not true. For example, a network with 800 inputs and 800 outputs but no hidden layer will never be able to represent even a simple XOR function, no matter how much you train it or what data you give it.
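That is the classic Minsky-Papert result, and it is easy to reproduce. The following sketch (scikit-learn, with illustrative hyperparameters) compares a model with no hidden layer against one with a single small hidden layer on XOR.

```python
# A minimal sketch of the result referenced above: a model with no
# hidden layer cannot represent XOR, while one small hidden layer can.
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR of the two inputs

# No hidden layer: only linear decision boundaries, which XOR lacks,
# so accuracy is capped below 1.0 no matter how long we train.
linear = Perceptron(max_iter=1000).fit(X, y)
print("no hidden layer:", linear.score(X, y))

# One hidden layer of a few units is enough to separate XOR;
# this typically reaches 1.0 (the exact run depends on the seed).
mlp = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", max_iter=2000, random_state=1).fit(X, y)
print("one hidden layer:", mlp.score(X, y))
```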
I'd be interested in hearing how he is presenting the BTC data to the neural network. I once saw someone present dates along with price data to a neural network he designed. It ended up fitting itself to the dates, buying and selling on specific dates, without really looking at the prices at all.
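For illustration, something like the following is how that kind of date leakage can creep in. The `make_windows` helper and the random-walk `prices` series are hypothetical, not the setup from that project.

```python
# A hypothetical sketch of the failure mode described above: if a raw
# date/index column is fed in alongside prices, the network can fit the
# timeline itself. Building inputs from lagged prices only avoids that.
import numpy as np

prices = np.cumsum(np.random.default_rng(0).normal(0, 1, 500)) + 100
window = 10

def make_windows(prices, window, include_index=False):
    X, y = [], []
    for t in range(window, len(prices) - 1):
        feats = list(prices[t - window:t])   # past prices only
        if include_index:
            feats.append(t)                  # leaky: the "date" feature
        X.append(feats)
        y.append(prices[t + 1])              # next-step price as target
    return np.array(X), np.array(y)

X_safe, y = make_windows(prices, window)                       # prices only
X_leaky, _ = make_windows(prices, window, include_index=True)  # with dates
```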
This is a very good example of what I was trying to say above. (And a more complex network could very well have made this fit even better. Yay.)
A big warning sign for me is that the author is proud that the historical data "fits". I am just as interested in modelling bitcoin price data as anyone around here, but I defer judgement until I've seen it work its magic on future data.
If this model were overfitting, it would be performing far better on the historic data than it actually is, and its average error would be much lower. I've seen these neural networks overfit, and when they do, I restructure them and change parameters so that they don't. I've tested the model on historic data, on data from other exchanges, and on future data: it is not overfitting. And no, it doesn't look at dates. It's obvious that doing so would lead to overfitting; I don't know why anyone would ever feed dates into a neural network for predicting prices. Right now the model only looks at past prices (though hopefully it will look at sentiment data too soon), and those past prices are represented in different ways in the different neural networks.
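The thread doesn't spell out which representations those are, so here is a hypothetical sketch of a few common ways the same window of past prices can be fed to a network; the price values are made up.

```python
# Common, hypothetical representations of one window of past prices.
# The author's actual encodings are not described in the thread.
import numpy as np

window = np.array([420.0, 425.5, 431.0, 428.0, 433.5])  # illustrative prices

raw        = window                                   # absolute prices
returns    = np.diff(window) / window[:-1]            # simple returns
log_ret    = np.diff(np.log(window))                  # log returns
normalized = (window - window.min()) / (window.max() - window.min())
```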