
Topic: Bitcoin Price Prediction Software Using Neural Networks (Read 4148 times)

member
Activity: 88
Merit: 10
The problem with neural networks and logistic regression is the following: the cost function can easily get stuck at a local optimum.
Two ways to avoid it:
1. Randomly initialize your initial weight parameters a few times and see which run performs better (i.e., finds a better local optimum).
2. Switch to a different learning algorithm.


Btw, I don't understand why people avoid linear regression with polynomial features. At least it converges to the global optimum, since its squared-error cost is convex.
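As a rough sketch of option 1 (the toy data and every name here are made up, and note that plain logistic regression's cost is actually convex; restarts only pay off in the non-convex neural network case):

Code:
import numpy as np

rng = np.random.default_rng(0)

# Toy classification data (stand-in for real features and labels)
X = rng.normal(size=(200, 5))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_once(lr=0.1, steps=500):
    # 1. Randomly initialize the weight parameters
    w = rng.normal(size=X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * (X.T @ (p - y)) / len(y)  # one gradient descent step
    p = sigmoid(X @ w)
    cost = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    return cost, w

# 2. Restart several times and keep the run with the lowest final cost
best_cost, best_w = min((train_once() for _ in range(5)), key=lambda t: t[0])
print('best cost:', best_cost)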
member
Activity: 84
Merit: 10
I experimented a bit with having it train only on more recent portions of the data, in hopes that this might make it better at predicting future data. I might try this again when I work on a 4-layer network, which I'll be starting soon. Like you said, it's hard to know if it might do better on future data by only training on recent data.

The reason I say the data is large enough that the neural network would have to generalize to attain a reasonable error is that a function fitting the historic bitcoin price data without generalizing would have to be more complex than three-layer neural networks can represent. Three-layer neural networks are limited to representing continuous functions, and a continuous function simply couldn't fit all of that data closely without also generalizing to new data. This is part of why I'm going to work on a 4-layer neural network (since those are capable of representing any function).
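(As a bare-bones sketch of what such a 4-layer forward pass looks like; the layer sizes and the sigmoid activation below are placeholders, not the actual network:)

Code:
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# Placeholder sizes: 10 inputs -> two hidden layers -> 1 output (a price)
sizes = [10, 16, 16, 1]
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    # Nonlinear hidden layers, linear output layer for the price
    for W, b in zip(weights[:-1], biases[:-1]):
        x = sigmoid(x @ W + b)
    return x @ weights[-1] + biases[-1]

print(forward(rng.normal(size=10)))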
full member
Activity: 154
Merit: 100
I agree that training the Neural Network on 60% of the data probably leads to slightly worse results. I still think that this is vastly better than having essentially no clue about how well your algorithm performs.

The arguments you give to justify training on 100% of the data set seem more backed up by assumptions than actual evidence. The dataset might be large enough, or it might not; how would you know if you haven't tested it?

Additionally, training on the full 3 years... I'm not sure how useful that is. The market was a lot different a year ago. Patterns that were true back then might no longer be true today. But that's something that is quite hard to decide because of the limited data available.
member
Activity: 84
Merit: 10
I can see your point about how testing it on the set it is trained on does not show how well it generalizes to new samples. However, regardless of how a neural network is trained, the most accurate test of its performance will be testing it on the sample that is closest to what it will be used for. This is especially true since the data is large enough that the neural network would not be able to attain a reasonable error without being able to generalize.

If I were to train it on 60% of the data and test it on the other 40%, it would definitely perform worse. The reason, however, would just be that it hasn't had anywhere near as much data to train on. A better way of testing the neural network's ability to generalize to new data might be to collect bitcoin price data for about a month and then test the current network on that next month's data. It might perform worse than on the training sample, but I doubt it would be significantly different.

I think the potential issue you are pointing out would be more relevant for a neural network trained on a much smaller amount of data. I could see a neural network learning a specific pattern in a set of data that may not necessarily be indicative of new data. However, with something like 3 years' worth of bitcoin transactions, I doubt something like that would happen.
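(A chronological holdout along those lines is easy to sketch; the price series below is synthetic and the "model" is a naive stand-in, just to show the shape of the test:)

Code:
import numpy as np

rng = np.random.default_rng(2)

# Synthetic hourly price series (~3 months); use real data in practice
prices = 500.0 + np.cumsum(rng.normal(scale=2.0, size=24 * 90))

# Train on everything before the cutoff, evaluate only on the last month
cutoff = len(prices) - 24 * 30
train_data, holdout = prices[:cutoff], prices[cutoff:]

# Naive stand-in model: predict the last training price for every future hour
forecast = np.full(len(holdout), train_data[-1])
avg_error = np.mean(np.abs(holdout - forecast) / holdout * 100.0)
print('average holdout error: %.2f%%' % avg_error)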
full member
Activity: 154
Merit: 100
I'm not suggesting that you should test your algorithm on a different data set. I'm suggesting that you should measure its performance on a held-out portion of the same data (a separate test set).

Let's assume for a moment that we're trying to train an algorithm to classify John's handwriting. To do this, we give the algorithm a bunch of data, each sample labeled either 'John' or 'not John'. We give the algorithm all the data and start training. After training, assume the algorithm has 95% accuracy (whatever that means) on the training set.

The problem with this approach is that we have no idea how well the algorithm generalizes to new samples. If we give it 100 new samples of John's handwriting, how well will it do? The answer is that we don't know, since we have never tested the performance of our algorithm on samples it has never seen before.

A better approach would be to create a split in the data: give the algorithm 60% of the data to train on, then measure the accuracy of your predictions on the other 40%. As I demonstrated in my previous post, it is possible to have perfect predictions on the training set but essentially random predictions on the testing set.

Applying this to your situation, give your algorithm 60% or so of the data, randomly chosen ('the training set'), and test the accuracy on the other 40% ('the testing set'). You'll see that the predictive accuracy on the training set is vastly different from the predictive accuracy on the test set. If that isn't the case, I would be highly impressed.
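As a concrete sketch of that split (the data here is a placeholder, and train/error stand for whatever routines you already use):

Code:
import numpy as np

rng = np.random.default_rng(42)

# Placeholder features and targets; substitute your real data
X = rng.normal(size=(1000, 8))
y = rng.normal(size=1000)

# Shuffle the indices, then split 60% train / 40% test
idx = rng.permutation(len(y))
cut = int(0.6 * len(y))
X_train, y_train = X[idx[:cut]], y[idx[:cut]]
X_test, y_test = X[idx[cut:]], y[idx[cut:]]

# model = train(X_train, y_train)                  # your training routine
# print('train error:', error(model, X_train, y_train))
# print('test error: ', error(model, X_test, y_test))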

Source: Andrew Ng's excellent Machine Learning course on Coursera.
member
Activity: 84
Merit: 10
There's your problem. Accuracy measures are totally useless when not done on separate test data.

I was playing around with SVMs yesterday, and I was able to get 100% accuracy classifying 'buy' or 'sell' moments on my training set. But after a few hours of hacking, my test and validation set accuracy were still below the level attained by dice rolls. In fact, the predictions were worse than dice rolls.

I disagree. What could possibly be a better test of how accurate the neural network would be at predicting bitcoin prices than actual bitcoin prices (all of them over the past three years)? Testing its accuracy on any other data set would be misleading, because the test data would be less indicative of what will happen in the future than the training data.

What you are saying about measuring accuracy would be true for an application like facial recognition, handwriting recognition, or sound analysis, but it is not true for bitcoin. The reason is that bitcoin is unique, and sounds/faces/handwriting are not. What I mean is that you can train a neural network to recognize John's handwriting, and it will also be able to read Joe's handwriting. However, I doubt the equivalent for bitcoin would work: if you trained a neural network to recognize, say, stock price patterns, it would probably perform worse at recognizing bitcoin price patterns than if it had been trained on bitcoin price data.
full member
Activity: 154
Merit: 100

Also I do not use a separate data set for testing - I train it and test it on the entire historic data.

There's your problem. Accuracy measures are totally useless when not done on separate test data.

I was playing around with SVMs yesterday, and I was able to get 100% accuracy classifying 'buy' or 'sell' moments on my training set. But after a few hours of hacking, my test and validation set accuracy were still below the level attained by dice rolls. In fact, the predictions were worse than dice rolls.
member
Activity: 84
Merit: 10
It is not possible to compare those percentages unless we use exactly the same test, which is why I gave you the code used to calculate my error.

It would be useful if you could create a better way to report accuracy. For example, classify all examples into 2 categories, uptrend or downtrend, then see how many false positives each category has.

Edit: I forgot something. Do you use training and test set separation?

I think you are calculating yours pretty much the same way I do mine. I just calculate what percent each prediction is off from the actual price, and then average all of these percentages, having it make predictions at every point throughout bitcoin's historic data.

Your suggestion about classifying everything as uptrend or downtrend is interesting; I might try that. It could be insightful for seeing whether it predicts the graph shape well even when it gets the magnitudes of the rises and falls wrong.
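Something like this could do the counting (a rough sketch with made-up prices; the three lists would come from the current price, the prediction, and the actual price 24 hours later):

Code:
def trend_counts(current, predicted, actual):
    """Tally predicted vs. actual direction for each prediction point."""
    counts = {'true_up': 0, 'false_up': 0, 'true_down': 0, 'false_down': 0}
    for now, pred, real in zip(current, predicted, actual):
        pred_up = pred > now
        real_up = real > now
        if pred_up and real_up:
            counts['true_up'] += 1
        elif pred_up and not real_up:
            counts['false_up'] += 1    # predicted a rise, price fell
        elif not pred_up and not real_up:
            counts['true_down'] += 1
        else:
            counts['false_down'] += 1  # predicted a fall, price rose
    return counts

# Made-up example: current prices, predicted prices, actual prices
print(trend_counts([100, 100, 100], [105, 96, 103], [102, 99, 98]))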

Also I do not use a separate data set for testing - I train it and test it on the entire historic data.
full member
Activity: 154
Merit: 100
Code:
# bins.vwa = large array with the volume weighted average, binned by 1 hour
errors = []

# Naive baseline: forecast that the price 24 hours ahead equals the current price
for i in range(len(bins.vwa) - 24):
    forecast = bins.vwa[i]
    actual = bins.vwa[i + 24]
    diff = actual - forecast
    relError = abs(diff) / forecast * 100.0
    errors.append(relError)

print(sum(errors) / len(errors), '% error on average')

How do you execute this on the web?
It's python code: https://www.python.org/



But also doesn't your program prove that mine is effective? You said that by guessing the price will always stay the same you had an average error of 2.68%, and that was only looking at the price since November. Mine has an average error of 1.3%, or often less, by looking at all data over the past 3 years. There's no way this would be possible if it wasn't guessing trends correctly more often than incorrectly. Honestly your post just gave me a lot more confidence in my own software lol
It is not possible to compare those percentages unless we use exactly the same test, which is why I gave you the code used to calculate my error.

It would be useful if you could create a better way to report accuracy. For example, classify all examples into 2 categories, uptrend or downtrend, then see how many false positives each category has.

Edit: I forgot something. Do you use training and test set separation?
member
Activity: 84
Merit: 10
'over the past day'. That really means nothing.


For kicks and giggles, I hacked together a tool that always forecasts that nothing will happen. The average 24h error over the period since November, based on BTC-e data, is 2.68%.

Python:
Code:
# bins.vwa = large array with the volume weighted average, binned by 1 hour
errors = []

# Naive baseline: forecast that the price 24 hours ahead equals the current price
for i in range(len(bins.vwa) - 24):
    forecast = bins.vwa[i]
    actual = bins.vwa[i + 24]
    diff = actual - forecast
    relError = abs(diff) / forecast * 100.0
    errors.append(relError)

print(sum(errors) / len(errors), '% error on average')

If you really have something that is more accurate than random dice rolls, you're going to make tons of money trading 'the big stocks'. Forecasting the price more than a few seconds ahead is usually thought to be impossible. Most HFT firms make money by predicting the price microseconds ahead.

lol I mean, obviously performance over one day isn't going to be a solid measurement of overall performance. I was just noting that since it's been accurate recently, it may be more likely to be accurate in the immediate future.

But also doesn't your program prove that mine is effective? You said that by guessing the price will always stay the same you had an average error of 2.68%, and that was only looking at the price since November. Mine has an average error of 1.3%, or often less, by looking at all data over the past 3 years. There's no way this would be possible if it wasn't guessing trends correctly more often than incorrectly. Honestly your post just gave me a lot more confidence in my own software lol
full member
Activity: 154
Merit: 100
Over the past day or so it has definitely had an average error of less than 1% for the 24 hour chart.
'over the past day'. That really means nothing.


For kicks and giggles, I hacked together a tool that always forecasts that nothing will happen. The average 24h error over the period since November, based on BTC-e data, is 2.68%.

Python:
Code:
# bins.vwa = large array with the volume weighted average, binned by 1 hour
errors = []

# Naive baseline: forecast that the price 24 hours ahead equals the current price
for i in range(len(bins.vwa) - 24):
    forecast = bins.vwa[i]
    actual = bins.vwa[i + 24]
    diff = actual - forecast
    relError = abs(diff) / forecast * 100.0
    errors.append(relError)

print(sum(errors) / len(errors), '% error on average')

If you really have something that is more accurate than random dice rolls, you're going to make tons of money trading 'the big stocks'. Forecasting the price more than a few seconds ahead is usually thought to be impossible. Most HFT firms make money by predicting the price microseconds ahead.
hero member
Activity: 714
Merit: 500
Very well done, I loved it man, great effort for the community!
Just 2 quick questions:
What time zone are you using?
Is the source available?
member
Activity: 84
Merit: 10
Chancellor - The 24 hour and 5 day prediction charts are not always coherent with each other for a couple of reasons. The first is that the 24 hour chart predicts average prices over 1 hour periods while the 5 day one predicts average prices over 6 hour periods, so the 24 hour chart is likely to show slightly more volatility. The other is simply that the predictions are created by different neural networks with different inputs. In the case of significant differences between their predictions, most of the time the 24 hour chart is more accurate.

Darkstone2 - The error is how far off the predicted prices are from the actual prices on average. That means that if the 24 hour chart predicts a price of $500, the actual price will, on average, fall between $493.50 and $506.50. Sure, it's not 100% accurate, but you will almost definitely not find a human who does that well. Plus, the error includes the times when real-life events that can't be foreseen by software affect bitcoin prices, and during those times the predictions are further off than the average error. This means that normally it is actually more accurate than that. Over the past day or so it has definitely had an average error of less than 1% for the 24 hour chart.

I can assure you this does much better than random dice rolls lol... I was testing a trade simulating program that made decisions based on these predictions for a while, and it was earning a decent amount of money until the China news came out and prices plummeted.

Oh and the 20 day prediction is more for kicks and giggles. That's why I put the little note above the chart explaining that it isn't accurate... plus it says it has like 8% error.
full member
Activity: 154
Merit: 100
According to his website, the error is the historical stddev.

That means that the forecast is, on average (for the hour chart), 1.6% off the actual rate. This seems pretty terrible to me.

Last week I tried my own machine learning price forecasting. My best result was worse than random dice rolls. Forecasting stock prices is hard. I certainly would not expect any kind of accuracy beyond 30 seconds of market data. Providing a 20 day prediction is simply madness.
full member
Activity: 154
Merit: 100
I have noticed that the 24 hour prediction and the 5 day prediction are not coherent with each other. For example, right now the 24 hour prediction forecasts $577.50 at 4:00 CET tomorrow, while the 5 day prediction shows $497 for the same hour. The difference is far more than the advertised 2.6% error...
member
Activity: 84
Merit: 10
Looks cool, is there a way to get the source?

Thanks! The source code is proprietary for now, but you can access the prediction data at the following URLs:

24 hour: www.btcpredictions.com/STData.txt
5 day: www.btcpredictions.com/LTData.txt
20 day: www.btcpredictions.com/SLTData.txt

The first line of each of these files is a question mark followed by the unix time stamp of when the prediction was made. The second line is an exclamation point followed by the average error for the neural network that made those predictions. Each remaining line is the letter k followed by a price.

You are welcome to use any/all of this data, but if you display it publicly, please cite the source.
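For anyone who wants to consume these files programmatically, a minimal parser following the format described above might look like this (fetching and error handling are left out, and it hasn't been tested against the live files):

Code:
def parse_predictions(text):
    """Parse a btcpredictions data file:
    line 1: '?' + unix timestamp of when the prediction was made,
    line 2: '!' + average error of the neural network,
    remaining lines: 'k' + a predicted price."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    made_at = int(lines[0].lstrip('?'))
    avg_error = float(lines[1].lstrip('!'))
    prices = [float(ln.lstrip('k')) for ln in lines[2:]]
    return made_at, avg_error, prices

# Example with a tiny made-up file
sample = "?1400000000\n!1.3\nk500.25\nk501.10"
print(parse_predictions(sample))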
newbie
Activity: 26
Merit: 0
Looks cool, is there a way to get the source?
member
Activity: 84
Merit: 10
I've been working on software that uses neural networks to predict bitcoin prices. It has been making reasonably accurate predictions. You can see them on this website:

http://www.btcpredictions.com

The 24 hour and 5 day predictions are often pretty good, and the 20 day prediction is a bit less accurate but still interesting to see. The predictions are certainly far from perfect but I think they still provide valuable information. By considering this forecast in addition to recent news stories (and your own, presumably intelligent, judgement), I think there is some solid potential to make money trading.