
Topic: Thinking of doing my master's thesis (in statistics) on Bitcoin. (Read 1365 times)

sr. member
Activity: 520
Merit: 253
555
1. Analyse the network hashrate for cyclical trends (does a large increase in hashrate recur at the same time every day / week?)
2. Find correlations between network hashrate and known external phenomena (heatwaves might lead many to turn GPUs off, countries with high electricity costs probably have fewer miners)

Seconded. Whenever I look at the graphs by Sipa, I cannot help imagining there must be some daily/weekly trends. For example, workplace computers being turned on and off daily. However, that would only be noticeable if the computers are unevenly distributed by timezone, although weekends would show a clear difference. Another weekly factor is that exchanges are quieter on weekends, when there are no bank transfers.

I wonder what it would look like if the hashrate curves for each week were normalized, so as to compare them to each other to spot any trends. This would also filter out most of the inherent variance in hashing results.
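A rough sketch of what I mean by normalizing, in Python. It assumes you already have an hourly hashrate series starting on a Monday at 00:00; the function names and the synthetic numbers are made up purely for illustration and are not real network data. Each week is divided by its own mean so that long-term growth drops out, then the weeks are averaged slot by slot to damp the random variation.

```python
from statistics import mean

HOURS_PER_WEEK = 7 * 24


def normalize_weeks(hourly_hashrate):
    """Split an hourly series into whole weeks and divide each week by its own mean."""
    n_weeks = len(hourly_hashrate) // HOURS_PER_WEEK
    weeks = []
    for w in range(n_weeks):
        week = hourly_hashrate[w * HOURS_PER_WEEK:(w + 1) * HOURS_PER_WEEK]
        week_mean = mean(week)
        weeks.append([x / week_mean for x in week])
    return weeks


def average_profile(weeks):
    """Average the normalized weeks slot by slot (hour 0 of every week, hour 1, ...)."""
    return [mean(slot) for slot in zip(*weeks)]


def synthetic_series(n_weeks, base=10.0, growth=1.5):
    """Made-up data: hashrate grows 50% per week and dips 10% on weekends."""
    series = []
    for w in range(n_weeks):
        level = base * growth ** w
        for h in range(HOURS_PER_WEEK):
            is_weekend = (h // 24) >= 5  # days 5 and 6 are Saturday and Sunday
            series.append(level * (0.9 if is_weekend else 1.0))
    return series


weeks = normalize_weeks(synthetic_series(4))
profile = average_profile(weeks)
```

With real data the profile would be noisy, so you would want many weeks before reading anything into a bump at a particular hour.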
donator
Activity: 2058
Merit: 1007
Poor impulse control.
Organofcorti,  that's a cool and well-done blog.  Where do you get the information?

Well, hello_good_sir and thank you.

All data is self-published by the pools, either as JSON, CSV or HTML tables. I use R's web-scraping tools, JSON converters and curl to get the data.

Pools that don't publish data as csv or json are a bit of a pain - every weekend when I come to post the weekly average data, one of them changes something on their data page and I spend too much time rewriting and testing the script.

If you're concerned that self-published data from pools could be faked, you might find posts 10, 11, and 12 at howtohop.blogspot.com and posts 1 and 1.5 at organofcorti.blogspot.com interesting.

https://bitcointalksearch.org/topic/m.769530

hero member
Activity: 1008
Merit: 531
Organofcorti,  that's a cool and well-done blog.  Where do you get the information?

Szuetam, those are really good questions.  I'll look into the blockchain a bit this weekend and see how feasible they are.
sr. member
Activity: 377
Merit: 253
I can imagine lots of interesting stats which you could get out of the block chain itself, for example:
1. How long bitcoins stay at one address, on average, before moving to another.
2. How this time is correlated with the amount transferred or the amount held at that address.
3. A diagram showing how long already-mined coins have gone without being transferred, and how many of them there are.
(I'm curious, for example, how many BTC have not moved at all in the last two years.)
4. How the average transaction volume depends on the hour of the day.
5. How the average number of transactions depends on it too.
6. Some way to estimate the number of unique users, etc.
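To give an idea of how items 1 and 3 could be computed once the data is on disk, here is a minimal Python sketch. The record layout (amount, time received, time spent or None) is my own assumption about what you would extract from the block chain, and the four records are invented for illustration only.

```python
from datetime import datetime, timedelta

# Hypothetical records: (amount_btc, received_at, spent_at or None if never moved).
outputs = [
    (50.0, datetime(2010, 1, 1), None),
    (10.0, datetime(2011, 6, 1), datetime(2011, 6, 11)),
    (2.5, datetime(2012, 1, 1), datetime(2012, 1, 31)),
    (1.0, datetime(2012, 5, 1), None),
]


def mean_holding_days(records):
    """Average number of days between receiving and spending, over spent coins only."""
    held = [
        (spent - received).total_seconds() / 86400
        for _, received, spent in records
        if spent is not None
    ]
    return sum(held) / len(held)


def dormant_amount(records, as_of, min_age):
    """Total BTC that has never moved and was received at least min_age before as_of."""
    return sum(
        amount
        for amount, received, spent in records
        if spent is None and as_of - received >= min_age
    )


as_of = datetime(2012, 9, 1)
avg_days = mean_holding_days(outputs)
unmoved_two_years = dormant_amount(outputs, as_of, timedelta(days=730))
```

Note that averaging only over spent coins ignores the ones still sitting there, so with real data the holding time would be biased downward unless the unspent coins are treated as censored observations.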


I think that nowadays statistics need to be visualized properly to be usable, like here for example:
http://www.ted.com/talks/lang/pl/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html
The average user won't spend time trying to understand them without visualization.
donator
Activity: 2058
Merit: 1007
Poor impulse control.
There's lots of data available from bitcoin mining pools. I use some of it here: http://organofcorti.blogspot.com  - some of the posts there might give you some ideas.

Other ideas:
1. Analyse the network hashrate for cyclical trends (does a large increase in hashrate recur at the same time every day / week?)
2. Find correlations between network hashrate and known external phenomena (heatwaves might lead many to turn GPUs off, countries with high electricity costs probably have fewer miners)
3. Determine if there is a relationship between percentage of blocks orphaned and block size.
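
For idea 3, a first look could be as simple as binning blocks by size and comparing the orphaned fraction in each bin. The Python sketch below assumes you have a list of (size in kB, orphaned or not) pairs; the record format, the function name and the eight records are invented for illustration and are not real block data.

```python
from collections import defaultdict

# Hypothetical (block_size_kb, was_orphaned) records.
blocks = [
    (20, False), (35, False), (60, False), (80, True),
    (120, False), (150, True),
    (210, True), (240, False),
]


def orphan_rate_by_size(records, bin_kb=100):
    """Return {bin_start_kb: fraction of blocks orphaned} for fixed-width size bins."""
    counts = defaultdict(lambda: [0, 0])  # bin start -> [orphaned, total]
    for size_kb, orphaned in records:
        start = (size_kb // bin_kb) * bin_kb
        counts[start][0] += int(orphaned)
        counts[start][1] += 1
    return {start: o / t for start, (o, t) in sorted(counts.items())}


rates = orphan_rate_by_size(blocks)
```

A proper analysis would go further (a logistic regression of orphaned on block size, say), but a table like this is a quick way to see whether there is anything worth modelling.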

HTH
hero member
Activity: 1008
Merit: 531
FreeMoney, thanks for the tip.  Satoshi's Dice seems to post every bet and the results, and there are a lot of bets.  If I were to do something on this I would be doing a project on betting.  I would need to try to find patterns in who is betting, how much, when, etc.... if wins/losses affect future bets.  I am not really sure that I want to do something on betting, but this is a contender.

Kiba, I have had bad luck with trying to get data (or as they like to call it, trade secrets) from companies before.  My new motto is: if it isn't publicly available, it isn't available.  I suppose I could try to offer some sort of data mining service to maximize their sales through market segmentation.  So maybe we would be conducting experiments to see how to squeeze more money out of people.  My local grocery store just started doing this and now each customer gets unique prices.

Cbeast, I think that what you are describing is... trying to determine prices using non-market methods...but I am not sure.  Statistics is about looking at a lump of data and making mathematically-sound conclusions.  It is more like science than engineering.  So creating a new system to do something wouldn't be acceptable as a project.  Evaluating systems given their historical performance data would though.

Thank you everyone for the ideas so far!
donator
Activity: 1736
Merit: 1014
Let's talk governance, lipstick, and pigs.
How about trying to figure out a statistical algorithm that could analyze user-reported Bitcoin prices (in various currencies) at the point of sale or trade? Let's just say for argument's sake that the major exchanges no longer have bank access, or get hacked or shut down, and individuals had to form their own exchanges. How would they price Bitcoin? Users would need to report their Bitcoin price to a hypothetical decentralized network, or to the Bitcoin Network itself (hypothetically) modified for this function. You would need to eliminate HFT and obvious price manipulation. Since there isn't such data available, you could use data from different exchanges and compare the ones with large orders and manipulation vs. smaller exchanges.

This would not be an easy test, but it may prove useful to Bitcoin.
legendary
Activity: 980
Merit: 1020
How about you collaborate with http://bitmit.net and other platform services and quantify the economy of bitcoin?
legendary
Activity: 1246
Merit: 1016
Strength in numbers
There is a buttload of data concerning satoshi dice.
hero member
Activity: 1008
Merit: 531
I need an interesting problem, but more importantly I need data.  Relevant citable sources are somewhat important.

Last year I did a couple of semester projects on various topics and I kept hitting the same problem.  I would pick something really interesting and then be unable to find data (or people who said that they would give it to me didn't come through), or I would pick something original and nothing had been written about it by researchers (and thus I wasn't able to cite anything).  So then I would switch topics and I was already behind the rest of the class.  I don't want to be in this situation so I want to make sure that I have the data before I commit to a topic.

First of all, about me:  I am a grad student in statistics.  I am not actually that good at statistics but I am good enough.  I know how to program but I am more familiar with old-timey stuff (C++, x86) and the theory (Turing machines) than web programming.  So writing a webcrawler is probably out of the question, but once I get the data onto my hard drive I will have no trouble processing it.

So what kind of data can I get?  I know that the blockchain is public information but I am guessing that it isn't in a user-friendly format.  I know that exchanges have historical information, but I know that bitcoin prices fluctuate and are often influenced by major events (mtgox hacked, pirate40, news coverage).  Apparently silk road does 2 million in business per month?  Do they release this information or are people checking out the block chain?  What about miners?  What kind of information is available on them?

As for my topic, that would mostly depend on what kind of data I can get my hands on.  I am thinking that it might be interesting to try to categorize addresses by their transaction behavior.  Remember that this has to be on statistics, so I have to study this from a data-centric perspective.  Talking about the protocol or the economic model using logic isn't an option, I have to focus on data.


If you can help me I would appreciate it, and maybe interest a few people in bitcoin.  Thanks.