Topic: Discussion for MtGox trade data downloader - page 3. (Read 14422 times)

sr. member
Activity: 246
Merit: 250
December 16, 2013, 06:25:35 PM
#74
Hi BNO,

The 2 tables relate to a discussion I had with MagicalTux (CEO of MtGox). My tool only really works if the bigquery data is presorted, but that can't be guaranteed if data is repeatedly being inserted into the table due to the way BQ works. I asked MagicalTux if he could implement a system of two tables, where one is updated, and the other is a sorted version of the updated table, hence the two tables. Since no updates have occurred since May though, it doesn't make a difference currently. Long story short, there is no difference between the tables :P

If you download the data using my tool, then there are export options which can convert into a few different formats, CSV included (you can also choose which data to use and even do candles). Obviously though, the tool can only get data up to May (and has been a bit buggy lately). As far as I know, there aren't any other tools that can give you bulk csv data, although you might want to look at http://wizb.it which may at some point provide a working data service. I believe there are some charting websites out there though which can give you csv data, though with much lower resolution.

Please note that my tool isn't compatible with the full dump I just posted, and so this cannot be used to generate CSV. I might reformat it into a compatible version though seeing as there's quite a lot of demand for this data.

Sorry that your options seem a little sparse currently. If I had more time I'd love to develop a proper tool or set up a web service to make it easy to get data for any exchange, as I know there's demand for it. I think I might make this a side project of mine, though it'll probably be quite slow.
BNO
full member
Activity: 160
Merit: 103
December 16, 2013, 04:33:55 PM
#73
Hi all,

VERY interesting thread here, since I have wanted some reliable tick data for bitcoin for a long time. I must say that I am absolutely no programmer, so I can't fully appreciate all the great work Nitrous, Poikkeus and others have done, but it sounds really good and somehow I will get it running, hopefully ;D

Could someone help me understand this better:

I saw that mt-gox at Google BigQuery consists of 2 tables: trades and trades_raw. What is the difference between them?
Is it possible to convert the dump somehow into a .csv file?
The tool from poikkeus has one file, history.dat. Can I download this and (same question) get it somehow into a .csv file?

Greetings to you all...

 
member
Activity: 73
Merit: 10
December 15, 2013, 08:12:21 AM
#72
I've made a trade data downloader in Java for my upcoming trading bot. I could modify it to download only the trading data from the date you want and write a CSV file. The files could be updated by running it again.

Or if you are a Java coder you can do it yourself: https://github.com/jussirantala/goxbot
sr. member
Activity: 246
Merit: 250
December 13, 2013, 05:38:47 AM
#71
Big thanks, Nitrous! :)

No problem :)

Also, for anyone who does want to maintain their own up to date local database and has managed to get my original app to work, siergiej has created a node.js script which you may find useful -- https://gist.github.com/siergiej/5971230 -- it should update the database in the correct format, but please note that it will probably not be compatible with the dump I just posted.
newbie
Activity: 4
Merit: 0
December 13, 2013, 04:42:32 AM
#70
Big thanks, Nitrous! :)
sr. member
Activity: 246
Merit: 250
December 12, 2013, 06:23:55 PM
#69
Ok, it seems that the tool is now breaking for some unknown reason (changes to the bigquery API?), and I haven't updated it in a while so I can't be sure exactly why. Seeing as my tool hasn't changed, and it seems to be breaking around 1024000 rows for those who are having problems, I would guess it's a problem on Google's end, though more likely their API just changed and I should update my tool.

In addition, I'm sure many of you are rather frustrated that my app doesn't let you download data more up to date than May. So, rather than spend time I don't have updating the app, I'm going to release a database which is up to date to today, Thu, 12 Dec 2013 22:13:40 GMT. It is in a slightly different format, but I've tried to keep the columns as similar to the app as possible:

Column               Type  Description
Money_Trade__        int   The trade ID. As you may know, this is a sequential integer up to 218868, whereupon it is a unix micro timestamp.
Date                 int   Unix timestamp (1s resolution).
Primary              str   '1' for primary (recorded in original currency), '0' for non-primary.
Item                 str   Currently, only contains 'BTC'.
Currency__           str   'USD', 'EUR', 'GBP', etc.
Type                 str   'bid' or 'ask'.
Properties           str   e.g. 'market'. May be a comma separated list if appropriate, e.g. 'limit,mixed_currency'.
Amount               int   Quantity traded in terms of Item.
Price                int   Price at which the trade occurred, in terms of Currency__.
Bid_User_Rest_App__  str   ID specific to the authorised application used to perform the trade on the bid side, if applicable.
Ask_User_Rest_App__  str   ID specific to the authorised application used to perform the trade on the ask side, if applicable.

The most significant difference is that Date is a unix timestamp rather than in ISO format.
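
For anyone who needs to go between the two date formats, here is a minimal Python sketch of the conversions implied by the column list above. The helper names are made up for illustration, and it assumes all timestamps are UTC.

```python
from datetime import datetime, timedelta, timezone

# Last sequential trade ID, per the column description above.
LAST_SEQUENTIAL_TID = 218868
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_iso(unix_seconds):
    """Convert the Date column (unix seconds) to an ISO 8601 string in UTC."""
    return (EPOCH + timedelta(seconds=unix_seconds)).isoformat()


def tid_to_datetime(tid):
    """Return the trade time encoded in a microtimestamp trade ID.

    Sequential IDs (up to 218868) carry no time information, so None is
    returned for them; use the Date column instead.
    """
    if tid <= LAST_SEQUENTIAL_TID:
        return None
    # Integer microseconds avoid float rounding on 16-digit IDs.
    return EPOCH + timedelta(microseconds=tid)
```

For example, date_to_iso(1386886420) corresponds to the dump time mentioned above, Thu, 12 Dec 2013 22:13:40 GMT.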

Here's the link: https://docs.google.com/file/d/0B3hexlKVFpMpYmdoUUhXckRrT2s

The table is named dump, and it is indexed on Money_Trade__ asc, [Primary] desc (the index is named ix).
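
Since the app can't export this dump, a rough do-it-yourself CSV export might look like the sketch below. It ASSUMES the downloaded file can be opened as an SQLite database (the post doesn't state the file format), and the function name and output column names are my own. Amount and Price are written out as the raw integers, with no scaling applied.

```python
import csv
import sqlite3
from datetime import datetime, timezone


def export_currency_csv(conn, currency, out_file):
    """Write the primary trades of one currency to CSV; return the row count.

    conn     -- an open sqlite3 connection (ASSUMPTION: the dump is SQLite)
    currency -- e.g. 'USD'
    out_file -- a text file object opened for writing
    """
    query = (
        "SELECT Money_Trade__, Date, Type, Amount, Price "
        "FROM dump WHERE Currency__ = ? AND [Primary] = '1' "
        "ORDER BY Money_Trade__ ASC"
    )
    writer = csv.writer(out_file)
    writer.writerow(["trade_id", "date_iso", "type", "amount_int", "price_int"])
    count = 0
    for tid, ts, side, amount, price in conn.execute(query, (currency,)):
        # Date is a unix timestamp; convert it to ISO format for the CSV.
        iso = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
        writer.writerow([tid, iso, side, amount, price])
        count += 1
    return count
```

You would call it with a connection from sqlite3.connect() on the downloaded file and an output file opened with newline="" so the csv module controls the line endings.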

Sorry if you're annoyed that you spent time and bandwidth downloading the data using my app, however it was my belief that MtGox would complete their side of the service so that my app would then be used to maintain and update a local copy of the database in any format. Then you wouldn't have had to rely on a third party maintaining an up-to-date copy, giving you complete control over your backtesting.

I don't think this database will be compatible with the app unfortunately, so you won't be able to export it to different formats, and I don't plan on keeping it up to date, but at least it's better than the old May database and works. Also, sorry that I've been rather less active as of late, I've got some pretty big commitments currently and, as such, I unfortunately have far less time to do stuff like this :(
newbie
Activity: 4
Merit: 0
December 12, 2013, 09:22:21 AM
#68
Hi!

Thanks for a great tool. But I can't make it work properly. I started it several times, and I always get the same error, and my dump is always the same size: 107 MB. I haven't tried to continue updating because it looks like it's corrupted.
Is there any way to fix it?
http://s12.postimg.org/opvn4mmp9/Screen_Shot_2013_12_12_at_7_32_48_AM.png
http://s10.postimg.org/8q04iicmh/Screen_Shot_2013_12_12_at_7_32_56_AM.png
http://s16.postimg.org/xmfafdztx/Screen_Shot_2013_12_12_at_5_56_13_PM.png
sr. member
Activity: 246
Merit: 250
December 09, 2013, 09:26:02 AM
#67
Hi,

I'm getting the following error when trying to download after being partially successful

Traceback (most recent call last):
  File "app.py", line 118, in thread_bootstrap
    self.thread_init()
  File "./bq/mtgox.py", line 153, in run
    raise Exception, "Insertion error: %d =/= %d" % (pos,size)
Exception: Insertion error: 1040000 =/= 1024000

This means that whilst downloading, something bad happened either on Google's end, your end, or with your internet connection, and a set of trades wasn't properly inserted into the database, so you're missing 16000 entries. You have a few options:

1) I would recommend starting again if it's not too much trouble for your internet connection, as your data could be corrupted somehow.
2) Alternatively, you could resume the download as it looks like the last 16k entries just somehow weren't inserted, and it may not be corrupted at all.
3) Or to be safe, you could manually remove the last 24k rows to try to remove any potential corruption and resume from 1 million.

1 is the safest, but you'll probably be fine with any of the options.
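
If you go with option 3, the trimming could be sketched roughly as below. This ASSUMES the local database is an SQLite file; the table name and key column you pass in are placeholders, since the tool's actual schema isn't shown in this thread. Back up the file before trying anything like this.

```python
import sqlite3


def trim_rows(conn, table, key, keep):
    """Delete every row after the first `keep` rows, ordered by `key`.

    Returns the number of rows removed. Identifiers cannot be bound as SQL
    parameters, so only pass trusted table/column names here.
    """
    sql = (
        'DELETE FROM "{t}" WHERE rowid IN '
        '(SELECT rowid FROM "{t}" ORDER BY "{k}" LIMIT -1 OFFSET ?)'
    ).format(t=table, k=key)
    cur = conn.execute(sql, (keep,))
    conn.commit()
    return cur.rowcount
```

With the numbers from the error above, keeping 1000000 rows of a table holding 1024000 would remove the last 24000.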
member
Activity: 77
Merit: 10
December 08, 2013, 08:50:07 PM
#66
Hi,

I'm getting the following error when trying to download after being partially successful

Traceback (most recent call last):
  File "app.py", line 118, in thread_bootstrap
    self.thread_init()
  File "./bq/mtgox.py", line 153, in run
    raise Exception, "Insertion error: %d =/= %d" % (pos,size)
Exception: Insertion error: 1040000 =/= 1024000
sr. member
Activity: 246
Merit: 250
got an error during download


An error occurred that shouldn't have happened. Please report this on the tool's forum thread at bitcointalk (you should be taken there when you click ok).

Traceback (most recent call last):
  File "/Applications/MtGox-Trades-Tool.app/Contents/Resources/app.py", line 97, in thread_bootstrap
  File "mtgox.pyc", line 120, in run
  File "bq.pyc", line 100, in gen2
  File "oauth2client/util.pyc", line 128, in positional_wrapper
  File "apiclient/http.pyc", line 676, in execute
  File "oauth2client/util.pyc", line 128, in positional_wrapper
  File "oauth2client/client.pyc", line 494, in new_request
  File "oauth2client/client.pyc", line 663, in _refresh
  File "oauth2client/client.pyc", line 682, in _do_refresh_request
  File "httplib2/__init__.pyc", line 1570, in request
  File "httplib2/__init__.pyc", line 1317, in _request
  File "httplib2/__init__.pyc", line 1286, in _conn_request
  File "/usr/lib/python2.7/httplib.py", line 1027, in getresponse
  File "/usr/lib/python2.7/httplib.py", line 407, in begin
  File "/usr/lib/python2.7/httplib.py", line 371, in _read_status
BadStatusLine: ''


Hmm, that sounds like a temporary problem with either Google or your internet connection.

For most errors, you should be able to start the program again. It should pick up where it left off and continue downloading.
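
If you are scripting your own downloads rather than using the app, one common way to ride out temporary failures like this is to retry with a growing delay. The sketch below is Python 3 (the tool itself runs on Python 2.7, going by the traceback), and with_retries and its parameters are my own names, not part of the tool.

```python
import time
from http.client import HTTPException


def with_retries(action, attempts=5, base_delay=1.0,
                 retry_on=(HTTPException, OSError), sleep=time.sleep):
    """Call action() until it succeeds or `attempts` tries are used up.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between tries and
    re-raises the last error if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

BadStatusLine is a subclass of HTTPException in Python 3's http.client, so the default retry_on covers the error shown in the traceback.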
newbie
Activity: 9
Merit: 0
got an error during download


An error occurred that shouldn't have happened. Please report this on the tool's forum thread at bitcointalk (you should be taken there when you click ok).

Traceback (most recent call last):
  File "/Applications/MtGox-Trades-Tool.app/Contents/Resources/app.py", line 97, in thread_bootstrap
  File "mtgox.pyc", line 120, in run
  File "bq.pyc", line 100, in gen2
  File "oauth2client/util.pyc", line 128, in positional_wrapper
  File "apiclient/http.pyc", line 676, in execute
  File "oauth2client/util.pyc", line 128, in positional_wrapper
  File "oauth2client/client.pyc", line 494, in new_request
  File "oauth2client/client.pyc", line 663, in _refresh
  File "oauth2client/client.pyc", line 682, in _do_refresh_request
  File "httplib2/__init__.pyc", line 1570, in request
  File "httplib2/__init__.pyc", line 1317, in _request
  File "httplib2/__init__.pyc", line 1286, in _conn_request
  File "/usr/lib/python2.7/httplib.py", line 1027, in getresponse
  File "/usr/lib/python2.7/httplib.py", line 407, in begin
  File "/usr/lib/python2.7/httplib.py", line 371, in _read_status
BadStatusLine: ''
sr. member
Activity: 246
Merit: 250
The (very good and appreciated) "Unofficial Documentation for MtGox's HTTP API v2" has some glitches in the description for the API function money/trades/fetch :

(see https://bitbucket.org/nitrous/mtgox-api#markdown-header-moneytradesfetch)

there are not really "gaps" (except the one big gap for USD when they switch from a counter to microtimestamp)

the mtGox API function money/trades/fetch simply works as follows :

Yeah, I think in a previous commit to the docs I actually did say that, but I wanted to emphasise (a) that this was due to gaps in trades, ie no trades happened in a 24h window, and (b) that people should not try to circumvent this due to server load. Of course, for small date ranges (or for your centralised database project), that's fine, but I don't want to openly advocate such usage. After all, it's the reason MagicalTux introduced the bq database in the first place!

the python script that 100x posted before will fail on some rare traded currencypairs for that reason.

Yeah I mentioned that a few posts ago. In actual fact, they don't even have to be particularly rare. I know GBP has some very long gaps near the beginning. Obviously for rare currencies this will be even more of a problem.
newbie
Activity: 13
Merit: 0
I am thinking of providing the trade data to users by another method.

I fetched all the data for myself, so if a considerable number of users want it, I will provide the trade data in a different way (not relying on MtGox updating the data, and not relying on BigQuery).

Please check out the following post: https://bitcointalksearch.org/topic/m.3017272 and let me know.

yours sincerely

bitranox
newbie
Activity: 13
Merit: 0
Quote
For other less popular currencies, however, I have found that there are many gaps, some quite long, and so without regular bq updates this is liable to break a script. Of course if you have an up to date database and can run your script regularly, at least once per day, this will not be a problem as you will catch all trades as they come in. Alternatively, you could manually advance the last_id if you haven't caught up to the current time yet, but you need to confirm the limit is indeed 86400s first.

The (very good and appreciated) "Unofficial Documentation for MtGox's HTTP API v2" has some glitches in the description of the API function money/trades/fetch:

(see https://bitbucket.org/nitrous/mtgox-api#markdown-header-moneytradesfetch)

There are not really "gaps" (except the one big gap for USD where they switched from a counter to a microtimestamp).

The MtGox API function money/trades/fetch simply works as follows:

If you pass a microtimestamp as a parameter:
- it returns at most 1000 records for the given currency pair
- it returns only trades that happened within 86400 seconds after the given microtimestamp
So if there are fewer than 1000 trades within those 86400 seconds, you receive just the trades from that window.

If you don't pass a microtimestamp parameter:
- it returns the trade data for the given currency pair from the last 86400 seconds (24 hours). In that case the 1000-record limit does not apply.

So if no trades happened within a span of 86400 seconds for a given currency pair, you will not get back any rows. This only happens for some rarely traded currency pairs.
In that case you need to add 86400 seconds to the timestamp (that is, 86400 * 1E6 microseconds) and query again for the next day.

As Nitrous pointed out in some other posts, I would not recommend that every user download the full trade history from MtGox, because of server load issues, etc.

However, since the bq database has not been updated so far, I will provide full, up-to-date dumps in the next couple of days. (They will be updated daily or hourly; I am waiting for user requests on that.)

The Python script that 100x posted before will fail on some rarely traded currency pairs for that reason.
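
The paging rule described above can be sketched in Python as follows. This is only an illustration of the logic, not a working client: fetch is a hypothetical callable standing in for a money/trades/fetch request, ASSUMED to return a time-ordered list of dicts with a "tid" microtimestamp, containing only trades strictly after the value passed in. As said above, please don't use this to pull the full history from MtGox.

```python
DAY_US = 86400 * 1000000  # the 86400 s window, in microseconds


def fetch_all(fetch, since, until):
    """Collect trades with since < tid < until, skipping empty 24h windows.

    fetch(since) is a hypothetical wrapper around money/trades/fetch.
    """
    trades = []
    while since < until:
        batch = fetch(since)
        if batch:
            trades.extend(t for t in batch if t["tid"] < until)
            # Continue after the last trade seen; the max() guards against
            # a fetch that fails to move forward.
            since = max(batch[-1]["tid"], since + 1)
        else:
            # No trades in this window: advance one day and query again.
            since += DAY_US
    return trades
```

The else branch is the step that matters for rarely traded currency pairs: without it, an empty response would leave the starting point unchanged forever.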
sr. member
Activity: 246
Merit: 250
I haven't posted an update about this yet because I haven't got a definitive update from MagicalTux that bq is even still happening :( At the moment, with US legal issues, litecoin, etc, MagicalTux isn't really focusing on bq at all, but hopefully he will finish it eventually. I don't anticipate this to be anytime soon though, and he probably needs to be reminded about bq occasionally (and shown that there is a demand for it).

Unfortunately, I don't have much time to work on this at the moment, and I'm going to university in a few weeks, so if anyone still wants regular data then perhaps someone with python experience might consider forking my project and adding in 100x's script?

My idea was to use two tables in the database - one for BQ data, the other for API data. Then you could create a view into a union of both these tables, and delete API data as BQ replaces it. Remember that the API doesn't provide all fields that BQ does, and you have to access each currency individually with the API. If you don't need these other fields though, and only need a few select currencies, you could then use this hybrid system to do live exports as well (as Loozik requested).
sr. member
Activity: 246
Merit: 250
I ended up putting my trade analysis work on hold for a while, but I wanted to stop by and mention that I was able to get the remainder of the data using the API and add it to my db just fine. Thanks again for all your help.
No problem. Thanks, I'm sure many people here will find that useful until MtGox gets around to finishing the bq database. For anyone still interested in this tool, I'm going to post an update on the situation immediately after this post.

I was curious about the gaps in the API data that you mentioned, what type of gaps exactly are you talking about? I did some extremely basic verification and recreated a few candles for a few random days, and I got the same result as bitcoincharts.com (after filtering to USD trades properly).
For USD I wouldn't expect any gaps. Essentially, as well as the 1000 trade limit in the API, there are also other limits such as a time limit, something like 86400 seconds. What this means is that if no trades happen in a 24-hour period, then your script will break down, as no data will be returned and you won't ever update your last_id. The most well known gap is for USD across the tid transition (see my documentation on this here) between tids 218868 and 1309108565842636. I believe this is the only USD gap, and since this was back in 2011 it is covered by the bq database and is not a problem.

For other less popular currencies, however, I have found that there are many gaps, some quite long, and so without regular bq updates this is liable to break a script. Of course if you have an up to date database and can run your script regularly, at least once per day, this will not be a problem as you will catch all trades as they come in. Alternatively, you could manually advance the last_id if you haven't caught up to the current time yet, but you need to confirm the limit is indeed 86400s first.
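
For anyone wanting to repeat the candle check mentioned above (rebuilding candles and comparing them against a charting site), a minimal sketch might look like this. It ASSUMES the trades are already filtered to one currency and given as (unix_timestamp, price, amount) tuples sorted by time; the function name is my own.

```python
def build_candles(trades, interval=86400):
    """Group time-sorted trades into OHLC candles of `interval` seconds.

    trades -- iterable of (unix_timestamp, price, amount) tuples
    Returns a list of dicts ordered by candle start time.
    """
    candles = {}
    for ts, price, amount in trades:
        start = ts - ts % interval  # align to the start of the interval
        candle = candles.get(start)
        if candle is None:
            candles[start] = {
                "start": start, "open": price, "high": price,
                "low": price, "close": price, "volume": amount,
            }
        else:
            candle["high"] = max(candle["high"], price)
            candle["low"] = min(candle["low"], price)
            candle["close"] = price
            candle["volume"] += amount
    return [candles[k] for k in sorted(candles)]
```

Intervals with no trades simply produce no candle, which is one quick way to spot the gaps discussed above.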
jr. member
Activity: 30
Merit: 501
Seek the truth
hero member
Activity: 623
Merit: 500
I tried it on Ubuntu. After authentication I get an "Unexpected Exception" with a long message (it can't be copied from the window), at the end of which it says: IOError: [Errno 13] Permission denied: '/home/username/.config/mtgox-trades-tool/creds.dat'

Not sure what to make of that :-\

Edit: OK, kinda a Linux n00b ;D. Got it working now.
sr. member
Activity: 246
Merit: 250
sr. member
Activity: 467
Merit: 250

Trying, but unsuccessful so far.

windows version (on win7-x64) crashes about 50% through.
linux version (on debian7) crashes after entering auth code with:

Code:
No handlers could be found for logger "oauth2client.util"
Traceback (most recent call last):
  File "app.py", line 81, in __call__
    return apply(self.func, args)
  File "/usr/src/bq/bq.py", line 169, in complete
    credential = flow.step2_exchange(code, http)
  File "/usr/local/lib/python2.7/dist-packages/google_api_python_client-1.1-py2.7.egg/oauth2client/util.py", line 128, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/google_api_python_client-1.1-py2.7.egg/oauth2client/client.py", line 1283, in step2_exchange
    headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/httplib2-0.8-py2.7.egg/httplib2/__init__.py", line 1570, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/usr/local/lib/python2.7/dist-packages/httplib2-0.8-py2.7.egg/httplib2/__init__.py", line 1317, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/local/lib/python2.7/dist-packages/httplib2-0.8-py2.7.egg/httplib2/__init__.py", line 1252, in _conn_request
    conn.connect()
  File "/usr/local/lib/python2.7/dist-packages/httplib2-0.8-py2.7.egg/httplib2/__init__.py", line 1021, in connect
    self.disable_ssl_certificate_validation, self.ca_certs)
  File "/usr/local/lib/python2.7/dist-packages/httplib2-0.8-py2.7.egg/httplib2/__init__.py", line 80, in _ssl_wrap_socket
    cert_reqs=cert_reqs, ca_certs=ca_certs)
  File "/usr/lib/python2.7/ssl.py", line 381, in wrap_socket
    ciphers=ciphers)
  File "/usr/lib/python2.7/ssl.py", line 141, in __init__
    ciphers)
SSLError: [Errno 185090050] _ssl.c:340: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib


Ideas?