I downloaded the 2nd file (899.2 MiB, but on my computer it shows as 1.07 GiB) and exported it to a .csv file; that worked. Great! Today I was playing around with the data and reading your manual (very well written, kudos), but many things are still very unclear to me:
What really confuses me is everything surrounding the timestamps. For example, when I convert the file to a .csv and open it in Notepad, the first entry looks like this:
2010-07-17,23:09:17,0.049510,2000000000
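A minimal Python sketch of what that row contains, assuming the four comma-separated fields are date, time, price and amount (the column meanings are an assumption, not stated in the thread; `parse_row` is a hypothetical helper). Parsed this way, the time field only resolves to whole seconds:

```python
import csv
import io
from datetime import datetime

# The row as shown above; the column meanings (date, time, price, amount) are assumed.
SAMPLE = "2010-07-17,23:09:17,0.049510,2000000000\n"

def parse_row(row):
    """Turn one CSV row (list of strings) into (timestamp, price, amount)."""
    date_str, time_str, price_str, amount_str = row
    # %S reads whole seconds; the time field has no fractional part.
    ts = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")
    return ts, float(price_str), int(amount_str)

for row in csv.reader(io.StringIO(SAMPLE)):
    ts, price, amount = parse_row(row)
    print(ts, ts.microsecond, price, amount)  # prints: 2010-07-17 23:09:17 0 0.04951 2000000000
```

The microsecond part comes out as 0, so any digits shown after the seconds would have to come from somewhere other than this field.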
When I import the same .csv into Excel and change the cell format to hh:mm:ss:ms (I could not find a unit smaller than milliseconds in Excel), Excel shows this as the timestamp:
23:09:17:917
Several things about this are not clear to me:
1. How can Excel display something that is not even contained in the .csv file? Did Excel just "make this up"?
2. The thread says the data has microsecond accuracy, but when I look at the file in a text editor, it seems to be in seconds. For example, the last 3 lines of the file are:
2013-12-17,15:47:30,715.700000,1210000
2013-12-17,15:47:30,715.700000,1000000
2013-12-17,15:47:30,715.700000,780000
3. Not so important, but maybe someone here has experience with Excel displaying timestamps: as far as I can tell, Excel (I use 2010) cannot display a finer resolution than milliseconds. But some of the timestamps in my sheet have 4 digits after the seconds, e.g. 18.07.2013 17:48:56:4856.
How is that even possible?
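One explanation that fits both examples in this post, assuming Excel reads the custom format `hh:mm:ss:ms` as hours, minutes and seconds followed by `m` (minutes, no leading zero) and `s` (seconds, no leading zero) rather than milliseconds: the trailing digits would simply be the minutes and seconds printed a second time (9 and 17 give "917"; 48 and 56 give "4856"). The Python sketch below only mimics that assumed rendering; it is not Excel's implementation, and the function names are hypothetical. Excel's own millisecond code is along the lines of `hh:mm:ss.000`, which for whole-second data would presumably end in `.000`.

```python
from datetime import datetime

def render_hh_mm_ss_ms(ts: datetime) -> str:
    """Assumed reading of "hh:mm:ss:ms": padded hh, mm, ss, then unpadded minutes and seconds."""
    return f"{ts.hour:02d}:{ts.minute:02d}:{ts.second:02d}:{ts.minute}{ts.second}"

def render_hh_mm_ss_000(ts: datetime) -> str:
    """Mimic "hh:mm:ss.000": seconds plus three fractional digits (milliseconds)."""
    return f"{ts.hour:02d}:{ts.minute:02d}:{ts.second:02d}.{ts.microsecond // 1000:03d}"

first = datetime(2010, 7, 17, 23, 9, 17)   # first row of the .csv
other = datetime(2013, 7, 18, 17, 48, 56)  # example from question 3

print(render_hh_mm_ss_ms(first))   # 23:09:17:917
print(render_hh_mm_ss_ms(other))   # 17:48:56:4856
print(render_hh_mm_ss_000(first))  # 23:09:17.000
```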
It is also possible that there are some oddities around May 23rd 2013 and around today, December 17th, purely because I collected the data from 3 sources and those were the boundaries. I'm fairly sure there shouldn't be a problem, but if you want to be really safe, you can avoid those two days.
If I understand you correctly, this file combines 3 data sources. They are not mixed day by day but are arranged more like this:
from 07/17/2010: data source 1 (Mark Karpeles?)
from 05/23/2013: data source 2 (MtGox API? Bitcoincharts?)
from 12/17/2013: data source 3?
So should I cut out just May 23rd and Dec 17th, or also a few days before and after?
There are a couple of oddities with the Money_Trade__ values, but I don't think this will be particularly relevant to you. Otherwise, I think the data is relatively accurate.
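To try both options from the question (dropping only the two boundary days, or a window of a few days around each), the rows can be filtered by date. A minimal Python sketch, assuming the first CSV column holds the date as YYYY-MM-DD; `near_boundary`, `filter_rows` and `window_days` are hypothetical names, and the right window size is not something the thread settles.

```python
from datetime import date

# Dates named in the thread as the boundaries between the three data sources.
BOUNDARIES = (date(2013, 5, 23), date(2013, 12, 17))

def near_boundary(day, window_days=0):
    """True if `day` is within `window_days` days of a boundary (0 = the boundary day only)."""
    return any(abs((day - b).days) <= window_days for b in BOUNDARIES)

def filter_rows(rows, window_days=0):
    """Yield rows (lists of strings) whose first column, assumed YYYY-MM-DD, is not near a boundary."""
    for row in rows:
        if not near_boundary(date.fromisoformat(row[0]), window_days):
            yield row
```

On an opened file this would be used as `filter_rows(csv.reader(f), window_days=3)`; comparing results for `window_days=0` and a few larger values would show whether the days around the boundaries actually look different.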
Before I posted here, I downloaded the files from Google BigQuery, and I noticed then that there were quite large jumps in the trade IDs. Are you referring to that, or could the change of the database's "primary key" after trade ID 218868 have messed things up? Probably not, right? I mean, they closed the exchange for 6 days back then to set everything up properly.
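To locate the jumps in the trade IDs, and to test whether the "microsecond accuracy" mentioned in the thread could live in the IDs rather than in the date/time columns, a minimal Python sketch follows. That IDs above 218868 are Unix timestamps in microseconds is an assumption to check, not something this thread confirms; `find_jumps`, `tid_as_datetime` and `threshold` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Trade ID mentioned in the post as the point where the "primary key" changed.
KEY_CHANGE_TID = 218868
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def find_jumps(tids, threshold):
    """Return (previous, current) pairs of consecutive IDs (a list) that differ by more than `threshold`."""
    return [(prev, cur) for prev, cur in zip(tids, tids[1:]) if cur - prev > threshold]

def tid_as_datetime(tid):
    """Read a trade ID as a Unix timestamp in microseconds (assumption, see above).

    IDs at or below KEY_CHANGE_TID are treated as plain sequence numbers and return None.
    """
    if tid <= KEY_CHANGE_TID:
        return None
    # timedelta with integer microseconds keeps full precision (no float rounding).
    return EPOCH + timedelta(microseconds=tid)
```

If the dates returned for IDs above 218868 match the date and time columns of the same rows, that supports the assumption; if they do not, the assumption is wrong and the jumps need another explanation.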