legendary
Activity: 1666
Merit: 1010
he who has the gold makes the rules
October 28, 2013, 08:28:05 PM
#17
A weighted (by volume) average of the prices from the major exchanges - Gox, Bitstamp, BTC China, btc-e, even EUR/GBP exchanges and localbitcoins - would be nice.

If someone puts this dataset together, here is what I actually need (for my thesis):

total daily volume on the exchanges

volume-weighted average daily price

OR

total daily trade volume in $
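
For what it's worth, here is a minimal sketch of how those three numbers could be derived from a raw trade list. The trade tuples below are made up, and the layout (unix timestamp, price in USD, amount in BTC) is an assumption about what an exchange export might look like, not any particular exchange's format.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical trades: (unix_timestamp, price_usd, amount_btc). Values are invented.
trades = [
    (1382918400, 196.0, 2.0),  # 2013-10-28 UTC
    (1382950000, 200.0, 1.0),  # 2013-10-28 UTC
    (1383004800, 204.0, 3.0),  # 2013-10-29 UTC
    (1383010000, 210.0, 1.0),  # 2013-10-29 UTC
]

def daily_summary(trades):
    """Return {iso_date: (btc_volume, usd_volume, vwap)} grouped by UTC day."""
    btc = defaultdict(float)
    usd = defaultdict(float)
    for ts, price, amount in trades:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        btc[day] += amount          # total daily volume in BTC
        usd[day] += price * amount  # total daily trade volume in $
    # The volume-weighted average price is dollar volume divided by BTC volume.
    return {day: (btc[day], usd[day], usd[day] / btc[day]) for day in sorted(btc)}

summary = daily_summary(trades)
for day, (btc_vol, usd_vol, vwap) in summary.items():
    print(day, btc_vol, usd_vol, round(vwap, 2))
```

Note that the dollar volume and the BTC volume together already determine the weighted price, so either variant of the dataset can be rebuilt from the same two running sums.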
legendary
Activity: 1904
Merit: 1002
October 28, 2013, 08:22:27 PM
#16

Interpolation is synthesizing new points between existing data.  This is not the same thing as interweaving two data series based on a common time series.  I'm not sure what the best term is for that, but it isn't interpolation.

http://en.wikipedia.org/wiki/Interpolation
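
To make the distinction concrete, here is a small sketch with two made-up price series. `align` only lines up the samples that exist (gaps stay `None`), while `interpolate` synthesizes a value between two real samples. Both function names and the data are hypothetical, for illustration only.

```python
# Two hypothetical price series keyed by timestamp in seconds.
gox = {0: 200.0, 60: 202.0, 180: 206.0}
stamp = {0: 198.0, 120: 201.0, 180: 203.0}

def align(*series):
    """Line series up on the union of their timestamps; missing samples stay None."""
    times = sorted(set().union(*series))
    return [(t, *(s.get(t) for s in series)) for t in times]

def interpolate(series, t):
    """Synthesize a value at time t by linear interpolation between its neighbours."""
    if t in series:
        return series[t]
    times = sorted(series)
    if not times[0] < t < times[-1]:
        return None  # outside the sampled range: no extrapolation
    lo = max(x for x in times if x < t)
    hi = min(x for x in times if x > t)
    frac = (t - lo) / (hi - lo)
    return series[lo] + frac * (series[hi] - series[lo])

aligned = align(gox, stamp)          # no new data points are created here
filled = interpolate(gox, 120)       # this one is a synthesized point
```

Only the second function adds values that were never observed, which is why the two operations deserve different names.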
full member
Activity: 232
Merit: 100
October 28, 2013, 05:47:23 PM
#15

im listening...


legendary
Activity: 1904
Merit: 1002
October 28, 2013, 05:42:26 PM
#14

Okay, now I understand what you mean, which was the first suggestion in this thread.  However, I don't think you are using the term "interpolate" correctly.
full member
Activity: 232
Merit: 100
October 28, 2013, 05:38:30 PM
#13

Sorry if I'm missing something obvious, but if I had multiple data streams (say 3 exchanges) with different sample rates or missing values, I would interpolate each stream onto a common time series and then combine (average) the data, probably weighting the inputs by volume.
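
Roughly what I have in mind, as a sketch only: resample each exchange onto a shared time grid with linear interpolation, then take a volume-weighted average at each grid point. The streams and the volume weights are invented, and using one constant weight per exchange is a simplifying assumption (real weights would change over time).

```python
from bisect import bisect_left

def interp_at(samples, t):
    """Linear interpolation in a sorted list of (time, value); None outside the range."""
    times = [s[0] for s in samples]
    if t < times[0] or t > times[-1]:
        return None
    i = bisect_left(times, t)
    if times[i] == t:
        return samples[i][1]
    (t0, v0), (t1, v1) = samples[i - 1], samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# Hypothetical streams: exchange -> sorted (timestamp_s, price). Values are invented.
streams = {
    "gox":      [(0, 200.0), (60, 204.0), (120, 208.0)],
    "bitstamp": [(0, 196.0), (120, 200.0)],
    "btcchina": [(30, 210.0), (90, 212.0)],
}
# Assumed constant volume weights per exchange (a simplification).
volumes = {"gox": 50.0, "bitstamp": 30.0, "btcchina": 20.0}

def combined_price(streams, volumes, grid):
    """Volume-weighted average of the interpolated prices at each grid time."""
    out = []
    for t in grid:
        num = den = 0.0
        for name, samples in streams.items():
            price = interp_at(samples, t)
            if price is not None:  # skip exchanges with no data around t
                num += volumes[name] * price
                den += volumes[name]
        out.append((t, num / den if den else None))
    return out

result = combined_price(streams, volumes, grid=range(0, 121, 30))
```

An exchange simply drops out of the average at grid points where it has no surrounding samples, so nothing is extrapolated.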

kjj
legendary
Activity: 1302
Merit: 1026
October 28, 2013, 01:51:59 PM
#12
Imputed data isn't.  Ditto for interpolated.

If your model involves anything resembling regression, cleaning the data in any way will cause your model to vastly overestimate the certainty and accuracy of the output.

This kind of thing is a pain to model.  The spreads between the exchanges distort the price signal that you are looking for, but not totally.

You could use (sign,magnitude) of changes instead of absolute values, which will remove the pure-arbitrage signal from the price signal, but that distorts the price signal.  (sign,log(magnitude)) might help a bit, but that's hard to say too.
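
A quick sketch of the (sign, log(magnitude)) transform, under one reading of it: take successive price differences and split each into its sign and the log of its absolute size. Whether the log should be applied to absolute differences or to relative changes is not specified here, so treat that choice, the function name, and the sample prices as assumptions.

```python
import math

def sign_log_changes(prices):
    """Turn a price series into (sign, log(|change|)) pairs, one per step."""
    out = []
    for prev, cur in zip(prices, prices[1:]):
        diff = cur - prev
        if diff == 0:
            out.append((0, None))  # log(0) is undefined, so flag unchanged prices
        else:
            out.append((1 if diff > 0 else -1, math.log(abs(diff))))
    return out

prices = [200.0, 202.0, 202.0, 199.0]  # hypothetical prices on one exchange
changes = sign_log_changes(prices)
```

Because only differences survive, a constant offset between two exchanges disappears from the transformed series, which is the point of dropping absolute values.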

Or, you can ignore the arbitrage signal, and just mash the prices together as they really are.  But that will result in a price that is constantly too high by a factor related to the difficulty of moving stuff around.

You are going to hate this, but the most valid way to go is to model each exchange and the relationships between them.  You'll have to give up on the notion of "the bitcoin price" and instead work with "the bitcoin price at locations X,Y and Z".

Oh, and I forgot that the arbitrage issues are not linear.  Your model is going to get screwed every time the real world factors that cause the spreads change.
legendary
Activity: 1904
Merit: 1002
October 28, 2013, 01:07:41 PM
#11

Interpolation can help with missing values, but I'm not sure what it has to do with combining multiple input streams.
full member
Activity: 232
Merit: 100
October 28, 2013, 12:08:21 PM
#10

interpolate ?

legendary
Activity: 1246
Merit: 1010
October 28, 2013, 10:30:22 AM
#9
I have long resisted including data from exchanges other than Gox because I never really understood how to include samples of different lengths in a model. But now that the volumes of Gox, Bitstamp, and BTC China have been comparable for so long, I am forced to include their trade data in a model.

It may be sufficient to simply truncate the trade data to the shortest sample, but I really hate to throw away data. I also expect there will occasionally be missing data going forward.

I wonder, how have some of you dealt with multiple data streams, and how to match them up, either through truncation, imputation, or some other means.

You'll want to use the exciting science of reverse imputation.  This complex mathematical technique uses the desired solution to inform the chosen imputation algorithm and data-source weighting coefficients.  ;D  Come on, get with it, it's GUARANTEED to make Bitcoin look awesome!  We know this from seeing the CPI numbers.

sr. member
Activity: 260
Merit: 250
snack of all trades
October 28, 2013, 10:18:03 AM
#8
A weighted (by volume) average of the prices from the major exchanges - Gox, Bitstamp, BTC China, btc-e, even EUR/GBP exchanges and localbitcoins - would be nice.
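
Something like this, as a rough sketch. The prices and volumes are invented, and it assumes every quote has already been converted to USD (the EUR/GBP exchanges would need a forex rate applied first).

```python
# Hypothetical snapshot: exchange -> (price_usd, volume_btc). Numbers are made up.
quotes = {
    "mtgox": (205.0, 40.0),
    "bitstamp": (198.0, 30.0),
    "btce": (190.0, 10.0),
}

def weighted_price(quotes):
    """Volume-weighted average price across exchanges."""
    total_volume = sum(volume for _, volume in quotes.values())
    if total_volume == 0:
        raise ValueError("no volume to weight by")
    return sum(price * volume for price, volume in quotes.values()) / total_volume

index = weighted_price(quotes)
```

Low-volume venues barely move the result, which is the appeal of weighting by volume rather than taking a plain mean.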
legendary
Activity: 1904
Merit: 1002
October 25, 2013, 09:27:04 PM
#7

Yes, thx. I have been using Bitcoin Charts data, it is very convenient. The problem I have run into is getting Forex data. But these folks may have solved that problem for me: http://www.quandl.com/help/api

Oh, duh.  Interesting site.
sr. member
Activity: 364
Merit: 253
October 25, 2013, 08:45:33 PM
#6
If truncated, will it not be hard to include realtime data streams?
legendary
Activity: 1904
Merit: 1002
October 25, 2013, 08:42:43 PM
#5
sr. member
Activity: 260
Merit: 250
snack of all trades
October 25, 2013, 07:23:55 PM
#4
I know R pretty well, if you need any assistance.
hero member
Activity: 625
Merit: 501
x
October 25, 2013, 04:29:11 PM
#3
You just made me google 'imputation'  :D

I think the approach taken would have a lot to do with what you're trying to do with the data.

Separate overlaid graphs serve some purposes (such as showing historical arbitrage trends) - combined results serve others (such as gross volume trends).  Hell, there'd be value in overlaying individual market graphs with the combined values too.  

Point being - I would think as long as your raw data from various markets are stored discretely, you can build up whatever logical data combinations you wish as a layer stacked on top, with views (mostly graphs here) representing whatever concepts you care to.

It's tough to be more specific without knowing the specific approach you're using to gather and interpret your data, or what kinds of outputs you're trying to create.  You draw some pretty strong, solid (read: valuable) conclusions from the data you analyze. If you think I could be of more help with some more data, feel free to elaborate here or shoot me a PM.  I don't think I'm as math-heavy as you, and typically do most number-crunching via C# or Excel.
legendary
Activity: 1904
Merit: 1002
October 25, 2013, 02:45:30 PM
#2
Match them up by timestamp, and then do your volume binning.
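
A minimal sketch of what that could look like: floor each trade's timestamp to the start of a fixed-width bin, then sum volume per exchange in each bin so the streams share one time axis. The one-hour bin width, the trade layout, and the numbers are all assumptions for illustration.

```python
from collections import defaultdict

BIN_SECONDS = 3600  # assumed bin width: one hour

# Hypothetical trades per exchange: (unix_timestamp, price, amount_btc).
trades = {
    "gox":      [(10, 200.0, 1.0), (3500, 201.0, 2.0), (3700, 202.0, 1.5)],
    "bitstamp": [(3650, 198.0, 0.5), (7300, 199.0, 4.0)],
}

def bin_volumes(trades, width=BIN_SECONDS):
    """Return {bin_start: {exchange: btc_volume}}, with every exchange in every bin."""
    bins = defaultdict(lambda: dict.fromkeys(trades, 0.0))
    for exchange, rows in trades.items():
        for ts, _price, amount in rows:
            # Flooring the timestamp matches trades from different exchanges by time.
            bins[ts - ts % width][exchange] += amount
    return dict(sorted(bins.items()))

binned = bin_volumes(trades)
```

An exchange with no trades in a bin shows up with zero volume instead of being dropped, so the rows stay comparable across exchanges.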
jr. member
Activity: 57
Merit: 10
October 25, 2013, 02:43:12 PM
#1
.