. | Bitcointalksearch.org

prophetx

legendary

Activity: 1666

Merit: 1010

he who has the gold makes the rules

Quote from: snackman on October 28, 2013, 10:18:03 AM

A weighted (by volume) average of the prices from the major exchanges - Gox, Bitstamp, BTC China, btc-e, even EUR/GBP exchanges and localbitcoins - would be nice.

if someone puts together this dataset i actually need (for my thesis):

total daily volume on the exchanges

average weighted daily price

OR

total daily trade volume in $

notme

legendary

Activity: 1904

Merit: 1002

Quote from: bucktotal on October 28, 2013, 05:47:23 PM

Quote from: notme on October 28, 2013, 05:42:26 PM

Quote from: bucktotal on October 28, 2013, 05:38:30 PM

Quote from: notme on October 28, 2013, 01:07:41 PM

Quote from: bucktotal on October 28, 2013, 12:08:21 PM

Quote from: chodpaba on October 25, 2013, 02:43:12 PM

I wonder, how have some of you dealt with multiple data streams, and how to match them up, either through truncation, imputation, or some other means.

interpolate ?

Interpolation can help with missing values, but I'm not sure what it has to do with combining multiple inputs streams.

sorry if im probably missing something obv but if i had multiple data streams (say 3 exchanges) with different sample rates or missing values or whatever, i would interpolate each data stream to a common timeseries and then combine (average) the data. and would probably weight the inputs by volume

Okay, now I understand what you mean, which was the first suggestion in this thread. However, I don't think you are using the term "interpolate" correctly.

im listening...

Interpolation is synthesizing new points between existing data. This is not the same thing as as interweaving two data series based on a common time series. I'm not sure what the best term is for that, but it isn't interpolation.

http://en.wikipedia.org/wiki/Interpolation

bucktotal

full member

Activity: 232

Merit: 100

Quote from: notme on October 28, 2013, 05:42:26 PM

Quote from: bucktotal on October 28, 2013, 05:38:30 PM

Quote from: notme on October 28, 2013, 01:07:41 PM

Quote from: bucktotal on October 28, 2013, 12:08:21 PM

Quote from: chodpaba on October 25, 2013, 02:43:12 PM

I wonder, how have some of you dealt with multiple data streams, and how to match them up, either through truncation, imputation, or some other means.

interpolate ?

Interpolation can help with missing values, but I'm not sure what it has to do with combining multiple inputs streams.

sorry if im probably missing something obv but if i had multiple data streams (say 3 exchanges) with different sample rates or missing values or whatever, i would interpolate each data stream to a common timeseries and then combine (average) the data. and would probably weight the inputs by volume

Okay, now I understand what you mean, which was the first suggestion in this thread. However, I don't think you are using the term "interpolate" correctly.

im listening...

notme

legendary

Activity: 1904

Merit: 1002

Quote from: bucktotal on October 28, 2013, 05:38:30 PM

Quote from: notme on October 28, 2013, 01:07:41 PM

Quote from: bucktotal on October 28, 2013, 12:08:21 PM

Quote from: chodpaba on October 25, 2013, 02:43:12 PM

I wonder, how have some of you dealt with multiple data streams, and how to match them up, either through truncation, imputation, or some other means.

interpolate ?

Interpolation can help with missing values, but I'm not sure what it has to do with combining multiple inputs streams.

sorry if im probably missing something obv but if i had multiple data streams (say 3 exchanges) with different sample rates or missing values or whatever, i would interpolate each data stream to a common timeseries and then combine (average) the data. and would probably weight the inputs by volume

Okay, now I understand what you mean, which was the first suggestion in this thread. However, I don't think you are using the term "interpolate" correctly.

bucktotal

full member

Activity: 232

Merit: 100

Quote from: notme on October 28, 2013, 01:07:41 PM

Quote from: bucktotal on October 28, 2013, 12:08:21 PM

Quote from: chodpaba on October 25, 2013, 02:43:12 PM

I wonder, how have some of you dealt with multiple data streams, and how to match them up, either through truncation, imputation, or some other means.

interpolate ?

Interpolation can help with missing values, but I'm not sure what it has to do with combining multiple inputs streams.

sorry if im probably missing something obv but if i had multiple data streams (say 3 exchanges) with different sample rates or missing values or whatever, i would interpolate each data stream to a common timeseries and then combine (average) the data. and would probably weight the inputs by volume

kjj

legendary

Activity: 1302

Merit: 1026

Imputed data isn't. Ditto for interpolated.

If your model involves anything resembling regression, cleaning the data in any way will cause your model to vastly overestimate the certainty and accuracy of the output.

This kind of thing is a pain to model. The spreads between the exchanges distort the price signal that you are looking for, but not totally.

You could use (sign,magnitude) of changes instead of absolute values, which will remove the pure-arbitrage signal from the price signal, but that distorts the price signal. (sign,log(magnitude)) might help a bit, but that's hard to say too.

Or, you can ignore the arbitrage signal, and just mash the prices together as they really are. But that will result in a price that is constantly too high by a factor related to the difficulty of moving stuff around.

You are going to hate this, but the most valid way to go is to model each exchange and the relationships between them. You'll have to give up on the notion of "the bitcoin price" and instead work with "the bitcoin price at locations X,Y and Z".

Oh, and I forgot that the arbitrage issues are not linear. Your model is going to get screwed every time the real world factors that cause the spreads change.

notme

legendary

Activity: 1904

Merit: 1002

Quote from: bucktotal on October 28, 2013, 12:08:21 PM

Quote from: chodpaba on October 25, 2013, 02:43:12 PM

I wonder, how have some of you dealt with multiple data streams, and how to match them up, either through truncation, imputation, or some other means.

interpolate ?

Interpolation can help with missing values, but I'm not sure what it has to do with combining multiple inputs streams.

bucktotal

full member

Activity: 232

Merit: 100

Quote from: chodpaba on October 25, 2013, 02:43:12 PM

I wonder, how have some of you dealt with multiple data streams, and how to match them up, either through truncation, imputation, or some other means.

interpolate ?

thezerg

legendary

Activity: 1246

Merit: 1010

Quote from: chodpaba on October 25, 2013, 02:43:12 PM

I have long resisted the inclusion of data from exchanges other than Gox because I never really understood how to include samples of different lengths into a model. But now that the volumes of Gox, Bistamp, and Btcchina have been comparable for so long I am forced to include their trade data into a model.

It may be sufficient to simply truncate the trade data to the shortest sample, but I really hate to throw away data. As well, I expect there will occasionally be cases where there will be missing data ongoing.

I wonder, how have some of you dealt with multiple data streams, and how to match them up, either through truncation, imputation, or some other means.

You'll want to use the exciting science of reverse imputation. This complex mathematical technique uses the desired solution to inform the chosen imputation algorithm and data-source weighting coefficients. Grin

Come on, get with its GUARANTEED to make Bitcoin look awesome! We know this from seeing the CPI numbers.

snackman

sr. member

Activity: 260

Merit: 250

snack of all trades

A weighted (by volume) average of the prices from the major exchanges - Gox, Bitstamp, BTC China, btc-e, even EUR/GBP exchanges and localbitcoins - would be nice.

notme

legendary

Activity: 1904

Merit: 1002

Quote from: ?? on ??

Quote from: notme on October 25, 2013, 08:42:43 PM

http://bitcoincharts.com/about/markets-api/

Yes, thx. I have been using Bitcoin Charts data, it is very convenient. The problem I have run into is getting Forex data. But these folks may have solved that problem for me: http://www.quandl.com/help/api

Oh, duh. Interesting site.

balanghai

sr. member

Activity: 364

Merit: 253

If truncated, will it not be hard to include realtime data streams?

notme

legendary

Activity: 1904

Merit: 1002

http://bitcoincharts.com/about/markets-api/

snackman

sr. member

Activity: 260

Merit: 250

snack of all trades

I know R pretty well, if you need any assistance.

Chainsaw

hero member

Activity: 625

Merit: 501

x

You just made me google 'imputation' Cheesy

I think the approach taken would have a lot to do with what you're trying to do with the data.

Separate overlaid graphs serve some purposes (such as showing historical arbitrage trends) - combined results serve others (such as gross volume trends). Hell, there'd be value in overlaying individual market graphs with the combined values too.

Point being - I would think as long as your raw data from various markets are stored discretely, you can build up whatever logical data combinations you wish as a layer stacked on top, with views (mostly graphs here) representing whatever concepts you care to.

It's tough to be more specific without knowing the specific approach you're using to gather and interpret your data, or what kinds of outputs you're trying to create. You draw some pretty strong, solid (read: valuable) conclusions from the data you analyze. If you think I could be of more help with some more data, feel free to elaborate here or shoot me a PM. I don't think I'm as math-heavy as you, and typically do most number-crunching via C# or Excel.

notme

legendary

Activity: 1904

Merit: 1002

Match them up by timestamp, and then do your volume binning.

chodpaba

jr. member

Activity: 57

Merit: 10

.

Topic: . (Read 1827 times)