Author

Topic: [BEGINNER WORKSHOP]: bitcoincharts + postgres = cool sql queries (Read 5760 times)

hero member
Activity: 784
Merit: 1000
Wonder if you would be interested in estimating early adopters' reserves again? I have not finished downloading the blockchain yet. Sad

this thread could be interesting for you: https://bitcointalksearch.org/topic/distribution-of-bitcoin-wealth-by-owner-316297

Yeah, that's cool, thanks.
donator
Activity: 2772
Merit: 1019
Wonder if you would be interested in estimating early adopters' reserves again? I have not finished downloading the blockchain yet. Sad

this thread could be interesting for you: https://bitcointalksearch.org/topic/distribution-of-bitcoin-wealth-by-owner-316297
donator
Activity: 2772
Merit: 1019
api url has change to http://api.bitcoincharts.com/v1/csv/mtgoxUSD.csv, fixed op, ignore previous post.
donator
Activity: 2772
Merit: 1019
EDIT: ignore this post, see next post

tcatm has limited the trade api. max 20000 trades can be downloaded per request.

here's a bash script that updates local trades table

Code:
#!/bin/bash

db="mtgox"
user="postgres"
symbol="mtgoxUSD"
PSQL="psql -q -t -U ${user} ${db}"
DL="curl -s"

function sql() {
  rc=$(echo "$1" | $PSQL)
}

function extend() {
  sql "drop table if exists import;"
  sql "create table import (id serial, unixtime int, price numeric(32,10), volume numeric(32,8), type smallint);"
  
  len=$(( 4 * 60 * 60 ))
  while true; do
    sql "select max(unixtime) from trades;"
    start=$rc
    start_human=$(date -d "@$start")
    echo "--- $start_human ----------------------------------------------------------"
    end=$(( $start + $len ))
    echo "start ($start) + len ($len) = end ($end)"
    
    # download
    $DL "http://bitcoincharts.com/t/trades.csv?symbol=$symbol&start=$start&end=$end" > trades.csv
    end_file=$(head -n 1 trades.csv | cut -d , -f 1)
    start_file=$(tail -n 1 trades.csv | cut -d , -f 1)
    echo "start_file ($start_file) - start ($start) = $(( $start_file - $start ))"
    echo "end_file ($end_file) - end ($end) = $(( $end_file - $end ))"
    if [ $start_file -eq $start ]; then
      echo start times match, updating trades table
      # put into import table
      sql "delete from import;"
      sql "\copy import(unixtime,price,volume) from 'trades.csv' delimiters ',' csv;"
      sql "delete from trades where unixtime >= $start;"
      sql "insert into trades (unixtime, t, price, volume) select unixtime, TIMESTAMP 'epoch' + unixtime * INTERVAL '1 second', price, volume from import order by id desc;"
      if [ $start_file -eq $end_file ]; then
        echo "end detected, sleeping for 10 minutes,...."
        sleep 10m;
      fi
    else
      echo "start_file != start, exiting, check code"
      exit 1
    fi
  done
}

extend

it's quite fresh (use accordingly), currently in process of syncing my table. It uses 4-hour blocks.

dont know if it quits nicely yet when done Wink

hoping someone can use it.

EDIT: added sync-end-detection. will sleep 10 minutes, then continue to sync.

EDIT2:

hmm, it stops prematurely. I think something is weird with the bitcoincharts api:

neither this: http://bitcoincharts.com/t/trades.csv?symbol=mtgoxUSD&start=1365709116&end=1365710916
nor this: http://bitcoincharts.com/t/trades.csv?symbol=mtgoxUSD&end=1365710916

deliver any data (except one trade in the first case) while I think they should.

while this: http://bitcoincharts.com/t/trades.csv?symbol=mtgoxUSD&start=1365709116

delivers the most recent trade data (20000 most recent trades) ignoring "start" (at least that is consistent with the api docs).

the api docs: http://bitcoincharts.com/about/markets-api/ say I should only use the "end" parameter. That doesn't work, though (http://bitcoincharts.com/t/trades.csv?symbol=mtgoxUSD&end=1365710916 doesn't deliver)

ideas?
hero member
Activity: 784
Merit: 1000
Wonder if you would be interested in estimating early adopters' reserves again? I have not finished downloading the blockchain yet. Sad

Yeah, I'm generally interested. Just need a couple of quiet hours and the muse to kiss me. Been thinking about this.

this http://statistics.ecdsa.org/ would tell us the info if it was up-to-date. Maybe we should pester ThomasV (?).


Yeah, I have been using this, it's great but a bit outdated.
donator
Activity: 2772
Merit: 1019
Wonder if you would be interested in estimating early adopters' reserves again? I have not finished downloading the blockchain yet. Sad

Yeah, I'm generally interested. Just need a couple of quiet hours and the muse to kiss me. Been thinking about this.

this http://statistics.ecdsa.org/ would tell us the info if it was up-to-date. Maybe we should pester ThomasV (?).
hero member
Activity: 784
Merit: 1000
Wonder if you would be interested in estimating early adopters' reserves again? I have not finished downloading the blockchain yet. Sad
donator
Activity: 2772
Merit: 1019
the params in that datatype represent scale and precision, scale is the total number of digits (ie including decimals) precision is the number of decimals

with the current bitcoin spec 16,8 is enough to store the largest theoretical possible transaction amount of 21million, and the smallest value of one satoshi.

The datatype scale and precision only affect the stored value, aggregate functions operate correctly disregarding scale and precision of the datatype (within the limits of the architecture).

eg the max value that can be stored in the field is 99999999.99999999,if you sum this with 0.00000001 then the result is 100000000 (ie no overflow occurs)

Good to know. Thanks. Changed the op to use 16,8
legendary
Activity: 2576
Merit: 1087
the params in that datatype represent scale and precision, scale is the total number of digits (ie including decimals) precision is the number of decimals

with the current bitcoin spec 16,8 is enough to store the largest theoretical possible transaction amount of 21million, and the smallest value of one satoshi.

The datatype scale and precision only affect the stored value, aggregate functions operate correctly disregarding scale and precision of the datatype (within the limits of the architecture).

eg the max value that can be stored in the field is 99999999.99999999,if you sum this with 0.00000001 then the result is 100000000 (ie no overflow occurs)
donator
Activity: 2772
Merit: 1019
EDIT4: anyone knowledgable with postgres have a suggestion what datatype to best use for the monetary values?

NUMERIC(16,8) should do it




In case postgres uses that datatype for aggregate functions, 8 places before the decimal point will be enough.
legendary
Activity: 2576
Merit: 1087
EDIT4: anyone knowledgable with postgres have a suggestion what datatype to best use for the monetary values?

NUMERIC(16,8) should do it


legendary
Activity: 1176
Merit: 1010
Borsche

I usually only ever do this like once a month.

If someone makes a script or something, please share. I'd do it for money because my own need for it is low Wink

The bitcoinchart.com api takes a start-time (unix time), that should be passed as "select max(unixtime) from trades;". I'm not sure how the \copy command behaves (wether or not it overwrites data or how that can be configured)



Ok so thats easy then, as "copy from" would append data to the table without touching existing rows.
donator
Activity: 2772
Merit: 1019
Molecular what are the programs you used for trading? Also give me your donation address please?

used? you mean traidor.py?

1JANa7gQ2VE7Wkv3o4917ECy8NWYtUx5F5
hero member
Activity: 784
Merit: 1000
Thanks, useful. How do you fetch updates, or you recreate the full table every time?

I usually only ever do this like once a month.

If someone makes a script or something, please share. I'd do it for money because my own need for it is low Wink

The bitcoinchart.com api takes a start-time (unix time), that should be passed as "select max(unixtime) from trades;". I'm not sure how the \copy command behaves (wether or not it overwrites data or how that can be configured)



How is the size of the CSV file? Also I don't recommend people updating their databases too frequently, bitcoinchart is already under quite a bit of load and went 503 from time to time.

142 MB

tcatm recently told me at ccc it wasn't a problem for the server. I'm all for figuring out a way to do updates, though.


Molecular what are the programs you used for trading? Also give me your donation address please?
donator
Activity: 2772
Merit: 1019
Thanks, useful. How do you fetch updates, or you recreate the full table every time?

I usually only ever do this like once a month.

If someone makes a script or something, please share. I'd do it for money because my own need for it is low Wink

The bitcoinchart.com api takes a start-time (unix time), that should be passed as "select max(unixtime) from trades;". I'm not sure how the \copy command behaves (wether or not it overwrites data or how that can be configured)



How is the size of the CSV file? Also I don't recommend people updating their databases too frequently, bitcoinchart is already under quite a bit of load and went 503 from time to time.

142 MB

tcatm recently told me at ccc it wasn't a problem for the server. I'm all for figuring out a way to do updates, though.
hero member
Activity: 784
Merit: 1000
Thanks, useful. How do you fetch updates, or you recreate the full table every time?

I usually only ever do this like once a month.

If someone makes a script or something, please share. I'd do it for money because my own need for it is low Wink

The bitcoinchart.com api takes a start-time (unix time), that should be passed as "select max(unixtime) from trades;". I'm not sure how the \copy command behaves (wether or not it overwrites data or how that can be configured)



How is the size of the CSV file? Also I don't recommend people updating their databases too frequently, bitcoinchart is already under quite a bit of load and went 503 from time to time.
donator
Activity: 2772
Merit: 1019
Thanks, useful. How do you fetch updates, or you recreate the full table every time?

I usually only ever do this like once a month.

If someone makes a script or something, please share. I'd do it for money because my own need for it is low Wink

The bitcoinchart.com api takes a start-time (unix time), that should be passed as "select max(unixtime) from trades;". I'm not sure how the \copy command behaves (wether or not it overwrites data or how that can be configured)

legendary
Activity: 1176
Merit: 1010
Borsche
Thanks, useful. How do you fetch updates, or you recreate the full table every time?
hero member
Activity: 784
Merit: 1000
Sorry, I think there is something wrong with the 2010 volume in USD...
Hmm? You do realize BTC was worth nothing at that time?

Never below $0.01 after Gox opened, $0.00001 something is outright impossible

yes, you're correct. something wrong... might also affect the other years... checking.

EDIT: oh goddamnit, I accidentally used an old script for this. The datatypes are wrong, everything was imported as integer with 0 decimal places.

I'm sorry. I will fix this... will take a while.

EDIT2 corrected OP. you can fix things by doing this:

Code:
mtgox=# drop table trades;
DROP TABLE
mtgox=# create table trades (id serial, unixtime int, t timestamp, price numeric(32,10), volume numeric(32,8));
NOTICE:  CREATE TABLE will create implicit sequence "trades_id_seq" for serial column "trades.id"
CREATE TABLE
mtgox=# \copy trades(unixtime,price,volume) from 'trades.csv' delimiters ',' csv;
mtgox=# update trades set t = TIMESTAMP 'epoch' + unixtime * INTERVAL '1 second';
UPDATE 3563178

EDIT3: fixed my other 2 posts containing queries. The values for the years >=2011 changed only "slightly".

EDIT4: anyone knowledgable with postgres have a suggestion what datatype to best use for the monetary values?

Thanks, I guessed this would be a floating point problem. Smiley
donator
Activity: 2772
Merit: 1019
Sorry, I think there is something wrong with the 2010 volume in USD...
Hmm? You do realize BTC was worth nothing at that time?

Never below $0.01 after Gox opened, $0.00001 something is outright impossible

yes, you're correct. something wrong... might also affect the other years... checking.

EDIT: oh goddamnit, I accidentally used an old script for this. The datatypes are wrong, everything was imported as integer with 0 decimal places.

I'm sorry. I will fix this... will take a while.

EDIT2 corrected OP. you can fix things by doing this:

Code:
mtgox=# drop table trades;
DROP TABLE
mtgox=# create table trades (id serial, unixtime int, t timestamp, price numeric(32,10), volume numeric(32,8));
NOTICE:  CREATE TABLE will create implicit sequence "trades_id_seq" for serial column "trades.id"
CREATE TABLE
mtgox=# \copy trades(unixtime,price,volume) from 'trades.csv' delimiters ',' csv;
mtgox=# update trades set t = TIMESTAMP 'epoch' + unixtime * INTERVAL '1 second';
UPDATE 3563178

EDIT3: fixed my other 2 posts containing queries. The values for the years >=2011 changed only "slightly".

EDIT4: anyone knowledgable with postgres have a suggestion what datatype to best use for the monetary values?
hero member
Activity: 728
Merit: 500
Due to popular demand, here's a little intro on how to get mtGox trade data (all trades) into a sql database and run some queries on the data:


Its not all trades. Its close because the time interval is 1 second. To get all trades you need to get the data from:

t=0
https://mtgox.com/api/1/BTCUSD/trades?since=t

which will give you this:

https://mtgox.com/api/1/BTCUSD/trades?since=0


find the largest timestamp ( "date":1280801871 in this case) in that data, then change the t variable and make another query like:

t=max(date)+1
https://mtgox.com/api/1/BTCUSD/trades?since=t

which gives you:

https://mtgox.com/api/1/BTCUSD/trades?since=1280801872


then repeat

hero member
Activity: 784
Merit: 1000
Sorry, I think there is something wrong with the 2010 volume in USD...
Hmm? You do realize BTC was worth nothing at that time?

Never below $0.01 after Gox opened, $0.00001 something is outright impossible
vip
Activity: 1316
Merit: 1043
👻
Sorry, I think there is something wrong with the 2010 volume in USD...
Hmm? You do realize BTC was worth nothing at that time?
hero member
Activity: 784
Merit: 1000
Sorry, I think there is something wrong with the 2010 volume in USD...
hero member
Activity: 784
Merit: 1000
Gox should made no less than  $1 million last year, it seems.
donator
Activity: 2772
Merit: 1019
Code:
mtgox=# select count(*) as cnt, sum(volume) as vol_btc, sum(volume*price) as vol_usd from trades;
   cnt   | vol_btc  |  vol_usd
---------+----------+-----------
 3563178 | 39030079 | 292616604
(1 row)

Weighted average price = 292616604/39030079 = $7.50/BTC

good idea, vwap by year:

Code:
mtgox=# select extract(year from t) as y, count(*) as cnt, sum(volume) as vol_btc, sum(volume*price) as vol_usd, sum(volume*price) / sum(volume) as vwap from trades group by y order by y;
  y   |   cnt   |      vol_btc      |           vol_usd            |          vwap
------+---------+-------------------+------------------------------+------------------------
 2010 |    8757 |  2666237.77200000 |    433663.023527280000000000 | 0.16264979368362200219
 2011 | 1279900 | 13525744.44429303 |  86008826.163108379280100000 |   6.358897768425487151
 2012 | 1970995 | 20228998.28350803 | 152645161.420049849453300000 |   7.545858637226536507
 2013 |  303526 |  2692668.75713098 |  54049100.207568686632600000 |  20.072688133076431703
legendary
Activity: 1904
Merit: 1002
Code:
mtgox=# select count(*) as cnt, sum(volume) as vol_btc, sum(volume*price) as vol_usd from trades;
   cnt   | vol_btc  |  vol_usd
---------+----------+-----------
 3563178 | 39030079 | 292616604
(1 row)

Weighted average price = 292616604/39030079 = $7.50/BTC
hero member
Activity: 648
Merit: 500
which leads me to the first "cool query":

Code:
mtgox=# select extract(year from t) as y, count(*) as cnt, sum(volume) as vol_btc, sum(volume*price) as vol_usd from trades group by y order by y;
  y   |   cnt   | vol_btc  |  vol_usd
------+---------+----------+-----------
 2010 |    8757 |  2666295 |        45
 2011 | 1279900 | 13493417 |  85888519
 2012 | 1970995 | 20187125 | 152856841
 2013 |  303526 |  2683242 |  53871199

result: 2012-volume (in USD) was almost twice the 2011-volume Wink


and according to the current trend, less than 2013 volume in btc, but more than twice the volume in USD
donator
Activity: 2772
Merit: 1019
which leads me to the first "cool query":

Code:
mtgox=# select extract(year from t) as y, count(*) as cnt, sum(volume) as vol_btc, sum(volume*price) as vol_usd from trades group by y order by y;
  y   |   cnt   |      vol_btc      |           vol_usd
------+---------+-------------------+------------------------------
 2010 |    8757 |  2666237.77200000 |    433663.023527280000000000
 2011 | 1279900 | 13525744.44429303 |  86008826.163108379280100000
 2012 | 1970995 | 20228998.28350803 | 152645161.420049849453300000
 2013 |  303526 |  2692668.75713098 |  54049100.207568686632600000

result: 2012-volume (in USD) was almost twice the 2011-volume Wink
donator
Activity: 2772
Merit: 1019
oh, forgot to update the timestamp:

Code:
mtgox=# update trades set t = TIMESTAMP 'epoch' + unixtime * INTERVAL '1 second';

will fix in OP
donator
Activity: 2772
Merit: 1019
thanks  Grin

i did not know bitcoincharts offers all trade data in csv, great service.

took me a while to find that, too. Actually, I think I talked to tcatm and he told me.
sr. member
Activity: 316
Merit: 250
thanks  Grin

i did not know bitcoincharts offers all trade data in csv, great service.
donator
Activity: 2772
Merit: 1019
Due to popular demand, here's a little intro on how to get mtGox trade data (all trades) into a sql database and run some queries on the data:

Depending on your system you might need a lot of stuff I wont list here (like postgres database server, wget/curl, ...). You'll just run into it when you need it.

Firstly we need to wget or curl the csv from bitcoincharts.com ("#>" means you should enter this in a shell):

Code:
#> curl "http://api.bitcoincharts.com/v1/csv/mtgoxUSD.csv" > trades.csv

This will write a file "trades.csv" containing all mtgox trades.

now create a database ("postgres=#" means enter this in psql shell):

Code:
#> psql -U postgres
postgres=# create database mtgox;
CREATE DATABASE

postgres=# \connect mtgox;
You are now connected to database "mtgox" as user "postgres".

mtgox=# \q
#>

now you have created a database called "mtgox" and you can open a psql shell to it directly next time:

we'll create a table for the trades and import the data:

Code:
#> psql -U postgres mtgox
mtgox=# create table trades (id serial, unixtime int, t timestamp, price numeric(16,8), volume numeric(16,8));
NOTICE:  CREATE TABLE will create implicit sequence "trades_id_seq" for serial column "trades.id"
CREATE TABLE

mtgox=# \copy trades(unixtime,price,volume) from 'trades.csv' delimiters ',' csv;

mtgox=# update trades set t = TIMESTAMP 'epoch' + unixtime * INTERVAL '1 second';
UPDATE 3563178

mtgox=#

done. now you can run queries:

Code:
mtgox=# select count(*) as cnt, sum(volume) as vol_btc, sum(volume*price) as vol_usd from trades;
   cnt   |      vol_btc      |           vol_usd
---------+-------------------+------------------------------
 3563178 | 39113649.25693204 | 293136750.814254195366000000   cnt   | vol_btc  |  vol_usd

have fun and please, paste your queries here if you have cool ones.
Jump to: