Topic: zipline / Quantopian - backtesting / trading framework - page 2. (Read 26867 times)

newbie
Activity: 9
Merit: 0
but there is also an issue with Pandas

https://github.com/pydata/pandas/issues/783

see also this notebook

http://nbviewer.ipython.org/4982660/

it seems to be a clean way to draw candlesticks

Stumbled on that issue before.  It looks like the user who suggested a solution, even though he didn't submit a patch, has put up some of his personal charting tools:

https://github.com/dalejung/trtools

Haven't tried them yet; they seem to need a few dependencies like tables (and consequently hdf5).
member
Activity: 105
Merit: 10
If you want to draw a candlestick plot in your IPython notebook you can use this


Code:
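# matplotlib.finance shipped with older matplotlib; as far as I know later
# releases moved it to the separate mpl_finance package
import matplotlib.pyplot as plt
from matplotlib.finance import candlestick

fig = plt.figure()
ax = fig.add_subplot(111, ylabel='price')
# x axis is the bar number, not a real date
Date = range(1, len(data['BTC']) + 1)
Open = data['BTC']['open'].values
High = data['BTC']['high'].values
Low = data['BTC']['low'].values
Close = data['BTC']['close'].values
Volume = data['BTC']['volume'].values
# candlestick() expects (time, open, close, high, low, ...) tuples
DOCHLV = list(zip(Date, Open, Close, High, Low, Volume))
candlestick(ax, DOCHLV, width=0.6, colorup='g', colordown='r', alpha=1.0)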

but it needs to be improved

but there is also an issue with Pandas

https://github.com/pydata/pandas/issues/783

see also this notebook

http://nbviewer.ipython.org/4982660/

it seems to be a clean way to draw candlesticks
member
Activity: 105
Merit: 10
My idea of building a pseudo tick price from candlesticks can be expressed like this:

Code:
import numpy as np
data['BTC'] = data['BTC'].resample('8H', how='mean')  # newer pandas: .resample('8H').mean()
data['BTC']['ModCol'] = np.arange(len(data['BTC'])) % 3  # 0, 1, 2, 0, 1, 2, ...
# np.where() returns a plain array with no .fillna(), so pick the source column per row with np.select()
m = data['BTC']['ModCol']
data['BTC']['price'] = np.select([m == 0, m == 1, m == 2], [data['BTC']['open'], data['BTC']['low'].shift(1), data['BTC']['high'].shift(2)])
in fact I build a price TimeSeries like

Code:
Open
Low
High
Open
Low
High
...
(assuming that there is no gap between close of previous candle and open of current candle)

There is probably a better way to do this !
(But I'm not very clever with code vectorization)

Edit:
in fact we should make a dataframe which simulates the candlestick as it is being built

Code:
Time               Open  High Low   Close
===========================================
t0               = Open  Open Open  Open
t0+timeframe/3   = Open  Open Low   Low
t0+2*timeframe/3 = Open  High Low   High
t1=t0+timeframe  = Open  High Low   Close
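
The table above could be generated per candle along these lines. This is only a minimal sketch: `expand_candle` is a made-up helper name, it assumes the low is always visited before the high (the worst case for a long position), and the sample values are the 2013-05-11 daily bar from the first post.

```python
import pandas as pd

def expand_candle(t0, timeframe, o, h, l, c):
    """Return the four partial candles of one bar, visiting the low before the high."""
    step = timeframe / 3
    rows = [
        (t0,             o, o, o, o),  # only the open is known
        (t0 + step,      o, o, l, l),  # price has dropped to the low
        (t0 + 2 * step,  o, h, l, h),  # price has climbed to the high
        (t0 + timeframe, o, h, l, c),  # completed candle
    ]
    frame = pd.DataFrame(rows, columns=["time", "open", "high", "low", "close"])
    return frame.set_index("time")

partial = expand_candle(pd.Timestamp("2013-05-11"), pd.Timedelta(hours=24),
                        117.7, 118.74, 113.0, 113.47)
print(partial)
```

The close column of `partial` is the pseudo tick price series (open, low, high, close) discussed in this thread.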

I'm sorry but I don't understand what you (MtQuid) are saying:
Quote
handle_data() is your brains and it can view all the OHLC per tick and issue order(buy/sell) with the results tracked for easy analysis of performance.
I don't think handle_data manages how candlesticks are built over time...

@hugolp
the problem with big data is not storing it... it's processing it...
legendary
Activity: 1148
Merit: 1001
Because MtGox only provides (to my knowledge) an API to download each trade
(and it's a very big file !!!)


I understand it's a very big file and that using 15 minute candles reduces the size of the database considerably, but having each trade can help simulate the spread much better (obviously having the orderbook would be ideal, but that's even more data). And with today's hard drives being so cheap, is it really a problem?
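
For what it's worth, keeping the raw trades does not rule out candles, since the candles can be rebuilt from the trades whenever needed. A rough sketch with pandas, assuming a hypothetical trade tape with `price` and `volume` columns (the sample values are made up, not MtGox data):

```python
import pandas as pd

# Hypothetical trade tape: one row per trade.
trades = pd.DataFrame(
    {"price": [100.0, 101.0, 99.5, 100.5, 102.0],
     "volume": [1.0, 0.5, 2.0, 1.5, 1.0]},
    index=pd.to_datetime([
        "2013-05-11 00:01", "2013-05-11 00:07", "2013-05-11 00:14",
        "2013-05-11 00:16", "2013-05-11 00:29"]),
)

# Collapse the trades into 15-minute candles and sum the traded volume per candle.
candles = trades["price"].resample("15min").ohlc()
candles["volume"] = trades["volume"].resample("15min").sum()
print(candles)
```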
newbie
Activity: 9
Merit: 0
Thanks for your code.
We should output portfolio analysis (alpha, beta, Sharpe ratio, Sortino ratio ...)

I didn't find a good tutorial about zipline
maybe you have some pointers for me?
we should also add entry / exit point efficiency.

There are some examples:
https://github.com/quantopian/zipline/tree/master/zipline/examples

But I haven't found a ton of documentation.  At the moment it feels like one might be doing some code skimming and pydoc usage to look at the API.  I haven't tried compiling the documentation in the repo but the few files I looked at didn't seem to expand far beyond that.

Here's an example that extends MtQuid's notebook to try a method on the zipline mailing list:

http://nbviewer.ipython.org/ec53445ececcd94980b8

I'm not sure if those are correct or not, didn't check that the dates are actually UTC.

you were talking about trading fees...
but there is another kind of fee that is not modeled here: the spread
bid and ask prices for a given BTC volume are different!
the difference is called the spread = ask - bid
even if trading fees were 0% you would lose money by buying and selling BTC simultaneously

For now, we only have the price... we don't know what the spread was at a
given datetime!

moreover, unlike the Forex market where the spread is either fixed
or time dependent... in the BTC market the spread is volume dependent.
the higher the BTC volume, the higher the spread!!!

but that's probably only noticeable for very big BTC volumes

Yeah, this is one thing that makes me the most concerned about backtesting. I don't know if there are any data sources that keep book history that could be used for this purpose either.  I can think of ways to maybe get a sense for it from the data by looking for alternating jumps in the data, but that'd be an approximation at best.

The only thing that came up in a quick search regarding order book history was this other thread which links to data from 2012:
https://bitcointalksearch.org/topic/mtgox-usd-depth-historic-data-for-your-pleasure-88054

I'd be interested in other theories on how to deal with this.  I'd be thinking maybe either some estimated factor, or binning the data and going by the low, or doing priced asks/bids rather than market orders?  Or, as you suggest, perhaps we could collect some data and do a model based on volume?  One could also look at trades that alternate up/down to get an idea of the spread?  Any modeling would need some test data though.
member
Activity: 105
Merit: 10
Thanks for your code.
We should output portfolio analysis (alpha, beta, Sharpe ratio, Sortino ratio, drawdown ...)

I didn't find a good tutorial about zipline
maybe you have some pointers for me?
we should also add entry / exit point efficiency.
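
Some of the statistics listed above can be computed from the equity curve with plain NumPy. This is a sketch under stated assumptions, not zipline's own implementation: the risk-free rate is taken as zero, daily returns are annualised with 365 periods (BTC trades every day), and the function names and sample equity curve are made up for illustration.

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=365):
    """Annualised mean return over sample standard deviation (risk-free rate assumed 0)."""
    returns = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1)

def sortino_ratio(returns, periods_per_year=365):
    """Like Sharpe, but only negative returns count as risk (target return assumed 0)."""
    returns = np.asarray(returns, dtype=float)
    downside = np.sqrt(np.mean(np.minimum(returns, 0.0) ** 2))
    return np.sqrt(periods_per_year) * returns.mean() / downside

def max_drawdown(equity):
    """Largest peak-to-trough loss of the equity curve, as a negative fraction."""
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    return ((equity - running_peak) / running_peak).min()

equity = np.array([100.0, 110.0, 99.0, 120.0, 108.0])  # hypothetical portfolio values
returns = np.diff(equity) / equity[:-1]
print(sharpe_ratio(returns), sortino_ratio(returns), max_drawdown(equity))
```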

you were talking about trading fees...
but there is another kind of fee that is not modeled here: the spread
bid and ask prices for a given BTC volume are different!
the difference is called the spread = ask - bid
even if trading fees were 0% you would lose money by buying and selling BTC simultaneously

For now, we only have the price... we don't know what the spread was at a
given datetime!

moreover, unlike the Forex market where the spread is either fixed
or time dependent... in the BTC market the spread is volume dependent.
the higher the BTC volume, the higher the spread!!!

but that's probably only noticeable for very big BTC volumes
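
To put a number on the spread cost described above: a tiny sketch, with made-up bid/ask quotes and a hypothetical helper `round_trip_pnl`, showing that buying at the ask and selling at the bid loses spread * volume even with a 0% fee.

```python
def round_trip_pnl(bid, ask, volume, fee_rate=0.0):
    """P&L of buying `volume` at the ask and immediately selling it at the bid."""
    cost = ask * volume * (1 + fee_rate)
    proceeds = bid * volume * (1 - fee_rate)
    return proceeds - cost

bid, ask = 113.40, 113.60          # hypothetical quotes
spread = ask - bid                 # spread = ask - bid
loss = round_trip_pnl(bid, ask, volume=10)
print(spread, loss)                # the loss is about -spread * volume
```

A volume dependent spread could be plugged in by making `bid` and `ask` functions of `volume`, but that needs order book data we don't have here.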
newbie
Activity: 24
Merit: 0
You have to use a Panel because 'data' is a DataFrame, which is a dict of TimeSeries.  As far as I know a TimeSeries can only have one value for each time-stamped row, and that is why the previous notebook only passed the single ['price'] TimeSeries and not the rest.  We need multiple values/observations (open,high,low,close,volume...) per row in the TimeSeries so we use the Panel method.  Reading the Quantopian forum and zipline commit logs you can see that this is the chosen and agreed upon method for passing around OHLC sets.
I just took adjusted from the load_bars_from_yahoo() source and used the default values, but I'll delete that code on Monday as Bitcoin has no splits or dividends.  I was in a rush to post before the roast.

You can use whatever data you want with the simulator but you will need to turn it into a Panel if you want to be able to pass around multiple observations per tick, and also if you want the TradingAlgorithm to be able to issue orders in handle_data(), unless you build your own datasource tick generator, which might not be a bad thing.
Anyway, it is very easy now.
handle_data() is your brains and it can view all the OHLC per tick and issue order(buy/sell) with the results tracked for easy analysis of performance.
Everything is now possible.
Add bitcoincharts json files with selectable time collapse and then we are really cooking
..but that is work for Monday.

Edit:  notebook has been updated from the latest DMA example shipped with zipline source.
The bugs with extra values added and non showing graph arrows have been resolved.

Edit: we still need to work MtGox fees into the analysis but I'm doing that on a goxtool bot so I have the accurate code
member
Activity: 105
Merit: 10
Because MtGox only provides (to my knowledge) an API to download each trade
(and it's a very big file !!!)

About the latest version of MtQuid's notebook...
http://nbviewer.ipython.org/5561936

I don't understand why it uses a Pandas Panel

I also don't understand the purpose of "adjusted"

I think we just need to resample the data

A very basic idea (to test a long strategy) could be to send prices as follows

OPEN_dt0
LOW_dt0
HIGH_dt0
CLOSE_dt0
OPEN_dt1
LOW_dt1
HIGH_dt1
CLOSE_dt1
...

it lets us consider the worst case:
if we set a stop loss and a take profit,
then in the simulator the price will first move toward the stop loss and only afterwards toward the take profit
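
Here is a small sketch of that worst-case ordering for a long position. `first_exit_long` is a hypothetical helper, not part of zipline; it simply walks each bar as open, low, high, close and reports whichever level is touched first. The sample bar is the 2013-05-07 daily candle from the first post, and the stop loss / take profit levels are made up.

```python
def first_exit_long(bars, stop_loss, take_profit):
    """Walk each (open, high, low, close) bar as open -> low -> high -> close.

    Returns ("stop_loss", price) or ("take_profit", price) for the first level
    touched, or ("open", None) if the position survives every bar.
    """
    for o, h, l, c in bars:
        for price in (o, l, h, c):  # low is visited before high: worst case for a long
            if price <= stop_loss:
                return "stop_loss", price
            if price >= take_profit:
                return "take_profit", price
    return "open", None

bars = [(112.25, 114.0, 97.52, 109.60013)]  # open, high, low, close
result = first_exit_long(bars, stop_loss=100.0, take_profit=113.5)
print(result)
```

Both levels sit inside this bar's range, so the ordering decides the outcome: with the low visited first the stop loss wins. For a short position the inner order would be open, high, low, close.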

legendary
Activity: 1148
Merit: 1001
Almost there....
I also agree that we need to use a better data source and that should probably be bitcoincharts.

Why not mtgox itself?
newbie
Activity: 24
Merit: 0
Yeah I'm also wondering about not seeing the buy/sell markers (^ and v)
I think this ties in with me having to add extra values to those two series.
I finished that stuff off drunk last night...but the charts show it would have made a profit :P

I've updated the notebook now to use OHLC and it works.
I took the code from load_bars_from_yahoo() so we can use stuff like data['BTC']['open'] within the handler.
Works well.

Almost there....
I also agree that we need to use a better data source and that should probably be bitcoincharts.

Bots just puke up machine language.
Time to talk to some humans down the pub.  Sunday Roast!!! :)
member
Activity: 105
Merit: 10
Hello,

I'm starting a new thread here about zipline / Quantopian
It's an event-driven Python trading framework that can be used
for backtesting strategies.

https://bitcointalksearch.org/topic/m.2105722
http://vimeo.com/53064082

If you want to try it, you should run ipython with pylab inline
Code:
ipython notebook --pylab inline

MtQuid posted an IPython notebook here
https://bitcointalksearch.org/topic/m.2116508
http://nbviewer.ipython.org/5561936

I'm posting here to avoid overloading the goxtool thread
(ncurses Python software to trade BTC with MtGox)


I have some questions... about zipline...

First, I noticed that the data (daily MtGox BTC/USD) are coming from
http://www.quandl.com/api/v1/datasets/BITCOIN/MTGOXUSD.csv?trim_start=2012-01-01&sort_order=asc
( http://www.quandl.com/BITCOIN-Bitcoin-Charts/MTGOXUSD-Bitcoin-Markets-mtgoxUSD )
raw data from http://bitcoincharts.com/charts/chart.json?m=mtgoxUSD
 
 
Code:
                              open       high     low      close         volume       volume_usd       price
Date                                                                                                          
2013-05-11 00:00:00+00:00  117.70000  118.74000  113.00  113.47000   25532.277740   2952016.798507  115.619015
2013-05-10 00:00:00+00:00  112.79900  122.50000  111.54  117.70000   77443.672681   9140709.083964  118.030418
2013-05-09 00:00:00+00:00  113.20000  113.71852  108.80  112.79900   26894.458204   3003068.410660  111.661235
2013-05-08 00:00:00+00:00  109.60013  116.77700  109.50  113.20000   61680.324704   6990518.957611  113.334665
2013-05-07 00:00:00+00:00  112.25000  114.00000   97.52  109.60013  139626.724860  14898971.673747  106.705731

DatetimeIndex: 497 entries, 2013-05-11 00:00:00+00:00 to 2012-01-01 00:00:00+00:00
Data columns:
open          497  non-null values
high          497  non-null values
low           497  non-null values
close         497  non-null values
volume        497  non-null values
volume_usd    497  non-null values
price         497  non-null values
dtypes: float64(7)
                              open    high      low    close         volume      volume_usd     price
Date                                                                                                
2012-01-05 00:00:00+00:00  5.57383  7.2200  5.57401  6.94760  182328.193876  1130623.294233  6.201034
2012-01-04 00:00:00+00:00  4.88080  5.7000  4.75100  5.57383  131170.856663   688717.856619  5.250540
2012-01-03 00:00:00+00:00  5.21678  5.2900  4.65000  4.88080  125170.253872   619170.541604  4.946627
2012-01-02 00:00:00+00:00  5.26766  5.4700  4.80000  5.21678   69150.931963   360357.284302  5.211170
2012-01-01 00:00:00+00:00  4.72202  5.4999  4.61500  5.26766  108509.229901   553045.139811  5.096757

Note: the data need to be sorted by ascending index;
otherwise you will get this error message
Code:
AssertionError: Period start falls after period end.
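
A minimal sketch of the sorting step, assuming the data has already been loaded into a pandas DataFrame indexed by date (the tiny frame below just reuses three close values from the table above):

```python
import pandas as pd

# Rows in the descending order shown in the table above.
df = pd.DataFrame(
    {"close": [113.47, 117.70, 112.799]},
    index=pd.to_datetime(["2013-05-11", "2013-05-10", "2013-05-09"], utc=True),
)

# Sort by ascending date before handing the data to the backtest.
df = df.sort_index()
print(df)
```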

I wonder what "weighted price" is... (renamed price)

this notebook seems to use this "weighted price" to simulate a kind of tick data

in my mind it would be much better to simulate each price that has been seen on the market (open, high, low, close)
because if you are going long and you set a stop loss, it will probably be hit by the low price.
(or if you are going short it will probably be hit by the high price)

Second,
I have some problems running the notebook (I always get a (*) )
but I can run the code without the notebook
http://pastebin.com/jmfuNTKs

Third,
I wonder why I don't see buy/sell (^ and v)

Fourth,
what about day trading !!!
(with M15 timeframe !)
some data are here
https://bitcointalksearch.org/topic/crypto-currencies-historical-data-199979
or https://bitcointalksearch.org/topic/data-btcusd-data-in-csv-format-various-timeframes-196834
unfortunately I'm quite busy today ;-(

Kind regards