Topic: zipline / Quantopian - backtesting / trading framework (Read 26857 times)

member
Activity: 105
Merit: 10
The zipline / Quantopian developers seem to be interested in this thread.

You can also share your experiences at
https://groups.google.com/forum/#!topic/zipline/M39VhqDRORM
legendary
Activity: 1008
Merit: 1007
Could you provide such data?

Unfortunately, I've only been collecting it for the last 6 days. I collected for 1 month last year, but there is a big gap between then and now, so I don't have anything worthwhile to give you :|

The best thing to do is to start collecting it now; then if you decide you need it, you've got it :)
member
Activity: 105
Merit: 10
Could you provide such data?
legendary
Activity: 1008
Merit: 1007
Don't worry... I don't consider you as disparaging us...
A volume-dependent model for the spread could help...
Maybe you can help us to get it (using historical order-book depth)

I'm not sure how you could approximate the order book well enough with just two values. It works fine for forex symbols because of the huge liquidity, but bitcoin is a different kettle of fish.

Like I say, I just store the whole order book every 10 seconds. For total accuracy it really wants to be tick by tick, though :)
member
Activity: 105
Merit: 10
Don't worry... I don't consider you as disparaging us...
A volume-dependent model for the spread could help...
Maybe you can help us to get it (using historical order-book depth)
legendary
Activity: 1008
Merit: 1007
I don't want to be disparaging, but I found that just using OHLCV was giving very misleading results for bitcoin when testing algorithms.

The reason is liquidity. The top-of-the-book Bid/Ask values often don't have enough volume associated with them to be tradable: in real life you'd get a partial fill on your orders, or no fill at all.

You really need the entire order-book, which is what I ended up capturing. This perfectly captures the liquidity of the market for the period tested. Requires a lot of data, but storage is cheap.
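
To illustrate why top-of-the-book prices can mislead, here is a minimal sketch of filling a market buy against one order-book snapshot. The function name, the (price, volume) tuple layout and the sample numbers are assumptions made for illustration; they are not taken from the capture tool described above.

```python
def simulate_market_buy(asks, quantity):
    """Walk the ask side of one order-book snapshot to fill a market buy.

    asks: list of (price, volume) tuples sorted by ascending price.
    Returns (filled_quantity, average_price); average_price is None
    when nothing could be filled.
    """
    remaining = quantity
    cost = 0.0
    for price, volume in asks:
        if remaining <= 0:
            break
        take = min(remaining, volume)   # cannot take more than this level offers
        cost += take * price
        remaining -= take
    filled = quantity - remaining
    average_price = cost / filled if filled > 0 else None
    return filled, average_price


# Hypothetical snapshot: only 0.5 BTC is available at the best ask.
asks = [(100.0, 0.5), (100.5, 1.0), (102.0, 2.0)]
filled, avg_price = simulate_market_buy(asks, 2.0)        # fully filled, but above 100.0
partial_filled, _ = simulate_market_buy(asks, 5.0)        # book too thin: partial fill
```

With OHLCV data alone, the 2.0 BTC order would be assumed to fill at 100.0; walking the book gives a worse average price, and the 5.0 BTC order only fills partially.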

:)

Cheers, Paul.
member
Activity: 105
Merit: 10
Quote
Nice work.... if only we could go back in time

zipline uses Delorean...
http://delorean.readthedocs.org/en/latest/quickstart.html
maybe it could help ;)

+1 for splitting the work into 2 parts:
a data producer (which stores OHLC values in a database)
a data consumer, which reads the data, shows prices and applies the strategy

But in that case I wonder how you can
inform the data consumer that new data has just come in...

Maybe in your idea the data consumer will only react every 5 minutes...

That's quite different from the MetaTrader start function, which is launched
on every tick...

But when I say that "the MetaTrader start function is launched on every tick...",
in fact I think that the start function is launched on every tick **if that's possible**
(if the previous start function call has finished).
If the previous start function call has not finished, then even if a new tick comes in,
start will not be executed again.
(MetaTrader shows the new price from the last tick in the GUI,
but the expert advisor's start function is not executed again.)

So I think a kind of signal/slot mechanism is needed, but we also need a kind of "lock"
mechanism.

I hope you understand what I mean...
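
The "signal plus lock" idea above could be sketched roughly as follows. This is only an illustration under assumptions: TickDispatcher, on_tick and strategy are hypothetical names, and the re-entrant call inside strategy merely simulates a tick that arrives while the previous call is still running.

```python
import threading


class TickDispatcher:
    """Run a handler on each tick, skipping ticks that arrive while it is busy."""

    def __init__(self, handler):
        self._handler = handler
        self._busy = threading.Lock()
        self.last_price = None   # always updated, like the price shown in the GUI
        self.skipped = 0

    def on_tick(self, price):
        self.last_price = price
        # Non-blocking acquire: if the previous call is still running, give up.
        if not self._busy.acquire(blocking=False):
            self.skipped += 1
            return False
        try:
            self._handler(price)
        finally:
            self._busy.release()
        return True


handled = []


def strategy(price):
    handled.append(price)
    if price == 100.0:
        # Simulate a new tick arriving while this call is still running.
        dispatcher.on_tick(101.0)


dispatcher = TickDispatcher(strategy)
dispatcher.on_tick(100.0)   # handled; the nested 101.0 tick is skipped
dispatcher.on_tick(102.0)   # handled normally
```

The skipped tick still updates last_price, so a display could show it even though the strategy never ran on it.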
newbie
Activity: 24
Merit: 0
New feature:
variable lot size (instead of fixed lot size)
self.trade_volume is now a function which returns the volume to trade according
to portfolio cash and the BTCUSD price.

http://nbviewer.ipython.org/587a80f5e2eb9cf41d6d

alpha = 138.7%
max DD = 7.5%

Should I order the Rolls-Royce? :D

Nice work.... if only we could go back in time :)
I think there were some missing rows in the method I used to pull from bitcoincharts, so I've made some modifications and added a cache that stays up to date.
I've also included frequency information, but 'minute' runs are still not working correctly.  We might have to tamper with zipline internals to get them going.
You should update your work to at least use the better bitcoincharts methods.

Now in my testing I keep an up to date cache of minute data from bitcoincharts and resample that to how I want it.

Fractional trades are still an issue.
A fix would be to use satoshi, but then the prices would all have to be suitably scaled and the results would just look a mess.
Or bump the initial portfolio value up to a few billion and increase the trade volume so that there are no fractions, but then slippage is very unrealistic.
Or modify zipline internals or use an inherited object
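
As a rough sketch of the satoshi option, assuming 1 BTC = 100,000,000 satoshi: the order size becomes an integer number of units and the price is scaled down by the same factor, so the order value is unchanged. The helper name and the sample numbers are hypothetical.

```python
SATOSHI_PER_BTC = 100_000_000


def to_integer_order(btc_amount, usd_per_btc):
    """Convert a fractional BTC order into integer satoshi units and a scaled price."""
    units = int(round(btc_amount * SATOSHI_PER_BTC))
    usd_per_unit = usd_per_btc / SATOSHI_PER_BTC
    return units, usd_per_unit


# Hypothetical order: 0.25 BTC at 115 USD/BTC.
units, usd_per_unit = to_integer_order(0.25, 115.0)
order_value = units * usd_per_unit   # same USD value as 0.25 * 115.0
```

The scaled price (about 0.00000115 USD per unit here) is what makes the result tables hard to read.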

About passing live trade data to an algorithm:
I'm thinking of creating a new TradingAlgorithm object that is passed a DatetimeIndex, takes trade data from the phantom sqlite3 db, creates OHLC, and passes both simulated live data and OHLC to an algorithm under test.  So the wrapper gets a datetime and frequency from zipline, extracts the data from the phantom db, passes this to the algo under test and then passes the results back to zipline.
Just an idea, but this might also be a way to fix the fractional trades issue.

Updated - http://nbviewer.ipython.org/5572250


Edit: In fact, even if the zipline results records are always daily, it does not matter, because we can put minute results into something else or accumulate them for a day.  It depends on what you want to log in the results.  Currently, I guess, the same record gets overwritten for each minute in a day, so only the last record of the day is saved in the results.
It also seems like there is a bug in my example, because the trade volumes are messed up and I've lost a lot of money :/

Edit:  All working now.  I was not using ohlc resampling
member
Activity: 105
Merit: 10
New feature:
variable lot size (instead of fixed lot size)
self.trade_volume is now a function which returns the volume to trade according
to portfolio cash and the BTCUSD price.
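
A variable lot size of this kind might look like the sketch below. It is not the code from the notebook: the cash_fraction and lot_step parameters and the rounding rule are assumptions made only to show the idea of sizing a trade from cash and price.

```python
import math


def trade_volume(cash, price, cash_fraction=0.1, lot_step=0.01):
    """Volume to trade: spend a fixed fraction of cash, rounded down to lot_step."""
    budget = cash * cash_fraction
    lots = math.floor(budget / price / lot_step)
    return lots * lot_step


small = trade_volume(cash=10000.0, price=115.0)   # about 8.69 BTC
large = trade_volume(cash=20000.0, price=115.0)   # about 17.39 BTC
```

The volume grows with the portfolio cash and shrinks as the BTCUSD price rises, instead of staying fixed.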

http://nbviewer.ipython.org/587a80f5e2eb9cf41d6d

alpha= 138.7%
max DD = 7.5%

Should I order the Rolls-Royce? :D
member
Activity: 105
Merit: 10
Quote
Am I correct to assume it should be
Code:
size=self.trade_volume

No, it's
Code:
size=self.invested
(because of commission fees).
That's fixed in my gist now...

If you have some coding ability, could you try to implement this?
http://www.onestepremoved.com/backtesting-efficiency/

Quote
Really fascinating stuff, playing with these examples to try and get up to speed.

Overfitting a backtest is really easy... but an overfitted backtest does not reflect future results
https://www.google.fr/search?q=backtest+overfitting
newbie
Activity: 7
Merit: 0
Really fascinating stuff, playing with these examples to try and get up to speed.


c0inbuster, I get an error:
Code:
    print "{dt}: hit BreakEvent - moving Stop Loss from {SL1} to {SL2}".format(size=size, dt=data['BTC'].datetime, SL1=self.price_SL, SL2=self.price_BE_offset)
NameError: global name 'size' is not defined

Am I correct to assume it should be
Code:
size=self.trade_volume
?
member
Activity: 105
Merit: 10
A new poem for you guys...

http://nbviewer.ipython.org/587a80f5e2eb9cf41d6d

Some features:
 - Stop Loss
 - Take Profit
 - Trailing Stop
 - BreakEven
(set as percentage of price)
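
For readers who have not opened the notebook, the four features could be sketched for a long position as below. This is an illustration under assumptions, not the notebook's code: the LongStops class, its parameter names and the sample percentages are hypothetical.

```python
class LongStops:
    """Stop loss, take profit, trailing stop and break-even for one long trade.

    All levels are set as percentages of the entry price.
    """

    def __init__(self, entry, sl_pct, tp_pct, trail_pct, be_trigger_pct):
        self.entry = entry
        self.stop = entry * (1 - sl_pct)
        self.take_profit = entry * (1 + tp_pct)
        self.trail_pct = trail_pct
        self.be_trigger = entry * (1 + be_trigger_pct)

    def update(self, price):
        """Feed one price; return 'stop', 'take_profit' or None."""
        if price <= self.stop:
            return "stop"
        if price >= self.take_profit:
            return "take_profit"
        if price >= self.be_trigger:
            # Break-even: never risk more than the entry price from now on.
            self.stop = max(self.stop, self.entry)
        # Trailing stop: follow the price up, never move the stop down.
        self.stop = max(self.stop, price * (1 - self.trail_pct))
        return None


stops = LongStops(entry=100.0, sl_pct=0.05, tp_pct=0.20,
                  trail_pct=0.05, be_trigger_pct=0.03)
events = [stops.update(p) for p in (102.0, 104.0, 110.0, 104.0)]
```

In this run the stop is first trailed to 96.9, moved to break-even at 100.0, trailed to 104.5, and finally hit when the price falls back to 104.0.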

Max drawdown is now 1.9 %
alpha: 23.52%

Previous values were:
alpha: 16.34%
max_drawdown: 12.55%

But I'm sure that our backtesting is wrong...
because we are only using price (and never low_price)

ToDo:
variable lot size (depending on portfolio value) to keep the same risk for each trade
output data such as trade entry efficiency / exit efficiency
sub-daily timeframe trading (M15, M30, H1...)
(ask the zipline team?)
add a real-time feed (goxtool or btcx code could help)
use a file to store Stop Loss / Take Profit values
(we should also consider that we could have several positions open...
that's not the case in this strategy, but it could be in a more sophisticated one)
so we need to store a trade identifier (trade number)
we should also add a MagicNumber to identify that a given trade was opened
by a given strategy.
http://www.onestepremoved.com/magic-number/

use scipy optimize to optimize parameters.

divide data into 2 parts
 - data for optimizing parameter
 - data for testing parameter
in order to ensure that settings are robust

walk forward analysis...
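
The split into optimization data and test data, repeated over a sliding window, is roughly what walk-forward analysis does. A minimal sketch, with a hypothetical helper name and window sizes chosen only for the example:

```python
def walk_forward_windows(n_samples, train_size, test_size):
    """Return (train_range, test_range) index pairs that slide through the data.

    Parameters are optimized on each train range and then checked on the
    test range that immediately follows it.
    """
    windows = []
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        windows.append((train, test))
        start += test_size   # slide forward by one test period
    return windows


windows = walk_forward_windows(n_samples=10, train_size=4, test_size=2)
```

Each test range uses only data that comes after its train range, so the parameters are never judged on the data they were fitted to.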
member
Activity: 105
Merit: 10
Quote
Is that a poem ?

English is not my mother tongue, which is why I probably can't
write as well as you would expect!

I have several problems:

I don't see anyone here feeding zipline with tick data and computing an EMA on candlestick data
inside handle_data
(so that we would be able to calculate the EMA on several timeframes).

I also don't know whether zipline supports non-daily timeframes, or at least whether
they plan to support such a feature.

I don't know how we could feed zipline with real-time data.

Why not run the analysis each time a tick is received...
This is exactly what MetaTrader does in its start() function
http://book.mql4.com/programm/special

@knowitnothing
I will have a look at your code btcx
member
Activity: 78
Merit: 10
If we feed zipline with tick data,
handle_data (which is called several times)
will have to resample the data several times...
That's why I don't know if it's a good idea...

Is that a poem?

You are supposed to collect data in real time, which is very different from doing analysis in real time. There is very little value in doing the analysis in real time, actually.

So suppose right now you have all the data ever produced by some exchange, and some new data comes in. You append it to your existing data and, for example, every 5 minutes you update your analysis on this data. Note that there is also little value in using the entire history for something like EMAs, by the definition of an EMA. You still need to resample the data, but only every 5 minutes. And resampling is done very efficiently by pandas.
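
The point about EMAs can be made concrete: an EMA only needs its previous value and the newest close, so the whole history never has to be reprocessed. A minimal sketch, using the common smoothing factor alpha = 2 / (span + 1); the class name and sample closes are made up for the example.

```python
class IncrementalEMA:
    """Exponential moving average updated one bar at a time."""

    def __init__(self, span):
        self.alpha = 2.0 / (span + 1)
        self.value = None

    def update(self, close):
        if self.value is None:
            self.value = close   # seed with the first close
        else:
            self.value = self.alpha * close + (1 - self.alpha) * self.value
        return self.value


ema = IncrementalEMA(span=3)
first = ema.update(99.5)     # close of the first 5-minute bar
second = ema.update(101.5)   # close of the next bar
```

Each new 5-minute bar costs one multiplication and one addition, however long the stored history is.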
member
Activity: 105
Merit: 10
Thanks MtQuid

@btc_lurker
If we feed zipline with tick data,
handle_data (which is called several times)
will have to resample the data several times...
That's why I don't know if it's a good idea...
newbie
Activity: 24
Merit: 0
I've not found a good tutorial on zipline, so I have just been reading the source code.

This new notebook pulls data from bitcoincharts
http://nbviewer.ipython.org/5572250

You can use non-daily data, but the results from TradingAlgorithm.run() are daily, so you have to play around a bit at the end.
The simulation will run correctly though.

Fractional orders cannot be placed.
To fix this we will have to use MtGox order volumes, which are in satoshi.

Also, there are no buys or sells in the results when using M15, H1 etc., even though the buy or sell takes place during the simulation.

member
Activity: 78
Merit: 10
Thanks a lot for your link...

So in your mind we could feed zipline with tick data...

But I wonder if we could have different indicators with different timeframes
(M30 and H1 for example)

for example a moving average based on an M30 candlestick chart
and another indicator (RSI for example) based on an H1 candlestick chart

I didn't really read the thread, but it seems people are using pandas here. Are you aware of the resample method that is available? If you have real-time (ticker) data, you can resample it to any granularity very easily using pandas.
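
A minimal sketch of that suggestion, written against the current pandas API (resample(...).ohlc()), which differs from the how= keyword used elsewhere in this thread. The price series is made up, and the H1 bar range is only a stand-in for a second indicator such as RSI.

```python
import pandas as pd

# Hypothetical price series, one observation every 15 minutes.
index = pd.date_range("2013-05-11 00:00", periods=8, freq="15min")
prices = pd.Series([100.0, 101.0, 102.0, 101.5, 103.0, 102.5, 104.0, 103.5],
                   index=index)

# Build two candlestick timeframes from the same underlying data.
m30 = prices.resample("30min").ohlc()
h1 = prices.resample("60min").ohlc()

# One indicator per timeframe: a 2-bar moving average of M30 closes,
# and the high-low range of each H1 bar.
ma_m30 = m30["close"].rolling(window=2).mean()
range_h1 = h1["high"] - h1["low"]
```

Both indicators come from one stored series, so only the finest granularity has to be collected.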
member
Activity: 105
Merit: 10
Thanks a lot for your link...

So in your mind we could feed zipline with tick data...

But I wonder if we could have different indicators with different timeframes
(M30 and H1 for example)

for example a moving average based on an M30 candlestick chart
and another indicator (RSI for example) based on an H1 candlestick chart
newbie
Activity: 9
Merit: 0
Because MtGox provides (to my knowledge) only an API to download each trade
(and it's a very big file !!!)


I understand it's a very big file and that using 15-minute candles reduces the size of the database considerably, but having each trade can help simulate the spread much better (obviously having the order book would be ideal, but that's even more data). And with today's hard drives being so cheap, is it really a problem?

Certainly.  As it stands, I think the main problem with getting detailed trading data is that if you want to start from scratch it will take some time to pull it down from mtgox.  There is a sqlite database up to fairly recent trades here (and a python script that will attempt to pull in more recent trades):

http://cahier2.ww7.be/bitcoinmirror/phantomcircuit/

Edit: the script connects to mtgox.com rather than data.mtgox.com and should be updated in order to continue getting transactions.
newbie
Activity: 9
Merit: 0
but there is also an issue with Pandas

https://github.com/pydata/pandas/issues/783

see also this notebook

http://nbviewer.ipython.org/4982660/

it seems to be a clean way to draw candlesticks

Stumbled on that issue before.  It looks like the user who suggested a solution, even though he didn't submit a patch, has put up some of his personal charting tools:

https://github.com/dalejung/trtools

Haven't tried them yet, seems like it needs a few dependencies like tables (and consequently hdf5).
member
Activity: 105
Merit: 10
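If you want to draw a candlestick plot in your IPython notebook you can use this


Code:
import matplotlib.pyplot as plt
# matplotlib 1.x API; the finance module was later split out into the mpl_finance package
from matplotlib.finance import candlestick

fig = plt.figure()
ax = fig.add_subplot(111, ylabel='price')
# Use a simple 1..N counter as the x axis instead of real dates
Date = range(1,len(data['BTC'])+1)
Open = data['BTC']['open'].values
High = data['BTC']['high'].values
Low = data['BTC']['low'].values
Close = data['BTC']['close'].values
Volume = data['BTC']['volume'].values
# candlestick() expects tuples ordered (time, open, close, high, low, ...)
DOCHLV = list(zip(Date, Open, Close, High, Low, Volume))
candlestick(ax, DOCHLV, width=0.6, colorup='g', colordown='r', alpha=1.0)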

but it needs to be improved

but there is also an issue with Pandas

https://github.com/pydata/pandas/issues/783

see also this notebook

http://nbviewer.ipython.org/4982660/

it seems to be a clean way to draw candlesticks
member
Activity: 105
Merit: 10
My idea of building pseudo-tick prices from candlesticks can be expressed like this:

Code:
import numpy as np
# Upsample each daily candle to three 8-hour rows (the two new rows are NaN)
data['BTC'] = data['BTC'].resample('8H').mean()
data['BTC']['ModCol'] = np.mod(np.arange(0,len(data['BTC']),1),3)
# np.where returns a plain array (no .fillna), so use 0 as the fill value directly
data['BTC']['price'] = np.where(data['BTC']['ModCol']==0, data['BTC']['open'], 0) + np.where(data['BTC']['ModCol']==1, data['BTC']['low'].shift(1), 0) + np.where(data['BTC']['ModCol']==2, data['BTC']['high'].shift(2), 0)
In fact I build a TimeSeries for price like

Code:
Open
Low
High
Open
Low
High
...
(assuming that there is no gap between the close of the previous candle and the open of the current candle)

There is probably a better way to do this!
(But I'm not very good at code vectorization)

Edit:
In fact we should make a DataFrame which simulates how a candlestick
builds up over time:
Code:
Time               Open  High Low   Close
===========================================
t0               = Open  Open Open  Open
t0+timeframe/3   = Open  Open Low   Low
t0+2*timeframe/3 = Open  High Low   High
t1=t0+timeframe  = Open  High Low   Close
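
The table above can be turned into a small helper that expands one finished candle into its four intermediate states. This is a sketch under the same assumption as the table (the low is reached before the high); the function name is hypothetical and the sample values are the 2013-05-11 row from the first post.

```python
from datetime import datetime, timedelta


def expand_candle(t0, timeframe, o, h, l, c):
    """Return four (time, open, high, low, close) rows showing a candle forming."""
    return [
        (t0,                     o, o, o, o),   # candle opens
        (t0 + timeframe / 3,     o, o, l, l),   # low is reached first
        (t0 + 2 * timeframe / 3, o, h, l, h),   # then the high
        (t0 + timeframe,         o, h, l, c),   # candle closes
    ]


rows = expand_candle(datetime(2013, 5, 11), timedelta(days=1),
                     o=117.7, h=118.74, l=113.0, c=113.47)
```

The last row is the original candle, so feeding the four rows in order never shows a price the market did not see.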

I'm sorry, but I don't understand what you (MtQuid) are saying:
Quote
handle_data() is your brains and it can view all the OHLC per tick and issue order(buy/sell) with the results tracked for easy analysis of performance.
I don't think handle_data manages how candlesticks are built over time...

@hugolp
The problem with big data is not storing it... it's processing it...
legendary
Activity: 1148
Merit: 1001
Because MtGox provides (to my knowledge) only an API to download each trade
(and it's a very big file !!!)


I understand it's a very big file and that using 15-minute candles reduces the size of the database considerably, but having each trade can help simulate the spread much better (obviously having the order book would be ideal, but that's even more data). And with today's hard drives being so cheap, is it really a problem?
newbie
Activity: 9
Merit: 0
Thanks for your code.
We should output portfolio analysis (alpha, beta, Sharpe ratio, Sortino ratio ...)

I didn't find a good tutorial about zipline;
maybe you have some pointers for me?
We should also add entry / exit point efficiency.

There are some examples:
https://github.com/quantopian/zipline/tree/master/zipline/examples

But I haven't found a ton of documentation.  At the moment it feels like one has to do some code skimming and use pydoc to look at the API.  I haven't tried compiling the documentation in the repo, but the few files I looked at didn't seem to expand far beyond that.

Here's an example that extends MtQuid's notebook to try a method on the zipline mailing list:

http://nbviewer.ipython.org/ec53445ececcd94980b8

I'm not sure if those are correct or not, didn't check that the dates are actually UTC.

You were talking about trading fees...
but there is another kind of cost that is not modeled here: the spread.
Bid and ask prices for a given BTC volume are different!
The difference is called the spread = ask - bid.
Even if trading fees were 0%, you would lose money by buying and selling BTC simultaneously.

For now, we only have the price... we don't know what the spread was at a
given datetime!

Moreover, unlike the Forex market, where the spread is either fixed
or time-dependent... in the BTC market the spread is volume-dependent:
the higher the BTC volume is, the higher the spread is !!!

But that's probably only noticeable for very big BTC volumes.

Yeah, this is the thing that concerns me most about backtesting. I don't know if there are any data sources that keep book history that could be used for this purpose either.  I can think of ways to maybe get a sense of it from the data by looking for alternating jumps, but that'd be an approximation at best.

The only thing that came up in a quick search regarding order book history was this other thread which links to data from 2012:
https://bitcointalksearch.org/topic/mtgox-usd-depth-historic-data-for-your-pleasure-88054

I'd be interested in other theories on how to deal with this.  I'm thinking maybe either some estimated factor, or binning the data and going by the low, or using priced asks/bids rather than market orders?  Or, as you suggest, perhaps we could collect some data and build a model based on volume?  One could also look at trades that alternate up/down to get an idea of the spread?  Any modeling would need some test data though.
member
Activity: 105
Merit: 10
Thanks for your code.
We should output portfolio analysis (alpha, beta, Sharpe ratio, Sortino ratio, drawdown ...)

I didn't find a good tutorial about zipline;
maybe you have some pointers for me?
We should also add entry / exit point efficiency.

You were talking about trading fees...
but there is another kind of cost that is not modeled here: the spread.
Bid and ask prices for a given BTC volume are different!
The difference is called the spread = ask - bid.
Even if trading fees were 0%, you would lose money by buying and selling BTC simultaneously.

For now, we only have the price... we don't know what the spread was at a
given datetime!

Moreover, unlike the Forex market, where the spread is either fixed
or time-dependent... in the BTC market the spread is volume-dependent:
the higher the BTC volume is, the higher the spread is !!!

But that's probably only noticeable for very big BTC volumes.
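
To make "spread is volume-dependent" concrete, here is a minimal sketch that measures the spread for a given volume from one order-book snapshot: the average price to buy that volume minus the average price to sell it. The function names and the book levels are invented for the example; no real market data is used.

```python
def average_fill_price(levels, volume):
    """Average price for trading `volume` against (price, size) levels, best first.

    Returns None when the book does not hold enough volume.
    """
    remaining = volume
    cost = 0.0
    for price, size in levels:
        take = min(remaining, size)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            return cost / volume
    return None


def effective_spread(bids, asks, volume):
    """spread = ask - bid, where both sides are averaged over `volume`."""
    ask = average_fill_price(asks, volume)
    bid = average_fill_price(bids, volume)
    if ask is None or bid is None:
        return None
    return ask - bid


# Hypothetical book: bids sorted from highest price, asks from lowest.
bids = [(99.0, 1.0), (98.0, 5.0)]
asks = [(100.0, 1.0), (101.0, 5.0)]
spread_small = effective_spread(bids, asks, 1.0)   # top of the book only
spread_large = effective_spread(bids, asks, 3.0)   # eats into deeper levels
```

With only a price series, neither number can be recovered, which is why a model or recorded order-book depth is needed.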
newbie
Activity: 24
Merit: 0
You have to use a Panel because 'data' is a DataFrame dict of TimeSeries.  As far as I know a TimeSeries can only have one value for each time-stamped row, and that is why the previous notebook only passed the single ['price'] TimeSeries and not the rest.  We need multiple values/observations (open, high, low, close, volume...) per row, so we use the Panel method.  Reading the Quantopian forum and zipline commit logs, you can see that this is the chosen and agreed-upon method for passing around OHLC sets.
I just took "adjusted" from the load_bars_from_yahoo() source and used the default values, but I'll delete the code on Monday as Bitcoin has no splits or dividends.  I was in a rush to post before the roast.

You can use whatever data you want with the simulator, but you will need to turn it into a Panel if you want to be able to pass around multiple observations per tick, and also if you want the TradingAlgorithm to be able to issue orders in handle_data(), unless you build your own datasource tick generator, which might not be a bad thing.
Anyway, it is very easy now.
handle_data() is your brains and it can view all the OHLC per tick and issue order(buy/sell) with the results tracked for easy analysis of performance.
Everything is now possible.
Add bitcoincharts json files with selectable time collapse and then we are really cooking
..but that is work for Monday.

Edit:  the notebook has been updated from the latest DMA example shipped with the zipline source.
The bugs with extra values being added and graph arrows not showing have been resolved.

Edit: we still need to work MtGox fees into the analysis, but I'm doing that on a goxtool bot so I have the accurate code
member
Activity: 105
Merit: 10
Because MtGox provides (to my knowledge) only an API to download each trade
(and it's a very big file !!!)

About the latest version of MtQuid's notebook...
http://nbviewer.ipython.org/5561936

I don't understand why a Pandas Panel is used

I also don't understand the purpose of "adjusted"

I think we just need to resample the data

A very basic idea (to test a long strategy) could be to send prices as follows

OPEN_dt0
LOW_dt0
HIGH_dt0
CLOSE_dt0
OPEN_dt1
LOW_dt1
HIGH_dt1
CLOSE_dt1
...

This lets us consider the worst case:
if we set a stop loss and a take profit,
then in the simulator the price will first move towards the stop loss and only afterwards towards the take profit
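
A minimal sketch of that worst-case ordering for a long position: each candle is replayed as open, low, high, close, and the first level touched decides the exit. The function names and the stop-loss / take-profit levels are hypothetical; the candle is the 2013-05-07 row from the first post.

```python
def worst_case_path(o, h, l, c):
    """Price order inside one candle that is least favourable to a long position."""
    return [o, l, h, c]


def first_exit(path, stop_loss, take_profit):
    """Return ('stop_loss' | 'take_profit' | None, price) for the first level hit."""
    for price in path:
        if price <= stop_loss:
            return "stop_loss", price
        if price >= take_profit:
            return "take_profit", price
    return None, None


# This candle touches both levels; the worst-case order says the stop loss came first.
path = worst_case_path(o=112.25, h=114.0, l=97.52, c=109.60013)
exit_reason, exit_price = first_exit(path, stop_loss=105.0, take_profit=113.5)
```

Using only the close (109.60013) would have reported no exit at all for this candle.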

legendary
Activity: 1148
Merit: 1001
Almost there....
I also agree that we need to use a better data source and that should probably be bitcoincharts.

Why not mtgox itself?
newbie
Activity: 24
Merit: 0
Yeah, I'm also wondering about not seeing the buy/sell markers (^ and v)
I think this ties in with me having to add extra values to those two series.
I finished that stuff off drunk last night... but the charts show it would have made a profit :P

I've updated the notebook now to use OHLC and it works.
I took the code from load_bars_from_yahoo() so we can use stuff like data['BTC']['open'] within the handler.
Works well.

Almost there....
I also agree that we need to use a better data source and that should probably be bitcoincharts.

Bots just puke up machine language.
Time to talk to some humans down the pub.  Sunday Roast!!! :)
member
Activity: 105
Merit: 10
Hello,

I'm starting a new thread here about zipline / Quantopian.
It's an event-driven Python trading framework that can be used
for backtesting strategies.

https://bitcointalksearch.org/topic/m.2105722
http://vimeo.com/53064082

If you want to try it, you should run ipython with pylab inline
Code:
ipython notebook --pylab inline

MtQuid posted an IPython notebook here
https://bitcointalksearch.org/topic/m.2116508
http://nbviewer.ipython.org/5561936

I'm posting here to avoid overloading the goxtool thread
(ncurses Python software to trade BTC with MtGox)


I have some questions... about zipline...

First, I noticed that the data (daily MtGox BTC/USD data) comes from
http://www.quandl.com/api/v1/datasets/BITCOIN/MTGOXUSD.csv?trim_start=2012-01-01&sort_order=asc
( http://www.quandl.com/BITCOIN-Bitcoin-Charts/MTGOXUSD-Bitcoin-Markets-mtgoxUSD )
raw data from http://bitcoincharts.com/charts/chart.json?m=mtgoxUSD
 
 
Code:
                              open       high     low      close         volume       volume_usd       price
Date                                                                                                          
2013-05-11 00:00:00+00:00  117.70000  118.74000  113.00  113.47000   25532.277740   2952016.798507  115.619015
2013-05-10 00:00:00+00:00  112.79900  122.50000  111.54  117.70000   77443.672681   9140709.083964  118.030418
2013-05-09 00:00:00+00:00  113.20000  113.71852  108.80  112.79900   26894.458204   3003068.410660  111.661235
2013-05-08 00:00:00+00:00  109.60013  116.77700  109.50  113.20000   61680.324704   6990518.957611  113.334665
2013-05-07 00:00:00+00:00  112.25000  114.00000   97.52  109.60013  139626.724860  14898971.673747  106.705731

DatetimeIndex: 497 entries, 2013-05-11 00:00:00+00:00 to 2012-01-01 00:00:00+00:00
Data columns:
open          497  non-null values
high          497  non-null values
low           497  non-null values
close         497  non-null values
volume        497  non-null values
volume_usd    497  non-null values
price         497  non-null values
dtypes: float64(7)
                              open    high      low    close         volume      volume_usd     price
Date                                                                                                
2012-01-05 00:00:00+00:00  5.57383  7.2200  5.57401  6.94760  182328.193876  1130623.294233  6.201034
2012-01-04 00:00:00+00:00  4.88080  5.7000  4.75100  5.57383  131170.856663   688717.856619  5.250540
2012-01-03 00:00:00+00:00  5.21678  5.2900  4.65000  4.88080  125170.253872   619170.541604  4.946627
2012-01-02 00:00:00+00:00  5.26766  5.4700  4.80000  5.21678   69150.931963   360357.284302  5.211170
2012-01-01 00:00:00+00:00  4.72202  5.4999  4.61500  5.26766  108509.229901   553045.139811  5.096757

Note: in fact the data needs to be sorted by ascending index;
otherwise you will get this error message
Code:
AssertionError: Period start falls after period end.
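
A minimal sketch of that sort, using three rows copied from the table above as inline CSV instead of downloading the file. The lowercase column names follow the printed table, not necessarily the header of the real CSV, so treat them as an assumption.

```python
import io

import pandas as pd

# Newest row first, as in the table above.
csv_text = """Date,open,high,low,close
2013-05-11,117.7,118.74,113.0,113.47
2013-05-10,112.799,122.5,111.54,117.7
2013-05-09,113.2,113.71852,108.8,112.799
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
df = df.sort_index()   # oldest row first, so the period start precedes the period end
```

After sort_index() the first row is 2013-05-09 and the last is 2013-05-11.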

I wonder what "weighted price" is... (renamed price)

This notebook seems to use this "weighted price" to simulate a kind of tick data
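
Judging only from the sample rows printed above, the "price" column looks like a volume-weighted average price, i.e. volume_usd divided by volume. A quick check on two of those rows (this is an inference from the numbers, not something the data source is quoted as stating):

```python
# (volume, volume_usd, price) copied from the 2013-05-11 and 2013-05-10 rows above.
rows = [
    (25532.277740, 2952016.798507, 115.619015),
    (77443.672681, 9140709.083964, 118.030418),
]

# Difference between volume_usd / volume and the reported price, per row.
differences = [abs(volume_usd / volume - price)
               for volume, volume_usd, price in rows]
```

Both differences are far below one cent, which is consistent with price = volume_usd / volume.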

In my mind it would be much better to simulate each price that has been seen on the market (open, high, low, close),
because if you are going long and you set a Stop Loss, it will probably be hit by the low price
(or if you are going short, it will probably be hit by the high price).

Second,
I have some problems running the notebook (I always get a (*) ),
so I'm running the code without the notebook
http://pastebin.com/jmfuNTKs

Third,
I wonder why I don't see buy/sell (^ and v)

Fourth,
what about day trading !!!
(with M15 timeframe !)
some data are here
https://bitcointalksearch.org/topic/crypto-currencies-historical-data-199979
or https://bitcointalksearch.org/topic/data-btcusd-data-in-csv-format-various-timeframes-196834
unfortunately I'm quite busy today ;-(

Kind regards