
Topic: zipline / Quantopian - backtesting / trading framework (Read 26825 times)

member
Activity: 105
Merit: 10
The zipline / Quantopian developers seem to be interested in this thread.

You can also share your experiences at
https://groups.google.com/forum/#!topic/zipline/M39VhqDRORM
member
Activity: 105
Merit: 10
legendary
Activity: 1008
Merit: 1002
Quote
Could you provide such data?

Unfortunately, I've only been collecting it for the last 6 days. I had been collecting for a month last year, but there is a big gap between that and now, so I don't have anything worthwhile to give you :|

The best thing to do is to start collecting it now; then, if you decide you need it, you've got it :)
member
Activity: 105
Merit: 10
Could you provide such data?
legendary
Activity: 1008
Merit: 1002
Quote
Don't worry... I don't consider you to be disparaging us...
A volume-dependent model for the spread could help...
Maybe you can help us get one (using historical order book depth)

I'm not sure how you could approximate the order book well enough with just two values. It works fine for forex symbols because of the huge liquidity, but bitcoin is a different kettle of fish.

Like I say, I just store the whole order book every 10 seconds, though for total accuracy it really needs to be tick by tick :)
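One way to turn stored order-book snapshots into the volume-dependent spread model mentioned above: for a trade of size V, take the average price paid buying V from the asks minus the average price received selling V into the bids. A minimal sketch, assuming each side of a snapshot is a list of (price, volume) pairs sorted best price first; the function names and the example book are hypothetical.

```python
def vwap_for_size(levels, size):
    """Average fill price for taking `size` from one side of the book.

    levels: list of (price, volume) pairs, best price first.
    Returns None if the visible depth cannot fill the whole size.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    remaining = size
    cost = 0.0
    for price, volume in levels:
        take = min(remaining, volume)
        cost += take * price
        remaining -= take
        if remaining <= 1e-12:  # small tolerance for float rounding
            return cost / size
    return None


def effective_spread(bids, asks, size):
    """Buy VWAP minus sell VWAP for trading `size` at once, or None if too thin."""
    buy_price = vwap_for_size(asks, size)   # walk up the asks
    sell_price = vwap_for_size(bids, size)  # walk down the bids
    if buy_price is None or sell_price is None:
        return None
    return buy_price - sell_price


# Example snapshot: the spread widens with size, then depth runs out
bids = [(99.0, 1.0), (98.0, 2.0)]
asks = [(100.0, 0.5), (101.0, 1.0)]
for size in (0.5, 1.5, 3.0):
    print(size, effective_spread(bids, asks, size))
```

Computing this for a few fixed sizes per snapshot could give a compact spread-versus-volume table, at the cost of losing the rest of the book.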
member
Activity: 105
Merit: 10
Don't worry... I don't consider you to be disparaging us...
A volume-dependent model for the spread could help...
Maybe you can help us get one (using historical order book depth)
legendary
Activity: 1008
Merit: 1002
I don't want to be disparaging, but I found that using just OHLCV gave very misleading results for bitcoin when testing algorithms.

The reason is liquidity. The top-of-book bid/ask prices often don't have enough volume behind them to be tradable: in real life you'd get a partial fill on your orders, or no fill at all.

You really need the entire order book, which is what I ended up capturing. It accurately captures the liquidity of the market over the tested period. It requires a lot of data, but storage is cheap.
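To illustrate the partial-fill problem: against a stored order-book snapshot, a buy limit order can only take the volume sitting at or below its limit price. A sketch, assuming snapshots are kept as sorted (price, volume) lists (a hypothetical storage format), ignoring fees, and assuming the book does not refill while the order executes.

```python
def fill_limit_buy(asks, limit_price, size):
    """Immediate fill of a limit buy against one order-book snapshot.

    asks: list of (price, volume) pairs sorted by ascending price.
    Returns (filled, avg_price, unfilled). Only levels priced at or below
    limit_price can trade, so a thin book gives a partial fill or none.
    """
    filled = 0.0
    cost = 0.0
    for price, volume in asks:
        if price > limit_price or filled >= size:
            break
        take = min(size - filled, volume)
        filled += take
        cost += take * price
    avg_price = cost / filled if filled > 0 else None
    return filled, avg_price, size - filled


asks = [(100.0, 0.3), (100.5, 0.2), (102.0, 5.0)]
print(fill_limit_buy(asks, 100.5, 1.0))  # only part of the order can fill
print(fill_limit_buy(asks, 99.0, 1.0))   # limit below the best ask: no fill
```

With OHLCV alone, a backtest would usually assume the whole 1.0 BTC filled at the quoted price in both cases.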

:)

Cheers, Paul.
member
Activity: 105
Merit: 10
Quote
Nice work... if only we could go back in time

zipline uses Delorean...
http://delorean.readthedocs.org/en/latest/quickstart.html
maybe it could help ;)

+1 for splitting the work into 2 parts:
a data producer (which stores OHLC values in a database)
a data consumer (which reads the data, shows prices and applies the strategy)

but in that case I wonder how you can
inform the data consumer that new data has just come in...

but maybe in your idea the data consumer only reacts every 5 minutes...

that's quite different from the Metatrader start function, which is launched
on every tick...

but when I say that "the Metatrader start function is launched on every tick..."
in fact I think the start function is launched on every tick **if that's possible**
(i.e. if the previous start call has finished).
If the previous start call has not finished, start is not executed again,
even if a new tick comes in
(Metatrader shows the new price from the last tick in the GUI,
but the expert advisor's start function is not executed again).

So I think a kind of signal/slot mechanism is needed, but we also need a kind of "lock"
mechanism.

I hope you understand what I mean...
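The Metatrader behaviour described above (skip start() while the previous call is still running, but keep updating the latest price) can be sketched in Python with a non-blocking lock. This is not zipline or Metatrader code: `TickDispatcher`, `on_tick` and `wait` are hypothetical names, and the handler runs on a worker thread so the data producer is never blocked.

```python
import threading


class TickDispatcher:
    """Run `handler(price)` on a worker thread for each tick, unless the
    previous call is still running; in that case the tick is skipped."""

    def __init__(self, handler):
        self.handler = handler
        self._busy = threading.Lock()
        self._worker = None
        self.last_price = None  # always updated, like the price shown in the GUI
        self.skipped = 0

    def on_tick(self, price):
        """Called by the data producer; returns True if the handler was started."""
        self.last_price = price
        # acquire(blocking=False) returns False at once if a call is running
        if not self._busy.acquire(blocking=False):
            self.skipped += 1
            return False
        self._worker = threading.Thread(target=self._run, args=(price,))
        self._worker.start()
        return True

    def _run(self, price):
        try:
            self.handler(price)
        finally:
            self._busy.release()  # a plain Lock may be released by another thread

    def wait(self):
        """Block until the current handler call (if any) has finished."""
        if self._worker is not None:
            self._worker.join()
```

The "lock" is the `acquire(blocking=False)` call: a new tick never queues up behind a slow handler, it only updates `last_price`. If no tick may be lost, a `queue.Queue` between producer and consumer is the usual alternative.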
newbie
Activity: 24
Merit: 0
Quote
New feature:
variable lot size (instead of fixed lot size)
self.trade_volume is now a function that returns the volume to trade according to
portfolio cash and the BTCUSD price.

http://nbviewer.ipython.org/587a80f5e2eb9cf41d6d

alpha = 138.7%
max DD = 7.5%

should I order the Rolls-Royce? :D

Nice work... if only we could go back in time :)
I think there were some missing rows with the method I used to pull from bitcoincharts, so I've made some modifications and added a cache that stays up to date.
I've also included frequency information, but 'minute' runs still don't work correctly. We might have to tamper with zipline internals to get them going.
You should update your work to at least use the better bitcoincharts methods.

Now in my testing I keep an up to date cache of minute data from bitcoincharts and resample that to how I want it.

Fractional trades are still an issue.
One fix would be to use satoshi, but then all prices would have to be suitably scaled and the results would look a mess.
Or bump the initial portfolio value up to a few billion and increase the trade volume so that there are no fractions, but then slippage is very unrealistic.
Or modify zipline internals, or use an inherited object.
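For the satoshi option, amounts can be kept as integers (1 BTC = 100,000,000 satoshi) so order sizes are never fractional. A sketch using the `decimal` module to avoid float rounding; `btc_to_satoshi` and `satoshi_to_btc` are hypothetical helper names, and prices would still need rescaling as described above.

```python
from decimal import Decimal, ROUND_DOWN

SATOSHI_PER_BTC = 100_000_000


def btc_to_satoshi(btc):
    """Convert a BTC amount (str, int, float or Decimal) to whole satoshi, rounding down."""
    # go through str() so that a float like 0.3 is read as "0.3",
    # not as its binary approximation 0.29999999999999998...
    amount = Decimal(str(btc)) * SATOSHI_PER_BTC
    return int(amount.to_integral_value(rounding=ROUND_DOWN))


def satoshi_to_btc(sat):
    """Convert an integer number of satoshi back to an exact Decimal BTC amount."""
    return Decimal(sat) / SATOSHI_PER_BTC


print(btc_to_satoshi("0.3"), btc_to_satoshi(1.234567891), satoshi_to_btc(150000000))
```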

About passing live trade data to an algorithm:
I'm thinking of creating a new TradingAlgorithm object that is passed a DatetimeIndex; it would take trade data from the phantom sqlite3 db, create OHLC, and pass both simulated live data and OHLC to an algorithm under test. So the wrapper gets a datetime and frequency from zipline, extracts the data from the phantom db, passes it to the algo under test, and then passes the results back to zipline.
Just an idea, but this might also be a way to fix the fractional trades issue.
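The extraction step of that wrapper idea (trades from a sqlite3 db turned into one OHLC bar for a given time window) could look like the sketch below. The schema is an assumption: a hypothetical `trades(timestamp INTEGER, price REAL, volume REAL)` table with Unix timestamps; the real phantom db layout may differ, and `ohlcv_from_db` is a hypothetical name.

```python
import sqlite3


def ohlcv_from_db(conn, start, end):
    """Build one OHLCV bar from trades with start <= timestamp < end.

    Assumes a hypothetical table trades(timestamp INTEGER, price REAL, volume REAL).
    Returns None when no trade happened in the window.
    """
    rows = conn.execute(
        "SELECT price, volume FROM trades "
        "WHERE timestamp >= ? AND timestamp < ? "
        "ORDER BY timestamp, rowid",  # rowid keeps insertion order for equal timestamps
        (start, end),
    ).fetchall()
    if not rows:
        return None
    prices = [price for price, _ in rows]
    return {
        "open": prices[0],
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],
        "volume": sum(volume for _, volume in rows),
    }
```

A wrapper would call this once per bar, e.g. `ohlcv_from_db(conn, t, t + 60)` for minute bars, and could also hand the raw `rows` to the algorithm as simulated live data.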

Updated - http://nbviewer.ipython.org/5572250


Edit: In fact, even if the zipline results records are always daily, it does not matter, because we can put minute results into something else or accumulate them for a day. It depends on what you want to log in the results. Currently, I guess, the same record gets overwritten once per minute during a day, so only the last record of the day is saved in the results.
It also seems like there is a bug in my example, because the trade volumes are messed up and I've lost a lot of money :/

Edit: All working now. I was not using OHLC resampling.
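Resampling minute bars into coarser bars needs a different rule per column: taking the mean or the last value of every column gives wrong highs, lows and volumes. A sketch with pandas, assuming the minute cache is a DataFrame with a DatetimeIndex and `open`/`high`/`low`/`close`/`volume` columns (those column names, and `resample_ohlcv`, are assumptions).

```python
import pandas as pd

# Aggregation rule per column when turning fine bars into coarser ones
OHLCV_RULES = {
    "open": "first",
    "high": "max",
    "low": "min",
    "close": "last",
    "volume": "sum",
}


def resample_ohlcv(minute_bars, rule="15min"):
    """Aggregate minute OHLCV bars (DatetimeIndex) into `rule`-sized bars."""
    bars = minute_bars.resample(rule).agg(OHLCV_RULES)
    # periods without any trades come back with NaN prices; drop them
    return bars.dropna(subset=["open"])
```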
member
Activity: 105
Merit: 10
New feature:
variable lot size (instead of fixed lot size)
self.trade_volume is now a function that returns the volume to trade according to
portfolio cash and the BTCUSD price.

http://nbviewer.ipython.org/587a80f5e2eb9cf41d6d

alpha = 138.7%
max DD = 7.5%

should I order the Rolls-Royce? :D
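A variable lot size in the spirit of `self.trade_volume` might look like the function below. It is not the code from the notebook, just a sketch: `trade_volume` here is a plain function, `fraction` (hypothetical) keeps some cash back for commission, and `lot_step` (hypothetical) rounds down to a tradable size, 1.0 by default since fractional orders were a problem in zipline according to this thread.

```python
import math


def trade_volume(cash, price, fraction=0.95, lot_step=1.0):
    """Volume to buy with `fraction` of available cash at `price`,
    rounded down to a multiple of `lot_step`."""
    if cash <= 0 or price <= 0:
        return 0.0
    raw = cash * fraction / price
    # tiny epsilon so that e.g. 0.29 / 0.01 = 28.999999999999996 still floors to 29
    return math.floor(raw / lot_step + 1e-9) * lot_step


print(trade_volume(1000.0, 120.0))                 # whole units only
print(trade_volume(1000.0, 120.0, lot_step=0.01))  # if 0.01 BTC lots were allowed
```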
member
Activity: 105
Merit: 10
Quote
Am I correct to assume it should be
Code:
size=self.trade_volume

No, it's
Code:
size=self.invested
(because of commission fees)
That's fixed in my gist now...
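For reference, `str.format` ignores keyword arguments that the template does not use, but every argument expression is evaluated first, so passing `size=size` fails with NameError whenever no `size` variable exists. Since the message never prints `{size}`, dropping that argument also fixes the error. A sketch as a standalone function (`breakeven_message` is a hypothetical name; the template is the one from the error report, with "BreakEvent" corrected).

```python
def breakeven_message(dt, price_sl, price_be_offset):
    """Log line for the BreakEven rule, without the unused `size` argument."""
    return "{dt}: hit BreakEven - moving Stop Loss from {SL1} to {SL2}".format(
        dt=dt, SL1=price_sl, SL2=price_be_offset
    )


print(breakeven_message("2013-05-01 12:00", 100.0, 101.5))
```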

If you have some coding ability, could you try to implement this?
http://www.onestepremoved.com/backtesting-efficiency/
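A starting point for trade efficiency, assuming the commonly used definitions (which may differ in detail from the linked article): with H and L the highest and lowest prices while the trade was open, a long trade has entry efficiency (H - entry)/(H - L), exit efficiency (exit - L)/(H - L), and total efficiency (exit - entry)/(H - L), which equals entry + exit - 1. `trade_efficiency` is a hypothetical name.

```python
def trade_efficiency(entry_price, exit_price, high, low, side="long"):
    """Return (entry_eff, exit_eff, total_eff) for one trade.

    high/low: highest and lowest price reached while the trade was open.
    """
    price_range = high - low
    if price_range <= 0:
        raise ValueError("need high > low")
    if side == "long":
        entry_eff = (high - entry_price) / price_range  # bought near the low?
        exit_eff = (exit_price - low) / price_range     # sold near the high?
        total_eff = (exit_price - entry_price) / price_range
    elif side == "short":
        entry_eff = (entry_price - low) / price_range   # sold near the high?
        exit_eff = (high - exit_price) / price_range    # covered near the low?
        total_eff = (entry_price - exit_price) / price_range
    else:
        raise ValueError("side must be 'long' or 'short'")
    return entry_eff, exit_eff, total_eff


print(trade_efficiency(102.0, 108.0, high=110.0, low=100.0))
```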

Quote
Really fascinating stuff, playing with these examples to try and get up to speed.

Overfitting a backtest is really easy... but it does not reflect future results
https://www.google.fr/search?q=backtest+overfitting
newbie
Activity: 7
Merit: 0
Really fascinating stuff, playing with these examples to try and get up to speed.


c0inbuster, I get an error:
Code:
    print "{dt}: hit BreakEvent - moving Stop Loss from {SL1} to {SL2}".format(size=size, dt=data['BTC'].datetime, SL1=self.price_SL, SL2=self.price_BE_offset)
NameError: global name 'size' is not defined

Am I correct to assume it should be
Code:
size=self.trade_volume
?
member
Activity: 105
Merit: 10
A new poem for you guys...

http://nbviewer.ipython.org/587a80f5e2eb9cf41d6d

Some features:
 - Stop Loss
 - Take Profit
 - Trailing Stop
 - BreakEven
(set as a percentage of price)
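The four exit rules above, with levels set as percentages of price, can be sketched for a single long position like this. All percentages are hypothetical defaults, not the values from the notebook, and `LongStops` is a hypothetical name; each new price is first checked against the current stop and target, then the stop is moved (it never moves down).

```python
from dataclasses import dataclass


@dataclass
class LongStops:
    entry: float
    sl_pct: float = 0.02           # Stop Loss distance below entry
    tp_pct: float = 0.04           # Take Profit distance above entry
    trail_pct: float = 0.015       # Trailing Stop distance below price
    be_trigger_pct: float = 0.01   # gain that triggers BreakEven
    be_offset_pct: float = 0.001   # BreakEven stop sits this far above entry

    def __post_init__(self):
        self.stop = self.entry * (1 - self.sl_pct)
        self.take_profit = self.entry * (1 + self.tp_pct)

    def update(self, price):
        """Return 'SL' or 'TP' if this price closes the trade, else None."""
        if price <= self.stop:
            return "SL"
        if price >= self.take_profit:
            return "TP"
        # BreakEven: once price is far enough in our favour, lift stop above entry
        if price >= self.entry * (1 + self.be_trigger_pct):
            self.stop = max(self.stop, self.entry * (1 + self.be_offset_pct))
        # Trailing Stop: follow price at trail_pct distance, never lower the stop
        self.stop = max(self.stop, price * (1 - self.trail_pct))
        return None


trade = LongStops(entry=100.0)
for p in (101.5, 103.0, 101.0):
    print(p, trade.update(p), round(trade.stop, 3))
```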

Max drawdown is now 1.9%
alpha: 23.52%

previous values were:
alpha: 16.34%
max_drawdown: 12.55%

But I'm sure that our backtesting is inaccurate...
because we only use price (and never low_price)
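Checking stops against each bar's high and low, instead of a single price, removes part of that inaccuracy. A sketch for a long position, assuming bars are dicts with `open`/`high`/`low`/`close` keys (hypothetical format). When both levels lie inside one bar, OHLC cannot tell which was hit first, so the pessimistic choice (stop first) is assumed; a gap through the stop fills at the open.

```python
def check_exit_long(bar, stop, take_profit):
    """Return (reason, fill_price) for a long position, or (None, None)."""
    if bar["low"] <= stop:
        # also covers bars touching both levels: assume the stop came first
        return "SL", min(stop, bar["open"])
    if bar["high"] >= take_profit:
        return "TP", max(take_profit, bar["open"])
    return None, None


bar = {"open": 100.0, "high": 103.0, "low": 97.0, "close": 102.0}
print(check_exit_long(bar, stop=98.0, take_profit=105.0))  # close alone would miss this
```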

ToDo:
variable lot size (depending on portfolio value) to keep the same risk for each trade
output data such as trade entry efficiency / exit efficiency
sub-daily timeframe trading (M15, M30, H1...)
(ask the zipline team?)
add a real-time feed (goxtool or btcx code could help)
use a file to store Stop Loss/Take Profit values
(we should also consider that several positions could be open...
that's not the case in this strategy, but it could be in a more sophisticated one)
so we need to store a trade identifier (trade number)
we should also add a MagicNumber to identify that a given trade was opened
by a given strategy.
http://www.onestepremoved.com/magic-number/

use scipy optimize to optimize parameters.

divide data into 2 parts
 - data for optimizing parameters
 - data for testing parameters
in order to ensure that settings are robust

walk-forward analysis...
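The "divide data into 2 parts" and "walk-forward analysis" items can be sketched together: fit parameters on a training window, score them on the window that follows, then slide both forward. This sketch uses a plain grid search in place of scipy.optimize (a deliberate swap, since parameters such as moving-average periods are integers, which scipy's usual continuous minimizers don't handle directly). `walk_forward_windows`, `walk_forward` and the `score(data, param)` callable are hypothetical.

```python
def walk_forward_windows(n, train_size, test_size, step=None):
    """Yield (train, test) index ranges over n time-ordered bars.

    Each test window directly follows its training window, and the
    pair slides forward by `step` bars (default: test_size).
    """
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n:
        split = start + train_size
        yield range(start, split), range(split, split + test_size)
        start += step


def walk_forward(data, param_grid, score, train_size, test_size):
    """Pick the best parameter on each training window, then record how
    it scores on the unseen test window that follows."""
    results = []
    for train, test in walk_forward_windows(len(data), train_size, test_size):
        train_data = data[train.start:train.stop]
        test_data = data[test.start:test.stop]
        best = max(param_grid, key=lambda p: score(train_data, p))
        results.append((best, score(test_data, best)))
    return results
```

Out-of-sample scores that are consistently much worse than the in-sample ones are the overfitting symptom mentioned earlier in the thread.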
member
Activity: 105
Merit: 10
Quote
Is that a poem?

English is not my mother tongue, which is probably why I can't write
as well as you would expect!

I have several problems:

I don't see anyone here feeding zipline with tick data and computing an EMA on candlestick data
inside handle_data
(that would let us calculate EMAs on several timeframes).

I also don't know whether zipline will support non-daily timeframes, or at least whether they plan
to support such a feature.

I don't know how we could feed zipline with real-time data.

Why not run the analysis each time a tick is received?
This is exactly what Metatrader does in the start() function
http://book.mql4.com/programm/special

@knowitnothing
I will have a look at your code btcx
member
Activity: 78
Merit: 10
Quote
if we feed zipline with tick data,
handle_data (which is called many times)
will have to resample the data on every call...
that's why I don't know if it's a good idea...

Is that a poem?

You are supposed to collect data in real time, which is very different from doing analysis in real time. There is actually very little value in doing the analysis in real time.

So suppose that right now you have all the data ever produced by some exchange, and some new data comes in. You append it to your existing data and, for example, update your analysis every 5 minutes. Note that there is also little value in using the entire history for something like an EMA, by definition of the EMA: old data gets an exponentially shrinking weight. You still need to resample the data, but only every 5 minutes, and pandas resamples very efficiently.
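The point about EMAs can be made concrete: an EMA only needs its previous value and the newest bar, so nothing has to be recomputed over the full history. A sketch of an incremental update, assuming alpha = 2/(span+1) and seeding with the first value (the same convention as pandas `ewm(span=..., adjust=False)`); `IncrementalEMA` is a hypothetical name.

```python
class IncrementalEMA:
    """EMA updated one bar at a time: ema = alpha * x + (1 - alpha) * ema_prev."""

    def __init__(self, span):
        self.alpha = 2.0 / (span + 1)
        self.value = None

    def update(self, x):
        if self.value is None:
            self.value = float(x)  # seed with the first observation
        else:
            self.value = self.alpha * x + (1 - self.alpha) * self.value
        return self.value


ema = IncrementalEMA(span=3)  # alpha = 0.5
print([ema.update(close) for close in (10.0, 20.0, 30.0)])
```

Every 5 minutes, only the newly completed bars need to be fed to `update`.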
member
Activity: 105
Merit: 10
Thanks MtGuid

@btc_lurker
if we feed zipline with tick data,
handle_data (which is called many times)
will have to resample the data on every call...
that's why I don't know if it's a good idea...
newbie
Activity: 24
Merit: 0
I haven't found a good tutorial on zipline, so I've just been reading the source code.

This new notebook pulls data from bitcoincharts:
http://nbviewer.ipython.org/5572250

You can use non-daily data, but the results from TradingAlgorithm.run() are daily, so you have to play around a bit at the end.
The simulation will run correctly, though.

You cannot place fractional orders.
To fix this, we will have to use MtGox order volumes, which are in satoshi.

And there are no buys or sells in the results when using M15, H1, etc., even though the buys and sells do take place during the simulation.

member
Activity: 78
Merit: 10
Quote
Thanks a lot for your link...

So in your mind we could feed zipline with tick data...

But I wonder if we could have different indicators on different timeframes
(M30 and H1 for example)

for example a moving average based on an M30 candlestick chart
and another indicator (RSI for example) based on an H1 candlestick chart

I didn't really read the thread, but it seems people are using pandas here. Are you aware of the resample method that is available? If you have real-time (ticker) data, you can very easily resample it to any granularity using pandas.
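The M30 moving average plus H1 RSI example can be built from one tick series with pandas `resample`. A sketch, assuming ticks are a price Series indexed by timestamp; the SMA period is an arbitrary default, the RSI uses Wilder-style smoothing via `ewm(alpha=1/period, adjust=False)`, and `multi_timeframe_indicators` is a hypothetical name.

```python
import pandas as pd


def multi_timeframe_indicators(ticks, sma_period=3, rsi_period=14):
    """ticks: pd.Series of trade prices with a DatetimeIndex.

    Returns (m30, h1) OHLC DataFrames with an 'sma' and an 'rsi' column.
    """
    m30 = ticks.resample("30min").ohlc()
    h1 = ticks.resample("60min").ohlc()

    # Simple moving average of M30 closes
    m30["sma"] = m30["close"].rolling(sma_period).mean()

    # RSI on H1 closes
    delta = h1["close"].diff()
    gain = delta.clip(lower=0)
    loss = (-delta).clip(lower=0)
    avg_gain = gain.ewm(alpha=1.0 / rsi_period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1.0 / rsi_period, adjust=False).mean()
    h1["rsi"] = 100 - 100 / (1 + avg_gain / avg_loss)
    return m30, h1
```

Two caveats: periods without ticks come back as NaN bars, and pandas labels these bars with their start time, so an H1 value labelled 10:00 is only known at 11:00. Shift it by one bar before combining it with M30 signals to avoid look-ahead.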
member
Activity: 105
Merit: 10
Thanks a lot for your link...

So in your mind we could feed zipline with tick data...

But I wonder if we could have different indicators on different timeframes
(M30 and H1 for example)

for example a moving average based on an M30 candlestick chart
and another indicator (RSI for example) based on an H1 candlestick chart
newbie
Activity: 9
Merit: 0
Because MtGox only provides (to my knowledge) an API to download each trade
(and it's a very big file!!!)

I understand it's a very big file and that using 15-minute candles reduces the size of the database considerably, but having each trade can help simulate the spread much better (obviously having the order book would be ideal, but that's even more data). And with hard drives being so cheap today, is it really a problem?

Certainly. As it stands, I think the main problem with getting detailed trading data is that if you want to start from scratch, it will take some time to pull it down from mtgox. There is a sqlite database with fairly recent trades here (and a python script that will attempt to pull in more recent trades):

http://cahier2.ww7.be/bitcoinmirror/phantomcircuit/

Edit: the script connects to mtgox.com rather than data.mtgox.com and should be updated in order to keep getting transactions.