
Topic: New Trading Engine reaches 300k TPS in first tests (Read 2863 times)

newbie
Activity: 11
Merit: 0
Vladimir, when are you launching the exchange?

Here is how the LMAX exchange measures latency: http://www.lmax.com/popup-how-we-measure-trade-latency. What are your numbers with an average MtGox-sized order book?
legendary
Activity: 980
Merit: 1008
Well, I can't really explain that. Mt. Gox's own REST API says only 23 trades happened in that second.

I had my Websocket script running as well, and its log also shows 23 trades in that second:

Code:
15:04:52 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.0257 0.01 (110.39)
15:04:53 110.02067 0.0201 (110.39)
15:04:53 110.0 3.3287 (110.38)
15:04:55 110.0 0.01 (110.38)
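
For what it's worth, a few lines like these are enough to tally the log by timestamp (a sketch; "trades.log" is a hypothetical file holding the lines above):

Code:
# Count trades per second in a log whose lines look like
# "15:04:53 110.0257 0.01 (110.39)": the first field is the timestamp.
from collections import Counter

with open("trades.log") as f:
    per_second = Counter(line.split()[0] for line in f if line.strip())

print(per_second.most_common())
# e.g. [('15:04:53', 23), ('15:04:52', 1), ('15:04:55', 1)]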
legendary
Activity: 1008
Merit: 1007
What time zone is that?

Also, I just realized I'm using the BTCUSD feed. So I'm not counting trades in currencies other than USD. I'll see if I can correct that.

EDIT: Which feed are you using? The feed

I was just watching http://bitcoin.clarkmoody.com/ when I saw those trades appear on the right. Time zone is GMT, and it's USD.

Cheers, Paul.
legendary
Activity: 980
Merit: 1008
What time zone is that?

Also, I just realized I'm using the BTCUSD feed. So I'm not counting trades in currencies other than USD. I'll see if I can correct that.

EDIT: Which feed are you using? The feed

https://data.mtgox.com/api/1/BTCUSD/trades?since=1371042292857025

Gives me this for 13:04:53 UTC ("date" = 1371042293):

Code:
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293021786)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293053245)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293085150)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293115210)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293144788)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293173219)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293201979)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293235885)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293264817)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293292672)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293321157)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293349655)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293377220)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293406493)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293434023)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293461087)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293488903)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293516575)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293542567)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293571294)
2013-06-12 13:04:53 110.0257 0.01 (tid: 1371042293597826)
2013-06-12 13:04:53 110.02067 0.0201 (tid: 1371042293652055)
2013-06-12 13:04:53 110 3.3287 (tid: 1371042293687396)
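
Incidentally, reproducing that output takes only a few lines. A sketch, assuming (from memory; the shape isn't shown above) that the v1 API wraps the trades in a "return" list, with each trade carrying the "date", "price", "amount" and "tid" fields the output suggests:

Code:
# Fetch trades from the endpoint quoted above and print them in the same
# format. The {"return": [...]} envelope is an assumption about API v1.
import json
import urllib.request
from datetime import datetime, timezone

URL = "https://data.mtgox.com/api/1/BTCUSD/trades?since=1371042292857025"

with urllib.request.urlopen(URL) as resp:
    trades = json.load(resp)["return"]

for t in trades:
    ts = datetime.fromtimestamp(int(t["date"]), tz=timezone.utc)
    print(f"{ts:%Y-%m-%d %H:%M:%S} {t['price']} {t['amount']} (tid: {t['tid']})")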
legendary
Activity: 1008
Merit: 1007
Although I just saw this:

Code:
14:04:53 110.00000 3.3287
14:04:53 110.02067 0.0201
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.00000 3.3287
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.00000 3.3287
14:04:53 110.02570 0.0100
14:04:53 110.02067 0.0201
14:04:53 110.02067 0.0201
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100
14:04:53 110.02570 0.0100

Which is 67 trades / sec...
legendary
Activity: 1008
Merit: 1007
I just ran my script again to analyze data from June. Again, it shows no more than 42 trades in a single second: https://raw.github.com/runeksvendsen/mtgoxtps/master/June1-12.png

In my opinion, it's highly unlikely that twelve days would pass without a single second containing more than ~40 trades if that were not a hard limit. You only have to put in a large buy order a dollar or more above the current price for the matching engine to have to traverse several hundred sell orders.

I find it hard to argue with that :)
legendary
Activity: 2142
Merit: 1010
Newbie
Modules that require ACID and traditional databases are easily scaled horizontally as needed.

You can't scale horizontally for free; per the CAP theorem, you have to sacrifice something. For an exchange it can't be Consistency, so is it Availability or Partition tolerance?
legendary
Activity: 980
Merit: 1008
On the other hand, it will enable more-than-lackluster performance for your exchange. I imagine the 40 TPS Mt. Gox can sustain is related to them using a database for order storage. This really isn't sufficient for an exchange of their size, and it results in hours of lag when things really heat up.

Actually, I find this theory that MtGox can only sustain 40 TPS very speculative. I submit that it is not the matching engine which is the bottleneck at MtGox, but rather the constant DDoS they experience tying up their trading-interface front-end.

Cheers, paul.
I haven't heard Mt. Gox claim they experience constant DDoS attacks. They've been quite intermittent, and I haven't heard of any recently.

The fact of the matter is that no more than 40 trades were processed per second during the huge sell-off. I made a script that analyzes trade data, and this was the result: https://github.com/runeksvendsen/mtgoxtps/blob/master/final.png

But you're right that this might have been caused by DDoS attacks, although I find it striking that not one second passed with more than 40 trades being executed. It kind of looks like a hard limit.

Additionally, Mt. Gox went down for maintenance after the huge sell-off and announced they were implementing improvements to their trading engine (one was disallowing unfunded orders). And it's clear that around April 14, their maximum throughput increased from around 20 TPS to 40 TPS.

I just ran my script again to analyze data from June. Again, it shows no more than 42 trades in a single second: https://raw.github.com/runeksvendsen/mtgoxtps/master/June1-12.png

In my opinion, it's highly unlikely that twelve days would pass without a single second containing more than ~40 trades if that were not a hard limit. You only have to put in a large buy order a dollar or more above the current price for the matching engine to have to traverse several hundred sell orders.

Looking at the order book on bitcoin.clarkmoody.com now, with the lowest ask at 109.55899, it would take only a single buy order of 840 BTC @ $110 to fill 70 sell orders. I feel confident saying this has happened in the past twelve days, and the fact that no second has passed with more than 42 trades points strongly to a hard upper limit around that figure in their engine design.
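
To illustrate that last point, here is a toy sketch (all numbers invented) of why one large order produces many trades: the engine emits one trade per resting ask it consumes.

Code:
# Toy matching loop: one big buy order crossing many small resting asks.
def match_buy(qty, limit, asks):
    """asks: list of (price, amount) sorted by ascending price."""
    trades = []
    while qty > 0 and asks and asks[0][0] <= limit:
        price, avail = asks[0]
        fill = min(qty, avail)
        trades.append((price, fill))
        qty -= fill
        if fill == avail:
            asks.pop(0)                      # ask fully consumed
        else:
            asks[0] = (price, avail - fill)  # partial fill stays on the book
    return trades

book = [(109.56 + i * 0.01, 12.0) for i in range(100)]  # 100 small asks
print(len(match_buy(840.0, 110.0, book)))  # 45 asks sit under $110 -> 45 trades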

legendary
Activity: 1008
Merit: 1007
On the other hand, it will enable more-than-lackluster performance for your exchange. I imagine the 40 TPS Mt. Gox can sustain is related to them using a database for order storage. This really isn't sufficient for an exchange of their size, and it results in hours of lag when things really heat up.

Actually, I find this theory that MtGox can only sustain 40 TPS very speculative. I submit that it is not the matching engine which is the bottleneck at MtGox, but rather the constant DDoS they experience tying up their trading-interface front-end.

Cheers, paul.
legendary
Activity: 980
Merit: 1008
You want to follow the pattern used by the LMAX Disruptor, which will give you what you want: low latency, redundancy, backups, and fast fail-over.

I really wouldn't want to do that myself. Moving the problem of database consistency into the program seems like a very dangerous thing to me; there is already enough to go wrong without making life so much more complex by trying to implement LMAX.

Cheers, Paul.
On the other hand, it will enable more-than-lackluster performance for your exchange. I imagine the 40 TPS Mt. Gox can sustain is related to them using a database for order storage. This really isn't sufficient for an exchange of their size, and it results in hours of lag when things really heat up.
legendary
Activity: 1008
Merit: 1007
You want to follow the pattern used by the LMAX Disruptor, which will give you what you want: low latency, redundancy, backups, and fast fail-over.

I really wouldn't want to do that myself. Moving the problem of database consistency into the program seems like a very dangerous thing to me; there is already enough to go wrong without making life so much more complex by trying to implement LMAX.

Cheers, Paul.
hero member
Activity: 770
Merit: 566
fractally
You want to follow the pattern used by the LMAX Disruptor, which will give you what you want: low latency, redundancy, backups, and fast fail-over.
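
Roughly, the Disruptor replaces queues and locks with a pre-allocated ring buffer and sequence counters. A very rough sketch of the idea (illustrative only, nothing like the real Java library):

Code:
# Single-producer/single-consumer ring buffer in the spirit of the Disruptor:
# fixed-size pre-allocated slots, monotonically increasing sequence numbers,
# and busy-spinning instead of locks.
class RingBuffer:
    def __init__(self, size=1024):
        assert size & (size - 1) == 0, "size must be a power of two"
        self.mask = size - 1
        self.slots = [None] * size
        self.write_seq = 0  # next slot the producer will claim
        self.read_seq = 0   # next slot the consumer will read

    def publish(self, event):
        while self.write_seq - self.read_seq > self.mask:
            pass  # spin: buffer full, wait for the consumer to catch up
        self.slots[self.write_seq & self.mask] = event
        self.write_seq += 1

    def consume(self):
        while self.read_seq == self.write_seq:
            pass  # spin: buffer empty, wait for the producer
        event = self.slots[self.read_seq & self.mask]
        self.read_seq += 1
        return event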
full member
Activity: 254
Merit: 100
@Vladimir
What is the expected time for BTCGlobal to start trading BTC? :)
sr. member
Activity: 309
Merit: 250
But of course, we would need some sort of database on the other side of this trading engine, to keep track of account balances, which are directly affected by executed trades. Is that the issue?

No.

No. You don't need to involve the DB here, as persistence can be asynchronous.

Read here: http://martinfowler.com/articles/lmax.html
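
In other words, the matching thread never blocks on the disk. A sketch of what asynchronous journaling can look like (the names are illustrative, not from the article):

Code:
# The matching thread appends events to an in-memory queue and carries on;
# a background thread drains the queue to an append-only journal on disk.
import queue
import threading

journal = queue.Queue()

def writer(path="journal.log"):   # hypothetical journal file
    with open(path, "a") as f:
        while True:
            f.write(journal.get() + "\n")
            f.flush()             # durability lags matching slightly

threading.Thread(target=writer, daemon=True).start()

def on_trade(trade):
    journal.put(repr(trade))      # never waits on the disk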

legendary
Activity: 1008
Merit: 1007
But of course, we would need some sort of database on the other side of this trading engine, to keep track of account balances, which are directly affected by executed trades. Is that the issue?

No.

The issue is that there are several separate database entities which all need to be updated atomically, at a minimum:

* orders
* user balances
* transactions

When the matching engine begins a run, it will pull a sorted list of bids/asks from the orders. During this operation, the orders must not be updated, or you will have inconsistent data to start with. Next, the matching engine will start matching the best bids and asks together. If the best bid and ask can be matched, an atomic transaction must be begun in which:

* The balances of both users are locked and checked for enough money
* Both bid/ask orders are locked
* The trade is actually performed
* One order is deleted, the other is updated to reflect any partial fill
* Balances of both users are updated to reflect the trade
* An order-transaction is created which exactly describes the trade

If this all goes well and no errors have occurred (an error would cause a rollback), this atomic transaction is committed to the database. As you can see, there are a lot of things which must all happen, or all not happen, for the database to remain consistent during just one order match.

Without this, you'll be in a world of hurt as soon as anything goes wrong, since your database will be inconsistent and it will be hard to figure out what went wrong.
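
As a compressed sketch of that sequence (the schema and column names are invented purely for illustration, with SQLite standing in for whatever the real database would be):

Code:
# One order match as a single ACID transaction: everything between BEGIN and
# COMMIT happens atomically, or the ROLLBACK undoes all of it.
import sqlite3

db = sqlite3.connect("exchange.db", isolation_level=None)  # manual transactions

def execute_match(bid_id, ask_id, price, amount):
    try:
        db.execute("BEGIN IMMEDIATE")  # take the write lock up front
        buyer,  = db.execute("SELECT user_id FROM orders WHERE id=?", (bid_id,)).fetchone()
        seller, = db.execute("SELECT user_id FROM orders WHERE id=?", (ask_id,)).fetchone()
        # balances checked for enough money (locked by the transaction)
        usd, = db.execute("SELECT usd FROM users WHERE id=?", (buyer,)).fetchone()
        if usd < price * amount:
            raise ValueError("insufficient funds")
        # one order deleted, the other updated to reflect the partial fill
        db.execute("DELETE FROM orders WHERE id=?", (ask_id,))
        db.execute("UPDATE orders SET amount=amount-? WHERE id=?", (amount, bid_id))
        # balances of both users updated to reflect the trade
        db.execute("UPDATE users SET usd=usd-?, btc=btc+? WHERE id=?",
                   (price * amount, amount, buyer))
        db.execute("UPDATE users SET usd=usd+?, btc=btc-? WHERE id=?",
                   (price * amount, amount, seller))
        # an order-transaction which exactly describes the trade
        db.execute("INSERT INTO trades (bid, ask, price, amount) VALUES (?,?,?,?)",
                   (bid_id, ask_id, price, amount))
        db.execute("COMMIT")    # all of the above becomes visible at once...
    except Exception:
        db.execute("ROLLBACK")  # ...or none of it does
        raise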

Cheers, Paul.
legendary
Activity: 980
Merit: 1008
IMO performance should be a secondary consideration. Your first priority should be consistency, since you are dealing with money; balances, transactions and orders (in the DB) should be either fully processed or not processed at all, even in the case of power failure. This is what ACID gives you, as I'm sure you know.
I might be a bit inexperienced in the field, but this is how I envision a complete order matching engine:

The first part consists of processes on a CPU core that convert the various trading API formats, say a Websocket API, a REST API and a FIX API, into some common format that the matching engine can understand. This standard-format stream is simply dumped to disk, one order after another, in sequential order, while the matching engine is simultaneously fed the same stream in the same order.

You would have three (or more, for added resilience) matching engines running on three separate cores, all being fed the incoming order stream. These matching engines would always be in the same state, since they run the same deterministic program and are fed the same stream of incoming orders.

Once every x hours, you halt one of the processes and dump its current state to disk. This is a "recovery snapshot" that lets you recreate the current state simply by restoring from the snapshot and replaying the orders (which have been dumped to disk) from the snapshot's timestamp onwards into the matching engine.

So two cores are always running the same matching engine on the same order stream, never interrupted. If one core fails, we switch to the other. The third core is for creating snapshots at regular intervals, from which the current state can be restored by loading "state at time x" from the snapshot and replaying orders "later than x" into the running matching engine instance.
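
A minimal sketch of that snapshot-and-replay recovery, assuming a deterministic engine and orders that carry sequence numbers (all names are mine):

Code:
# Recovery = load the last snapshot, then replay journaled orders with a
# higher sequence number into the (deterministic) engine.
import json

class Engine:
    def __init__(self, state=None):
        self.state = state or {"seq": 0, "book": []}

    def apply(self, order):
        self.state["seq"] = order["seq"]
        self.state["book"].append(order)  # real matching logic would go here

def snapshot(engine, path="snapshot.json"):
    with open(path, "w") as f:
        json.dump(engine.state, f)

def recover(snap_path="snapshot.json", journal_path="orders.log"):
    with open(snap_path) as f:
        engine = Engine(json.load(f))
    with open(journal_path) as f:        # the order stream dumped to disk
        for line in f:
            order = json.loads(line)
            if order["seq"] > engine.state["seq"]:  # only orders after the snapshot
                engine.apply(order)
    return engine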

But of course, we would need some sort of database on the other side of this trading engine, to keep track of account balances, which are directly affected by executed trades. Is that the issue?
legendary
Activity: 1008
Merit: 1007
IMO performance should be a secondary consideration. Your first priority should be consistency, since you are dealing with money; balances, transactions and orders (in the DB) should be either fully processed or not processed at all, even in the case of power failure. This is what ACID gives you, as I'm sure you know.
newbie
Activity: 11
Merit: 0
Vladimir: how many TPS can you handle with an ACID-compliant persistence DB? An order-matching-algorithm benchmark tells us nothing about the overall time to complete a buy/sell order.
legendary
Activity: 980
Merit: 1008
Looks very cool!

You mention it's written in Factor. Is it entirely in Factor, or do you use other languages?

Also, how many lines of code are we talking about?

Last question: is the benchmarking you do in the video essentially creating random limit orders and submitting them to the matching engine? If so, have you tried simulating with a sequence of orders from a real exchange, so it's closer to real life? Generating random orders will put a lot of orders on the book instead of executing them, but I imagine a real order stream will contain more orders at or around the bid/ask price, causing more trades to take place and possibly slowing it down below 300k TPS. I don't know much about matching engines, though; it's just a hunch.
member
Activity: 98
Merit: 10
Without framing the test, this is not very meaningful. I assume you are running on one local machine, so you're not dealing with latency issues, protocol issues, etc. For instance, if you go through a web browser, how much latency does that add: perhaps 10x? Or perhaps 10^4x? 300k probably means all of this is in RAM?

Why not instead have a bunch of distributed agents submitting messages, and then measure performance with some helpful metrics? Much more interesting than average latency is the latency distribution of the worst cases.
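
For example, something like this, fed with the round-trip times those agents measure (a sketch; the function is invented):

Code:
# Report tail latencies, not just the mean: sort the measured round-trip
# times and read off the worst-case percentiles.
import statistics

def report(latencies_ms):
    xs = sorted(latencies_ms)
    pick = lambda p: xs[min(len(xs) - 1, int(p / 100 * len(xs)))]
    print(f"mean   {statistics.mean(xs):9.3f} ms")
    print(f"p50    {pick(50):9.3f} ms")
    print(f"p99    {pick(99):9.3f} ms")
    print(f"p99.9  {pick(99.9):9.3f} ms")
    print(f"max    {xs[-1]:9.3f} ms")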