
Topic: Gox Lag at 11 minutes WTF (Read 1285 times)

legendary
Activity: 2097
Merit: 1070
March 07, 2013, 09:59:20 AM
#19
Does anyone know of any MtGox status pages or a place where they post important announcements?

When the whole system gets into this state I think they should suspend all operations for a period of time before resuming.

MtGox need to make some serious announcements addressing the cause of these issues and tell us what they are going to do to prevent it happening again.

For all we know it will happen again in the next few days when we reach another all time high. That's what I'm expecting right now.
legendary
Activity: 1400
Merit: 1013
March 07, 2013, 09:44:27 AM
#18
Although I'm sure someone could construct some sort of scenario.
I can construct a scenario in which Mt Gox benefits from the high trade volume generated by a panic.

They are operating on a fractional reserve of bitcoins due to losses from an undisclosed hack and need the extra fees to cover their losses.
legendary
Activity: 1176
Merit: 1010
Borsche
March 07, 2013, 09:43:38 AM
#17

Interesting. Although it's hard to understand who this would benefit. Although I'm sure someone could construct some sort of scenario.

I'm sure that with the volume we have been seeing lately, a 30% price swing caused by a technical inability to process orders can benefit somebody by quite a large pile of cash :)
legendary
Activity: 1036
Merit: 1000
March 07, 2013, 09:30:55 AM
#16
If gox shares more details about their setup, I can try to speculate further about the issues.

You may want to bring this to Service Discussion.
legendary
Activity: 1372
Merit: 1008
1davout
March 07, 2013, 09:28:06 AM
#15
Correct me if I'm wrong, but both Mt. Gox and all the API clients in the world are using the same socket.io server.
You're wrong :)

Setting up a socket.io stack isn't trivial.

It got easier with the last version of nginx, but as far as I recall it has always required some fiddling around.

Socket.io doesn't support SSL natively AFAIK, so their setup probably uses stunnel to run the WS traffic in an SSL tunnel.

And stunnel is not exactly what I'd call stable.
sr. member
Activity: 378
Merit: 250
March 07, 2013, 08:04:20 AM
#14
Correct me if I'm wrong, but both Mt. Gox and all the API clients in the world are using the same socket.io server.  So the lag experienced yesterday, given the probably very large number of API calls, was probably caused mainly by non-MtGox API calls overloading the server.

A very first step would be to ensure that when the trading server load reaches a certain point, non-MtGox requests get discarded (not delayed, but discarded) past a certain request queue size, to prevent server overload and keep priority for trading on the MtGox site.  And when network congestion occurs, do something similar with QoS (Quality of Service, a priority scheme in networking protocols) so that requests from the MtGox site get absolute priority.

With the above, yes, the third-party sites (and user applications) relying on the API will probably get longer lags, and possibly "can't get result, server overloaded" type errors, but this ensures that a trade entered at MtGox gets executed in a timely fashion, and also that the information displayed on the MtGox site is accurate (including their mtgoxlive applet).

After all, having a lag of 5 minutes and not having results are pretty much the same.  At least this ensures, for 3rd party apps, that when a request passes through (does not get discarded), it does receive a response in a timely fashion.

This is a more advanced networking topic, but a pretty basic concept, and a must to ensure stability.  Think about it: when you call somewhere with your phone, the call is most certainly converted to IP traffic somewhere along the way (same as regular Internet data).  And that voice data passes over the same wires/fibers as regular PC-to-PC data.  You would not want to hear the person at the other end breaking up and lagging as soon as there is network congestion.  So voice gets treated the same way: absolute real-time priority.  The network could be 100% overloaded, with lots of regular data lost, even under denial of service, but not a single voice data packet will be lost or even delayed.  The network switches ensure that 1) the voice data has absolute priority, bypassing any data with lower priority, thus eliminating lag, and 2) the regular data gets discarded if network congestion occurs, not the voice data.
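To make the "discard, don't delay" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `SITE`/`API` labels, the `SheddingQueue` class and the `api_limit` threshold are made-up names, and nothing is known about how MtGox actually queues requests.

```python
import heapq
import itertools

SITE, API = 0, 1  # hypothetical labels; the lower number is served first


class SheddingQueue:
    """Request queue that drops API traffic once a depth threshold is hit."""

    def __init__(self, api_limit: int):
        self.api_limit = api_limit      # queue depth at which API requests are refused
        self._heap = []                 # entries are (priority, sequence, request)
        self._seq = itertools.count()   # keeps FIFO order within one priority

    def submit(self, source: int, request: str) -> bool:
        """Queue a request; return False if it was discarded."""
        if source == API and len(self._heap) >= self.api_limit:
            return False                # discarded outright, not delayed
        heapq.heappush(self._heap, (source, next(self._seq), request))
        return True

    def next_request(self):
        """Return the oldest request of the highest priority, or None if idle."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


q = SheddingQueue(api_limit=2)
accepted = [q.submit(API, "api-1"), q.submit(API, "api-2"), q.submit(API, "api-3")]
q.submit(SITE, "site-1")                # site requests are never shed in this sketch
first = q.next_request()                # the site request jumps ahead of queued API calls
```

The third API call is refused immediately, so the client gets a fast "server overloaded" answer instead of a five-minute wait, while the site request is served before the API calls already queued.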
legendary
Activity: 1692
Merit: 1018
March 07, 2013, 07:41:17 AM
#13
Isn't it possible that someone is flooding the system with orders/cancels? IIRC this is done on Wall St regularly because it offers some sort of advantage to the guy doing it.

The advantage it gives is to drown competing bots in useless information.  Place and cancel hundreds of orders per second while at the same time strategically placing legitimate orders.  In share market high frequency trading, the apologists for Goldman Sachs etc. say this increases liquidity and is good for everyone.  They claim it means a buyer is always available for a seller.  They miss the entire point that HFT is designed solely to benefit the operator, and it's a huge cluster fuck when dozens of bots compete, with the proven possibility of flash-crashing markets.

Maybe we're beginning to see this at MtGox.

legendary
Activity: 1204
Merit: 1002
RUM AND CARROTS: A PIRATE LIFE FOR ME
March 07, 2013, 07:32:09 AM
#12
Isn't it possible that someone is flooding the system with orders/cancels? IIRC this is done on Wall St regularly because it offers some sort of advantage to the guy doing it.

If that is the problem, an easy solution would be to put a limit on the number of orders per minute per account, which I don't think is implemented now.

Interesting. Although it's hard to understand who this would benefit. Although I'm sure someone could construct some sort of scenario.
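For reference, the per-account limit suggested in the quoted post could look something like the sliding-window limiter below. This is only a sketch in Python: the `OrderRateLimiter` name, the 60-second window and the account ids are invented for illustration, and it says nothing about what MtGox actually runs.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class OrderRateLimiter:
    """Allow at most max_orders per account within a sliding time window."""

    def __init__(self, max_orders: int, window_secs: float = 60.0):
        self.max_orders = max_orders
        self.window_secs = window_secs
        self._recent = defaultdict(deque)   # account id -> timestamps of accepted orders

    def allow(self, account: str, now: Optional[float] = None) -> bool:
        """Return True and record the order if the account is under its limit."""
        now = time.monotonic() if now is None else now
        stamps = self._recent[account]
        # Forget orders that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_secs:
            stamps.popleft()
        if len(stamps) >= self.max_orders:
            return False                    # over the limit: reject this order
        stamps.append(now)
        return True


limiter = OrderRateLimiter(max_orders=2, window_secs=60.0)
results = [limiter.allow("alice", now=t) for t in (0.0, 1.0, 2.0)]
other_account = limiter.allow("bob", now=2.0)   # limits are tracked per account
```

Passing `now` explicitly is only there to make the example deterministic; in real use the monotonic clock default would be used. Whether cancels should count against the same limit is a separate design question.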
legendary
Activity: 1372
Merit: 1008
1davout
March 07, 2013, 06:39:28 AM
#11
If gox shares more details about their setup, I can try to speculate further about the issues.
You can speculate as much as you want, even if no additional information is disclosed :)
newbie
Activity: 43
Merit: 0
March 07, 2013, 06:34:41 AM
#10
Anything I can say at this point would be sheer speculation... Hmm, maybe we're still on topic ;) To the best of my judgment Mt. Gox is facing two main issues, one with streaming API availability, and one within the core brokerage service.

The socket.io streaming API issues are not new; they have existed for a long time, but clearly have gotten worse. I would speculate that gox has a flaky stack handling socket.io. WebSockets being unable to connect, or connecting with no data, means the endpoint is not negotiating the connection properly. This isn't a complicated task, but rather a trivial one. I suspect that gox has a misconfiguration on that stack, since there is no need for lots of hardware for these types of services (proper evented, IO-loop based servers can easily handle thousands of clients per instance), just for proper configuration of the real-time server.

Second, the core trading engine. Lots of things can go wrong on this level. Clients should be assumed to be able to send an arbitrarily large number of orders (and if not, the limit should be stated clearly to users). Orders are probably kept in memory somewhere, and traversed by the trading engine to match and execute them. If some malicious user is DoSing the trading engine, gox should have a setup in place to mitigate such actions. Otherwise, they simply have never dealt with the scale of trading going on during the past few days and clearly have a bottleneck, either in the trading engine itself (i.e. the matching algorithm) or in the supporting infrastructure.
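To illustrate what "matching algorithm" means here, below is a toy in-memory order book in Python using price-time priority and heaps, so finding the best counter-order does not require traversing every order. It is purely a sketch under my own assumptions (the `OrderBook` class, limit orders only, integer quantities) and not a description of the gox engine.

```python
import heapq
import itertools


class OrderBook:
    """Toy limit-order book with price-time priority."""

    def __init__(self):
        self._bids = []                 # heap of (-price, seq, [qty]): best bid first
        self._asks = []                 # heap of (price, seq, [qty]): best ask first
        self._seq = itertools.count()   # arrival order breaks ties at equal price

    def submit(self, side: str, price, qty: int):
        """Match an incoming limit order, rest any remainder, return (price, qty) fills."""
        buying = side == "buy"
        own, opposite = (self._bids, self._asks) if buying else (self._asks, self._bids)
        fills = []
        while qty > 0 and opposite:
            key, _, resting = opposite[0]
            best = key if buying else -key          # undo the sign trick for bids
            if (best > price) if buying else (best < price):
                break                               # best counter-order does not cross
            traded = min(qty, resting[0])
            fills.append((best, traded))            # trade at the resting order's price
            qty -= traded
            resting[0] -= traded
            if resting[0] == 0:
                heapq.heappop(opposite)             # resting order fully filled
        if qty > 0:
            heapq.heappush(own, (-price if buying else price, next(self._seq), [qty]))
        return fills


book = OrderBook()
book.submit("sell", 41, 5)
book.submit("sell", 40, 3)
fills = book.submit("buy", 41, 6)   # takes the cheaper ask first, then part of the other
```

With a structure like this each match costs O(log n) in the number of resting orders, so a flood of orders slows things down gradually rather than linearly per lookup. A real engine also needs cancels, persistence and balance checks, which are left out.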

If gox shares more details about their setup, I can try to speculate further about the issues.
newbie
Activity: 9
Merit: 0
March 07, 2013, 06:18:43 AM
#9
I had an interesting conversation today among the other senior software engineers where I work.  We talked about exactly this topic.

I'm interested in your opinion too.  What exactly would you do to avoid what we saw today?  There is no doubt that the infrastructure at gox is failing-by-design, so I'd love to hear your ideas too.

Let's start with the event broker.  The lag is an obvious clue that it's doing more work than it really has to.  What aspects of the design of this sort of system would you say are the easiest to change, the "lowest-hanging fruit" that is probably at the core of the problem?

Even now that things are settling down, we see lag go from zero (idle) to over 20 seconds every few minutes.  This is a big clue to the issues, I think.

--D
newbie
Activity: 43
Merit: 0
March 07, 2013, 06:01:42 AM
#8
Mt.Gox situation over the past few days is a clusterfuck.

I would offer my experience in building and maintaining large-scale real-time web operations, except I wouldn't know who to talk to at Gox. If anyone from Gox is listening, please get in touch. I want to, and can, help.
legendary
Activity: 1008
Merit: 1000
March 07, 2013, 05:55:44 AM
#7
{"result":"success","return":{"lag":1081055125,"lag_secs":1081.055125,"lag_text":"18 minutes"}}

This issue on its own has the potential to cause a crash.
https://bitcointalksearch.org/topic/mtgox-api-138679
legendary
Activity: 2097
Merit: 1070
March 07, 2013, 05:48:21 AM
#6
{"result":"success","return":{"lag":1081055125,"lag_secs":1081.055125,"lag_text":"18 minutes"}}

This issue on its own has the potential to cause a crash.
full member
Activity: 180
Merit: 100
March 07, 2013, 03:14:53 AM
#5
{"result":"success","return":{"lag":1081055125,"lag_secs":1081.055125,"lag_text":"18 minutes"}}
full member
Activity: 180
Merit: 100
March 07, 2013, 03:14:26 AM
#4
{"result":"success","return":{"lag":1052062808,"lag_secs":1052.062808,"lag_text":"17 minutes"}}
legendary
Activity: 1600
Merit: 1014
March 07, 2013, 03:13:23 AM
#3
{"result":"success","return":{"lag":987460168,"lag_secs":987.460168,"lag_text":"16 minutes"}}
legendary
Activity: 1600
Merit: 1014
March 07, 2013, 03:12:53 AM
#2
{"result":"success","return":{"lag":874191746,"lag_secs":874.191746,"lag_text":"14 minutes"}}
legendary
Activity: 2097
Merit: 1070
March 06, 2013, 11:31:58 AM
#1
Just now at https://mtgox.com/api/1/generic/order/lag

It says this: {"result":"success","return":{"lag":685162131,"lag_secs":685.162131,"lag_text":"11 minutes"}}
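For anyone who wants to watch this from a script, here is a small Python sketch that parses the response above. It works on the pasted body rather than calling the URL, and it assumes (based only on this sample) that "lag" is in microseconds, since 685162131 / 1,000,000 matches "lag_secs". The `parse_lag` helper is a made-up name.

```python
import json

# Response body copied from the post above.
raw = ('{"result":"success","return":{"lag":685162131,'
       '"lag_secs":685.162131,"lag_text":"11 minutes"}}')


def parse_lag(body: str) -> float:
    """Return the reported order lag in seconds, or raise if the call failed."""
    data = json.loads(body)
    if data.get("result") != "success":
        raise ValueError("lag endpoint did not report success")
    # Assumption: "lag" is in microseconds, consistent with lag_secs in the sample.
    return data["return"]["lag"] / 1_000_000


lag_seconds = parse_lag(raw)
print(f"{lag_seconds:.1f} s, about {lag_seconds / 60:.0f} minutes")
# prints "685.2 s, about 11 minutes"
```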

I found a good way of cancelling an order you don't want to go through is to simply transfer all your BTC out of the account.

This immediately cancels an order with no lag time whatsoever.