UK server replaced by NL2 server. btcguild.com, uk.btcguild.com and the old eu.btcguild.com are now all pointed at NL2 to get the load off of US Central/NL1.
Why did you shut DNS load balancing off? Was it doing it purely on a connection basis [thus servers that were less capable than others were getting too much traffic]? I think each server could expose a service interface that reports its percentage of total capacity, and that could drive the DNS load balancing. I have never been fond of DNS load balancing in general, since a lot of software will believe a server that has actually gone down has no connections [as long as the machine is still responsive] and start directing all traffic at it. In your case, though, I am not sure there is much of a better way.

Perhaps a dedicated service that does nothing but smartly resolve DNS queries, taking worker/account information into account [to keep all workers for a given user pointed at the same pool server .. a bit of caching, no doubt]; it could be reactive to outages (see the sketch below). Maybe then you make the IP addresses of all the servers private, and if individuals have issues with being directed to a particular server, they could have a UI option to "prefer" a specific one. Granted, people can figure out the IP they are attaching to, but that is not much use if you do maintenance and change the address for whatever reason [which I suspect isn't that uncommon, based on Deepbit]. That would give you the ability to remove a server from the balancer, take it offline for maintenance or whatever, and then bring it back up. It would also allow the service to detect when a server has gone dark and direct traffic elsewhere [hopefully mining software will resolve the name again rather than reuse the cached IP when it gets the infamous "RPC Server Not Responding"].

The only problem I see with such a setup is that your balancer would have to poll the servers with some frequency; probably not a big problem, but removing a server from the pool seems to require you to restart all the servers' pushpool services [i.e. a drawback of that software, I think].
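Here is a rough sketch of the balancer/resolver idea, just to make it concrete. Everything in it is made up for illustration: the private IPs, the /capacity status endpoint and its port, and the "used_fraction" field are all assumptions, not anything BTC Guild actually exposes.

[code]
# Hypothetical sketch of the "smart resolver" idea above. The server list,
# the /capacity endpoint, its port, and the JSON field are all assumptions.
import hashlib
import json
import time
import urllib.request

# Assumed private pool servers; the balancer hands these out, so miners
# never see a fixed public IP.
SERVERS = {
    "us-central": "10.0.0.10",
    "nl1": "10.0.1.10",
    "nl2": "10.0.1.11",
}

POLL_INTERVAL = 15   # seconds between capacity polls
capacity = {}        # server name -> fraction of capacity currently in use


def poll_servers():
    """Ask each server's (hypothetical) status interface for its load.

    A server that fails to answer is treated as dark and dropped from
    rotation until it responds again.
    """
    for name, ip in SERVERS.items():
        try:
            with urllib.request.urlopen(
                    f"http://{ip}:8347/capacity", timeout=5) as resp:
                capacity[name] = json.load(resp)["used_fraction"]
        except (OSError, ValueError, KeyError):
            capacity.pop(name, None)   # dark or garbled: stop sending traffic


def resolve(account, preferred=None):
    """Pick a server for an account's workers.

    All workers on one account hash to the same server so they stay on a
    single pool node; a UI-set "preferred" server overrides that, and only
    servers that are still answering (and not near capacity) are eligible.
    """
    live = [n for n, used in capacity.items() if used < 0.95]
    if not live:
        raise RuntimeError("no pool servers available")
    if preferred in live:
        return SERVERS[preferred]
    # Stable hash keeps a given user's workers together across lookups.
    idx = int(hashlib.sha256(account.encode()).hexdigest(), 16) % len(live)
    return SERVERS[sorted(live)[idx]]


if __name__ == "__main__":
    while True:
        poll_servers()
        print("user42 ->", resolve("user42"))
        time.sleep(POLL_INTERVAL)
[/code]

A stable hash of the account name is the cheap way to keep one user's workers together; something like consistent hashing would reshuffle fewer users when a server drops out, but the polling and remove-from-rotation behavior is the part that matters here.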
So, a question, with some ideas, and ... whatever you care to take or leave. I have never worked on a system handling so many requests from multiple servers all over the world while trying to centralize the core pool work [like a massive cluster] in one location; the hub-and-spoke scenario (server-centric) is an old one in client/server development, but it doesn't apply to your case. I've done plenty of distributed computing for retail and health, but nothing with the number of connections you are taking [although the server resources per connection used with enterprise application servers are typically much larger than getworks ... RPC + LP].