Jeff Garzik has patched bitcoind (used with pushpoold) to handle 1,000 GH/s (on a nice server) and to solve the too-many-open-connections problem -
http://forum.bitcoin.org/index.php?topic=22585.0 . It has other nice added features as well. You say you released this mostly due to scalability. What is your theoretical limit with your setup?
By the time the patched bitcoind was announced I was already well into the development process... However, I still think there are some significant scalability benefits. I've just run a test to try and get some metrics. The difficult part is eliminating the bitcoind bottleneck (I don't have a patched bitcoind at the moment).
Here's the test setup:
Ubuntu 10.04 64-bit running inside VMware on a Windows XP 64-bit host - dual core 3 GHz - 8 GB RAM (3 GB available to Ubuntu).
2 x bitcoind, one running on the host and one on the Ubuntu VM.
MySQL running in the Ubuntu VM.
To avoid the bottleneck I set the cache size to 20,000 getworks per source and waited until the caches were half full, so all work was served from memory. The server was still pulling work from the bitcoinds in the background while the tests were running (a rough sketch of this caching scheme is shown after the setup list).
Set the difficulty to below easy, so that on average only 2 hash attempts are required to solve a block.
The client test was also run inside the Ubuntu VM.
50 concurrent clients continuously issuing getworks.
Each second, 10 of those solve a block (using CPU only) and submit it.
The clients shared 10 different sets of worker credentials (the second sketch below shows the client loop).
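
To make the caching idea concrete, here's a minimal sketch of a pre-fetched getwork cache with a background refill thread. It's illustrative only - the names (WorkCache, fetchWorkFromDaemon) are made up for the example and aren't taken from the actual server code - but it shows why delivery stays fast until the cache runs dry.

[code]
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of a pre-fetched getwork cache, assuming one queue per bitcoind
// source and a background refill thread. Sizes match the test description.
public class WorkCache {
    private static final int CACHE_SIZE = 20000;   // 20,000 getworks per source
    private final BlockingQueue<String> cache = new ArrayBlockingQueue<>(CACHE_SIZE);

    public WorkCache() {
        // Background thread keeps topping the cache up from the daemon
        // while clients are served straight from memory.
        Thread refiller = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    cache.put(fetchWorkFromDaemon());   // blocks when the cache is full
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        refiller.setDaemon(true);
        refiller.start();
    }

    // Clients take work from memory; they only wait if the cache has run dry,
    // which is the drop-off seen in the first run below.
    public String getWork() throws InterruptedException {
        return cache.take();
    }

    // Placeholder for the JSON-RPC "getwork" call to bitcoind.
    private String fetchWorkFromDaemon() {
        return "<work payload>";
    }
}
[/code]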
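
And this is roughly what each test client does. Again just a sketch - getWork() and submitWork() stand in for the HTTP getwork calls to the pool, and the 1% submit chance is only there to show how the overall submit rate gets tuned (at ~1,200 getworks/sec total that works out to around 10 submits/sec).

[code]
import java.util.concurrent.ThreadLocalRandom;

// Sketch of the client side of the test: 50 concurrent clients continuously
// issuing getworks, sharing 10 worker credential sets, each iteration
// randomly deciding whether to "solve" and submit.
public class LoadTestClient implements Runnable {
    private final String worker;

    LoadTestClient(int id) {
        this.worker = "worker" + (id % 10);   // 10 shared credential sets
    }

    @Override
    public void run() {
        while (true) {
            String work = getWork(worker);    // continuous getwork requests
            // Tuning this probability sets the total submit rate; because it's
            // a per-iteration coin flip, the attempt rate gets hard to pin down
            // at higher settings (see the note after the results).
            if (ThreadLocalRandom.current().nextDouble() < 0.01) {
                submitWork(worker, work);
            }
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 50; i++) {
            new Thread(new LoadTestClient(i)).start();
        }
    }

    // Placeholders for the HTTP JSON-RPC calls to the pool server.
    private static String getWork(String worker) { return "<work>"; }
    private static void submitWork(String worker, String work) { }
}
[/code]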
Result (at the initial 10 block solve attempts/sec):
Avg rate of work delivery was about 1,200/sec until the cache ran out, then it dropped dramatically because work couldn't be pulled from the daemons fast enough.
About 200 works were submitted and flushed to the database.
With 30 block solve attempts/sec:
Avg work delivery 1,000/sec
580 works flushed to the database
At this point I remembered StarCraft 2 was running in the background and killed it...
With 67 block solve attempts/sec:
Avg work delivery 1,200/sec
1,000 works flushed to the database
To increase the number of block solves/sec any further I'd need to rewrite some of the test code to synchronise it; because of the way each iteration randomly decides whether to solve a block, it becomes unreliable beyond this point and there's no real way of seeing how many block solves it's actually attempting per second.
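
If I do rewrite it, one option would be to drive the submits from a shared scheduler at a fixed rate instead of a per-iteration coin flip - something along these lines (again only a sketch; submitOneSolvedWork is a placeholder for the actual getwork/solve/submit round trip):

[code]
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustration of the synchronisation idea: fire submits at a fixed interval
// so the solve-attempt rate is known exactly rather than estimated.
public class FixedRateSubmitter {
    public static void main(String[] args) {
        int submitsPerSecond = 67;   // target solve-attempt rate
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                FixedRateSubmitter::submitOneSolvedWork,
                0,
                1_000_000 / submitsPerSecond,   // period in microseconds
                TimeUnit.MICROSECONDS);
    }

    // Placeholder: grab a cached getwork, "solve" it at the easy difficulty
    // and post it back to the pool.
    private static void submitOneSolvedWork() { }
}
[/code]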
There are some obvious weaknesses in the test:
Unrealistic advantages:
All network communication was local.
Unrealistic disadvantages:
Everything (including the clients and block hashing) was running on the same machine.
Hopefully someone who's got more real-world experience with pushpool can give those numbers some context.