Topic: [ATTN: POOL OPERATORS] PoolServerJ - scalable java mining pool backend - page 11. (Read 31109 times)

full member
Activity: 207
Merit: 100
I think I will try testing this soon to see if it will work for me.  My pool is running up against the 450-500 GH/s pushpool barrier.

I would also be very interested to hear what you are doing about it, eleuthria.  Do you have a single pushpool instance scaling up to 1,100 GH/s per server?  Or, as it sounds, do you have 3 instances running on the same server, I guess with some routing/port setup to make that work?

sr. member
Activity: 266
Merit: 254
0.2.3 released now with source code available. 

This is the first release I'd consider usable in a production environment.  Memory leak issues have been resolved and some hefty speed increases have been gained on both the getwork side and the share submit side.


[0.2.3]

- fix: Memory usage issues for work-worker map and duplicate map
- switched duplicate maps and work-worker maps to trove THashMap implementation and added customised hashing strategy which has yielded about a 50% boost to getwork speed
- share database flushing is now delayed for 4 seconds after a block change to free up resources for fulfilling longpoll requests
- implemented several bulkloader engines for share db writes which has boosted db write speed by 5-20 times depending on which engine is used.
- partial implementation of mining extensions (reject-reason complete, rollntime and noncerange still work in progress).
- implemented rudimentary plugin interface to allow customized engine for Authentication, worker fetching, share logging (to database only, file logging is still hardcoded)
- major refactoring and reorganization of packages in preparation for source release.

Known issues
 - Under heavy load shutdown may fail to trigger on the first try.  Sending a 2nd shutdown signal
 seems to reliably complete shutdown.
 - All new bulkloaders are not considered stable.  Use at your own risk.
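The flush delay from the changelog above could be sketched roughly as follows. This is only an illustration and not PoolServerJ's actual code: the class name `FlushGate` and its methods are hypothetical, and the clock value is passed in by the caller so the logic can be checked without waiting. Only the 4 second figure comes from the changelog.

```java
// Hypothetical sketch: hold back share-db flushes for a short window after a block change.
class FlushGate {
    // 4 seconds, the delay mentioned in the 0.2.3 changelog.
    static final long DELAY_MILLIS = 4000;

    // Earliest time at which flushing may resume; MIN_VALUE means "no block change seen yet".
    private long resumeAtMillis = Long.MIN_VALUE;

    // Called when a new block is detected.
    synchronized void onBlockChange(long nowMillis) {
        resumeAtMillis = nowMillis + DELAY_MILLIS;
    }

    // The flush thread asks this before writing a batch of shares.
    synchronized boolean mayFlush(long nowMillis) {
        return nowMillis >= resumeAtMillis;
    }
}
```

A flush loop would call `mayFlush(System.currentTimeMillis())` before each batch and simply skip the batch while it returns false, leaving CPU and connections free for longpoll responses.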
sr. member
Activity: 266
Merit: 254
Well I just needed a shove I guess.  I was going to wait a few weeks while I finished some other thing but since there's demand for it I'm refactoring the code to split out the parts that I'm not ready to release yet (which you won't need anyway).  So I should be able to publish the source very soon.  I'd really like to see the results of other people's tests so I'm prioritising this.

As part of the refactoring I'm splitting out some parts to engine classes that can be easily overridden for the things that are most likely to need customisation.  Namely: Worker db lookups, authentication, share database writes.  That should hopefully minimize the need to fork the code so you won't get clobbered by every update.

I'll do this all as part of release 0.2.3 which is a pretty major improvement.

- fixed memory hog issues so memory usage has stabilised.
- switched duplicate maps and work-worker maps to trove THashMap implementation and added customised hashing strategy which has yielded about a 50% boost to getwork speed
- implemented several bulkloader engines for share db writes which has boosted db write speed by 5-20 times depending on which engine is used.
- partial implementation of mining extensions (reject-reason complete, rollntime and noncerange still work in progress).
full member
Activity: 175
Merit: 102
+1.  My DB is heavily customized for much the same purposes.  The addition of bits of information can reduce future lookups by millions.
Adding one index or changing one line of code on our webpages can cause our mysql server to go from 15 million rows scanned per minute to 5 million.  It's nuts!

The joys of database administration and design.
hero member
Activity: 630
Merit: 500
+1.  My DB is heavily customized for much the same purposes.  The addition of bits of information can reduce future lookups by millions.
Adding one index or changing one line of code on our webpages can cause our mysql server to go from 15 million rows scanned per minute to 5 million.  It's nuts!
full member
Activity: 175
Merit: 102
Right now BTC Guild is forced to do some very awkward things to get pushpoold to scale to 1 TH/s per server.  I'm very interested in this project, but until the source is out I won't be able to give it any real-world testing since my DB Schema is very customized for faster reads/multi-server support.

+1.  My DB is heavily customized for much the same purposes.  The addition of bits of information can reduce future lookups by millions.
legendary
Activity: 1750
Merit: 1007
Right now BTC Guild is forced to do some very awkward things to get pushpoold to scale to 1 TH/s per server.  I'm very interested in this project, but until the source is out I won't be able to give it any real-world testing since my DB Schema is very customized for faster reads/multi-server support.
sr. member
Activity: 266
Merit: 254
Oh wow, that's impressive.  You might want to contact the big pool owners to see if they are interested.  Currently the new cap on pushpoold seems to be 1,000GH/s on a fully tweaked setup.

So the bottleneck seems to be submissions rather than getworks?  Can you point me to where this is discussed?  Particularly the 1000GH/s limit?
hero member
Activity: 630
Merit: 500
Oh wow, that's impressive.  You might want to contact the big pool owners to see if they are interested.  Currently the new cap on pushpoold seems to be 1,000GH/s on a fully tweaked setup.
sr. member
Activity: 266
Merit: 254
It has other nice added features as well.  You say you released this mostly due to scalability.  What is your theoretical limit with your setup?

I can answer your question properly now.  On the test setup I used it achieved a max long term avg of 2154GH/s.  

Bear in mind that the clients were running on the same machine and performing CPU hashes (approx 2 per submit).  The test was pretty clearly CPU bound from all the observations I made, so I'm quite sure that if the client wasn't running on the same machine it could have been close to double that.  The server performs 1 hash/submit.  So with additional overhead it's probably a reasonable guess that they were using about 50% of the CPU each (which ps indicated as well).  The next hurdle is database write speed, which seemed to be on the borderline of becoming the next bottleneck.  I'm going to test concurrent writes to see if that yields an improvement.
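For context, a hash rate figure like the one above can be related to a share submit rate with some simple arithmetic. The sketch below assumes difficulty-1 shares, where one share corresponds to roughly 2^32 hashes on average; that assumption and the class name `ShareRate` are not from the test itself.

```java
// Hypothetical helper: convert a pool hash rate into an approximate share submit rate.
class ShareRate {
    // Assumption: difficulty-1 shares, about 2^32 hashes per share on average.
    static final double HASHES_PER_SHARE = 4294967296.0;

    static double sharesPerSecond(double gigaHashesPerSecond) {
        return gigaHashesPerSecond * 1e9 / HASHES_PER_SHARE;
    }

    public static void main(String[] args) {
        // 2154 GH/s is the long-term average quoted in the post.
        System.out.printf("%.1f shares/sec%n", sharesPerSecond(2154));
    }
}
```

Under that assumption, 2154 GH/s works out to roughly 500 share submits per second that the server has to hash, check and write to the database.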
sr. member
Activity: 266
Merit: 254
Due to the issues noted in the changelog I’ve decided it was probably premature to call PoolServerJ a beta release.  Until those issues are resolved it is reverted to alpha status.

[0.2.2]

- fix: safe restart throwing concurrent modification exceptions due to requests still being serviced during shutdown
- added property enableSafeRestart to allow disabling
- trimmed unnecessary data out of work-worker map entries, reduced overall memory usage by 80%
- enable sharesToStdout and requestsToStdout options
- added useEasiestDifficulty property to enable stress testing of work submits

Known issues
- Under extreme load database flush thread can become overwhelmed particularly if the server
is started under high load conditions and doesn’t have time for JIT compiler to kick in.
This is currently managed by throttling work submit connections if cache entries exceed
maxCacheSize.  Need to implement concurrent database writes.
- Block change notification currently doesn’t take into account the state of the database
cache.  This should wait until all shares from the previous block have been flushed to
DB before notifying of block change.
- Cache flushing should be temporarily suspended after a block change to allow resources
to focus on completing longpoll requests.
- Memory usage.  work-worker map entries size has been reduced by 80% but it still presents
a theoretical limit.  With 768mb max heap size the server can handle about 1.2 million
cached work requests before an OOM error occurs.  At about 1 million performance seriously
degrades due to frequent garbage collection.  The map contains shares issued in the current
block and previous block.  So this will only be an issue for very busy servers.  Current
workaround is to assign more memory to JVM.  Fix will require a revisit of the work map
strategy to either trim the map entries even more dramatically or find a safe method of
pruning the map.
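One possible way to prune a work-worker map that only needs the current and previous block is to keep two generations and drop the older one on each block change. This is a minimal sketch of that idea under assumptions, not the fix that was eventually implemented: the class name `TwoBlockWorkMap` and the use of plain strings for work data and worker names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: work-worker map holding entries for the current and previous block only.
class TwoBlockWorkMap {
    private Map<String, String> current = new HashMap<>();
    private Map<String, String> previous = new HashMap<>();

    // Remember which worker a piece of work was issued to.
    synchronized void recordWork(String workData, String worker) {
        current.put(workData, worker);
    }

    // Look in the current block first, then fall back to the previous block.
    synchronized String lookupWorker(String workData) {
        String worker = current.get(workData);
        return worker != null ? worker : previous.get(workData);
    }

    // On block change the old "previous" generation becomes garbage in one step.
    synchronized void onBlockChange() {
        previous = current;
        current = new HashMap<>();
    }

    synchronized int size() {
        return current.size() + previous.size();
    }
}
```

Dropping a whole generation at once avoids iterating over a million entries to decide which ones are stale, at the cost of keeping up to two blocks' worth of entries in memory.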
sr. member
Activity: 266
Merit: 254
I'm a little bit confused as to the new bottleneck you got to.  Are you saying bitcoind is your bottleneck?  If so, that's why Joel Katz hacked on it.  A month ago the large pool operators were screaming at pushpoold as the bottleneck, including Eleuthria with his awesome experiments and communication with his members.  Then some people started to discover it was actually bitcoind, hence the patches. 

Yes it was bitcoind that was the bottleneck, I built the patched version yesterday and it was a vast improvement. 

Quote
With all due respect, I hope you aren't trying to scale the wrong side of the puzzle.  It may be faster than pushpoold, but all for naught if bitcoind can't feed the monster.

I don't know if it's faster than pushpoold.  It's quite different in a lot of ways.  It just handles some things differently, the key difference being that it can use multiple bitcoinds if they are proving to be a bottleneck.  So under some scenarios it will probably be faster and under others it might be slower.
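To illustrate what drawing work from multiple bitcoinds might look like, here is a rough sketch of a bounded work cache per source with round-robin delivery. It is not taken from the PoolServerJ source: the class name `MultiSourceWorkCache`, the method names and the use of strings for work units are all hypothetical, and the background fetcher threads that would fill each cache are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: one bounded getwork cache per bitcoind source, served round-robin.
class MultiSourceWorkCache {
    private final List<BlockingQueue<String>> caches = new ArrayList<>();
    private int next = 0;

    MultiSourceWorkCache(int sources, int maxCacheSize) {
        for (int i = 0; i < sources; i++) {
            caches.add(new ArrayBlockingQueue<>(maxCacheSize));
        }
    }

    // Called by the fetcher for a source; returns false when that source's cache is full.
    boolean offerWork(int source, String work) {
        return caches.get(source).offer(work);
    }

    // Serve a getwork from the next non-empty cache; null means every cache is drained.
    synchronized String nextWork() {
        for (int i = 0; i < caches.size(); i++) {
            BlockingQueue<String> cache = caches.get(next);
            next = (next + 1) % caches.size();
            String work = cache.poll();
            if (work != null) {
                return work;
            }
        }
        return null;
    }
}
```

With this shape a slow daemon only empties its own cache, and getwork requests keep being served from the others until all of them run dry.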
sr. member
Activity: 266
Merit: 254
2 more features off the top of my head:

* Google app engine compatibility
* Hash offloading to a GPU (I have no idea if that is a major bottleneck of your program though) - I recently read something about doing SSL calculations on a GPU, getwork verifications etc. should also be an easy task.

I looked at Google App Engine but it has a couple of gotchas.

- Time per request: max 30 sec

Which effectively rules out longpolling.

- Memory cap 128MB.

Depends on how hard you are running it but if you run a large pool you may find this restrictive.  A smallish pool will comfortably run in about 50mb but once you start scaling you are consuming memory two ways, first is caching work.  Second is mapping work delivered to workers for current block and last block.  I can't actually remember why I did this, it was partly to ensure no duplicate work but there was one other reason. 

Another option for scalable hosting would be Amazon Elastic Beanstalk.  It would require some refactoring to strip out the embedded server and turn it into a fully compliant webapp.  I think its backend is Jetty (have to double check); if so, that's good because poolserverj uses some of Jetty's advanced features (which are outside the servlet spec) to avoid thread saturation and handle long polling.  If there's reasonable demand then I'll look into doing it.

Interesting idea about the GPU calc but I wonder if in this scenario it would really yield much benefit?  Under a mining scenario the overhead of translating the text to a byte array, sending it to the GPU and starting the calculation happens once, then the GPU does many billions of calcs.  In this scenario that overhead would be incurred and the GPU would do a single calculation.  The text-byte array conversion happens anyway I suppose, but I still wonder whether the net gain would be comparable to the overhead?  I'm not a GPU programmer so it's hard to say.
legendary
Activity: 2618
Merit: 1006
2 more features off the top of my head:

* Google app engine compatibility
* Hash offloading to a GPU (I have no idea if that is a major bottleneck of your program though) - I recently read something about doing SSL calculations on a GPU, getwork verifications etc. should also be an easy task.
sr. member
Activity: 266
Merit: 254
[0.2.1]

- fix: maxCacheSize was being ignored and hardcoded to 50
- fix: thread pool size for work fetchers not being set correctly.
- enabled allowedManagementAddresses property
- added source.local..disabled=true option so you don’t have to comment out every line of a source to disable it.
hero member
Activity: 630
Merit: 500
I'm sorry, Joel Katz did the patching, not Jeff Garzik, oops!

Unrealistic advantages:
all network communication was local
I'm glad you mentioned the above.  While latency and bandwidth are usually not the bottleneck of a pool server, virtual networking is almost unlimited compared to a typical WAN link.  Not to downplay your development and testing or anything!  Very good info here!

I'm a little bit confused as to the new bottleneck you got to.  Are you saying bitcoind is your bottleneck?  If so, that's why Joel Katz hacked on it.  A month ago the large pool operators were screaming at pushpoold as the bottleneck, including Eleuthria with his awesome experiments and communication with his members.  Then some people started to discover it was actually bitcoind, hence the patches.  With all due respect, I hope you aren't trying to scale the wrong side of the puzzle.  It may be faster than pushpoold, but all for naught if bitcoind can't feed the monster.
sr. member
Activity: 266
Merit: 254
Bitcoind (with pushpoold) has been patched to allow 1,000GH/s (on a nice server) and solve the too many open connections problem by Jeff Garzik - http://forum.bitcoin.org/index.php?topic=22585.0 .  It has other nice added features as well.  You say you released this mostly due to scalability.  What is your theoretical limit with your setup?

By the time the patched bitcoind was announced I was already well into the development process... However I still think there are some significant scalability benefits.  I've just run a test to try and get some metrics.  The difficult part is eliminating the bottleneck (I don't have a patched bitcoind atm).

Here's the test setup:

Ubuntu 10.4 64 bit running inside vmware on winXP 64 host - Dual core 3Ghz - 8gb ram (3 gb available to ubuntu).
2 * bitcoind, one running on host and one on ubuntu vm.
mysql running in ubuntu vm

To avoid the bottleneck I set the cache size to 20000 getworks per source and waited until the caches were 1/2 full so all works were served from memory.  But the server was still pulling work from the bitcoind's in the background while the tests were running.

Set the difficulty to below easy, average 2 hashes required to solve a block.

Client test also run inside ubuntu vm.  
50 concurrent clients continuously issuing getworks.  
Each second 10 of those solve a block (using CPU only) and submit.
Clients shared 10 different sets of worker credentials.

Result:

Avg rate of work delivery was about 1200/sec until the cache ran out then it dropped dramatically due to not being able to get work from daemon fast enough.
about 200 works were submitted and flushed to database.

With 30 block solve attempts/sec:
Avg work delivery 1000/sec
580 works flushed to database

At this point I remembered starcraft 2 was running in the background and killed it...

With 67 block solve attempts per second
Avg work delivery 1200/sec
1000 works flushed to database

To increase the number of block solves/sec any more I'd need to rewrite some of the test code to synchronize it.  Due to the way each iteration decides whether to solve a block it becomes unreliable, and I have no real way of seeing how many block solves it's attempting per sec.

There are some obvious weaknesses in the test

Unrealistic advantages:
all network communication was local

Unrealistic disadvantages:
Everything (including clients and block hashing) running on same machine.

Hopefully someone who's got more real world experience with pushpool can give those numbers some context.
hero member
Activity: 630
Merit: 500
Bitcoind (with pushpoold) has been patched to allow 1,000GH/s (on a nice server) and solve the too many open connections problem by Jeff Garzik - http://forum.bitcoin.org/index.php?topic=22585.0 .  It has other nice added features as well.  You say you released this mostly due to scalability.  What is your theoretical limit with your setup?
sr. member
Activity: 266
Merit: 254
Shares evaluation (stale/valid/"winning") done by backend, not bitcoind, and only shares that are >=current_difficulty actually get relayed to bitcoind

BTW I'm pretty sure pushpool does that.  I probably got the idea when I was peering at pushpool source (and getting a headache because I've never learned C).
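The backend-side share evaluation described in the quote can be sketched as follows. This is an illustration under assumptions, not pushpool's or PoolServerJ's actual code: the class `ShareCheck` and its method names are hypothetical, and the check is expressed in terms of targets (a lower hash is better), so "at or above current difficulty" becomes "hash at or below the network target".

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: classify a submitted share in the backend before touching bitcoind.
class ShareCheck {
    enum Verdict { REJECT, SHARE, BLOCK_CANDIDATE }

    // Bitcoin block hash: SHA-256 applied twice to the 80-byte header.
    static byte[] doubleSha256(byte[] header) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            return sha.digest(sha.digest(header));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // The hash is compared as a little-endian number, so reverse the bytes first.
    static BigInteger toInteger(byte[] littleEndian) {
        byte[] reversed = new byte[littleEndian.length];
        for (int i = 0; i < littleEndian.length; i++) {
            reversed[i] = littleEndian[littleEndian.length - 1 - i];
        }
        return new BigInteger(1, reversed);
    }

    // Only BLOCK_CANDIDATE results would be relayed to bitcoind.
    static Verdict classify(BigInteger hash, BigInteger shareTarget, BigInteger networkTarget) {
        if (hash.compareTo(networkTarget) <= 0) {
            return Verdict.BLOCK_CANDIDATE;
        }
        if (hash.compareTo(shareTarget) <= 0) {
            return Verdict.SHARE;
        }
        return Verdict.REJECT;
    }
}
```

A submit handler would compute `toInteger(doubleSha256(header))` once, log the share if the verdict is `SHARE` or better, and forward the work to bitcoind only for `BLOCK_CANDIDATE`, which keeps almost all submit traffic away from the daemon.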