
Topic: [ATTN: POOL OPERATORS] PoolServerJ - scalable java mining pool backend - page 9. (Read 31109 times)

legendary
Activity: 1750
Merit: 1007
Giving PSJ a bit of a bump here.

BTC Guild's new PPS pool has elected to use PoolServerJ during its beta run.  So far the performance has been outstanding (looking at stale/reject rates of < 0.2%).  It eats a decent chunk of RAM at boot, but the increase in RAM usage is very small as the pool scales.

Excellent work shadders, and hopefully soon we can post that a PSJ pool has found its first block.
full member
Activity: 142
Merit: 100
PoolServerJ works great! Keep the development going, I see really huge potential! Definitely worth some donations!
sr. member
Activity: 266
Merit: 254
It appears the last critical update introduced a new bug which, if triggered, causes a memory leak due to the cache flushing thread crashing.  If you're using 0.2.8, upgrading is urgently recommended.

[0.2.9]
- fix: FastEqualsSolution not serializable, causing an exception when dumping the workmap during safe restart
- fix: null pointer exception crashing the cache cleaner thread, leading to an eventual OOM error.
- added a generic try/catch to all threads to catch unknown exceptions and prevent them stopping. Still need to add a 'shutdownOnCriticalError' option so these errors can instead be handled by shutting down the server and letting a wrapper script restart it.
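The generic try/catch guard described in the last item can be sketched roughly as below. This is a hypothetical illustration, not PoolServerJ's actual code: it runs one iteration of a worker task and catches unexpected exceptions so they are logged instead of silently killing the thread, with a comment marking where the planned 'shutdownOnCriticalError' behaviour would go.

```java
public class ThreadGuard {
    // Runs one iteration of a worker task. Returns true if it completed,
    // false if an unexpected exception was caught (and logged) instead of
    // propagating up and killing the thread.
    public static boolean runGuarded(Runnable task) {
        try {
            task.run();
            return true;
        } catch (Exception e) {
            // With a 'shutdownOnCriticalError' option, this is where the
            // server would exit and let a wrapper script restart it.
            System.err.println("Unexpected exception in worker thread: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(runGuarded(() -> {}));                                   // true
        System.out.println(runGuarded(() -> { throw new RuntimeException("x"); })); // false
    }
}
```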
legendary
Activity: 2618
Merit: 1006
On the other hand you could also try to use jython and run phoenix too!

...or include a fixed "test" getwork with known solution(s)?

It might however be interesting to have something like a "benchmark"/test of the whole system: several miners (whose solutions you already know beforehand) and getwork sources (emulated bitcoinds that always return the same sequence of getworks, as well as emulated pools for the proxy mode), so that you can then also run regression tests on the performance of the whole software.
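The emulated getwork source suggested above could look something like this sketch. All names here are hypothetical, not part of PoolServerJ: it replays a fixed, known sequence of work blobs, so a whole-system benchmark or regression run sees exactly the same getworks every time.

```java
import java.util.Arrays;
import java.util.List;

public class ReplayWorkSource {
    private final List<String> cannedWork;
    private int next = 0;

    public ReplayWorkSource(List<String> cannedWork) {
        this.cannedWork = cannedWork;
    }

    // Deterministic: cycles through the canned sequence forever, so the
    // solutions for every piece of work are known in advance.
    public synchronized String getWork() {
        String work = cannedWork.get(next);
        next = (next + 1) % cannedWork.size();
        return work;
    }

    public static void main(String[] args) {
        ReplayWorkSource src = new ReplayWorkSource(Arrays.asList("work-a", "work-b"));
        System.out.println(src.getWork()); // work-a
        System.out.println(src.getWork()); // work-b
        System.out.println(src.getWork()); // work-a again
    }
}
```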
newbie
Activity: 31
Merit: 0
Still having great results with poolj
member
Activity: 98
Merit: 10
Thanks for fixing the bugs really fast! I appreciate it. And get well soon.

I also thought about creating a test suite to check for these kinds of problems (it could be useful for other custom pools too, like AFAIK bitp.it). I was digging through the DiabloMiner source some time ago, and I guess it's possible to use it (if the license allows it). It will probably get "a little" cluttered..
sr. member
Activity: 266
Merit: 254

- There seem to be one (or even two) more different bugs related to counting invalid shares as valid.

- Are "duplicate" rejects not getting logged on purpose?

Hi Gentakin,

The 0.2.8 release should address all these issues.  I've moved all the duplicate handling inside the same synchronized block, so this should alleviate the race condition.  There was a reason for duplicates not being logged... developer forgot... Fixed now though.  I've rewritten the whole duplicate check portion.  I found a couple of other possible problems in addition to the 3 you found, so thanks for bringing my focus to this part of the code; it was very much needed.  It was originally added as a bit of an afterthought and not really tested thoroughly.

The tests you've been doing would make great unit tests... If only I could work out how to integrate phoenix into a unit test.  DiabloMiner might be an option for integration as a test platform since it's java based.
sr. member
Activity: 266
Merit: 254
This release contains some fixes and updates essential for anyone running a live pool.  Please read the changelog for details.

[0.2.8]
- implement 'include' in properties to allow separation of config blocks into different files for easy changeover
- add check for duplicate solutions on submit
- changed 'stale-work' to 'stale' for pushpool compatibility
- changed property name 'useEasiestDifficulty' to 'useRidiculouslyEasyTargetForTesingButDONTIfThisIsARealPool' to make it clear this isn't the same as pushpool's 'rpc.target.rewrite'
- crude support for share counter table updates rather than full submits.
- fix: cache size set to 1/2 maxCacheSize due to partially implemented dynamic cache sizing
- fix: shares accepted below difficulty due to endian issues when setting the difficulty target (thanks luke-jr for the help)
- fix: duplicate work checks not working properly due to race conditions.  Moved atomic duplicate check/update operations into a synchronized block.
- fix: duplicate work not being logged
- updated json-rpc and utils libs
- updated sample properties file to reflect recent changes.
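The changelog doesn't detail how the new 'include' directive is resolved, so here is a rough, hypothetical sketch of one way such a mechanism can work in Java: load the master properties file, then merge in any files named by an assumed comma-separated `include` key, with the master file winning on conflicts. Key names and semantics here are assumptions, not PoolServerJ's actual implementation.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class IncludingProperties {
    // Loads a properties file, recursively merging any files listed in its
    // "include" key (comma separated, relative to the master file's folder).
    // Keys already set in the master file are not overridden, so included
    // files act as plug-in defaults (e.g. a separate database config).
    public static Properties load(File file) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(file)) {
            props.load(in);
        }
        String includes = (String) props.remove("include");
        if (includes != null) {
            for (String name : includes.split(",")) {
                Properties child = load(new File(file.getParentFile(), name.trim()));
                for (String key : child.stringPropertyNames()) {
                    if (!props.containsKey(key)) {
                        props.setProperty(key, child.getProperty(key));
                    }
                }
            }
        }
        return props;
    }
}
```

With this layout you could keep one master config and swap in a different database file per scenario, as the post suggests.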
sr. member
Activity: 266
Merit: 254
spam away.. It's very helpful.  The faster we can find all the bugs the sooner it's ready for production use.  I'll have a detailed look at all this now.  I would have last night but got hit by the flu from hell and had to go to bed...
member
Activity: 98
Merit: 10
That's great! And thanks, Luke.



Edit to prevent double post:
There seem to be one (or even two) more bugs related to counting invalid shares as valid. This time, I changed Phoenix to submit every share 10 times with a delay of 1 sec between sends (I'm not sure if the delay works, it might send them instantly..). The first share was counted as "valid" two times, and "invalid" two times as well, so a total of 4 shares were recognized. I'm not sure if phoenix stopped sending the share after the 4th time, or whether 6 shares were sent but not registered as valid/invalid.

I assume phoenix then found a second nonce in the same getwork, this time producing 10 valid shares. That bug is probably in WorkProxy.java, line ~186.
Code:
if (entry.solutions == null) {
    entry.solutions = new ArrayList(10);
    entry.solutions.add(data);
} else if (entry.solutions.contains(data)) {
    return buildGetWorkSubmitResponse((Integer) request.getId(), false, "duplicate");
}
When the second share of the same getwork is received, "entry.solutions != null" and "entry.solutions.contains(data) == false", so the second share is accepted (this is correct behaviour). However, unless this happens elsewhere in the code, the second nonce is *not* added to entry.solutions, so it can be sent infinitely often and will award a valid share every time.

Moving "entry.solutions.add(data);" below that if/else if should fix this. There's still the problem with the first nonce in a getwork accepted multiple times, but not infinitely often.


Update:
python does not respect my time.sleep(1), so all the nonces are sent without delay. The problem with identical shares being accepted multiple times could be a race condition. This is from the phoenix log:
Quote
[19/08/2011 18:19:31] Result: e36b1881 accepted        
[19/08/2011 18:19:31] Result: e36b1881 accepted        // and indeed, this share was logged twice by PoolServerJ as "valid"
[19/08/2011 18:19:31] Result: e36b1881 rejected        
[19/08/2011 18:19:31] Result: e36b1881 rejected        
[19/08/2011 18:19:31] Result: e36b1881 rejected        
[.. 5 more rejects]
The rejects are not logged in PoolServerJ, so I guess they fail because of connection issues (sending 10 shares at the same moment might cause problems in phoenix).

Yet another update:
My suggestion to move "entry.solutions.add(data);" below that if/else if is wrong; it should probably go pretty far down the validation method, maybe after checking the hash. That should fix the double-accepts.
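That suggestion can be sketched as a tiny self-contained example. All class and method names here are hypothetical, not PoolServerJ's actual code: the duplicate check and the recording of an accepted solution happen atomically in one synchronized method, and a solution is only recorded after it has passed validation, so a repeated nonce can never be accepted twice.

```java
import java.util.HashSet;
import java.util.Set;

public class DuplicateCheck {
    private final Set<String> solutions = new HashSet<String>();

    // Returns "accepted", "duplicate" or "invalid". Synchronized so two
    // concurrent submits of the same nonce can't both pass the check.
    public synchronized String submit(String data, boolean passesHashCheck) {
        if (solutions.contains(data)) return "duplicate";
        if (!passesHashCheck) return "invalid";
        solutions.add(data); // record only *after* validation succeeds
        return "accepted";
    }

    public static void main(String[] args) {
        DuplicateCheck c = new DuplicateCheck();
        System.out.println(c.submit("e36b1881", true)); // accepted
        System.out.println(c.submit("e36b1881", true)); // duplicate
        System.out.println(c.submit("deadbeef", false)); // invalid
    }
}
```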
Are "duplicate" rejects not getting logged on purpose? (Omg, I'm spamming this thread, sorry!)
sr. member
Activity: 266
Merit: 254
Edit:
Are you sure Res.getEasyDifficultyTargetAsInteger()=115792089237316195423570985008687907853269984665640564039457584007908834672640 is correct?
I thought it would be 0xffff0000000000000000000000000000000000000000000000000000 for a difficulty-1-share, or 26959535291011309493156476344723991336010898738574164086137773096960 according to wolfram alpha.

I worked it out (thanks to luke-jr for the handholding).. I had the endianness around the wrong way when setting the easy target string and converting it to a BigInteger.  I've made a fix now and will create a new release shortly.
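The two numbers from this exchange can be checked directly (class and variable names below are just for illustration): the correct difficulty-1 target is 0xFFFF shifted left by 208 bits, while the string that was being parsed (the byte order reversed) is nearly 2^256, so almost any hash would pass it.

```java
import java.math.BigInteger;

public class TargetCheck {
    public static void main(String[] args) {
        // Correct difficulty-1 target: 0x00000000FFFF followed by 26 zero bytes,
        // i.e. 0xFFFF shifted left by 208 bits.
        BigInteger diff1 = BigInteger.valueOf(0xFFFF).shiftLeft(208);
        // The string that was being parsed before the fix (2^256 - 2^32):
        BigInteger wrong = new BigInteger(
            "ffffffffffffffffffffffffffffffffffffffffffffffffffffffff00000000", 16);
        System.out.println(diff1);
        // 26959535291011309493156476344723991336010898738574164086137773096960
        System.out.println(wrong);
        // 115792089237316195423570985008687907853269984665640564039457584007908834672640
    }
}
```

A share's hash must be numerically below the target to be valid, so with the reversed target nearly every hash was "valid" — matching the 20 accepted shares observed above.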
sr. member
Activity: 266
Merit: 254
hmmm... if useEasiestDifficulty != true then

easyDifficultyTargetString = "ffffffffffffffffffffffffffffffffffffffffffffffffffffffff00000000"
easyDifficultyTargetAsInteger = new BigInteger(easyDifficultyTargetString, 16);

Which is basically just parsing the hex string...

There's no other reference to either variable through any codepath... I'll have a more detailed look in the morning, but I think that string is wrong?  Not sure where I got that from...
member
Activity: 98
Merit: 10
I'm a little confused as well, because I see the hashing code in WorkProxy.java. But since I'm not so experienced with bitcoin hash targets, BigInteger.compareTo and your .getEasyDifficultyTargetAsInteger(), I'm not sure what's wrong!

This is from my properties file, and it's the only line that sets useEasiestDifficulty: (And it is actually a hg clone from yesterday, so I think it was set to false all the time):
Quote
useEasiestDifficulty=false

Maybe some more details on what I do/get:
 * Start PoolServerJ with empty shares table
 * Start cheating Phoenix (see below)
 * Wait until Phoenix reports "[18/08/2011 15:15:21] Result: b464f013 accepted" (this will appear only once, but phoenix will send 20 getwork-answers)
 * See PoolServerJ log something like this 20 times: "Forced work submission upstream: 00000001a55604a05d5f30c4d1784f38bd1e1389f3[..]" with slightly changed nonce each time
 * Now PoolServerJ logs 20x "work submit success, result: false"
 * A few seconds later, PoolServerJ flushes the shares to MySQL and indeed, 20 valid shares are now in the table

(I'm not sure if this indicates there's another problem somewhere, but a lot of "RETRY" is sometimes mixed into the debug log output)


I'm using a pretty old SVN checkout of phoenix (I prefer poclbm, but I had a phoenix start script for localhost sitting around, so I used it), it's r101 I believe. However the newest git should be fine as well:
https://github.com/jedi95/Phoenix-Miner/
In KernelInterface.py, change foundNonce.

Original code from git:
Code:
        if self.checkTarget(hash, nr.unit.target):
            formattedResult = pack('<76sI', nr.unit.data[:76], nonce)
            d = self.miner.connection.sendResult(formattedResult)

Cheating code (as always with python, tabs/indents are important, not sure if this was copied correctly):
Code:
        if self.checkTarget(hash, nr.unit.target):
            for bad in range(nonce, nonce+20):
                formattedResult = pack('<76sI', nr.unit.data[:76], bad)
                d = self.miner.connection.sendResult(formattedResult)


Edit:
Are you sure Res.getEasyDifficultyTargetAsInteger()=115792089237316195423570985008687907853269984665640564039457584007908834672640 is correct?
I thought it would be 0xffff0000000000000000000000000000000000000000000000000000 for a difficulty-1-share, or 26959535291011309493156476344723991336010898738574164086137773096960 according to wolfram alpha.
sr. member
Activity: 266
Merit: 254
So when phoenix finds a share, it takes the nonce and submits it, but then increases the nonce by 1 and submits that "share" as well, and then continues until it has reached nonce+20.

I actually find 20 shares in my table when only one should be valid, and they all have "our_result == 1".
Maybe the problem is with Res.getEasyDifficultyTargetAsInteger()?

Thanks for the report.  I'm a bit befuddled to be honest, since the validation code hashes the solution before checking.  Any chance you could send me the modded phoenix file, and let me know what version?  Or if the code you quoted is verbatim, just tell me the version and which file it's in.

You didn't happen to have useEasiestDifficulty=true in your properties file, did you?  If so, could you set it to false and test again? useEasiestDifficulty sets the difficulty so that only 2 hashes are required on average to get a solution.  It's intended for load testing only and should have been set to false in the sample properties file, but as with the license I forgot to update it.

I think for the next version I might change the property to useRidiculouslyEasyTargetButDontIfThisIsARealPool so it's more obvious that it's not the same thing as rewritedifficulty in pushpool.
member
Activity: 98
Merit: 10
Yes, I've seen those libraries. It's very nice of you to open source PoolServerJ! It seems to be a nice piece of software. No need to send me a permission email; I simply asked because I was wondering what "no derivative works" means (and why the source code is public then). Now it's all clear: the license has not been updated and it used to be closed source.

I've tested PoolServerJ and so far I'm impressed. The possibility to write your own sharelogger and simply plug it into PoolServerJ with a config file directive is nice. I implemented one that logs invalid shares to their own table (BTW: if "entry.reason == null && entry.ourResult" evaluates to true, is it safe to assume that this share is valid and should be counted for rewards?).
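The condition asked about in the aside can be written out as a tiny sketch. The field names (reason, ourResult) come from the post; the entry class itself is hypothetical, and whether the condition is actually sufficient is exactly the open question.

```java
public class ShareEntry {
    public String reason;      // reject reason ("duplicate", "stale", ...), null if none
    public boolean ourResult;  // our own validation verdict

    // The condition from the post: no reject reason and our check passed.
    public boolean countsForReward() {
        return reason == null && ourResult;
    }

    public static void main(String[] args) {
        ShareEntry e = new ShareEntry();
        e.ourResult = true;
        System.out.println(e.countsForReward()); // true
        e.reason = "duplicate";
        System.out.println(e.countsForReward()); // false
    }
}
```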

However I'd like to report a bug, or at least I think it is a bug.
I changed phoenix to do this:
Code:
        if self.checkTarget(hash, nr.unit.target):
            for bad in range(nonce, nonce+20):
                formattedResult = pack('<76sI', nr.unit.data[:76], bad)
                d = self.miner.connection.sendResult(formattedResult)
So when phoenix finds a share, it takes the nonce and submits it, but then increases the nonce by 1 and submits that "share" as well, and then continues until it has reached nonce+20.

I actually find 20 shares in my table when only one should be valid, and they all have "our_result == 1".
Maybe the problem is with Res.getEasyDifficultyTargetAsInteger()?
sr. member
Activity: 266
Merit: 254
This is interesting. I'm wondering, since your license says "no derivative works", what exactly am I allowed to do?

If I wanted to start my own pool and needed to change some parts of PoolServerJ (and let's assume I can't do this with a plugin, so I actually need to change YOUR code), this seems to be a "derivative work". Can I do this? I don't think so. Or maybe it's just forbidden to share my derivative work with others?

Good catch, that's an oversight on my part... as of v0.2.3 the code was published to https://bitbucket.org/shadders/bitcoin-poolserverj

As far as I'm concerned what's published is OSS.  I haven't updated the licenses yet; I will do so in the next release, which should be in the next couple of days.  If you want to be covered in the meantime, PM me an email address and I'll send you a 'permission to create derivatives' email.  I would just write it here but I don't think a forum post carries much legal standing.

The only parts that aren't fully OSS with published source are a few utility libraries of my own that I use for numerous unrelated projects.  The reason I haven't published those is that I don't want the hassle of maintaining a public repo for them all.  You are welcome to decompile and inspect the code; if you really want the source, contact me and I'll send you a copy of the current snapshots.  The ones I'm referring to are all the libs in the /lib/lib_non-maven folder, with the exception of the trove library (which is OSS and not written by me).
member
Activity: 98
Merit: 10
This is interesting. I'm wondering, since your license says "no derivative works", what exactly am I allowed to do?

If I wanted to start my own pool and needed to change some parts of PoolServerJ (and let's assume I can't do this with a plugin, so I actually need to change YOUR code), this seems to be a "derivative work". Can I do this? I don't think so. Or maybe it's just forbidden to share my derivative work with others?
newbie
Activity: 31
Merit: 0
I've been having exceptional results while testing this. Very low stales compared to pushpoold, like ~0.1% or less on average. Very stable compared to pushpoold. Fast restarts, and it picks up the shares it had in the air when you do have to restart. And it runs fast and doesn't eat endless memory; the GC process seems to work great. Really fantastic so far.

Wanted to clarify - The only reason I've restarted poolj so far was because I wanted to, not had to.
sr. member
Activity: 266
Merit: 254
0.2.7 released

[0.2.7]

- fix: accidentally hardcoded our_result = false
- enabled reporting the winning share to blockChainTracker, which enables notifyBlockChangeMethod to report whether the block was won or not.
- fix: UniqueDataPortion.equals not comparing properly, breaking hashmap lookups and making valid work submits report as unknown.
- added method "flushWorkers" to the mgmt interface, which works just like "flushWorker" except it takes a "names" param with a comma delimited list
sr. member
Activity: 266
Merit: 254
Thanks heaps to the people who have been testing it... We've uncovered a few major bugs that have been rectified and will be in the next release, 0.2.7.

If you're using 0.2.6 or below then please be aware one of those bugs prevents valid shares from being accepted.

I have noticed during all the debug/troubleshooting processes I've gone through with various people that the size of the config file is pretty cumbersome... What are people's thoughts on making it optional to split it?  E.g. use a master properties file with an 'include' that allows you to separate, for example, database and source settings into different files.  That way you could have a main config, then plug in a different database config for different scenarios...  What do people think?  Is it worthwhile?
It's not really difficult to do timewise... But the benefit would be mainly for debug/test scenarios, allowing you to switch over to test setups with more confidence that your other configuration is consistent.