Hello. I see you are using pident for choosing where to hop. I feel obliged to remind you that its accuracy is fairly low (http://pident.artefact2.com/accuracy).
But hey, you said you tweaked it so it gathers more data… If you managed to improve the accuracy, would you care to share the source so that I can merge it? I know the WTFPL doesn't require you to do that, but it would be nice. If your public domain miner relies on a closed-source web API, then it's of no use.
The only reason it isn't released anywhere is that it's not up to my own standards for sharing (it has a bit of site-specific code in it, and isn't very well organized, because I changed how I implemented it halfway through). If I have a chance, I'll try to refactor it a bit and fork the pident repository on GitHub. However, even if I do, it probably wouldn't be very mergeable. I tried to work entirely on top of pident, rather than modifying it, for two reasons:
- First, this made it easier to merge any changes you made.
- Second, I gave up trying to figure out exactly where in the code the scoring was happening. I think I've figured it out since then, but as you'll see below, I'm not sure the work I've done is directly applicable to improving the scores themselves.
Regardless, I can at the very least attempt to describe my current thought process and algorithm.
The first instinct is to just take whichever pool has the highest score and assume that pool found the block. As you've said, however, the accuracy of that method isn't very high. For the purposes of pool hopping, we don't need to guess which pool found the block. Rather, we want to take all the available information and assign a probability to each pool. Using these probabilities, we can calculate the expected utility of mining at each pool.
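(To make that last step concrete, here's a minimal sketch in Python. The two utility functions are hypothetical placeholders for whatever payout model you use; the relevant point for hopping is that if a pool found the block, its round just reset, which changes the expected value of mining there. None of these names come from pident.)

def expected_utilities(prob_found_block, utility_new_round, utility_old_round):
    # prob_found_block: {pool: P(pool found the latest block)}
    # Weight the "round just reset" and "round still running" cases
    # by the probability of each, per pool.
    return {
        pool: p * utility_new_round(pool) + (1.0 - p) * utility_old_round(pool)
        for pool, p in prob_found_block.items()
    }

# Hop to whichever pool maximizes expected utility:
# best = max(utilities, key=utilities.get)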
At first, I just used each pool's percentage of the overall network hash rate, which is accurate as a baseline, but I figured it should be possible to do better with additional information, namely pident's scoring. The trick is figuring out how to massage the scores pident produces into probabilities. To that end, I sampled an equal number of blocks from each pool's set, producing a block universe where each pool is equally likely to find a block. pident's scoring is based on identifying unique characteristics of each pool's blocks. It (to my knowledge) has no idea that some pools are simply more likely to find blocks, so modeling on this sampled block universe makes sense to me.
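(The sampling step is simple enough; something like this, assuming the blocks have already been grouped by pool:)

import random

def balanced_sample(blocks_by_pool):
    # Draw the same number of blocks from every pool, so that in the
    # sampled universe each pool is equally likely to have found a block.
    n = min(len(blocks) for blocks in blocks_by_pool.values())
    return {pool: random.sample(blocks, n)
            for pool, blocks in blocks_by_pool.items()}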
Treating the scores, then, as relative confidences that each pool found the given block, we rescale them so they sum to 1; that way they at least have a chance of representing probabilities. Once I did this and plotted, for each range of rescaled scores, how often the pool actually did find the block, there was a definite pattern. (In the particular data set I looked at, a rescaled score of 0.06-0.07 implied a ~13.5% chance that the pool found that block.) By the time we reach scores of 0.1, the probability is almost 100%. Ultimately, I modeled the data using the atan function, which matches the shape of the curve. (Does this make theoretical sense? Who knows, but as long as it works practically, I'll go with it.) The idea here is to scale the scores so that a score of 0.5 implies a 50% probability. I eyeballed the function's parameters, and it's not very forum-friendly, but it matches the data fairly closely.
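(In code, the rescaling plus a generic atan fit looks roughly like this. steepness and midpoint stand in for my eyeballed parameters, which I'm not reproducing here:)

import math

def rescale(scores):
    # Rescale one block's raw pident scores so they sum to 1.
    total = sum(scores.values())
    return {pool: s / total for pool, s in scores.items()}

def score_to_prob(x, steepness, midpoint):
    # Map a rescaled score to a probability via a shifted atan curve.
    # midpoint is the score at which the probability crosses 50%; the
    # output always lies in (0, 1).  Both parameters are stand-ins for
    # the eyeballed fit, not the actual values.
    return 0.5 + math.atan(steepness * (x - midpoint)) / math.pi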
The end result of all this massaging of the original scores produced by pident is a rough probability that the block belongs to each pool, but only in the magical "every pool is equally likely" universe. To get back to the real world, we multiply each score by the pool's percentage of the overall hash rate. The last step is to rescale these final values so they again sum to 1. That's the final probability I use in calculating the data in the API.
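(Put together, this final step is effectively a Bayesian update: the calibrated score acts as the likelihood and the hash rate share as the prior. A sketch:)

def final_probabilities(equal_prior_probs, hashrate_share):
    # Weight each "equal universe" probability by the pool's share of
    # the network hash rate, then renormalize so the results sum to 1.
    weighted = {pool: p * hashrate_share[pool]
                for pool, p in equal_prior_probs.items()}
    total = sum(weighted.values())
    return {pool: w / total for pool, w in weighted.items()}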
The next step is to test these numbers and make sure they at least come close to matching which pools actually found the blocks. I'll probably wait a few days so I have some out-of-sample data before I embark on that quest, however.
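(One simple way to run that check once the out-of-sample data exists: bucket the predicted probabilities and compare each bucket against the observed hit rate. Just an illustrative sketch:)

from collections import defaultdict

def calibration_check(predictions, outcomes, bins=10):
    # predictions and outcomes are parallel lists: P(pool X found the
    # block) and 1 or 0 for whether it actually did.  A well-calibrated
    # model has each bucket's hit rate close to its predicted probability.
    buckets = defaultdict(list)
    for p, hit in zip(predictions, outcomes):
        buckets[min(int(p * bins), bins - 1)].append(hit)
    return {(b + 0.5) / bins: sum(hits) / len(hits)
            for b, hits in sorted(buckets.items())}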
Other next steps:
- Getting the code postable. I have no desire to keep it close to the vest.
- Deciding on the best way to implement the hopping strategy implied by the API. poclbm is kind of a pain. I contemplated modifying cgminer, but that's quite a chore. If recent improvements to bitHopper have made it more usable (e.g., working long polling (LP) support, and not spamming pools with high-frequency getwork requests), it might be best to implement it there. I'd have to compare its average reject percentage with poclbm's.