I've made a big step forward in showing that Momentum is not nearly as strong a proof-of-work as intended. The magic lies in using a Bloom filter to store the intermediate hashes.
As a result, instead of using 12 bytes per hash/nonce pair in a semi-ordered map (which results in ~750 MB of memory), the required memory is -(2^26 * ln(0.01)) / (ln 2)^2 bits, or about ~76 MB.
This number can be reduced arbitrarily if we're willing to accept a false positive rate greater than 1%. For example, if we allowed up to a 50% chance of a false positive, the memory requirement drops to ~11 MB.
Here's an overview of how the algorithm works:
- Make a "main" Bloom filter of size 2^26 with a false positive rate of 1%: ~76 MB
- Make a tiny "clash" Bloom filter of size 2^16 with a false positive rate of 2^-26: ~0.7 MB
- Make a vector of pairs<hash, nonce> to store candidate birthday collisions.
- For each nonce in the search space, check if its hash exists in the "main" Bloom filter. If it does, add its hash to the "clash" Bloom filter.
- The "main" Bloom filter is no longer required and can be discarded.
- For each nonce in the search space, check if its hash exists in the "clash" Bloom filter. If it does, add <hash, nonce> to the candidate list for investigation.
- Sort the candidate list by hash.
- For each pair in the candidate list, see if the previous element has the same hash. If it does, add it to the output list. This step removes false positives by comparing the actual hashes instead of the Bloom filter's approximation.
- Return the output list as normal.
For your testing pleasure, I also built a working proof of concept. (Most of the magic is right here. The Bloom filter is a modified version of "bloom.cpp" called "bigbloom.cpp".)
Unmodified source:
https://i.imgur.com/k2cNrmd.png
Source using bloom filters:
https://i.imgur.com/w8Enf9e.png
In exchange for lowering the memory requirement by a factor of ~10, the algorithm runs at about 1/4 speed, mainly due to the doubled calls to SHA512 and the hash calls within the Bloom filters. The overall result is a net efficiency gain of approximately 2.5x. The reduction in memory requirement means that if we could fit N instances of Momentum in GPU memory, we can instead fit 10*N instances. If we raise the false positive rate in exchange for more time spent sorting, we can achieve ratios of up to 70*N.
Given that Bloom filters, SHA512, and sorting are all parallel/GPU-friendly, we can conclude that Momentum as a proof-of-work isn't nearly as GPU-hard as initially intended.