I've made a big step forward in showing that Momentum is not nearly as strong a proof-of-work as intended. The magic lies in using a Bloom filter to store the intermediate hashes.
As a result, instead of using 12 bytes per hash/nonce pair in a semi-ordered map (which results in ~750 MB of memory), the required memory is -(2^26 * ln(0.01)) / (ln 2)^2 bits, or about ~76 MB.
This number can be reduced arbitrarily if we're willing to accept a false positive rate greater than 1%. For example, if we allowed up to a 50% chance of a false positive, the memory requirement drops to ~11 MB.
Here's an overview of how the algorithm works:
- Make a "main" Bloom filter of size 2^26 with a false positive rate of 1%: ~76 MB
- Make a tiny "clash" Bloom filter of size 2^16 with a false positive rate of 2^-26: ~0.7 MB
- Make a vector of pairs<hash, nonce> to store candidate birthday collisions.
- For each nonce in the search space, check if its hash exists in the "main" Bloom filter. If it does, add its hash to the "clash" Bloom filter.
- The "main" Bloom filter is no longer required and can be discarded.
- For each nonce in the search space, check if its hash exists in the "clash" Bloom filter. If it does, add <hash, nonce> to the candidate list for investigation.
- Sort the candidate list by hash.
- For each pair in the candidate list, see if the previous element has the same hash. If it does, add it to the output list. This step removes false positives by comparing the actual hashes instead of the Bloom filter's approximation.
- Return the output list as normal.
For your testing pleasure, I also built a working proof of concept. (Most of the magic is right here. The Bloom filter is a modified version of "bloom.cpp" called "bigbloom.cpp".)
Unmodified source:
https://i.imgur.com/k2cNrmd.png
Source using bloom filters:
https://i.imgur.com/w8Enf9e.png
In exchange for lowering the memory requirement by a factor of ~10, the algorithm runs at about 1/4 speed, mainly due to the doubled calls to SHA512 and the hash calls within the Bloom filters. The overall result is a net efficiency gain of approximately 2.5x. The reduction in memory requirement means that if we could fit N instances of Momentum in GPU memory, we can instead fit 10*N instances. If we raise the false positive rate in exchange for more time spent sorting, we can achieve ratios of up to 70*N.
Given that Bloom filters, SHA512, and sorting are all parallel/GPU-friendly, we can conclude that Momentum as a proof-of-work isn't nearly as GPU-hard as initially intended.