
Topic: Momentum Proof-of-Work Bounty - 30 BTC [PAID] (Read 14110 times)

legendary
Activity: 990
Merit: 1108
January 29, 2016, 08:53:40 AM
#98
What's the status of Momentum POW? Was it used in Bitshares? Is it used in any coin?

It is broken beyond repair according to

https://eprint.iacr.org/2015/946.pdf

Their Proposition 9 states a sublinear time-memory tradeoff.
For instance, collisions can be found in memory O(sqrt(N)) and time O(N^(5/4)),
instead of memory and time O(N).

This Proposition is based on the golden collision search technique, described in Section 4.2 of

http://people.scs.carleton.ca/~paulv/papers/JoC97.pdf
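
A quick sanity check on how those two operating points relate (my own reading of the golden-collision-search tradeoff, not a line quoted from either paper): with N birthday candidates and memory M, the time/memory curve behaves roughly like

T(M) \approx \frac{N^{3/2}}{\sqrt{M}}, \qquad T(\sqrt{N}) \approx N^{5/4}, \qquad T(N) \approx N,

i.e. the O(sqrt(N))-memory, O(N^(5/4))-time point and the naive O(N)/O(N) point sit on the same curve.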

p2p
newbie
Activity: 32
Merit: 0
What's the status of Momentum POW? Was it used in Bitshares? Is it used in any coin?
full member
Activity: 154
Merit: 100
I've managed to make a huge step forward in proving that Momentum is not nearly as good a proof-of-work as intended.

The magic lies in using a Bloom filter to store the intermediate hashes.
As a result, instead of using 12 bytes per hash/nonce in a Semi-Ordered Map (which results in ~750 MB of memory), the required memory is -(2^26 * ln(0.01)) / (ln(2)^2) bits, or roughly 76 MB.

This number can be reduced arbitrarily if we're willing to have a false positive rate greater than 1%.  For example, if we allowed up to a 50% chance of having a false positive, the memory requirement drops to ~11 MB.
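
For anyone who wants to reproduce those figures, here is a minimal sketch of the standard Bloom filter sizing formula (the helper name bloom_bits is mine, not taken from the proof-of-concept code):

Code:
#include <cmath>
#include <cstdio>

// Standard Bloom filter sizing: bits needed for n elements at false-positive rate p.
// m = -n * ln(p) / (ln 2)^2
static double bloom_bits(double n, double p) {
    return -n * std::log(p) / (std::log(2.0) * std::log(2.0));
}

int main() {
    const double n = std::pow(2.0, 26);                                         // 2^26 hash/nonce entries
    std::printf("p = 1%%  -> %.1f MB\n", bloom_bits(n, 0.01) / 8 / (1 << 20));  // ~76 MB
    std::printf("p = 50%% -> %.1f MB\n", bloom_bits(n, 0.50) / 8 / (1 << 20));  // ~11 MB
    return 0;
}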


Here's an overview of how the algorithm works (a rough code sketch follows the list):

  • Make a "main" bloom filter of size 2^26 with a false positive rate of 1%: ~76 MB
  • Make a tiny "clash" bloom filter of size 2^16 and false positive rate 2^-26: ~0.7 MB
  • Make a vector of pairs< hash, nonce > to store candidate birthday collisions.
  • For each nonce in the search space, check if its hash exists in the "main" bloom filter.  If it does, add its entry to the "clash" bloom filter.
  • The "main" bloom is no longer required and can be discarded.
  • For each nonce in the search space, check if its hash exists in the "clash" bloom filter.  If it does, add < hash, nonce > to a candidate list for investigation.
  • Sort the list of candidates by hash.
  • For each pair in the candidate list, see if the previous element has the same hash.  If it does, add it to the output list.  This step removes false positives by comparing the actual hash instead of the bloom filter's idea of a hash.
  • Return the output list as normal.
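
For readers who want the shape of it in code, a compressed sketch of the two-pass idea (a toy Bloom filter and a toy birthday() stand in for bigbloom.cpp and the real SHA512-derived birthdays; all names are illustrative, not the actual proof-of-concept interface):

Code:
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Minimal Bloom filter: k hash probes over an m-bit array.
struct Bloom {
    std::vector<uint64_t> bits;
    int k;
    Bloom(size_t m_bits, int k_hashes) : bits((m_bits + 63) / 64), k(k_hashes) {}
    static uint64_t mix(uint64_t x, uint64_t seed) {
        x += seed; x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL; x ^= x >> 33;
        return x;
    }
    void add(uint64_t h) {
        for (int i = 0; i < k; ++i) {
            uint64_t b = mix(h, i) % (bits.size() * 64);
            bits[b / 64] |= 1ULL << (b % 64);
        }
    }
    bool maybe_contains(uint64_t h) const {
        for (int i = 0; i < k; ++i) {
            uint64_t b = mix(h, i) % (bits.size() * 64);
            if (!(bits[b / 64] & (1ULL << (b % 64)))) return false;
        }
        return true;
    }
};

// Toy stand-in for the SHA512-derived birthday of a nonce (26-bit birthday space).
static uint64_t birthday(uint64_t nonce) { return Bloom::mix(nonce, 0xb10cULL) & ((1ULL << 26) - 1); }

static std::vector<std::pair<uint64_t, uint64_t>> find_collisions(uint64_t nonce_count) {
    Bloom main_filter(643'000'000, 7);   // ~2^26 entries at p ~ 1%  (~76 MB)
    Bloom clash_filter(2'500'000, 26);   // sized for ~2^16 expected entries at p ~ 2^-26

    // Pass 1: a birthday that is (probably) already in "main" goes into "clash";
    // otherwise it is recorded in "main" so later nonces can see it.
    for (uint64_t n = 0; n < nonce_count; ++n) {
        uint64_t h = birthday(n);
        if (main_filter.maybe_contains(h)) clash_filter.add(h);
        else main_filter.add(h);
    }
    // Pass 2: collect the (hash, nonce) pairs that hit the clash filter.
    std::vector<std::pair<uint64_t, uint64_t>> candidates;
    for (uint64_t n = 0; n < nonce_count; ++n) {
        uint64_t h = birthday(n);
        if (clash_filter.maybe_contains(h)) candidates.emplace_back(h, n);
    }
    // Sort by hash; adjacent equal hashes are genuine collisions (false positives drop out).
    std::sort(candidates.begin(), candidates.end());
    std::vector<std::pair<uint64_t, uint64_t>> out;
    for (size_t i = 1; i < candidates.size(); ++i)
        if (candidates[i].first == candidates[i - 1].first)
            out.push_back(candidates[i]);   // its partner is the previous entry
    return out;
}

int main() {
    std::printf("collisions found: %zu\n", find_collisions(1u << 20).size());
    return 0;
}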


For your testing pleasure, I also built a working proof of concept.
(Most of the magic is right here.  The bloom filter is a modified version of "bloom.cpp" called "bigbloom.cpp")

Unmodified source: https://i.imgur.com/k2cNrmd.png

Source using bloom filters: https://i.imgur.com/w8Enf9e.png

In exchange for lowering the memory requirements by a factor of ~10, the algorithm runs at about 1/4th speed, mainly due to the doubled calls to SHA512 and the hash calls within the bloom filters.  The overall result is a net efficiency gain of approximately 2.5x.
The reduction in memory requirement means that if we could fit N instances of Momentum in GPU memory, we can instead fit 10*N instances.  If we raise the false positive rate in exchange for more time spent sorting, we can achieve ratios of up to 70*N.

Given that bloom filters, SHA512, and sorting data are all parallel/GPU friendly, we can conclude that Momentum as a proof-of-work isn't nearly as GPU-hard as initially intended.


Yes, very interesting concept. With so many problems with people pushing unfair bots into this right now... it looks pretty bad. There are so many coins right now that are easy and inexpensive to set up and start running fast, making money and lifting this whole thing up. Use the money you make to help others know what can be done. As long as you feel comfortable with it, go for it!
legendary
Activity: 990
Merit: 1108
hero member
Activity: 770
Merit: 566
fractally
The magic lies in using a Bloom filter to store the intermediate hashes.
For the sake of history Smiley Bloom filters were first mentioned in this thread in a mysteriously disappeared message by iruu, right before your post.
Quote
Algorithm:
We have pw parallel workers and one m-bit bloom filter, initialized to zero.

This bounty has been awarded to gigawatt for his excellent work.  Great progress, all!
hero member
Activity: 524
Merit: 500
The magic lies in using a Bloom filter to store the intermediate hashes.
For the sake of history Smiley Bloom filters were first mentioned in this thread in a mysteriously disappeared message by iruu, right before your post.
Quote
Algorithm:
We have pw parallel workers and one m-bit bloom filter, initialized to zero.
full member
Activity: 168
Merit: 100
All further discussion in this thread should be moved here: http://bitsharestalk.org/index.php?topic=22.0

I do not have time to follow two threads or repeat myself... :-)

gigawatt:  join us over at bitsharestalk, I will pay you a 0.5 BTC tip for your contribution.


I scooted my conversation.  My address is in my signature. Cheesy
hero member
Activity: 770
Merit: 566
fractally
All further discussion in this thread should be moved here: http://bitsharestalk.org/index.php?topic=22.0

I do not have time to follow two threads or repeat myself... :-)

gigawatt:  join us over at bitsharestalk, I will pay you a 0.5 BTC tip for your contribution.
hero member
Activity: 798
Merit: 1000
I wonder what would happen if we used NESTED momentum proof of work? 

Change the nonce-space of the outer proof of work to a pair of 16-bit numbers that result in an X-bit collision?

Now you have a more memory-intensive inner hash that is still quick to validate, but one that would significantly complicate GPU miners.

AnonyMint suggested this very thing but with scrypt at one point. However, I think it would have made verifying even more difficult. He would know better. He would probably also be a reasonable judge of this corollary to that idea.
hero member
Activity: 770
Merit: 566
fractally
I wonder what would happen if we used NESTED momentum proof of work? 

Change the nonce-space of the outer proof of work to a pair of 16-bit numbers that result in an X-bit collision?

Now you have a more memory-intensive inner hash that is still quick to validate, but one that would significantly complicate GPU miners.
hero member
Activity: 770
Merit: 566
fractally
I suppose the performance of your algorithm would also suffer if SCRYPT(X) were used instead of SHA512(X), because the cost of doing this step twice would be much higher and less GPU-friendly.
hero member
Activity: 770
Merit: 566
fractally
gigawatt,
     Thank you for providing the first innovative algorithm for reducing the memory requirements.  Let me attempt to post mitigating factors to your algorithm.

     From a CPU miner perspective, your reduction in memory comes at the expense of performance, so it does not break the algorithmic complexity of the algorithm.

     From a GPU perspective you have to populate a bloom filter with 2^26 results... based upon my understanding of how bloom filters operate, this would require updating a common data structure from every thread, and the resulting memory race conditions could create false negatives.  If you have to do this step sequentially, then you might as well use a CPU with memory.

     So do you have any solid algorithms that can populate a bloom filter with 2^26 results in parallel?  
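
For what it's worth, here is one textbook way the shared update could be made race-free on a CPU, using an atomic OR per word. This is a sketch of the general technique only, not a claim about bigbloom.cpp or about what a GPU kernel would actually achieve; the cost of contended atomics is exactly the open question.

Code:
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Bloom insert where each word is updated with an atomic OR. Bits only ever go
// 0 -> 1, so no concurrent writer can undo another's update, and the race cannot
// introduce false negatives. A GPU would use its own atomic-OR primitive; whether
// that is fast enough under contention is the real question.
struct AtomicBloom {
    std::vector<std::atomic<uint64_t>> bits;
    int k;
    AtomicBloom(size_t m_bits, int k_hashes) : bits((m_bits + 63) / 64), k(k_hashes) {}
    static uint64_t mix(uint64_t x, uint64_t seed) {
        x += seed; x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL; x ^= x >> 33;
        return x;
    }
    void add(uint64_t h) {                       // safe to call from many threads at once
        for (int i = 0; i < k; ++i) {
            uint64_t b = mix(h, i) % (bits.size() * 64);
            bits[b / 64].fetch_or(1ULL << (b % 64), std::memory_order_relaxed);
        }
    }
};

int main() {
    AtomicBloom filter(643'000'000, 7);          // same sizing as the ~76 MB "main" filter
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back([&, t] {
            for (uint64_t n = t; n < (1u << 20); n += 4) filter.add(AtomicBloom::mix(n, 0));
        });
    for (auto& w : workers) w.join();
    return 0;
}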
full member
Activity: 168
Merit: 100
I've managed to make a huge step forward in proving that Momentum is not nearly as good a proof-of-work as intended.

The magic lies in using a Bloom filter to store the intermediate hashes.
As a result, instead of using 12 bytes per hash/nonce in a Semi-Ordered Map (which results in ~750 MB of memory), the required memory is -(2^26 * ln(0.01)) / (ln(2)^2) bits, or roughly 76 MB.

This number can be reduced arbitrarily if we're willing to have a false positive rate greater than 1%.  For example, if we allowed up to a 50% chance of having a false positive, the memory requirement drops to ~11 MB.


Here's an overview of how the algorithm works:

  • Make a "main" bloom filter of size 2^26 with a false positive rate of 1%: ~76 MB
  • Make a tiny "clash" bloom filter of size 2^16 and false positive rate 2^-26: ~0.7 MB
  • Make a vector of pairs< hash, nonce > to store candidate birthday collisions.
  • For each nonce in the search space, check if its hash exists in the "main" bloom filter.  If it does, add its entry to the "clash" bloom filter.
  • The "main" bloom is no longer required and can be discarded.
  • For each nonce in the search space, check if its hash exists in the "clash" bloom filter.  If it does, add < hash, nonce > to a candidate list for investigation.
  • Sort the list of candidates by hash.
  • For each pair in the candidate list, see if the previous element has the same hash.  If it does, add it to the output list.  This step removes false positives by comparing the actual hash instead of the bloom filter's idea of a hash.
  • Return the output list as normal.


For your testing pleasure, I also built a working proof of concept.
(Most of the magic is right here.  The bloom filter is a modified version of "bloom.cpp" called "bigbloom.cpp")

Unmodified source: https://i.imgur.com/k2cNrmd.png

Source using bloom filters: https://i.imgur.com/w8Enf9e.png

In exchange for lowering the memory requirements by a factor of ~10, the algorithm runs at about 1/4th speed, mainly due to the doubled calls to SHA512 and the hash calls within the bloom filters.  The overall result is a net efficiency gain of approximately 2.5x.
The reduction in memory requirement means that if we could fit N instances of Momentum in GPU memory, we can instead fit 10*N instances.  If we raise the false positive rate in exchange for more time spent sorting, we can achieve ratios of up to 70*N.

Given that bloom filters, SHA512, and sorting data are all parallel/GPU friendly, we can conclude that Momentum as a proof-of-work isn't nearly as GPU-hard as initially intended.

hero member
Activity: 524
Merit: 500
Re-posting here, just in case Smiley
- hash the nonce space into birthday space in parallel, get an array of (nonce, birthday)
- sort it with bitonic
- find duplicates
Bonus: don't store birthdays at all, calculate them ad hoc during the sort
This is not an attack on the algorithm itself, it's just a way to fit the existing algorithm onto a GPU (a rough sketch follows).
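
A CPU-side sketch of that pipeline, with std::sort standing in for the bitonic sorting network and a toy birthday() standing in for the real SHA512-derived birthdays (all names are illustrative only):

Code:
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Toy stand-in for the SHA512-derived birthday of a nonce (26-bit birthday space).
static uint64_t birthday(uint64_t nonce) {
    uint64_t x = nonce * 0x9e3779b97f4a7c15ULL;
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    return (x ^ (x >> 31)) & ((1ULL << 26) - 1);
}

int main() {
    const uint64_t nonce_count = 1u << 20;

    // 1. Hash the nonce space into (birthday, nonce) pairs; on a GPU each thread
    //    fills its own slice of this array in parallel.
    std::vector<std::pair<uint64_t, uint64_t>> a(nonce_count);
    for (uint64_t n = 0; n < nonce_count; ++n) a[n] = {birthday(n), n};

    // 2. Sort by birthday; on a GPU this is the bitonic sorting network.
    //    (The bonus variant stores only nonces and recomputes birthday(nonce)
    //    inside the comparator instead of storing the pairs.)
    std::sort(a.begin(), a.end());

    // 3. Adjacent equal birthdays are the collisions.
    size_t collisions = 0;
    for (size_t i = 1; i < a.size(); ++i)
        if (a[i].first == a[i - 1].first) ++collisions;
    std::printf("collisions found: %zu\n", collisions);
    return 0;
}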
hero member
Activity: 770
Merit: 566
fractally
I have updated the terms of the bounty... http://bitsharestalk.org/index.php?topic=22.0
hero member
Activity: 770
Merit: 566
fractally
Further discussion on the Momentum Proof-of-Work should be moved here: http://bitsharestalk.org/index.php?topic=21.msg24#msg24
hero member
Activity: 770
Merit: 566
fractally
The memory access pattern is 'random', so the CPU cache doesn't help: there is no locality of memory access to exploit.
sr. member
Activity: 243
Merit: 250
What exactly is the "memory bus speed"? Is it the RAM speed (e.g., PC12800), or does the QPI become the bottleneck?

Would a larger CPU cache help performance a little?
legendary
Activity: 1470
Merit: 1030
This assumes ruling out the trivial A=B, of course.

Yeah, I think we're on top of that one Wink
hero member
Activity: 770
Merit: 566
fractally
Note I added to my last reply...  I certainly don't want you wishing you hadn't replied.