
Topic: Momentum Proof-of-Work Bounty - 30 BTC [PAID] - page 4. (Read 14110 times)

hero member
Activity: 524
Merit: 500
The GPU can produce blocks of 100K nonce/hash pairs which a CPU will then push into the hash table at 100K / sec, and the result would be a significant GPU advantage. 
Just a quick thought: you'd better oversaturate PCIe 3.0 x16 bandwidth to be sure. But that will require a really fast hash function...
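Back-of-envelope on that point (assuming 16-byte nonce/hash pairs and roughly 16 GB/s of usable PCIe 3.0 x16 bandwidth, which are my assumptions rather than figures from the thread): 16 GB/s / 16 bytes per pair ≈ 10^9 pairs per second before the bus itself becomes the bottleneck, so the per-nonce hash really would have to be extremely cheap.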
hero member
Activity: 770
Merit: 568
fractally
My aim was to create the fastest hash function I could that met one key criterion... unpredictable run length.   The unpredictable run length means an unpredictable number of iterations of some fast algorithm.

Ok, and why is that important to you? I'm probably thinking in narrower terms (just for an altcoin), so I don't see the virtue in this - can you explain more, please?

The other criterion was to use just enough memory to foil GPU parallelization by consuming a lot of GPU cache.   The point of Scrypt was to make parallel operation difficult, and its primary weakness is that validation time suffers as you scale it.

It sounds like you're employing two methods to make the coin GPU resistant - contention for access to the memory/hashtable and requiring GPU cache. I think this actually doubles the 'attack' space . . . suggest focusing on one, and I recommend the first one (y'know, your new brilliant idea), as the second has already seen a lot of effort.

Unpredictable run length foils GPU threads by having many of the data-parallel threads idle waiting on the longer-running threads.

Fortunately I defined the white paper in generic terms, so we can use whatever hash method you like.   I will point out, though, that there are means of avoiding synchronization on the hash table through pipelining.  The GPU can produce blocks of 100K nonce/hash pairs which a CPU will then push into the hash table at 100K / sec, and the result would be a significant GPU advantage. 
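A rough sketch of that pipelining idea (the batch size, the toy hash, and the table layout below are assumptions for illustration, not the Momentum reference code): a producer thread stands in for the GPU and emits batches of nonce/hash pairs, while a consumer thread pushes them into the collision table.

Code:
// Sketch only: the producer models the GPU emitting batches of (nonce, birthday)
// pairs; the consumer models the CPU filling the hash table and counting matches.
#include <cstdint>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <unordered_map>
#include <vector>

constexpr std::size_t BATCH_SIZE  = 100000;  // "blocks of 100K nonce/hash pairs"
constexpr std::size_t NUM_BATCHES = 50;
constexpr int BIRTHDAY_BITS       = 36;      // collision width discussed later in the thread

// Toy stand-in for BirthdayHash: a fast 64-bit mixer truncated to BIRTHDAY_BITS.
uint64_t birthday_hash(uint64_t nonce) {
    uint64_t x = nonce * 0x9E3779B97F4A7C15ULL;
    x ^= x >> 29; x *= 0xBF58476D1CE4E5B9ULL;
    x ^= x >> 32;
    return x & ((1ULL << BIRTHDAY_BITS) - 1);
}

struct Pair { uint64_t nonce, birthday; };

std::queue<std::vector<Pair>> pipeline;
std::mutex mtx;
std::condition_variable cv;
bool done = false;

void producer() {                            // models the GPU side
    uint64_t nonce = 0;
    for (std::size_t b = 0; b < NUM_BATCHES; ++b) {
        std::vector<Pair> batch(BATCH_SIZE);
        for (auto& p : batch) { p.nonce = nonce; p.birthday = birthday_hash(nonce); ++nonce; }
        std::lock_guard<std::mutex> lk(mtx);
        pipeline.push(std::move(batch));
        cv.notify_one();
    }
    std::lock_guard<std::mutex> lk(mtx);
    done = true;
    cv.notify_one();
}

void consumer() {                            // models the CPU pushing into the hash table
    std::unordered_map<uint64_t, uint64_t> table;   // birthday -> nonce
    std::size_t collisions = 0;
    for (;;) {
        std::unique_lock<std::mutex> lk(mtx);
        cv.wait(lk, [] { return !pipeline.empty() || done; });
        if (pipeline.empty()) break;
        std::vector<Pair> batch = std::move(pipeline.front());
        pipeline.pop();
        lk.unlock();
        for (const Pair& p : batch)
            if (!table.emplace(p.birthday, p.nonce).second)
                ++collisions;                // a matching pair is a "lottery ticket"
    }
    std::cout << "collisions found: " << collisions << "\n";
}

int main() {
    std::thread prod(producer), cons(consumer);
    prod.join();
    cons.join();
}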
legendary
Activity: 1470
Merit: 1030
My aim was to create the fastest hash function I could that met one key criterion... unpredictable run length.   The unpredictable run length means an unpredictable number of iterations of some fast algorithm.

Ok, and why is that important to you? I'm probably thinking in narrower terms (just for an altcoin), so I don't see the virtue in this - can you explain more, please?

The other criterion was to use just enough memory to foil GPU parallelization by consuming a lot of GPU cache.   The point of Scrypt was to make parallel operation difficult, and its primary weakness is that validation time suffers as you scale it.

It sounds like you're employing two methods to make the coin GPU resistant - contention for access to the memory/hashtable and requiring GPU cache. I think this actually doubles the 'attack' space . . . suggest focusing on one, and I recommend the first one (y'know, your new brilliant idea), as the second has already seen a lot of effort.
hero member
Activity: 770
Merit: 568
fractally
Good luck filling 1 GB with cryptographically secure data in 1 second.  In debug mode I cannot even memset 1 GB in 1 second.

Okay, we'd better inform the author of the white paper, who wrote,

> For example, simply populating 1 Gigabyte of memory with cryptographically secure pseudorandom data can take a second to perform.

Anyway, the exact speed is not important - the point is still the same: the slower the hash, the more room for parallelization.

LOL, what was the author who wrote that thinking?   My aim was to create the fastest hash function I could that met one key criterion... unpredictable run length.   The unpredictable run length means an unpredictable number of iterations of some fast algorithm.   The other criterion was to use just enough memory to foil GPU parallelization by consuming a lot of GPU cache.   The point of Scrypt was to make parallel operation difficult, and its primary weakness is that validation time suffers as you scale it.

You may wish to reconsider foiling efforts that create the most efficient market outcomes and use of resources.

Imagine a world where all transactions and communications are conducted through p2p networks.  This world needs a massive amount of computing power; networks which are costlier to operate lose out to networks which can be operated cost-efficiently.  Also keep in mind that GPU farms keep out botnets.

Efficiency and decentralization... are generally at odds with each other until we can come up with an alternative to proof of work.  Ripple and its consensus approach seem like a promising alternative. 
legendary
Activity: 1666
Merit: 1010
he who has the gold makes the rules
Good luck filling 1 GB with cryptographically secure data in 1 second.  In debug mode I cannot even memset 1 GB in 1 second.

Okay, we'd better inform the author of the white paper, who wrote,

> For example, simply populating 1 Gigabyte of memory with cryptographically secure pseudorandom data can take a second to perform.

Anyway, the exact speed is not important - the point is still the same: the slower the hash, the more room for parallelization.

LOL, what was the author who wrote that thinking?   My aim was to create the fastest hash function I could that met one key criterion... unpredictable run length.   The unpredictable run length means an unpredictable number of iterations of some fast algorithm.   The other criterion was to use just enough memory to foil GPU parallelization by consuming a lot of GPU cache.   The point of Scrypt was to make parallel operation difficult, and its primary weakness is that validation time suffers as you scale it.

You may wish to reconsider foiling efforts that create the most efficient market outcomes and use of resources.

Imagine a world where all transactions and communications are conducted through p2p networks.  This world needs a massive amount of computing power; networks which are costlier to operate lose out to networks which can be operated cost-efficiently.  Also keep in mind that GPU farms keep out botnets.
hero member
Activity: 770
Merit: 568
fractally
Good luck filling 1 GB with cryptographically secure data in 1 second.  In debug mode I cannot even memset 1 GB in 1 second.

Okay, we'd better inform the author of the white paper, who wrote,

> For example, simply populating 1 Gigabyte of memory with cryptographically secure pseudorandom data can take a second to perform.

Anyway, the exact speed is not important - the point is still the same: the slower the hash, the more room for parallelization.

LOL, what was the author who wrote that thinking?   My aim was to create the fastest hash function I could that met one key criterion... unpredictable run length.   The unpredictable run length means an unpredictable number of iterations of some fast algorithm.   The other criterion was to use just enough memory to foil GPU parallelization by consuming a lot of GPU cache.   The point of Scrypt was to make parallel operation difficult, and its primary weakness is that validation time suffers as you scale it.
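A minimal sketch of what "unpredictable run length" can mean in practice (the mixing constants and the stop rule are assumptions for illustration, not the actual BirthdayHash): the number of inner iterations depends on the evolving state, so data-parallel GPU threads launched together finish at different times and the faster ones sit idle.

Code:
// Sketch only: a fast 64-bit mixer iterated a data-dependent number of times.
#include <cstdint>
#include <iostream>

uint64_t mix(uint64_t x) {
    x ^= x >> 33; x *= 0xFF51AFD7ED558CCDULL;
    x ^= x >> 33; x *= 0xC4CEB9FE1A85EC53ULL;
    x ^= x >> 33;
    return x;
}

// Returns the hash; 'iters' reports how many rounds this nonce needed.
uint64_t unpredictable_hash(uint64_t nonce, int& iters) {
    uint64_t state = mix(nonce);
    iters = 0;
    // About 1 in 256 states ends the loop, so the expected run length is ~256
    // iterations, but any particular nonce may need far more or far fewer --
    // that variance is what leaves lockstep GPU threads idling on stragglers.
    while ((state & 0xFF) != 0 && iters < (1 << 16)) {
        state = mix(state);
        ++iters;
    }
    return state;
}

int main() {
    for (uint64_t nonce = 0; nonce < 8; ++nonce) {
        int iters;
        uint64_t h = unpredictable_hash(nonce, iters);
        std::cout << "nonce " << nonce << ": " << iters << " iterations, hash " << h << "\n";
    }
}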
legendary
Activity: 1470
Merit: 1030
Good luck filling 1 GB with cryptographically secure data in 1 second.  In debug mode I cannot even memset 1 GB in 1 second.

Okay, we'd better inform the author of the white paper, who wrote,

> For example, simply populating 1 Gigabyte of memory with cryptographically secure pseudorandom data can take a second to perform.

Anyway, the exact speed is not important - the point is still the same: the slower the hash, the more room for parallelization.
hero member
Activity: 770
Merit: 568
fractally
about 2 GB in 5 minutes...

Hmm - I think you're going to want something that uses 1GB in 1 second, run 300 times. If you're not keeping the memory bus or hashtable running at maximum capacity, you can just add extra cores until it is up to full capacity. That surely is what we're trying to avoid.

Good luck filling 1 GB with cryptographically secure data in 1 second.  In debug mode I cannot even memset 1 GB in 1 second.
legendary
Activity: 1470
Merit: 1030
about 2 GB in 5 minutes...

Hmm - I think you're going to want something that uses 1GB in 1 second, run 300 times. If you're not keeping the memory bus or hashtable running at maximum capacity, you can just add extra cores until it is up to full capacity. That surely is what we're trying to avoid.
hero member
Activity: 770
Merit: 568
fractally
I added some more instrumentation; the speed is closer to 50k hashes per second for the Birthday Hash algorithm.

Ok, so 50,000 x 60 x 5 x 36 bits = 540,000,000 bits, so writing roughly 64 MB to memory over the course of 5 minutes . . . not so taxing. I think we'd want to be maxing out the memory bus by creating hash collision candidates such that we're writing in the region of 1 GB/s to memory.

In my test benchmark, running on a single CPU core I utilize about 2 GB in 5 minutes... though I am not using the most efficient storage possible.   For starters, I am using a hash table with a 64-bit key (birthday) and a 64-bit value (nonce), so that is already about 4x as much memory as you calculated.  Now I suspect that some of that 2 GB is from the hash table not being 'full'.  There is obviously a memory/CPU/synchronization tradeoff that can be made with the hash table.

Going to 8 cores on an i7 will probably use at least 8 GB of RAM, which is consumer grade right now.
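For a rough feel for those numbers (an estimate under stated assumptions, not a measurement from this benchmark): each entry carries 16 bytes of payload, and a node-based std::unordered_map typically adds a few tens of bytes of node and bucket overhead per entry, which is how ~15 million hashes over 5 minutes can swell well past the raw payload toward the 2 GB observed above once a partially filled table and allocator slack are included.

Code:
// Back-of-envelope for the table above: 16 bytes of payload per entry plus an
// assumed ~48 bytes of node/bucket overhead for a node-based std::unordered_map.
// The overhead figure is a ballpark assumption, not a measured value.
#include <iostream>

int main() {
    const double hashes_per_sec = 50000.0;             // rate quoted earlier in the thread
    const double seconds        = 300.0;               // 5-minute block target
    const double entries        = hashes_per_sec * seconds;   // ~15 million

    const double payload_bytes  = 8 + 8;               // 64-bit birthday + 64-bit nonce
    const double overhead_bytes = 48.0;                // assumed per-entry container overhead

    std::cout << "entries:       " << entries << "\n"
              << "payload only:  " << entries * payload_bytes / (1 << 20) << " MiB\n"
              << "with overhead: " << entries * (payload_bytes + overhead_bytes) / (1 << 20) << " MiB\n";
}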
legendary
Activity: 1470
Merit: 1030
I added some more instrumentation; the speed is closer to 50k hashes per second for the Birthday Hash algorithm.

Ok, so 50,000 x 60 x 5 x 36 bits = 540,000,000 bits, so writing roughly 64 MB to memory over the course of 5 minutes . . . not so taxing. I think we'd want to be maxing out the memory bus by creating hash collision candidates such that we're writing in the region of 1 GB/s to memory.
hero member
Activity: 770
Merit: 568
fractally
I added some more instrumentation; the speed is closer to 50k hashes per second for the Birthday Hash algorithm.
legendary
Activity: 1470
Merit: 1030
The "+" operator is concatenation rather than addition so the math you are doing is not valid.

Ah - of course! Thank you.

BirthdayHash is only memory-intensive enough to limit the cache available to the GPU.  Scrypt-like hash functions have some benefit at fighting ASIC designs as long as their validation time does not exceed 1 ms.  By combining Scrypt-like functions with Momentum you get the best of all currently known techniques.

So if we assume 1 ms to generate a BirthdayHash, this gives us 1000 x 60 x 5 = 300,000 hashes for a single core during the target time for a block - very little memory required. The CPU and memory will be idling, waiting for more completed hashes from the GPU - you want it the other way round, a super-fast hash to fill up the memory fast.  
hero member
Activity: 770
Merit: 568
fractally

My Scrypt-like hashing function runs on a single core at about 1 khash/s with relatively few optimizations.

Ok. At that speed nodes in a DHT would only need a few kb/s of internet bandwidth to reach the full potential of the DHT. How much gain that would give would depend on the difficulty of finding a match. I think it would be cool if you made it even slower (if only by repeating it enough times) to make it suitable for one huge DHT. I don't know if it's possible, but for other reasons than the ones you brought up. Anyway, good luck, I think that you could have a great proof of work function with some modifications. I'm outta here!

It all comes down to a tradeoff with validation time.   A DHT like KAD would require much more bandwidth than the 'raw data' being published. 
newbie
Activity: 23
Merit: 0

My Scrypt-like hashing function runs on a single core at about 1 khash/s with relatively few optimizations.

Ok. At that speed nodes in a DHT would only need a few kb/s of internet bandwidth to reach the full potential of the DHT. How much gain that would give would depend on the difficulty of finding a match. I think it would be cool if you made it even slower (if only by repeating it enough times) to make it suitable for one huge DHT. I don't know if it's possible, but for other reasons than the ones you brought up. Anyway, good luck, I think that you could have a great proof of work function with some modifications. I'm outta here!
hero member
Activity: 770
Merit: 568
fractally
FYI... I do not consider the algorithm broken if software enhancements can improve performance.  I fully expect that, with all of the smart minds out there, the algorithmic performance on commodity hardware will improve by several orders of magnitude. 
hero member
Activity: 770
Merit: 568
fractally

Any time you throw away hashes because they fall outside of your memory range, you move the computational complexity from the p(n) curve to the q(n) curve in my white paper.  You save on memory but the cost is far greater.   Memory and CPU power must always be paired proportionally or you will have idle memory or wasted CPU cycles.


I don't wanna pollute your thread with many small replies. If you answer what hashing power we're looking at in Mhash/s, I could make a more thorough reply with numbers. Also, is 32-bit output for BirthdayHash right? That would only take storing about 77,000 entries for a 50% chance of finding a collision.


My Scrypt-like hashing function runs on a single core at about 1 khash/s with relatively few optimizations.
hero member
Activity: 770
Merit: 568
fractally

Any time you throw away hashes because they fall outside of your memory range, you move the computational complexity from the p(n) curve to the q(n) curve in my white paper.  You save on memory but the cost is far greater.   Memory and CPU power must always be paired proportionally or you will have idle memory or wasted CPU cycles.



I don't wanna pollute your thread with many small replies. If you answer what hashing power we're looking at in Mhash/s, I could make a more thorough reply with numbers. Also, is 32-bit output for BirthdayHash right? That would only take storing about 77,000 entries for a 50% chance of finding a collision.


There is a trade-off between latency and the rate of finding collisions.  I am actually using something closer to 36 bits.  Beyond 36 bits, the time it takes for a Core i7 to find a single collision would be too long.  Remember, the collision rate is only the rate at which lottery tickets are found.  Difficulty is the rate at which blocks could be found.   So this algorithm could be tuned to require no more than 4 GB of RAM, after which it produces a collision for every single BirthdayHash...  However, there are still benefits to storing and comparing all matches and not just single collisions, because every potential match is a lottery ticket.   Therefore the number of bits in the collision is almost irrelevant beyond a certain point.
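To put a number on the "lottery ticket" rate (standard birthday-problem arithmetic, not figures from this post): after inserting n hashes into a k-bit birthday space, the expected number of matches found so far is roughly n(n-1)/2 x 2^-k ≈ n^2 / 2^(k+1). The ticket count grows quadratically in the number of stored hashes regardless of k; changing k only rescales it, which is why the exact bit width matters less than the memory and latency trade-off.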
hero member
Activity: 770
Merit: 568
fractally
Is there a way that we can quantify the cost vs gain associated with developing specialized solutions to SHA, Scrypt, and Momentum?  

SHA is merely a game of transistors per chip and has no limits associated with a memory bus.  The speed of light is not a significant factor as the chips only have to 'communicate' matching results to the outside world.

Scrypt is effectively the same as SHA with the exception that your density per ASIC is reduced.  Once again there are no speed-of-light limitations because only matching results need to be communicated outside the chip itself.  

The proof of the nature of these algorithms is the existence of a mining pool which effortlessly aggregates any number of computational units at a global scale to produce what is effectively one giant computer.

Attempt to build a mining-pool supercomputer for Momentum and you will quickly discover that bandwidth and the speed of light make it so expensive as to be infeasible.    So while there may be a theoretical quadratic return in algorithm performance for increasing the amount of RAM you have, the speed of light and density limitations kick in and limit performance beyond a certain point.



newbie
Activity: 23
Merit: 0

Any time you throw away hashes because they fall outside of your memory range, you move the computational complexity from the p(n) curve to the q(n) curve in my white paper.  You save on memory but the cost is far greater.   Memory and CPU power must always be paired proportionally or you will have idle memory or wasted CPU cycles.



I don't wanna pollute your thread with many small replies. If you answer what hashing power we're looking at in Mhash/s, I could make a more thorough reply with numbers. Also, is 32-bit output for BirthdayHash right? That would only take storing about 77,000 entries for a 50% chance of finding a collision.
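For reference, the 77,000 figure matches the standard birthday bound for a 32-bit output: n ≈ sqrt(2 x ln(2) x 2^32) ≈ 77,000 entries stored before the probability of at least one collision reaches 50%.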