
Topic: Momentum Proof-of-Work Bounty - 30 BTC [PAID]

full member
Activity: 182
Merit: 100
Once the RAM is full, everything is a read, so it'll be perfectly parallel with some negligible constant-time overhead.

Improvements over scrypt for the BirthdayHash will help a bit, but like you said, any tricks generally translate fine into ASICs.

For your example, since the Mac Pro has 16 times the RAM and 16 times the hashing power, it will be exactly 16^2 = 256 times better. Well worth the 50x cost. Since this is quadratic, it will only get worse when you get into 128 GB servers.
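
To make the scaling explicit (a minimal sketch; the assumption is that match rate is proportional to hashrate times stored entries):
Code:
# Expected collision rate in a birthday search scales as
# (hashes/sec) * (entries held in RAM), so multiplying both
# by k yields a k^2 speedup.
k = 16                          # Mac Pro: 16x the RAM and 16x the hashing power
speedup = (k * 1.0) * (k * 1.0) / (1.0 * 1.0)
print(speedup)                  # 256.0 -- 16^2, versus only ~50x the price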

I like the elegance of having the RAM be the main resource for hashing, but I think we have a ways to go before making it practical. I'll keep thinking about it.

Nathaniel

The problem with memory is that setups will then turn to server motherboards, some of which can hold an insane amount of RAM. http://www.newegg.com/Product/Product.aspx?Item=N82E16813182346 This motherboard has 24 RAM slots for $490.00.

For best value, we can go with 16 GB memory sticks at $150.00 each, also available on Newegg. http://www.newegg.com/Product/Product.aspx?Item=N82E16820226412

With 24 of these, you would have 384 GB of RAM. Granted, your cost would be ~$6,000 when you finally put the system together. However, with 384 GB of RAM, this system is 144 times as good as your Mac Pro for around 1.2x the cost. People would just have systems like this lying around.

Also, that doesn't even count one of Intel's server motherboards, which can theoretically hold up to 1.5 TB of RAM (48 slots at 32 GB each). However, at current prices you're looking at 30 grand just in RAM for that.

That being said, I would support a memory-based coin. I kind of like the idea of having my house covered with servers, each holding hundreds of GB of RAM, versus the idea of my house covered with ASICs :)
hero member
Activity: 524
Merit: 500
The GPU can produce blocks of 100K nonce/hash pairs, which a CPU will then push into the hash table at 100K/sec, and the result would be a significant GPU advantage.
Just a quick thought: you'd better oversaturate PCIe 3.0 x16 bandwidth to be sure. But that will require a really fast hash function...
This is why I hobble the GPU with AES and unpredictable run time.  It destroys the massive parallelism by forcing most of the threads to idle.
Sorry for some confusion; I'll clarify. Your idea is to make the GPU useless by using a GPU-hard hash function: all hashes can be uploaded from the GPU to the host, but they are produced so slowly that a GPU miner gains no (economic) advantage. Mine is to make the hash function so fast that the produced results cannot be uploaded to the host in reasonable time. So the GPU will have the ability to produce hashes, but not enough memory to find a collision, and it would be cheaper to calculate hashes on the CPU than to upload them from the GPU.
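
Rough numbers for that bandwidth argument (a sketch; the ~15.75 GB/s figure is the theoretical PCIe 3.0 x16 peak, and 32-byte hash results are assumed):
Code:
pcie3_x16_bw = 15.75e9                  # bytes/sec, theoretical PCIe 3.0 x16 peak
hash_size = 32                          # bytes per 256-bit hash result
ceiling = pcie3_x16_bw / hash_size
print(round(ceiling / 1e6), "Mhash/s")  # ~492 Mhash/s upload ceiling
# A hash function the GPU computes faster than this leaves results
# stranded on the card, which is the point being made above.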
legendary
Activity: 1428
Merit: 1030
Due to Moore's law, the memory requirements should be targeted 1-2 years into the future and then have some means to adjust over time.

I'm thinking it's better to set the requirements low for a single process and let more capable computers just run multiple processes . . . it'll scale for powerful computers (linearly with the cost of hardware and energy, hopefully), but it won't scale for GPUs, as they'll quickly run out of both memory and memory bus access.
hero member
Activity: 770
Merit: 566
fractally
This is relatively good, except that I want the momentum property to prevent gaming of the BitShares market.  

That's a feature for you. It's a bug for me.

I think specifying the time to exhaust all nonces is pointless because it depends upon implementation and hardware. I would suggest using 32-bit nonces and letting the time be what it will.

For an unlimited search on a single H, faster computers have a disproportionate advantage because they climb the probability graph much more quickly. By limiting the time allowed on a single H, slower computers spend the same proportion of their time in the higher part of the probability graph as faster computers do.

I would make sure that the BirthdayHash(X) method used AES instructions, which are hardware-optimized on x86 and ARM, and thus leave as little room as possible for ASIC enhancement.

Don't know much about that - pretty sure I wouldn't be using SHA-256 though.

I think we are in basic agreement... I didn't say an unlimited search, I said a 32-bit-limited search... though I suppose you could tie the bit limit to the memory limit, so I would recommend 29 bits minimum. That would allow you to fill 4 GB of memory with unique nonces. Within a year or two, every cell phone will have 4 GB of memory and a 64-bit quad-core processor with a graphics unit able to fill it.
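The 29-bit figure works out if each stored nonce/birthday entry takes 8 bytes; that entry size is an assumption, not something fixed in the thread:
Code:
nonce_bits = 29
entry_bytes = 8                  # assumed bytes per stored (birthday, nonce) pair
table = (2 ** nonce_bits) * entry_bytes
print(table / 2 ** 30, "GiB")    # 4.0 GiB -- fills the 4 GB target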

legendary
Activity: 1428
Merit: 1030
This is relatively good, except that I want the momentum property to prevent gaming of the BitShares market.  

That's a feature for you. It's a bug for me.

I think specifying the time to exhaust all nonces is pointless because it depends upon implementation and hardware. I would suggest using 32-bit nonces and letting the time be what it will.

For an unlimited search on a single H, faster computers have a disproportionate advantage because they climb the probability graph much more quickly. By limiting the time allowed on a single H, slower computers spend the same proportion of their time in the higher part of the probability graph as faster computers do.

I would make sure that the BirthdayHash(X) method used AES instructions, which are hardware-optimized on x86 and ARM, and thus leave as little room as possible for ASIC enhancement.

Don't know much about that - pretty sure I wouldn't be using SHA-256 though.
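
For reference, the "probability graph" in question is the standard birthday curve; a sketch of its shape, assuming n hashes drawn uniformly from a 2^bits space:
Code:
import math

def p_collision(n, bits):
    # Probability of at least one collision after n uniform draws
    # from a 2**bits space: p(n) ~= 1 - exp(-n^2 / 2N).
    N = 2.0 ** bits
    return 1.0 - math.exp(-n * n / (2.0 * N))

# The curve rises steeply, which is why an unlimited search lets a
# faster machine climb it disproportionately far ahead.
for n in (2 ** 14, 2 ** 16, 2 ** 18):
    print(n, round(p_collision(n, 36), 4))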
hero member
Activity: 770
Merit: 566
fractally
By targeting the memory and CPU abilities of the common computer as the means of reaching maximum performance, you have eliminated ASICs and supercomputers. Due to Moore's law, the memory requirements should be targeted 1-2 years into the future and then have some means to adjust over time.
hero member
Activity: 770
Merit: 566
fractally
So I think the fundamental idea is sound; I don't make arguments against it, and I don't see convincing ones here. We do differ in the details.

My current thoughts for my preferred implementation are as follows -

1. Require between 256 MB and 1 GB of memory
->To allow for use on very low-end equipment

2. Limit A and B such that all possible matches are exhausted in less than 1 second - H would need to change every second and could include new transactions.
->So that faster computers are not disproportionately advantaged over slower ones.

3. Change
Hash(H + A + B) < TargetDifficulty
to
Hash(H + A + B + MemoryHash(A) + MemoryHash(B)) < TargetDifficulty
->To prevent GPUs evaluating this step before adding B to memory.

This is relatively good, except that I want the momentum property to prevent gaming of the BitShares market.  

By limiting the nonce search space, you keep supercomputers from dominating (good) while still making the cost of an ASIC dominated by RAM (good).

I think specifying the time to exhaust all nonces is pointless because it depends upon implementation and hardware. I would suggest using 32-bit nonces and letting the time be what it will.

I would make sure that the BirthdayHash(X) method used AES instructions, which are hardware-optimized on x86 and ARM, and thus leave as little room as possible for ASIC enhancement.
legendary
Activity: 1428
Merit: 1030
So I think the fundamental idea is sound; I don't make arguments against it, and I don't see convincing ones here. We do differ in the details.

My current thoughts for my preferred implementation are as follows -

1. Require between 256 MB and 1 GB of memory
->To allow for use on very low-end equipment

2. Limit A and B such that all possible matches are exhausted in less than 1 second - H would need to change every second and could include new transactions.
->So that faster computers are not disproportionately advantaged over slower ones.

3. Change
Hash(H + A + B) < TargetDifficulty
to
Hash(H + A + B + MemoryHash(A) + MemoryHash(B)) < TargetDifficulty
->To prevent GPUs evaluating this step before adding B to memory.
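
A minimal sketch of the proposed check; the names MemoryHash and the use of SHA-256 here are illustrative assumptions, not part of any spec:
Code:
import hashlib

def check(H, A, B, memory_hash, target):
    # Proposed test: Hash(H + A + B + MemoryHash(A) + MemoryHash(B)) < target.
    # memory_hash(nonce) must read the miner's in-RAM table, so a GPU
    # cannot evaluate this step before B has actually been stored.
    data = (H + A.to_bytes(4, "big") + B.to_bytes(4, "big")
              + memory_hash(A) + memory_hash(B))
    return int.from_bytes(hashlib.sha256(data).digest(), "big") < target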
hero member
Activity: 770
Merit: 566
fractally
Quote
As the number of stored birthdays approaches the number of days in the year, the efficiency approaches that of the BirthdayHash alone, which, if it were SHA-256, would be little different from Bitcoin.

I am actually going to back off on this a bit; it would still be much better than Bitcoin, because checking the RAM is a point of centralization that would significantly complicate the degree to which the algorithm could be made parallel.
hero member
Activity: 770
Merit: 566
fractally
Quote
The problem with your p(n) and q(n) curves is that they measure the probability of at least one match. What we really want to measure is the expected number of matches, since with a difficulty setting having more than one match is important. This is what gives the relationship of hashes/second * amount of RAM. Even if we only increase the hashes/second by a factor of two, we still get twice as many matches, giving linear scaling in exactly the same way as just using whatever BirthdayHash algorithm you choose by itself.

This is a very insightful comment. I would point out that these curves show the efficiency of the algorithm as memory increases. As the number of stored birthdays approaches the number of days in the year, the efficiency approaches that of the BirthdayHash alone, which, if it were SHA-256, would be little different from Bitcoin.

What this also implies is that if you use a 32-bit birthday hash with SHA256(), it would be nearly identical to Bitcoin, except that it would require a constant 4 GB of memory to be filled before reaching full speed, and you would have extra overhead associated with accessing that memory. Because 4 GB of memory is cheap, this would not be enough to stop ASICs or effect any real change, aside from some slight momentum required to get up to speed after changing the block.

If you use a 36-bit birthday hash, then it would require a constant 256 GB of memory to get up to speed.
If you use a 256-bit birthday hash, then you may never find a solution.

Conclusion:  
ASIC or not, the momentum POW does make it more difficult to game markets based on block chains.  
The number of bits in the birthday hash could be one factor in adjusting difficulty.
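
For a sense of how the bit count moves the memory requirement (a sketch; the 4 bytes per table slot is an assumption chosen to reproduce the 256 GB figure above):
Code:
def table_gib(birthday_bits, bytes_per_entry=4):
    # RAM needed to hold (nearly) the whole birthday space:
    # it doubles with every extra bit in the birthday hash.
    return (2 ** birthday_bits) * bytes_per_entry / 2 ** 30

for bits in (32, 36, 40):
    print(bits, "bits ->", table_gib(bits), "GiB")
# 32 bits -> 16 GiB, 36 bits -> 256 GiB, 40 bits -> 4096 GiB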
legendary
Activity: 1428
Merit: 1030

Yes, that is true. All it means is that the ideal mining rig is now outfitted with max RAM and the highest-end GPU able to fill it.

It would significantly increase the memory requirements for ASIC designs, that is certainly true. OK, I think you might be on to something here. The limit on performance will now be based upon how much RAM you can put in a given machine. I would like to point out that once you run out of RAM you still gain performance by adding more processing power; it just grows linearly instead of non-linearly.


So there's always going to be an ideal mining rig but the best we can hope for is that its cost scales linearly with its performance. The memory bus is the bottleneck that is hardest to expand - that's why we want to keep it full as much as possible.
hero member
Activity: 770
Merit: 566
fractally
I suppose if your goal was to accelerate birthday generation, then this would be a very GPU-friendly POW.

Oh I disagree.

With traditional proof-of-work systems like SHA-256 or scrypt, it was possible to gain performance through parallelism alone. However, regardless of how efficiently an ASIC can run the individual birthday hash steps in parallel, your use of memory must scale to store the result of every parallel run, or you will lose an algorithmic advantage that cannot be overcome by increasing levels of parallelism.


Yes, that is true. All it means is that the ideal mining rig is now outfitted with max RAM and the highest-end GPU able to fill it.

It would significantly increase the memory requirements for ASIC designs, that is certainly true. OK, I think you might be on to something here. The limit on performance will now be based upon how much RAM you can put in a given machine. I would like to point out that once you run out of RAM you still gain performance by adding more processing power; it just grows linearly instead of non-linearly.

newbie
Activity: 19
Merit: 0

I had considered the potential for such a design. A DHT will not perform well enough; you would need 100BASE-T Ethernet or perhaps gigabit before the hash table could be filled quickly enough. From what I can tell, a SUPER COMPUTER with a ton of RAM and high-speed interconnects would give you the best performance. Here is the difference: such a supercomputer could be built from commodity parts.


Well, if the hash is so fast, you could just collaborate by splitting the birthday space into N ranges, with each computer throwing away any result that isn't in its range. This trades hashing speed for memory, so it works with rather than against your PoW.

What range do you suggest for N, and how fast is the hash?
This should be possible to test.

Any time you throw away hashes because they fall outside of your memory range, you move the computational complexity from the p(n) curve to the q(n) curve in my white paper. You save on memory, but the cost is far greater. Memory and CPU power must always be paired proportionally, or you will have idle memory or wasted CPU cycles.



The problem with your p(n) and q(n) curves is that they measure the probability of at least one match. What we really want to measure is the expected number of matches, since with a difficulty setting having more than one match is important. This is what gives the relationship of hashes/second * amount of RAM. Even if we only increase the hashes/second by a factor of two, we still get twice as many matches, giving linear scaling in exactly the same way as just using whatever BirthdayHash algorithm you choose by itself.
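
That linear scaling is easy to see directly: each fresh hash matches one of the n entries already stored with probability about n/2^b, so (a sketch):
Code:
def matches_per_sec(hashrate, entries_in_ram, birthday_bits):
    # Expected matches/sec = hashrate * n / 2^b:
    # linear in hashrate AND linear in RAM, as argued above.
    return hashrate * entries_in_ram / 2.0 ** birthday_bits

base = matches_per_sec(1e6, 2 ** 28, 36)
print(matches_per_sec(2e6, 2 ** 28, 36) / base)  # 2.0 -- double the hashrate
print(matches_per_sec(1e6, 2 ** 29, 36) / base)  # 2.0 -- or double the RAM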
legendary
Activity: 1428
Merit: 1030
I suppose if your goal was to accelerate birthday generation, then this would be a very GPU-friendly POW.

Oh I disagree.

With traditional proof-of-work systems like SHA-256 or scrypt, it was possible to gain performance through parallelism alone. However, regardless of how efficiently an ASIC can run the individual birthday hash steps in parallel, your use of memory must scale to store the result of every parallel run, or you will lose an algorithmic advantage that cannot be overcome by increasing levels of parallelism.

hero member
Activity: 770
Merit: 566
fractally
The BirthdayHash will never find a 256-bit collision, so you would truncate it to 64 bits, of which perhaps 34-38 bits will be used. Saturating the PCI bus is not the best approach here.

Or you might use the hash to describe eight 32-bit birthdays, or seven 36-bit birthdays . . . modifying it as you wanted to modify the difficulty. Not a problem.

I suppose if your goal was to accelerate birthday generation, then this would be a very GPU-friendly POW.
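
A sketch of that slicing trick, with SHA-256 as a stand-in for whatever BirthdayHash ends up being:
Code:
import hashlib

def birthdays(H, nonce, bits):
    # Slice one 256-bit digest into floor(256/bits) birthdays of `bits` bits each.
    digest = int.from_bytes(hashlib.sha256(H + nonce.to_bytes(4, "big")).digest(), "big")
    mask = (1 << bits) - 1
    return [(digest >> (i * bits)) & mask for i in range(256 // bits)]

print(len(birthdays(b"blockheader", 0, 32)))  # 8 birthdays per hash
print(len(birthdays(b"blockheader", 0, 36)))  # 7 birthdays per hash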
legendary
Activity: 1428
Merit: 1030
The BirthdayHash will never find a 256-bit collision, so you would truncate it to 64 bits, of which perhaps 34-38 bits will be used. Saturating the PCI bus is not the best approach here.

Or you might use the hash to describe eight 32-bit birthdays, or seven 36-bit birthdays . . . modifying it as you wanted to modify the difficulty. Not a problem.
hero member
Activity: 770
Merit: 566
fractally
Just a quick thought: you'd better oversaturate PCIe 3.0 x16 bandwidth to be sure. But that will require a really fast hash function...

So . . . some numbers . . . let's say SHA-256, because a lot of numbers are readily available - that's a 256-bit hash result, 32 bytes. If we wanted a GB/sec from that, we would need 1024x1024x1024/32 hashes . . . or about 32 Mhash per second. There are faster hashes, and actually 0.5 GB per second would probably more than suffice.

The BirthdayHash will never find a 256-bit collision, so you would truncate it to 64 bits, of which perhaps 34-38 bits will be used. Saturating the PCI bus is not the best approach here.
legendary
Activity: 1428
Merit: 1030
Just a quick thought: you'd better oversaturate PCIe 3.0 x16 bandwidth to be sure. But that will require a really fast hash function...

So . . . some numbers . . . let's say SHA-256, because a lot of numbers are readily available - that's a 256-bit hash result, 32 bytes. If we wanted a GB/sec from that, we would need 1024x1024x1024/32 hashes . . . or about 32 Mhash per second. There are faster hashes, and actually 0.5 GB per second would probably more than suffice.
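
Checking those numbers (plain arithmetic):
Code:
rate = 2 ** 30 / 32    # digests/sec needed to move 1 GiB/sec of 32-byte hashes
print(round(rate / 1e6, 1), "Mhash/s")  # 33.6 -- the "32 Mhash per second" above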
hero member
Activity: 770
Merit: 566
fractally
The GPU can produce blocks of 100K nonce/hash pairs, which a CPU will then push into the hash table at 100K/sec, and the result would be a significant GPU advantage.
Just a quick thought: you'd better oversaturate PCIe 3.0 x16 bandwidth to be sure. But that will require a really fast hash function...
This is why I hobble the GPU with AES and unpredictable run time.  It destroys the massive parallelism by forcing most of the threads to idle.
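
A toy illustration of that hobbling, assuming the PyCryptodome package for AES; this is illustrative only, not a secure hash and not the actual design:
Code:
from Crypto.Cipher import AES  # assumed dependency: PyCryptodome

def hobbled_mix(key16, block16):
    # Data-dependent round count: parallel GPU lanes working on different
    # nonces finish at unpredictable times, so most threads sit idle,
    # while x86/ARM AES instructions keep a CPU fast at the same work.
    aes = AES.new(key16, AES.MODE_ECB)
    for _ in range(1 + block16[0] % 64):
        block16 = aes.encrypt(block16)
    return block16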