Pages:
Author

Topic: Momentum Proof-of-Work - page 2. (Read 8023 times)

newbie
Activity: 19
Merit: 0
October 18, 2013, 07:23:58 PM
#19
Just saw retep's post. I think we reached similar conclusions for somewhat different reasons. Both probably valid.

Here's the link to my argument in the project thread: https://bitcointalksearch.org/topic/m.3365654

Still yours is an interesting idea that I'll keep in mind.


Nathaniel
legendary
Activity: 1120
Merit: 1160
October 18, 2013, 07:07:00 PM
#18
Yes it is very simple and elegant After the Fact...  but I have posted bounties trying to find better proof of work and spent weeks trying to find a way to make a fast to validate memory hard problem.    Then I had to find a way to make it 'scale' the difficulty smoothly.   Lots of little things to make something simple, elegant, and algorithmically secure.

Interesting idea, but for crypto-coins a proof-of-work scheme that isn't a random lottery - that is if not every attempt at creating a valid PoW has an equal chance of succeeding - can be really problematic because it means faster miners have an advantage. You give an example of a system where miners wouldn't want to add a transaction to the block they were mining because they'd have to start over. Such a system would mean that whom ever had the fastest implementation of the scheme would find the majority of the blocks, which really rewards people with highly-tuned custom hardware implementations - exactly what you are trying to avoid.

I'm also extremely skeptical of the idea that you've actually created a ASIC resistant scheme. You mention parallelism in your paper as a possible problem, but brush it off assuming that a hash table would be the optimal implementation, and lock contention and "atomic operations" would prevent a highly parallel implementation; I'm very, very skeptical that you're correct.

Fundamentally your design has two basic primatives: the generator function G(m, k)=H(m + k) producing candidate digests, and the content addressable memory that stores potential digests and allows for matches to be searched for. The problem is that a solution is defined as successful if any a and b are found such that G(m, a) & (2^d)-1 == G(m, b) & (2^d)-1 for some difficulty d, a small positive integer. (a minor correction form your paper; you forgot to include the masking operation that makes finding a and b possible at all)

As you hint an ideal operation will run multiple generators in parallel - the problem is that an optimal implementation of the content addressable memory is very probably not a simple hash table. Here we have a situation with really weak demands on the CAM: it's ok if it doesn't always find a match, it's ok if there is no global synchronization, it's ok if sometimes it returns a false positive, and worst of all it doesn't even have to actually store all the data! Dedicated silicon implementations of CAMs are already really common for things like network routers, and they have vastly better performance than lookup tables built from commodity memory and CPU's. They also use a plethora of complex and application specific tricks to get the performance they need, even going as far as to make use of analog computation and probabilistic retrieval.

For instance off the top of my head a very fast design with very good utilization of silicon would be to use a custom ASIC consisting of a number of generator units feeding their results into a probabilistic CAM. A nice trick we can take advantage of is that for each candidate digest the generator function produces, we only actually need to store it's index to recreate it. That is if G(m, i)=d_i, we only need to store i, and we can even cheat further by only storing some of the bits of i, doing a quick brute force search after the fact to figure out which actual i was the match.

Hardware CAMs are usually implemented as a series of cells, with some number of search lines connected to each cell in parallel. Each cell matches the search line against it's own contents, asserting a series of read-out lines if the contents match. Writing a value to a cell is similar to regular memory. Matches are done very quickly, a single cycle, but at the cost of high power consumption as the memory grows larger. In our case we want a match to be done on the value of G(m, i), and we want the CAM cell to return the index i.

Lets suppose the difficulty is such that we're finding 64-bit birthday collisions or 2^32 items for a 50% chance of collision. This means our index values will need to be about 32-bits, as we have to search from 0 to 2^32 for a 50% chance of finding a collision. Naively we've have to store 64 bits per value for the digest, and 32 bits for the index, or 96bits * 2^32 = 48GiB. But we can do a lot better... First of all suppose only store 24 bits of digest in each cell: by the time we've got 2^32 items we'll get on average 2^8 false positives - pretty manageable with some parallelism to test them all. Next we can split our gigantic CAM array into multiple independent arrays, say 256 of them. Now for that 2^8 false positives, we only need to store 16 bits - pretty good! As for the indexes, we can cut down on them too: lets drop the lowest 8 bits, and just bruteforce the solution for the previous candidate digest at a cost of 2^7 average operations. Sure, that wasn't free, but we're now down to just 48 bits per cell, or 24GiB total.

Now I'm not giving you exact numbers, but that's not the point: I'm showing you how what's optimal turns out to be a crazy-looking hyper-optimized set of custom ASICs. Heck, a quick read of the literature on CAM design says that you'd probably go even further in this case, doing really crazy stuff like using multi-level kinda analog DRAM technology in the CAM cells, making convoluted trade-offs between actually storing indexes, or just leaving them implied by what bank the results are being stored in, etc.

I really suspect that rather than creating a ASIC hard design where commodity hardware has the same profitability as anything else, you've actually done the exact opposite and created a PoW function where the optimal implementation costs a tonne of money to implement, involves custom ASICs, and has performance miles ahead of anything else. Even worse is that with the non-lottery-random "momentum" aspect of what you're doing whomever implements this crazy custom hardware first will not only have the highest hashing rate, but they'll also solve the PoW problems the fastest, and hence get nearly all blocks.

Finally note that if G() was made to be made non-parallizable the idea might work, for instance by defining it as G(i)=H(m+k) ^ G(i-1), but then you wouldn't be able to do verification cheaply and might as well just use scrypt.

tl;dr: Cool idea, but the end result is probably the exact opposite of what you want.
legendary
Activity: 1937
Merit: 1001
October 18, 2013, 05:46:45 PM
#17
I don't think we can change the proof of work without hard forking bitcoin...
So i guess we're looking at a new altcoin.. again xD
Or rather an entire avalanche of momentum-pow alt-chains -.-'
sr. member
Activity: 420
Merit: 250
October 18, 2013, 05:12:44 PM
#16
10TB of RAM
How does one even get this amount of RAM  Cheesy
Very interesting concept.

A rack of boxes with 1TB.  You can fairly easily do 1TB per system with a few different boards and the LRDIMM RAM.  Throw 10 of those boxes in a rack with Infiniband networking and you've got 10TB in a rack.  It's expensive, but not overwhelmingly so.  The RAM is the priciest part.  But you could do it in a standard 42U rack.
legendary
Activity: 1862
Merit: 1011
Reverse engineer from time to time
October 18, 2013, 03:23:34 PM
#15
The whitepaper states

Quote
Find nonce A and nonce B such that BirthdayHash(A+H) == BirthdayHash( B+H)

Does it mean all 256 bits must be the same?
+1. Would like to hear this as well.
legendary
Activity: 2142
Merit: 1010
Newbie
October 18, 2013, 02:36:03 PM
#14
The whitepaper states

Quote
Find nonce A and nonce B such that BirthdayHash(A+H) == BirthdayHash( B+H)

Does it mean all 256 bits must be the same?
hero member
Activity: 798
Merit: 1000
October 18, 2013, 01:56:45 PM
#13
The momentum factor prevents manipulation of the BitShares blockchain based market because miners face a major cost to 'stop mining mid block in an attempt to manipulate the market'.

I've read some of the bitshares stuff but haven't looked too deeply, but if I get this right there is no longer a nonce adjustment then with this PoW, right?

Quote
I wanted a fast validating proof of work.

There are other, very beneficial applications to this proof, which lead into...

Quote
I wanted the most decentralized proof of work I could come up with.

this. I believe you have made a very, very significant step in this direction.

Again, kudos. If this all checks out by people smarter than me, you have made a very significant contribution to the field of cryptocurrency.
legendary
Activity: 1162
Merit: 1007
October 18, 2013, 01:54:30 PM
#12
Very interesting Bytemaster.  I look forward to taking a closer look at the nuts & bolts.  Well done!
hero member
Activity: 770
Merit: 566
fractally
October 18, 2013, 01:43:51 PM
#11
Not to downplay your achievement, but your idea here seems almost simple after the fact. What led you down this line of thinking? It is certainly much less complicated than scrypt as PoW, which is a bonus. But scrypt does do a fairly good job as PoW in lieu of SHA256. What need did you see that needed filling? Just a (potentially) truly CPU-dominated, botnet resistant PoW?

Yes it is very simple and elegant After the Fact...  but I have posted bounties trying to find better proof of work and spent weeks trying to find a way to make a fast to validate memory hard problem.    Then I had to find a way to make it 'scale' the difficulty smoothly.   Lots of little things to make something simple, elegant, and algorithmically secure.
hero member
Activity: 770
Merit: 566
fractally
October 18, 2013, 01:39:05 PM
#10
Not to downplay your achievement, but your idea here seems almost simple after the fact. What led you down this line of thinking? It is certainly much less complicated than scrypt as PoW, which is a bonus. But scrypt does do a fairly good job as PoW in lieu of SHA256. What need did you see that needed filling? Just a (potentially) truly CPU-dominated, botnet resistant PoW?

The momentum factor prevents manipulation of the BitShares blockchain based market because miners face a major cost to 'stop mining mid block in an attempt to manipulate the market'.

I wanted a fast validating proof of work.

I wanted the most decentralized proof of work I could come up with.   An ASIC for scrypt could be developed much easier than for this proof of work.

I wanted the bottleneck to be a commodity part with dual use.  RAM and memory controllers.

hero member
Activity: 770
Merit: 566
fractally
October 18, 2013, 01:36:53 PM
#9
Nonsense, a home pc has limits on RAM, require other hardware etcetc. An ASIC would have exactly only what is required, in this case tons of memory.

Yes you could build an ASIC but the ROI equation would be radically different.  Especially when competing against an army of PCs with no additional capital costs because they are dual use.
hero member
Activity: 798
Merit: 1000
October 18, 2013, 01:36:33 PM
#8
Not to downplay your achievement, but your idea here seems almost simple after the fact. What led you down this line of thinking? It is certainly much less complicated than scrypt as PoW, which is a bonus. But scrypt does do a fairly good job as PoW in lieu of SHA256. What need did you see that needed filling? Just a (potentially) truly CPU-dominated, botnet resistant PoW?
legendary
Activity: 1148
Merit: 1008
If you want to walk on water, get out of the boat
October 18, 2013, 01:33:32 PM
#7
Nonsense, a home pc has limits on RAM, require other hardware etcetc. An ASIC would have exactly only what is required, in this case tons of memory.
hero member
Activity: 770
Merit: 566
fractally
October 18, 2013, 01:28:41 PM
#6
So we will have to wait for the new ASICs right?

In b4 "no ASICs can be made for that" bullshit  Roll Eyes

Sure you could create an ASIC, but the transistor count would be dominated by memory.  A single Core i7 can utilize multiple GB of memory within a 5 min block interval.    If you replace the Core i7 with something 10x faster, you would need 40 GB of ram to maintain the same efficiency.  If you don't grow the ram, your performance will only marginally better and nowhere near 10x.  With these numbers the gains from creating a specialized ASIC are much less than in bitcoin land to the point of not being profitable.   RAM is power hungry so ASIC manufactures would be competing against home PCs with 0 capital cost and sometimes free electricity.  
hero member
Activity: 798
Merit: 1000
October 18, 2013, 01:17:14 PM
#5
This memory-hard hash function scales memory to the point of requiring GB of memory to efficiently find solutions, but almost no memory to verify.

I am not well-versed enough to be able to determine the veracity of this from your whitepaper, but if accurate, this is a supremely excellent achievement.
legendary
Activity: 1148
Merit: 1008
If you want to walk on water, get out of the boat
October 18, 2013, 01:11:59 PM
#4
So we will have to wait for the new ASICs right?

In b4 "no ASICs can be made for that" bullshit  Roll Eyes
hero member
Activity: 770
Merit: 566
fractally
sr. member
Activity: 420
Merit: 250
October 18, 2013, 01:02:26 PM
#2
Where is the project thread?  It's an interesting concept...
hero member
Activity: 770
Merit: 566
fractally
October 18, 2013, 12:19:06 PM
#1
Bitcoin could end centralization of hash power by adopting a new proof of work I have created that would require 10 TB of RAM if someone were able to create Scrypt ASIC capable of 1 Gigahash.  This memory-hard hash function scales memory to the point of requiring GB of memory to efficiently find solutions, but almost no memory to verify.

http://invictus-innovations.com/s/MomentumProofOfWork.pdf

THere is a 30 BTC bounty in the project-development thread related to this proof-of-work.   I am posting in general discussion because of the general usefulness of such a proof of work in the bitcoin space.
Pages:
Jump to: