Pages:
Author

Topic: MemoryCoin 2.0 Proof Of Work (Read 21394 times)

newbie
Activity: 8
Merit: 104
July 18, 2014, 10:12:19 PM
Look at Adapteva as well.
legendary
Activity: 983
Merit: 1091
July 26, 2014, 09:34:19 AM
Yes, big memory buffer needed to avoid recalculation of 64k blocks.

This makes PoW really memory bound.

It's not memory-hard though, as memory can trivially traded be traded off for computation time
(8192 times less memory with only 50x slowdown).
Which is a bit sad for a coin whose very name emphasizes memory:-(

Quote
CryptoNight PoW is somehow conceptually similar to MMC 2.0 PoW (besides of much smaller block size and different - more aggressive - data dependency construct).

CryptoNight is just hashcash with a hashfunction that bears some similarities to MMC2's verification
(note that the crucial overwriting of main memory blocks has no equivalent in MMC2).

MMC2 is not hashcash, but one of the few asymmetric PoWs (like Primecoin, Momentum, and Cuckoo Cycle) where proof attempts need to perform much more work than verification.

So overall, I'd say they're conceptually quite different.
member
Activity: 66
Merit: 10
Dev of yam miner
July 26, 2014, 12:37:11 AM
Yes, big memory buffer needed to avoid recalculation of 64k blocks.

This makes PoW really memory bound.

CryptoNight PoW is somehow conceptually similar to MMC 2.0 PoW (besides of much smaller block size and different - more aggressive - data dependency construct).


 
legendary
Activity: 983
Merit: 1091
July 25, 2014, 04:29:13 PM
Ok, here's the latest modification to the PoW.

1. Generate 1GB PsuedoRandom data using SHA512
2. For each 64K block - Repeat 10 50 times
 2.1 Use the last 32bits as a pointer to another 64K block
 2.2 XOR the two 64K blocks together
 2.3 AES CBC encrypt the result using the last 256 bits as a key
3. Use the last 32bits%2^14 as the solution. If the solution==1968, block solved


If I understand this correctly, then the miner only needs to use 50*64KB of memory,
or 3.2MB, instead of the intended 1024MB.
Your next-block pointers create a random graph on 2^14 nodes with edges partitioned
into several cycles, each of which can be processed separately.

On each cycle, it only needs to maintain the block states of the last 50 starting positions.
So it SHA-generates the next block and updates each of those 50 states with it.
One of them has accumulated 50 updates and is checked and retired,
while another is started in its place, having only 1 update.
There's just 49 extra steps of SHA generation and updating when the cycle is completed
(easy to check with e.g. a bitmap of 2^14 bits).
According to random graph theory, the number of cycles is logarithmic in total number of nodes,
so this overhead is negligible.

In summary, the miner should generate blocks not in naive block index order, but
in next-block pointer order, and maintain 50 simultaneous block states, to drastically
reduce memory usage.

Hmm, looking at the actual implementation on github, I apparently misunderstood,
as the block-pointer depends on the updated state as well.

So the miner does need 1GB, at least to avoid having to regenerate each block 49 times on average
(as FPGA/ASICs would be expected to do).
legendary
Activity: 983
Merit: 1091
July 25, 2014, 01:46:27 PM
Ok, here's the latest modification to the PoW.

1. Generate 1GB PsuedoRandom data using SHA512
2. For each 64K block - Repeat 10 50 times
 2.1 Use the last 32bits as a pointer to another 64K block
 2.2 XOR the two 64K blocks together
 2.3 AES CBC encrypt the result using the last 256 bits as a key
3. Use the last 32bits%2^14 as the solution. If the solution==1968, block solved


If I understand this correctly, then the miner only needs to use 50*64KB of memory,
or 3.2MB, instead of the intended 1024MB.
Your next-block pointers create a random graph on 2^14 nodes with edges partitioned
into several cycles, each of which can be processed separately.

On each cycle, it only needs to maintain the block states of the last 50 starting positions.
So it SHA-generates the next block and updates each of those 50 states with it.
One of them has accumulated 50 updates and is checked and retired,
while another is started in its place, having only 1 update.
There's just 49 extra steps of SHA generation and updating when the cycle is completed
(easy to check with e.g. a bitmap of 2^14 bits).
According to random graph theory, the number of cycles is logarithmic in total number of nodes,
so this overhead is negligible.

In summary, the miner should generate blocks not in naive block index order, but
in next-block pointer order, and maintain 50 simultaneous block states, to drastically
reduce memory usage.

-John
hero member
Activity: 518
Merit: 521
July 18, 2014, 08:59:56 PM
In Upthread discussion, reorder claimed the GPU implementation was random access bound, and I claimed that with enough threads it would likely be AES computation bound instead.

I add now the point that I don't think he accounted for what percentage of the execution time is random access bound on the CPU?

Thus I am positing that as the number of the threads on the GPU increases, then the random access portion of the algorithm is actually detrimental to giving an advantage to the CPU.

Does this make sense?

And I had a good giggle over it.

That is why you are a nobody, because you don't comprehend and you laugh at those who do (or are in the process of) instead of participating in sharing and learning.
full member
Activity: 267
Merit: 100
May 28, 2014, 03:35:38 AM
I saw a whole lot of talk from AnthonyMint and a whole lack of any kind of proof. Just words, no good links, no algorithms of his own. Just a whole lot of crap-talk.

And I had a good giggle over it.
hero member
Activity: 518
Merit: 521
February 19, 2014, 10:21:28 PM
Defeating those high-end server CPUs was another of my objectives (and I think I succeeded).

When can we expect a publication detailing your PoW?

I am trying to figure out which altcoin to give my algorithm to. I want it to have a first-mover advantage. Otherwise we end up with a sea of copycat coins and no focused challenger to Bitcoin.

If you want to call Cuckoo Cycle a memory-only PoW rather than a CPU-only PoW,
that suits me fine.

To be more specific, not a personal computer cpu-only. As you pointed out, the high-end server CPUs from Oracle, etc could be used.

But the latter is not as cost effective; it costs way more per thread and per GB of memory,
and thus gets handily beaten by a farm of pc's of the same total cost.

Granted Oracle's chip is not the most cost effective. I was too lazy to go research for the name (Tilera) of the lower cost per thread CPU.

Are you claiming that massively parallel CPUs cost more per thread than Intel and AMD CPUs for consumer-level PCs?

Doesn't the link you provided refute that, or the Tilera CPU that someone else mentioned.

Your algorithm is not computation intensive and is main memory DRAM latency bound, thus lower-powered threads suitable for server tasks would be a better fit than the very high-powered PC CPUs. For example, web-server or chat-server threads are typically not compute bound rather I/O bound. The Oracle chip is more high powered I think because it is designed for database threads.
legendary
Activity: 983
Merit: 1091
February 19, 2014, 10:03:10 PM
Defeating those high-end server CPUs was another of my objectives (and I think I succeeded).

When can we expect a publication detailing your PoW?

If you want to call Cuckoo Cycle a memory-only PoW rather than a CPU-only PoW,
that suits me fine.

To be more specific, not a personal computer cpu-only. As you pointed out, the high-end server CPUs from Oracle, etc could be used.

But the latter is not as cost effective; it costs way more per thread and per GB of memory,
and thus gets handily beaten by a farm of pc's of the same total cost.
hero member
Activity: 518
Merit: 521
February 19, 2014, 09:22:48 PM
I see you haven't discovered how I made it fast. Thus Cuckoo can't support very low block times (with botnet resistance) and will increase variance.

Since Cuckoo is only main memory latency bound, someone could design a lower-cost, more efficient ASIC coupled with standard DRAM.

Thus I conclude yours is memory-coin, not cpu-only.

I feel the importance of low block interval times is overrated.

I disagree. I think it may be one of the most important features of superior altcoin if it is done correctly because it could help prevent off-chain fractional reserve banking as was failing in the 1800s (and thus centralization and right back to central banking again):

https://bitcointalksearch.org/topic/m.5166632
https://bitcointalksearch.org/topic/m.5182519

If a cheap multicore cpu like the http://hackaday.com/2012/09/28/massively-parallel-64-core-computer-costs-99/
can already saturate DRAM's latency, then there's not much point in developing an ASIC.

Good point. Smiley

Defeating those high-end server CPUs was another of my objectives.

If you want to call Cuckoo Cycle a memory-only PoW rather than a CPU-only PoW,
that suits me fine.

To be more specific, not a personal computer cpu-only. As you pointed out, the high-end server CPUs from Oracle, etc could be used.
legendary
Activity: 983
Merit: 1091
February 15, 2014, 08:54:02 PM
I see you haven't discovered how I made it fast. Thus Cuckoo can't support very low block times (with botnet resistance) and will increase variance.

Since Cuckoo is only main memory latency bound, someone could design a lower-cost, more efficient ASIC coupled with standard DRAM.

Thus I conclude yours is memory-coin, not cpu-only.

I feel the importance of low block interval times is overrated.

If a cheap multicore cpu like the http://hackaday.com/2012/09/28/massively-parallel-64-core-computer-costs-99/
can already saturate DRAM's latency, then there's not much point in developing an ASIC.

If you want to call Cuckoo Cycle a memory-only PoW rather than a CPU-only PoW,
that suits me fine.
hero member
Activity: 518
Merit: 521
February 15, 2014, 04:34:34 PM
I see you haven't discovered how I made it fast. Thus Cuckoo can't support very low block times (with botnet resistance) and will increase variance.

Since Cuckoo is only main memory latency bound, someone could design a lower-cost, more efficient ASIC coupled with standard DRAM.

Thus I conclude yours is memory-coin, not cpu-only.
legendary
Activity: 983
Merit: 1091
February 15, 2014, 04:05:02 PM
Validation is instant in any case. Why do you bring up these point that were addressed already?

I didn't know validation is instant. Nevertheless 8 seconds may be too slow for finding blocks if block times are reduced to a minute or less.

Do you have a succinct pseudo-code description of your algorithm so I can analyze it quickly?

You haven't solved the major problem of your PoW yet.

You can't know that because you haven't seen mine.

Cuckoo is not suitable for very short block intervals (unless you shrink the memory requirement
below 1GB, but then you lose botnet resistance).
For a description of the algorithm, see the paper. It's not that long.

The problem of your PoW is the very fact that you have not published it...
hero member
Activity: 518
Merit: 521
February 15, 2014, 03:57:50 PM
Validation is instant in any case. Why do you bring up these point that were addressed already?

I didn't know validation is instant. Nevertheless 8 seconds may be too slow for finding blocks if block times are reduced to a minute or less.

Do you have a succinct pseudo-code description of your algorithm so I can analyze it quickly?

You haven't solved the major problem of your PoW yet.

You can't know that because you haven't seen mine.
legendary
Activity: 983
Merit: 1091
February 15, 2014, 11:30:46 AM
I have the survey on botnets. They target Asian gaming machines (rich boys) that typically have 8 GB.

Validation has to be several orders-of-magnitude faster than finding a block, else anti-spam and anti-DDoS doesn't work.

8 minutes is way too slow. You are orders-of-magnitude from solving the cpu-only nut. I've already solved it.

Cuckoo Cycle lets you set whatever memory requirement you want.
If you think 4GB is too little, then just go for more.
Validation is instant in any case. Why do you bring up these point that were addressed already?

You haven't solved the major problem of your PoW yet.
hero member
Activity: 518
Merit: 521
February 15, 2014, 07:27:39 AM
I have the survey on botnets. They target Asian gaming machines (rich boys) that typically have 8 GB.

Validation has to be several orders-of-magnitude faster than finding a block, else anti-spam and anti-DDoS doesn't work.

8 minutes is way too slow. You are orders-of-magnitude from solving the cpu-only nut. I've already solved it.
legendary
Activity: 983
Merit: 1091
February 13, 2014, 02:18:26 PM
With all the expertise on CPU oriented PoWs in this thread,
can anyone comment on my Cuckoo Cycle design, which is
focused entirely on main memory random access latency?

https://github.com/tromp/cuckoo has a whitepaper and implementation.

Feedback is most welcome.

Too slow to use 16 GB to defeat botnets:

https://github.com/tromp/cuckoo

Quote
6) running time is under 24s/GB for the current implementation on high end x86.

I stopped there because it isn't important for me to see if the rest of your design makes sense, since botnet resistance is a critical requirement in my opinion. MemoryCoin has none also.

Since yesterday, the implementation allows for multi-threading, and the README now says:

6) running time for the current implementation on high end x86 is under 24s/GB single-threaded,
   and under 3s/GB for 12 threads.

So 16GB would take well under a minute on a server.

In any case, I disagree that 16GB is needed to defeat botnets. Most machines in a botnet
have less than 4GB of *unused* memory, and running a 4GB cuckoo would send them into
swap-hell, not only making them useless for mining, but also alerting their owner.
So Cuckoo Cycle might actually help to shrink botnets...
hero member
Activity: 518
Merit: 521
February 12, 2014, 11:28:41 PM
With all the expertise on CPU oriented PoWs in this thread,
can anyone comment on my Cuckoo Cycle design, which is
focused entirely on main memory random access latency?

https://github.com/tromp/cuckoo has a whitepaper and implementation.

Feedback is most welcome.

Too slow to use 16 GB to defeat botnets:

https://github.com/tromp/cuckoo

Quote
6) running time is under 24s/GB for the current implementation on high end x86.

I stopped there because it isn't important for me to see if the rest of your design makes sense, since botnet resistance is a critical requirement in my opinion. MemoryCoin has none also.
legendary
Activity: 983
Merit: 1091
February 11, 2014, 11:33:24 PM
With all the expertise on CPU oriented PoWs in this thread,
can anyone comment on my Cuckoo Cycle design, which is
focused entirely on main memory random access latency?

https://github.com/tromp/cuckoo has a whitepaper and implementation.

Feedback is most welcome.
newbie
Activity: 1
Merit: 0
February 03, 2014, 07:16:49 PM
All a MemoryCoin 2.0 ASIC requires is two SHA512 blocks going into an AES block-- if you can generate the pseudorandom data on the fly quickly enough, there's no need to store it in memory at all.
Pages:
Jump to: