Topic: [CONCEPT+JAVA TEST VERSION] New, memory intensive PoW algo

hero member
Activity: 784
Merit: 500
I just want to mention that Ethereum also has a memory-hard proof-of-work algorithm.

Really? I've never seen it before. I took a look, and while I like what I see there, I do agree that it fills a specific niche; this aims to take it further (though in a more brute-forcey approach, I must admit).
legendary
Activity: 868
Merit: 1000
Cryptotalk.org - Get paid for every post!
I just want to mention that Ethereum also has a memory-hard proof-of-work algorithm.
newbie
Activity: 19
Merit: 0
wonder which coin is going to use this first
hero member
Activity: 784
Merit: 500
I ended up writing up an early test implementation in Java, and posted it as a gist. Note that it differs from the original post in some aspects.
hero member
Activity: 784
Merit: 500
Thanks!

Remember that the values are widely tweakable. u, v, w, and the way q is chosen affect FPU load; x, y, and z affect disk lookups; and t affects the amount of direct raw hashing needed. For a 500 MH/sec SHA miner, a t value of 29 would send the miner to disk only about once a second (2^29 ≈ 537M hashes), which is just enough to push out GPUs. A value around 32 would be just enough to push out ASICs and some GPUs (4 billion hashes between disk lookups = 1 second of raw hashing at 4 GH/sec), while a value around 36 would keep GPUs viable while making nearly any ASIC effectively useless for the hashing.
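As a quick sanity check of those numbers, here's a sketch assuming a t-bit nonce buys 2^t raw hashes between forced disk lookups (the class and method names are just illustrative):

Code:
// Back-of-the-envelope: seconds of raw SHA hashing between forced disk
// lookups for a given nonce width t and hash rate.
public class NonceTiming {
    static double secondsBetweenLookups(int t, double hashesPerSec) {
        return Math.pow(2, t) / hashesPerSec;
    }

    public static void main(String[] args) {
        System.out.println(secondsBetweenLookups(29, 500e6)); // ~1.07 s at 500 MH/sec
        System.out.println(secondsBetweenLookups(32, 4e9));   // ~1.07 s at 4 GH/sec
    }
}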

Remember that the network can also recalculate these parameters whenever it wants, based on the design of the coin itself.
full member
Activity: 165
Merit: 100
Interesting algo, but having a high RAM or disk requirement will not solve anything. I personally think we should keep the requirement within the general range, but be able and willing to evolve it over time, to ensure there is not too high an incentive for people to come out with ASICs. As for GPUs, I don't think they are a threat to the safety of the network, as anyone can more or less set up a GPU rig.
hero member
Activity: 784
Merit: 500
I didn't have botnets in mind; that's a good point. As for the price of a fast miner, you seem to be saying two different things: "I guess that still won't stop people from finding relatively cheap solutions to the problem" and "$10-20K server".

I'm not exactly sure what you mean by cheap, but remember that the final nonce size is also tweakable, and it is probably the best knob to turn to keep the algorithm accessible. Set it high enough, but not too high, and you hit the sweet spot where massive sets of disks won't help, but an ASIC will still be left dependent on disks.
sr. member
Activity: 462
Merit: 251
Exactly, there's no real sweet spot, though different coins will fill different niches by using different parameters. I'm looking toward roughly the 200 GB point for disk space, and will most likely watch it increase over time as RAM capacities increase.

Anyway, I was considering using larger runs of data in order to give rotational platters an advantage, and that could be done by having M(input) use z nearly-consecutive numbers instead of being all over the place.

I don't plan to try to cover all of the problem cases, just to make an offering that can fill several of them (though not at the same time).

I guess that still won't stop people from finding relatively cheap solutions to the problem. So I can't fit the thing into RAM to be fast?

There are products like the DDRdrive (a sort of RAM disk on a PCIe card) or this - http://techreport.com/review/16255/acard-ans-9010-serial-ata-ram-disk (64 GB of RAM accessible via a SATA interface).

Basically, add more memory slots and you have more disk space with speed and seek times close to RAM. 200 GB? Not a problem: put 64 GB in the machine and buy 3 of those 64 GB units... still not enough? Build 2 such computers and connect them with 10 Gbit Ethernet. Fast enough.

And the casual miner? A casual miner has perhaps a 500 or 1000 GB disk, probably quite full, with maybe less than 100 GB remaining. What, delete 100 GB of stuff to free up the space? No way...

This may end up as an even worse gap between miners with consumer equipment (a PC with a decent CPU, a decent GPU, and a decent amount of RAM) and "professionals" with ASICs or anything else that will help them mine really fast.

A 16 GB RAM stick now costs about 0.25 BTC, a better motherboard that can handle 128 GB (8 sticks) goes for about 0.5 BTC, and the CPU, case, and other parts for about another 0.5 BTC. So for about 3 BTC you have a PC with 128 GB of RAM. Connect two of them with 10 Gbit Ethernet and there is no need for a slow SSD or even a disk. Or buy one and put in some sort of DDR drive - basically more memory sticks that look like an SSD to the system.

That's just ~3 times the price of a decent PC suitable for casual mining with traditional algorithms (scrypt, quark, ...).

A $10-20K server with 40+ cores, 512 GB+ of memory, and 1-2 TB of SSDs will do equally fine, though for 2-4 times the cost.

There is no reasonable value for the limit that will still allow "ordinary people" to mine the coin while preventing someone from constructing specialized hardware for not-so-much money.

It is true, though, that this will rule out botnets quite effectively...
hero member
Activity: 784
Merit: 500
Yeah, as it occupies a bit of a new area for mining. I'm not actually developing an altcoin myself on this as my knowledge of C is minimal, so I just wanted to pose the concept and see if anyone might want to adapt/adopt it.


Anyway, I know of a bunch of things that can use lots of disk space but none of them are easily verifiable. Something like climate prediction is an example of this.

With that said, don't they already have $5K ASICs able to perform massive amounts of hashing? Right, they do.
legendary
Activity: 868
Merit: 1000
Cryptotalk.org - Get paid for every post!
It's an interesting idea, but as mentioned it really just shifts the ideal mining equipment away from an ASIC or GPU to an expensive server. A $10-20K server with 40+ cores, 512 GB+ of memory, and 1-2 TB of SSDs will be able to handle massive IO transaction rates. Is that the type of PoW network you're looking to build? If so, then go for it. At least the botnet operators won't have access to machines with those specs.

Yes, let's break the banks mining for these coins!

Anyway, interesting idea. However, can we think of something that computes something useful but uses a lot of memory?
legendary
Activity: 1442
Merit: 1001
It's an interesting idea, but as mentioned it really just shifts the ideal mining equipment away from an ASIC or GPU to an expensive server. A $10-20K server with 40+ cores, 512 GB+ of memory, and 1-2 TB of SSDs will be able to handle massive IO transaction rates. Is that the type of PoW network you're looking to build? If so, then go for it. At least the botnet operators won't have access to machines with those specs.
hero member
Activity: 784
Merit: 500
Exactly, there's no real sweet spot, though different coins will fill different niches by using different parameters. I'm looking toward roughly the 200 GB point for disk space, and will most likely watch it increase over time as RAM capacities increase.

Anyway, I was considering using larger runs of data in order to give rotational platters an advantage, and that could be done by having M(input) use z nearly-consecutive numbers instead of being all over the place.

I don't plan to try to cover all of the problem cases, just to make an offering that can fill several of them (though not at the same time).
sr. member
Activity: 462
Merit: 251
as well as a practically limitless amount of extra disk data, the size of which is configured by the coin itself.

This would be hard. Nowadays having 16 or 32 GB of RAM is not very expensive, so I assume that if you really want to force disk access you need more data, but merely eating more than 30 GB of data for the computation would deter many people. It also won't help much: it would hurt people with old-fashioned rotational drives, and with enough money you can buy a 1 TB SSD on a PCIe card with 1 million IOPS and 1-2 GByte/sec throughput (not as fast as RAM, but much faster than an ordinary SSD with perhaps 10K IOPS and 200 MB/sec throughput). If you want to rule these out, you can set the requirement to 1.5 TB or more, but then almost nobody will mine such a coin.

If somebody wants to try 5 such altcoins... oops, I need 8 TB? I need a disk array?

Also, stuffing 64 or 96 GB of RAM into a computer is not that hard.

IMHO it is impossible to find a barrier that will really be a barrier for at least the moderately rich, without ruling out way too many users with only slightly inferior hardware or not enough free disk space (hard drives tend to fill up, even the several-TB ones).

hero member
Activity: 784
Merit: 500
License-related edit: I hereby release the details of this algorithm and the linked gist into the public domain. No warranty is implied or given, and I request, but do not require, that I hear of any coins created using this algo. Implementations are free to modify the algorithm to suit their language's needs.

Edit: I've created a self-contained Java version and made it available as a Github Gist. Some differences exist from the description below, and as far as I can tell the code is in better shape, design-wise, than the description. All byte[]<->BigInteger conversions are big-endian; if anyone wants to write a C version, it can differ slightly from the Java version as long as it is internally consistent. I've also noticed that the u and v values were not deterministic, and in some cases one can get away with a lower u value and still produce a good proportion of the same hashes, as a mining optimization. The values must be consistent for verification, however.
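For reference, the big-endian convention is the one Java's BigInteger(signum, magnitude) constructor already uses; here is a minimal sketch (the fixed-width helper is my own illustration, not code from the gist):

Code:
import java.math.BigInteger;

public class Endian {
    // byte[] -> BigInteger, reading the bytes as an unsigned big-endian value
    static BigInteger fromBytes(byte[] b) {
        return new BigInteger(1, b);
    }

    // BigInteger -> fixed-width big-endian byte[], left-padded with zeros
    static byte[] toBytes(BigInteger v, int len) {
        byte[] raw = v.toByteArray(); // big-endian, may carry a leading sign byte
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }
}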


Lately, I've been noticing a small trend in PoW algos geared toward increasing memory requirements and decreasing ASICability. First came SHA256, which was highly compact and fast on ASICs, followed by Scrypt, with Memorycoin's PoW algo in the mix as well. I'd like to propose taking it a step further, and on that note, making it tweakable to suit an altcoin and even a given moment in time.

Similar to Scrypt, large amounts of data are used as intermediate parameters in a given hashing computation. This data, however, would extend to multiple GB of data in the blockchain (or any coin-specific data set like the UTXO set, as long as it can be replicated consistently), as well as a practically limitless amount of extra disk data, the size of which is configured by the coin itself.

Each set of disk-intensive hashes leaves the miner the ability to try some different nonces in order to avoid making the entire mining operation grind disks, though disk access will remain a large portion of the work. Seven configurable values exist, called t, u, v, w, x, y, and z, in addition to a difficulty value that acts similarly to how Bitcoin handles difficulty. t, u, v, w, x, y, and z will be discussed at the end. q is an additional specialized parameter.

To begin, a first nonce is chosen and hashed with the rest of the coinbase using SHA256, twice. This produces a 256-bit hash. The hash is taken modulo ((size of the blockchain or other coin-specific data, rounded down to a multiple of 1024) + 7). Such a modulo is a slow operation and difficult to optimize. The result is then used to select a position, and a length is selected according to hash % 65537, a prime, which complicates certain naive implementations of modulo by increasing the time needed.

The data selected by the offset and length is concatenated onto the old hash, which is then hashed and processed in the same manner x more times. The data itself must be reproducible (the entire longest chain concatenated would work) for a given block to be mined.
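A minimal Java sketch of this first stage, assuming a readRange callback that returns the requested slice of the blockchain/data set (the names and the callback are stand-ins, not the gist's actual structure):

Code:
import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.function.BiFunction;

public class StageOne {
    // SHA256 applied twice, as in the description above
    static byte[] sha256d(MessageDigest md, byte[] in) {
        return md.digest(md.digest(in));
    }

    static byte[] run(byte[] nonceAndCoinbase, long dataSize, int x,
                      BiFunction<Long, Integer, byte[]> readRange) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] hash = sha256d(md, nonceAndCoinbase);
        // (size rounded down to a multiple of 1024) + 7
        BigInteger posMod = BigInteger.valueOf((dataSize / 1024) * 1024 + 7);
        for (int i = 0; i < x; i++) {
            BigInteger h = new BigInteger(1, hash);             // unsigned big-endian
            long offset = h.mod(posMod).longValueExact();
            int length = h.mod(BigInteger.valueOf(65537)).intValue();
            byte[] chunk = readRange.apply(offset, length);     // the disk lookup
            byte[] concat = new byte[hash.length + chunk.length];
            System.arraycopy(hash, 0, concat, 0, hash.length);
            System.arraycopy(chunk, 0, concat, hash.length, chunk.length);
            hash = sha256d(md, concat);                         // hash and repeat
        }
        return hash;
    }
}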

Once that is complete, a new portion of the algorithm kicks in, which will provide even more RAM-capacity resistance on smaller blockchains.

In this stage, y hash operations will be performed in a loop, where each operation computes newHash = H(oldHash + M(oldHash)), where + is concatenation and M is, in pseudocode:

Code:
M(input):
    hVal = input
    do for *z* times:
        hVal = H(hVal XOR H(
                         prm +
                         [*u* highest bits of the fractional part of the 11th root of P(hVal % *w*)]
                     ))
    endloop
    return hVal

where P(n) is q^n, and *u* is parameter u from above. prm is just some constant to be determined by the coin. It may be the coin's name, some message, or a prime, but must remain constant for a given block. It may also be defined based on the previous block.

The root must be calculated by a specific iterative algorithm where the error ϵ is less than 2^(-v). Strict conformance to IEEE 754 without wider intermediates (akin to Java's strictfp keyword) is required.
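A minimal strictfp sketch of one way to satisfy that requirement, using Newton iteration for the 11th root with ϵ = 2^(-v); the exact iteration isn't pinned down in this post, so treat this as one deterministic candidate:

Code:
public strictfp class EleventhRoot {
    // Newton iteration for r^11 = a, run until the step size drops below 2^-v.
    // StrictMath (rather than Math) keeps the results bit-identical across platforms.
    static double root11(double a, int v) {
        double eps = StrictMath.pow(2, -v);
        double r = StrictMath.pow(a, 1.0 / 11.0); // initial guess
        double step;
        do {
            double r10 = StrictMath.pow(r, 10);
            step = (r * r10 - a) / (11.0 * r10);  // f(r)/f'(r) for f(r) = r^11 - a
            r -= step;
        } while (StrictMath.abs(step) > eps);
        return r;
    }
}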

This operation is extremely difficult to do at high speed, and must be cached on disk for mining. A single verification will still be able to function without this massive memory requirement. Depending on w, this could require massive amounts of disk space.

After y such hashings, the resulting hash can be concatenated with a strictly t-bit nonce, and hashed by a simple SHA256. This nonce allows for some disk-less operation, but is limited by the generation params.
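A sketch of that final scan, assuming the t-bit nonce is appended big-endian in ceil(t/8) bytes and compared against a Bitcoin-style target (both are my assumptions; the post doesn't fix the encoding):

Code:
import java.math.BigInteger;
import java.security.MessageDigest;

public class FinalStage {
    // Scan all 2^t nonces without touching disk; returns the first nonce whose
    // hash beats the target, or -1 if the range is exhausted.
    static long scan(byte[] stage2Hash, int t, BigInteger target) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        int nonceBytes = (t + 7) / 8;
        byte[] buf = new byte[stage2Hash.length + nonceBytes];
        System.arraycopy(stage2Hash, 0, buf, 0, stage2Hash.length);
        for (long nonce = 0; nonce < (1L << t); nonce++) {
            for (int i = 0; i < nonceBytes; i++)   // write the nonce big-endian
                buf[stage2Hash.length + i] = (byte) (nonce >>> (8 * (nonceBytes - 1 - i)));
            if (new BigInteger(1, md.digest(buf)).compareTo(target) < 0)
                return nonce;
        }
        return -1; // exhausted: pick a new first nonce and redo the disk-bound stages
    }
}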

Parameters:

u: Multiplies the size of the second-stage cache; requires special handling in the n-th root operation if larger than what the FPU can handle alone, due to its supported number widths.

v: Determines how much floating-point calculation goes into stage 2, and thus the time needed to create the cache. If prm is defined based on the block, a cache for stage 2 would have to be recreated with each new block, meaning that floating-point results must be cached separately.

q: This may be set per-block or statically. If per-block, it would require all floating point data to be recalculated per block. It may be defined as pi or e when static, and should be irrational or fully use the maximum precision afforded by an IEEE 754 double.

w: Determines how many different floating point results must be cached. Larger means direct disk space requirement increases.

x: Basic timing parameter for stage 1, specifying how many blockchain/other per-block data set lookups would be needed.

y: How many stage 2 rounds.

z: How many stage 2 hashes in each round.

The performance breakdown for verifying a hash without cached data is:

y*z*log2(v) floating-point operations for taking roots, plus y*z incommensurable floating-point exponentiation operations.

x mandatory disk lookups

2*x + y*z SHA256 hash operations
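The same breakdown as a tiny cost model (purely illustrative, using only the formulas above; exponentiations are counted once per M() round):

Code:
// Rough verification-cost model for a hash with no cached data.
public class CostModel {
    static double rootFlops(int y, int z, int v) {
        return (double) y * z * (Math.log(v) / Math.log(2)); // y*z*log2(v)
    }
    static long expOps(int y, int z)           { return (long) y * z; }
    static long diskLookups(int x)             { return x; }
    static long sha256Ops(int x, int y, int z) { return 2L * x + (long) y * z; }
}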

All of these parameters can be tweaked by the coin, whether every certain number of blocks (akin to difficulty adjustments) or kept fixed. The former would benefit from a coin that can detect what kind of mining power is gaining a monopoly.

Edit: Added parameter t. This parameter can make or break ASIC exclusion and some of the goals of this PoW by putting it in the sweet spot where the high-end disk-laden servers discussed below are not needed, but an ASIC, or possibly even a GPU, would perform too much IO to disk to be justifiable. Of course, if one wishes to favor massive disk-laden servers, this parameter can be set lower.