License-related edit: I hereby release the details of this algorithm and the linked gist into the public domain. No warranty is implied or given, and I request, but do not require, that I hear of any coins that get created using this algo. Implementers are free to modify the implementation to suit their language's needs.
Edit: I've created a self-contained Java version and made it available as a Github Gist. It differs in some ways from the description below, and as far as I can tell its design is in better shape than the description. All byte[]<->BigInteger conversions are big-endian, though if anyone wants to write a C version it can differ slightly from the Java version, as long as it is itself consistent. I've also noticed that the u and v values were not deterministic, and in some cases one can get away with a lower u value and still produce a good proportion of the same hashes, as a mining optimization. The values must be consistent for verification, however.
Lately, I've been noticing a small trend in PoW algos, geared toward increasing memory requirements and decreasing ASICability. First came SHA, which was highly compact and fast on ASICs, followed by scrypt, with memorycoin's PoW algo in the mix as well. I'd like to propose taking it a step further, and on that note, making it tweakable to suit an altcoin and even a given moment in time.
Similar to scrypt, large amounts of data are used as intermediate parameters in a given hashing computation. This data, however, would extend to multiple GB of data in the blockchain (or any coin-specific data set, like the UTXO set, as long as it can be replicated consistently), as well as a practically limitless amount of extra disk data, the size of which is configured by the coin itself.
Each set of disk-intensive hashes leaves the miner able to try some different nonces without making the entire mining operation grind disks, though disk access will remain a large portion of the work. Seven configurable values exist, called t, u, v, w, x, y, and z, in addition to a difficulty value that acts similarly to how bitcoin handles difficulty values. t, u, v, w, x, y, and z will be discussed at the end. q is also a specialized parameter.
To begin, a first nonce is chosen and hashed with the rest of the coinbase using SHA256, twice. This will produce a 256-bit hash, obviously. The hash is taken modulo ((size of the blockchain or other coin-specific data, rounded down to a multiple of 1024) + 7). Such a modulo is a slow operation and difficult to optimize. The result is then used to select a position, and a length is selected as hash % 65537, a prime, which complicates certain naive implementations of modulo by increasing the time needed.
The data selected by the offset and length is concatenated onto the old hash, which is then hashed and processed in the same manner x more times. The data itself must be replicable (the entire longest chain, concatenated, would work) for a given block to be mined.
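To make stage 1 concrete, here is a rough Java sketch. The names (stage1, coinbaseWithNonce, chainData) are mine for illustration, not taken from the gist, and the end-of-data clamping is a sketch-level choice rather than part of the spec; the gist is authoritative where they differ.

    import java.math.BigInteger;
    import java.security.MessageDigest;
    import java.util.Arrays;

    // chainData stands in for the concatenated longest chain (or other
    // coin-specific data set). A real data set is multi-GB and read from
    // disk; a byte[] is used here purely for illustration.
    static byte[] stage1(byte[] coinbaseWithNonce, byte[] chainData, int x) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        byte[] hash = sha.digest(sha.digest(coinbaseWithNonce));   // double SHA256
        long modBase = (chainData.length / 1024L) * 1024L + 7;     // rounded down to 1024, plus 7
        for (int i = 0; i < x; i++) {                              // x lookups, per the breakdown below
            BigInteger h = new BigInteger(1, hash);                // big-endian, unsigned
            int offset = (int) Math.min(h.mod(BigInteger.valueOf(modBase)).longValue(),
                    (long) chainData.length);                      // clamp: sketch-level edge handling
            int length = h.mod(BigInteger.valueOf(65537)).intValue();
            byte[] slice = Arrays.copyOfRange(chainData, offset,
                    (int) Math.min((long) offset + length, (long) chainData.length));
            byte[] concat = Arrays.copyOf(hash, hash.length + slice.length);
            System.arraycopy(slice, 0, concat, hash.length, slice.length);
            hash = sha.digest(sha.digest(concat));                 // repeat the double hash
        }
        return hash;
    }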
Once that is complete, a new portion of the algorithm kicks in, which will provide even more RAM-capacity resistance on smaller blockchains.
In this stage, y hash operations will be performed in a loop on a piece of data, where each operation computes newHash = H(oldHash + M(oldHash)), where + is concatenation and M is, in pseudocode:
M(input):
    hVal = input
    do for *z* times:
        hVal = H(hVal XOR H(
            prm +
            [*u* highest bits of decimal portion of 11th root of P(hVal % *w*)]
        ))
    endloop
    return hVal
where P(n) is q^n, and *u* is parameter u from above. prm is just some constant to be determined by the coin. It may be the coin's name, some message, or a prime, but must remain constant for a given block. It may be defined based on the previous block.
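A rough Java sketch of M follows, reusing the imports from the stage 1 sketch above. rootU is the strictfp iterative root sketched just below; topFractionBits, xor, and concat are illustrative helpers I've named myself, and "decimal portion" is read here as the binary bits of the fractional part — again a sketch, not the gist's code.

    static byte[] M(byte[] input, byte[] prm, double q, int u, int v, long w, int z) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        byte[] hVal = input;
        for (int i = 0; i < z; i++) {
            long n = new BigInteger(1, hVal).mod(BigInteger.valueOf(w)).longValue();
            double root = rootU(Math.pow(q, n), v);   // 11th root of P(hVal % w); large n may
                                                      // need wider arithmetic (see parameter u)
            byte[] inner = sha.digest(concat(prm, topFractionBits(root, u)));
            hVal = sha.digest(xor(hVal, inner));      // both operands are 32-byte digests
        }
        return hVal;
    }

    // u highest bits of the fractional portion of val. Note toByteArray()
    // is variable-length; a real implementation would fix the width.
    static byte[] topFractionBits(double val, int u) {
        double frac = val - Math.floor(val);
        BigInteger bits = BigInteger.ZERO;
        for (int i = 0; i < u; i++) {
            frac *= 2;
            int bit = (int) frac;
            bits = bits.shiftLeft(1).or(BigInteger.valueOf(bit));
            frac -= bit;
        }
        return bits.toByteArray();
    }

    static byte[] xor(byte[] a, byte[] b) {
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) out[i] = (byte) (a[i] ^ b[i]);
        return out;
    }

    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }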
The 11th root must be calculated by a specific iterative algorithm where ϵ is less than 2^(-v). Strict conformance to IEEE 754 without wider intermediates (akin to Java's strictfp keyword) is required.
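As one possible reading, a strictfp Newton iteration with step tolerance 2^(-v) would look like this; the post only pins down the IEEE 754 strictness and the tolerance, not the exact algorithm, so this only illustrates the shape.

    // Newton's method for the 11th root, assuming a is finite and positive
    // (q^n overflowing to infinity would need the special handling noted
    // under parameter u).
    static strictfp double rootU(double a, int v) {
        double eps = StrictMath.pow(2.0, -v);
        double x = a >= 1.0 ? a : 1.0;        // crude initial guess above the root
        while (true) {
            // Newton step for f(x) = x^11 - a:  x' = (10x + a/x^10) / 11
            double next = (10.0 * x + a / StrictMath.pow(x, 10)) / 11.0;
            if (StrictMath.abs(next - x) < eps) return next;
            x = next;
        }
    }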
This operation is extremely difficult to do at high speed, and must be cached on disk for mining. A single verification will still be able to function without that cache. Depending on w, the cache could require massive amounts of disk space.
After y such hashings, the resulting hash can be concatenated with a strictly t-bit nonce and hashed with a single SHA256. This nonce allows for some disk-less operation, but is limited by the generation params.
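Sketched in the same illustrative Java (finalStep and its signature are mine), the final step might look like the following; a real implementation should encode the nonce at a fixed byte width so hashes are reproducible.

    static byte[] finalStep(byte[] stage2Hash, long nonce, int t) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        byte[] nonceBytes = BigInteger.valueOf(nonce)
                .mod(BigInteger.ONE.shiftLeft(t))   // constrain the nonce to strictly t bits
                .toByteArray();
        return sha.digest(concat(stage2Hash, nonceBytes));  // single SHA256
    }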
Parameters:
u: Multiplies the size of the second-stage cache; requires special handling in the n-th root operation if larger than what the FPU can handle alone, due to its supported number widths.
v: Determines how much floating-point calculation goes into stage 2, and therefore the time needed to create the cache. If prm is defined based on the block, the stage 2 cache would have to be recreated with each new block, meaning that floating-point results must be cached separately.
q: May be set per-block or statically. If per-block, all floating-point data must be recalculated each block. It may be defined as pi or e when static, and should be irrational or otherwise fully use the maximum precision afforded by an IEEE 754 double.
w: Determines how many different floating-point results must be cached. Larger values directly increase the disk space requirement.
x: Basic timing parameter for stage 1, specifying how many blockchain (or other per-block data set) lookups are needed.
y: How many rounds stage 2 runs.
z: How many stage 2 hashes occur in each round.
The performance breakdown for verifying a hash without cached data is:
y*z*log2(v) floating-point operations for taking roots
y*z incommensurable floating-point exponentiation operations
x mandatory disk lookups
2*x + y*z SHA256 hash operations
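For illustration only, with made-up values x = 64, y = 16, z = 32, and v = 52 (not proposed defaults), a cacheless verification would cost roughly 16*32*log2(52) ≈ 2,900 floating-point root iterations, 512 exponentiations, 64 disk lookups, and 2*64 + 512 = 640 SHA256 operations.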
All of these parameters can be tweaked by the coin, whether adjusted every certain number of blocks, akin to difficulty adjustments, or fixed. The former would benefit a coin that can detect what kind of mining power is gaining a monopoly.
Edit: Added parameter t. This parameter can make or break ASIC exclusion and some of the goals of this PoW, by putting it in the sweet spot where the high-end disk-laden servers discussed above are not needed, but an ASIC or even possibly a GPU would perform too much IO to disk to be justifiable. Of course, if one wishes to favor massive disk-laden servers, they can set this parameter lower.