
Topic: Using k-SUM to Create an ASIC/FPGA Resistant PoW (Read 1416 times)

newbie
Activity: 37
Merit: 0
Yes - agreed.  Better to establish equivalence to an existing problem now than later.  Appreciate the feedback and glad this structure was explored more thoroughly.
legendary
Activity: 990
Merit: 1108
Then 3-SUM can be solved in O(n) memory and O(n^2) time, by storing all h_i
as keys in a set S, and running

for i = 1 to n
   for j = i+1 to n
      if S contains -(h_i + h_j) mod d
        then 3-SUM found
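
For concreteness, a minimal Python sketch of that loop (the function name and the use of a dict instead of a plain set are just illustrative; duplicate-residue corner cases are glossed over):

def find_3sum(hashes, d):
    # Store each residue with one index where it occurs: O(n) memory.
    index_of = {}
    for idx, h in enumerate(hashes):
        index_of.setdefault(h, idx)

    n = len(hashes)
    for i in range(n):                           # O(n^2) pairs
        for j in range(i + 1, n):
            need = -(hashes[i] + hashes[j]) % d
            k = index_of.get(need)
            if k is not None and k != i and k != j:
                return i, j, k                   # indices of a 3-SUM
    return None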

You've got another problem here in that it is trivial to trade off memory for time
(e.g. do the above twice, first for the half of S that is even, then for the half that's odd; see the sketch below).
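
A rough sketch of that two-pass variant (same illustrative Python as above, assuming the even/odd split is on the stored residues): the lookup table holds only half the residues at a time, at the cost of running the O(n^2) scan twice.

def find_3sum_low_memory(hashes, d):
    n = len(hashes)
    for parity in (0, 1):
        # store only the residues of this parity: roughly half the memory
        stored = {h: idx for idx, h in enumerate(hashes) if h % 2 == parity}
        for i in range(n):
            for j in range(i + 1, n):
                need = -(hashes[i] + hashes[j]) % d
                k = stored.get(need)
                if k is not None and k != i and k != j:
                    return i, j, k
    return None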

So 2-SUM and 3-SUM both seem to be memory-easy rather than memory-hard.

4-SUM and higher don't seem immune to such trade-offs either.
newbie
Activity: 37
Merit: 0
Quote
Like I said before, you need to choose d and n so that you expect at
most one solution.

I suppose so - working through all this, it seems we've established that kSUM is equivalent to the Birthday Problem, but with k-way collisions instead of two-way ones.
legendary
Activity: 990
Merit: 1108
Quote
and assume d = O(n^3).

I'm using the assumption that d == n.


That's no good. If you aim to use 1Gbit of memory (128MB), and
set d = n = 2^30, then there are about (n choose 3)/d ~ 2^60/6
solutions, taking forever to try all, and you're not progress-free.

Like I said before, you need to choose d and n so that you expect at
most one solution.
newbie
Activity: 37
Merit: 0
Quote
Let n be the maximum number of nonces, h_1...h_n be the n hashes modulo d,
and assume d = O(n^3).
Then 3-SUM can be solved in O(n) memory and O(n^2) time, by storing all h_i
as keys in a set S, and running

for i = 1 to n
   for j = i+1 to n
      if S contains -(h_i + h_j) mod d
        then 3-SUM found

So your memory usage is only the square root of your running time. This is why I wrote earlier:

"For maximum memory-hardness, running time should be linear in memory use,
which rules out some values of k, such as 3."


Agree the upper bound on running time is O(n^2), but isn't the expected running time O(n)?  We know the time to look up h_i and h_j is O(1), and we expect to find a collision 50% of the time within O(sqrt(n)) samples.  I'm using the assumption that d == n.

legendary
Activity: 990
Merit: 1108
Then it becomes much harder memory-wise.  Since we've established the average memory requirements for 2SUM are O(sqrt(d)), and we know 3SUM can be re-expressed as 2SUM(hash0, 2SUM(hash1, hash2)), 3SUM in this form should have complexity of O(sqrt(d)) * O(sqrt(d)) = O(d) average case, with a worst case of O(n^2).  Such a massive variance in potential memory use is interesting, but I would expect a miner to optimize for the 50% case at sqrt(d).  Assuming this is correct, it's pretty neat that we could tie the average memory usage linearly to d.

What do you think?  Does this construction seem correct/more robust?

Let n be the maximum number of nonces, h_1...h_n be the n hashes modulo d,
and assume d = O(n^3).
Then 3-SUM can be solved in O(n) memory and O(n^2) time, by storing all h_i
as keys in a set S, and running

for i = 1 to n
   for j = i+1 to n
      if S contains -(h_i + h_j) mod d
        then 3-SUM found

So your memory usage is only the square root of your running time. This is why I wrote earlier:

"For maximum memory-hardness, running time should be linear in memory use,
which rules out some values of k, such as 3."
newbie
Activity: 37
Merit: 0
So the first thing I'd assert is that any value of d should be a power of 2.  The reasoning is that it avoids biasing any k-SUM when reducing modulo d.  A convenient side benefit is that the modulo can be done via a bit mask.
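
As a small illustration (Python; the constants are arbitrary), reducing modulo a power of 2 is a single AND:

d = 1 << 20                  # any power of two
h = 0x9E3779B97F4A7C15       # an arbitrary hash value

assert h % d == h & (d - 1)  # reduction mod d is just a bit mask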

3SUM is interesting because 3 is relatively prime to any power of 2 that could be d.  It also simplifies the discussion because we know the addition of the 3rd number "shifts" the target.

I.e. for 2SUM we have

(k-SUM(hash(block, nonce)) % d) == 0  

and for 3SUM we roughly have

(k-SUM(hash(block, nonce)) % d) == -(3rd value % d)

However, the extra input in 3SUM doesn't really achieve much, and actually works against us complexity-wise.  In normal 3SUM the domain is the unbounded integers, but here it is the integers mod d, and because of the wrap-around at d we know that once we pick a term, we have a 1/d chance that the sum of whatever is chosen for the remaining terms will "wrap us around" back to 0 and yield a solution.  Plus, we can then reuse any items found as part of one solution in any other solution.  It's relatively intuitive to extend this reasoning from 3SUM to kSUM.

Net: k should be 2, as you hinted at earlier, at least in this construction, since k > 2 doesn't buy us anything.

I also pondered an alternate construction to try to mitigate the wrap-around effect of working mod d:

(k-SUM(hash(block, nonce) % d)) == 0  

Here we apply the modulus before the k-SUM (assume the MSB of the hash is a sign bit here since we don't have modulo wrap-around), but this has the adverse effect of making values closer to zero more valuable.  I.e. if k=3, d=8, and you draw a 7, you can form far fewer solutions than if you had drawn a 1, due to the lack of an outer modulus.  So this alternate construction is out.


Incorporating your observation that 2SUM is roughly equivalent to Momentum complexity-wise, I went back to the drawing board...


Taking the original 3SUM construction, I realized a significant source of weakness was the lack of ordering.  If we change the construction to something like:

SUM(
  hash(block, key0, nonce0),
  hash(block, key1, nonce1),
  hash(block, key2, nonce2)) % d == 0

(key0, key1, and key2 are unique/different)
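
A rough Python sketch of how a verifier might check this keyed construction - the hash choice (SHA-256), the byte encoding, and the use of the position index as key_i are placeholder assumptions, not a fixed spec:

import hashlib

def keyed_3sum_ok(block, nonces, d):
    # block: header bytes; nonces: [nonce0, nonce1, nonce2]
    total = 0
    for i, nonce in enumerate(nonces):
        # the position index i stands in for key0/key1/key2
        data = block + i.to_bytes(4, "big") + nonce.to_bytes(8, "big")
        total += int.from_bytes(hashlib.sha256(data).digest(), "big")
    return total % d == 0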

Then it becomes much harder memory-wise.  Since we've established the average memory requirements for 2SUM are O(sqrt(d)), and we know 3SUM can be re-expressed as 2SUM(hash0, 2SUM(hash1, hash2)), 3SUM in this form should have complexity of O(sqrt(d)) * O(sqrt(d)) = O(d) average case, with a worst case of O(n^2).  Such a massive variance in potential memory use is interesting, but I would expect a miner to optimize for the 50% case at sqrt(d).  Assuming this is correct, it's pretty neat that we could tie the average memory usage linearly to d.


What do you think?  Does this construction seem correct/more robust?
legendary
Activity: 990
Merit: 1108
OK - there isn't any fundamental disagreement then.

Next step: choose values of k and d and write down the algorithm for solving such k-SUM.
Then we can look at runtime and memory usage and possible time-memory trade-offs
in detail...
newbie
Activity: 37
Merit: 0
OK - there isn't any fundamental disagreement then.
legendary
Activity: 990
Merit: 1108
How are you defining an attempt?  If it is a single nonce range scan then we agree.

An attempt is looking for a cycle in a single graph defined by a specific header.
There is some confusion here as the word nonce is used in two senses.
There is a "macro"-nonce that's used to add entropy into the header,
and for some PoWs like Momentum and Cuckoo Cycle there are also micro-nonces
used in building a proof.
An attempt corresponds to a macro-nonce.

I was referring to progress as state within a nonce range scan.

That's not what progress-free means in the context of PoWs.

Progress-free means that you need to make many (hundreds or more)
attempts at finding a proof, and that you gain nothing from each failed attempt.

It implies that having 10 slow computers is about as good
as having a single one that's 10 times faster.
newbie
Activity: 37
Merit: 0
How are you defining an attempt?  If it is a single nonce range scan then we agree.  I was referring to progress as state within a nonce range scan.
legendary
Activity: 990
Merit: 1108
Yes, while thinking about the design of memory-hard functions I realized there is a hard tradeoff here:
1. Progress-Free (scrypt; i.e. sequential memory hard)

There is no trade-off. Cuckoo Cycle is progress free.
Each attempt has only a 2.5% chance of finding a 42-cycle.
When an attempt fails, either by not finding a 42-cycle,
or by not satisfying the additional hash target constraint,
you're back at square one.

If you like you can view each cycle finding attempt as
an scrypt computation that has a 97.5% chance of aborting.
newbie
Activity: 37
Merit: 0
Yes, while thinking about the design of memory-hard functions I realized there is a hard tradeoff here:

1. Progress-Free (scrypt; i.e. sequential memory hard)
2. Fast-To-Verify (cuckoo, ksum, momentum)

The Momentum paper indirectly identifies the mechanism in #2 by calling out independent intermediate solutions as a requirement for fast verification.  The very presence of independent intermediate solutions implies "progress", since they are remembered in order to construct a final solution.

Based on this, an ideal Fast-To-Verify memory-hard PoW would force a 1:1 mapping between intermediate results and input domain values.  An optimal one would do only this.

Does this sound like a correct design goal?

legendary
Activity: 990
Merit: 1108
So I *think* we're agreeing, but are taking different routes to the destination.  Were you referring to O(sqrt(d)) before applying Choose, or something else?

Yes, we agree it takes on average on the order of sqrt(d) samples to find a 2-sum solution.
(btw, the text you quote had some typos that I since fixed).

Consider if one adds a final hash step construction to kSUM similar to Cuckoo Cycle:

HASH(concat(nonce{i})) <= Target

As my paper explains, this is needed for precise difficulty adjustment. If you need the PoW to
become, say, exactly 3.8 times more difficult, you can just divide Target by that much.

If instead you can only control PoW difficulty by restricting the nonce range, then it
may not be clear by how much to restrict it in order to reduce the chance of finding
a solution by exactly a factor of 3.8.
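
As a toy illustration of that fractional scaling (Python; the starting target and the 3.8 factor are just example values, and the division is kept in integer arithmetic to avoid floating point on 256-bit numbers):

old_target = 1 << 255              # example current target in a 256-bit hash space
new_target = old_target * 10 // 38 # divide by 3.8 to make the PoW exactly 3.8x harder

# a solution's final hash HASH(concat(nonce{i})) must now be <= new_target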
newbie
Activity: 37
Merit: 0
It wasn't obvious to me how you were getting your O estimate of O(sqrt(d)) - I get that it represents complexity as d approaches infinity.

Quote
So for instance, 2*sqrt(d) = O(d). With that many nonces, you can form up to
  2*sqrt(d) * (2*sqrt(d)-1) / 2 = 2*d-sqrt(d)
2-sums, making it pretty likely that one of them is 0 mod d.

If I rephrase this,

2*sqrt(d) * (2*sqrt(d)-1) / 2 == Choose(2 * sqrt(d), 2) == 2*d-sqrt(d)

O(2*d - sqrt(d)) == O(2*d - d^{1/2}) == O(d)
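
A quick numeric check of that identity (Python, using perfect-square values of d so that sqrt(d) is exact):

from math import comb, isqrt

for d in (16, 1 << 20, 1 << 30):        # perfect squares, so isqrt(d) is exact
    s = isqrt(d)                        # s = sqrt(d)
    assert comb(2 * s, 2) == 2 * d - s  # Choose(2*sqrt(d), 2) == 2*d - sqrt(d)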

However, concretely we also know that we have to sample about d/2 unique hashes before we are guaranteed a 2SUM solution - also O(d).

So I *think* we're agreeing, but are taking different routes to the destination.  Were you referring to O(sqrt(d)) before applying Choose, or something else?  I agree that d/2 hashes on average was incorrect - it should have been d/2 hashes worst case - I'll need to do a bit more thorough complexity analysis.



Separately - I'd like to dig deeper into making this progress-free, as right now I think both a modified kSUM and Cuckoo Cycle suffer from the same thing:

Consider if one adds a final hash step construction to kSUM similar to Cuckoo Cycle:

HASH(concat(nonce{i})) <= Target

Such a construction really just constrains the domain to the nonce range.  It's not obvious that adding the final HASH <= Target buys anything for either kSUM or Cuckoo Cycle, and it may be a vector for simplification.  I say this because one can arbitrarily pick nonces from potentially many partial solutions for a nonce range, and the likelihood of a partial solution grows as both kSUM and Cuckoo Cycle populate their sets/graphs.
legendary
Activity: 1274
Merit: 1000
Very interested, please continue.

It would be nice to see scrypt get modified to remove the ASIC effect.
full member
Activity: 168
Merit: 100
An interesting topic; I will keep an eye on it.
legendary
Activity: 990
Merit: 1108
Am I off here?  How are you getting O(sqrt(d))?

You seem somewhat unfamiliar with O() notation.
Loosely speaking, it ignores constant factors.
So for instance, 2*sqrt(d) = O(sqrt(d)). With that many nonces, you can form up to
  2*sqrt(d) * (2*sqrt(d)-1) / 2 = 2*d-sqrt(d)
unique 2-sums, making it pretty likely that one of them is 0 mod d.

Although not an exact instance of the birthday paradox,
it behaves similarly, in that you're looking for 2 nonces whose hash
is opposite (rather than identical) in value;
one hash is h mod d and the other is (-h) mod d.
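
That observation translates directly into a single-pass lookup; a minimal Python sketch (names are illustrative):

def find_2sum(hashes, d):
    # For each new residue h, a previously seen residue equal to
    # (-h) mod d completes a pair summing to 0 mod d.
    seen = {}
    for idx, h in enumerate(hashes):
        j = seen.get(-h % d)
        if j is not None:
            return j, idx
        seen.setdefault(h, idx)
    return None

In line with the estimate above, on the order of sqrt(d) random residues are typically enough before some pair sums to 0 mod d.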
newbie
Activity: 37
Merit: 0
When I chose reduction modulo d, I did so because taking a modulus by a fixed value has pretty well-known optimizations.

Quote
With O(sqrt(d)) hashes you can form O(d) different 2-SUMs, and expect a solution,
just like with the birthday paradox.

For 3-SUM, O(d^{1/3}) hashes using O(d^{2/3}) memory should suffice.

I don't think this is an instance of the birthday paradox, as we're not trying to find collisions (like in Momentum); rather, we're trying to find a subset sum where P is large.

The number of unique subsets is d choose k.  To make things concrete, let d = 16, so sqrt(16) = 4, meaning 4 hashes are picked.  Assume all are unique values and the sums don't collide.  We can form all possible 2-sums of the 4 hashes via 4 choose 2, i.e. 6.

If O(sqrt(d)) hashes gave us O(d) different 2-SUMs, then we'd expect 4 hashes to give 16 unique 2-SUMs, but instead we have 6.

Am I off here?  How are you getting O(sqrt(d))?

To be clearer, the PoW I'm suggesting is:

SUM = 0;
for (i = 1; i <= k; i++) {          // one hash per submitted nonce
  SUM += hash(block, nonce{i});
}
return (SUM % d) == 0;              // valid iff the k hashes sum to 0 mod d

Collisions make it less likely that solutions will be found, not more likely.


EDIT: Going into this, I did read your paper and the Momentum paper first - the multi-nonce submission was inspired by your work.  Structurally one can think of kSUM as a form of graph generation; every unique number hashed adds a fully connected vertex, but the edges have weights; find a path of weight d and length k.
legendary
Activity: 990
Merit: 1108
3 more observations:

To make the PoW progress-free, one must limit the number of nonces that can be used,
e.g. to d^{1/k}, and then put a further constraint H(k nonces) <= target
on any solution. So d is more of a size parameter than a difficulty parameter.

For maximum memory-hardness, running time should be linear in memory use,
which rules out some values of k, such as 3.

The 2-sum PoW suffers from the same time-memory trade-offs that plague Momentum.
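
For concreteness, a hedged Python sketch of how those constraints might compose into a single verification step - the hash choice, byte encoding, and parameter handling are placeholder assumptions rather than a fixed spec:

import hashlib

def pow_valid(header, nonces, k, d, target):
    # 1. nonce-range limit: at most ~d^(1/k) distinct nonce values
    max_nonce = round(d ** (1.0 / k))
    if len(nonces) != k or any(n < 0 or n >= max_nonce for n in nonces):
        return False

    # 2. k-SUM condition: the k hashes must sum to 0 mod d
    total = sum(
        int.from_bytes(hashlib.sha256(header + n.to_bytes(8, "big")).digest(), "big")
        for n in nonces
    )
    if total % d != 0:
        return False

    # 3. final target constraint H(k nonces) <= target for fine difficulty control
    final = hashlib.sha256(b"".join(n.to_bytes(8, "big") for n in nonces)).digest()
    return int.from_bytes(final, "big") <= target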