Speeding up verification of digital signatures

gmaxwell

staff

Activity: 4284

Merit: 8808

So, if you just need the sign of R.y and still get a reasonable speedup then maybe (e.g. with the sqrt cost). If you have to have the full value for R thats far less clear, as it would increase the signature size by 32 bytes, and thats not really acceptable. (Also even with R— I don't think you can avoid the sqrt because I think you need to check that R is actually on the curve, otherwise I think you'll create an attack)

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: gmaxwell on January 23, 2014, 08:31:13 PM

Are you really getting a 2.5x while not having the y coordinate sign and doing combinatorial recovery? If so thats really surprising to me.

No, it assumes that R is known.

For actual implementation, it would require that an odd/even bit is provided.

The square root operation is an additional potential slow down.

Signatures that don't include this info could be marked as non-standard and have to be individually checked. If it was non-standard to not include the info and the reference client included it, then most transactions would have the extra info.

Quote

Can you try implementing this on top of libsecp256k1? Getting a speedup on a very slow implementation— even though the speedup is purely algorithmic— is less interesting than getting it on a state of the art implementation.

I could have a look. Is it worth it if R has to be provided?

gmaxwell

staff

Activity: 4284

Merit: 8808

Quote from: prezbo on January 23, 2014, 06:28:14 PM

It wouldn't cause a fork, it's just another, more efficient version of checking signatures to the one already used - which is check every signature individually.

There is risk in anything changed.

We get a 6x increase from changing from openssl (which is faster than bouncy castle) to libsecp256k1, but we've not switched to that it in the reference software in part because of concern for the correctness of the implementation as it is not very mature and has not had a lot of review. That also doesn't have the integration complexity of batch validation.

We (at least Sipa and I) were previously aware of batch validation and IIRC sipa experimented with it some for another curve.

Are you really getting a 2.5x while not having the y coordinate sign and doing combinatorial recovery? If so thats really surprising to me.

Can you try implementing this on top of libsecp256k1? Getting a speedup on a very slow implementation— even though the speedup is purely algorithmic— is less interesting than getting it on a state of the art implementation.

Altoidnerd

sr. member

Activity: 406

Merit: 251

http://altoidnerd.com

I highly encourage TierNolan to try to get some attention to this. He can handle the mailing list I'd say he has the chops.

prezbo

sr. member

Activity: 430

Merit: 250

Quote from: Altoidnerd on January 23, 2014, 06:25:42 PM

It would help adoption. The QT startup time is getting bad. It would buy the core devs time to work another more permanent solution. But I don't know how feasible this is without a fork or something.

basically, you're not gonna get core dev's attention here anymore. But run it by them because it looks like you did your homework. Thats my 2 bits.

It wouldn't cause a fork, it's just another, more efficient version of checking signatures to the one already used - which is check every signature individually.

Altoidnerd

sr. member

Activity: 406

Merit: 251

http://altoidnerd.com

It would help adoption. The QT startup time is getting bad. It would buy the core devs time to work another more permanent solution. But I don't know how feasible this is without a fork or something.

basically, you're not gonna get core dev's attention here anymore. But run it by them because it looks like you did your homework. Thats my 2 bits.

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: Altoidnerd on January 23, 2014, 06:11:56 PM

You should send your work to the development mailing list. This looks like something good.

Well, a 2.5X improvement isn't that ground-breaking.

It would have to be much higher to make it worth it, due to the added risk.

Altoidnerd

sr. member

Activity: 406

Merit: 251

http://altoidnerd.com

You should send your work to the development mailing list. This looks like something good.

The big blockchain problem is a serious one. Great work...

Sorry that I can only help if its written in python, I'm a very bad programmer.

prezbo

sr. member

Activity: 430

Merit: 250

Quote from: TierNolan on January 23, 2014, 05:38:05 PM

Quote from: deuteragenie on January 23, 2014, 05:20:07 PM

Wow! Impressive how fast you did this!

Thanks, but it is mostly the fact that the bouncy castle library has an EC point library built in.

Also, the formula just "worked". I had expected an hour or two of debugging at least.

Quote

- I suppose that addition can be sped up in bounty castle because parameter 'A' is zero for the BC curve.

Dunno. It looks like addition doesn't actually depend on the curve parameters, weirdly.

It should, but that's probably taken care of in the addition part of the code. I don't think you can do much better than what you've done. This article proposes some additional improvements on the naive double-and-add but I've just skimmed the paper so not sure if it's worth the effort.

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: deuteragenie on January 23, 2014, 05:20:07 PM

Wow! Impressive how fast you did this!

Thanks, but it is mostly the fact that the bouncy castle library has an EC point library built in.

Also, the formula just "worked". I had expected an hour or two of debugging at least.

Quote

- I suppose that addition can be sped up in bounty castle because parameter 'A' is zero for the BC curve.

Dunno. It looks like addition doesn't actually depend on the curve parameters, weirdly.

Quote

- This part of your code is interesting:

for (int j = 255; j >= 0; j--) {
         sum = sum.twice();
         for (int i = 0; i < len; i++) {
            if (u2.testBit(j)) {
               sum = sum.add(Q);
            }
         }
      }

I do not know how much the 'if' and the 'testbit' method cost, but theoretically they could be replaced by a symbolic expression evaluator, at the cost of traversing a tree and a few pointers. Not sure what would be fastest.

Java doesn't really do pointers like that. However, I think the EC calcs are probably the limiting factor, so that shouldn't matter.

Test bit is probably implemented as an array lookup + bit mask.

Quote

- I am unsure whether the 'attack' on batch verifications apply to the use of signature verifications of the blockchain in the BC context. Presumably, one could apply batch verify until, say, the last 1000 blocks, then switch to the individual verifications.

Dunno. There is a paper that talks about protecting it by using random numbers.

You multiply u1(i) and u2(i) by a constant. It means you have to multiply R(i) by the same constant though, so that means the LHS has more calcs too.

deuteragenie

newbie

Activity: 36

Merit: 0

Wow! Impressive how fast you did this!

- I suppose that addition can be sped up in bounty castle because parameter 'A' is zero for the BC curve.
- This part of your code is interesting:

for (int j = 255; j >= 0; j--) {
         sum = sum.twice();
         for (int i = 0; i < len; i++) {
            if (u2.testBit(j)) {
               sum = sum.add(Q);
            }
         }
      }

I do not know how much the 'if' and the 'testbit' method cost, but theoretically they could be replaced by a symbolic expression evaluator, at the cost of traversing a tree and a few pointers. Not sure what would be fastest.

- I am unsure whether the 'attack' on batch verifications apply to the use of signature verifications of the blockchain in the BC context. Presumably, one could apply batch verify until, say, the last 1000 blocks, then switch to the individual verifications.

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: Altoidnerd on January 23, 2014, 02:29:53 PM

I'm not sure why you defined Q_k in the final step.

I had a go at coding it.

The results are

Bouncy castle: 2.5ms per verify

Naive Implementation: 3.3 per verify

Batch: 1.0ms per verify

It is 2.5X faster. If it was combined with the internal tricks that bouncy castle do, then maybe it would be 4-5X bouncy castle's speed.

The batch results show that the EC calcs are what is taking the time, so it is raw EC time limited.

That means the randomizer thing might be possible.

It saves lots of multiplies, but it does it by doing lots of adds.

I also calculate R directly, so it doesn't have to make multiple attempts to find a solution.

Altoidnerd

sr. member

Activity: 406

Merit: 251

http://altoidnerd.com

I'm not sure why you defined Q_k in the final step.

Moreover I believe this algebra is fine, but I'm not convinced a computer will be able
to rearrange a block like this actually resulting in a speed up.

We need an analysis in a particular computer language or simply a test to see if anything happens faster.

TierNolan

legendary

Activity: 1232

Merit: 1094

Would this work?

It is based on "Shamir's trick".

The only EC operations are double and add. This allows you build up a multiply.

a = a0 + a1*2^1 + a2*2^2 + .... an-1*2^(n-1)

which can be simplified to

a = a0 + 2*(a1 + 2*(a2 + 2*(... an-1)))

aG can then be expressed as

aG = a0*G + 2*(a1*G + 2*(a2*G + 2*(.... an-1*G)))

This is the standard double and add rule. You start from the MSB and work backwards.

I am going to assume 3 bit numbers, which shouldn't reduce the argument.

aG = a0*G + 2*(a1*G + 2*(a2*G))
bQ = b0*Q + 2*(b1*Q + 2*(b2*Q))

aG + bQ = a0*G + 2*(a1*G + 2*(a2*G)) + b0*Q + 2*(b1*Q + 2*(b2*Q))

re-arrange
aG + bQ = (a0*G + b0*Q) + 2*(a1*G + 2*(a2*G)) +2*(b1*Q + 2*(b2*Q))

factor by 2
aG + bQ = (a0*G + b0*Q) + 2*(a1*G + 2*(a2*G) + b1*Q + 2*(b2*Q))

re-arrange
aG + bQ = (a0*G + b0*Q) + 2*((a1*G + b1*Q) + 2*(a2*G) + 2*(b2*Q))

factor by 2
aG + bQ = (a0*G + b0*Q) + 2*((a1*G + b1*Q) + 2*(a2*G + b2*Q))

The effect is that you can handle the sum of 2 products with the same number of doublings as a single doubling.

Note: multiply by a0, b1 etc aren't really multiplies, since they are single bit numbers. You multiply by 1 or 0.

Anyway, for batch mode, that seems to give an equivalent boost.

R1 = a1 * G + b1 * Q
R2 = a2 * G + b2 * Q
R3 = a3 * G + b3 * Q
R4 = a4 * G + b4 * Q

converts to

R1 + R2 + R3 + R4 = a1 * G + b1 * Q1 + a2 * G + b2 * Q2 + a3 * G + b3 * Q3 + a4 * G + b4 * Q4

Even with different public keys you can still get the boost in speed.

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: deuteragenie on January 23, 2014, 03:58:31 AM

Why do you say that the technique is potentially "insecure" ? This would only be used to verify signatures on the blockchain, how can this be "insecure" ?
The signature verification can be tested thoroughly against a reference implementation (OpenSSL and sipa), so the likelihood to incorrectly verify signatures (either declaring them valid, when they are not, or vice-versa) is very low.

The method will definitely pass valid signatures.

I was concerned that there might be a way to sign invalid signatures somehow.

From Google, the authors have presented a system which uses "Randomisers". That suggests that their original scheme had some potential (or actual) weakness.

Verify

A = B
C = D

is weaker that just verifying

A + C = B + D

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: Altoidnerd on January 22, 2014, 03:49:27 PM

So am I understanding the speedup is by recognition of signatures by a particular key within a single block?

The signature test requires 2 point multiplications. One of them is a multiplication by the generation point (G) and the other is by the public key (Q).

Assuming you are checking 4 signatures, you normally check:

R1 = a1 * G + b1 * Q
R2 = a2 * G + b2 * Q
R3 = a3 * G + b3 * Q
R4 = a4 * G + b4 * Q

This can be simplified to

(R1 + R2 + R3 + R4) = (a1 + a2 + a3 + a4) * G + (b1 + b2 + b3 + b4) * Q

Point addition and normal integer addition are (relatively) very fast. That reduces the number of multiplies to 2, no matter how many signatures.

However, if Q is different for each signature (different public key), then you can only reduce the complexity of the (a1 + a2 + a3 + a4) * G. So, at most half of the multiplies are eliminated in that case.

deuteragenie

newbie

Activity: 36

Merit: 0

It would be good if there was an implementation of this available somewhere to try to figure out what the actual speed up could be, as it is not possible, on modern CPU architecture to determine the speed up by counting the number of adds/muls. Unfortunately, I have not been able to identify an implementation.

Quote from: TierNolan on January 22, 2014, 02:41:34 PM

The custom bitcoin curve code gives a much higher effect without using potentially an insecure technique.

Is it so that sipa's custom code is used in the current implementation ?

Why do you say that the technique is potentially "insecure" ? This would only be used to verify signatures on the blockchain, how can this be "insecure" ?
The signature verification can be tested thoroughly against a reference implementation (OpenSSL and sipa), so the likelihood to incorrectly verify signatures (either declaring them valid, when they are not, or vice-versa) is very low.

prezbo

sr. member

Activity: 430

Merit: 250

Quote from: Altoidnerd on January 22, 2014, 03:49:27 PM

Quote from: prezbo on January 22, 2014, 11:57:20 AM

See equation 3 of the naive algoritm, because the curve point multiplication with base point P is done only once per batch of signatures it offers at least about 2x the speedup in any case.

Sorry to go back to this point...looking at Eq (3), it refers to the situation in which addresses are re-used, a behavior we recently had a thread about (it is discouraged). THis makes me suspicious

Reusing addresses would provide additional speedup, but 2x would be guarantied even with unique addresses.

Altoidnerd

sr. member

Activity: 406

Merit: 251

http://altoidnerd.com

Quote from: prezbo on January 22, 2014, 11:57:20 AM

See equation 3 of the naive algoritm, because the curve point multiplication with base point P is done only once per batch of signatures it offers at least about 2x the speedup in any case.

Sorry to go back to this point...looking at Eq (3), it refers to the situation in which addresses are re-used, a behavior we recently had a thread about (it is discouraged). THis makes me suspicious

Quote from: deuteragenie on January 22, 2014, 02:23:54 PM

Relevant part of the introduction:
' In this paper, we propose three algorithms to verify original ECDSA signatures in batches. Our algorithms apply to all cases of ECDSA signatures sharing the same curve parameters, although we obtain good speedup figures when all the signatures in the batch come from the same signer.
Our algorithms are effective only for small batch sizes'

So am I understanding the speedup is by recognition of signatures by a particular key within a single block?

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: prezbo on January 22, 2014, 02:28:01 PM

That is correct, the question is if this is significant enough to bother with it.

I don't think it would be top priority. The custom bitcoin curve code gives a much higher effect without using potentially an insecure technique.

Topic: Speeding up verification of digital signatures (Read 1413 times)