Theoretical max speeds for ECDSA verification

Mike Hearn

legendary

Activity: 1526

Merit: 1134

Thanks for the input Hal, it's tremendously helpful as always to have a professional cryptographer look over these things. We know it takes effort to post these days and appreciate it a lot.

The 40% speed up from your code is definitely one of the next lowest hanging fruits after multi-threading the verification. I suspect intelligent micro-optimization (using SSE3 etc) can get us a bit more speed too.

The reason you didn't see much speedup from switching off signature verification is that the database code is very inefficient. You could try merging sipas ultraprune branch. You can also try my LevelDB branch. I don't think they're mergeable together unfortunately, at least not yet (one needs to be rebased on the other). LevelDB uses a more efficient on-disk format and moves disk IO to a separate thread, ultraprune dramatically shrinks the size of the database (to just hold unspent outputs) which reduces the working set and makes it also a lot more efficient. With these changes we've seen block chain indexing times measured in minutes when signature verification is disabled.

Once all the database work is merged in, Bitcoin will be bottlenecked on CPU for the foreseeable future, which is correct and as it should be. That's why I'm now investigating how far we can push ECDSA verification - I want to re-calculate the scalability numbers. There's often talk of Bitcoin being just a payments backbone or settlement protocol between "bitbanks" but I hope we can avoid that and stay as close to Satoshis vision as possible.

With the latest results from the research community, it's looking increasingly feasible that a single powerful computer (by todays standards) could handle basically all the transactions that credit card companies/PayPal handle and more. Of course the way we use Bitcoin in future will be different and I think cheap transactions will open up new use cases that weren't previously possible, so in the end to handle the same number of people as the CC infrastructure will require more tx throughput than they do. But even so, the need for multi-machine "supernodes" seems to be receding.

Hal

vip

Activity: 314

Merit: 4276

Hi, Mike. I am still alive alive but greatly limited.

I looked at the first paper, which as you know is not about batch verification. It gives a speed up if you know R that we should already get with just r, with our pseudo Koblitz curve. But I discovered that I had missed an essential optimization in my Koblitz implementation, which is to split the G multiplication into two, with half size exponents. This should increase the speed-up from 20% to more like the 40% they claim. So thanks! (Not that I can do anything about it. Hopefully this hint will be sufficient.)

On batch verification, I had thought of a much kludgier approach. Bear in mind that in the current signature (r,s), r is just the x-coordinate of R. So we need to supply the y-coordinate. (Actually, that's not important, but it saves a little space.) So presently the scriptSig pushes the key and the signature on the stack. My idea was to first push a magic number and the y-coordinate of R. Hopefully, legacy code would ignore these two deeper stack values, and just verify the signature traditionally. But new code could check for the magic number, reconstruct R, and do batch verification. This way transactions could be broadcast that were backwards compatible.

Your idea is cleaner but I am concerned about transaction hash consistency between old and new nodes. Old nodes don't see R, so I think new nodes can't hash it. And it might have to be that way forever. On the plus side, this would let you go back and convert all existing transactions to the new format, giving benefit from batch verification immediately.

I'll try to look at the second link tomorrow and see if such dramatic speedups are possible. I did try an experiment where I patched out signature verification entirely, just had it return true. I didn't get all the speedup I was hoping for on initial blockchain download. Be interesting to repeat that now that it takes so much longer.

Mike Hearn

legendary

Activity: 1526

Merit: 1134

I spent some time reading the literature to learn more about batch verification and other acceleration techniques. I think they could be applied to give us significant speedups, without any backwards incompatible or hard-forking changes.

Firstly we have to read and understand this paper:

http://www.mathnet.or.kr/mathnet/preprint_file/cacr/2005/cacr2005-28.pdf

The primary technique outlined in it is not applicable to Bitcoin because, as they say:

Quote

From the case analysis in the previous section, it follows that the transformation of the signature veriﬁcation equation signiﬁcantly reduces the size of the scalar multiples involved in this equation (if one can store a single precomputed multiple of the generator of G). As a result, we might expect a considerable speed-up of signature veriﬁcations, since this eliminates a large number of point doubling operations, a common element in multiple point multiplications. We expect signiﬁcant improvements, both for prime curves and for random binary curves. For Koblitz curves, we can only expect a marginal improvement, since for those curves the Frobenius map (which assumes the role of point doubling) comes almost for free.

However, the modified ECDSA* algorithm is a requirement to use the batch verification technique from Cheon and Yi.

ECDSA* is a simple transformation of ECDSA that yields larger signatures, but which can be (batch) verified more quickly. The signatures can be converted between forms and have equivalent security. The main difference is that instead of a signature being (r, s) where r is the X co-ordinate of the point in affine space mod N, the signature becomes (R, s) where R is the full point including the Y co-ordinate. Of course "r" is still needed during the verification phase but it's a fast transform and the extra data is useful later.

The technique that matters to us is the one described in section 4.2 of this paper:

http://www.iacr.org/archive/pkc2007/44500442/44500442.pdf

They claim a 4x speedup when batching verifications together for the multiple-signer case, 7x more for the case of single signers (obtainable by grouping signatures by address) and a 9x speedup for single signers on Koblitz curves. At least I think it's 9x. My understanding of section 5 is a bit ropey - I'd appreciate somebody with a stronger grasp of elliptic curve mathematics reading it and checking that it really is 9x, as the comparisons they give at the end are a bit confusing.

To retrofit Bitcoin for this without the need to redefine any opcodes or do anything else that modifies the core messages, we can define CTransaction2 message that wraps a CTransaction. Outside the CTransaction message it provides the R values for each signature separately. Nodes store CTransaction2 messages. They are sent only to nodes that support a sufficiently high protocol version, otherwise the wrapped CTransaction is sent instead and older nodes perceive no difference. Obviously block hashing and so on is all done on the wrapped messages. When an (r,s) signature is received from an older node, the newer node calculates R from it so it can save newer nodes the additional work of doing the conversions themselves.

The result is basically a bandwidth/cpu tradeoff. You save CPU time but have to receive/send a bit more data per input. As people upgrade the chances of a node hearing a transaction in the newer format that is more efficiently verified increases. As transactions come in their input signatures are sorted by key, queued up and individual CPU cores pop groups off the queue as fast as they can. Small delays may be introduced to make a propagation latency/cpu load tradeoff - if load is getting too high then it may be worth leaving work in the queues for longer in the hope that more transactions come along that spend outputs with the same addresses, allowing more work amortization.

For obvious reasons batch verification is applicable primarily when transaction rates are very high. Otherwise it's just a way to make the tail-end of initial block download faster, which over the longer term will become less and less interesting as nodes become more stable and long-lived.

Gavin Andresen

legendary

Activity: 1652

Merit: 2316

Chief Scientist

Quote from: Mike Hearn on August 26, 2012, 08:38:41 AM

What I meant is I go and find an output on the chain that the un-upgraded merchant will see as a NOP/always-valid. Then I send him a transaction that connects to it and then a regular old-style output that uses the address the merchant gave me. I find the merchants IP address (maybe the node is running on the web server) and send them the transaction. They "verify" it and discover the input script satisfies the output, which causes them to believe that they received money.

Right, that was one of the lessons learned from BIP16-- transactions redeeming non-standard inputs aught to be treated as non-standard. And they are, as of v0.6 (I think, I'm really good at forgetting when changes were introduced).

If the merchant is using stock bitcoind, then the non-standard transaction won't show up in their wallet until it appears in a block with 1 confirmation, making the security downgrade non-existent (it becomes a variation on the Finney attack).

Mike Hearn

legendary

Activity: 1526

Merit: 1134

Quote from: Gavin Andresen on August 25, 2012, 01:14:38 PM

That's wrong; you wouldn't be able to send a new-style transaction to a merchant unless they'd already upgraded and were publishing new-style addresses.

What I meant is I go and find an output on the chain that the un-upgraded merchant will see as a NOP/always-valid. Then I send him a transaction that connects to it and then a regular old-style output that uses the address the merchant gave me. I find the merchants IP address (maybe the node is running on the web server) and send them the transaction. They "verify" it and discover the input script satisfies the output, which causes them to believe that they received money.

Of course if the merchant is waiting for N confirmations it gets harder and harder, but essentially that's the SPV security model. The merchant was downgraded to it without realizing.

gmaxwell

staff

Activity: 4284

Merit: 8808

Quote from: CodesInChaos on August 25, 2012, 02:58:28 PM

As a side-note: Ed25519 is not just EcDSA on another curve. It's a different signature algorithm based on Schnorr signatures. Not sure how big the consequences of that are, but I suspect public key recovery isn't possible with it.

Public key recovery is possible with it if a trivial modification is made. (IIRC the algorithm, for mostly belt-and-suspenders reasons mixes a hash of the expected pubkey into the value being signed, so this would need to be changed to the hash of the address instead)

CodesInChaos

newbie

Activity: 19

Merit: 0

Quote from: Gavin Andresen on August 25, 2012, 11:05:18 AM

Quote from: Mike Hearn on August 25, 2012, 08:23:43 AM

If we manage the first hard fork process successfully...

I don't think a new signature algorithm doesn't require a hard fork; redefine an OP_NOP as OP_CHECKSIGVERIFY2 that uses ed25519, create a new 'standard' transaction type that uses that new opcode, and a new bitcoin address type that corresponds to it

If you add instructions, please split it into an instruction that pushes the serialized transaction, and one that verifies the signature of the data on the stack. Having a general purpose signature verification instruction would be nice. In particular it would enable an interesting variant of off-the-chain transactions.

And why a new address type? Aren't script hash addresses enough?

----

As a side-note: Ed25519 is not just EcDSA on another curve. It's a different signature algorithm based on Schnorr signatures. Not sure how big the consequences of that are, but I suspect public key recovery isn't possible with it.

twobitcoins

full member

Activity: 144

Merit: 100

Quote from: Gavin Andresen on August 25, 2012, 11:05:18 AM

... a new bitcoin address type ...

Why would a new address type be needed? I thought the point of P2SH was that it's the last time a new address type is ever needed. Anyone who can send to P2SH addresses should automatically be able to send to addresses using new verification rules.

Gavin Andresen

legendary

Activity: 1652

Merit: 2316

Chief Scientist

Quote from: Mike Hearn on August 25, 2012, 11:24:32 AM

Redefining NOPs means that merchants who don't upgrade get effectively downgraded to SPV level security (trust the chain strength) rather than full verification because I could send them a transaction that spends anyone elses money and they'd accept it. Or is that wrong?

That's wrong; you wouldn't be able to send a new-style transaction to a merchant unless they'd already upgraded and were publishing new-style addresses.

... which would actually just be BIP16 addresses, with the redemption script being something like OP_NOP1
(I'm wrong about needing a new bitcoin address type).

Obviously merchants wouldn't start doing that until a majority of miners were interpreting OP_NOP1 as OP_ED25519_VERIFY.

Quote

I don't think now is the right time to do any of that, mostly because I wouldn't be surprised if some solution for instant "off the chain" payments is adopted instead, in which case perhaps sep256k1 transaction cost will be negligible.

What would such a solution look like?

http://gavintech.blogspot.com/2012/07/off-chain-transactions.html

Mike Hearn

legendary

Activity: 1526

Merit: 1134

Quote from: Gavin Andresen on August 25, 2012, 11:05:18 AM

That would probably be better than a hard fork; I'm not sure what the transition plan would look like for old transactions if OP_CHECKSIG was redefined to use the ed25519 curve.

Sure, I was thinking more that there'd just be a new sighash flag indicating that the signature was of a new type.

Redefining NOPs means that merchants who don't upgrade get effectively downgraded to SPV level security (trust the chain strength) rather than full verification because I could send them a transaction that spends anyone elses money and they'd accept it. Or is that wrong? And zero-confirmation transactions effectively become worthless, of course.

I think if I had a Bitcoin using website that I wasn't paying much attention to I'd rather it simply stopped working than went through a silent security downgrade. Once people complain to me the site isn't noticing new payments, it's just a software upgrade to fix the outage. But if somebody pays me with somebody elses coins and manages to feed me such a block, that could result in serious losses.

Quote

I don't think now is the right time to do any of that, mostly because I wouldn't be surprised if some solution for instant "off the chain" payments is adopted instead, in which case perhaps sep256k1 transaction cost will be negligible.

What would such a solution look like?

kjj

legendary

Activity: 1302

Merit: 1026

Quote from: Mike Hearn on August 25, 2012, 10:35:41 AM

As far as I know the latest results for GPU/FPGA acceleration of ECDSA show either no speedups at all or only trivial speedups. It's not something that is inherently superior with dedicated hardware, apparently.

At any rate given the potential for nearly 100k (batch) verifications per second just using software, it seems better to pursue those approaches rather. Your point about the security of the chosen curves is well made, but I'm not sure how you'd get more confidence in the security of a new curve or version of ECDSA without trying it in the wild.

It isn't just raw speed, it is speed per dollar, or speed per watt. If you can plug in a low power USB device that verifies sigs as well as your CPU, that's a server you don't need to buy, power and cool. Might be worth looking into, I'm not saying it is the answer.

Well, are we in the innovative cryptography business, or the innovative money business? I know that there is some overlap, but I'd prefer that we be as conservative as we can be and let other people take the risks that we don't absolutely have to. Not that I think that there will be a zero-day released for Secp256k1 some day...

Gavin Andresen

legendary

Activity: 1652

Merit: 2316

Chief Scientist

Quote from: Mike Hearn on August 25, 2012, 08:23:43 AM

If we manage the first hard fork process successfully...

I don't think a new signature algorithm doesn't require a hard fork; redefine an OP_NOP as OP_CHECKSIGVERIFY2 that uses ed25519, create a new 'standard' transaction type that uses that new opcode, and a new bitcoin address type that corresponds to it, then start by rolling out miner support, etc. as sketched out here.

That would probably be better than a hard fork; I'm not sure what the transition plan would look like for old transactions if OP_CHECKSIG was redefined to use the ed25519 curve.

If the new transaction type was significantly cheaper, then cheaper transaction fees could incentivize people to upgrade their clients/wallets.

I don't think now is the right time to do any of that, mostly because I wouldn't be surprised if some solution for instant "off the chain" payments is adopted instead, in which case perhaps sep256k1 transaction cost will be negligible.

Mike Hearn

legendary

Activity: 1526

Merit: 1134

Quote from: Pieter Wuille on August 25, 2012, 09:04:04 AM

There are more potential optimizations:

Yes, you'll see I mentioned both of these already:

Quote

DJB and friends claim that with their ed25519 curve (the "ed" is for Edwards) and careful implementation they can do batch verification of 70,000+ signatures per second on a cheap quad-core Intel Westmere chip

That implementation of ed25519 is written in assembly and is crafted specifically to use as much ILP as possible, along with whatever CPU instructions are available (they have an implementation that requires amd64 support).

As far as I know the latest results for GPU/FPGA acceleration of ECDSA show either no speedups at all or only trivial speedups. It's not something that is inherently superior with dedicated hardware, apparently.

At any rate given the potential for nearly 100k (batch) verifications per second just using software, it seems better to pursue those approaches rather. Your point about the security of the chosen curves is well made, but I'm not sure how you'd get more confidence in the security of a new curve or version of ECDSA without trying it in the wild.

kjj

legendary

Activity: 1302

Merit: 1026

Regarding the curve, I'd really prefer moving to a curve that is better trusted than to one that is faster. Secp256k1 isn't widely used, and some people consider it unsafe. Well, not unsafe exactly, maybe nervous; they don't have the confidence in it that they have in some other curves.

If there are optimizations known for popular, widely trusted curves, that would be best.

One small downside to the rise of FPGA/ASIC mining is that miners won't automatically have piles of nearly-idle CPUs sitting around managing their GPUs that could participate in pooled verification schemes. (I'm thinking something like distcc) I'll shoot the BFL guys an email, they'll soon have piles of FGPAs sitting around as they buy them back from people upgrading to ASICs, maybe they'd be interested in repurposing them as signature verification devices.

Pieter Wuille

legendary

Activity: 1072

Merit: 1189

There are more potential optimizations:

Batch verification: theoretically, it should be possible to verify the different signatures in a single transaction or a single block at once, as we don't care to know which signature fails. This seems very experimental, but I read things about a speedup of up to 4x
Specialized/assembly implementations: for the NIST-P256 curve (aka secp256r1), Google implemented an optimized version that is 3-4x times faster on x86-64. Their techniques may be portable to secp256k1?

For reference, script validation reaches a speed of 1735 sig verifications per second, on a single core of my Core i7-2670QM (2.2GHz) CPU. Assuming at least Hal's optimization and parallellisation are possible without additional overhead, 8000 sigops/s should be doable on this CPU, which means validating a current worst-case block (because of the 20k sigops limit) in 2.4s.

Mike Hearn

legendary

Activity: 1526

Merit: 1134

A lot of good work on scalability has been done lately, albiet not yet merged. So soon it will be time to refresh the scalability wiki page/FAQ entries.

For full nodes LevelDB and, much more importantly, sipas ultraprune work on more efficient working sets put us back to the point where ECDSA verification is the bottleneck rather than IO.

I want to make a roughly ordered list of known techniques we can use to increase signature verification throughput. How does this look:

Eliminate redundant signature checks (DONE)
Multi-thread signature checking. This is easier post-Sipa refactorings. For most machines on which Bitcoin runs this will yield at minimum a 2x speedup (I doubt anyone is using single core machines anymore) and for servers, more like a 4-8x speedup assuming it's OK to saturate all cores.
Implement the Koblitz-curve specific optimizations, this is the GLV method and Hal explained how to do it some time ago. Apparently it's something like a 20% speedup but could go higher.
Allow multi-machine sharded verification, so you can have multiple machines in parallel. Of course this takes us more towards the "supernode" design which some find controversial.

There may also be ways to use exotic CPU instructions like the ones in SSE3 or AES-NI. There's some discussion of it here, but it seems to only apply to binary curves rather than the prime order curves we use.

Much research has been done in recent years on ways to make ECDSA faster, but unfortunately they almost all are incompatible with the secp256k1 curve Bitcoin uses. Batch verification, in particular, seems to require changes to the signatures (at least that I've been able to find). Gaining access to those speedups would mean a global upgrade and hard fork. But what a speedup can result!

DJB and friends claim that with their ed25519 curve (the "ed" is for Edwards) and careful implementation they can do batch verification of 70,000+ signatures per second on a cheap quad-core Intel Westmere chip, which is now several generations old. Given advances in CPUs over time, it seems likely that in the very near future the cited software will be capable of verifying many hundreds of thousands of signatures per second even if you keep the core count constant. But core counts are not constant - it seems likely that in 10 years or so 24-32 core chips will be standard even on consumer desktops. At that point a million signatures per second or more doesn't sound unreasonable.

If we manage the first hard fork process successfully, it seems therefore that even regular computers you can have in your home (given enough RAM and bandwidth) will be capable of keeping up with traffic loads far in excess of what both VISA and MasterCard handle combined.

Topic: Theoretical max speeds for ECDSA verification (Read 4072 times)