I spent some time reading the literature to learn more about batch verification and other acceleration techniques. I think they could be applied to give us significant speedups, without any backwards incompatible or hard-forking changes.
Firstly we have to read and understand this paper:
http://www.mathnet.or.kr/mathnet/preprint_file/cacr/2005/cacr2005-28.pdfThe primary technique outlined in it is not applicable to Bitcoin because, as they say:
From the case analysis in the previous section, it follows that the transformation of the signature verification equation significantly reduces the size of the scalar multiples involved in this equation (if one can store a single precomputed multiple of the generator of G). As a result, we might expect a considerable speed-up of signature verifications, since this eliminates a large number of point doubling operations, a common element in multiple point multiplications. We expect significant improvements, both for prime curves and for random binary curves. For Koblitz curves, we can only expect a marginal improvement, since for those curves the Frobenius map (which assumes the role of point doubling) comes almost for free.
However, the modified ECDSA* algorithm is a requirement to use the batch verification technique from Cheon and Yi.
ECDSA* is a simple transformation of ECDSA that yields larger signatures, but which can be (batch) verified more quickly. The signatures can be converted between forms and have equivalent security. The main difference is that instead of a signature being (r, s) where r is the X co-ordinate of the point in affine space mod N, the signature becomes (R, s) where R is the full point including the Y co-ordinate. Of course "r" is still needed during the verification phase but it's a fast transform and the extra data is useful later.
The technique that matters to us is the one described in section 4.2 of this paper:
http://www.iacr.org/archive/pkc2007/44500442/44500442.pdfThey claim a 4x speedup when batching verifications together for the multiple-signer case, 7x more for the case of single signers (obtainable by grouping signatures by address) and a 9x speedup for single signers on Koblitz curves. At least I think it's 9x. My understanding of section 5 is a bit ropey - I'd appreciate somebody with a stronger grasp of elliptic curve mathematics reading it and checking that it really is 9x, as the comparisons they give at the end are a bit confusing.
To retrofit Bitcoin for this without the need to redefine any opcodes or do anything else that modifies the core messages, we can define CTransaction2 message that wraps a CTransaction. Outside the CTransaction message it provides the R values for each signature separately. Nodes store CTransaction2 messages. They are sent only to nodes that support a sufficiently high protocol version, otherwise the wrapped CTransaction is sent instead and older nodes perceive no difference. Obviously block hashing and so on is all done on the wrapped messages. When an (r,s) signature is received from an older node, the newer node calculates R from it so it can save newer nodes the additional work of doing the conversions themselves.
The result is basically a bandwidth/cpu tradeoff. You save CPU time but have to receive/send a bit more data per input. As people upgrade the chances of a node hearing a transaction in the newer format that is more efficiently verified increases. As transactions come in their input signatures are sorted by key, queued up and individual CPU cores pop groups off the queue as fast as they can. Small delays may be introduced to make a propagation latency/cpu load tradeoff - if load is getting too high then it may be worth leaving work in the queues for longer in the hope that more transactions come along that spend outputs with the same addresses, allowing more work amortization.
For obvious reasons batch verification is applicable primarily when transaction rates are very high. Otherwise it's just a way to make the tail-end of initial block download faster, which over the longer term will become less and less interesting as nodes become more stable and long-lived.