Thanks for your reply.
One standard issue in cryptography that I was alluding to was the necessity of having canonical forms, so that signatures (which use hash functions) work properly after a message is mangled by the network. This problem was obvious when encrypting messages sent over email, e.g. PGP and PEM (early 1990s). Another standard (but more subtle) issue is the opportunity for covert channels, which need only be low bandwidth to effectively leak a private key. Thus an experienced cryptographer would have understood there might be a problem and could have looked for it. Indeed, in the case of bitcoin the problem was noted back in October 2012, as I pointed out in an earlier post.
Absolutely, though it's worth noting that while canonical encodings like DER are known and specified, virtually nothing enforces them. Bitcoin, from day one, has used a canonical encoding (DER)-- unfortunately, the underlying implementation in OpenSSL had undocumented behavior where it also accepted invalid DER until a couple of months ago. (All(?) other similar libraries, such as Bouncy Castle, behaved likewise-- so this would not have been escaped by avoiding OpenSSL.) I spent some amount of time looking for precise DER and BER implementations to use as reference test points for libsecp256k1 and was unable to find _any_ open source software which implemented them precisely (accepting or rejecting nothing more or less than the specification says). Even OpenSSL's recent fix for their behavior was not to correct their parser but instead to attempt to round-trip the encoding, which is not a guarantee. (You can see this list for some of the crazy things the OpenSSL parser, as one example, would accept.)
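To give a flavor of what "precisely implemented" means for this narrow case, here is a minimal Python sketch of an exact-DER check for a bare ECDSA signature (a SEQUENCE of two positive, minimally encoded INTEGERs, sized for a 256-bit curve). This is only an illustration of the kind of strictness being filtered for, not the code anything actually runs:

    def is_strict_der_signature(sig: bytes) -> bool:
        # Shortest possible encoding is 8 bytes; 72 is the cap for 256-bit r and s.
        if len(sig) < 8 or len(sig) > 72:
            return False
        if sig[0] != 0x30:                  # SEQUENCE tag
            return False
        if sig[1] != len(sig) - 2:          # length byte must cover exactly the rest
            return False
        if sig[2] != 0x02:                  # INTEGER tag for r
            return False
        len_r = sig[3]
        if len_r == 0 or 5 + len_r >= len(sig):
            return False
        len_s = sig[5 + len_r]
        if len_s == 0 or 6 + len_r + len_s != len(sig):
            return False
        if sig[4] & 0x80:                   # r must not be negative
            return False
        if len_r > 1 and sig[4] == 0x00 and not (sig[5] & 0x80):
            return False                    # no unnecessary leading zero on r
        if sig[4 + len_r] != 0x02:          # INTEGER tag for s
            return False
        if sig[6 + len_r] & 0x80:           # s must not be negative
            return False
        if len_s > 1 and sig[6 + len_r] == 0x00 and not (sig[7 + len_r] & 0x80):
            return False                    # no unnecessary leading zero on s
        return True

Anything beyond this-- long-form lengths, padded integers, trailing data, and so on-- is exactly where parsers tend to quietly diverge from the specification.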
(And FWIW, the whole Bitcoin network has filtered these encodings everywhere for several years, even though it required that quite a few wallets (like Armory and MTGOX) change their behavior.)
But that deals with serialization. I am aware of no evidence that anyone observed the algebraic malleability of ECDSA prior to us, and no evidence that anyone else has taken any action against it (now or previously).
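For readers who haven't seen it spelled out: for any valid ECDSA signature (r, s) over a given message and key, (r, n - s) verifies too, where n is the group order-- negating s negates both scalar multipliers in verification, which negates the resulting point and leaves its x coordinate (and hence r) unchanged. A tiny Python sketch of the flip and of the lowS normalization discussed further down:

    # secp256k1 group order (a standard public constant).
    N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

    def flip_s(r, s):
        # (r, s) and (r, N - s) verify identically for the same message and key.
        return r, (N - s) % N

    def normalize_low_s(r, s):
        # Pick the canonical twin: always the smaller of the two equivalent s values.
        return r, (s if s <= N // 2 else N - s)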
As far as leaving serialization elements out of protocols goes, great care must be taken there too, as many vulnerabilities have been created by leaving things out. For example, the original PGP fingerprints hashed the modulus and exponent (but not their potentially non-canonical serialization), and as a result also had no delimitation: someone could move bytes of your modulus into the exponent until they found a key they could factor, and then they'd have a weak key with the same fingerprint as you.
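To make that failure mode concrete, here is a toy Python sketch (hash choice and byte values are arbitrary and purely illustrative) of why hashing the bare concatenation, with no length delimitation, lets two different (modulus, exponent) pairs collide:

    import hashlib

    def undelimited_fingerprint(modulus: bytes, exponent: bytes) -> str:
        # Hash the raw concatenation -- the boundary between the fields is lost.
        return hashlib.md5(modulus + exponent).hexdigest()

    n1, e1 = bytes.fromhex("c100db0bad00aa55"), bytes.fromhex("010001")
    # Move the last byte of the modulus into the exponent: a different key...
    n2, e2 = n1[:-1], n1[-1:] + e1
    # ...with exactly the same fingerprint.
    assert undelimited_fingerprint(n1, e1) == undelimited_fingerprint(n2, e2)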
Signatures being covered under identifying hashes is ubiquitous; e.g., X.509 certificate chains work just like Bitcoin in this respect, and this interacts poorly with certificate blacklisting due to malleability.
In any case, I don't want to belabor this, but you were really harsh and critical where I think the reality is almost 180 degrees out: Bitcoin suffered from undocumented behavior in third-party code; the outside world, while somewhat aware of the problems from non-canonicality, has largely not acted on them. Bitcoin Core has been a leader in responsive and responsible handling of this, and our efforts resulted in the discovery of an additional form which appears not to have been known or discussed previously-- and against which our ecosystem (out of all the ECDSA users in the world) appears to be the only one protected at all.
I really don't care much what armchair jockeys on the forums (not specifically referring to you) think, and so I don't usually expend a lot of resources bragging... but to have someone be really harsh on a matter where I think our response has shown considerable innovation and leadership is, at least a little bit, frustrating!
Also, the closing of a low/high-S covert channel would not be significant until deterministic signatures were used. In this regard, I note the date on RFC6979 is August 2013, which is after the Bitcoin devs were aware of the problem.
It is unclear what you mean by "would not be significant", but the matter is largely orthogonal-- 6979 or not, implementations that do not specifically heed the lowS behavior will produce non-lowS signatures about half the time per signature, leading to most of their transactions being blocked. No wallet that I'm aware of has ever provided a "resign" button that might eventually get such a transaction through-- and when wallets do create new transactions they usually use a new random selection of coins, so 6979 would also produce a new signature. Even if they did provide a "resign" and used randomized signatures, once your transaction has more than a couple of inputs the probability that it would pass a lowS test becomes very, very low.
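(The arithmetic, in case it's not obvious: with uniformly random nonces each signature independently lands in the low-S half with probability about 1/2, so a transaction needing k signatures clears a lowS filter with probability about 2^-k.)

    # Chance that every signature in a k-input transaction is low-S,
    # assuming random nonces and no grinding.
    for k in (1, 2, 5, 10):
        print(k, 0.5 ** k)    # 0.5, 0.25, ~0.031, ~0.001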
RFC6979 does not really close a covert channel either, alas: it is impossible to determine whether a device is using 6979 without access to its keys, and a malicious device could use 6979 99.99% of the time but occasionally switch to a kleptographic method (perhaps triggered by the message itself). The RFC does eliminate the strong need for a good RNG at signing time-- a common implementation problem-- and makes review/auditing somewhat easier.
I did a little searching around to see if I could find a non-bitcoin example of the problem, but didn't find anything, probably for the reasons above. This may not help you with regard to the HSM vendor, but as a customer of such a device I would not want to have an obvious potential covert channel, even if the output could be "fixed" by software in a nearby computer, since if the nearby computer could be trusted there would be no need for the HSM in the first place. (Comment inapplicable if said HSM always outputs high-S.)
Unfortunately, if the device has free choice of its nonce (which having control of low/high S implies) then it has a high-bandwidth covert channel. E.g. produce the nonce as H(ECDH(attacker_pubkey, pubkey) || message_to_be_signed); the attacker can then recompute the nonce from the public signature and message and recover the private key from it. More elaborate versions of this allow the embedding of additional data, beyond just leaking the current private key. Even if something constrains the choice of nonce, a malicious device can do rejection sampling to get N bits of covert channel out of the nonce by doing 2^N computation (toy sketch below)-- low vs high isn't different except by being computationally cheaper to use. So, I'm not sure I'll be able to sell it in terms of covert channel suppression, but it's worth a try. Thanks for the suggestion.
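To put a rough number on the rejection-sampling variant: below is a deliberately simplified, from-scratch textbook-ECDSA sketch over secp256k1 in Python (3.8+, for pow(x, -1, m)). It is insecure toy code, and leaky_sign() is a purely hypothetical illustration of a hostile signer grinding nonces until the low bits of r carry the bits it wants to exfiltrate, at an expected cost of about 2^N signing attempts for N bits.

    import hashlib, secrets

    # secp256k1 domain parameters (standard public constants).
    P = 2**256 - 2**32 - 977
    N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
    G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
         0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

    def point_add(a, b):
        # Affine addition on y^2 = x^3 + 7 over GF(P); None is the point at infinity.
        if a is None: return b
        if b is None: return a
        if a[0] == b[0] and (a[1] + b[1]) % P == 0: return None
        if a == b:
            lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P
        else:
            lam = (b[1] - a[1]) * pow(b[0] - a[0], -1, P) % P
        x = (lam * lam - a[0] - b[0]) % P
        return x, (lam * (a[0] - x) - a[1]) % P

    def point_mul(k, pt):
        # Double-and-add scalar multiplication (not constant time; toy code).
        acc = None
        while k:
            if k & 1: acc = point_add(acc, pt)
            pt = point_add(pt, pt)
            k >>= 1
        return acc

    def ecdsa_sign(priv, msg, k):
        # Textbook ECDSA with a caller-supplied nonce k.
        z = int.from_bytes(hashlib.sha256(msg).digest(), "big") % N
        r = point_mul(k, G)[0] % N
        s = pow(k, -1, N) * (z + r * priv) % N
        return r, s

    def leaky_sign(priv, msg, leak_bits, nbits):
        # Hypothetical hostile signer: grind random nonces until the low nbits
        # of r equal the bits to exfiltrate; ~2**nbits signing attempts expected.
        while True:
            k = secrets.randbelow(N - 1) + 1
            r, s = ecdsa_sign(priv, msg, k)
            if r & ((1 << nbits) - 1) == leak_bits:
                return r, s

Eight bits per signature costs the device only a few hundred signing attempts, and a single such signature is indistinguishable from an honest one to anything downstream.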