This is already being implemented, in fact. Nothing to do with mastercoin.
Thanks Jeff, this is very helpful to know
Seems to have worked for all 20,000 - but my ECDSA point validity testing is being done with code from the Casascius Bitcoin Address Utility and there is a line in there that is making me wonder whether the validatePoint() and other check functions are giving me the whole picture on whether the keys are valid:
// todo: ensure X and Y are on the curve
You can find the results here. Tachikoma, when you're back and have a bit of time would you mind taking the '*RAW' file and running some of those pubkeys through your Mastercoin::Util.valid_ecdsa_point? function and see what results you get? They are the corrected keys with the last byte rotated and all 20,000 are supposed to be valid.
You're on the right track, although be warned that the assumption that modifying a single byte until the X point is on the ECDSA curve is always possible may be invalid; my understanding is that there's no upper-bound known on the gaps between valid points on the curve. Having said that, a decent assumption is, as you've already figured out, they are randomly distributed and about half of all randomly chosen 256-bit integers are valid X points.
Stepping back for a moment, keep in mind that censorship isn't as simple as just "block invalid X points" - censoring any data that looks like it isn't random is quite possible too. I've been told that some pools are already implementing this with rules that block transactions where the data encoded in them is too "compressible" or has too many ASCII characters. What you need to be doing, at minimum, is encrypting the data you are embedding too. As I suggested in the Bitmsg thread a good way to do this is to use a cipher, taking the first CTxIn's prevout.hash + prevout.n + nSequence + nLockTime, hashing all them together to create an initialization vector, and then using that IV with some type of cipher. The resulting encrypted data is just as likely as passing randomness tests as any other pubkey, and doing so doesn't have any real performance impact. (encryption is fast!) Of course, for Mastercoin, this isn't really encryption as encryption, more encryption for the purpose of steganography, but for that purpose it's still effective.
Once we've settled on using encryption we can then make use of it to give a very robust way of ensuring the data in question looks like valid public keys. Essentially what you want is a chained block cipher mode, but one where each PUSHDATA corresponds to one block; another equally valid version is to have a stream cipher where the cipher is re-keyed on every PUSHDATA.
Now lets say we've split our Mastercoin message into a series of PUSHDATA's, forming plaintext P_0 to P_n. We also have our steam cipher E_k for key k, where the k is well known. (could be SHA256('MasterCoin Rocks!'); it's not a secret) We compute the cipher text as C_i = E_k(P_i XOR C_{i-1}), C_0 = IV. (AKA CBC mode)
Now, what what we do is for every C_i we test if it's a valid pubkey, if not we modify P_i until it is, just like you were doing above. Given that ~50% of all X points are valid pubkeys the probability of not being able to find a valid one is 2^-256, essentially impossible, and because the IV is different for every message there's no danger of an attacker finding a specific one that doesn't work for everybody. (by finding some region where a large number of X points are invalid and constructing some kind of asset or something where trading in it is likely to result in messages hitting that case)
AES works on 128-bit blocks, while a compressed pubkey makes for a 256-bit block, so you'll have to make the iterator being modified be at the beginning rather than end of a block. You'll also need a crypto library that can save state, allowing you to back-up two blocks whenever you need to modify the iterator. That said if you put the iterator in the second block, you only need to back-up a single block, and it's rather unlikely for a 128-bit prefix to have no 128-suffixes that are valid pubkeys; you'll probably get away with it, but be warned that I'm not a cryptographer.
A more general "stuff-data-in-the-blockchain" solution would also allow for stuffing in any PUSHDATA, specifying which PUSHDATAs happened to be valid with, say, the low-order bits of the nValue in each CTxOut. In this case you'd want a rule that handles 33-byte PUSHDATA's specially, dropping the first byte and the iterator byte from the plaintext output. I've actually got code that does most of this; I might go publish it as a general purpose library for censorship-resistant embedding of data in the blockchain.
Also in case you guys aren't already aware, currently the IsStandard() rules consider any PUSHDATA from 33 bytes to 120 bytes as standard, and there is nothing checking that pubkeys are valid at all. (including the first byte that determines compressed/uncompressed) For instance my OpenTimestamps timestamper stuffed timestamps into bare CHECKMULTISIG scriptPubKeys by setting the first byte of the pubkey to zero to be sure it could never be spent.(1)
1) Note that because the IV is constructed from a one-way-function, and CBC-mode encryption is also a one-way-function, we can be sure that no attacker could ever come up with a way to get you to create a pubkey for which they knew the secret key for. In theory an XOR stream cipher mode doesn't have this guarantee.
Disclosure: Someone contacted me privately and offered to pay me to help you guys out, although they didn't realize at first how far along you guys were in understanding the problem. I assume they were a Mastercoin investor, but I don't know and they asked to remain anonymous. All the same, I hope this helps.
Paid or not Peter this is extremely helpful, thank you (though it will take multiple readings for me to even remotely try to understand it all!).
JR, without wanting to be badgering I feel this is the most critical issue for mastercoin right now. How we store data in the blockchain is fundamental and should be locked away as early as possible (and ideally written into the spec). As you made that the number one goal of the first coding contest I'm guessing you feel the same Tachikoma's work is great but perhaps we could refine it somewhat - in development we're spitting out invalid ECDSA points in multi-sigs some of the time and as Jeff has raised above, pubkey checking is on the way regardless of mastercoin.
As I've mentioned before I view the miners as our critical issue and if some of the major pools are doing things like looking for obvious compressable/text data as Peter notes, perhaps some further investigation into steganography (an apt analogy I think) would be in order - obfuscating the data we're storing in the pubkey (IMO) seems quite achievable and spotting our padding ("000000000" etc) isn't going to be hard.
Thoughts?