
Topic: Normalized / canonical transaction ID for helpdesk usage & a new base32 encoding (Read 6397 times)

full member
Activity: 126
Merit: 100
Good job maaku. :)
I'm going to take a look through this code and probably come and ask a lot of questions.
legendary
Activity: 905
Merit: 1012
The case sensitivity of base58 is a hindrance everywhere, not just over the phone. Many people do not distinguish upper and lower case well in handwriting, and some fonts make the distinction hard to see. Base58 has been a persistent pain point since the creation of bitcoin. Random mixing of upper and lower case doesn't make for very good placeholders, compared with alternatives like regular grouping by space or hyphen.
legendary
Activity: 1512
Merit: 1036
Small correction - base32-human originated with Zooko:
http://zgp.org/pipermail/p2p-hackers/2002-November/000924.html

I don't know why encoding is being re-imagined - Bitcoin's Base58 is a natural evolution of previous human-readable encodings, includes an example checksum, and is already in code. It doesn't use groupings, specifically so a double-click selects the whole thing.

I see few scenarios where one would need to communicate an entire transaction id by voice where lower-case-only would be worth the byte cost. In situations where the other party only needs confirmation of a transaction ID, reciting the starting letters in either upper or lower would allow positive identification of the transaction. I agree that reading mixed upper and lower over a phone is painful though, just having done that for generated passwords and user IDs.

The randomly mixed in upper case characters in base58 seem to make good visual place holders; when reading all-lower, it is very easy to lose your place.

I also just realized I forgot to demonstrate what the new ntxid looks like. Here's the ntxid for 6988d5fd2735b86e005ee9249a8b8053c91cd31fd1bfeadcf678093d1b710223:

ntxid
txbtsogbjjuimfqas7sgkbaqqkjxygyixk3deuxmrm1uqte8nukemm6yxujjjzbr

Base58 (no container):
86xpwm8Q5zCGUD844C2WZeWEn8tMadWbAG3pxBmfBea6

The above could be wrapped as ASCII in a container: "BTCTX" + base58 of the txid, plus any ECC scheme.
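
For anyone who wants to experiment with the "no container" Base58 form, here is a minimal Python sketch of a plain Base58 encoder and decoder using Bitcoin's alphabet. It has no checksum and no "BTCTX" container, the function names `b58encode` and `b58decode` are made up for this example, and feeding the hex txid in as big-endian bytes is an assumption, so it may not reproduce the string quoted above.

```python
# Bitcoin's Base58 alphabet: no 0, O, I or l.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(data: bytes) -> str:
    """Encode bytes as Base58 (no checksum, no container)."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is represented by a leading '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out

def b58decode(text: str) -> bytes:
    """Inverse of b58encode."""
    n = 0
    for ch in text:
        n = n * 58 + B58_ALPHABET.index(ch)
    pad = len(text) - len(text.lstrip("1"))
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return b"\x00" * pad + body

# The txid from this thread, taken as big-endian bytes (an assumption).
txid = bytes.fromhex(
    "6988d5fd2735b86e005ee9249a8b8053c91cd31fd1bfeadcf678093d1b710223"
)
encoded = b58encode(txid)
print(encoded, len(encoded))
```

The round trip `b58decode(b58encode(txid)) == txid` is the property that matters here; any error detection would have to come from whatever container or ECC scheme is wrapped around it.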
legendary
Activity: 905
Merit: 1012
The error correction aspects of this proposal have been broken off into their own draft BIP:

https://gist.github.com/maaku/8996338#file-bip-ecc32-mediawiki

And pull-request:

https://github.com/bitcoin/bitcoin/pull/3713
kjj
legendary
Activity: 1302
Merit: 1026
Having some experience in running a front-line customer service helpdesk, I know that a significant amount of time and therefore money can be wasted transmitting long strings of information over the phone (e.g. a 64-character hash string), or from handwriting which can be worse. Also since we already have a notion of a 32-byte transaction hash encoded as a 64-digit hex string, it could be very confusing for users who do not understand the difference.

I'm having a really, really hard time coming up with a situation where someone would ever need to transmit a ntxid over the phone.

A transaction ID isn't data useful to the system, it is merely the name of some useful data.  Since inputs are also referenced by txid, that particular name can also be useful if, for example, you ever find yourself in a bizarre situation where you need to create an offline transaction by phone.  Such a situation is on the borderline of imagination though.

A name that is purely a name is not useful for moving information around, but only for comparison and searching.  For example, if you are searching your local node or some website for a transaction, the first few letters will suffice, and if they don't provide your desired result, the remainder of the letters aren't any help at all.  These use cases are common and plentiful, and we can make them better without adding yet another entity to our pantheon of ad hoc identifiers and strings.
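
As a rough illustration of the prefix lookup described above, here is a minimal Python sketch over a sorted list of hex txids. The function name `find_by_prefix` and the two filler txids are hypothetical; only the first txid comes from this thread.

```python
from bisect import bisect_left

def find_by_prefix(sorted_ids, prefix):
    """Return every id in a sorted list that starts with the given prefix."""
    start = bisect_left(sorted_ids, prefix)
    matches = []
    for txid in sorted_ids[start:]:
        if not txid.startswith(prefix):
            break  # sorted order: no later entry can match either
        matches.append(txid)
    return matches

ids = sorted([
    "6988d5fd2735b86e005ee9249a8b8053c91cd31fd1bfeadcf678093d1b710223",
    "69" + "00" * 31,   # hypothetical filler
    "ab" * 32,          # hypothetical filler
])

print(find_by_prefix(ids, "6988"))     # one candidate
print(len(find_by_prefix(ids, "69")))  # ambiguous prefix, two candidates
```

If the prefix is ambiguous, the caller only needs to ask for a few more characters, not the whole string.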
legendary
Activity: 905
Merit: 1012


I've pushed a hopefully final version of the error correction coded normative transaction ID branch to my public repository:

https://github.com/maaku/bitcoin/tree/normtxid

This version departs from the behavior described in the pull request by using the regular (not SignatureHash) transaction ID for coinbase transactions. At this time I cannot imagine why you would need a normalized transaction ID for a coinbase transaction. But just because I can't imagine a use doesn't mean there isn't one, and services indexing the block chain need to make a decision about this edge case. The patch in the pull request has the potential to produce duplicate ntxids for coinbase transactions, since the coinbase string, which contains the BIP-34 block height, is a scriptSig and is therefore stripped from the normative data structure.
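
To make the coinbase edge case concrete, here is a toy Python sketch. It is not Bitcoin's wire format and not the actual SignatureHash computation; `toy_serialize`, the length-prefixed layout, the heights, and the payout bytes are all invented for illustration. It only shows that two coinbases differing solely in their scriptSig hash differently in full but identically once the scriptSig is stripped.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def toy_serialize(inputs, outputs, strip_scriptsig: bool) -> bytes:
    """Toy layout: length-prefixed fields, optionally blanking each scriptSig."""
    parts = []
    for prevout, script_sig in inputs:
        parts.append(prevout)
        parts.append(b"" if strip_scriptsig else script_sig)
    parts.extend(outputs)
    # Length-prefix every field so boundaries stay unambiguous.
    return b"".join(len(p).to_bytes(4, "little") + p for p in parts)

null_prevout = b"\x00" * 32 + b"\xff" * 4   # coinbase inputs spend nothing
payout = b"hypothetical-miner-output"       # same payout in both blocks

def coinbase(height: int):
    # The height sits in the scriptSig, as BIP-34 requires.
    return [(null_prevout, height.to_bytes(3, "little"))], [payout]

a_in, a_out = coinbase(300000)
b_in, b_out = coinbase(300001)

txid_a = sha256d(toy_serialize(a_in, a_out, strip_scriptsig=False))
txid_b = sha256d(toy_serialize(b_in, b_out, strip_scriptsig=False))
ntxid_a = sha256d(toy_serialize(a_in, a_out, strip_scriptsig=True))
ntxid_b = sha256d(toy_serialize(b_in, b_out, strip_scriptsig=True))

print(txid_a != txid_b)    # regular ids differ
print(ntxid_a == ntxid_b)  # stripped ids collide
```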

Additionally, the code now provides error correction coding for arbitrary length base32 strings, and complete coverage with a suite of unit tests. It also corrects two bugs that were found in the encoding algorithm, which were uncovered in the process of writing the tests. All that remains to be done is to write a BIP documenting its inner workings.
legendary
Activity: 905
Merit: 1012
Wouldn't the more obvious thing be to ask for the customer's target address, not the TXID?

You're involving assumptions about the type of transaction, the customer's usage of their target address (address reuse is sadly a fact of life), and the scenarios under which a non-malleable txid is useful. This is an attempt at a more generic solution.

Anyways, does this ID really have to be THAT unique? Why not a 32-bit hash like CRC32 or ADLER32 and just have the helpdesk look through the handful of transactions that have this hash in the mempool/last 3 days of blocks currently?

If that's all you need, then you can take a small hash over the normalized txid to get whatever it is you want. Again though, there are many applications where a truly unique ntxid is required, and a 32-bit checksum wouldn't deliver that.
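
A minimal Python sketch of that "small hash over the normalized txid" idea, using CRC32 from the standard library. The names `short_tag` and `index_by_tag` are hypothetical, and hashing the ASCII form of the encoded ntxid rather than the raw 32 bytes is an assumption made for brevity.

```python
import zlib
from collections import defaultdict

def short_tag(ntxid: str) -> str:
    """32-bit CRC of the ntxid string, as 8 hex digits."""
    return format(zlib.crc32(ntxid.encode("ascii")) & 0xFFFFFFFF, "08x")

def index_by_tag(ntxids):
    """Map each short tag to the list of full ids sharing it."""
    index = defaultdict(list)
    for ntxid in ntxids:
        index[short_tag(ntxid)].append(ntxid)
    return index

ntxid = "txbtsogbjjuimfqas7sgkbaqqkjxygyixk3deuxmrm1uqte8nukemm6yxujjjzbr"
tag = short_tag(ntxid)
recent = index_by_tag([ntxid])

print(tag)
print(recent[tag])  # candidates the helpdesk would look through
```

The list in `recent[tag]` can hold more than one entry, which is exactly why a 32-bit tag narrows a search but cannot stand in for the unique identifier.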
legendary
Activity: 2618
Merit: 1007
Wouldn't the more obvious thing be to ask for the customer's target address, not the TXID?

Anyways, does this ID really have to be THAT unique? Why not a 32-bit hash like CRC32 or ADLER32 and just have the helpdesk look through the handful of transactions that have this hash in the mempool/last 3 days of blocks currently?
legendary
Activity: 905
Merit: 1012
Maybe you could do something with uppercase and lowercase instead? Capitalise every first, fifth, etc. letter? Or alternate blocks of four uppercase with four lowercase letters.

I thought about that, but I don't think it would be obvious to every user that the case sensitivity is merely for display purposes. By using one case uniformly the user doesn't feel prompted to qualify with "big"/"little" or "capital"/"lowercase". It's also less reliable as a visual distinction since there are non-cased digits thrown in as well.
hero member
Activity: 714
Merit: 500
Martijn Meijering
Maybe you could do something with uppercase and lowercase instead? Capitalise every first, fifth, etc. letter? Or alternate blocks of four uppercase with four lowercase letters.
member
Activity: 116
Merit: 11
No, doesn't work on iPhone.
legendary
Activity: 905
Merit: 1012
Well it does work in Chrome on Android. I wonder if it works in iOS? (I don't have a device to test). Since the concern is mobile users, maybe that is enough.
member
Activity: 116
Merit: 11
Darn. Oh well, was worth a shot.
legendary
Activity: 905
Merit: 1012
not on my browser (Firefox)...
member
Activity: 116
Merit: 11


txbtsogb_jjuimfqa_s7sgkbaq_qkjxygyi_xk3deuxm_rm1uqte8_nukemm6y_xujjjzbr

works fine.
legendary
Activity: 905
Merit: 1012
Maybe add some dashes every four characters to make it easier to keep track of where you are in the txid when you're reading it aloud?

Actually the first version had just that - but with groupings of eight instead of four. Visually it looked good, but the feedback I got was that it would be hard to select in order to copy-paste, especially on a tap-to-click mobile device. Although making it easier to read over the phone is a design goal, it probably shouldn't be at the cost of a more difficult copy-paste, since I imagine that is the more common scenario these days.

But that does remind me: I should make the decoder accept and ignore hyphens, so people are free to use that in their own systems when it makes sense.
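
A small Python sketch of what "accept and ignore hyphens" could look like on the input side, together with a display helper for groups of eight. The helper names `group` and `normalize` are hypothetical; also ignoring underscores and spaces, and folding upper case to lower, are assumptions beyond what is described above.

```python
def group(ntxid: str, size: int = 8, sep: str = "-") -> str:
    """Insert a separator every `size` characters, for display only."""
    return sep.join(ntxid[i:i + size] for i in range(0, len(ntxid), size))

def normalize(text: str) -> str:
    """Drop separators and fold case before handing the string to a decoder."""
    return "".join(ch for ch in text if ch not in "-_ ").lower()

ntxid = "txbtsogbjjuimfqas7sgkbaqqkjxygyixk3deuxmrm1uqte8nukemm6yxujjjzbr"

hyphenated = group(ntxid)             # groups of eight, hyphen-separated
underscored = group(ntxid, sep="_")   # same grouping with underscores

print(hyphenated)
print(normalize(hyphenated) == ntxid)
```

Because `normalize` runs before decoding, any system is free to pick its own grouping for display without affecting what the decoder sees.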
hero member
Activity: 714
Merit: 500
Martijn Meijering
Quote
txbtsogbjjuimfqas7sgkbaqqkjxygyixk3deuxmrm1uqte8nukemm6yxujjjzbr

Maybe add some dashes every four characters to make it easier to keep track of where you are in the txid when you're reading it aloud?
legendary
Activity: 905
Merit: 1012
Thanks sundance!

I just pushed code which adds unit tests, and enables error code protection for arbitrary-length base32 strings.

I also just realized I forgot to demonstrate what the new ntxid looks like. Here's the ntxid for 6988d5fd2735b86e005ee9249a8b8053c91cd31fd1bfeadcf678093d1b710223:

Quote
txbtsogbjjuimfqas7sgkbaqqkjxygyixk3deuxmrm1uqte8nukemm6yxujjjzbr
newbie
Activity: 22
Merit: 29
Well done Mark, a nice synthesis of crypto and error correcting codes. And the code looks nice at that.
legendary
Activity: 905
Merit: 1012
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Pieter Wuille proposed a "normative transaction ID" (ntxid)[1] -- a hash covering the same data which is signed in the standard transaction types. A transaction cannot be changed in such a way as to alter this ntxid without also invalidating a the SIGHASH_ALL signatures typically used by wallet software in the standard transaction types.

Having some experience in running a front-line customer service helpdesk, I know that a significant amount of time and therefore money can be wasted transmitting long strings of information over the phone (e.g. a 64-character hash string), or from handwriting which can be worse. Also since we already have a notion of a 32-byte transaction hash encoded as a 64-digit hex string, it could be very confusing for users who do not understand the difference.

I proposed and have implemented[2] an alternative encoding format for these normative transaction identifiers. It is also available as a patch against bitcoind itself[3] (I wasn't sure if I should open a separate pull request from sipa's). It differs from the presumptive default 64-character hex encoding in the following ways:

1. Always starts with the two characters: "tx".
2. 64-digit base32 instead of base16, and therefore is visually distinctive in that it includes the full range of alphanumeric digits.
3. Automatic correction of up to 2-characters of input data, and highly probable detection of further non-recoverable errors (approximately 1:1,000,000 chance of a false positive).
4. Uses Phil Zimmermann's z-base-32 encoding alphabet[4], which provides better resistance to transcriptive errors than the RFC 3548 standard.

The encoding format is in fact entirely general. It is able to transform any power-of-two sized string (>= 16 bytes in length) into a base32 string a multiple of 31 digits in length, with each 31 digit grouping containing error correcting codes capable of recovering from a single-digit error in the base32 input. It uses CRC polynomials for the error correction capability[5], which requires 5 error correction digits per 26 digits of input, instead of the theoretically optimal 4, but is much, much easier to implement correctly then available non-patented error correction codes.

The encoding format should be applicable to other uses as well, such as exporting private keys. It has additional features to support this sort of application: the encoding of a few extra bits of information in the padding structure, such as whether the private key is encrypted, and whether the public-key should be compressed.

Aside: if you like this work, please consider donating a few coins to 1DeZqzJ2f472VaGG6qAVzw5FNq5v4eL7pb. I work full-time on core bitcoin improvements, funding myself entirely through community donations. Any donations received to this address will be used to further address malleability concerns, such as an eventual soft-fork to eliminate known forms of malleability entirely.

[1] https://github.com/bitcoin/bitcoin/pull/3656
[2] https://gist.github.com/maaku/8996338
[3] https://github.com/maaku/bitcoin/tree/normtxid
[4] http://philzimmermann.com/docs/human-oriented-base-32-encoding.txt
[5] http://www.drdobbs.com/article/print?articleId=184401662&siteSectionName=
-----BEGIN PGP SIGNATURE-----

iQIcBAEBAgAGBQJTAlzpAAoJECsa1YSJj/UD2EMP/RXBPu9hTaWKBi2Lg3YQ8WIv
IdonshPAFq6YDx/dREPJZqIK/DXTORxcc0sNkmL7wfHSssz8A6cXzlQne+kqX2sf
g5+6W7Vn0WOTAbWYKNufnktbcm/5teH0yvXBfNoVpei76Z5NIjWbE9HVS0Ib8tys
Xr8d69+ufKB2TdM5NLRVmxtdyVXvSlo7706m1lNbTEDVZ5qM/Biv8YZsLqXVJN4j
pPdTxQ4jQrOOklUEWLGQ18EnGj5fAggSG0+ijdkxdFFYmmjlq/D3ucmSmRlBTeJE
aCdokV8ZFqmalvHMfjmawLB0faGtYDRnDIcKwlNHAES1XCRS4vVWGPHmXP12SjjA
PItuirf7G14UHEcKzFLqaF3j/XP4y0BZd9/J38VbvTEumwsMw6+UVMSFIytTkKIL
raS1zRVLf+r/Xkq7X4NFSIQx+N9Svzs03/iIxCF8eafC4vEQNZaSqwkXqmbLWmrQ
byRlE459BaPM0Bh+aykxMTfFcguZbK/7t0NuXawcJBJCrBvApvzCQfyDQXOM4LDr
8T7uvRcoFnOOnLr5zjXKREmVhtEKgl3Zl5rRbGW0EfJbiyX03U6OmDxwsw4EIh72
ToHEBNSAsq6lDvMT/bTitA5+yqx8k0E3ztYsgn6DkfFGXr2m8dKBYTDF5jh6HAS1
Fu+7gMV5ZfQ+3T/MLGvb
=xmVG
-----END PGP SIGNATURE-----
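
For readers unfamiliar with the z-base-32 alphabet named in point 4 of the signed post, here is a minimal Python sketch of the plain encoding step only. It does not include the "tx" prefix, the padding-bit conventions, or the CRC-based error-correction digits, so it will not reproduce the ntxid shown in this thread; the function name `zbase32_encode` and the choice to zero-pad on the right are assumptions.

```python
# The z-base-32 alphabet: 32 lowercase letters and digits.
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"

def zbase32_encode(data: bytes) -> str:
    """Encode bytes five bits at a time, zero-padding the final group."""
    nbits = len(data) * 8
    pad = (-nbits) % 5                    # bits needed to reach a multiple of 5
    bits = int.from_bytes(data, "big") << pad
    ndigits = (nbits + pad) // 5
    return "".join(
        ZBASE32[(bits >> (5 * (ndigits - 1 - i))) & 31]
        for i in range(ndigits)
    )

digest = bytes.fromhex(
    "6988d5fd2735b86e005ee9249a8b8053c91cd31fd1bfeadcf678093d1b710223"
)
plain = zbase32_encode(digest)
print(plain, len(plain))  # 256 bits + 4 padding bits = 52 digits
```

A 32-byte value needs 52 base32 digits on its own; the remaining digits of the 64-character form are taken up by the prefix and the error-correction digits described in the post.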