Author

Topic: [RFC] c32d encoding (Read 1420 times)

vip
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
May 15, 2013, 09:45:25 AM
#2
This would be my wish list for a new alternate address encoding:

1. Some sort of datestamp/expiry field so that users could mark their addresses so clients would throw up warnings if someone attempts to pay them at some point in the future beyond the point when the address is expected to be used.  The field could be optional and missing to make the address valid forever, and could have variable resolution (e.g. a time portion only if it's truly sensitive with resolution greater than 1 day).  One could still pay an expired address, they would just have to wade through a lot of protest from their client and perhaps have it in an advanced mode.

2. Checksum/error correction based on Reed-Solomon instead of SHA256, so that single-position errors, up to a limited number, have a 100% probability of being correctable and a 0% probability of being corrected to the wrong thing.

3. Of course, compatibility with payment protocols.

Meanwhile, I would submit the following:

4. In practice, the ambiguity between symbols such as "isuvzSVZ" is, in my estimation, a low concern, given the widespread emphasis on copying, pasting, or scanning bitcoin addresses and not typing them in.  While I have done it myself many times with the unusual way I do bitcoins, I have never seen a single potential payee say "Here, hand-type this Bitcoin address into your Bitcoin client."  Further, with the proper Reed-Solomon error correction, visually similar substitutions could be routinely reliably corrected.  (That said though, there's also a value in a power-of-two radix, and if throwing away some symbols helps us achieve that, well then... this is a great idea)

5. The prefix (in this case "2") should be useful to the user to help them identify what kind of object type they are looking at.  In terms of usability, it is a really costly design decision to use the same prefix for an address versus a private key and to rely on the user to distinguish the two using a software tool or by somehow memorizing the length.  Having the same prefix with distinction by length etc. would be a great idea for an altcoin used only by a niche community with specialized skills (of the sort who would learn Klingon etc.), but for a coin meant the public at large, a public who will suffer from confusion and errors as a result of this, this really needs to not be forgotten.
legendary
Activity: 2576
Merit: 1186
May 15, 2013, 01:32:24 AM
#1
This encoding is designed so that it could replace Base58Check in new data, with the following goals in mind:
  • Impossible(?) to manipulate without completely changing it
  • Clearly identifiable prefix, regardless of data size
  • Cheaper to process (simpler and faster code; it's a power-of-two radix)
  • Fixed length string for fixed length data
  • More unambiguous (removal of chars 'isuvzSVZ')
  • Compatible with using seven-segment displays
  • Altcoin friendly (16 bit namespace, which can be read without decoding)

Since there are fewer digits and more identifying/signature characters, addresses are longer. This should be less of a problem since ordinary users will hopefully be using addresses less common as the payment protocol becomes more popular.

Example Python code (including tests) is at the bottom.
I can write up a formal BIP if this seems useful.

For example:

160 bits of data, such as current addresses:
  • 2nc111dhAPE2aUdYAOF88JhLn5jEjbULy4eFe9tyFYFE8
An ordinary P2SH destination, incorporating Greg's "require the hash mid-image to be relayed" concept (256 bits of data):
  • 2bc511A95e74P13dPb6b5t7yrh12EhC363ayH98n1cFbr3rAHdA49nCcC1G3P71j
The same key in Namecoin:
  • 2nc5119ttL35HPhc3Hh6aHe2tOhF6rdFtAOE1ahFLt9Ecabhcn5FLea5Le71P56C
The example "puzzle" script from the wiki (arbitrary scripting):
  • 2bc311d126acCyAnHAjabeUtOHcr7F811j4UYE6ECtOcbcGGn4O9chAt7O7y2LU9ty9cnG4
An alternative for BIP32 extended public keys (560 bits):
  • 2bc911AcchHheAGFnn9LC6FdF7bOc99APJtcEc46U655JheH6LCr3Y333eFEOtPJ9rj22rEcchHheAG Fnn9LC6FdF7bOc99APJtcEc46U655JheH6LCr3YJCtPYea
An alternative for BIP32 extended private keys (552 bits):
  • 2bcb11O77GHdP53FH7Jh44OdEh3rLd4eFr2h7c8rGeErELG18yCy9O7L9LednyHJa5hyeAP77GHdP53 FH7Jh44OdEh3rLd4eFr2h7c8rGeErELG18yCyGG5drPF1


c32.py
Code:
# Digits are chosen to be the least ambiguous
# They all have unique seven-segment glyphs, and cannot be easily confused by humans
digits = '123456789abcdehjnrtyACEFGHJLOPUY'
radix = len(digits)

def encode(v):
    if not len(v):
        return ''
    n = 0
    bits = 0
    o = []
    pad = (len(v) * 8) % 5
    if pad:
        v = b'\0' + v
    v = bytearray(v)  # For Python 2.7 compatibility
    for i in range(len(v) - 1, -1, -1):
        n |= v[i] << bits
        bits += 8
        while bits >= 5:
            o.insert(0, digits[n & 0x1f])
            n >>= 5
            bits -= 5
            if i == 0 and pad:
                break
    return ''.join(o)

def decode(s):
    n = 0
    bits = 0
    o = bytearray()
    for i in range(len(s) - 1, -1, -1):
        n |= digits.index(s[i]) << bits
        bits += 5
        while bits >= 8:
            o.insert(0, n & 0xff)
            n >>= 8
            bits -= 8
    return bytes(o)

def test():
    from math import ceil
    assert '' == encode(b'')
    for (i, oc) in (
        (1, '8'),
        (2, '2'),
        (3, 'j'),
        (4, '4'),
        (5, 'Y'),
        (6, '8'),
        (7, '2'),
        (8, 'j'),
        (9, '4'),
    ):
        ol = int(ceil(i * 8 / 5.))
        try:
            inzero = b'\0' * i
            inone = b'\xff' * i
            ezero = encode(inzero)
            eone = encode(inone)
            dzero = decode(ezero)
            done = decode(eone)
           
            assert ezero == '1' * ol
            assert eone == oc + ('Y' * (ol - 1))
            assert dzero == inzero
            assert done == inone
        except AssertionError:
            raise AssertionError('Input of length %s failed test' % (i,))
    try:
        for c in range(1024):
            decode('111' + chr(c))
    except ValueError:
        pass
    else:
        raise AssertionError('Invalid decode input (%02x) did not throw a ValueError' % (c,))

if __name__ == '__main__':
    test()
    print("Tests passed")


c32d.py
Code:
import c32
import hashlib
import struct

def _checksum(v):
    return hashlib.sha256(hashlib.sha256(v).digest()).digest()[-4:]

'''
String format:
- c32(Raw format) in reverse order

Raw format:
- 4 bytes checksum
- N bytes data (NOTE: encoded to prevent hidden changes)
- - for script:
- - - N bytes: varint preimage data length
- - - N bytes: preimage data
- - - N bytes: script data
- - for BIP32 HD parent key:
- - - 32 bytes: chain code
- - - 33 bytes: parent pubkey
- - for BIP32 serialized key:
- - - 1 byte: depth
- - - 4 bytes: child number
- - - 32 bytes: chain code
- - - One of:
- - - - 32 bytes: private key data
- - - - 33 bytes: public key data
- 1 byte flag (ignored if unknown)
- 1 byte type
- - 01 script (with preimage data)
- - 02 script hash preimage
- - 03 BIP32 HD parent key
- - 04 BIP32 serialized public key
- 2 bytes namespace (blockchain id)
- - 2d41 Bitcoin  ('2bc')
- - 2e01 Namecoin ('2nc')
- - 2e37 Freicoin ('FRC')
'''

class c32d:
    __slots__ = ('data', 'ns', 'dtype', 'dflag')
   
    def __init__(self, data, ns, dtype, dflag):
        self.data = data
        self.ns = ns
        self.dtype = dtype
        self.dflag = dflag
   
    @classmethod
    def decode(cls, s, raw = False):
        if not raw:
            full = c32.decode(s[::-1])
        else:
            full = s
       
        csum = bytearray(full[:4])
        v = bytearray(full[4:])
       
        # Encode the configuration bytes to simplify decoding
        pv = 0xbf
        for i in range(len(v) - 1, len(v) - 5, -1):
            pv = v[i] ^ (csum[i % 4]) ^ pv
            v[i] = pv
       
        v.append(0xbf)
        for i in range(len(v) - 1):
            v[i] ^= csum[i % 4] ^ v[i + 1]
        v.pop()
       
        v = bytes(v)
        if csum != _checksum(v):
            raise ValueError('c32d checksum wrong')
       
        o = cls(None, None, None, None)
        o.data = v[:-4]
        o.dflag = v[-4]
        o.dtype = v[-3]
        o.ns = struct.unpack('!H', v[-2:])[0]
       
        return o
   
    def encode(self, raw = False):
        try:
            v = self.data + struct.pack('!BBH', self.dflag, self.dtype, self.ns)
        except struct.error as e:
            raise ValueError(e)
        csum = bytearray(_checksum(v))
       
        v = bytearray(v)
        pv = 0xbf
        for i in range(len(v) - 1, -1, -1):
            pv = v[i] ^ csum[i % 4] ^ pv
            if i < len(v) - 4:
                v[i] = pv
        v = csum + bytes(v)
       
        if raw:
            return v
       
        return c32.encode(v)[::-1]

decode = c32d.decode

def encode(*a, **ka):
    return c32d(*a, **ka).encode()

def test():
    c32.test()
    for (p, s, raw) in (
        ((b'', 0, 0, 0), '1111115Fd9acc', b'\xb5\xa5\x0c\xb9\x00\x00\x00\x00'),
        ((b'test', 4232, 142, 219), '955OGe8hOGc97hH4EJj1', b'?V\x1e\\d/\x1cq\xdb\x8e\x10\x88'),
        ((b'\xff' * 0x100, 0xffff, 0xff, 0xff), 'YYYYYYc327OYcC6F9Or6r14UYCJtc5UGt9n2YYbeH63jda5GnYjCEO3r8E53dGYFbchrG4c327OYcC6F9Or6r14UYCJtc5UGt9n2YYbeH63jda5GnYjCEO3r8E53dGYFbchrG4c327OYcC6F9Or6r14UYCJtc5UGt9n2YYbeH63jda5GnYjCEO3r8E53dGYFbchrG4c327OYcC6F9Or6r14UYCJtc5UGt9n2YYbeH63jda5GnYjCEO3r8E53dGYFbchrG4c327OYcC6F9Or6r14UYCJtc5UGt9n2YYbeH63jda5GnYjCEO3r8E53dGYFbchrG4c327OYcC6F9Or6r14UYCJtc5UGt9n2YYbeH63jda5GnYjCEO3r8E53dGYFbchrG4c327OYcC6F9Or6r14UYCJtc5UGb2cOdG3', b'\xb0\xce,*\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xff\xff\xff\xff'),
    ):
        (kp, pp) = ({}, p)
        for i in range(2):
            o = c32d(*pp, **kp)
            assert o.data == p[0]
            assert o.ns == p[1]
            assert o.dtype == p[2]
            assert o.dflag == p[3]
            kp = {
                'data': p[0],
                'ns': p[1],
                'dtype': p[2],
                'dflag': p[3],
            }
            pp = ()
        assert o.encode() == s
        assert o.encode(raw=True) == raw
   
    def ensureValueError(f):
        try:
            f()
        except ValueError:
            pass
        else:
            raise AssertionError('Invalid decode input did not throw a ValueError')
    ensureValueError(lambda: encode(b'', -1, 0, 0))
    ensureValueError(lambda: encode(b'', 0x10000, 0, 0))
    ensureValueError(lambda: encode(b'', 0, -1, 0))
    ensureValueError(lambda: encode(b'', 0, 0x100, 0))
    ensureValueError(lambda: encode(b'', 0, 0, -1))
    ensureValueError(lambda: encode(b'', 0, 0, 0x100))
   
    # Invalid c32d
    ensureValueError(lambda: decode('1111115Fd9adc'))
    ensureValueError(lambda: decode('11A1115Fd9acc'))
   
    # Invalid c32
    ensureValueError(lambda: decode('111x115Fd9acc'))
    ensureValueError(lambda: decode('1111115Fd9acx'))

if __name__ == '__main__':
    test()
    print("Tests passed")
Jump to: