
Topic: how do I extract the address from the output scriptPubKey for early coinbase tx? (Read 375 times)

full member
Activity: 173
Merit: 120
... the bug is converting the P2PK pubkeyscript to a P2PKH address.

when you have 12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX then the corresponding script is
Code:
OP_DUP OP_HASH160 <119B098E2E980A229E139A9ED01A469E518E6F26> OP_EQUALVERIFY OP_CHECKSIG
and NOT this:
Code:
<0496B538E853519C726A2C91E61EC11600AE1390813A627C66FB8BE7947BE63C52DA7589379515D4E0A604F8141781E62294721166BF621E73A82CBF2342C858EE> OP_CHECKSIG
take this simple example, we have a concept of a number that we can represent in different encodings. with hexadecimal characters (0xff), with numbers/digits (255), with words (two hundred and fifty five), in binary (11111111),... no matter which one you choose, they all represent the same exact thing. but if you decide to convert 0xff to 12345 that would be wrong, if some application did that it would be its bug.
it is the same with scripts. we have defined a different "encoding" for certain scripts that i mentioned in my first comment. you can convert those script to these encodings and get the result called "address" and you can convert that address back to those scripts. doing anything other than this is a mistake.
pooya87, thank you for indulging me. I think I am finally catching on, but there may be a subtlety I am still missing. Let me recap and explain my objective to make sure I am not "doing anything other than this". I am simply an amateur/hobbyist researcher (data scientist type) presently focused on studying the early bitcoin blockchain because I am fascinated by it. For the first couple of years of bitcoin's blockchain, the output pkscript encodes the public key in the two different flavors that you highlighted above. To normalize the output 'address' and to be able to take advantage of block explorers' APIs, I am encoding both flavors to their Base58 version like so:
Code:
OP_DUP OP_HASH160 <119B098E2E980A229E139A9ED01A469E518E6F26> OP_EQUALVERIFY OP_CHECKSIG
  becomes --> 12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX
Code:
<0496B538E853519C726A2C91E61EC11600AE1390813A627C66FB8BE7947BE63C52DA7589379515D4E0A604F8141781E62294721166BF621E73A82CBF2342C858EE> OP_CHECKSIG
becomes --> 12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX

I then use one of the many APIs to look up, say, the current balance of that 'address' or maybe its tx history depending on what I am researching.  I am not trying to do anything other than that.
legendary
Activity: 3472
Merit: 10611
I am sorry I am being so dense, but I am still missing what you are calling a bug then since the base 58 rendering it produces is accurate for the early days of the blockchain. 

the process of creating the address from a public key is correct and you already got it right. the bug is converting the P2PK pubkeyscript to a P2PKH address.

when you have 12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX then the corresponding script is
Code:
OP_DUP OP_HASH160 <119B098E2E980A229E139A9ED01A469E518E6F26> OP_EQUALVERIFY OP_CHECKSIG
and NOT this:
Code:
<0496B538E853519C726A2C91E61EC11600AE1390813A627C66FB8BE7947BE63C52DA7589379515D4E0A604F8141781E62294721166BF621E73A82CBF2342C858EE> OP_CHECKSIG

take this simple example, we have a concept of a number that we can represent in different encodings. with hexadecimal characters (0xff), with numbers/digits (255), with words (two hundred and fifty five), in binary (11111111),... no matter which one you choose, they all represent the same exact thing. but if you decide to convert 0xff to 12345 that would be wrong, if some application did that it would be its bug.
it is the same with scripts. we have defined a different "encoding" for certain scripts that i mentioned in my first comment. you can convert those script to these encodings and get the result called "address" and you can convert that address back to those scripts. doing anything other than this is a mistake.
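To make the "encoding" point concrete, here is a minimal sketch of the round trip between a mainnet P2PKH address and its script, using only the standard library. The function names (`b58check_encode`, `b58check_decode`, `address_to_p2pkh_script`, `p2pkh_script_to_address`) are made up for this illustration and are not from any particular library. It only handles the version-0 Base58 case discussed here.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def _checksum(payload: bytes) -> bytes:
    # first four bytes of the double SHA-256 of the payload
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]

def b58check_encode(payload: bytes) -> str:
    data = payload + _checksum(payload)
    num = int.from_bytes(data, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 58)
        out = ALPHABET[rem] + out
    # every leading zero byte is written as a leading '1'
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out

def b58check_decode(address: str) -> bytes:
    num = 0
    for ch in address:
        num = num * 58 + ALPHABET.index(ch)
    pad = len(address) - len(address.lstrip("1"))
    data = b"\x00" * pad + num.to_bytes((num.bit_length() + 7) // 8, "big")
    payload, checksum = data[:-4], data[-4:]
    if _checksum(payload) != checksum:
        raise ValueError("bad checksum")
    return payload

def address_to_p2pkh_script(address: str) -> str:
    payload = b58check_decode(address)
    if len(payload) != 21 or payload[0] != 0x00:
        raise ValueError("not a mainnet P2PKH address")
    # OP_DUP OP_HASH160 <push 20> <hash160> OP_EQUALVERIFY OP_CHECKSIG
    return "76a914" + payload[1:].hex() + "88ac"

def p2pkh_script_to_address(script_hex: str) -> str:
    script = bytes.fromhex(script_hex)
    if (len(script) != 25 or script[:3] != bytes.fromhex("76a914")
            or script[23:] != bytes.fromhex("88ac")):
        raise ValueError("not a P2PKH script")
    return b58check_encode(b"\x00" + script[3:23])

address = "12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX"
script_hex = address_to_p2pkh_script(address)
print(script_hex)
print(p2pkh_script_to_address(script_hex))
```

Note that `p2pkh_script_to_address` rejects the 67-byte P2PK script on purpose: in this sketch only the P2PKH form maps to and from the address.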
full member
Activity: 173
Merit: 120
Just in case anyone else is interested, here is the python code that mirrors the above logic and seems to work well for the output scripts found in the EARLY blockchain file. It takes the 160-bit RIPEMD-160 hash as input, adds the network byte prefix for mainnet ('00') and outputs the Base58 rendition:
this is essentially the same bug that those block explorers have.
may i ask what exactly are you trying to do by reproducing this mistake?
I am seeking to extract it because I make use of the various block explorers' APIs, and unfortunately that is how I can look up information.

ah, I thought the 'mistake' was perpetuating referring to it as a 'bitcoin address' instead of a Base58 encoding of a hash of the sender/recipient's public key (AKA P2PKH, right?).
I am sorry I am being so dense, but I am still missing what you are calling a bug then since the base 58 rendering it produces is accurate for the early days of the blockchain. 

Since I was following this page for my implementation, what would you edit from this page to correct the bug? 
https://en.bitcoin.it/wiki/Technical_background_of_version_1_Bitcoin_addresses



legendary
Activity: 3472
Merit: 10611
I think I understand now that what they call an 'address' (i.e., 12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX) is really a Base58 encoded rendering of the 160-bit (i.e., 20-byte) ripemd160 hash of the public ECDSA key with a network byte prefix. Is that getting closer?

if the address is a Base58 encoded string with a checksum and starts with character "1" then you are correct (of course it is RIPEMD160 hash of SHA256 of pubkey).
but we have a couple of address types that correspond to different script types: P2PKH, P2SH, P2WPKH and P2WSH. the first two use Base58 encoding with version byte 0 and 5 and the next two use Bech32 encoding with hrp=bc.

Just in case anyone else is interested, here is the python code that mirrors the above logic and seems to work well for the output scripts found in the EARLY blockchain file. It takes the 160-bit RIPEMD-160 hash as input, adds the network byte prefix for mainnet ('00') and outputs the Base58 rendition:
this is essentially the same bug that those block explorers have.
may i ask what exactly are you trying to do by reproducing this mistake?
full member
Activity: 173
Merit: 120
You should update your Bitcoin Core client; version 0.19.0.1 and later won't derive an address from P2PK outputs using decoderawtransaction or getrawtransaction "TXID" "true".

For example  Block #1 coinbase:
output scriptPubKey (HEX) --> base 58 address, i.e.,
410496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52da7589379515d4e0a604f8141781e62294721166bf621e73a82cbf2342c858eeac -->12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX
If you want to get the address despite the fact that it shouldn't be converted into an address,
you can convert it through any "pub key to address" code that supports uncompressed pub key using the public key which is in that scriptPubKey (highlighted):

41 - push 65 bytes to stack
0496.............58ee - Public key
ac - OP_CHECKSIG
Thank you nc50lc and pooya87 for the help. I am not using the Bitcoin Core client; instead I am trying to learn about bitcoin's early blockchain at a lower level by parsing the blockchain files. It was the really early uncompressed public key usage that was throwing me off, but with your pointers I was able to figure it out. It is unfortunate that all blockchain viewing websites use the 'address' label so often. I think I understand now that what they call an 'address' (i.e., 12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX) is really a Base58 encoded rendering of the 160-bit (i.e., 20-byte) ripemd160 hash of the public ECDSA key with a network byte prefix. Is that getting closer?

In any case it was the really early uncompressed public key (64 bytes plus the '04' prefix byte) that was confusing me, but your tips and the following useful site helped me to write the python code that works for the early blockchain files. I know it won't work beyond the first couple of years. After the initial uncompressed public key usage, the output scripts appeared to transition quickly to the 20-byte ripemd160 hash, so my decode entry point for those was #4 instead of #2 below:

http://gobittest.appspot.com/Address

1 - Public ECDSA Key
0496B538E853519C726A2C91E61EC11600AE1390813A627C66FB8BE7947BE63C52DA7589379515D4E0A604F8141781E62294721166BF621E73A82CBF2342C858EE
2 - SHA-256 hash of 1
6527751DD9B3C2E5A2EE74DB57531AE419C786F5B54C165D21CDDDF04735281F
3 - RIPEMD-160 Hash of 2
119B098E2E980A229E139A9ED01A469E518E6F26
4 - Adding network bytes to 3
00119B098E2E980A229E139A9ED01A469E518E6F26
5 - SHA-256 hash of 4
D304D9060026D2C5AED09B330B85A8FF10926AC432C7A7AEE384E47B2FA1A670
6 - SHA-256 hash of 5
90AFE11C54D3BF6BACD6BF92A3F46EECBE9316DC1AF9287791A25D340E67F535
7 - First four bytes of 6
90AFE11C
8 - Adding 7 at the end of 4
00119B098E2E980A229E139A9ED01A469E518E6F2690AFE11C
9 - Base58 encoding of 8
12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX


Just in case anyone else is interested, here is the python code that mirrors the above logic and seems to work well for the output scripts found in the EARLY blockchain file. It takes the 160-bit RIPEMD-160 hash as input, adds the network byte prefix for mainnet ('00') and outputs the Base58 rendition:

Code:
import hashlib
import binascii
import base58

def Ripe160HashtoBase58(ripemd160hash):
        # prepend the mainnet network byte (0x00) to the 20-byte hash (step 4)
        payload = binascii.unhexlify('00' + ripemd160hash)
        # checksum = first four bytes of the double SHA-256 of the payload (steps 5-7)
        checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
        # b58encode returns bytes and writes the leading zero byte as '1' (steps 8-9)
        return base58.b58encode(payload + checksum).decode('ascii')

# Ripe160HashtoBase58('119B098E2E980A229E139A9ED01A469E518E6F26') --> '12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX'


When faced with the uncompressed pubkey starting with '04' (following the '41' size byte) like the above example, I extract the 130 hex character (65 byte) public key and generate the ripemd160hash (steps 2 & 3) before feeding it to the function above.

Code:
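# pubkey_hex holds the 130 hex characters (65 bytes) of the uncompressed public key
# note: hashlib.new('ripemd160') only works if the underlying OpenSSL build provides RIPEMD-160
ripemd160 = hashlib.new('ripemd160')
ripemd160.update(hashlib.sha256(binascii.unhexlify(pubkey_hex)).digest())
ripemd160hash = ripemd160.hexdigest()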
legendary
Activity: 2534
Merit: 6080
Self-proclaimed Genius
You should update your Bitcoin Core client; version 0.19.0.1 and later won't derive an address from P2PK outputs using decoderawtransaction or getrawtransaction "TXID" "true".

For example  Block #1 coinbase:
output scriptPubKey (HEX) --> base 58 address, i.e.,
410496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52da7589379515d4e0a604f8141781e62294721166bf621e73a82cbf2342c858eeac -->12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX
If you want to get the address despite the fact that it shouldn't be converted into an address,
you can convert it through any "pub key to address" code that supports uncompressed pub key using the public key which is in that scriptPubKey (highlighted):

41 - push 65 bytes to stack
0496.............58ee - Public key
ac - OP_CHECKSIG
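As a rough illustration of that breakdown, the sketch below splits a P2PK scriptPubKey into its three parts and returns the public key. The helper name `extract_p2pk_pubkey` is hypothetical, and accepting a 33-byte (compressed) key as well as the 65-byte one is an assumption added for completeness; the thread itself only deals with the uncompressed case.

```python
def extract_p2pk_pubkey(script_hex: str) -> bytes:
    """Return the public key from a P2PK scriptPubKey, or raise ValueError."""
    script = bytes.fromhex(script_hex)
    if len(script) < 2:
        raise ValueError("script too short")
    push_len = script[0]             # 0x41 = push 65 bytes, 0x21 = push 33 bytes
    if push_len not in (33, 65):
        raise ValueError("first byte is not a 33- or 65-byte push")
    if len(script) != push_len + 2:  # push opcode + key + OP_CHECKSIG
        raise ValueError("unexpected script length")
    if script[-1] != 0xAC:           # 0xac = OP_CHECKSIG
        raise ValueError("script does not end with OP_CHECKSIG")
    return script[1:1 + push_len]

# scriptPubKey of the Block #1 coinbase output, as quoted in this thread
BLOCK1_SCRIPT = (
    "410496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52"
    "da7589379515d4e0a604f8141781e62294721166bf621e73a82cbf2342c858eeac"
)

pubkey = extract_p2pk_pubkey(BLOCK1_SCRIPT)
print(len(pubkey), pubkey.hex())
```

The returned bytes are what you would feed into any "pub key to address" routine (SHA-256, then RIPEMD-160, then Base58Check).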
legendary
Activity: 3472
Merit: 10611
in a bitcoin transaction there are no "addresses", there are only scripts. and an address is a human readable form of a script, defined ONLY for a handful of them: P2PKH, P2SH, P2WPKH and P2WSH. no other script has a defined address.
the early coinbase transactions and any other very early transaction are using P2PK scripts which have no equivalent address.

the fact that some block explorers such as blockchain.com are showing an address is their bug and they are too lazy to fix it.
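For anyone parsing the raw files, one way to act on this is to classify each output script first and only produce an address for the types that have one. The following is a simplified sketch with a made-up helper name (`classify_script`); it matches scripts by their byte pattern and covers only the script types named in this post plus P2PK.

```python
def classify_script(script_hex: str) -> str:
    """Classify a scriptPubKey by its byte pattern (simplified sketch)."""
    s = bytes.fromhex(script_hex)
    n = len(s)
    # P2PK: <push 33 or 65> <pubkey> OP_CHECKSIG
    if n in (35, 67) and s[0] == n - 2 and s[-1] == 0xAC:
        return "P2PK"
    # P2PKH: OP_DUP OP_HASH160 <push 20> <hash160> OP_EQUALVERIFY OP_CHECKSIG
    if n == 25 and s[:3] == b"\x76\xa9\x14" and s[23:] == b"\x88\xac":
        return "P2PKH"
    # P2SH: OP_HASH160 <push 20> <hash160> OP_EQUAL
    if n == 23 and s[:2] == b"\xa9\x14" and s[22] == 0x87:
        return "P2SH"
    # P2WPKH: OP_0 <push 20> <hash160>
    if n == 22 and s[:2] == b"\x00\x14":
        return "P2WPKH"
    # P2WSH: OP_0 <push 32> <sha256>
    if n == 34 and s[:2] == b"\x00\x20":
        return "P2WSH"
    return "other"

# script types that have a defined address, per the list above
HAS_ADDRESS = {"P2PKH", "P2SH", "P2WPKH", "P2WSH"}

block1_script = (
    "410496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52"
    "da7589379515d4e0a604f8141781e62294721166bf621e73a82cbf2342c858eeac"
)
p2pkh_script = "76a914119b098e2e980a229e139a9ed01a469e518e6f2688ac"

for script in (block1_script, p2pkh_script):
    kind = classify_script(script)
    print(kind, "has address" if kind in HAS_ADDRESS else "no defined address")
```

Keeping the script type next to any derived Base58 string lets a dataset distinguish a real P2PKH output from a P2PK output that merely hashes to the same value.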
full member
Activity: 173
Merit: 120
How do I technically extract the address from the output scriptPubKey for early coinbase transactions? Bonus credit for python code.

For example  Block #1 coinbase:
output scriptPubKey (HEX) --> base 58 address, i.e.,
410496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52da7589379515d4e0a604f8141781e62294721166bf621e73a82cbf2342c858eeac -->12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX

Thanks in advance!

https://www.blockchain.com/btc/tx/0e3e2357e806b6cdb1f70b54c3a3a17b6714ee1f0e68bebb44a74b1efd512098

Code:
{
  "txid": "0e3e2357e806b6cdb1f70b54c3a3a17b6714ee1f0e68bebb44a74b1efd512098",
  "hash": "0e3e2357e806b6cdb1f70b54c3a3a17b6714ee1f0e68bebb44a74b1efd512098",
  "version": 1,
  "size": 134,
  "vsize": 134,
  "weight": 536,
  "locktime": 0,
  "vin": [
    {
      "coinbase": "04ffff001d0104",
      "sequence": 4294967295
    }
  ],
  "vout": [
    {
      "value": 50,
      "n": 0,
      "scriptPubKey": {
        "asm": "0496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52da7589379515d4e0a604f8141781e62294721166bf621e73a82cbf2342c858ee OP_CHECKSIG",
        "hex": "410496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52da7589379515d4e0a604f8141781e62294721166bf621e73a82cbf2342c858eeac",
        "reqSigs": 1,
        "type": "pubkey",
        "addresses": [
          "12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX"
        ]
      }
    }
  ]
}