Best way to extract addresses from sigScript and PkScript?

riplin

member

Activity: 116

Merit: 11

Quote from: kjj on July 26, 2013, 03:02:55 PM

Yes, exactly.

Understanding this is the key to making sense of compressed keys and WIFs too.

I understand compressed keys and WIFs ;)

Anyway, I've decided not to do a simple memcmp, but rather something more generic:

Code:

        // Walks the script opcode by opcode and returns the single hash it finds,
        // or null when the script holds no hash, more than one, or fails to parse.
        internal static Hash160 GetAddress(byte[] aScript)
        {
            Hash160 hash = null;
            ParseState parseState = ParseState.OP_ANY;
            ScriptParser parser = new ScriptParser(aScript, delegate(Opcode aOpcode, byte[] aData)
            {
                switch (parseState)
                {
                    case ParseState.OP_ANY:
                        if (aOpcode == Opcode.OP_HASH160)
                        {
                            // A 20-byte push must come next.
                            parseState = ParseState.OP_20;
                        }
                        else if ((aOpcode == (Opcode)33) || (aOpcode == (Opcode)65))
                        {
                            // Direct push of 33 or 65 bytes: a candidate public key.
                            if (hash != null)
                                return false; // second candidate, give up

                            if (Utils.VerifyPublicKey(aData, aData.Length) != 1)
                                return false;

                            hash = new Hash160(Utils.RIPEMD160SHA256(aData));
                        }
                        break;

                    case ParseState.OP_20:
                        // Expect the 20-byte hash right after OP_HASH160.
                        if (aOpcode != (Opcode)20)
                            return false;

                        if (hash != null)
                            return false; // second candidate, give up

                        hash = new Hash160(aData);
                        parseState = ParseState.OP_EQUALVERIFY;
                        break;

                    case ParseState.OP_EQUALVERIFY:
                        // OP_EQUAL covers P2SH, OP_EQUALVERIFY covers pay-to-pubkey-hash.
                        if (aOpcode != Opcode.OP_EQUAL && aOpcode != Opcode.OP_EQUALVERIFY)
                            return false;

                        parseState = ParseState.OP_ANY;
                        break;

                    default:
                        return false;
                }
                return true;
            });

            if (parser.Parse())
            {
                return hash;
            }

            return null;
        }

kjj

legendary

Activity: 1302

Merit: 1026

Quote from: riplin on July 26, 2013, 02:39:33 PM

Quote from: kjj on July 26, 2013, 02:34:03 PM

In that case, why do you bother extracting the pubkey? The satoshi client just compares the script to a stored copy of the script that matches a key.

So it's building up: OP_DUP OP_HASH160 OP_EQUALVERIFY OP_CHECKSIG

and just matching on that?

Yes, exactly.

Understanding this is the key to making sense of compressed keys and WIFs too.

piotr_n

legendary

Activity: 2058

Merit: 1416

aka tonikt

Quote from: riplin on July 26, 2013, 02:39:33 PM

Quote from: kjj on July 26, 2013, 02:34:03 PM

In that case, why do you bother extracting the pubkey? The satoshi client just compares the script to a stored copy of the script that matches a key.

So it's building up: OP_DUP OP_HASH160 OP_EQUALVERIFY OP_CHECKSIG

and just matching on that?

Yes, but in the earliest blocks they were using different scripts.

riplin

member

Activity: 116

Merit: 11

Quote from: kjj on July 26, 2013, 02:34:03 PM

In that case, why do you bother extracting the pubkey? The satoshi client just compares the script to a stored copy of the script that matches a key.

So it's building up: OP_DUP OP_HASH160 OP_EQUALVERIFY OP_CHECKSIG

and just matching on that?

kjj

legendary

Activity: 1302

Merit: 1026

Quote from: riplin on July 26, 2013, 02:14:56 PM

Quote from: kjj on July 26, 2013, 02:05:54 PM

If you aren't doing a full scripting engine, you should make templates and search them until you find a hit.

jl2012 gives an example of the "standard" transaction template.

Also, see https://bitcointalksearch.org/topic/m.1348297

I have a full script engine, but I'm writing an SPV node now that isn't doing full script evaluation. It just wants to extract the addresses to test against the local wallet. I guess it doesn't really matter too much if I get false positives, since they'll be filtered out by the bloom filter / local db anyway.

In that case, why do you bother extracting the pubkey? The satoshi client just compares the script to a stored copy of the script that matches a key.
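A minimal Python sketch of that kind of comparison, assuming the wallet already holds the 20-byte HASH160 of each of its public keys. The names `p2pkh_script`, `build_watch_map` and `find_owner` are made up for illustration and are not taken from the Satoshi client.

```python
def p2pkh_script(key_hash: bytes) -> bytes:
    """Build OP_DUP OP_HASH160 push20 OP_EQUALVERIFY OP_CHECKSIG for one key hash."""
    if len(key_hash) != 20:
        raise ValueError("expected a 20-byte HASH160")
    return b"\x76\xa9\x14" + key_hash + b"\x88\xac"


def build_watch_map(key_hashes):
    """Precompute the expected output script for every wallet key hash."""
    return {p2pkh_script(h): h for h in key_hashes}


def find_owner(pk_script: bytes, watch_map):
    """Return the wallet key hash whose stored script equals pk_script, else None."""
    return watch_map.get(pk_script)
```

Nothing is parsed here: an output either equals one of the stored scripts byte for byte or it is ignored, so odd scripts cannot produce a false hit.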

piotr_n

legendary

Activity: 2058

Merit: 1416

aka tonikt

I guess you only need to analyze the pk_scripts of unspent outputs; the inputs just refer to earlier outputs by tx id (who cares).
The txout scripts are in general pretty straightforward.
Recent ones contain a hash plus a few opcodes.
Older ones contained the entire public key.

riplin

member

Activity: 116

Merit: 11

Quote from: kjj on July 26, 2013, 02:05:54 PM

If you aren't doing a full scripting engine, you should make templates and search them until you find a hit.

jl2012 gives an example of the "standard" transaction template.

Also, see https://bitcointalksearch.org/topic/m.1348297

I have a full script engine, but I'm writing an SPV node now that isn't doing full script evaluation. It just wants to extract the addresses to test against the local wallet. I guess it doesn't really matter too much if I get false positives, since they'll be filtered out by the bloom filter / local db anyway.

piotr_n

legendary

Activity: 2058

Merit: 1416

aka tonikt

Quote from: riplin on July 26, 2013, 01:45:54 PM

Hi everyone,

Is there a best practice method of extracting addresses from the sigScript and PkScript?

For example, is it safe to assume that data pushes of 20 bytes are addresses and 33 / 65 (if starting with 0x02 / 0x03 for 33 and 0x04 for 65) are public keys?

I know that there are transactions that don't have addresses, like this one: https://blockchain.info/tx/a4bfa8ab6435ae5f25dae9d89e4eb67dfa94283ca751f393c1ddc5a837bbc31b

But are there transactions that have more than one? Do I have to run the scripts in order to find out which one is actually used?

I also read somewhere that it's possible to get the public key from the signature or is that not the case?

Ultimately there is no such thing as "the address".
Yes, you can do these things and they will work in 90+% of cases, but never in all of them.
If you can sell it to a miner, you can introduce a tx that creates a new address format, whatever you can think of.
I guess you would get to name it then ;)

As for extracting the address from the signature: it is possible in some fashion, though from what I recall the result might be a wrong address. So it only makes sense if you already have the right one to check against.

Peter Todd

legendary

Activity: 1120

Merit: 1164

Quote from: riplin on July 26, 2013, 01:45:54 PM

Hi everyone,

Is there a best practice method of extracting addresses from the sigScript and PkScript?

For example, is it safe to assume that data pushes of 20 bytes are addresses and 33 / 65 (if starting with 0x02 / 0x03 for 33 and 0x04 for 65) are public keys?

I know that there are transactions that don't have addresses, like this one: https://blockchain.info/tx/a4bfa8ab6435ae5f25dae9d89e4eb67dfa94283ca751f393c1ddc5a837bbc31b

But are there transactions that have more than one? Do I have to run the scripts in order to find out which one is actually used?

I also read somewhere that it's possible to get the public key from the signature or is that not the case?

I have a pull-req that might help you: https://github.com/bitcoin/bitcoin/pull/2830

Basically it adds a "decodescript" RPC call that takes a script and decodes it into human-readable form. That includes the opcodes as well as the address itself, either pay-to-pubkey-hash (standard addresses) or P2SH (or non-standard if the script doesn't match a standard address form). It also lets you calculate the P2SH address that would correspond to that script; if you don't understand what I mean by that statement, read up on P2SH.

The bigger question though is why exactly do you need to do this? Are you trying to write a library?
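For the address part of such a decode, here is a rough Python sketch of turning a 20-byte hash into its Base58Check string. It is not the code from the pull request; `base58check` is a hypothetical helper, and the version bytes assumed are the mainnet ones (0x00 for pay-to-pubkey-hash, 0x05 for P2SH).

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check(version: int, payload: bytes) -> str:
    """Encode version byte + payload + 4-byte double-SHA256 checksum in Base58."""
    data = bytes([version]) + payload
    checksum = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    raw = data + checksum

    # Convert the whole byte string to a big integer, then to base 58.
    num = int.from_bytes(raw, "big")
    encoded = ""
    while num > 0:
        num, rem = divmod(num, 58)
        encoded = B58_ALPHABET[rem] + encoded

    # Each leading zero byte is written as a literal '1'.
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded
```

As a sanity check, `base58check(0x00, bytes(20))` yields the well-known all-zero-hash address 1111111111111111111114oLvT2. The 20-byte payload itself is RIPEMD160(SHA256(x)), where x is the public key for a standard address or the serialized script for P2SH; that hashing step is left out here.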

kjj

legendary

Activity: 1302

Merit: 1026

If you aren't doing a full scripting engine, you should make templates and search them until you find a hit.

jl2012 gives an example of the "standard" transaction template.

Also, see https://bitcointalksearch.org/topic/m.1348297
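A small Python sketch of the template idea, under the assumption that only three output forms matter (pay-to-pubkey-hash, P2SH and the old pay-to-pubkey). `match_template` is an illustrative name, not something from any client.

```python
OP_DUP = 0x76
OP_EQUAL = 0x87
OP_EQUALVERIFY = 0x88
OP_HASH160 = 0xA9
OP_CHECKSIG = 0xAC


def match_template(script: bytes):
    """Return (kind, payload) for the first template that fits, else None."""
    n = len(script)

    # OP_DUP OP_HASH160 push20 OP_EQUALVERIFY OP_CHECKSIG
    if (n == 25 and script[:3] == bytes([OP_DUP, OP_HASH160, 20])
            and script[23:] == bytes([OP_EQUALVERIFY, OP_CHECKSIG])):
        return ("p2pkh", script[3:23])

    # OP_HASH160 push20 OP_EQUAL
    if n == 23 and script[:2] == bytes([OP_HASH160, 20]) and script[22] == OP_EQUAL:
        return ("p2sh", script[2:22])

    # push33 or push65 followed by OP_CHECKSIG (the payload is the raw public key)
    if n in (35, 67) and script[0] == n - 2 and script[-1] == OP_CHECKSIG:
        return ("p2pk", script[1:-1])

    return None
```

Anything that falls through to `None` is treated as non-standard rather than guessed at, which is the main difference from scanning for 20-, 33- or 65-byte pushes.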

riplin

member

Activity: 116

Merit: 11

Quote from: jl2012 on July 26, 2013, 01:59:19 PM

In perl regex

^76a914(.{40})88ac$

Great, now there's another thing I have to figure out. Thanks though.

jl2012

legendary

Activity: 1792

Merit: 1121

Quote from: riplin on July 26, 2013, 01:54:08 PM

Quote from: jl2012 on July 26, 2013, 01:48:18 PM

Certainly it's not safe to assume that. Why don't you just use a regular expression to extract the address?

What would be the criteria for a regular expression then? There's 20 bytes in an address, all of them random.

In perl regex

^76a914(.{40})88ac$
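The same pattern as a short Python sketch, assuming the script has been hex-encoded first. `extract_p2pkh_hash` is a made-up name, and the capture group is narrowed to hex digits instead of `.`.

```python
import re

# 76 = OP_DUP, a9 = OP_HASH160, 14 = push 20 bytes,
# 88 = OP_EQUALVERIFY, ac = OP_CHECKSIG
P2PKH_HEX = re.compile(r"76a914([0-9a-f]{40})88ac")


def extract_p2pkh_hash(script_hex: str):
    """Return the 40 hex digits of the key hash, or None if the script differs."""
    match = P2PKH_HEX.fullmatch(script_hex.lower())
    return match.group(1) if match else None
```

`fullmatch` plays the role of the `^` and `$` anchors, so a script with anything before or after the five elements is rejected.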

riplin

member

Activity: 116

Merit: 11

Quote from: jl2012 on July 26, 2013, 01:48:18 PM

Certainly it's not safe to assume that. Why don't you just use a regular expression to extract the address?

What would be the criteria for a regular expression then? There's 20 bytes in an address, all of them random.

jl2012

legendary

Activity: 1792

Merit: 1121

Quote from: riplin on July 26, 2013, 01:45:54 PM

Hi everyone,

Is there a best practice method of extracting addresses from the sigScript and PkScript?

For example, is it safe to assume that data pushes of 20 bytes are addresses and 33 / 65 (if starting with 0x02 / 0x03 for 33 and 0x04 for 65) are public keys?

I know that there are transactions that don't have addresses, like this one: https://blockchain.info/tx/a4bfa8ab6435ae5f25dae9d89e4eb67dfa94283ca751f393c1ddc5a837bbc31b

But are there transactions that have more than one? Do I have to run the scripts in order to find out which one is actually used?

I also read somewhere that it's possible to get the public key from the signature or is that not the case?

Certainly it's not safe to assume that. Why don't you just use a regular expression to extract the address?

riplin

member

Activity: 116

Merit: 11

Hi everyone,

Is there a best practice method of extracting addresses from the sigScript and PkScript?

For example, is it safe to assume that data pushes of 20 bytes are addresses and 33 / 65 (if starting with 0x02 / 0x03 for 33 and 0x04 for 65) are public keys?

I know that there are transactions that don't have addresses, like this one: https://blockchain.info/tx/a4bfa8ab6435ae5f25dae9d89e4eb67dfa94283ca751f393c1ddc5a837bbc31b

But are there transactions that have more than one? Do I have to run the scripts in order to find out which one is actually used?

I also read somewhere that it's possible to get the public key from the signature or is that not the case?
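To make the length heuristic from the question concrete, here is a hedged Python sketch that walks a script, collects its data pushes and labels the ones that look like hashes or public keys. `iter_pushes` and `classify` are illustrative names, and the labels are only guesses based on length and first byte.

```python
def iter_pushes(script: bytes):
    """Yield the data of every push operation in a raw script."""
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        if 1 <= op <= 0x4B:
            size = op  # opcodes 1..75 push that many bytes directly
        elif op in (0x4C, 0x4D, 0x4E):
            # OP_PUSHDATA1/2/4: the length follows in 1, 2 or 4 little-endian bytes
            width = {0x4C: 1, 0x4D: 2, 0x4E: 4}[op]
            if i + width > len(script):
                raise ValueError("truncated push length")
            size = int.from_bytes(script[i:i + width], "little")
            i += width
        else:
            continue  # OP_0 and non-push opcodes carry no data
        if i + size > len(script):
            raise ValueError("truncated push data")
        yield script[i:i + size]
        i += size


def classify(data: bytes):
    """Guess what a push might be, going by length and first byte only."""
    if len(data) == 20:
        return "hash160?"
    if len(data) == 33 and data[0] in (0x02, 0x03):
        return "compressed pubkey?"
    if len(data) == 65 and data[0] == 0x04:
        return "uncompressed pubkey?"
    return None
```

The question marks are deliberate: any 20-byte push gets the same label whether or not the script ever treats it as a key hash.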
