
Topic: Understanding Transaction #14 on block #141460 (Read 2315 times)

member
Activity: 82
Merit: 10
January 02, 2014, 03:53:10 PM
#20
Yeah, I'm still working on documentation/cleanup.  It was written with the primary intention that other programmers would build it and step through the source code as a learning exercise.

I *will* add some command line switches where you can just run it and get like a dump of keys, that sort of thing.  Just haven't gotten to that yet.

I am now working on a pretty amazing visualization tool that I think the community is going to find very interesting.  I'm writing a 3d graphics application which can visually display virtually every single block on the blockchain with a reasonable balance and then *animate* it in real-time over time.

John
legendary
Activity: 924
Merit: 1132


I like your code.  It's very clear, minimal dependencies, and conforming to language standards. 

It's usable for my purposes as is because I can write my own main function easily treating it as a library, but I think most people will actually want you to provide a "main" function that gets command line arguments and can be commanded to do a particular set of things. 

What I wanted to do is simple; list all the public keys in the blockchain.  Which makes my 'main' function about a six-liner I think. 
member
Activity: 82
Merit: 10
Yeah, I got this all fixed and working.  What I did is change my code to scan in all block-headers from the files on disk.  I compute the block-hash based on each header and put it in a hash-map.  Then, once I have hit the last block header, I walk backwards using the 'previous-block' field to do a hash-table lookup until I hit the first block.  Seems to work perfectly unless, I suppose, the 'last' block was an orphan, but I'm not that worried about such a rare event.
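
The header-indexing approach described above can be sketched roughly as follows. This is a minimal Python sketch, not the parser's actual code. It assumes the standard 80-byte block header (previous-block hash at bytes 4 to 36, block hash equal to the double SHA-256 of the header). The names block_hash, prev_hash, main_chain and make_header are hypothetical, and make_header only builds fake headers for the demonstration.

```python
import hashlib

def block_hash(header: bytes) -> bytes:
    """Double SHA-256 of the 80-byte header (internal byte order)."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()

def prev_hash(header: bytes) -> bytes:
    """Bytes 4..36 of the header hold the previous block's hash."""
    return header[4:36]

def main_chain(headers):
    """Index every header by hash, then walk back from the last one read."""
    by_hash = {block_hash(h): h for h in headers}
    chain = []
    cur = block_hash(headers[-1])
    while cur in by_hash:
        chain.append(cur)
        cur = prev_hash(by_hash[cur])
    chain.reverse()  # oldest block first
    return chain

def make_header(prev: bytes, nonce: int) -> bytes:
    """Hypothetical helper: a fake 80-byte header (4+32+32+8+4 bytes)."""
    return (b"\x01\x00\x00\x00" + prev + bytes(32) + bytes(8)
            + nonce.to_bytes(4, "little"))

genesis = make_header(bytes(32), 0)
a = make_header(block_hash(genesis), 1)
orphan = make_header(block_hash(genesis), 2)  # competes with a
b = make_header(block_hash(a), 3)
chain = main_chain([genesis, a, orphan, b])
print(len(chain))  # the orphan is never reached by the backward walk
```

As noted in the post, this only gives the right answer if the last header read is itself on the main chain.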

I revised all of the source and made some new graphs and posted it here.

http://codesuppository.blogspot.com/2014/01/show-me-money-scatter-graph-of-all.html
newbie
Activity: 32
Merit: 0
Ok, it looks like I finally have this fixed.  It was just the orphan blocks which were throwing me off, and now those are all taken into account.  The parser now successfully reads every single block, transaction, input, and output in the blockchain.  There are some output scripts which it fails to derive a valid public key signature for.  However, in all of those cases I found both blockexplorer and blockchain.info have the same issue, so I'm not treating that as a big concern for now.

I plan to write up a specific blog post detailing all of these 'gotchas' you have to know about to stream/parse the block-chain sequentially (rather than trying to scan all of the blocks into some kind of graph/tree up front).  There are a bunch of little annoying things that are not obvious.

[Edit: Spoke too soon, I get off the rail a little further down the blockchain, still have to debug this some more...]

John

I found it easiest to naively commit blockdats into the database as I parse them.

The simplest way to derive the longest branch for me was to represent it as a function applied to the tree which returns a sequence of block entities on the branch. Then I can just cache it, intersect it with blocks to see if they're on it, or apply it to any sub-tree (like the tree at 50 confirmations) when I append new blocks.

Output scripts only happen to usually conform to some common footprints that let you extract addresses and pubkeys, so just store the binary blob in the database which I'm sure you're doing. If I recall correctly, one of the few assumptions you can make about blockdat data is that the reference client does ensure it parses into op-codes, so at least there's that!

The road to robustness is definitely a humbling journey. Hilarious, even.

Keep us updated. I have a lot of gotchas etched into comments around my codebase, but "surely I'll blog about them someday!" turns out only to be a compulsive lie I tell myself to feel benevolent.
member
Activity: 82
Merit: 10
Ok, it looks like I finally have this fixed.  It was just the orphan blocks which were throwing me off, and now those are all taken into account.  The parser now successfully reads every single block, transaction, input, and output in the blockchain.  There are some output scripts which it fails to derive a valid public key signature for.  However, in all of those cases I found both blockexplorer and blockchain.info have the same issue, so I'm not treating that as a big concern for now.

I plan to write up a specific blog post detailing all of these 'gotchas' you have to know about to stream/parse the block-chain sequentially (rather than trying to scan all of the blocks into some kind of graph/tree up front).  There are a bunch of little annoying things that are not obvious.

[Edit: Spoke too soon, I get off the rail a little further down the blockchain, still have to debug this some more...]

John
member
Activity: 82
Merit: 10
Thanks, I'm working on my implementation now.  For my first attempt I'm just going to skip the orphan block when I detect it and assume the next block is not an orphan.  I have an assert in my code to see if this is, in fact, always the case.

Here is how I believe you can get a bitcoin-qt local database with orphaned blocks in it.

If you fire up bitcoin-qt with a completely fresh database, it will download the entire blockchain to your hard drive with zero orphan blocks.  However, if you leave bitcoin-qt running day to day, synchronizing to the network, then it is possible for it to write an orphaned block to your local hard drive (this is what happened to me).  One thing that is perhaps unique about my block-chain parser is that it is entirely sequential.  It reads in 4 blocks at any given time: the logical previous block, the logical 'current' block, the logical 'next' block, and the one after that.  Each cycle it reads in the 'next' block in that pattern and shuffles some pointers.

Since I was already reading the next block, and the one after that, I think this 'fixup' approach is going to work.
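
A rough Python sketch of this lookahead 'fixup' idea follows. It is only an illustration under the assumption stated in the post (an orphan sits directly next to its competing sibling on disk and the block after the pair is not itself an orphan). Blocks are reduced to hypothetical (block_id, prev_id) pairs, and skip_orphans is a made-up name.

```python
def skip_orphans(blocks):
    """blocks: list of (block_id, prev_id) tuples in on-disk order.

    Returns the blocks kept after dropping one block from every pair of
    adjacent siblings (two consecutive blocks naming the same parent).
    """
    kept = []
    i = 0
    while i < len(blocks):
        block_id, prev_id = blocks[i]
        has_sibling = i + 1 < len(blocks) and blocks[i + 1][1] == prev_id
        if not has_sibling:
            kept.append(blocks[i])
            i += 1
            continue
        # Two competing blocks share a parent; the block after them decides.
        if i + 2 < len(blocks) and blocks[i + 2][1] == blocks[i + 1][0]:
            kept.append(blocks[i + 1])
        else:
            kept.append(blocks[i])  # default: keep the first one seen
        i += 2
    return kept

first_wins = skip_orphans(
    [("g", None), ("a", "g"), ("a2", "g"), ("b", "a"), ("c", "b")])
second_wins = skip_orphans(
    [("g", None), ("a", "g"), ("a2", "g"), ("b", "a2")])
```

If the competing pair is the last thing in the file, nothing can disambiguate it yet, so the sketch simply keeps the first block seen.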

Thanks,

John
legendary
Activity: 924
Merit: 1132
The two blocks with the same number should both have the hash for the same previous block, because they are both built on the same previous block.  It is the subsequent block that will disambiguate them.  The subsequent block will be built on (and therefore contain the hash of) only one of the instances of the previous block. 

I do not know why both of them are recorded in the blockchain you're looking at, though; the blocks in orphaned forks are supposed to be completely unnecessary for building a complete record of transactions. 

If the blocks in orphaned chains are in the record you're looking at, then you need to be able to turn off your transaction recording and track two (or even more) blockchains.  If both of them have a successor, you look for 'grandchildren,' and if both of them have a successor, you look for 'great-grandchildren' etc.  Sooner or later, one branch terminates in a block having no subsequent block, so you then know the other branch is the valid one.  From there you go back to the point of the fork, start recording tx again, and go forward ignoring the terminated branch.
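
The branch-following rule above amounts to choosing the branch that keeps going after the others have terminated. A minimal Python sketch, assuming the blocks have already been reduced to a hypothetical prev_of mapping (block id to parent id, with None as the parent of the first block); longest_branch is an illustrative name:

```python
from collections import defaultdict, deque

def longest_branch(prev_of):
    """prev_of: dict mapping block_id -> parent id (None for the first block).

    Returns the block ids on the longest branch, oldest first.
    """
    children = defaultdict(list)
    for block_id, parent in prev_of.items():
        children[parent].append(block_id)

    # Breadth-first walk from the root(s) to assign a height to every block.
    height = {}
    queue = deque((root, 0) for root in children[None])
    while queue:
        block_id, h = queue.popleft()
        height[block_id] = h
        for child in children[block_id]:
            queue.append((child, h + 1))

    # The deepest block is the tip; walk parents back to the start.
    tip = max(height, key=height.get)
    branch = []
    while tip is not None:
        branch.append(tip)
        tip = prev_of[tip]
    branch.reverse()
    return branch

fork = {"g": None, "a1": "g", "a2": "g", "b": "a1", "c": "b"}
branch = longest_branch(fork)
```

Here the fork at "g" is resolved in favour of "a1" because "a2" has no successor. The sketch compares branches by block count only, which is a simplification.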

member
Activity: 82
Merit: 10
So, when you are parsing the blockchain sequentially and encounter an 'orphaned' block, what is the proper way to deal with this?

Looking at the raw data stream I imagine I could 'detect' it by the fact that multiple blocks in a row point to the same 'previous' block.  Is it always safe/true that the last block is considered on the 'main' chain?

Thanks,

John
legendary
Activity: 924
Merit: 1132
So, it should be no harder to spend than any other transaction you don't have the key for.

Just find the corresponding private key and you're golden.
staff
Activity: 4326
Merit: 8951
it's still not valid since the header and checksum are off.  I suppose I could throw away the header and the checksum,

Gah. What are you guys going on about? It's just a regular, ordinary pay-to-pubkey output (there are thousands of them in the blockchain), with ASCII garbage encoded in the public key. It can't be spent. It's not unusual, except that the pubkey is junk, which is not surprising because the protocol rules place practically no constraint on the scriptPubKeys in txouts (beyond their size). The hash160 being shown for it is correct: it's the hash160 corresponding to a normal pay-to-pubkey-hash for that "pubkey".
legendary
Activity: 924
Merit: 1132

I'm very interested in this blockchain parser.  When you're done I hope you don't mind letting others download and use it.
legendary
Activity: 1512
Merit: 1036
I just updated my long post with more descriptions, I found DER docs to decode the mystery bytes in the signature. They are part of the ASN.1 stream container that describes encoding of two ints.
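
As an illustration of that DER layout, here is a small Python sketch that splits the scriptSig signature from this transaction into its two integers. It assumes short-form DER lengths only (every length below 128 bytes), and parse_der_signature is a hypothetical name rather than anything from Bitcoin's source.

```python
def parse_der_signature(sig: bytes):
    """Split a DER-encoded ECDSA signature with a trailing hash-type byte.

    Layout: 30 <len> 02 <rlen> <r> 02 <slen> <s> <hashtype>
    Returns (r, s, hash_type) with r and s as Python ints.
    """
    if sig[0] != 0x30:
        raise ValueError("expected ASN.1 SEQUENCE tag 0x30")
    seq_len = sig[1]  # short-form length only
    pos = 2
    ints = []
    for _ in range(2):
        if sig[pos] != 0x02:
            raise ValueError("expected ASN.1 INTEGER tag 0x02")
        n = sig[pos + 1]
        ints.append(int.from_bytes(sig[pos + 2:pos + 2 + n], "big"))
        pos += 2 + n
    if pos != 2 + seq_len:
        raise ValueError("SEQUENCE length does not match its contents")
    hash_type = sig[pos] if pos < len(sig) else None
    return ints[0], ints[1], hash_type

R_HEX = "263325fcbd579f5a3d0c49aa96538d9562ee41dc690d50dcc5a0af4ba2b9efcf"
S_HEX = "00fd8d53c6be9b3f68c74eed559cca314e718df437b5c5c57668c5930e14140502"
sig = bytes.fromhex("3045" + "0220" + R_HEX + "0221" + S_HEX + "01")
r, s, hash_type = parse_der_signature(sig)
```

The leading 00 byte on s is DER padding that keeps the integer positive when its top bit is set, which is why s occupies 33 bytes (21h) while r occupies 32 (20h).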

I found something interesting - Bitcoin also pulls the same address out of thin air for that transaction; it is picking up on the two bytes at the end as an opcode:

Quote from: Bitcoin-Qt console
gettxout 9740e7d646f5278603c04706a366716e5e87212c57395e0d24761c0ae784b2c6 0
{
"bestblock" : "000000000000000099630b3f60c04edf89931160a381ca0a5e3bc35d1195272d",
"confirmations" : 136428,
"value" : 0.00100000,
"scriptPubKey" : {
"asm" : "4c554b452d4a522049532041205045444f5048494c4521204f682c20616e6420676f642069736e2774207265616c2c207375636b612e2053746f7020706f6c6c7574696e672074686520626c6f636b636861696e207769746820796f7572206e6f6e73656e73652e OP_CHECKSIG",
"hex" : "4c684c554b452d4a522049532041205045444f5048494c4521204f682c20616e6420676f642069736e2774207265616c2c207375636b612e2053746f7020706f6c6c7574696e672074686520626c6f636b636861696e207769746820796f7572206e6f6e73656e73652eac",
"reqSigs" : 1,
"type" : "pubkey",
"addresses" : [
"1Address"
]
},
"version" : 1,
"coinbase" : false
}

This means if you want to find out where blockchain.info gets that address, you have source code. There's the scriptpubkey, with "checksig" on the end in the ASM version. I think Bitcoin must be seeing some data there, on whatever it has found as a pubkey x and y, it makes an address. If this is also the case that it uses the same pubkey and tries a checksig if someone attempts to spend it, you would need a privkey for that pubkey.

The script actually does nothing other than store a whole bunch of data to the stack. The last byte that looks like an opcode also appears pushed to the stack if I am counting bytes correctly this time. The byte vector is a VarLen, it is non-zero, and therefore the stack should be left "True" with a big integer.

I just had an interesting idea - can the "LUKE-JR IS" TXOUT be spent? I think it could be if the script was interpreted right:

A transaction is valid if nothing in the combined script triggers failure and the top stack item is true (non-zero).

I tried to spend it.
ERROR: CTxMemPool::accept() : nonstandard transaction input

Looks like no fun for now, though:
        // Check for non-standard pay-to-script-hash in inputs
        if (Params().NetworkID() == CChainParams::MAIN && !AreInputsStandard(tx, view))
            return error("AcceptToMemoryPool: : nonstandard transaction input");

This would be a transaction to spend it:
0100000001c6b284e70a1c76240d5e39572c21875e6e7166a30647c0038627f546d6e74097000000008c493046022100b35acef5d3f5b42ce2e72b60a2e4a52570ce5d28735a561ee48707b08061016902210089b752641772b17db3d7e92cef66949513fec42711fadc32b787228e1cc0b399014104e0ba531dc5d2ad13e2178196ade1a23989088cfbeddc7886528412087f4bff2ebc19ce739f25a63056b6026a269987fcf5383131440501b583bab70a7254b09effffffff01905f0100000000001976a914da6475289c7f49bcd9ede6ab7203b304ffb265f288ac00000000

I'm going to figure this out, next stop, testnet. It's probably a bug if there is no hash160 and the transaction doesn't look like old-skool pay to pubkey.
sr. member
Activity: 252
Merit: 250
Guys, thanks for the detailed response.  I've made a lot of progress on my parser, though my big outstanding issue is that it is going off the rails somewhere after block #240,000.  I will debug that tomorrow.  In a previous file I hit a case where after scanning a block and expecting the block-header for the next block, instead it found a big chunk of zero bytes.  I had to just scan the file until I hit the header which was some distance further along in the file.  I don't know how/why bitcoin-qt could do this, but on my machine it does.  I have a sneaky suspicion that if I deleted my blockchain and forced it to completely redownload it, this problem would go away.  However, I don't want to do that, because I want to figure out the source of the problem as it is now on my machine.

At any rate, I experimented with generating some 3d surface graphs of the blockchain evolving over time.  It's very early stuff, but it was kind of cool to see anyway.

codesuppository.blogspot.com/2013/12/work-in-progress.html

Very cool graphs! Great work!
member
Activity: 82
Merit: 10
Guys, thanks for the detailed response.  I've made a lot of progress on my parser, though my big outstanding issue is that it is going off the rails somewhere after block #240,000.  I will debug that tomorrow.  In a previous file I hit a case where after scanning a block and expecting the block-header for the next block, instead it found a big chunk of zero bytes.  I had to just scan the file until I hit the header which was some distance further along in the file.  I don't know how/why bitcoin-qt could do this, but on my machine it does.  I have a sneaky suspicion that if I deleted my blockchain and forced it to completely redownload it, this problem would go away.  However, I don't want to do that, because I want to figure out the source of the problem as it is now on my machine.
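
The 'scan until I hit the header' workaround can be sketched like this in Python. It assumes the usual blk*.dat record layout (4 magic bytes f9 be b4 d9 for the main network, a 4-byte little-endian length, then the block itself). The name iter_raw_blocks is hypothetical and the sample data is synthetic.

```python
MAGIC = bytes.fromhex("f9beb4d9")  # main-network message start bytes

def iter_raw_blocks(data: bytes):
    """Yield each raw block in a blk file image, skipping any padding."""
    pos = 0
    while True:
        # find() jumps over runs of zero bytes between records
        pos = data.find(MAGIC, pos)
        if pos < 0 or pos + 8 > len(data):
            return
        size = int.from_bytes(data[pos + 4:pos + 8], "little")
        start = pos + 8
        if start + size > len(data):
            return  # truncated record at the end of the file
        yield data[start:start + size]
        pos = start + size

# Two fake records separated by 16 zero bytes of padding.
sample = (MAGIC + (3).to_bytes(4, "little") + b"abc"
          + bytes(16)
          + MAGIC + (2).to_bytes(4, "little") + b"xy")
blocks = list(iter_raw_blocks(sample))
```

Searching for the magic bytes rather than assuming records are back to back means a gap of zeros is skipped without special handling.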

At any rate, I experimented with generating some 3d surface graphs of the blockchain evolving over time.  It's very early stuff, but it was kind of cool to see anyway.

codesuppository.blogspot.com/2013/12/work-in-progress.html
legendary
Activity: 2912
Merit: 1060
Sorry that's funny. I'd like to see the scripture luke jr leaves.

Ps you guys are fing smart
legendary
Activity: 1512
Merit: 1036
Here's my interpretation of the transaction:


Transaction data format version (uint32_t):
01000000

TXIN:
    TX_IN count (number of Transaction inputs, satoshi VarInt):
    01
    TXIN DATA:
    Previous txout hash:
    21eb234bbd61b7c3d31034762126a64ff046b074963bf359eaa0da0ab59203a0
    Previous txout index:
    01000000
    Script Length:
    8b
        Signature Length: (48h = 72 bytes)
        48
        ECDSA Signature (X.690 DER-encoded):
            ASN.1 tag identifier (20h = constructed + 10h = SEQUENCE and SEQUENCE OF):
            30
            DER length octet, definite short form (45h = 69 bytes) (Signature r+s length)
            45
            ASN.1 tag identifier (02 = INTEGER):
            02
             Signature r length (DER length octet):
             20
             Signature r (unsigned binary int, big-endian):
             263325fcbd579f5a3d0c49aa96538d9562ee41dc690d50dcc5a0af4ba2b9efcf
            ASN.1 tag identifier (02 = INTEGER):
            02
             Signature s length (DER length octet):
             21
             Signature s (first byte is 00 pad to protect MSB 1 unsigned int):
             00fd8d53c6be9b3f68c74eed559cca314e718df437b5c5c57668c5930e14140502
        Signature end byte (SIGHASH_ALL):
        01

    Key length:
    41
        Public Key prefix:
        04
        Public Key part x
        52eca3b9b42d8fac888f4e6a962197a386a8e1c423e852dfbc58466a8021110e
        Public Key part y
        c5f1588cec8b4ebfc4be8a4d920812a39303727a90d53e82a70adcd3f3d15f09
    Sequence Number:
    ffffffff

TXOUT:
    TX_OUT count (number of transaction outputs, satoshi VarInt):
    01
    Value in base units:
    a086010000000000
        Script Length (107 bytes):
        6b

        Script (if we were to run it):
        OP_PUSHDATA1 - The next byte contains the number of bytes to be pushed onto the stack:
        4c
        Bytes (68h = 104 bytes):
        68
        STACK DATA 104 bytes:
        4c554b452d4a522049532041205045444f5048494c4521204f682c20616e6420
        676f642069736e2774207265616c2c207375636b612e2053746f7020706f6c6c
        7574696e672074686520626c6f636b636861696e207769746820796f7572206e
        6f6e73656e73652e

Lock Time (cannot be included before this block):
00000000
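
A few of the fields in the breakdown above can be decoded with a short Python sketch. It assumes the standard serialization (little-endian integers, Satoshi VarInt counts, hashes stored in reversed byte order). read_varint is a hypothetical helper name, and only the first bytes of the transaction are used.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a Satoshi VarInt at pos; return (value, next position)."""
    first = buf[pos]
    if first < 0xfd:
        return first, pos + 1
    width = {0xfd: 2, 0xfe: 4, 0xff: 8}[first]
    value = int.from_bytes(buf[pos + 1:pos + 1 + width], "little")
    return value, pos + 1 + width

# Version, input count, previous txout hash and index, as listed above.
prefix = bytes.fromhex(
    "01000000"
    "01"
    "21eb234bbd61b7c3d31034762126a64ff046b074963bf359eaa0da0ab59203a0"
    "01000000")

version = int.from_bytes(prefix[0:4], "little")
n_inputs, pos = read_varint(prefix, 4)
# Explorers display hashes byte-reversed relative to the raw transaction.
prev_txid = prefix[pos:pos + 32][::-1].hex()
prev_index = int.from_bytes(prefix[pos + 32:pos + 36], "little")

# The 8-byte output value is a little-endian count of base units.
value = int.from_bytes(bytes.fromhex("a086010000000000"), "little")
```

The reversed hash matches the prev_out hash that blockexplorer reports for this input, and 100000 base units is the 0.00100000 BTC shown for the output.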


Edit: What the script does is just one opcode: push all remaining data to the stack.

The script is not opcodes though, it is exactly this in ascii:

>>> bytes.fromhex("4c554b452d4a522049532041205045444f5048494c4521204f682c20616e6420676f642069736e2774207265616c2c207375636b612e2053746f7020706f6c6c7574696e672074686520626c6f636b636861696e207769746820796f7572206e6f6e73656e73652e").decode("ascii")
"LUKE-JR IS A PEDOPHILE! Oh, and god isn't real, sucka. Stop polluting the blockchain with your nonsense."

To answer the question, there is probably a simple parser used by the blockchain site that looks for the hash160 opcode and uses the 160 bits after that to calculate an address. Nothing like the hash160 is in the scriptsig though. It might also scan for a public key a-la generate transaction. I put every hex string combination possible from the scriptsig through a pubkey to address and didn't get that address either.
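
For reference, the 107-byte output script can be split into its pieces with a minimal tokenizer. This Python sketch handles only direct pushes (1 to 75 bytes), OP_PUSHDATA1 and single-byte opcodes, and tokenize is a hypothetical name. It shows that the script is 4c 68, then 104 data bytes, then one more byte, ac (OP_CHECKSIG), which the field listing above leaves out.

```python
OP_PUSHDATA1 = 0x4c
OP_CHECKSIG = 0xac

def tokenize(script: bytes):
    """Split a script into ("push", data) and ("op", opcode) tokens."""
    tokens = []
    pos = 0
    while pos < len(script):
        op = script[pos]
        pos += 1
        if 1 <= op <= 75:              # direct push of op bytes
            tokens.append(("push", script[pos:pos + op]))
            pos += op
        elif op == OP_PUSHDATA1:       # next byte is the push length
            n = script[pos]
            pos += 1
            tokens.append(("push", script[pos:pos + n]))
            pos += n
        else:                          # any other single-byte opcode
            tokens.append(("op", op))
    return tokens

payload = bytes.fromhex(
    "4c554b452d4a522049532041205045444f5048494c4521204f682c20616e6420"
    "676f642069736e2774207265616c2c207375636b612e2053746f7020706f6c6c"
    "7574696e672074686520626c6f636b636861696e207769746820796f7572206e"
    "6f6e73656e73652e")
script = bytes([OP_PUSHDATA1, len(payload)]) + payload + bytes([OP_CHECKSIG])
tokens = tokenize(script)
```

One push followed by OP_CHECKSIG is the shape of a pay-to-pubkey output, which is consistent with the later replies in this thread.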

legendary
Activity: 1512
Merit: 1036
I'm doing a reindex to help you out, if someone wants to answer faster, here's the first transaction in question:

getrawtransaction 9740e7d646f5278603c04706a366716e5e87212c57395e0d24761c0ae784b2c6

010000000121eb234bbd61b7c3d31034762126a64ff046b074963bf359eaa0da0ab59203a0010000008b4830450220263325fcbd579f5a3d0c49aa96538d9562ee41dc690d50dcc5a0af4ba2b9efcf022100fd8d53c6be9b3f68c74eed559cca314e718df437b5c5c57668c5930e1414050201410452eca3b9b42d8fac888f4e6a962197a386a8e1c423e852dfbc58466a8021110ec5f1588cec8b4ebfc4be8a4d920812a39303727a90d53e82a70adcd3f3d15f09ffffffff01a0860100000000006b4c684c554b452d4a522049532041205045444f5048494c4521204f682c20616e6420676f642069736e2774207265616c2c207375636b612e2053746f7020706f6c6c7574696e672074686520626c6f636b636861696e207769746820796f7572206e6f6e73656e73652eac00000000


blockexplorer's interpretation:

{
  "hash":"9740e7d646f5278603c04706a366716e5e87212c57395e0d24761c0ae784b2c6",
  "ver":1,
  "vin_sz":1,
  "vout_sz":1,
  "lock_time":0,
  "size":306,
  "in":[
    {
      "prev_out":{
        "hash":"a00392b50adaa0ea59f33b9674b046f04fa62621763410d3c3b761bd4b23eb21",
        "n":1
      },
      "scriptSig":"30450220263325fcbd579f5a3d0c49aa96538d9562ee41dc690d50dcc5a0af4ba2b9efcf022100fd8d53c6be9b3f68c74eed559cca314e718df437b5c5c57668c5930e1414050201 0452eca3b9b42d8fac888f4e6a962197a386a8e1c423e852dfbc58466a8021110ec5f1588cec8b4ebfc4be8a4d920812a39303727a90d53e82a70adcd3f3d15f09"
    }
  ],
  "out":[
    {
      "value":"0.00100000",
      "scriptPubKey":"4c554b452d4a522049532041205045444f5048494c4521204f682c20616e6420676f642069736e2774207265616c2c207375636b612e2053746f7020706f6c6c7574696e672074686520626c6f636b636861696e207769746820796f7572206e6f6e73656e73652e OP_CHECKSIG"
    }
  ]
}


--
for reference, a normal 1 in 1 out 1 BTC transaction:
{
  "hash":"c4064f2bebbc049493207f88490fc75f52a2246952f10aed346ff5302d9ec6e8",
  "ver":1,
  "vin_sz":1,
  "vout_sz":1,
  "lock_time":0,
  "size":192,
  "in":[
    {
      "prev_out":{
        "hash":"cc2dff38e25bb23845cfc3f4c3c6d5e4d292d9bb1dd7b34d2dfad310f6ad5f95",
        "n":0
      },
      "scriptSig":"304502203b3de7314e11b0fbbc42a185a722497e209f8c83e6c67188a9d77d597a0ad3c20221008605c83437ffbfba47dd67cbedc9bcaac9b1f36009fabfcb9faea143ba7ddc5701 020a4b9b363a66d5a9999e8362ff07e065bddd65ed07e1631e44394dfb51908263"
    }
  ],
  "out":[
    {
      "value":"1.00000000",
      "scriptPubKey":"OP_DUP OP_HASH160 2bf5301ed1b86a86cd0aa8f4c6e8dd1beed60b44 OP_EQUALVERIFY OP_CHECKSIG"
    }
  ]
}
member
Activity: 82
Merit: 10
Yeah, it's probably something like that.  Hopefully somebody who worked on the blockchain.info code can explain how they handle that specific transaction.  If I take the previous 65 bytes prior to the OP_CHECKSIG as a public key, it's still not valid since the header and checksum are off.  I suppose I could throw away the header and the checksum, but rather than doing that and just guessing, I would be interested to learn what the blockchain info guys did specifically.  At any rate, it appears to be an 'unspendable' output, and I think that's a safe/correct way to interpret it for now.

As I'm scanning the blockchain there are quite a few of these unspendable outputs (outputs without any valid public key associated with them).

For just one example, check this transaction out:

http://blockexplorer.com/tx/6d5088c138e2fbf4ea7a8c2cb1b57a76c4b0a5fab5f4c188696aad807a5ba6d8

Note that the 'To address' is 'Unknown' and of type 'Strange'.  It contains the instructions to verify a signature but no actual signature!  There's a whole bunch like this.  Presumably this means that all of these outputs are 'lost forever'.  I wonder how much that is going to add up to in the end.

Thanks,

John
legendary
Activity: 960
Merit: 1028
Spurn wild goose chases. Seek that which endures.
Hmm. Looking at the transaction, the output isn't just that string; it's that string plus OP_CHECKSIG. So even though that ASCII string is almost surely not a public key, it's being given as though it was.

My guess is that, if you apply the usual public key hash function to that string, you'll get the address 17Xbx4rf27bTjbRdwUKHrzpmg2unVXo1DB.
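
The 'usual public key hash function' mentioned here is RIPEMD-160 of SHA-256 (hash160), followed by Base58Check encoding with version byte 00. A minimal Python sketch, with hypothetical function names. It assumes the local hashlib build still provides ripemd160, which some OpenSSL 3 installations disable, so only the encoding step is exercised below.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58check(payload: bytes) -> str:
    """Append a 4-byte double-SHA-256 checksum and encode in base58."""
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    data = payload + checksum
    num = int.from_bytes(data, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 58)
        out = ALPHABET[rem] + out
    # Each leading zero byte is written as a literal '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out

def hash160(data: bytes) -> bytes:
    """RIPEMD-160 of SHA-256; needs ripemd160 support in hashlib."""
    sha = hashlib.sha256(data).digest()
    return hashlib.new("ripemd160", sha).digest()

def pubkey_to_address(pubkey_bytes: bytes) -> str:
    """Treat arbitrary pushed bytes as a 'pubkey' and derive an address."""
    return base58check(b"\x00" + hash160(pubkey_bytes))

# Sanity check of the encoder on an all-zero hash160.
zero_address = base58check(b"\x00" + bytes(20))
```

Nothing in this derivation checks that the bytes are a valid curve point, so any data pushed before OP_CHECKSIG yields some address. Whether that reproduces the address shown on blockchain.info is the guess made in this post; the sketch does not confirm it.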
member
Activity: 82
Merit: 10
I'm working on some code to parse the bitcoin blockchain and I just ran into an issue with transaction #14 in block #141460.  I did a google search and found some comments about this particular transaction which has an output which just inserted some nonsense text into the script.  The specific text that it inserted is: "LUKE-JR IS A PEDOPHILE! Oh, and god isn't real, sucka. Stop polluting the blockchain with your nonsense."

The output just has this text and no destination public key that I can identify.

Here is my question.

In block-explorer it shows this output transaction as having an 'unknown' destination address and is marked as 'strange'. 

http://blockexplorer.com/tx/9740e7d646f5278603c04706a366716e5e87212c57395e0d24761c0ae784b2c6

This seems reasonable to me considering the script.

However, and this is the core of my question, on the BlockChain.info website it lists the output script with a valid destination address.

This makes no sense to me since the output doesn't have one.  What does BlockChain info do in this case?  How are they coming up with that 'destination' address?  Do they just default back to the miner's original coinbase output address?

https://blockchain.info/tx/9740e7d646f5278603c04706a366716e5e87212c57395e0d24761c0ae784b2c6

On blockchain.info it indicates that the output is going to this address: 17Xbx4rf27bTjbRdwUKHrzpmg2unVXo1DB

But I have no clue how they came up with that address since it's not contained in the output-script.

Thanks,

John