
Topic: How to create block query tool (python/c++)? (Read 1598 times)

legendary
Activity: 1428
Merit: 1093
Core Armory Developer
October 01, 2011, 01:31:46 PM
#12
FYI:  I have officially released my code under the AGPL v3.  My thread about it is here:  https://bitcointalksearch.org/topic/pybtcengine-btc-backend-in-python-with-cswig-46485

The example code shows you how to do everything you wanted (@BkkCoins), except for updating the blockchain file in real time.  However, it only takes about 20s to reload and rescan the blockchain, so you could set your code to call BlockDataManager.Reset() then BDM.readBlkFile_FromScratch() again, periodically.
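A minimal sketch of that periodic reload, assuming the manager exposes the Reset(), readBlkFile_FromScratch() and organizeChain() calls named in this thread. The names reload_periodically and FakeBDM are made up for illustration; FakeBDM only records calls so the loop can run without the real library. (Written for Python 3, unlike the Python 2 sample in this thread.)

```python
import time

def reload_periodically(bdm, blkfile, interval_sec, max_reloads, sleep=time.sleep):
    """Reset and rescan the block file `max_reloads` times, pausing in between."""
    for n in range(max_reloads):
        bdm.Reset()
        bdm.readBlkFile_FromScratch(blkfile)
        bdm.organizeChain()            # same follow-up step as the sample code
        if n + 1 < max_reloads:
            sleep(interval_sec)

class FakeBDM(object):
    """Hypothetical stand-in for the real BlockDataManager; records calls only."""
    def __init__(self):
        self.calls = []
    def Reset(self):
        self.calls.append('Reset')
    def readBlkFile_FromScratch(self, path):
        self.calls.append('read ' + path)
    def organizeChain(self):
        self.calls.append('organize')

fake = FakeBDM()
reload_periodically(fake, 'blk0001.dat', interval_sec=0, max_reloads=2)
```

With the real library you would pass the actual manager instance and an interval comfortably longer than the ~20s rescan.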



legendary
Activity: 1428
Merit: 1093
Core Armory Developer
September 27, 2011, 12:37:40 AM
#11
BkkCoins,

I was thinking about what you said you wanted the library for.  I realized that I didn't have a good way to find the sender of a given input:  the sender is not always identified in a TxIn, so you have to use the BlockDataManager to go find the TxOut.  Anyways, I squashed a bug and added some methods to make this a lot easier, definitely worth updating if you already checked it out.  (and if you prefer this over genjix's tool... I'm sure his sql database would be easy to access in python, too).

Here's some sample python code to get you started with my code if you plan to use it (this is scattered throughout testswig.py, but I extracted the important parts).

Code:
from sys import path
path.append('..')   # I'm running this from the cppForSwig directory
from pybtcengine import *
from datetime import datetime
from BlockUtils import *

# Create the BlockDataManager, scan the chain, organize the block headers
bdm = BlockDataManager_FullRAM.GetInstance()
bdm.readBlkFile_FromScratch('../blk0001.dat')  # point this to your blkfile!!!
bdm.organizeChain()

# At this point all blockheaders and txs are loaded into memory.
# Can access instantaneously by hash or by height
# Get block 100,014 because it's got 6 diverse Tx
someBlk = bdm.getHeaderByHeight(100014)

# A helper method to convert header times to a pretty format
def unixTimeToFormatStr(unixTime, formatStr='%Y-%b-%d %I:%M%p'):
   dtobj = datetime.fromtimestamp(unixTime)
   dtstr = dtobj.strftime(formatStr)
   return dtstr[:-2] + dtstr[-2:].lower()

# Another helper to convert raw, 20-byte address values to Base58
def hash160ToAddr(hash160):
   b = PyBtcAddress().createFromPublicKeyHash160(hash160)
   return b.getAddrStr()

# Going to print senders, receivers
print 'TxList for block #', someBlk.getBlockHeight()
topTxPtrList = someBlk.getTxRefPtrList()
print 'NumTx:', len(topTxPtrList)
for txptr in topTxPtrList:

   # We print big-endian because we like to be able to put it in BlockExplorer to verify
   print '\nTx:', binary_to_hex(txptr.getThisHash().toString(), BIGENDIAN)[:16],

   # Each Tx has a pointer to the header of the block it's included in
   blkHead = txptr.getHeaderPtr()
   print 'Blk:', blkHead.getBlockHeight(),
   print 'Timestamp:', unixTimeToFormatStr(blkHead.getTimestamp())

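   # Print the TxIns
   # (nIn was not defined in this excerpt; getNumTxIn() is the assumed accessor, check the .h files)
   nIn = txptr.getNumTxIn()
   for i in range(nIn):
      txin = txptr.getTxInRef(i)
      if txin.isCoinbase():
         print '\tSender:', ''.center(34),
         print 'Value: 50 [probably]'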
      else:
         print '\tSender:', hash160ToAddr(bdm.getSenderAddr20(txin).toString()),
         print 'Value:',  coin2str(bdm.getSentValue(txin))
        
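   # Print the TxOuts
   # (nOut was not defined in this excerpt; getNumTxOut() is the assumed accessor, check the .h files)
   nOut = txptr.getNumTxOut()
   for i in range(nOut):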
      txout = txptr.getTxOutRef(i)
      print '\tRecip: ', hash160ToAddr(txout.getRecipientAddr().toString()),
      print 'Value:', coin2str(txout.getValue())
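
For anyone curious what a helper like hash160ToAddr has to do under the hood, here is a rough standalone sketch of Base58Check encoding in Python 3 using only hashlib. It is not the pybtcengine implementation; the function name hash160_to_addr_sketch is made up for illustration.

```python
import hashlib

B58_ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

def hash160_to_addr_sketch(hash160, version=b'\x00'):
    """Base58Check-encode a 20-byte hash160 (version 0x00 = main-network address)."""
    payload = version + hash160
    # Checksum is the first 4 bytes of a double SHA-256 over version + hash160
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    raw = payload + checksum
    num = int.from_bytes(raw, 'big')
    encoded = ''
    while num > 0:
        num, rem = divmod(num, 58)
        encoded = B58_ALPHABET[rem] + encoded
    # Each leading zero byte is written as a literal '1'
    n_leading_zeros = len(raw) - len(raw.lstrip(b'\x00'))
    return '1' * n_leading_zeros + encoded

addr = hash160_to_addr_sketch(bytes.fromhex('62e907b15cbf27d5425399ebf6f0fb50ebb88f18'))
```

Note the input is the raw 20 bytes, not a hex string, which is the same binary-vs-hex distinction that trips people up with the library.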



You can probably construct your BTC webs pretty easily by combining calls from above.  If you have problems, make sure you're passing binary forms, not hex, and check whether you need to convert to or from a BinaryData object.  All python methods require binary python strings.  All C++/SWIG methods require BinaryData objects (create one with BinaryData(pyBinaryStr) and convert back via bindata.toString()).
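To make the binary-vs-hex point concrete, here is a small Python 3 sketch using only the standard binascii module. The helper names hex_to_binary and binary_to_hex_sketch are made up for this example and are not the library's own functions; the big_endian flag simply reverses the byte order, which is the usual reason a hash printed from raw data doesn't match what a block explorer shows.

```python
import binascii

def hex_to_binary(hex_str):
    """Turn a hex string such as '0a0b' into the raw bytes it describes."""
    return binascii.unhexlify(hex_str)

def binary_to_hex_sketch(bin_str, big_endian=False):
    """Hex-encode raw bytes, optionally reversing the byte order first."""
    if big_endian:
        bin_str = bin_str[::-1]
    return binascii.hexlify(bin_str).decode('ascii')

raw = hex_to_binary('0a0b0c')                          # 3 raw bytes, not 6 characters
as_stored = binary_to_hex_sketch(raw)                  # same order as the raw data
as_displayed = binary_to_hex_sketch(raw, big_endian=True)
```

A quick sanity check when something fails: a 20-byte address hash has length 20 in binary form and length 40 as hex.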

-Eto

P.S. -- I also cleaned up some code and put most methods in the .cpp files so you can scan the .h files easier to find the methods you're looking for.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
September 26, 2011, 02:52:19 PM
#9
Gavin, that's a very good idea.  I have never even heard of mmap(), but it sounds like a very useful tool.  In fact, I had some projects at work where this could've been very useful.  Why didn't you tell me about this sooner?!  :)  (and apparently it works for interprocess communication?  also useful!)

At the moment, I think I have plenty of time before this becomes a necessity, but I'll definitely keep it in mind as I work to expand it.  Though, in the long run I plan to convert/fork it into a headers-only-but-save-my-own-tx-data implementation.
legendary
Activity: 1652
Merit: 2301
Chief Scientist
September 26, 2011, 02:18:50 PM
#8
Very cool etotheipi.

Have you tried using mmap() to page the blockchain file into RAM instead of copying it explicitly? Operating systems are typically very well optimized for accessing mmap()'ed files.
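For reference, a minimal Python 3 sketch of the mmap() idea using the standard mmap module. The function name read_slice_mmap and the throwaway demo file are made up for illustration; in practice the path would be the blk0001.dat file.

```python
import mmap
import os
import tempfile

def read_slice_mmap(path, offset, length):
    """Return `length` bytes at `offset`, letting the OS page the file in on demand."""
    with open(path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]

# Demo on a small temporary file standing in for the block file
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(bytes(range(256)) * 8)
tmp.close()
chunk = read_slice_mmap(tmp.name, 256, 80)   # e.g. one 80-byte header's worth
os.remove(tmp.name)
```

Only the pages actually touched get read from disk, so nothing has to be copied into RAM up front.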
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
September 26, 2011, 01:36:41 PM
#7
Yes, I am very pleased with its performance.  It uses about 1.2 GB of RAM to hold everything, and its speed comes from copying the entire blockchain into RAM in a single copy operation (120MB/s for my HDD), then using only references and pointers to locations in that chunk of data for everything else--removing the need for extraneous copy operations.
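The same copy-once-then-reference idea can be illustrated in Python 3 with memoryview, whose slices point into the original buffer instead of copying it. This is only an analogy for the C++ approach described here, not the actual code; make_ref and the dummy data are made up for the example.

```python
def make_ref(view, offset, length):
    """Return a zero-copy reference to `length` bytes starting at `offset`."""
    return view[offset:offset + length]

chain = bytes(range(256)) * 4        # stands in for the bytes read from the blk file
view = memoryview(chain)             # one buffer, loaded once
header_ref = make_ref(view, 100, 80) # refers to chain's memory, no copy made
header_copy = bytes(header_ref)      # copy only when a standalone value is needed
```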

At the moment, this FullRAM implementation was the easiest way for me to do everything (and also the fastest), and it should be perfectly okay for at least a year (for now it's for my own use and I have 8GB of RAM).   So I implemented it with BlockObjRef objects that reference RAM, but they can later be updated to reference file locations instead.  Then when the blockchain is too big, I can do my initial scan in pieces, and leave the bulk of the data on the disk, retrieving it on demand.  The maps of headers and tx hashes/refs will fit into RAM no problem for a very long time.
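A rough Python 3 sketch of what "reference file locations instead of RAM" could look like: the index stays in memory, while the data is read from disk only when asked for. FileRef is a hypothetical class for illustration and is not the BlockObjRef from the C++ code; the keys and the temporary file are dummy data.

```python
import os
import tempfile

class FileRef(object):
    """Hypothetical reference to a byte range in a file (not the real BlockObjRef)."""
    def __init__(self, path, offset, length):
        self.path, self.offset, self.length = path, offset, length

    def load(self):
        # Fetch the bytes on demand; nothing is held in RAM between calls
        with open(self.path, 'rb') as f:
            f.seek(self.offset)
            return f.read(self.length)

# Demo: two fake "transactions" stored back to back in a throwaway file
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b'AAAA' + b'BBBBBB')
tmp.close()
index = {'tx_a': FileRef(tmp.name, 0, 4), 'tx_b': FileRef(tmp.name, 4, 6)}
loaded_b = index['tx_b'].load()
os.remove(tmp.name)
```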

Bear in mind there is some learning curve to the way I organized things.  But I don't know any full BTC implementation that would avoid this (BTC is complicated...).  Everything is held in maps of pointers/references -- you can get from any one piece of information to another, it just may take a couple hops through memory.  Luckily, SWIG handles the pointers/references very well.  I haven't played with it too much (just got it working yesterday), but so far I haven't found any issues with it.

I was hoping to open-source this project eventually, but hadn't decided on it yet until I read your post.  So it's probably short on documentation.  But I'll be working on that in the near future.  Don't be afraid to ask questions, and if there's some functionality it's missing, please let me know so that I can add it--whatever it is should probably be there anyway...


hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
September 26, 2011, 01:05:58 PM
#6
That sounds pretty amazing. I'll have to look at it. How does it handle the block chain being too big for RAM? Will it work with low memory now, or is that a future improvement? Using Abe it took me a few hours to read into Sqlite3 and it takes about 1.5 GB on disk. I haven't yet explored how I can use this Sqlite3 DB.

My end goal is to scan an address and then output an SVG diagram showing transactions leading in/out from the address to aid in seeing money flows.
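As a starting point for that kind of diagram, here is a minimal Python 3 sketch that writes SVG by hand with the standard library only. The function name flows_to_svg, the layout (senders left, recipients right) and the labels are all made up for illustration; real input would come from whatever scanner or database supplies the transactions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def flows_to_svg(center, inflows, outflows, width=640, row_h=40):
    """Draw senders on the left, recipients on the right, `center` in the middle.
    inflows/outflows are lists of (address_label, btc_value) pairs."""
    rows = max(len(inflows), len(outflows), 1)
    height = row_h * (rows + 1)
    cx, cy = width // 2, height // 2
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" width="%d" height="%d">'
             % (width, height)]

    def add_side(flows, x, anchor):
        for i, (label, value) in enumerate(flows):
            y = row_h * (i + 1)
            parts.append('<line x1="%d" y1="%d" x2="%d" y2="%d" stroke="black"/>'
                         % (x, y, cx, cy))
            parts.append('<text x="%d" y="%d" text-anchor="%s">%s (%s BTC)</text>'
                         % (x, y - 4, anchor, escape(label), value))

    add_side(inflows, 10, 'start')
    add_side(outflows, width - 10, 'end')
    parts.append('<text x="%d" y="%d" text-anchor="middle">%s</text>'
                 % (cx, cy - 4, escape(center)))
    parts.append('</svg>')
    return '\n'.join(parts)

# Placeholder labels, not real addresses
svg = flows_to_svg('addrX', [('addrA', 1.5), ('addrB', 0.25)], [('addrC', 1.75)])
root = ET.fromstring(svg)   # parses only if the output is well-formed XML
```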
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
September 26, 2011, 09:58:06 AM
#5
Funny you bring this up.  I just got my Python/C++/SWIG implementation of blockchain scanning in place, yesterday.   You can access it here.

It successfully reads the entire blockchain into RAM (while the blockchain is still small enough to fit there), populates maps of blockheaders/hashes and txs/hashes, and will even do a fresh scan of the entire wallet finding all the txouts/txins and calculating balances.  The python library has the ECDSA signing, verification and address calculations, but is not very good at scanning the blockchain which is why I did it in C++ and pulled it into python with SWIG.  I just finished some testing last night: I can do all the scanning, blockchain organization, indexing, and wallet balances/unspenttxouts from scratch in about 30s (which is pretty good considering the blockchain is currently 600MB+).
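To give a feel for the scanning step, here is a rough pure-Python 3 sketch that walks a block file already loaded into memory and builds a map from header hash to file offset. It is not the C++ code described here. It assumes the file is a sequence of records made of 4 magic bytes, a 4-byte little-endian block size, and the block itself starting with its 80-byte header; scan_headers is a made-up name. The demo record is the well-known genesis block header plus a dummy byte instead of real transaction data.

```python
import hashlib
import struct

MAGIC = b'\xf9\xbe\xb4\xd9'   # main-network magic bytes

def scan_headers(data):
    """Map each block's header hash (hex, as block explorers show it) to its offset."""
    index = {}
    pos = 0
    while pos + 8 <= len(data):
        if data[pos:pos + 4] != MAGIC:
            break                                  # padding or unknown data: stop
        (blk_size,) = struct.unpack_from('<I', data, pos + 4)
        header = data[pos + 8:pos + 88]
        if len(header) < 80:
            break                                  # truncated record
        digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
        index[digest[::-1].hex()] = pos + 8        # reverse bytes for display order
        pos += 8 + blk_size
    return index

# Genesis block header, field by field (all little-endian as stored on disk)
genesis_header = bytes.fromhex(
    '01000000' + '00' * 32 +
    '3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a' +
    '29ab5f49' + 'ffff001d' + '1dac2b7c')
fake_block = genesis_header + b'\x00'              # dummy body, framing demo only
data = MAGIC + struct.pack('<I', len(fake_block)) + fake_block
header_index = scan_headers(data)
```

A loop like this in pure Python is much slower than the C++ version, which is the reason given above for doing the scanning in C++ and exposing it through SWIG.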

EDIT: I misspoke when I said "wallet":  I'm not talking about the official BTC wallet files, I'm talking about creating a C++ BtcWallet object and filling it with 20-byte hashed-publickey addresses.

All the tools are there and mostly tested.  For now, there is a "unittest.py" file which should demonstrate most of the python code that is there, BlockUtilsTest.cpp to see how to use the C++ code, and testswig.py for examples on mixing the two.  The makefile has a "make" for making the C++ code, and "make swig" to run swig and compile the shared object.  This is all in Linux, but I have run and tested the C++ code (alone) in Windows XP with MSVS 2010.

This is what I was looking for 2 months ago and decided to do it myself.  Holy hell has it been educational!    

-Eto

P.S. - It's funny you bring this up, because my next goal was to write a PyQt application for exploring the blockchain.
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
September 25, 2011, 04:51:58 AM
#4
Ya, I think this will work. I don't even need to interface to the block chain with this.

I can run Abe as a backend and let it update the SQL database and then write my own Python tool that simply uses the SQL database.
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
September 25, 2011, 12:45:20 AM
#3
Thank you. Having a look now...
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
September 25, 2011, 12:27:22 AM
#1
Is there a library or module that can be used to build a CLI python or c++ tool?

I'd like to write something that does the same kind of stuff as blockexplorer but I would want to query the block chain locally not via http calls. I want to use the transaction info to generate an output document directly.

If there is an existing program that I can browse and take code from that would work too. I've been looking thru past forum posts but can't really find anything.

Ideas?