
Topic: [ANNOUNCE] Abe 0.7: Open Source Block Explorer Knockoff - page 49. (Read 220986 times)

legendary
Activity: 1176
Merit: 1280
May Bitcoin be touched by his Noodly Appendage
That's incredible
It took several days to read 124k blocks of the Bitcoin blockchain, but the new version already read 128k blocks in less than 6 hours
donator
Activity: 2772
Merit: 1019
Now I know I'm making the mistake of including blocks from orphaned chains.

Is there an easy way to exclude blocks from orphaned chains in a sql query?

Yes, well not too hard: select b.* from block b join chain_candidate cc on (b.block_id = cc.block_id) where cc.chain_id = ? and cc.in_longest = 1
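John's orphan-filtering join can be tried out against a toy subset of Abe's schema. This is just an illustrative sketch, not Abe's real tables: the column types are simplified and the sample rows are invented.

```python
import sqlite3

# Minimal demo of the orphan-filtering join above, using a toy
# subset of Abe's schema in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE block (block_id INTEGER PRIMARY KEY, block_height INTEGER);
CREATE TABLE chain_candidate (
    chain_id   INTEGER,
    block_id   INTEGER REFERENCES block (block_id),
    in_longest INTEGER
);
INSERT INTO block VALUES (1, 100), (2, 100), (3, 101);
-- block 2 is on an orphaned fork: in_longest = 0
INSERT INTO chain_candidate VALUES (1, 1, 1), (1, 2, 0), (1, 3, 1);
""")

rows = conn.execute("""
    SELECT b.*
      FROM block b
      JOIN chain_candidate cc ON (b.block_id = cc.block_id)
     WHERE cc.chain_id = ? AND cc.in_longest = 1
""", (1,)).fetchall()
print(rows)  # only blocks 1 and 3; the orphaned block is filtered out
```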


I see, of course. Thanks a lot for your help.

Hey, someone gave me a donation for a chart I made about coin age (pretty interesting: https://bitcointalksearch.org/topic/m.459678) using bitcoin-abe. Couldn't have done it without, so I'm passing part of it on to you.
hero member
Activity: 481
Merit: 529
I hope this is the right place to ask this and not considered offtopic.
I don't object if moderators don't.  I guess we could also use the Github issue system.  (The issue here is a lack of database documentation or user guide.)

Now I know I'm making the mistake of including blocks from orphaned chains.

Is there an easy way to exclude blocks from orphaned chains in a sql query?

Yes, well not too hard: select b.* from block b join chain_candidate cc on (b.block_id = cc.block_id) where cc.chain_id = ? and cc.in_longest = 1
donator
Activity: 2772
Merit: 1019
I hope this is the right place to ask this and not considered offtopic.

I'm having some fun writing queries for my (finally finished after half an additional day of importing, thanks again for the optimization, John) bitcoin-abe db (example: https://bitcointalksearch.org/topic/m.458441).

Now I know I'm making the mistake of including blocks from orphaned chains.

Is there an easy way to exclude blocks from orphaned chains in a sql query?
hero member
Activity: 481
Merit: 529
Versions 0.4.1 and 0.5 released today.  0.4.1 contains mostly minor fixes since 0.4.  0.5 contains dramatic speed improvements, new back-ends, and more.

New in 0.5 - 2011-08-16

* Big speed improvement for address history and transaction pages.

* Big load time improvement for SQLite: below 10 hours for the BTC chain.

* MySQL supported.

* Oracle supported, but slow due to lack of transparent bind variable use in cx_Oracle.

* BBE-compatible HTTP API functions: nethash totalbc addresstohash hashtoaddress hashpubkey checkaddress

* New HTTP API functions: translate_address decode_address

* Online list of API functions (/q).

* Native BeerTokens currency support.

* Many minor improvements; see the Git log.


New in 0.4.1 - 2011-08-16

* Security enhancement: refer to orphan blocks by hash, not height.

* Fixed bugs affecting new chains defined via the configuration.

* Warn, do not exit, if a block file is missing or unparsable.

* Abe parses the new merged-mining block field, CAuxPow.

* Decrement the value returned by getblockcount for compatibility.

* Bug fix: remove '-' from parenthesized amounts.

* Fixed previous/next block links on /chain/CHAIN/b/NUMBER pages.

* Accept "var += val" in configuration as equivalent to "var = val"
  where "var" has not been defined.

* Added --commit-bytes option to adjust the database commit interval.

* Minor robustness and cosmetic improvements.

Enjoy!
hero member
Activity: 481
Merit: 529
Still thinking about various design issues though.

One more request relevant to Abe.  Would you please design the library with alternative chains in mind?  This is a must-have for Abe, which has good support for non-BTC currencies and a correspondingly inclined user base.

Quote from: libbitcoin/bitcoin.sql
CREATE DOMAIN amount_type AS NUMERIC(16, 8) CHECK (VALUE < 21000000 AND VALUE >= 0);
21 million is BTC-specific.

Quote from: libbitcoin/bitcoin.sql
CREATE DOMAIN address_type AS VARCHAR(110);
I'm not sure what you have in mind for addresses, but bear in mind that different currencies use different "address type" bytes and thus different address spaces.  Abe doesn't store addresses at all, just the public key hash160.  It efficiently looks up an address by extracting the pubkey hash (base 58 decode) and using that as a key.  It even does initial substring searches for addresses by converting the substring into one or a few pubkey hash ranges.

Address owners can (and do) use the same key pair in different networks under different names (addresses).  I expect this to become more common, and a frequent query will be to find balances of a given pubkey hash in all known currencies.
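The base-58 decode step John describes can be sketched in a few lines of Python. This is my own illustrative helper, not Abe's code; the address is the well-known genesis-block address, and the sketch assumes a standard 25-byte payload (one version byte, 20 hash bytes, 4 checksum bytes).

```python
import hashlib

# Base-58 alphabet (no 0, O, I, or l).
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def address_to_hash160(addr):
    """Decode a base-58 address to its 20-byte pubkey hash (hash160).

    Sketch of the lookup step described above: decode the base-58
    string to an integer, check the 4-byte double-SHA256 checksum,
    then strip the version ("address type") byte.
    """
    n = 0
    for ch in addr:
        n = n * 58 + B58.index(ch)
    raw = n.to_bytes(25, "big")  # version + hash160 + checksum
    payload, checksum = raw[:-4], raw[-4:]
    if hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4] != checksum:
        raise ValueError("bad checksum")
    return payload[1:]

h = address_to_hash160("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa")
print(h.hex())
```

Because the version byte differs per currency, the same hash160 decodes from differently-spelled addresses on different chains, which is exactly why indexing by hash160 works across networks.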

I also anticipate chain splits where both sides remain active indefinitely, especially with the advent of merged mining.  I like your span_left/span_right system for tracking small side chains, since it gives me a quick way to tell if any given block is an ancestor to another.  But I'm not sure how to apply it to this case.  Would you consider moving the span columns to a new table indexed by block_id and chain_id, corresponding to Abe's chain_candidate?

Cool stuff.  Thanks.
legendary
Activity: 1232
Merit: 1076
Thanks.

I'm disconnecting from the forums for a few days to focus. I can be contacted using my email on the front of bitcoin.org
hero member
Activity: 481
Merit: 529
I'm not so sure that it's good to prune out the previous_output_hash, since when getting an entire block you want the entire info (which includes the hashes) - especially for displaying on a block explorer type website, hashing the transaction, performing OP_CHECKSIG, etc.
Good points.
legendary
Activity: 1232
Merit: 1076
I'll probably add cumulative difficulty into the blocks very soon.
Cool.

Still thinking about various design issues though.

I might suggest one thing from experience with Abe...

Quote from: libbitcoin/bitcoin.sql
CREATE TABLE inputs (
...
    previous_output_id INT,
    previous_output_hash hash_type NOT NULL,
    previous_output_index BIGINT NOT NULL,

A hash per input will be a large fraction of storage space.  Abe avoids this by putting unlinked inputs in a separate table and then deleting them when it finds the output:

Quote from: bitcoin-abe/DataStore.py
CREATE TABLE txin (
    txin_id       NUMERIC(26) PRIMARY KEY,
    tx_id         NUMERIC(26) NOT NULL,
    txin_pos      NUMERIC(10) NOT NULL,
    txout_id      NUMERIC(26),
...)...

CREATE TABLE unlinked_txin (
    txin_id       NUMERIC(26) PRIMARY KEY,
    txout_tx_hash BIT(256)    NOT NULL,
    txout_pos     NUMERIC(10) NOT NULL,
    FOREIGN KEY (txin_id) REFERENCES txin (txin_id)
)

After the unlinked_txin is deleted, you can get the previous output's transaction hash via txin.txout_id to txout.tx_id to tx.tx_hash.


I'm not so sure that it's good to prune out the previous_output_hash, since when getting an entire block you want the entire info (which includes the hashes) - especially for displaying on a block explorer type website, hashing the transaction, performing OP_CHECKSIG, etc.

The fact is that you'll be looking up the previous hash so often that it'd be a wasted optimisation and not gain too much either (storage is cheap today). Also impact on performance is minimal since we've got fetching blocks down to being blindingly FAST.

There could be merit in updating the previous_transaction_id field when you connect the blocks, but I can't see a use case that would often need to walk *backwards* through the blockchain (forwards, yes, but not backwards). You only really look backwards once, during the initial block-chain verification, which doesn't quite warrant the effort of adding this field.
donator
Activity: 2772
Merit: 1019
Import status: now at block 62,000 (after 1.5 days). Note that I have a pretty slow machine that only uses 30W :-), not exactly made for db loads.
But still, that's too long.  Block 62,000 is only about 5% of the way through, since the blocks are bigger now.
I've committed rewrites of two queries that MySQL did not fully optimize.  My (very informal) testing suggests a speedup of 10x.  Let me know if it helps.


yesss! at least 10 times! Thanks so much, now it seems feasible to get the data in ;) You're awesome.

I'll answer your questions from the earlier post as soon as my mysql-workbench (which I updated and which currently crashes consistently on startup) is back to an operational state.
hero member
Activity: 481
Merit: 529
I'll probably add cumulative difficulty into the blocks very soon.
Cool.

Still thinking about various design issues though.

I might suggest one thing from experience with Abe...

Quote from: libbitcoin/bitcoin.sql
CREATE TABLE inputs (
...
    previous_output_id INT,
    previous_output_hash hash_type NOT NULL,
    previous_output_index BIGINT NOT NULL,

A hash per input will be a large fraction of storage space.  Abe avoids this by putting unlinked inputs in a separate table and then deleting them when it finds the output:

Quote from: bitcoin-abe/DataStore.py
CREATE TABLE txin (
    txin_id       NUMERIC(26) PRIMARY KEY,
    tx_id         NUMERIC(26) NOT NULL,
    txin_pos      NUMERIC(10) NOT NULL,
    txout_id      NUMERIC(26),
...)...

CREATE TABLE unlinked_txin (
    txin_id       NUMERIC(26) PRIMARY KEY,
    txout_tx_hash BIT(256)    NOT NULL,
    txout_pos     NUMERIC(10) NOT NULL,
    FOREIGN KEY (txin_id) REFERENCES txin (txin_id)
)

After the unlinked_txin is deleted, you can get the previous output's transaction hash via txin.txout_id to txout.tx_id to tx.tx_hash.
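The link-on-arrival flow described above can be sketched against a toy version of that schema. This is an illustration of the technique, not Abe's actual DataStore code; column types and the SQL are simplified.

```python
import sqlite3

# Toy version of the txin / unlinked_txin design described above.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE tx    (tx_id INTEGER PRIMARY KEY, tx_hash TEXT);
CREATE TABLE txout (txout_id INTEGER PRIMARY KEY, tx_id INTEGER,
                    txout_pos INTEGER);
CREATE TABLE txin  (txin_id INTEGER PRIMARY KEY, tx_id INTEGER,
                    txin_pos INTEGER, txout_id INTEGER);
CREATE TABLE unlinked_txin (
    txin_id INTEGER PRIMARY KEY REFERENCES txin (txin_id),
    txout_tx_hash TEXT, txout_pos INTEGER);
""")

# An input arrives that spends output 0 of a not-yet-seen tx "abc":
# store the hash only in unlinked_txin, leave txin.txout_id NULL.
db.execute("INSERT INTO txin VALUES (1, 10, 0, NULL)")
db.execute("INSERT INTO unlinked_txin VALUES (1, 'abc', 0)")

# Later, tx "abc" and its output 0 get imported...
db.execute("INSERT INTO tx VALUES (7, 'abc')")
db.execute("INSERT INTO txout VALUES (99, 7, 0)")

# ...so resolve the pending input and drop the unlinked row.
db.execute("""
    UPDATE txin SET txout_id = (
        SELECT o.txout_id FROM unlinked_txin u
          JOIN tx t    ON (t.tx_hash = u.txout_tx_hash)
          JOIN txout o ON (o.tx_id = t.tx_id AND o.txout_pos = u.txout_pos)
         WHERE u.txin_id = txin.txin_id)
    WHERE txin_id IN (SELECT txin_id FROM unlinked_txin)
""")
db.execute("DELETE FROM unlinked_txin WHERE txin_id IN "
           "(SELECT txin_id FROM txin WHERE txout_id IS NOT NULL)")

linked = db.execute("SELECT txout_id FROM txin WHERE txin_id = 1").fetchone()
print(linked)  # (99,): the input now references the output row directly
```

After linking, the previous hash is recoverable via txin.txout_id → txout.tx_id → tx.tx_hash, as described, without storing a 32-byte hash per input.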
hero member
Activity: 481
Merit: 529
Import status: now at block 62,000 (after 1.5 days). Note that I have a pretty slow machine that only uses 30W :-), not exactly made for db loads.
But still, that's too long.  Block 62,000 is only about 5% of the way through, since the blocks are bigger now.
I've committed rewrites of two queries that MySQL did not fully optimize.  My (very informal) testing suggests a speedup of 10x.  Let me know if it helps.
legendary
Activity: 1232
Merit: 1076
I'll probably add cumulative difficulty into the blocks very soon.

Still thinking about various design issues though.
hero member
Activity: 481
Merit: 529
Are you on IRC? You should hit me up on #bitcoinconsultancy
Thanks, but I have too many time obligations (ages 5 and 2) to justify regular IRC use.

You can quickly query the total difficulty for a chain using a single SQL command:

SELECT SUM(to_difficulty(bits_head, bits_body)) FROM blocks WHERE span_left=X AND span_right=X;

Will give you the total difficulty in chain X (you have to define to_difficulty() as: bits_body * 2^[8*(bits_head - 3)])

Great, but this presumably reads every block row.  Are you familiar with the /q/nethash function?  Abe reads every Nth (default: 144th) block's block_chain_work and subtracts the previous from the current value as part of the netHashPerSecond computation.  So I do only one query that returns (at the moment) 980 rows for 979 resulting values.  (It reads two table rows for every row it returns due to the trick by which I implement "every Nth" in portable SQL.)

I suppose not all applications would benefit from this optimization, and I could bolt it onto your schema via trigger or patch if necessary.  (Joys of open source!)
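The netHashPerSecond idea John describes (difference two cumulative-work samples and divide by elapsed time) reduces to a one-liner. The sample numbers below are invented purely for illustration.

```python
# Sketch of the netHashPerSecond computation described above:
# given (timestamp, cumulative work) samples taken every Nth block,
# the hash rate over an interval is the work difference divided by
# the time difference.  Sample values are made up.
samples = [
    (1311100000, 10_000_000_000),  # (unix time, block_chain_work)
    (1311186400, 10_086_400_000),  # one sampling interval later
]

def net_hash_per_second(prev, cur):
    (t0, w0), (t1, w1) = prev, cur
    return (w1 - w0) / (t1 - t0)   # hashes of work per elapsed second

print(net_hash_per_second(samples[0], samples[1]))  # → 1000.0
```

This is why storing cumulative work per block matters: two row reads per interval instead of summing every block row in between.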
hero member
Activity: 481
Merit: 529
Import status: now at block 62,000 (after 1.5 days). Note that I have a pretty slow machine that only uses 30W :-), not exactly made for db loads.
But still, that's too long.  Block 62,000 is only about 5% of the way through, since the blocks are bigger now.

While waiting for the import, I reverse-engineered the schema using mysql-workbench.
Thanks!  Please update to the latest commit on the master branch and restart your import (it'll pick up where it left off) since I've indexed txin.txout_id.  Not sure if it will help here, though.  I won't be surprised if there's unneeded table scanning on import.  PostgreSQL took a few hours on my 4-year-old laptop last I tried, but Pg may implement foreign keys differently, and Abe currently relies on foreign keys for implicit indexing.

You could add txout->pubkey since MySQL supports null foreign keys.  Apparently some do not.  Anyway, txout.pubkey_id is indexed, I hope?
donator
Activity: 2772
Merit: 1019
Thanks for the report.  I'm considering a totally new initial import design with table indexes replaced by Python variables.

Import status: now at block 62,000 (after 1.5 days). Note that I have a pretty slow machine that only uses 30W :-), not exactly made for db loads.

While waiting for the import, I reverse-engineered the schema using mysql-workbench.
[schema diagram image]
Maybe this helps some people. I can also provide .pdf or .mwb files if someone wants them.

It seems some relations are "missing" (like txout, pubkey,...?). I didn't want to put them in the diagram, because I wanted it to exactly reflect the db schema.
legendary
Activity: 1232
Merit: 1076
Just wanted to point out my project here (libbitcoin): https://bitcointalksearch.org/topic/libbitcoin-30646

Ooh!  Lots of yumminess coming down the pike.  I look forward to using your library to help display transactions before they get into a block.

It'd be nice to harmonize our database schemata somewhat.  While I wrap my head around libbitcoin's span_left/span_right reorg strategy, I suggest storing total_difficulty in blocks (where it is immutable) rather than chains.  Abe relies on this (block_chain_work) to estimate network hash rate over a range of blocks by fetching just the start and end blocks.

Thanks!


Are you on IRC? You should hit me up on #bitcoinconsultancy

You can quickly query the total difficulty for a chain using a single SQL command:

SELECT SUM(to_difficulty(bits_head, bits_body)) FROM blocks WHERE span_left=X AND span_right=X;

Will give you the total difficulty in chain X (you have to define to_difficulty() as: bits_body * 2^[8*(bits_head - 3)])
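The to_difficulty() expression above translates directly into Python. Note that, strictly speaking, it expands the compact "bits" encoding into the 256-bit target rather than the conventional difficulty ratio; the genesis block's bits 0x1d00ffff (head 0x1d, body 0x00ffff) make a handy sanity check.

```python
# to_difficulty() exactly as defined in the post:
# bits_body * 2^(8 * (bits_head - 3)).
# This expands the compact target encoding into a full integer target.
def to_difficulty(bits_head, bits_body):
    return bits_body * 2 ** (8 * (bits_head - 3))

# Genesis bits 0x1d00ffff -> the familiar 0x00000000ffff00...00 target.
target = to_difficulty(0x1D, 0x00FFFF)
print(hex(target))
```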
hero member
Activity: 481
Merit: 529
Just wanted to point out my project here (libbitcoin): https://bitcointalksearch.org/topic/libbitcoin-30646

Ooh!  Lots of yumminess coming down the pike.  I look forward to using your library to help display transactions before they get into a block.

It'd be nice to harmonize our database schemata somewhat.  While I wrap my head around libbitcoin's span_left/span_right reorg strategy, I suggest storing total_difficulty in blocks (where it is immutable) rather than chains.  Abe relies on this (block_chain_work) to estimate network hash rate over a range of blocks by fetching just the start and end blocks.

Thanks!
legendary
Activity: 1232
Merit: 1076
Thought you might find this info useful:

Just wanted to point out my project here (libbitcoin): https://bitcointalksearch.org/topic/libbitcoin-30646

Making great progress and it's functional. There's a working application in examples/poller.cpp that simply downloads the bitcoin blocks + transactions + everything into postgresql so you can inspect it. Someone could easily throw a web gui around this and...

voilà! blockexplorer clone :)

https://gitorious.org/libbitcoin/libbitcoin/blobs/master/examples/poller.cpp
hero member
Activity: 481
Merit: 529
First of all, John, thanks a lot for Abe, it's awesome as far as I can tell.
Thanks!

I'm currently importing blocks into a mysql db (at block 16000 after about 1.5 hours), it'll take ages (as in: days), but that's ok for me.
Thanks for the report.  I'm considering a totally new initial import design with table indexes replaced by Python variables.

One problem I found was with the datadir table:
[...]
Maybe you should consider using an INT PK here and an index on dirname if you need it?
Good idea.  Done.  The upgrade is not so bad, since Abe has machinery for it.  This is actually the seventeenth schema change handled by --upgrade. :)

Other than that, keep going! I sent my small donation :)
Thanks, I hope it works for you!