
Topic: libbitcoin - page 13. (Read 92487 times)

legendary
Activity: 1232
Merit: 1076
August 14, 2011, 09:53:50 AM
#18
Quote
Great work! I just have a question about the span. I don't understand how it works: if you fork the block chain, won't all the existing span numbers have to be incremented?

I've been trying to implement an SQLite backend (so far just a dump to SQLite, which is mostly working), and it seems helpful to distinguish between dynamic data and data which is immutable, such as the successor relationship between blocks, transaction containment within a block, etc.

Whether a block is in the best chain and whether a txout has been spent in the best chain is dynamic, so I have to implement queries that more or less correspond to ConnectBlock and DisconnectBlock. I would love to eliminate them using your span mechanism or something similar, but I'm not understanding how it works.

I'm still working on ProcessBlock (verifying the blocks) and everything below it (ConnectBlock/DisconnectBlock). You can see an example of how the span works in src/storage/postgresql_storage.cpp:reorganize_block_chain()... That function will be moved internally, into verify_block under include/bitcoin/verify.hpp.

Indeed, you're correct that all the spans of the parents would have to be incremented if you fork the block chain. This is OK: it can be done in a single simple UPDATE, inserting a new block is an infrequent operation (it doesn't have to be as fast as SELECTs), and it's not a big deal if adding a new block isn't immediate.
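As a rough sketch of that update (my own and heavily hedged: the real logic lives in reorganize_block_chain(), and k and D below are placeholders, not actual SQL variables), using the blocks schema from the design post further down the page:

Code:
-- the new branch forks off a parent block at depth D whose span is [l, r];
-- the new branch becomes chain number k = r + 1

-- shift existing chains numbered k and above to the right, making room
UPDATE blocks SET span_left  = span_left  + 1 WHERE span_left  >= k;
UPDATE blocks SET span_right = span_right + 1 WHERE span_right >= k;

-- widen the parent and its ancestors so their spans also cover chain k
UPDATE blocks SET span_right = span_right + 1
    WHERE span_right = k - 1 AND depth <= D;

-- finally, insert the new block itself with depth D + 1 and span [k, k]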

Far more important is the select speed. If you run the poller and postgres benchmark:

Code:
$ ./bin/examples/poller dbname dbuser dbpass localhost:8333
... wait to download all the blocks
$ make psql
$ time ./bin/tests/psql
... run benchmark to fetch 100 blocks

With my current setup it takes 10 secs to fetch 100 blocks, or 0.1 secs per block. That means about 4 hours to fetch the entire block-chain of 140k blocks. Using a rule of thumb that verifying the block-chain will be 10 times as slow, that might mean 40 hours to verify all 140k blocks. ATM I plan to just bundle the first 120k pre-verified blocks in the code, then later see about optimising this.

But I am running it on a Rackspace VPS, so that could also be part of the reason for the slow speed.

It should be possible for you to easily modify the current storage engine I'm using to enable SQLite... The library supports multiple backends. See the constructor in src/storage/postgresql_storage.cpp:postgresql_storage() ... The sql_ object is being explicitly initialised with PostgreSQL parameters. I don't mind making this a generic sql_storage backend if you're able to get it to work :)

The current way of joining the blocks using reorganize_block_chain() will be removed and done as part of the block-chain verification - that's the best way I currently see to do it... But I am doing more research into the current bitcoin source code to make sure I understand everything fully. The current bitcoin is very inefficient/obtuse and there's a whole bunch of ways to do things quicker or more nicely.
member
Activity: 70
Merit: 18
August 14, 2011, 08:37:17 AM
#17
Great work! I just have a question about the span. I don't understand how it works: if you fork the block chain, won't all the existing span numbers have to be incremented?

I've been trying to implement an SQLite backend (so far just a dump to SQLite, which is mostly working), and it seems helpful to distinguish between dynamic data and data which is immutable, such as the successor relationship between blocks, transaction containment within a block, etc.

Whether a block is in the best chain and whether a txout has been spent in the best chain is dynamic, so I have to implement queries that more or less correspond to ConnectBlock and DisconnectBlock. I would love to eliminate them using your span mechanism or something similar, but I'm not understanding how it works.
legendary
Activity: 1896
Merit: 1353
August 14, 2011, 02:42:58 AM
#16
this is fantastic. thank you!!!
legendary
Activity: 1372
Merit: 1008
1davout
August 14, 2011, 01:49:30 AM
#15
Quote
Monitoring this awesome thread!
+1
legendary
Activity: 1176
Merit: 1280
May Bitcoin be touched by his Noodly Appendage
July 26, 2011, 07:33:51 AM
#14
Monitoring this awesome thread!

A pic has been deleted: http://forum.bitcoin.org/index.php?topic=30646.msg384841#msg384841
legendary
Activity: 1232
Merit: 1076
July 25, 2011, 06:16:39 PM
#13
ATM it's a very basic library. You can construct transactions and send them over the network. Also you can download blocks/txs and partially verify scripts.

What's missing mainly is the transaction abstraction engine.

TODO:
phase 2:
- OP_CHECKSIG
phase 3:
- EC keypair generating
- High level construction of transactions.
- session class (session->connect(); session->send_funds("1djkj3jk3kj3k", 1000000000); session->get_balance(); etc.)
phase 4:
- Lastly would be to make a GUI, but I'm hoping others will build them as this library becomes more mature.
- Fully complete scripting engine that isn't disabled.

As you can see, ATM it's a 0.1 alpha release.

Progress can be followed at https://bitcoinconsultancy.com/wiki/index.php/Libbitcoin#Roadmap
legendary
Activity: 1526
Merit: 1134
July 21, 2011, 09:12:30 AM
#12
Could you summarize what this library is capable of and what remains to be done? I took a quick look but it seems to be mostly outlines so far.
legendary
Activity: 1288
Merit: 1080
July 21, 2011, 08:13:41 AM
#11
(registering to this thread)
legendary
Activity: 1232
Merit: 1076
July 21, 2011, 04:56:19 AM
#10
Maybe there is some merit to storing the script as:

"dup hash160 [1a 4b ... data ... ff 65] equalverify checksig"

and parsing it in and out.

The primary reason I stored them the way I do is because that's how they're represented internally:

https://gitorious.org/libbitcoin/libbitcoin/blobs/master/include/bitcoin/script.hpp
legendary
Activity: 1500
Merit: 1022
I advocate the Zeitgeist Movement & Venus Project.
July 21, 2011, 01:59:04 AM
#9
Good effort, thank you so much for doing this work.
hero member
Activity: 812
Merit: 1022
No Maps for These Territories
July 21, 2011, 01:29:45 AM
#8
Quote
For a node, maybe. What about for data processing? Say you just want to quickly find all scripts with the same destination address.
Yes, there are cases in which you want to search within a script, but they are not very common. Usually you want to optimize for the most common case first.

Quote
The performance difference will be negligible for the amount of flexibility gained. Returning data like this is what RDBMSs are optimized for.
Maybe; I wouldn't be so quick to call it 'negligible', as I do know that for the current client, database accesses are by far the biggest source of performance issues.

Anyway, don't do too much premature optimisation; it might not be useful to discuss in this phase yet.
newbie
Activity: 67
Merit: 0
July 21, 2011, 12:48:13 AM
#7
Quote
I wonder here: what is the advantage of decomposing the scripts into separate database rows, instead of storing them as a binary blob?

I mean, the most common operation is "fetch the entire script and execute the opcodes". By decomposing the data structure this deeply, it becomes a more expensive query.

For a node, maybe. What about for data processing? Say you just want to quickly find all scripts with the same destination address. The performance difference will be negligible for the amount of flexibility gained. Returning data like this is what RDBMSs are optimized for.
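For instance, with the operations schema from the design post below, an address search might look something like this (my sketch; the hex value stands in for a hash160 and the exact encoding of the data column is whatever the backend stores):

Code:
-- find every script that pushes a particular 20-byte hash,
-- i.e. every output script paying the same address
SELECT DISTINCT script_id FROM operations
    WHERE code = 'special'
      AND data = '89 ab cd ef ab ba ab ba ab ba ab ba ab ba ab ba ab ba ab ba';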
hero member
Activity: 812
Merit: 1022
No Maps for These Territories
July 21, 2011, 12:36:43 AM
#6
I think it is a nice design overall, but

Quote
A script consists of a series of opcodes and optional data.

Code:
script_id_type = SEQUENCE

operations:
    operation_id
    script_id
    code
    data

A typical script might look like this:

Code:
operation_id  script_id  code         data
132           45         dup          NULL
133           45         hash160      NULL
134           45         special      89 ab cd ef ab ba ab ba ab ba ab ba ab ba ab ba ab ba ab ba
135           45         equalverify  NULL
136           45         checksig     NULL

To fetch the script:

Code:
   SELECT * FROM operations WHERE script_id=45 ORDER BY operation_id ASC;
I wonder here: what is the advantage of decomposing the scripts into separate database rows, instead of storing them as a binary blob?

I mean, the most common operation is "fetch the entire script and execute the opcodes". By decomposing the data structure this deeply, it becomes a more expensive query.

legendary
Activity: 1232
Merit: 1076
July 20, 2011, 09:52:01 PM
#5
Quote
Nice work! You'll also have a C++ API, right?

Yeah, the C++ API is the default, but I want to provide something for C users too later. And also Ruby, Perl, Python, Java, etc. through SWIG bindings.
legendary
Activity: 1232
Merit: 1076
July 20, 2011, 09:51:31 PM
#4
A document describing the rationale behind the database structure.



Axioms

Starting with these basic rules, we design our database.

    * Blocks are mostly sequential.
    * Blocks contain a list of transactions.
    * Transactions contain a list of inputs + outputs.
    * Blocks can be free floating (orphan blocks).
    * Transactions can be free floating (not yet in a block).
    * There is one sequence of events that we accept as the truth at any one time.

From big to small:

    * Blocks
    * Transactions
    * Inputs
    * Outputs
    * Scripts
    * Operations

Block-chain

The block-chain is a series of blocks chained together in a linear sequence. It is an append-only database of all transactions in the network.

What's accepted as true is the chain with the highest difficulty. Difficulty is computed using a function of the 'bits' field of the block. Given an entire chain, we can calculate the difficulty using:

Code:
   SELECT SUM(COMPUTE_DIFFICULTY(bits)) FROM our_chain;
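COMPUTE_DIFFICULTY stands for a user-defined function here. As a hedged sketch of what it might contain (my own, in PostgreSQL; not the library's actual code): 'bits' is a compact target encoding, with the high byte as exponent and the low three bytes as mantissa, and difficulty is the ratio of the difficulty-1 target to the block's target.

Code:
CREATE FUNCTION compute_difficulty(bits bigint) RETURNS numeric AS $$
    -- target = mantissa * 256^(exponent - 3)
    -- the difficulty-1 target has mantissa 65535 (0x00ffff), exponent 29 (0x1d)
    SELECT (65535 * 256::numeric ^ 26)
         / ((bits & 16777215) * 256::numeric ^ ((bits >> 24) - 3));
$$ LANGUAGE SQL;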

Occasionally this chain of events may conflict, when the chain splits in two because two rival blocks are both trying to append to the current chain. The chain with the highest cumulative difficulty is always preferred over the others.

It can happen that more blocks build on a chain that previously had lower difficulty, so that it takes over as the main chain. This is termed a block-chain re-organisation. But as more blocks build off the main chain, that becomes exponentially less likely, eventually becoming a near impossibility. Branches off the main block-chain can be pruned once they fall far enough behind that it is very unlikely they will ever become the main branch again.

We can represent our blocks using this schema:

Code:
blocks:
    block_id
    block_hash
    depth       -- block number in the chain
    span_left
    span_right
    ... other fields



Code:
block_id  depth  span_left  span_right
0         0      0          3
1         1      0          3
2         2      0          2
3         2      3          3
4         3      3          3
5         3      0          2
6         4      3          3
7         4      0          0
8         5      0          0
9         4      1          2
10        5      1          2
11        6      1          2
12        6      0          0
13        7      0          0
14        8      0          0
15        7      1          1
16        7      2          2

Each row in the table above represents a block. The block_id is a unique identifier assigned to each new block inserted into the database.

Depth is the block's height in the chain, and is synonymous with its block number.

Span left and right indicates the range of chains that a block exists in. A block with a span_left of 0, and a span_right of 3, would exist in chains 0, 1, 2 and 3. Using a span, we can easily select a particular chain:

Code:
   -- every block whose span includes chain 0
   SELECT * FROM blocks WHERE span_left<=0 AND span_right>=0;

Or select the common point between a series of chains:

Code:
   -- deepest block common to chains 0 through 2 (their last shared ancestor)
   SELECT MAX(depth) FROM blocks WHERE span_left<=0 AND span_right>=2;

Once span_right exceeds a certain value (say 10), and the depth since the chain forked exceeds a certain value, that chain can be pruned:

Code:
   DELETE FROM blocks WHERE span_left=10 AND span_right=10;
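Presumably (my assumption; the post doesn't spell this out) the remaining chain numbers are then shifted down so the span range stays contiguous:

Code:
-- close the gap left by the pruned chain (here chain 10)
UPDATE blocks SET span_left  = span_left  - 1 WHERE span_left  > 10;
UPDATE blocks SET span_right = span_right - 1 WHERE span_right > 10;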

The total span range of all blocks should equal the number of stored chains. If we have a span of [0, 10] then there are 11 chains, numbered 0 to 10. A table exists to keep track of all the chains.

Code:
blockchains:
    chain_id
    max_depth
    last_block_id
    total_difficulty
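Two hedged example queries against this table (my sketches, with the column meanings assumed from their names):

Code:
-- the accepted chain is the one with the highest cumulative difficulty
SELECT chain_id FROM blockchains ORDER BY total_difficulty DESC LIMIT 1;

-- the invariant above: these two numbers should agree
SELECT MAX(span_right) + 1 FROM blocks;
SELECT COUNT(*) FROM blockchains;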

Transaction

Nothing unusual happens here, except that transactions can be included more than once, in multiple chains. A separate table exists to track the parents of a transaction.

Code:
transaction_parents:
    transaction_id
    block_id
    index_in_block

transactions:
    transaction_id
    transaction_hash
    ... other fields

outputs:
    parent_id       -- transaction parent
    index_in_parent
    output_type     -- normal, generated, other
    address
    ... other fields

inputs:
    parent_id
    index_in_parent
    previous_output_id
    ... other fields
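As an example of how the spans combine with transaction_parents (my sketch; the transaction_id value is a placeholder), here is a check for whether a transaction is included in a given chain, in this case chain 0:

Code:
SELECT 1 FROM transaction_parents AS tp
    JOIN blocks AS b ON b.block_id = tp.block_id
    WHERE tp.transaction_id = 99
      AND b.span_left <= 0 AND b.span_right >= 0;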

Script

A script consists of a series of opcodes and optional data.

Code:
script_id_type = SEQUENCE

operations:
    operation_id
    script_id
    code
    data

A typical script might look like this:

Code:
operation_id  script_id  code         data
132           45         dup          NULL
133           45         hash160      NULL
134           45         special      89 ab cd ef ab ba ab ba ab ba ab ba ab ba ab ba ab ba ab ba
135           45         equalverify  NULL
136           45         checksig     NULL

To fetch the script:

Code:
   SELECT * FROM operations WHERE script_id=45 ORDER BY operation_id ASC;

Broadcasted inventories

These will be stored in memory for now, as there are potential attack vectors in serialising inv items: an attacker could potentially cause many disk reads/writes by spamming a host. More thought needs to go into this later.

Main block-chain

A convenience table is provided for the end user to read the block-chain transaction history. It contains the currently accepted version of the truth, and gives the user a quick way to query transactions and see the funds held by addresses.



The inputs and outputs are joined in a view. Any output which is final and has not been spent will have no matching input in the join.
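A hedged sketch of such a view (my own: the output_id key on outputs is assumed, implied by inputs.previous_output_id, and the value column is hypothetical). The final, unspent outputs are exactly the rows where the joined input side is NULL:

Code:
CREATE VIEW unspent_outputs AS
    SELECT o.output_id, o.address, o.value
    FROM outputs AS o
    LEFT JOIN inputs AS i ON i.previous_output_id = o.output_id
    WHERE i.previous_output_id IS NULL;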
hero member
Activity: 588
Merit: 500
July 20, 2011, 09:46:19 PM
#3
Nice work! You'll also have a C++ API, right?
legendary
Activity: 1232
Merit: 1076
July 20, 2011, 09:43:55 PM
#2


This is the inner core which consists of a bunch of subsystems co-operating over clearly defined interfaces.



As an example, one such message could command the block_chain_manager to create a transaction. It then sends the new transaction to the network component to actualise it.



The inner core is very difficult to use, so we abstract its usage into a more usable outer core. Think of the inner core like X-windows and the outer core as an easy-to-use GUI toolkit like Qt (by analogy).



We can switch different components in and out of the core. In this example we choose to replace a MySQL serialiser/de-serialiser with an SQLite one. Maybe there could also be in-memory serialisers (which never store to disk) or whatever.

A serialiser takes in a data object and then stores it. The type of component decides where and how it looks when stored. When reading the object back, it de-serialises in reverse.

The whole point is not to just dump binary blobs into MySQL rows, but to put them there in a readable format so an admin can dissect them in case of problems. We will also only create a MySQL serialiser for the time being, unless it's possible to use standard SQL and simultaneously support MySQL, PostgreSQL and SQLite.



The diagram above for the outer core indicated an exposed C API with auto-generated SWIG bindings.

We try to keep the core very simple and focused; no mining, no accounts, no GUI, no RPC server. These things can all be implemented in Python. genjix already has Spesmilo, a Qt GUI that uses bitcoind through a special interface which can be re-implemented for any Bitcoin implementation, so we can easily use it. Accounts can be provided as a Python module or a special plugin in C or C++.



As another example, because the networking component can be swapped out, we can replace it with a dummy networking component used for debugging.



Or use one networking component for multiple instances of Bitcoin or other currencies.

We try to split everything up as much as possible to help developers long term. It makes it easier for new contributors to get started. That's the free-software way: lots of components and clear separation to allow maintainers to work on their piece of turf.

Callbacks

A good plugin+component system is a must for this project. The core will probably have to be thread-safe if we use threads for networking. Otherwise an event system could become a bottleneck, since we'd get the worst of both worlds.

If we make a block-chain verifier subsystem, is there an objection to making it one-shot (verifying the block chain in one go, once or every X blocks), or does it need to run after every block is downloaded?

In any case, it would be nicer to be able to hook the verifier system to a generic callback as opposed to calling it manually in the code.

boost::signals2 seems like a good choice to consider, although it remains to be seen whether there is any bottleneck. Also, any callback system would have to be exposed in Python somehow.

Random notes:

    * Our main target is large scale deployments. Bitcoin Enterprise (utilising our libbitcoin). The Red Hat of Bitcoin, not the Ubuntu.
    * Use boost::property_map?
    * boost::asio is a good choice for networking.
    * Externally we expose a C API for people who cannot program C++, while using C++ internally. A C ABI is also nice since it doesn't get mangled by the compiler.
    * Status during initialisation, for debugging/showing progress in a GUI client.

legendary
Activity: 1232
Merit: 1076
July 20, 2011, 09:33:00 PM
#1

libbitcoin: Product of Calafou. Endorsed by Enric Duran.

Website, http://libbitcoin.dyne.org
IRC, Freenode, #darkwallet
Forum, https://forum.unsystem.net

libbitcoin is a community of developers building the open source library, tools and implementation necessary for a free, independent and vibrant Bitcoin. In this way we are helping to build a better future.

libbitcoin believes in the revolutionary promise of Satoshi's original protocol.

The libbitcoin development project aims to create an extendable, scalable and configurable architecture, along with useful software, making Bitcoin super-pluggable, highly configurable and easy to interact with.

Our Values:

  • Privacy: Bitcoin should always remain as private as possible for its users.
  • Scalability: Bitcoin built today with the future in mind.
  • Integrity: No individual or group should have enough power over the network to compromise its original aims.

Here is the git repo,
https://github.com/spesmilo/libbitcoin

By providing a bitcoin library, we enable everybody to build apps really quickly using a common, stable codebase. This project aims to make that architecture extendable, scalable and configurable for the future.

    rewrite bitcoin, make it super-pluggable, very easy to do and hack everything at every level, and very configurable

Documentation: http://libbitcoin.dyne.org/doc/index.html

Slides: http://libbitcoin.dyne.org/libbitcoin.pdf

The Zen of libbitcoin
Readability over speed.
Beauty over convenience.
Simplicity over complexity.
Architected, not hacked.
Flat, not nested.
Explicit, not implicit.
Errors should be loud.
Never is better than right now.
Now is better than never.
Be flexible and configurable.
Build houses from bricks, software from modules.

libbitcoin is licensed as Lesser AGPL:
- When used in the p2p network, you only need to provide changes on demand (LGPL).
- When used on a webserver, you must proactively provide source code for any changes you have made to libbitcoin.
- Applications can link against libbitcoin, and they do not need to release their changes publicly.
Thanks to Aaron Williamson of the SFLC and Richard Stallman of the FSF for helping draft up this license.

Mailing list: https://mailinglists.dyne.org/cgi-bin/mailman/listinfo/libbitcoin

Why the GPL?

http://arstechnica.com/gadgets/2013/10/googles-iron-grip-on-android-controlling-open-source-by-any-means-necessary/

http://hintjens.com/blog:68