
Topic: New blockchain management in Armory (Read 2604 times)

legendary
Activity: 2128
Merit: 1074
February 20, 2013, 04:25:32 PM
#12
Given the vast difference between these two, their abstraction layers are also very different, so just by choosing one you have already locked yourself into a technology.
Actually, the above is a common misconception. It is only true in very generic cases. In cases like a Bitcoin client, we would essentially have a single application using a single database schema.

With such a combination it is possible to write a trivial custom database driver. It could use regular expressions to match the handful of necessary SQL statements, like "SELECT * FROM values WHERE key=?", and return a "not implemented" error in all other cases.
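A minimal Python sketch of such a driver, assuming a plain dict as the k,v backend. The class name, the exception and the exact INSERT statement are hypothetical; a real driver would more likely plug into a standard interface such as Python's DB-API rather than expose its own `execute`:

```python
import re


class NotImplementedSQL(Exception):
    """Raised for any statement the driver does not recognize."""


class TinyKVDriver:
    # Hypothetical driver: it only recognizes the few statements the
    # application actually issues; a dict stands in for the k,v engine.
    _SELECT = re.compile(r"^\s*SELECT \* FROM values WHERE key=\?\s*$", re.IGNORECASE)
    _INSERT = re.compile(
        r"^\s*INSERT INTO values \(key, value\) VALUES \(\?, \?\)\s*$", re.IGNORECASE
    )

    def __init__(self):
        self._store = {}

    def execute(self, sql, params=()):
        if self._SELECT.match(sql):
            (key,) = params
            return [(key, self._store[key])] if key in self._store else []
        if self._INSERT.match(sql):
            key, value = params
            self._store[key] = value
            return []
        # Everything else is refused instead of being half-supported.
        raise NotImplementedSQL("not implemented: " + sql)
```

The application keeps speaking SQL, so swapping in a full relational engine later would not require touching its queries.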

I don't have a firm opinion on whether this is a suitable approach for Armory. But I'm positive that it could be a great win for single-user-in-a-single-process database engines like LevelDB. It may be the simplest way to allow the database to be shared between multiple processes and multiple users, as well as to truly decouple the data storage from the user interface.
hero member
Activity: 560
Merit: 500
I am the one who knocks
February 20, 2013, 09:45:44 AM
#11
My 2 cents:

I work as an LOB (line of business) application developer in the private sector, and I am drawing on my experience from there.

There are 2 basic DB types that have been brought up here: relational and K,V (or NoSQL).

Given the vast difference between these two, their abstraction layers are also very different, so just by choosing one you have already locked yourself into a technology.

I would design an API that is purpose-built for Armory, a custom abstraction layer if you will.  That way, if in the future you decide that you want to swap LevelDB out for membase, or mongo, or even a full-blown Oracle instance, you only need to change your underlying API calls.
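A rough sketch of what such a purpose-built layer could look like in Python. The `BlockStore` interface and its three methods are hypothetical, chosen only to show that application code can depend on the interface while the engine behind it (an in-memory dict here) is swapped:

```python
from abc import ABC, abstractmethod
from typing import Iterator, Optional, Tuple


class BlockStore(ABC):
    """Hypothetical Armory-specific storage interface; callers never see the engine."""

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def iterate(self, prefix: bytes = b"") -> Iterator[Tuple[bytes, bytes]]: ...


class InMemoryStore(BlockStore):
    """Stand-in backend; a LevelDB, membase or SQL backend would implement the same methods."""

    def __init__(self) -> None:
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def iterate(self, prefix=b""):
        # Yield in key-sort order, mirroring what an ordered k,v engine provides.
        for key in sorted(self._data):
            if key.startswith(prefix):
                yield key, self._data[key]


def count_entries(store: BlockStore, prefix: bytes) -> int:
    # Application code depends only on the BlockStore interface.
    return sum(1 for _ in store.iterate(prefix))
```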

Given what I have seen of your code so far I would bet you were planning on doing that already.

I hear 2112's pain, as in our line of work you get many executives who read or hear buzzwords and then demand (for no good reason) that 'we are going nosql now!'

However, I do not agree that K,V stores are necessarily a step backward; they do have their purpose and, as you pointed out, etotheipi, they are damned fast.  (See http://readwrite.com/2010/08/18/membase-the-database-powering)

I think LevelDB is the correct choice at this time.  I don't think that most Armory users would benefit from the full features of an RDBMS, while at the same time any kinks that come up with LevelDB will most likely be addressed by the core team first (or at least they will have some insight).

As an aside:
However, I personally wish that as a community we could use SQLite for the underlying wallet format.  The reason is that it would support a standard schema as well as purpose-built client additions without them breaking each other.  Also, you wouldn't be 'locked out': you can read/write SQLite files on any OS.  However, I don't think that is going to happen.
hero member
Activity: 547
Merit: 500
Decor in numeris
February 20, 2013, 05:31:05 AM
#10

Apparently, Armory (and the Satoshi client) will face a tremendous challenge very soon.  In a very worrying post in these forums (can't find it now, sorry), the poster plotted the blockchain size versus time on a log scale.  It was a nice, straight line: exponential growth.  And much faster than Moore's law: the size grows by a factor of 10 every year.

This is really bad news.  Today, the block chain is 5GB.  That is big, but manageable.  Next year it will be 50 GB, and take a significant fraction of the SSD space on my main laptop.  By spring 2015 it will be 500 GB.  Armory will need a really good database structure, and preferably be able to store not the whole block chain, but only unspent transactions.  Of course, against exponential growth we all lose, and soon no one will be able to run a full node.  Assuming a solution is found to that, the amount of data that needs to be stored by Armory is still going to be formidable.  People worry about whether future versions of Armory will exist and be able to restore their paper wallets.  They should not; instead they should worry about whether they will be able to find a computer able to run Armory. :)

Of course there is a limit on the size of a block, which would switch the growth to linear.  But there is going to be tremendous pressure to lift that limit, as otherwise it will be the transaction fees that grow exponentially, killing bitcoin.
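To make the two regimes concrete: the projection above assumes a constant tenfold yearly growth factor starting from roughly 5 GB in early 2013, while a fixed block size cap bounds growth linearly. The cap figures assume the 1 MB block size limit and the 10-minute block target of the time; this restates the assumptions, not a measured trend:

```latex
\begin{aligned}
S_{\text{exp}}(t) &= S_0 \cdot 10^{t}, \quad S_0 \approx 5\ \text{GB}
  \;\Rightarrow\; S_{\text{exp}}(1) \approx 50\ \text{GB},\; S_{\text{exp}}(2) \approx 500\ \text{GB} \\
S_{\text{lin}}(t) &\le S_0 + r_{\max}\, t, \quad
  r_{\max} = 1\ \text{MB} \times 144\ \tfrac{\text{blocks}}{\text{day}} \times 365\ \tfrac{\text{days}}{\text{yr}}
  \approx 52.6\ \tfrac{\text{GB}}{\text{yr}}
\end{aligned}
```

Here $t$ is measured in years from early 2013.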

While writing I also want to throw in another comment:

Etotheipi, you should seriously consider 2112's suggestion to use an abstraction layer.  The Satoshi client has long been suffering from a badly performing database engine, and obviously it has not been easy to replace.  I suspect that the choice of engine was both rational and sensible when it was made, but was overtaken by later developments.  An abstraction layer might have made it easier to replace.  Of course you worry about the performance penalty of such a layer.  However, most database applications use huge amounts of data in a performance-critical way, so I would expect most popular abstraction layers to be quite efficient, otherwise they would have died out.  Unfortunately, I cannot be any more specific or helpful, as my knowledge of database technology is minimal: I have heard that MongoDB should be worth looking into. :)
legendary
Activity: 2128
Merit: 1074
February 17, 2013, 05:29:16 PM
#9
2112,

You're still missing the point.  As I've said in other threads -- I get a sense that you know what you're talking about, but you provide nothing actionable -- only criticism.  Yet, threads like this would benefit from actual experience and advice you might have instead of just criticism.  I'd like to have a rational conversation with you, but I can't because the only tool in your toolbox is for insulting people's intelligence.  

Here's how this thread could've gone:

etotheipi:  Hey, how's this database structure look?
2112:  Have you considered using a relational DB engine instead of LevelDB?  It looks like that's what you're trying to recreate and LevelDB isn't production quality.
etotheipi:  Possibly.  LevelDB has properties A,B,C,D and E all of which fit perfectly into my app.  My use case is simple enough that I don't mind the lack of relation-handling.
2112:  You really shouldn't use LevelDB because it's a toy.  Take a look at SomeDB1, it's got properties A,C,D, and E, and also adds F and G
etotheipi:  But I really want property B for some reason, and G is not that important to me
2112:  Try SomeDB2, it has properties A,B,C,E,F.  You could change your structure to storing blocks like <> and headers and merkle roots like <> and you'd have everything you want.
etotheipi:  I've never heard of SomeDB2, I'll look into it.  Thanks!

See how pleasant that is?  No one has to be insulted, and your experience/expertise is pleasantly communicated in a way that I don't have to lose face to even begin to agree with any of your points.  It's not that avoiding "losing face" is my number one priority, it's that your writing style (which I still can't tell if it's intentional or not) immediately turns the whole thread hostile.  You seem to want to insult more than help.  You could have a constructive relationship with the users on this forum that you are trying to help.
But I did provide an actionable advice:
I know that you have a copy of Visual Studio and you theoretically could prototype your design using the dirty, grimy, foul-smelling, ...ugh... relational ...ouch... SQL database engine really quick.
Of the "alternative clients" category, I see grau (and his supernode) as the only developer with broad familiarity with database technology. Maybe take a look at his code if you don't want to prototype with Visual Studio? Database abstraction layers aren't a recent invention. They are well known, many of them are available, and there are plenty of places to look for inspiration and discussion of pros and cons.
In other words: you cannot get married to any database engine this early in the project. Any advice to the contrary would be a misdirection. You need to find either:

1) a database abstraction layer for Python that suits your style and needs

2) define your own abstraction layer in Python after prototyping using 1 or 2 (maybe 3) schemas/databases.

Either solution will work. I'm actually not familiar enough with your needs to give you a definite answer. For Java I used JDBC. On Windows and for prototyping I used ODBC & ADO. I like using Windows for prototyping because the intense competition in that market facilitates making rational choices even if the real target isn't Windows. Most of the time I used solution (2) because of the existence of domain-specific standards for database interfaces, and I did my implementations in a mixture of C/C++/domain-specific languages, not Python.

Consider this hypothetical discussion:

etotheipi:  Hey, I've chosen Intel GMA for the Armory display engine. Any comments?
2112: Dude, prototype first, then make a choice.
etotheipi: Die in a fire! AMD, NVidia, Intel or GTFO?
2112: No really, there are abstraction layers that will allow you to make that selection last, once you exactly know and can measure your needs.
etotheipi: OK, I hear ya. Qt looks like a decent layer that will isolate me from the vagaries of the graphics display market. It looks like a pain in the neck, but I need to learn some way of not painting myself into a corner.
2112: Hurray!

Honestly, I've never been so deep in the rabbit hole that I've seen the firmament as only a single star. Even when I was in a hole I knew that there is more than one star in the sky, and if I can get my nose away from the grindstone I'll see at least a couple of stars, not just a single one. I've both (a) seen the sky with my own eyes and (b) been told in school that there are plenty of stars to choose from.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
February 16, 2013, 11:27:07 PM
#8
It is too bad that you've activated your "reading between the lines" skill and started writing emotional responses. I never called you "inadequate".

2112,

You're still missing the point.  As I've said in other threads -- I get a sense that you know what you're talking about, but you provide nothing actionable -- only criticism.  Yet, threads like this would benefit from actual experience and advice you might have instead of just criticism.  I'd like to have a rational conversation with you, but I can't because the only tool in your toolbox is for insulting people's intelligence. 

Here's how this thread could've gone:

etotheipi:  Hey, how's this database structure look?
2112:  Have you considered using a relational DB engine instead of LevelDB?  It looks like that's what you're trying to recreate and LevelDB isn't production quality.
etotheipi:  Possibly.  LevelDB has properties A,B,C,D and E all of which fit perfectly into my app.  My use case is simple enough that I don't mind the lack of relation-handling.
2112:  You really shouldn't use LevelDB because it's a toy.  Take a look at SomeDB1, it's got properties A,C,D, and E, and also adds F and G
etotheipi:  But I really want property B for some reason, and G is not that important to me
2112:  Try SomeDB2, it has properties A,B,C,E,F.  You could change your structure to storing blocks like <> and headers and merkle roots like <> and you'd have everything you want.
etotheipi:  I've never heard of SomeDB2, I'll look into it.  Thanks!

See how pleasant that is?  No one has to be insulted, and your experience/expertise is pleasantly communicated in a way that I don't have to lose face to even begin to agree with any of your points.  It's not that avoiding "losing face" is my number one priority, it's that your writing style (which I still can't tell if it's intentional or not) immediately turns the whole thread hostile.  You seem to want to insult more than help.  You could have a constructive relationship with the users on this forum that you are trying to help.
legendary
Activity: 2128
Merit: 1074
February 15, 2013, 12:22:41 PM
#7
LevelDB is very simple, the code has a permissive license and can be bundled directly into the codebase, and it creates very space efficient databases that are encapsulated in isolated directories that are easy to bundle and move around.  And it's also damned fast.  Accessing the data in key-sort order is about twice as fast as I have been able to achieve with raw, low-level operations.  Some benchmarks with favorable performance.  (though, while looking up those links, I see some benchmarks for BangDB which are even better, but it sounds new and I've never heard of it)

I'm not saying other DB engines can't do that.  I'm saying that LevelDB meets my needs and has all the properties I want.  The fact that it isn't relational doesn't bother me because I don't really need it -- the data it is storing is rather simple.

So, instead of simply criticizing my ideas and telling me how inadequate I am at developing applications, why don't you make recommendations for how you would do it?  If you are familiar with other DB engines that have a permissive licence, will not result in terrible linking problems, create nice encapsulated DBs in directories, and still have very good performance, I'll look into it.  You seem awfully good at criticizing, but you'd be much more credible & useful if you actually contributed to the discussion.
It is too bad that you've activated your "reading between the lines" skill and started writing emotional responses. I never called you "inadequate".

What you are doing is making the mistake of looking only at speed benchmarks. When storing financial data, the interesting benchmarks should include things like "time to detect & recover from a bit-flip error". With your 32GB non-ECC RAM machine you've already encountered this problem.
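One common way to get the "detect" half of that benchmark is to store a checksum next to each value and verify it on every read. A minimal sketch, assuming a hypothetical value layout of payload || first 4 bytes of SHA-256(payload). It only catches corruption that happens after the checksum is computed, and recovery still needs a second copy (for blockchain data, the network):

```python
import hashlib

CHECKSUM_LEN = 4  # assumption: 4-byte truncated SHA-256


class CorruptValueError(Exception):
    pass


def wrap(value: bytes) -> bytes:
    """Append a checksum so corruption can be detected when the value is read back."""
    return value + hashlib.sha256(value).digest()[:CHECKSUM_LEN]


def unwrap(stored: bytes) -> bytes:
    """Verify and strip the checksum; raise instead of silently returning bad data."""
    value, check = stored[:-CHECKSUM_LEN], stored[-CHECKSUM_LEN:]
    if hashlib.sha256(value).digest()[:CHECKSUM_LEN] != check:
        raise CorruptValueError("checksum mismatch")
    return value
```

LevelDB itself also keeps CRC checksums on its on-disk blocks, which can be checked on reads via `ReadOptions::verify_checksums`; an application-level checksum like this one additionally covers the path between the application and the engine.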

I would normally call those mistakes "novice", "beginner" & "naive". But sometimes they aren't any of those. There may be some other, hidden motive for choosing to hard-code a toy database engine. To an outsider it may look irrational, but once the hidden motive is known it is rational. I admire Gavin Andresen for stating plainly why the core development group won't fix this problem in Satoshi's client:

https://bitcointalksearch.org/topic/m.1170970

You certainly have an adequate (or better) skill in choosing the level of general-purpose programming language. Your way of splitting Armory between C++ & Python shows that you know how to make the tradeoff between execution speed and programmer speed.

At the database programming language level, you've made a choice that is the equivalent of programming in assembly language. From the emotional tone of your last answer, all I can say is that there is some sort of unstated motive for this choice.

Of the "alternative clients" category, I see grau (and his supernode) as the only developer with broad familiarity with database technology. Maybe take a look at his code if you don't want to prototype with Visual Studio? Database abstraction layers aren't a recent invention. They are well known, many of them are available, and there are plenty of places to look for inspiration and discussion of pros and cons.

I've made an attempt to share my significant experience with programming in data-integrity-is-king sectors like finance, medicine and gambling. You've chosen to call my attempts an "insult to people's intelligence". What else can I do besides shrug my shoulders and say "fare thee well"?

https://bitcointalksearch.org/topic/m.1193260

This is a public forum: for each person posting there will be approximately 10 who just read and understand it. Of those 10 people, probably 2-3 can learn from other people's mistakes and use that knowledge in their current or future projects.
hero member
Activity: 700
Merit: 500
February 15, 2013, 03:51:09 AM
#6
LevelDB is very simple, the code has a permissive license and can be bundled directly into the codebase, and it creates very space efficient databases that are encapsulated in isolated directories that are easy to bundle and move around.  And it's also damned fast.  Accessing the data in key-sort order is about twice as fast as I have been able to achieve with raw, low-level operations.  Some benchmarks with favorable performance.  (though, while looking up those links, I see some benchmarks for BangDB which are even better, but it sounds new and I've never heard of it)

I'm not saying other DB engines can't do that.  I'm saying that LevelDB meets my needs and has all the properties I want.  The fact that it isn't relational doesn't bother me because I don't really need it -- the data it is storing is rather simple.

So, instead of simply criticizing my ideas and telling me how inadequate I am at developing applications, why don't you make recommendations for how you would do it?  If you are familiar with other DB engines that have a permissive licence, will not result in terrible linking problems, create nice encapsulated DBs in directories, and still have very good performance, I'll look into it.  You seem awfully good at criticizing, but you'd be much more credible & useful if you actually contributed to the discussion.

Your database implementation is awesome. I've switched my hot wallet to Multibit for RAM-related reasons, but Armory's database implementations are primo.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
February 14, 2013, 10:50:44 PM
#5
LevelDB is very simple, the code has a permissive license and can be bundled directly into the codebase, and it creates very space efficient databases that are encapsulated in isolated directories that are easy to bundle and move around.  And it's also damned fast.  Accessing the data in key-sort order is about twice as fast as I have been able to achieve with raw, low-level operations.  Some benchmarks with favorable performance.  (though, while looking up those links, I see some benchmarks for BangDB which are even better, but it sounds new and I've never heard of it)

I'm not saying other DB engines can't do that.  I'm saying that LevelDB meets my needs and has all the properties I want.  The fact that it isn't relational doesn't bother me because I don't really need it -- the data it is storing is rather simple.

So, instead of simply criticizing my ideas and telling me how inadequate I am at developing applications, why don't you make recommendations for how you would do it?  If you are familiar with other DB engines that have a permissive licence, will not result in terrible linking problems, create nice encapsulated DBs in directories, and still have very good performance, I'll look into it.  You seem awfully good at criticizing, but you'd be much more credible & useful if you actually contributed to the discussion.
legendary
Activity: 2128
Merit: 1074
February 14, 2013, 11:25:12 AM
#4
I actually have no clue what you're talking about.  Storing transaction comments for display in the transaction ledger is completely unrelated to this thread.  That was a case of something that was really simple turning into something still really simple, but the code wasn't optimized for the worst case.  I'm not going to bring in bulky, complicated relational database engines just to store some tx and address comments when I didn't need them for anything else.
Actually it is 100% relevant. What you were doing with Inaba, and what you are doing right now, is designing a network-model database schema using k,v-primitive storage. So you are approximately in 1970-1980 as far as the evolution of database technology goes.

Relational-model databases and abstract query languages were invented to decouple the database schema from the application.

If you had data-bound control over relational-model database in Armory the whole problem posed by Inaba would be about 15 minutes of work: create index on comments and hook it up to the appropriate column in the Armory's window.
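As an illustration of that "15 minutes of work", here is a sketch using Python's built-in sqlite3 module. The `comments` table and its columns are hypothetical, not Armory's actual storage, and hooking the result up to a data-bound UI control is left out:

```python
import sqlite3

# Hypothetical schema for illustration only; not Armory's actual storage.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (txid TEXT PRIMARY KEY, comment TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO comments (txid, comment) VALUES (?, ?)",
    [("aa01", "rent"), ("bb02", "coffee"), ("cc03", "rent deposit")],
)
# The index is one statement; whether a given query uses it is up to SQLite's planner.
conn.execute("CREATE INDEX idx_comments_comment ON comments (comment)")


def search_comments(prefix):
    """Return (txid, comment) rows whose comment starts with `prefix`.

    A real filter would also escape % and _ in user input."""
    cur = conn.execute(
        "SELECT txid, comment FROM comments WHERE comment LIKE ? ORDER BY txid",
        (prefix + "%",),
    )
    return cur.fetchall()
```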
As for the comment that a relational database engine might be good -- well yes, that is a valid suggestion.  And one I'm not oblivious to.  There's a lot of value in keeping things simple, and dependencies to a minimum.  LevelDB is remarkably fast, and its databases are extremely efficient (space-wise), maintained in standalone directories (easy to zip/tar and distribute), and the source code can be bundled directly into the project.  These are all very valuable properties for me.  There is some relational nature to the data being stored, but it's really not that complicated, and I'd prefer the fine-grained control using a DB engine that is simple and I know how to optimize for it.

On the other hand, if there's a good reason to believe that it won't work, or there are many other reasons a relational DB engine would be preferred... I would appreciate that discussion.  But within the scope of what I've described, the theoretical capability of LevelDB is perfectly fine, as long as there aren't underlying vulnerabilities/problems with the implementation that will cause heartache later.
There isn't anything wrong with what you wrote above. You are just going to retrace the evolution of database technology from the last century. If this is your itch then you are free to scratch it, and I'm the last person who will try to dissuade you. So if your goal is to spruce up your resume before applying for a job at a NonSQL shop, then go ahead.

On the other hand, if you just want to develop a usable Bitcoin client, then you are definitely on the wrong path. You are working alone. Google has hordes and can afford parallel development by throwing one coder at each possible schema: code them all and pick the best. When the requirements change, demote the former best coder and promote the one whose schema matches the new requirements. The iteration time on schema redesign will drag down a lone programmer like you.

Because you really don't know how Bitcoin will evolve, you can't rely on one fixed schema. The "view" concept from the relational model is a crucial prototyping tool. I'm positive that Google has such tools internally for their BigTable DB. LevelDB was just a toy to allow people to experiment with k,v-primitives without having to run the distributed farms where k,v-databases shine and are really needed.
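A small sqlite3 sketch of the "view" idea: the application queries a view, so the underlying tables can be reshuffled during prototyping by changing only the view definition. The table and view names are hypothetical:

```python
import sqlite3

# Hypothetical tables; the view, not the application, absorbs schema changes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blocks (height INTEGER PRIMARY KEY, hash TEXT NOT NULL);
CREATE TABLE txs (
    txid TEXT PRIMARY KEY,
    height INTEGER NOT NULL REFERENCES blocks(height),
    tx_index INTEGER NOT NULL
);
CREATE VIEW tx_with_block AS
    SELECT t.txid, t.tx_index, b.height, b.hash AS block_hash
    FROM txs AS t JOIN blocks AS b ON b.height = t.height;
""")
conn.executemany("INSERT INTO blocks VALUES (?, ?)", [(0, "h0"), (1, "h1")])
conn.executemany(
    "INSERT INTO txs VALUES (?, ?, ?)", [("t0", 0, 0), ("t1", 1, 0), ("t2", 1, 1)]
)


def txs_in_block(height):
    # Application code queries the view; if the tables are later reorganized,
    # only the view definition has to change.
    cur = conn.execute(
        "SELECT txid FROM tx_with_block WHERE height = ? ORDER BY tx_index", (height,)
    )
    return [row[0] for row in cur]
```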

So what if LevelDB has sufficient theoretical capability? A single-tape Turing machine has it too. It is the programmer iteration time that matters. By the time you actually implement your network-model schema and profile its storage access patterns, the relational-model developer could implement and profile hundreds or thousands. With that knowledge the relational-model developer can shift from rapid prototyping to implementing the known-best relational model with network-model tools for added performance.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
February 14, 2013, 10:26:44 AM
#3
So how can I further improve this?  Am I missing use cases?  Am I crazy?
About two months ago you had this discussion with Inaba:

https://bitcointalksearch.org/topic/m.1387216

The two of you have gotten really close to reinventing the data-bound control from Visual Studio. When you were talking about search & filtering functionality it sounded like you wanted to reinvent the visual query builder from Microsoft Access or Microsoft SQL Server.

I know that you have a copy of Visual Studio and you theoretically could prototype your design using the dirty, grimy, foul-smelling, ...ugh... relational ...ouch... SQL database engine really quick.

I also know that this wasn't the feedback you are looking for, therefore I voluntarily jump into your "ignore" file, making the "plonk" sound while hitting the bottom. I just need to say things once to be able to say "I told them so" in the future.

Relevant posts from about half-a-year ago about playing the open-source poker at the NonSQL table:

https://bitcointalksearch.org/topic/m.1046848

https://bitcointalksearch.org/topic/m.1046473


I actually have no clue what you're talking about.  Storing transaction comments for display in the transaction ledger is completely unrelated to this thread.  That was a case of something that was really simple turning into something still really simple, but the code wasn't optimized for the worst case.  I'm not going to bring in bulky, complicated relational database engines just to store some tx and address comments when I didn't need them for anything else.

As for the comment that a relational database engine might be good -- well yes, that is a valid suggestion.  And one I'm not oblivious to.  There's a lot of value in keeping things simple, and dependencies to a minimum.  LevelDB is remarkably fast, and its databases are extremely efficient (space-wise), maintained in standalone directories (easy to zip/tar and distribute), and the source code can be bundled directly into the project.  These are all very valuable properties for me.  There is some relational nature to the data being stored, but it's really not that complicated, and I'd prefer the fine-grained control using a DB engine that is simple and I know how to optimize for it.

On the other hand, if there's a good reason to believe that it won't work, or there are many other reasons a relational DB engine would be preferred... I would appreciate that discussion.  But within the scope of what I've described, the theoretical capability of LevelDB is perfectly fine, as long as there aren't underlying vulnerabilities/problems with the implementation that will cause heartache later.
legendary
Activity: 2128
Merit: 1074
February 14, 2013, 09:59:27 AM
#2
So how can I further improve this?  Am I missing use cases?  Am I crazy?
About two months ago you had this discussion with Inaba:

https://bitcointalksearch.org/topic/m.1387216

The two of you have gotten really close to reinventing the data-bound control from Visual Studio. When you were talking about search & filtering functionality it sounded like you wanted to reinvent the visual query builder from Microsoft Access or Microsoft SQL Server.

I know that you have a copy of Visual Studio and you theoretically could prototype your design using the dirty, grimy, foul-smelling, ...ugh... relational ...ouch... SQL database engine really quick.

I also know that this wasn't the feedback you are looking for, therefore I voluntarily jump into your "ignore" file, making the "plonk" sound while hitting the bottom. I just need to say things once to be able to say "I told them so" in the future.

Relevant posts from about half-a-year ago about playing the open-source poker at the NonSQL table:

https://bitcointalksearch.org/topic/m.1046848

https://bitcointalksearch.org/topic/m.1046473
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
February 14, 2013, 08:53:30 AM
#1
This is just a brain-dump to get a sanity check on my logic before going and basing future versions of Armory on it.  The idea here is to have Armory start managing its own blockchain data, but in a manner that is optimized for all foreseeable use-cases.  Keep in mind, this is LevelDB, and part of my design is based on how LevelDB works best (which is probably common to a lot of DB engines).  LevelDB is a very simple but highly-optimized DB engine, which simply stores (key,value) pairs, where both keys and values are arbitrary byte strings.   LevelDB handles some degree of "ACID" operations, and it should behave very much like a disk-based version of a C++ std::map.

So here's my list of considerations in designing the new database:

  • (1) LevelDB has key-order-optimized access:  if you access lots of data in key-sort order, it's super fast.  If you do lots of random access... well it's not optimized for that.  Here's how this matters:  if you have (TxID, TxSerialized) pairs in a database, and you want to scan all transactions in a given block, you are going to be accessing data all over the DB.  Each tx is retrieved from a different part of the disk and LevelDB doesn't know what data to prefetch for your next request.  On the other hand, if you were to prefix each TxID with the 4-byte block height, then doing a full blockchain scan would be very optimized because the next tx you will be requesting is the next key when sorted.  LevelDB will be loading the subsequent blocks as you process the previous ones.  However, this particular example isn't useful, since this doesn't let you do random lookup of tx without knowing the block number -- this is just an example of how we might shape our data to leverage LevelDB optimizations.
  • (2) The structure should accommodate a clean transition to lite client mode (i.e. not have to redesign it)
  • (3) The structure should accommodate a clean transition to pruned-blockchain operations
  • (4) The structure should accommodate a clean transition to maintaining an address-indexed view of the blockchain
  • (5) Minimal duplication of tx data -- ideally we'd only ever have TxIDs in the database once.  Duplicating header data is not so bad, since that's strictly linear as a function of time, but tx data isn't (consider that there are 10 million tx, meaning that each extra copy of txids in the DB is another 320 MB).
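To make the key-order point in (1) concrete: LevelDB's default comparator orders keys byte-wise, so a 4-byte height prefix only sorts numerically if it is encoded big-endian. The post doesn't specify an encoding, so big-endian is an assumption of this sketch; the little-endian helper is only there for contrast:

```python
import struct


def tx_key(height: int, tx_index: int) -> bytes:
    # Big-endian ('>') so that byte-wise comparison of keys, which is what
    # LevelDB's default comparator does, matches numeric (height, index) order.
    return struct.pack(">II", height, tx_index)


def tx_key_le(height: int, tx_index: int) -> bytes:
    # Little-endian for contrast: byte-wise order does NOT follow numeric order.
    return struct.pack("<II", height, tx_index)


pairs = [(256, 0), (1, 5), (1, 0), (2, 3)]
print([struct.unpack(">II", k) for k in sorted(tx_key(h, i) for h, i in pairs)])
```

With the little-endian encoding, height 256 (`00 01 00 00`) would sort before height 1 (`01 00 00 00`), which would defeat the sequential-scan optimization.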

With all this in mind, I have decided on the following set of databases (use || to represent concatenation of data):

Code:
HeaderHashDB:  key=HeaderHash(32),              value=BlockHeight(4)
TxHashDB:      key=TxID(32),                    value=BlockHeight(4)||TxIndex(4)
BlockDataDB:   key=BlockHeight(4)||TxIndex(4)   value=SpecialSerializedTx(VAR)
               key=BlockHeight(4)||FFFFFFFF     value=RawHeader(80)
MerkleDB:      key=MerkleRoot(32)               value=TxID0(32)||TxID1(32)||...||TxIDN(32)  (optional, not sure if it's really necessary)
OrphansDB:     key=HeaderOrTxHash(32)           value=RawHeaderOrTx(80orVAR)
EDIT:  Should probably just have a different DB altogether for RawHeaders, so that it is easy to download separately as a way to seed a new node.
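A sketch of how one block could be written into dict stand-ins for the databases above. It assumes big-endian 4-byte integers, Bitcoin's double SHA-256 for header hashes and TxIDs, and stores raw tx bytes instead of SpecialSerializedTx to keep it short; `store_block` is a hypothetical helper:

```python
import hashlib
import struct

HEADER_INDEX = 0xFFFFFFFF  # TxIndex slot reserved for the raw header, per the schema


def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def store_block(dbs, height, raw_header, raw_txs):
    """Populate dict stand-ins for HeaderHashDB, TxHashDB and BlockDataDB."""
    if len(raw_header) != 80:
        raise ValueError("raw header must be 80 bytes")
    h = struct.pack(">I", height)
    dbs["HeaderHashDB"][dsha256(raw_header)] = h
    dbs["BlockDataDB"][h + struct.pack(">I", HEADER_INDEX)] = raw_header
    for i, raw_tx in enumerate(raw_txs):
        loc = h + struct.pack(">I", i)
        # Each TxID is stored exactly once, as a TxHashDB key (goal #5).
        dbs["TxHashDB"][dsha256(raw_tx)] = loc
        dbs["BlockDataDB"][loc] = raw_tx
```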
A few things about this:

(1) BlockDataDB can store a variable amount of data -- full nodes will have one entry for every tx, but you don't have to have every tx:  you can store just the ones you need.  The TxHashDB for a lite node only needs to store the transactions that are relevant to that node, and BlockDataDB only needs to have those particular transactions available (and the BlockHeight||TxIndex keys will be non-existent for those other tx).  You will have the 0xFFFFFFFF entry for each block height to maintain those headers.
(1a) The structure of BlockDataDB achieves the goal of #1 above -- the sort order of all the keys in BlockDataDB is the tx order in the blockchain.  If you're going to do a full scan, you only need to do a simple iteration over the entire database, and LevelDB will pre-fetch subsequent data for you.  I have tested this with some experimental LevelDB code and it's stupid fast... much faster than my own low-level, highly-optimized blockchain scanning code in C++.  
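The full scan then reduces to a single ordered iteration. In this sketch a sorted dict stands in for a LevelDB iterator, and keys are assumed to be big-endian (height, index) pairs; a lite node's missing tx simply don't appear. One consequence worth noting: with big-endian keys the 0xFFFFFFFF header entry sorts after its block's transactions, so a scanner that needs the header first has to buffer each block:

```python
import struct

HEADER_INDEX = 0xFFFFFFFF


def full_scan(block_data_db):
    """Walk a BlockDataDB stand-in in key order and group entries by block.

    Sorting the keys imitates what an ordered LevelDB iterator provides."""
    blocks = {}
    for key in sorted(block_data_db):
        height, index = struct.unpack(">II", key)
        entry = blocks.setdefault(height, {"header": None, "txs": []})
        if index == HEADER_INDEX:
            entry["header"] = block_data_db[key]
        else:
            entry["txs"].append(block_data_db[key])
    return blocks
```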

(2) TxIDs and HeaderHashes are only stored exactly once (unless you also store the merkle DB, which I'm not sure is needed).  If you need to verify the TxID, look it up in the TxHashDB to know where to fetch the full Tx from the BlockDataDB, and hash it to make sure it matches.  
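The lookup-and-verify path as a sketch, assuming dict stand-ins for the two databases and raw tx bytes in BlockDataDB. With SpecialSerializedTx values, the normal serialization would have to be rebuilt before hashing, which a pruned entry can't do. A TxID is the double SHA-256 of the serialized tx; `get_verified_tx` is a hypothetical name:

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def get_verified_tx(txid, tx_hash_db, block_data_db):
    """Two lookups: TxHashDB gives BlockHeight||TxIndex, BlockDataDB gives the tx.

    Re-hashing the fetched bytes confirms they match the requested TxID."""
    loc = tx_hash_db.get(txid)
    if loc is None:
        return None
    raw_tx = block_data_db.get(loc)
    if raw_tx is None:
        return None  # location known, but the tx itself was not kept
    if dsha256(raw_tx) != txid:
        raise ValueError("stored tx does not hash to the requested TxID")
    return raw_tx
```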

(3) SpecialSerializedTx:  this is a special way of serializing the transactions to allow for a variable amount of data depending on the node type.  This satisfies #3 above while still allowing for full reconstruction of the tx if you happen to have all pieces of it.
NormalSerializedTx:   Version||  N||TxIn0||TxIn1||...||TxInN||  M||TxOut0||TxOut1||...||TxOutM||LockTime
SpecialSerializedTx:  FFFFFFFF||(NormalSerializedTx-with-blank-TxOuts)||  TxOutIndex0(4)||TxOut0  ||  TxOutIndex1(4)||TxOut1  ||...||  TxOutIndexM(4)||TxOutM

(3a) Just like (1a) above, this serialization accommodates any storage level:  full nodes will have the full-tx-with-blank-txouts and every TxOut.  Super pruned-and-lite nodes can skip the FFFFFFFF entry and exclude TxOuts they don't care about.  So if you know that you own a TxOut in Tx X, then you can look up Tx X and there will only be one entry: the TxOut that you own.
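A sketch of the record layout behind SpecialSerializedTx and of pruning it down to the outputs a node owns. The original relies on Bitcoin's self-delimiting tx and TxOut encodings; to keep this short, the sketch instead assumes a hypothetical 4-byte length prefix after each 4-byte index and treats payloads as opaque bytes:

```python
import struct

SKELETON_INDEX = 0xFFFFFFFF  # marks the tx-with-blank-TxOuts record


def encode_special(records):
    """records: (index, payload) pairs; SKELETON_INDEX holds the tx skeleton,
    any other index holds the TxOut with that output index.
    The length prefix is an assumption of this sketch."""
    out = bytearray()
    for index, payload in records:
        out += struct.pack("<II", index, len(payload)) + payload
    return bytes(out)


def decode_special(blob):
    records, pos = {}, 0
    while pos < len(blob):
        index, length = struct.unpack_from("<II", blob, pos)
        pos += 8
        records[index] = blob[pos:pos + length]
        pos += length
    return records


def prune(blob, keep_outputs):
    """Keep only the TxOuts this node cares about and drop the skeleton."""
    kept = [(i, p) for i, p in decode_special(blob).items() if i in keep_outputs]
    return encode_special(kept)
```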

(4) The orphans DB is simply for juggling data that does not yet have a home -- i.e. if you just re-org'd you want to keep the old block/tx data around, but you don't have a blockheight to attach it to, so you put it in the orphans DB until you need it again.
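A rough sketch of moving a disconnected block into the orphans DB, using dict stand-ins and assuming big-endian 4-byte heights and raw (hashable) values. `orphan_block` is hypothetical, and a real LevelDB version would seek to the height prefix instead of scanning every key:

```python
import hashlib
import struct


def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def orphan_block(height, dbs):
    """Move every BlockDataDB entry at `height` into OrphansDB, keyed by hash."""
    prefix = struct.pack(">I", height)
    for key in [k for k in dbs["BlockDataDB"] if k.startswith(prefix)]:
        raw = dbs["BlockDataDB"].pop(key)
        digest = dsha256(raw)
        dbs["OrphansDB"][digest] = raw
        # Drop the index entries that pointed at the now-unassigned location.
        dbs["TxHashDB"].pop(digest, None)
        dbs["HeaderHashDB"].pop(digest, None)
```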

The downside of this is that to get from a header hash to header data you have to do two DB lookups (HeaderHashDB to get blockheight, then go look up blockheight||FFFFFFFF in the BlockDataDB).  Same for looking up Tx hashes.  But I don't think it will be so bad -- at least the header map will probably be fully cached because it is small.

Haven't tackled #4 (address-indexed database view), but I think it can be just a separate database that piggybacks on the structures of the DBs described above.  I know it won't be simple, but it's also out of scope for now.  I just want to make sure I don't end up having to redesign this structure when I finally do decide to tackle #4.

So how can I further improve this?  Am I missing use cases?  Am I crazy?