
Topic: fuck you wallet format!!!! (RANT) (Read 3248 times)

legendary
Activity: 1190
Merit: 1004
May 06, 2012, 03:08:34 PM
#31
There seem to be a few simple Reed-Solomon implementations available online, so I don't think implementing it would be a major problem, assuming those implementations work.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
May 06, 2012, 02:43:13 PM
#30
Your format seems pretty good. Where can I find out more about the error-correcting checksums? You say they can fix up to one byte. That sounds good enough to me, but what do I know? Would people say that's the only error correction needed?

I used dumb error correction: it's just regular checksums, as seen elsewhere in the Bitcoin protocol. All I do is hash the field and append the first four bytes of the hash to the end of that field. If a byte goes bad, I iterate through the field one position at a time, trying every possible value for that byte, until it matches the checksum again.

Hashing like this is not really intended for error correction, but 4 bytes is enough to do it reliably, and it's dead simple to use. And given the remarkably low frequency of hard-drive errors, correcting one byte should be enough. If more than one byte in the same checksummed field went bad, there are probably bigger problems.
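A minimal Python sketch of the checksum-and-brute-force repair described above. It assumes the checksum is the first four bytes of a double SHA-256 (the hash256 used for checksums elsewhere in Bitcoin); the function names are hypothetical and this is not Armory's actual code. Repair tries every position and every byte value, so a field of n bytes costs at most 255 * n hash evaluations, which is trivial for key-sized fields.

```python
import hashlib

def checksum(field: bytes) -> bytes:
    """First 4 bytes of double SHA-256, as used for Bitcoin-style checksums."""
    return hashlib.sha256(hashlib.sha256(field).digest()).digest()[:4]

def verify_and_repair(field: bytes, chk: bytes):
    """Return the field if it matches, a repaired copy if exactly one byte
    was corrupted, or None if no single-byte change restores the checksum."""
    if checksum(field) == chk:
        return field
    buf = bytearray(field)
    for i in range(len(buf)):
        original = buf[i]
        for value in range(256):
            if value == original:
                continue
            buf[i] = value
            if checksum(bytes(buf)) == chk:
                return bytes(buf)
        buf[i] = original  # undo before moving to the next position
    return None
```

With 4 checksum bytes, the chance that a wrong single-byte change happens to match is roughly (255 * n) / 2^32, which is why 4 bytes is "enough to do it reliably" for short fields.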

I decided not to use something more appropriate, like Reed-Solomon, because I thought it would obfuscate the fields (i.e. -- 256 bytes of data would have to be converted to 268 bytes of coefficients, meaning you'd need another library to read the data even if you don't care about the error correction). I only recently found out that's not true in my circumstances, so I may consider switching to it in my next wallet upgrade.
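One likely reason Reed-Solomon need not obfuscate anything: it is usually used in systematic form, where the original bytes are stored unchanged and parity bytes are appended, so a reader that ignores error correction can read the field directly. A minimal Python sketch of a systematic encoder over GF(2^8), assuming the common primitive polynomial 0x11d and generator roots a^0 ... a^(nsym-1); decoding (syndromes, Berlekamp-Massey, Forney) is omitted. A single GF(2^8) codeword holds at most 255 bytes, so a 256-byte field would have to be split into chunks; nsym parity bytes per chunk can correct up to nsym // 2 corrupted bytes.

```python
# GF(2^8) log/antilog tables for primitive polynomial x^8+x^4+x^3+x^2+1 (0x11d).
PRIM = 0x11D
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]  # doubled table avoids a modulo in gf_mul

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def poly_mul(p, q):
    """Multiply polynomials stored highest-degree coefficient first."""
    out = [0] * (len(p) + len(q) - 1)
    for j, qc in enumerate(q):
        for i, pc in enumerate(p):
            out[i + j] ^= gf_mul(pc, qc)
    return out

def generator_poly(nsym):
    """g(x) = (x + a^0)(x + a^1)...(x + a^(nsym-1)); minus equals plus here."""
    g = [1]
    for i in range(nsym):
        g = poly_mul(g, [1, EXP[i]])
    return g

def poly_eval(poly, x):
    """Horner evaluation, highest-degree coefficient first."""
    y = 0
    for coef in poly:
        y = gf_mul(y, x) ^ coef
    return y

def rs_encode(msg: bytes, nsym: int) -> bytes:
    """Systematic encoding: msg unchanged, followed by nsym parity bytes."""
    if len(msg) + nsym > 255:
        raise ValueError("a GF(2^8) codeword holds at most 255 bytes")
    gen = generator_poly(nsym)
    buf = list(msg) + [0] * nsym
    for i in range(len(msg)):  # long division of msg(x) * x^nsym by g(x)
        coef = buf[i]
        if coef:
            for j in range(1, len(gen)):
                buf[i + j] ^= gf_mul(gen[j], coef)
    return bytes(msg) + bytes(buf[len(msg):])  # remainder is the parity
```

The first len(msg) bytes of the output are the data itself, so existing readers are unaffected; only a reader that wants correction needs the decoder.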



legendary
Activity: 1190
Merit: 1004
May 06, 2012, 02:34:44 PM
#29
    What do people think about other wallet formats?

    For reference, I went as far in the opposite direction as I could, when creating the Armory wallet format.  I hate the Satoshi wallet format as much as kokjo.  Armory uses a simple binary format, easy to read, and only two operations on it are ever used:  append, or overwrite-in-place-with-same-data-size.   I documented it here: 

    http://bitcoinarmory.com/index.php/armory-wallet-files

    I had two goals in mind when I made the wallet format:

    • I want 100% control of what happens in the wallet file, inspired by the wallet-not-actually-encrypted bug in 0.4.0.
    • I want it to be dead simple for other developers to be able to read (and maybe modify) the wallet files

    There's quite a bit of extra wallet-management code to protect against corruption and errors and to enforce atomic operations, but that's in code -- it doesn't affect how simple the files are for other developers to read. The most important feature is that when I encrypt my wallet, the encrypted key is guaranteed to overwrite the original unencrypted key, which prevents any leaks when I back it up to Dropbox, etc. The same goes for deleting data: it's overwritten with zeros in place. I know the overwrite may not happen in place on the physical disk, but there's nothing I can do about that -- at least when someone copies the wallet file from my HDD, the binary file will not have any surprises in it.

    Your format seems pretty good. Where can I find out more about the error-correcting checksums? You say they can fix up to one byte. That sounds good enough to me, but what do I know? Would people say that's the only error correction needed?
    legendary
    Activity: 1526
    Merit: 1134
    May 06, 2012, 01:41:49 PM
    #28
    I'm not sure the performance impact of maintaining key/hash160 -> tx indexes is really worth having a simpler wallet format. People who feel really strongly about this can adjust the code to build such an index and measure the impact. Block chain download/processing is already expensive enough, IMHO.
    legendary
    Activity: 1428
    Merit: 1093
    Core Armory Developer
    May 06, 2012, 07:56:43 AM
    #27
    It's because the wallet is sensitive stuff, and one of the benefits of a database engine is ACID/atomic operations: it guarantees that data is either written to the database as intended or not written at all. No matter what nanosecond the power goes out, it's supposed to be impervious to corruption. When you're talking about private keys protecting millions of dollars, it's a very good idea to have atomic operations... but it comes with the downside that the database is a kind of black box and you don't always know what it's doing (hence the 0.4.0 wallet-not-actually-encrypted bug).

    The ability of a database engine to provide atomic operations is no stronger than the underlying OS's ability to reliably report that all writes to a certain point have been flushed to disk when asked - using operations that anybody can call from any application, not just a database engine.

    The magic that brings an atomic operation to a database is nothing more complicated than replacing "write a record" with the following flowchart:

    1. Write a record somewhere that says you intend to make a particular write, including details of the substance of the write, so the write can be repeated if this is the only record of it.
    2. Ensure that that record is committed to disk before continuing.
    3. Make the write as intended.
    4. Make sure the write done in step 3 is committed to disk before continuing.
    5. Eliminate the record you created in step 1.
    6. Ensure that the elimination done in step 5 has completed before allowing step 1 to occur on a future write.
    7. When your program starts up, have it look for any unremoved records similar to the one created in step 1. Confirm that they were written completely. If so, simply perform the write operation that the record says you planned to make (which will have no effect if the prior write was successful). If such records were not written completely, discard them.

    This is simple enough for computer science students to implement in their homework.

    The only magic that a database engine brings to the table is the ability for these seven steps to run at a high level of performance with lots of concurrent operations, in an effort to mitigate the performance penalty of tripling the burden of doing writes.

    Since a Bitcoin wallet is only updated eternities apart (in terms of compute time, especially when it is limited only to data created or changed by the user), the perceived performance penalty of doing 3 writes instead of 1 ought to be so negligible as to make a full blown database engine completely unnecessary even when ACID properties are desirable.

    Well, this is exactly what I do in my wallet, except with an extra step of repeating steps 1-5 on a backup file immediately after the main file is updated. I do this so that if the file gets corrupted I have a guaranteed working backup, and the flag files tell me which one is the corrupted one...
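A minimal Python sketch of the main-file-plus-backup idea. Assumptions: the flag-file names (`<file>.updating`), the helper names, and the rule "restore a flagged file from its partner" are hypothetical illustrations, not Armory's actual logic, and the directory fsyncs assume a POSIX system. Because the backup is only touched after the main file's update is durable and its flag is cleared, at most one of the two files can be mid-write at any moment, so a surviving flag says which copy to distrust.

```python
import os
import shutil

def _fsync_dir(path):
    """POSIX: make creation/removal of entries in path's directory durable."""
    fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

def _write_synced(path, offset, data):
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

def _set_flag(flag):
    with open(flag, "wb") as f:
        os.fsync(f.fileno())
    _fsync_dir(flag)

def _clear_flag(flag):
    os.remove(flag)
    _fsync_dir(flag)

def safe_update(main, backup, offset, data):
    """Apply one in-place write to main, then to backup, each guarded by a flag."""
    for target in (main, backup):
        flag = target + ".updating"
        _set_flag(flag)  # target may be inconsistent from here...
        _write_synced(target, offset, data)
        _clear_flag(flag)  # ...until here

def _restore(src, dst):
    shutil.copyfile(src, dst)
    with open(dst, "rb+") as f:
        os.fsync(f.fileno())

def check_on_startup(main, backup):
    """If a flag survived a crash, rebuild that file from its partner."""
    if os.path.exists(main + ".updating"):
        _restore(backup, main)  # roll main back to the last good copy
        _clear_flag(main + ".updating")
        return "restored main from backup"
    if os.path.exists(backup + ".updating"):
        _restore(main, backup)  # main already holds the completed update
        _clear_flag(backup + ".updating")
        return "restored backup from main"
    return "ok"
```

Rolling main back from the backup discards the interrupted write entirely, which is what makes the update all-or-nothing rather than merely detectable.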

    HOWEVER: one of the criticisms of this technique (which I would think equally applies to any app trying to do atomic operations) is that you don't have control over when data actually gets written to disk.  And there's no guarantee that the writes happen in the same order that you issued them. 

    legendary
    Activity: 1862
    Merit: 1011
    Reverse engineer from time to time
    May 06, 2012, 07:01:02 AM
    #26
    Thread title made me laugh. A user on YouTube going by the name TheRadBrad often says such phrases. He is funny :D
    vip
    Activity: 1386
    Merit: 1140
    The Casascius 1oz 10BTC Silver Round (w/ Gold B)
    May 06, 2012, 06:46:04 AM
    #25
    It's because the wallet is sensitive stuff, and one of the benefits of a database engine is ACID/atomic operations: it guarantees that data is either written to the database as intended or not written at all. No matter what nanosecond the power goes out, it's supposed to be impervious to corruption. When you're talking about private keys protecting millions of dollars, it's a very good idea to have atomic operations... but it comes with the downside that the database is a kind of black box and you don't always know what it's doing (hence the 0.4.0 wallet-not-actually-encrypted bug).

    The ability of a database engine to provide atomic operations is no stronger than the underlying OS's ability to reliably report that all writes to a certain point have been flushed to disk when asked - using operations that anybody can call from any application, not just a database engine.
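The flush request mentioned above is indeed an ordinary call available to any program. A minimal Python sketch, assuming a POSIX-like OS: `os.fsync` asks the OS to push the file's data to the storage device and returns only once the OS reports it done. `durable_write` is a hypothetical helper name. Whether the drive itself honours the request (for example, a volatile write cache) is outside the program's control, and for a newly created file the containing directory should also be fsynced on POSIX systems (omitted here).

```python
import os

def durable_write(path: str, offset: int, data: bytes) -> None:
    """Write data at offset, then block until the OS reports it flushed."""
    # O_CREAT so the helper also works for a new file; existing data is kept.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            view = view[os.write(fd, view):]
        os.fsync(fd)  # returns only after the OS has flushed this file
    finally:
        os.close(fd)
```

Calling it between dependent writes is also the usual answer to the ordering concern: the second write is not even issued until the first has been reported durable.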

    The magic that brings an atomic operation to a database is nothing more complicated than replacing "write a record" with the following flowchart:

    1. Write a record somewhere that says you intend to make a particular write, including details of the substance of the write, so the write can be repeated if this is the only record of it.
    2. Ensure that that record is committed to disk before continuing.
    3. Make the write as intended.
    4. Make sure the write done in step 3 is committed to disk before continuing.
    5. Eliminate the record you created in step 1.
    6. Ensure that the elimination done in step 5 has completed before allowing step 1 to occur on a future write.
    7. When your program starts up, have it look for any unremoved records similar to the one created in step 1. Confirm that they were written completely. If so, simply perform the write operation that the record says you planned to make (which will have no effect if the prior write was successful). If such records were not written completely, discard them.

    This is simple enough for computer science students to implement in their homework.

    The only magic that a database engine brings to the table is the ability for these seven steps to run at a high level of performance with lots of concurrent operations, in an effort to mitigate the performance penalty of tripling the burden of doing writes.

    Since a Bitcoin wallet is only updated eternities apart (in terms of compute time, especially when it is limited only to data created or changed by the user), the perceived performance penalty of doing 3 writes instead of 1 ought to be so negligible as to make a full blown database engine completely unnecessary even when ACID properties are desirable.

    legendary
    Activity: 1428
    Merit: 1093
    Core Armory Developer
    May 05, 2012, 10:10:07 PM
    #24
    So you have a solution which combines the safety and the simplicity of each option, yet it hasn't been implemented into the mainstream client? I might have to make Armory my main wallet if something similar never hits the main client.

    Bear in mind that my solution is not a perfect replacement for ACID operations. It's only a "good" solution (as far as I can tell). However, I think ACID matters a lot less with deterministic wallets. Wallet corruption is rare to begin with. Now take into account that 90% of those cases will be recoverable. Now take into account that your wallet is completely recoverable from your very first paper or digital backup*.

    And most users will make a first backup.  The issue with the Satoshi client is that it requires a persistent backup solution, and end-users are very bad at that.

    There's some semi-frequent discussion on IRC about what the devs would like to do with the wallet, but I haven't heard any consensus.  However, I do know that they will be implementing deterministic wallets soon, too.  So regardless of disk corruption/failure protections, the Satoshi wallets may have the same only-one-time-backup-needed property.  But of course, it will probably be a while before they get around to it...

    *Exception: this isn't true if you imported keys after making the first backup. 
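The "one backup is enough" property comes from deriving every key from a single root secret. A toy Python sketch, assuming a hypothetical derivation key_i = SHA-256(seed || i); Armory's real chaining scheme and the later BIP32 standard differ, and a real implementation must also ensure each value is a valid secp256k1 private key. The point is only that the same seed regenerates the same key sequence, so addresses created after the backup are still recoverable.

```python
import hashlib

def derive_key(seed: bytes, index: int) -> bytes:
    """Toy derivation (hypothetical): 32-byte key number `index` from the seed."""
    return hashlib.sha256(seed + index.to_bytes(4, "big")).digest()

def derive_keys(seed: bytes, count: int) -> list:
    """Regenerate the first `count` keys; a restore just calls this again."""
    return [derive_key(seed, i) for i in range(count)]
```

Imported keys are not derived from the seed, which is exactly why they fall outside the one-time backup, as the footnote says.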

    full member
    Activity: 176
    Merit: 100
    May 05, 2012, 09:59:45 PM
    #23
    I think the fact that it is a database in the first place is a silly design decision.
    I couldn't agree more.

    Why can't we use a simple format?

    It's because the wallet is sensitive stuff, and one of the benefits of a database engine is ACID/atomic operations: it guarantees that data is either written to the database as intended or not written at all. No matter what nanosecond the power goes out, it's supposed to be impervious to corruption. When you're talking about private keys protecting millions of dollars, it's a very good idea to have atomic operations... but it comes with the downside that the database is a kind of black box and you don't always know what it's doing (hence the 0.4.0 wallet-not-actually-encrypted bug).

    In my wallet format, I created atomic operations using a backup file and some flag files that detect when corruption has happened and automatically restore an uncorrupted version. It's probably only 90-95% as effective as a real ACID/atomic database, but I've tested the heck out of it and it does work. In the end, this logic is rarely triggered, because Armory doesn't keep the wallet file open; it only does occasional open-modify-close operations.

    I shared this experience with the devs, and while some of them thought it was interesting, their attitude was "let's not reinvent the wheel -- this is a solved problem, let's use an existing solution."  I understand that attitude.  But to me, the simplicity of the file format with 100% control over the data is worth every ounce of effort I put into it.

    So you have a solution which combines the safety and the simplicity of each option, yet it hasn't been implemented into the mainstream client? I might have to make Armory my main wallet if something similar never hits the main client.
    legendary
    Activity: 1428
    Merit: 1093
    Core Armory Developer
    May 05, 2012, 09:48:25 PM
    #22
    I think the fact that it is a database in the first place is a silly design decision.
    I couldn't agree more.

    Why can't we use a simple format?

    It's because the wallet is sensitive stuff, and one of the benefits of a database engine is ACID/atomic operations: it guarantees that data is either written to the database as intended or not written at all. No matter what nanosecond the power goes out, it's supposed to be impervious to corruption. When you're talking about private keys protecting millions of dollars, it's a very good idea to have atomic operations... but it comes with the downside that the database is a kind of black box and you don't always know what it's doing (hence the 0.4.0 wallet-not-actually-encrypted bug).

    In my wallet format, I created atomic operations using a backup file and some flag files that detect when corruption has happened and automatically restore an uncorrupted version. It's probably only 90-95% as effective as a real ACID/atomic database, but I've tested the heck out of it and it does work. In the end, this logic is rarely triggered, because Armory doesn't keep the wallet file open; it only does occasional open-modify-close operations.

    I shared this experience with the devs, and while some of them thought it was interesting, their attitude was "let's not reinvent the wheel -- this is a solved problem, let's use an existing solution."  I understand that attitude.  But to me, the simplicity of the file format with 100% control over the data is worth every ounce of effort I put into it.
    full member
    Activity: 176
    Merit: 100
    May 05, 2012, 09:35:16 PM
    #21
    I think the fact that it is a database in the first place is a silly design decision.
    I couldn't agree more.

    Why can't we use a simple format?
    legendary
    Activity: 1428
    Merit: 1093
    Core Armory Developer
    May 05, 2012, 08:30:46 PM
    #20
    What do people think about other wallet formats?

    For reference, I went as far in the opposite direction as I could, when creating the Armory wallet format.  I hate the Satoshi wallet format as much as kokjo.  Armory uses a simple binary format, easy to read, and only two operations on it are ever used:  append, or overwrite-in-place-with-same-data-size.   I documented it here: 

    http://bitcoinarmory.com/index.php/armory-wallet-files

    I had two goals in mind when I made the wallet format:

    • I want 100% control of what happens in the wallet file, inspired by the wallet-not-actually-encrypted bug in 0.4.0.
    • I want it to be dead simple for other developers to be able to read (and maybe modify) the wallet files

    There's quite a bit of extra wallet-management code to protect against corruption and errors and to enforce atomic operations, but that's in code -- it doesn't affect how simple the files are for other developers to read. The most important feature is that when I encrypt my wallet, the encrypted key is guaranteed to overwrite the original unencrypted key, which prevents any leaks when I back it up to Dropbox, etc. The same goes for deleting data: it's overwritten with zeros in place. I know the overwrite may not happen in place on the physical disk, but there's nothing I can do about that -- at least when someone copies the wallet file from my HDD, the binary file will not have any surprises in it.
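The two permitted operations (append, and overwrite in place with the same data size) plus zero-fill deletion fit in a few lines of Python. Assumptions: the function names are hypothetical, and records are raw bytes at caller-tracked offsets rather than Armory's real record layout (documented at the link above). As the post notes, overwriting in place at the file level does not guarantee the old bytes are gone from the physical disk.

```python
import os

def append_record(path: str, record: bytes) -> int:
    """Append a record, fsync it, and return the offset it was written at."""
    with open(path, "ab") as f:
        offset = f.seek(0, os.SEEK_END)
        f.write(record)
        f.flush()
        os.fsync(f.fileno())
    return offset

def overwrite_in_place(path: str, offset: int, new_bytes: bytes, old_len: int) -> None:
    """Replace a field with data of exactly the same size; nothing shifts."""
    if len(new_bytes) != old_len:
        raise ValueError("in-place overwrite must keep the field size")
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(new_bytes)
        f.flush()
        os.fsync(f.fileno())

def delete_in_place(path: str, offset: int, length: int) -> None:
    """'Delete' a field by zeroing its bytes where they sit."""
    overwrite_in_place(path, offset, bytes(length), length)
```

Because no operation ever moves existing bytes, an encrypted key written over a plaintext key replaces it at the same file offset instead of leaving the old copy elsewhere in the file.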
    legendary
    Activity: 980
    Merit: 1004
    Firstbits: Compromised. Thanks, Android!
    May 05, 2012, 06:00:55 PM
    #19
    I think the fact that it is a database in the first place is a silly design decision.

    Yep.
    vip
    Activity: 1386
    Merit: 1140
    The Casascius 1oz 10BTC Silver Round (w/ Gold B)
    May 05, 2012, 05:42:28 PM
    #18
    It seems to me your largest issue is usability. And I agree, there are many improvements possible for end users running a local bitcoin wallet. But the solution is supporting multiple wallets, improving efficiency, improving the user interface, and simplifying saving and restoring backups (tagged, for example, with the first block that would need to be scanned for incoming transactions), and so on. Adding an extra index may be viable right now, but I can't believe such a requirement will be necessary for the way bitcoin wallets will be used in the future.

    Can I suggest that the ability to maintain, open, and close multiple wallets at will is a compelling benefit for a typical end user that would justify the index? So would the ability to import or sweep funds from private keys in handheld bitcoin cash without a full rescan of the block chain. I mean, that's a pretty huge benefit: I can hand someone bitcoins on a QR code, and they can scan and sweep the funds, either some or all of them. End users can "be their own bank" by printing their own cash at home, and I can tell a restaurant (like Meze Grill in NYC, which recently refused my bitcoins because they were difficult to accept) that all they have to buy is a $250 USB QR code scanner, and they can be accepting home-printed bitcoin cash with the official bitcoin client in no time.

    If you give us this index with the official blessing of being a core feature, then others can add the rest of these features that depend on it so you don't have to be burdened with it.  Give us the framework, the infrastructure, and let others put in the effort of carrying it to the level of practical application.
    legendary
    Activity: 1072
    Merit: 1189
    May 05, 2012, 05:14:27 PM
    #17
    I believe what you are looking for is Electrum or another lightweight client, which keeps all that data (and, I assume, fast rescan ability) on a server.

    Actually, given that I have no real unmet needs for the way I transact, I am not really looking for anything, other than to bridge the gap between the mindset of the developers and the mindset of the average user that will be downloading the client.  This way, new users have a greater likelihood of saying "Aha this is what I was looking for", rather than "WTF I don't understand".

    Let me try to formulate things differently. I'm only talking about the Bitcoin reference client (its wallet) being used by end users. Using address-to-block indexes in fat servers that serve many thin clients is obviously a good idea.

    But for an end user who is running the bitcoin software (for now as a full node, later perhaps in SPV mode when it is implemented), I see no reason to replace storing the transactions in the wallet with an on-the-fly scan through the block database (even with an extra index to speed it up). First of all, the endpoints (in particular the receiver) are ultimately responsible for keeping the transaction around: if the transaction is not yet in the blockchain, the sender and receiver are the ones who will keep broadcasting it. In case of a reorganisation, a transaction may be dropped, and again its owners are responsible for keeping it alive. Second, further developments with multisig transactions will require transactions to be negotiated (which is non-trivial and requires many communication steps) before they can be published and mined into the chain.

    It seems to me your largest issue is usability. And I agree, there are many improvements possible for end users running a local bitcoin wallet. But the solution is supporting multiple wallets, improving efficiency, improving the user interface, and simplifying saving and restoring backups (tagged, for example, with the first block that would need to be scanned for incoming transactions), and so on. Adding an extra index may be viable right now, but I can't believe such a requirement will be necessary for the way bitcoin wallets will be used in the future.

    That said, I didn't say such an index is a bad idea; there are certainly uses (in particular when the node functions as a back-end for thin clients, or runs a large service). It's not a priority right now, but that can change. I just don't think it is what end users need now.

    legendary
    Activity: 1190
    Merit: 1004
    May 05, 2012, 04:31:52 PM
    #16
    What do people think about other wallet formats?
    vip
    Activity: 1386
    Merit: 1140
    The Casascius 1oz 10BTC Silver Round (w/ Gold B)
    May 05, 2012, 03:21:20 PM
    #15
    That requires the full blockchain (which is something that will get removed further and further from wallets in the future, IMHO), plus an additional index on top of it. Furthermore, it lacks any ability to store local data (address labels, accounts, comments, ...).

    I am not sure I see it the same way.

    I have proposed building an index on a set of data (the block chain) that is maintained by the client, separately from the wallet. The fact that, down the road, the set of data maintained by bitcoind may be reduced (a partial block chain) has nothing to do with whether an index can be built on it, especially when the index only references the portion of the block chain that is kept rather than discarded.

    Importantly, a properly implemented index can always be thrown away and rebuilt, so if you ever change your mind in the future as to what the index should look like, or will be making a major change to how much or how the blockchain data is kept, the new client version can simply dump the index and rebuild it upon installation.
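A sketch of the "throw away and rebuild" property, assuming a hypothetical on-disk index stored as JSON with a format-version number. If the stored version does not match what this client expects, or the file is missing or unreadable, the index is simply rebuilt from the data it was derived from. `load_or_rebuild`, `INDEX_VERSION`, and the `rebuild` callable are illustrative names, not part of any client.

```python
import json
import os

INDEX_VERSION = 2  # hypothetical; bumped whenever the index layout changes

def load_or_rebuild(path, rebuild):
    """Return the index stored at path, rebuilding it when stale or missing.

    `rebuild` is any callable that recomputes the index from the block chain.
    """
    try:
        with open(path) as f:
            stored = json.load(f)
        if stored.get("version") == INDEX_VERSION:
            return stored["entries"]
    except (OSError, ValueError, KeyError, AttributeError):
        pass  # missing, unparsable, or malformed index: rebuild it
    entries = rebuild()
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"version": INDEX_VERSION, "entries": entries}, f)
    os.replace(tmp, path)  # swap in the new index in one step
    return entries
```

Nothing in such an index is user data, so discarding it costs only the time to rebuild it.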

    Local data like address labels, accounts, and comments are fair game for the wallet file (the same way that, if I highlighted cells in yellow in Excel, the highlighting would be persisted in my .xls file when I saved it). They are the user's data, which is what a typical user would expect to be saved in a user file.

    I believe what you are looking for is Electrum or another lightweight client, which keeps all that data (and, I assume, fast rescan ability) on a server.

    Actually, given that I have no real unmet needs for the way I transact, I am not really looking for anything, other than to bridge the gap between the mindset of the developers and the mindset of the average user that will be downloading the client.  This way, new users have a greater likelihood of saying "Aha this is what I was looking for", rather than "WTF I don't understand".
    legendary
    Activity: 1072
    Merit: 1189
    May 05, 2012, 03:11:21 PM
    #14
    That requires the full blockchain (which is something that will get removed further and further from wallets in the future, IMHO), plus an additional index on top of it. Furthermore, it lacks any ability to store local data (address labels, accounts, comments, ...).

    I believe what you are looking for is Electrum or another lightweight client, which keeps all that data (and, I assume, fast rescan ability) on a server.

    I agree with you that we shouldn't need a lengthy rescan to switch wallets (AFAIK we don't, but switching is far from as easy as it should be), but the solution is adding multiple-wallet support to the client. Using the blockchain as your transaction store may sound fun, but I don't think it's a viable way for end users to work in the future.
    vip
    Activity: 1386
    Merit: 1140
    The Casascius 1oz 10BTC Silver Round (w/ Gold B)
    May 05, 2012, 03:07:22 PM
    #13
    I don't understand the request to have wallets not contain transactions. You need transactions to create spends, which is what a wallet is for.

    Satoshi's wallet design was clearly built with lightweight/SPV mode in mind, though the rest of it was never fully implemented. In that design you MUST store the transactions that are relevant to the keys in your wallet.

    In my mind, that need should be accommodated by the client maintaining an index that allows a rapid lookup of all of the transactions that are associated with any given hash160, directly from the block chain... the same way it already maintains an index of all unspent transactions.

    If it worked this way, it would be trivial for a user to close one wallet (File - Close) and open another (File - Open), the same way I might close one spreadsheet and open another. And commands like importprivkey would run nearly instantly (O(log n), to be specific).

    Right now, the idea that one must perform a lengthy "rescan" to switch to another wallet defies all common sense from the perspective of a typical user, and adds no useful benefit (except perhaps the non-consumption of the disk space that such an index would require).

    That index, by the way, doesn't have to be exhaustive to be effective... a hashtable index that had just 32 bits of key and 32 bits of block reference (at the expense of collisions causing occasional unnecessary reads of blocks) would still be highly useful for finding all relevant transactions given a set of addresses in a reasonable timeframe without being unduly large.
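The compact index proposed above can be sketched as a mapping from the first 32 bits of a hash160 to 32-bit block heights. Assumptions: blocks are modeled as a hypothetical dict of height to the list of hash160s touched by that block's transactions; a real client would extract these from scripts in its block files. Distinct addresses can share a 32-bit prefix, so a hit only names candidate blocks that must then be read and checked, while a miss is definitive.

```python
from collections import defaultdict

def prefix32(hash160: bytes) -> int:
    """First 4 bytes of the 20-byte hash160, as an unsigned 32-bit key."""
    return int.from_bytes(hash160[:4], "big")

def build_index(blocks):
    """blocks: {height: [hash160, ...]} -> {prefix32: sorted block heights}."""
    index = defaultdict(set)
    for height, hashes in blocks.items():
        for h in hashes:
            index[prefix32(h)].add(height)
    return {k: sorted(v) for k, v in index.items()}

def blocks_to_scan(index, wallet_hashes):
    """Candidate block heights for a set of addresses (may include false hits)."""
    heights = set()
    for h in wallet_hashes:
        heights.update(index.get(prefix32(h), ()))
    return sorted(heights)

def find_transactions(blocks, index, wallet_hashes):
    """Read only candidate blocks and keep exact matches, filtering collisions."""
    wanted = set(wallet_hashes)
    return {height: [h for h in blocks[height] if h in wanted]
            for height in blocks_to_scan(index, wallet_hashes)
            if any(h in wanted for h in blocks[height])}
```

An in-memory dict gives average constant-time lookups; an on-disk table sorted by the 32-bit key would give the O(log n) lookup mentioned above, at 8 bytes per entry.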
    legendary
    Activity: 1526
    Merit: 1134
    May 05, 2012, 02:28:31 PM
    #12
    I don't understand the request to have wallets not contain transactions. You need transactions to create spends, which is what a wallet is for.

    Satoshi's wallet design was clearly built with lightweight/SPV mode in mind, though the rest of it was never fully implemented. In that design you MUST store the transactions that are relevant to the keys in your wallet.