
Topic: A proposal for a scalable blockchain. (Read 6232 times)

hero member
Activity: 815
Merit: 1000
June 15, 2012, 10:08:19 AM
#54
I like this proposal and I think it would work.

However, to avoid miners printing money, I think the snapshots should cover more than 2 weeks: if someone had 51% for two weeks (not THAT hard to imagine) he could basically rewrite ledger history.

Maybe make it 5 YEARS: that keeps the chain from becoming infinite but is still quite safe, and most inputs older than that should have been spent, so the ledger becomes smaller too.

Win-win.


Anyone from this thread like my swarm proposal? (other thread)
hero member
Activity: 910
Merit: 1005
June 15, 2012, 06:51:27 AM
#53
Up to date test re-run:

Quote
Building Test Ledger
Ledger generation took 128s
Number of unspent outputs 1053033
Original disk size 1716.7MB
Compressed Ledger size 71.9706MB
4.19238% of the original size

Since the last test the blockchain has more than doubled in size but the ledger has only increased by 20MB.

In my opinion this is the only way that bitcoin will scale. Merkle tree pruning is a dead end because not even the centralised nodes holding the full blockchain will be able to scale.

Keep a variable number of recent blocks (default ~2 weeks); unspent outputs in blocks older than that are compressed into a ledger.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
Then I arrange a bunch of transactions under that key and unbalance your tree. Updates become O(N). Yuck.

I don't follow.  Or rather, I don't know why this is any more computationally inconvenient than a pure unspent-txout-list-merkle-tree.  Much of the tree would probably have to be recomputed anyway when you have hundreds of transactions removing and adding branches to the tree.  The only difference is that your main tree now only holds all "recipients" instead of all unspent outputs, and each recipient has its own sub-tree of unspent outputs.   Either way, you have N unspent outputs that have to be represented in the tree, and updating 300 of them on a new block still requires substantial recomputations.  

In this case, each op requires updating a subtree and then the master tree after that.  If all unspent outputs were concentrated behind a single recipient, then we're back to the original proposal of unspent-TxOut-list trees, which was the basis for this thread.

Unless your point has to do with the general feasibility of maintaining such a tree.  Then it's a fair point, but I'm simply building on the original proposal (and its flaws).  

If one argument against it was that the 90% space savings isn't worth the tremendous complexity changes to the client and protocol, then my question is "is it worth revisiting it knowing that a change of similar complexity could save you 90% and give nodes the ability to verify balances without downloading the whole tree?"


(And bumping zombie threads? double yuck!)

Would you have preferred I started a new thread?  Then you could complain about the dozens of threads already out there discussing this general topic.  Plus, this isn't a "bump"... it's a legitimate expansion of the exact, original idea proposed in this thread.



donator
Activity: 853
Merit: 1000
watching
staff
Activity: 4284
Merit: 8808
Has anyone considered this idea using address/P2SH hashes as the "key" of the tree instead of TxOuts/OutPoints?

Then I arrange a bunch of transactions under that key and unbalance your tree. Updates become O(N). Yuck.   (And bumping zombie threads? double yuck!)
donator
Activity: 1736
Merit: 1014
Let's talk governance, lipstick, and pigs.
Has anyone considered this idea using address/P2SH hashes as the "key" of the tree instead of TxOuts/OutPoints?   By doing this, lite-weight nodes could retrieve the entire unspent-TxOut-list of an address by downloading only a couple kilobytes but still have the same security of a full-node!   
So now you are a lite-weight node getting onto the network for the first time.
Oh, and you'd get the benefit of blockchain pruning, too.  A small bonus...

Someone had an epiphany.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
Has anyone considered this idea using address/P2SH hashes as the "key" of the tree instead of TxOuts/OutPoints?   By doing this, lite-weight nodes could retrieve the entire unspent-TxOut-list of an address by downloading only a couple kilobytes but still have the same security of a full-node!     

Like the OP's generic merkle-tree-of-unspent-TxOuts, this would provide the same level of compression but additionally enable lightweight nodes to verify address balances within seconds.  That's because the leaf nodes of the ledger tree are actually lists of unspent outputs, instead of individual outputs (organized in sub-merkle-trees).

For convenience, I will use the vague term "recipient" to refer to the pub-key-hash160, or a P2SH script.  I'm sure a scheme could be created that takes into account arbitrary TxOut scripts and still has the benefit of uniform distribution.  Here's how the "ledger" is created on a given block:

  • All nodes traverse the blockchain and collect unspent outputs (can be done on an already pruned tree to be updated)
  • The unspent outputs are sorted by their recipient, which can be done in linear time using bucket-sort (because the sorting keys are hashes which are uniformly distributed)
  • For each recipient all of its unspent outputs are put into a merkle sub-tree.
  • The sub-tree root for each individual recipient is put into a master merkle tree
  • The fingerprint of the blockchain on a given block is the master-merkle-tree root.

Therefore, each address/recipient is one leaf node of the master merkle tree, and the value of each leaf node is the merkle root of the unspent-TxOut-list for that recipient.
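
Here's a rough sketch of that construction, just to make the two-level structure concrete.  It's toy code: std::hash stands in for double-SHA256, plain strings stand in for serialized recipients and outputs, and none of these names exist in the actual client.

Code:
#include <functional>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for double-SHA256 so the sketch is self-contained.
static std::string hashStr(const std::string& data) {
    std::ostringstream out;
    out << std::hex << std::hash<std::string>()(data);
    return out.str();
}

// Collapse a layer of leaf hashes into a single merkle root
// (odd counts duplicate the last hash, like the block merkle tree).
static std::string merkleRoot(std::vector<std::string> layer) {
    if (layer.empty()) return hashStr("");
    while (layer.size() > 1) {
        if (layer.size() % 2 != 0) layer.push_back(layer.back());
        std::vector<std::string> next;
        for (size_t i = 0; i < layer.size(); i += 2)
            next.push_back(hashStr(layer[i] + layer[i + 1]));
        layer.swap(next);
    }
    return layer[0];
}

// recipient (hash160 or P2SH script) -> its unspent outputs, serialized.
// std::map keeps recipients sorted, which makes the tree deterministic.
typedef std::map<std::string, std::vector<std::string> > Ledger;

static std::string ledgerFingerprint(const Ledger& ledger) {
    std::vector<std::string> subRoots;
    for (Ledger::const_iterator it = ledger.begin(); it != ledger.end(); ++it) {
        std::vector<std::string> leaves;
        for (size_t i = 0; i < it->second.size(); ++i)
            leaves.push_back(hashStr(it->second[i]));
        subRoots.push_back(merkleRoot(leaves));   // one sub-tree root per recipient
    }
    return merkleRoot(subRoots);                  // master tree over the sub-tree roots
}

int main() {
    Ledger ledger;
    ledger["recipient-A"].push_back("txid1:0 -> 50 BTC");
    ledger["recipient-A"].push_back("txid3:2 -> 1 BTC");
    ledger["recipient-B"].push_back("txid2:1 -> 10 BTC");
    std::cout << "ledger fingerprint: " << ledgerFingerprint(ledger) << std::endl;
    return 0;
}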

There's plenty of discussion already about how this fingerprint is distributed, so we assume that all full nodes have it and agree on it for each recipient.  So now you are a lite-weight node getting onto the network for the first time.  You have a list of addresses ("recipients") to collect balances for.  

  • You download the headers (80 bytes each) plus the ledger-fingerprint at each block (32 bytes each).  Compute the longest chain.
  • For each address, you query your peers with the recipient string.
  • Each peer responds with the sub-merkle-root of that recipient, along with the entire merkle branch -- which should be 2*log(N) hashes, where N is the number of "recipients" in the blockchain with balance > 0.  If the blockchain has 100,000,000 addresses, this is about 2 kB worth of data.
  • You verify the merkle branch against the ledger fingerprint for this block.  You now know that the sub-merkle-root is valid.
  • You request the sub-merkle tree from your peers (which is just the unspent TxOut list for that address).
  • You compute the merkle-root of the unspent TxOuts and verify it matches the sub-merkle-root previously verified.

As long as the ledger-fingerprint (which is just an enormous-merkle-tree root) is somehow included in the protocol, then lite-clients could import addresses and verify balances for only a couple kilobytes and with the same security as downloading the whole blockchain!  
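
To make the branch-checking steps concrete, here's roughly what the lite node computes.  Again toy code: the same stand-in hash as the sketch above, and a made-up BranchNode type for the sibling list a peer would send.

Code:
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for double-SHA256, as in the earlier sketch.
static std::string hashStr(const std::string& data) {
    std::ostringstream out;
    out << std::hex << std::hash<std::string>()(data);
    return out.str();
}

struct BranchNode {
    std::string sibling;   // hash of the sibling node at this level
    bool siblingOnLeft;    // true if the sibling is the left-hand child
};

// Re-hash from the leaf up to the root, pairing with each sibling in order,
// and check that the result equals the fingerprint taken from the block.
bool verifyMerkleBranch(const std::string& leafHash,
                        const std::vector<BranchNode>& branch,
                        const std::string& trustedFingerprint) {
    std::string running = leafHash;
    for (size_t i = 0; i < branch.size(); ++i)
        running = branch[i].siblingOnLeft ? hashStr(branch[i].sibling + running)
                                          : hashStr(running + branch[i].sibling);
    return running == trustedFingerprint;
}

The same idea covers both checks above: the recipient's sub-merkle-root is verified against the ledger fingerprint, and the downloaded unspent-TxOut list is then hashed up in full and compared to that sub-root.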

Oh, and you'd get the benefit of blockchain pruning, too.  A small bonus...
staff
Activity: 4284
Merit: 8808
December 07, 2011, 09:24:31 AM
#47
If a miner did this, how would they verify all transactions? And if they don't verify transactions, how will they know what to include in blocks? If they include bad transactions in blocks then the block is bad and they just wasted a whole lot of time and electricity.

They don't include transactions they can't verify.  Their only risk is that someone else wasted a lot of time and electricity in the prior block they received to make them waste a lot of time and electricity.

It's also possible to do the opposite of what bytecoin suggests, e.g. _don't_ sort the tree by value; sort it by txn hash, so that you can't escape the space requirement of small transactions easily. (If you prune them out of the ledger you'll have to keep their hashes in order to be able to update the tree, so the space savings isn't great.)   

In any case if the ledger is committed the pruning is less bad because someone who wants to spend a pruned input could provide the tree fragment themselves... though it's not clear to me how a miner could add new outputs to a size sorted tree for a size they weren't maintaining.

Another random point is that where the ledger is deterministic the bigger issue is attackers intentionally creating unbalanced trees, so you'd have to penalize txn outs which would attach at ugly (too deep) points.

hero member
Activity: 714
Merit: 500
December 06, 2011, 04:44:03 AM
#46
Gavin says CPU is the bottleneck right now,
but recently I downloaded the whole blockchain again (it took almost a whole day), and I saw the hard-disk light on the whole time while CPU usage was only 20%~30%.

Maybe some reorganization of blk0001.dat would help?

hero member
Activity: 910
Merit: 1005
December 01, 2011, 10:43:55 AM
#45
Splendid! Would you be so kind as to rerun the calculations while discarding TxOuts with small BTC values, please?
Let's say anything below 0.001 can be discarded.
If you're feeling keen could you please plot a cumulative distribution of TxOut value? Possibly on a log value scale?
I have a strong feeling that the vast majority of the ledger will be taken up with very small TxOuts, many of them vanity addresses.

The fact that a ledger system seems to result in reclaiming 90% of the disc space is encouraging. Now that many bitcoins have been exchanged, the space saving results of a ledger system are more obvious than when I first proposed it over a year ago.

The fees system could be reworked to provide incentives for keeping the ledger small rather than the blockchain. Some transactions would result in ledger size decreasing and could perhaps be encouraged by refunding some previously paid fee.

As Gavin implies, all nodes would have to verify ledger integrity in the same way that block integrity is verified. If this were implemented, a couple of extra optimizations should accompany it: the transaction signature shouldn't contribute to the hash and the signature should not be recorded in the block or ledger. This would result in further considerable space savings and pave the way for enabling transaction replacement and other advanced contract features.

Note that if the ledger is implemented as a hash tree and incoming transactions are incorporated into the tree according to a suitable algorithm then when a new block is found, each client can recalculate the ledger independently and hence the whole ledger need only be downloaded once.

ByteCoin

Here's the output when filtering out all outputs with value <= 0.01 BTC:

Quote
Building Test Ledger
Ledger generation took 128s
Original disk size 750.153MB
Number of unspent outputs 733392 Ledger size 50.6512MB
6.75211% of the original size

To be honest I would have thought the saving would be greater. Regardless, unless the official position is that bitcoin won't support transactions under 0.01 BTC, you couldn't filter them. The majority of the space is taken up by transaction hashes and scripts, so if there were a good way to compress these then you could reduce the size equally well.

One possible scheme would be:
Order the ledger by transaction hash in ascending order. Each hash is recorded as a var_int encoding the difference from the previous hash.
Scripts are then stored at the end of the ledger, with duplicate scripts being written only once. The CTxOut then points to a script index instead of having the script written inline.
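
Something like this, as a toy illustration: 64-bit integers stand in for the 256-bit transaction hashes (real txids would need multi-word subtraction), the var_int is the usual CompactSize encoding, and the exact field order beyond hash/script is made up.

Code:
#include <stdint.h>
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// CompactSize-style var_int (little-endian payload).
static void writeVarInt(std::vector<unsigned char>& out, uint64_t v) {
    int bytes;
    if (v < 0xfd)                { out.push_back((unsigned char)v); return; }
    else if (v <= 0xffff)        { out.push_back(0xfd); bytes = 2; }
    else if (v <= 0xffffffffULL) { out.push_back(0xfe); bytes = 4; }
    else                         { out.push_back(0xff); bytes = 8; }
    for (int i = 0; i < bytes; ++i)
        out.push_back((unsigned char)((v >> (8 * i)) & 0xff));
}

struct LedgerOutput {
    uint64_t txid;       // stand-in for the 256-bit transaction hash
    uint32_t vout;       // output index within that transaction
    uint64_t nValue;     // value in satoshis
    std::string script;  // scriptPubKey bytes
};

static bool byTxid(const LedgerOutput& a, const LedgerOutput& b) { return a.txid < b.txid; }

std::vector<unsigned char> serializeLedger(std::vector<LedgerOutput> outs) {
    std::sort(outs.begin(), outs.end(), byTxid);         // ascending hash order

    std::vector<unsigned char> body;
    std::vector<std::string> scriptTable;                // each distinct script stored once
    std::map<std::string, uint64_t> scriptIndex;         // script bytes -> table index

    uint64_t prev = 0;
    for (size_t i = 0; i < outs.size(); ++i) {
        writeVarInt(body, outs[i].txid - prev);          // hash stored as delta from the previous one
        prev = outs[i].txid;
        writeVarInt(body, outs[i].vout);
        writeVarInt(body, outs[i].nValue);
        std::map<std::string, uint64_t>::iterator it = scriptIndex.find(outs[i].script);
        if (it == scriptIndex.end()) {
            it = scriptIndex.insert(std::make_pair(outs[i].script, (uint64_t)scriptTable.size())).first;
            scriptTable.push_back(outs[i].script);
        }
        writeVarInt(body, it->second);                   // output references a script index, not inline bytes
    }
    writeVarInt(body, scriptTable.size());               // deduplicated scripts appended once at the end
    for (size_t i = 0; i < scriptTable.size(); ++i) {
        writeVarInt(body, scriptTable[i].size());
        body.insert(body.end(), scriptTable[i].begin(), scriptTable[i].end());
    }
    return body;
}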

I will play around with some different formats.

Adjusting fees based on the ledger size effect is an excellent idea.


full member
Activity: 385
Merit: 110
December 01, 2011, 07:58:57 AM
#44
This is against the idea of bitcoin.

Since there is a limited supply of coins, at least in theory, some day a bitcoin could be worth 1000 dollars.

Thus 0.001 could be worth 1 dollar.

Do you really want to throw away dollars like that? I don't think so.
sr. member
Activity: 249
Merit: 251
November 30, 2011, 11:11:42 PM
#43
On the other hand, similar pressures exist under the current system if block chain pruning were implemented. There would be a temptation to prune small unspent transactions to free up disk space.
If a miner did this, how would they verify all transactions? And if they don't verify transactions, how will they know what to include in blocks? If they include bad transactions in blocks then the block is bad and they just wasted a whole lot of time and electricity.
sr. member
Activity: 416
Merit: 277
November 30, 2011, 10:13:42 PM
#42
Interesting idea!

somebody should hack together a client or miner that uses a ledger system; it could refuse to relay or include any transactions with inputs smaller than 0.001 BTC, so it only needed a truncated ledger to create new blocks

Quite so. If the transactions were sorted in the merkle tree according to value, then a client (or miner) not interested in fiddling small change could stub off whole chunks of the tree. I agree that "There Doesn't Have To Be One Way To Do It" but one has to be careful - going too far down that route can result in TxOuts being shunned. If a majority of miners don't relay and don't accept transactions using the tiny TxOuts then such transactions take a long time to confirm - if at all.

I'm in two minds as to whether this scenario is natural optimisation at work or a bad thing.

On the other hand, similar pressures exist under the current system if block chain pruning were implemented. There would be a temptation to prune small unspent transactions to free up disk space.

ByteCoin
legendary
Activity: 1652
Merit: 2311
Chief Scientist
November 30, 2011, 10:01:12 PM
#41
Splendid! Would you be so kind as to rerun the calculations while discarding TxOuts with small BTC values, please?
Let's say anything below 0.001 can be discarded.

Interesting idea!

Thinking out loud...  There Doesn't Have To Be One Way To Do It.  Piuk (or somebody) should hack together a client or miner that uses a ledger system; it could refuse to relay or include any transactions with inputs smaller than 0.001 BTC, so it only needed a truncated ledger to create new blocks (if it is a miner, maybe it connects to a trusted 'traditional' bitcoin node to make sure it only builds on valid blocks which might contain tiny inputs).
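
Roughly this sort of check, just to illustrate the policy; none of this exists in any client, and the types and the 0.001 BTC threshold are placeholders.

Code:
#include <stdint.h>
#include <map>
#include <vector>

// Placeholder types; the real client would use COutPoint / CTxIn / CTransaction.
struct OutPointKey {
    uint64_t txid;   // truncated stand-in for the 256-bit hash
    uint32_t n;
    bool operator<(const OutPointKey& o) const {
        return txid != o.txid ? txid < o.txid : n < o.n;
    }
};
struct TxIn { OutPointKey prevout; };
struct Tx   { std::vector<TxIn> vin; };

static const int64_t MIN_INPUT_VALUE = 100000;  // 0.001 BTC in satoshis

// ledger: unspent outpoint -> value, already truncated to outputs >= MIN_INPUT_VALUE.
// A transaction is relayed/mined only if every input is known and above the threshold.
bool acceptForRelayOrMining(const Tx& tx, const std::map<OutPointKey, int64_t>& ledger) {
    for (size_t i = 0; i < tx.vin.size(); ++i) {
        std::map<OutPointKey, int64_t>::const_iterator it = ledger.find(tx.vin[i].prevout);
        if (it == ledger.end() || it->second < MIN_INPUT_VALUE)
            return false;   // input missing from the truncated ledger, or too small
    }
    return true;
}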

hero member
Activity: 714
Merit: 500
November 30, 2011, 09:52:27 PM
#40
Scalability is a problem. Any proposal should be welcome.
sr. member
Activity: 416
Merit: 277
November 30, 2011, 09:30:15 PM
#39
Building Test Ledger
...
10.5081% of the original size

Splendid! Would you be so kind as to rerun the calculations while discarding TxOuts with small BTC values, please?
Let's say anything below 0.001 can be discarded.
If you're feeling keen could you please plot a cumulative distribution of TxOut value? Possibly on a log value scale?
I have a strong feeling that the vast majority of the ledger will be taken up with very small TxOuts, many of them vanity addresses.

The fact that a ledger system seems to result in reclaiming 90% of the disc space is encouraging. Now that many bitcoins have been exchanged, the space saving results of a ledger system are more obvious than when I first proposed it over a year ago.

The fees system could be reworked to provide incentives for keeping the ledger small rather than the blockchain. Some transactions would result in ledger size decreasing and could perhaps be encouraged by refunding some previously paid fee.

As Gavin implies, all nodes would have to verify ledger integrity in the same way that block integrity is verified. If this were implemented, a couple of extra optimizations should accompany it: the transaction signature shouldn't contribute to the hash and the signature should not be recorded in the block or ledger. This would result in further considerable space savings and pave the way for enabling transaction replacement and other advanced contract features.

Note that if the ledger is implemented as a hash tree and incoming transactions are incorporated into the tree according to a suitable algorithm then when a new block is found, each client can recalculate the ledger independently and hence the whole ledger need only be downloaded once.

ByteCoin
hero member
Activity: 910
Merit: 1005
November 30, 2011, 03:45:52 PM
#38
If you want to prove me wrong, go and calculate the savings that this would currently provide, along with the savings of block pruning. Bonus points if you model out, simulate, and calculate this on projected future growth.

I wrote some test code to produce a vector of all unspent outputs (Basically the ledger).

Quote
#include <functional>
#include <list>
#include <set>
#include <utility>
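// linked_map: a map that remembers insertion order.  Values live in a std::list,
// and a std::set of list iterators (ordered by key) provides O(log N) lookup.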
template < typename KeyType, typename MappedType,
typename Comp = std::less< KeyType > >
struct linked_map {
    typedef KeyType key_type;
    typedef MappedType mapped_type;
    typedef std::pair< const key_type, mapped_type > value_type;
private:
    typedef std::list< value_type >      list_type;
    typedef typename list_type::iterator list_iterator;
    struct compare_keys {
        Comp the_order;
        compare_keys ( Comp o )
        : the_order ( o )
        {}
        bool operator() ( list_iterator lhs, list_iterator rhs ) const {
            return ( the_order( lhs->first, rhs->first ) );
        }
    };
    typedef std::set< list_iterator, compare_keys > set_type;
    typedef typename set_type::iterator             set_iterator;
    list_type the_list;
    set_type  the_set;
public:
    typedef list_iterator iterator;
    typedef typename set_type::size_type size_type;
    linked_map ( Comp o = Comp() )
    : the_list()
    , the_set ( compare_keys( o ) )
    {}
    iterator find ( key_type const & key ) {
        value_type dummy_value ( key, mapped_type() );
        list_type  dummy_list;
        dummy_list.push_back( dummy_value );
        set_iterator where = the_set.find( dummy_list.begin() );
        if ( where == the_set.end() ) {
            return ( the_list.end() );
        }
        return ( *where );
    }
    iterator insert ( value_type const & value ) {
        list_type dummy;
        dummy.push_back( value );
        set_iterator where = the_set.find( dummy.begin() );
        if ( where == the_set.end() ) {
            the_list.push_back( value );
            list_iterator pos = the_list.end();
            -- pos;
            the_set.insert( pos );
            return ( pos );
        } else {
            (*where)->second = value.second;
            return ( *where );
        }
    }
    iterator erase ( iterator where ) {
        the_set.erase( where );
        return ( the_list.erase( where ) );
    }
    iterator begin ( void ) {
        return ( the_list.begin() );
    }
    iterator end ( void ) {
        return ( the_list.end() );
    }
    size_type size ( void ) const {
        return ( the_set.size() );
    }
    mapped_type & operator[] ( key_type const & key ) {
        iterator pos = insert( std::make_pair( key, mapped_type() ) );
        return ( pos->second );
    }
};

void buildTestLedger() {
    
    cout << "Building Test Ledger" << endl;
    
    float start = time(NULL);

    vector<pair<int, CBlockIndex*> > vSortedByHeight;
    vSortedByHeight.reserve(mapBlockIndex.size());
    BOOST_FOREACH(const PAIRTYPE(uint256, CBlockIndex*)& item, mapBlockIndex)
    {
        CBlockIndex* pindex = item.second;
        vSortedByHeight.push_back(make_pair(pindex->nHeight, pindex));
    }
    sort(vSortedByHeight.begin(), vSortedByHeight.end());
    
    linked_map< COutPoint, CTxOut > unspent;
    
    long originalSize = 0;
    
    BOOST_FOREACH(const PAIRTYPE(int, CBlockIndex*)& item, vSortedByHeight)
    {
        CBlockIndex* pindex = item.second;

        CBlock block;
        
        block.ReadFromDisk(pindex);
      
        originalSize += GetSerializeSize(block, SER_DISK);
        
        BOOST_FOREACH(CTransaction & tx, block.vtx) {
            
            //Check each input and remove spent
            BOOST_FOREACH(CTxIn & in, tx.vin) {
               linked_map< COutPoint, CTxOut >::iterator it = unspent.find(in.prevout);
                
                if (it != unspent.end()) {
                    unspent.erase(it);
                }
            }
            
            int ii = 0;
            
            //Add each output to unspent
            BOOST_FOREACH(CTxOut & out, tx.vout) {
                COutPoint point(tx.GetHash(), ii);
                
                linked_map< COutPoint, CTxOut >::iterator it = unspent.insert(make_pair(point, out));
                
                ++ii;
            }
        }
    }
        
    //Here you would write the ledger to disk
    
    float end = time(NULL);
    
    long ledgerSize = unspent.size() * (sizeof(COutPoint) + sizeof(CTxOut));
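    // (Rough estimate: sizeof(CTxOut) covers only the fixed-size part of the struct;
    //  script bytes held by the CScript live on the heap and aren't counted here.)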

    cout << "Ledger generation took " << end - start << "s" << endl;

    cout << "Original disk size " << originalSize / 1024.f / 1024.f << "MB" << endl;

    cout << "Number of unspent outputs " << unspent.size() <<  " Ledger size " << ledgerSize / 1024.f / 1024.f << "MB" << endl;

    cout << (ledgerSize / (double)originalSize) * 100 << "% of the original size" << endl;
}


Sample output:

Quote
Building Test Ledger
Ledger generation took 128s
Original disk size 748.074MB
Number of unspent outputs 1212159 Ledger size 78.6083MB
10.5081% of the original size

+ You would need to hold a certain number of recent blocks, at least until all chain forks can be resolved, or 2016 blocks to calculate the difficulty target. You're probably looking at an 85% reduction in disk size and the same in block validation time. I wouldn't even know where to begin calculating an estimate for Merkle pruning; would you have to download the full Merkle trees?

Edit: Come to think of it, assuming the requirement of a correct balance ledger was enforced, you don't even need the latest full blocks. You could take the latest balance ledger from any chain head and download only the block headers to verify the proof of work. That way you're looking at more than an 85% reduction in chain size.
legendary
Activity: 1204
Merit: 1015
November 30, 2011, 12:37:32 PM
#37
The most obvious way to prevent this from being exploited is by incorporating it directly into the protocol as a requirement for block acceptance. Of course, this would initially require 50% of the miners in order to work, but that's standard procedure for adding new requirements to the protocol. It also might only need to be included in certain blocks, such as every difficulty change. However, that might make miners lazy and just not check the hash when it comes up, so it'd have to be included fairly commonly - likely every block. As for how it'd be checked, that's simple: it'd be independently calculated by everyone.

That being said, I don't really see what this would accomplish. The difference between this and block pruning would likely be small. People bootstrapping with this ledger would still need to at least check the block headers all the way back to the last checkpoint in order to verify that they are actually on the main chain and not a fake chain that is only a few thousand blocks long and isn't actually attached to the main chain. The savings just aren't enough to justify this large overhead.

If you want to prove me wrong, go and calculate the savings that this would currently provide, along with the savings of block pruning. Bonus points if you model out, simulate, and calculate this on projected future growth.

There are only two major benefits I see to this ledger proposal:
1) A client doesn't need to hold on to the merkle branch of all their unspent transactions to ensure that they will always be able to spend their coins, even if the miners don't (easily) have access to old merkle trees.
2) It is not possible for clients to provide over-pruned merkle trees (where they prune even unspent transactions) to new clients attempting to bootstrap.

On the plus side, this could be used as a low-certainty check (where this hash isn't required to be in blocks, nor is it required to be checked) so that new clients know that they have all of the unspent transactions and didn't receive over-pruned blocks. But at that point, it might just be easier to request this hash outside of the chain from trusted peers/multiple untrusted peers.
legendary
Activity: 1652
Merit: 2311
Chief Scientist
November 30, 2011, 11:29:59 AM
#36
A solution to this problem would be to have miners include the hash of the previous ledger and the block height they believe to be correct.
So go implement it and see how well it works.

Create a little HTTP-based protocol with, oh, three methods:
  • You send a block height or block hash, you get back a ledger hash.
  • You send a ledger hash, you get back the full ledger or an "I have no idea what you're talking about, that's not a valid ledger hash".
  • You send two ledger hashes, you get back the changes from one to the other or an "I have no idea what you're talking about, one of those isn't a valid ledger hash".
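
Something along these lines, skipping all the HTTP plumbing and treating hashes and ledgers as opaque strings (every name here is made up):

Code:
#include <map>
#include <string>

// Toy "ledger server" state; a real one would sit behind the three HTTP methods above.
struct LedgerServer {
    std::map<std::string, std::string> hashByBlock;   // block hash or height -> ledger hash
    std::map<std::string, std::string> ledgerByHash;  // ledger hash -> serialized ledger

    // 1. block height/hash in, ledger hash out
    std::string getLedgerHash(const std::string& blockId) const {
        std::map<std::string, std::string>::const_iterator it = hashByBlock.find(blockId);
        return it == hashByBlock.end() ? std::string("error: unknown block") : it->second;
    }
    // 2. ledger hash in, full ledger out
    std::string getFullLedger(const std::string& ledgerHash) const {
        std::map<std::string, std::string>::const_iterator it = ledgerByHash.find(ledgerHash);
        return it == ledgerByHash.end() ? std::string("error: not a valid ledger hash") : it->second;
    }
    // 3. two ledger hashes in, the changes from one to the other out
    std::string getLedgerDiff(const std::string& fromHash, const std::string& toHash) const {
        if (!ledgerByHash.count(fromHash) || !ledgerByHash.count(toHash))
            return "error: one of those isn't a valid ledger hash";
        return computeDiff(ledgerByHash.find(fromHash)->second, ledgerByHash.find(toHash)->second);
    }
    // Placeholder: a real server would emit the outputs added and removed between the two ledgers.
    static std::string computeDiff(const std::string&, const std::string&) { return "<diff>"; }
};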

Then you just need to convince a bunch of semi-trustworthy people to run "ledger servers." And maybe have some mechanism for reporting when a ledger server has a bug or 'goes rogue' and reports a ledger hash that is different from everybody else.

Oh, and you might need to solve the incentive problem of "why would I run a ledger server if I'm not paid for it" (and maybe write a bunch of denial-of-service-prevention code in case some jerk with a botnet decides to ask for 10,000 full ledgers from 10,000 different IP addresses).
hero member
Activity: 910
Merit: 1005
November 30, 2011, 11:17:17 AM
#35
I agree that there are better optimisations that can be implemented first but I still think it might be worth discussing for the future. Can you see any obvious flaws in the proposal Gavin?
Ummm, yes.

It seems to me miners will have an incentive to lie about the transaction ledger, and put fake ledger hashes in their blocks. Either so their transactions might be considered 'unspent' by unsuspecting nodes that trust them, or so that other miners that don't have the full block chain create invalid blocks (eliminate the competition!)

And I don't see a proposal that everybody check the ledger and reject blocks that contain invalid ledger hashes.

I also don't see what the ledger hash accomplishes.  If you're going to trust some other node's version of unspent-transaction-reality, then you could just ask "send me the ledger state before (or after) the block with THIS block hash".

But if you're going to trust one or more nodes anyway... then it seems to me sending an ever-increasing-in-size ledger is a bad way to get scalable. If size-of-full-blockchain becomes a problem before the mining pools and big exchanges/merchants/transactions processors all have transaction processing clusters with a terabyte of ram and petabyte hard drive array then I think extending the protocol to make it easy to request all transactions involved in a given Merkle branch will probably be the way to go.

But before then I expect the bitcoin network will look very different from the way it looks today, and I expect there will be several different solutions for how to scale up. If (when!) Bitcoin gets that successful, there will be serious money hiring the same smart people who figured out how to scale up PayPal and Visa.

It would be much simpler if enforcing a correct ledger hash could be done, but it would make adoption much more difficult as it would require 50% of all miners to upgrade at once.

A solution to this problem would be to have miners include the hash of the previous ledger and the block height they believe to be correct. During the initial blockchain download the client would continue to download blocks until at least one ledger can be agreed upon. Some kind of decaying time window would need to be implemented so that if the majority of hashing power is agreed on one "ledger chain" a minority of clients cannot force a full blockchain download.

You don't have to trust any node's version of unspent reality; you have to trust 50% of the hashing power's version of unspent reality - something which you kind of have to do anyway. Although the consequences of a malicious entity gaining 50% of the hashing power would be more severe (they would need to hold 50% for a long time).

Skybuck: I'm not sure we're on the same page. I wasn't really talking about a separate balance chain. The problem with a chain is that it's a chain, so it grows indefinitely; thus block validation time, disk space, and bandwidth requirements all grow indefinitely. At some point you should be able to look into the past, say these transactions are no longer under contention, and archive/compress them.