
Topic: Increasing transaction speed with the Data Sharding Architecture? (Read 173 times)

member
Activity: 78
Merit: 28
Please do read up more on how sharding could be applicable. Good resources to look into would be articles by Vitalik Buterin, danksharding, and similar material. These are complex topics and I wouldn't be able to give you as thorough an explanation. Feel free to cite and raise any issues afterwards.

Could you please define what "blockchain sharding" is, based on those good resources? And could you answer the topic starter's question:

If data sharding is already on the blockchain, then how do we know that it is already implemented? Is there any example of such an implementation to relieve the blockchain's burden?

member
Activity: 78
Merit: 28
If data sharding is already on the blockchain, then how do we know that it is already implemented? Is there any example of such an implementation to relieve the blockchain's burden?

What you call "data sharding" has been already implemented in Bitcoin from the day one. In Bitcoin, transactions within one block often could be processed in parallel by a node.

Yes, but there's a limit to how effective the concurrency can be, because of Amdahl's Law.

It basically says that the overall speedup of a program is limited by the fraction of its runtime that cannot be parallelized.

So in this case the methods that verify the transactions inside the block can work in parallel, but there is a host of other things inside that process that can't be sped up:

- Time spent obtaining the block from a peer (even if multiple peers are queried at the same time, the code running the block retrieval runs at least once)
- Writing the block and chainstate to LevelDB
- For each UTXO spent in the block, checking against the UTXO set that it has indeed not been spent already
- All of the checks involved in the CheckBlock and CheckBlockHeader functions must ultimately be performed one way or another, so there will always be a minimum resource consumption associated with them.

Sharding is not going to make any of this go faster if all of this work is already running in parallel, even assuming the system theoretically had enough threads to dedicate one to each transaction in the block.

Yes. Notice that your argument highlights the fact that a blockchain network != a database.
All these points are relevant to blockchain networks. Databases don't necessarily have those steps and bottlenecks. Although a blockchain network has a database under the hood, that database is not the whole picture. Some people oversimplify a blockchain network into a database that could be optimised by sharding; this viewpoint is inaccurate.

If one step of the process gets optimised, it doesn't mean the whole process gets optimised, especially if the optimisation of that part comes at the cost of other components.
legendary
Activity: 3038
Merit: 4418
Crypto Swap Exchange
You mean Amdahl's Law does NOT apply to the blockchain?

If yes, then there may be a possibility to increase the number of transactions, or perhaps to speed up processing and reduce the network difficulty so that we can apply more hashing power, indirectly producing more confirmations.

Am I right to think this way, or is this assumption just headed for the south pole? Your statement and what NotATether has referred to seem to contradict each other quite a lot.

Introducing latency by adding a data sharding architecture is like setting up roadblocks intentionally. That follows from Amdahl's Law as explained by NotATether. On the other hand, other members make different assumptions about the sharding architecture.

It seems the concept is either a mismatch or it has not been properly understood.
You have to read and understand the law yourself. It describes the limits of speedup in parallel computing and the like, which is very different from sharding and how we might apply it to Bitcoin. You can think of it as applying to IBD, where we improved the speedup through parallelization.

Confirmation or block intervals are NOT related to mining or anything similar. They are purely a design choice and can be changed, albeit with tradeoffs. You have to understand how sharding on a blockchain really works to be able to understand what we are getting at and what we have to address before even thinking about adopting it.

Please do read up more on how sharding could be applicable. Good resources to look into would be articles by Vitalik Buterin, danksharding, and similar material. These are complex topics and I wouldn't be able to give you as thorough an explanation. Feel free to cite and raise any issues afterwards.
full member
Activity: 1092
Merit: 227
Yes, but there's a limit to how effective the concurrency can be, because of Amdahl's Law.

It basically says that the overall speedup of a program is limited by the fraction of its runtime that cannot be parallelized.

So in this case the methods that verify the transactions inside the block can work in parallel, but there is a host of other things inside that process that can't be sped up:

- Time spent obtaining the block from a peer (even if multiple peers are queried at the same time, the code running the block retrieval runs at least once)
- Writing the block and chainstate to LevelDB
- For each UTXO spent in the block, checking against the UTXO set that it has indeed not been spent already
- All of the checks involved in the CheckBlock and CheckBlockHeader functions must ultimately be performed one way or another, so there will always be a minimum resource consumption associated with them.
That's not entirely the concept behind sharding when we are talking about blockchains.

Yes, you are right if you want Bitcoin to work as is, but retrieving every block individually and checking all of them is computationally expensive and doesn't scale linearly. With sharding, however, nodes are each assigned groups of shards and store only those. Fraud proofs, ZK-SNARKs and other detection mechanisms act as counters to penalize rogue actors within the system. If you were to use random sampling to check the correctness of the blocks, you could probably ascertain that the block data is available to the network and consistent with the fraud proofs. However, you still need to evaluate the efficacy of those measures.

The problem doesn't lie with the parallelization itself, but rather with the tradeoff against security. Bitcoin doesn't exactly need to make that decision as of now.

You mean Amdahl's Law does NOT apply to the blockchain?

If yes, then there may be a possibility to increase the number of transactions, or perhaps to speed up processing and reduce the network difficulty so that we can apply more hashing power, indirectly producing more confirmations.

Am I right to think this way, or is this assumption just headed for the south pole? Your statement and what NotATether has referred to seem to contradict each other quite a lot.

Introducing latency by adding a data sharding architecture is like setting up roadblocks intentionally. That follows from Amdahl's Law as explained by NotATether. On the other hand, other members make different assumptions about the sharding architecture.

It seems the concept is either a mismatch or it has not been properly understood.
legendary
Activity: 3038
Merit: 4418
Crypto Swap Exchange
Yes, but there's a limit to how effective the concurrency can be, because of Amdahl's Law.

It basically says that the overall speedup of a program is limited by the fraction of its runtime that cannot be parallelized.

So in this case the methods that verify the transactions inside the block can work in parallel, but there is a host of other things inside that process that can't be sped up:

- Time spent obtaining the block from a peer (even if multiple peers are queried at the same time, the code running the block retrieval runs at least once)
- Writing the block and chainstate to LevelDB
- For each UTXO spent in the block, checking against the UTXO set that it has indeed not been spent already
- All of the checks involved in the CheckBlock and CheckBlockHeader functions must ultimately be performed one way or another, so there will always be a minimum resource consumption associated with them.

Sharding is not going to make any of this go faster if all of this work is already running in parallel, even assuming the system theoretically had enough threads to dedicate one to each transaction in the block.
That's not entirely the concept behind sharding when we are talking about blockchains.

Yes, you are right if you want Bitcoin to work as is, but retrieving every block individually and checking all of them is computationally expensive and doesn't scale linearly. With sharding, however, nodes are each assigned groups of shards and store only those. Fraud proofs, ZK-SNARKs and other detection mechanisms act as counters to penalize rogue actors within the system. If you were to use random sampling to check the correctness of the blocks, you could probably ascertain that the block data is available to the network and consistent with the fraud proofs. However, you still need to evaluate the efficacy of those measures.

The problem doesn't lie with the parallelization itself, but rather with the tradeoff against security. Bitcoin doesn't exactly need to make that decision as of now.
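To make the random-sampling idea concrete, here is a toy Monte Carlo sketch of data availability sampling. It is not any real protocol; the chunk count, number of withheld chunks and sample size are all made up for illustration.

Code:
# Toy model of data availability sampling (not any real protocol's parameters).
# A block is split into N chunks; a malicious publisher withholds a few of them.
# A light client samples k random chunk indices and asks the network for each one;
# if any sampled chunk cannot be retrieved, the client rejects the block.
import random

N_CHUNKS = 256      # assumed chunks per block (made up)
WITHHELD = 8        # chunks the publisher secretly withholds (made up)
SAMPLES = 30        # random chunks each light client requests (made up)
TRIALS = 20_000     # Monte Carlo trials to estimate detection probability

def client_detects_withholding(missing):
    """Return True if at least one sampled chunk turns out to be unavailable."""
    picks = random.sample(range(N_CHUNKS), SAMPLES)
    return any(i in missing for i in picks)

missing = set(random.sample(range(N_CHUNKS), WITHHELD))
detected = sum(client_detects_withholding(missing) for _ in range(TRIALS))
print(f"estimated detection probability: {detected / TRIALS:.3f}")
# Analytically this is 1 - C(N-W, k) / C(N, k); with many independent clients
# sampling, the probability that withholding goes unnoticed drops toward zero.

Real designs also erasure-code the block so that withholding even a small fraction makes it unreconstructable, which is what gives the sampling argument its teeth; that detail is omitted here.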
full member
Activity: 1092
Merit: 227
If data sharding is already on the blockchain, then how do we know that it is already implemented? Is there any example of such an implementation to relieve the blockchain's burden?

What you call "data sharding" has been already implemented in Bitcoin from the day one. In Bitcoin, transactions within one block often could be processed in parallel by a node.

Yes, but there's a limit to how effective the concurrency can be, because of Amdahl's Law.

[snip]
Sharding is not going to make any of this go faster if all of this work is already running in parallel, even assuming the system theoretically had enough threads to dedicate one to each transaction in the block.

Do you mean that if there are segregated chunks of nodes, then a computer solving the hash that is about to be confirmed would need to aggregate all the chunks first and only then solve it, because without that the information would be incomplete?

So if we apply Amdahl's law in that way, it could lead to slower processing or added latency in the calculation. Thus, if traditional block-solving speed is compared against a sharded data set, the traditional approach might just win.

Obviously we are talking about differences on the level of nanoseconds to picoseconds, but at the scale of the whole blockchain it might have a significant effect.

I liked the explanation of Amdahl's law given on Wikipedia. It makes sense in terms of what I am saying and what NotATether explained.

Quote
Assume that a task has two independent parts, A and B. Part B takes roughly 25% of the time of the whole computation. By working very hard, one may be able to make this part 5 times faster, but this reduces the time of the whole computation only slightly. In contrast, one may need to perform less work to make part A perform twice as fast. This will make the computation much faster than by optimizing part B, even though part B's speedup is greater in terms of the ratio, (5 times versus 2 times).
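For concreteness, the arithmetic behind that quote follows from Amdahl's formula, speedup = 1 / ((1 - p) + p / s), where p is the fraction of the runtime being improved and s is how much faster that fraction becomes:

Code:
# Amdahl's law: overall speedup when a fraction p of the runtime is sped up by s.
def amdahl(p: float, s: float) -> float:
    return 1.0 / ((1.0 - p) + p / s)

# Part B is 25% of the runtime and is made 5x faster:
print(amdahl(0.25, 5))   # 1.25x overall
# Part A is 75% of the runtime and is made only 2x faster:
print(amdahl(0.75, 2))   # 1.6x overall, so the "smaller" optimisation wins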
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
If data sharding is already on the blockchain, then how do we know that it is already implemented? Is there any example of such an implementation to relieve the blockchain's burden?

What you call "data sharding" has been already implemented in Bitcoin from the day one. In Bitcoin, transactions within one block often could be processed in parallel by a node.

Yes, but there's a limit to how effective the concurrency can be, because of Amdahl's Law.

It basically says that the overall speedup of a program is limited by the fraction of its runtime that cannot be parallelized.

So in this case the methods that verify the transactions inside the block can work in parallel, but there is a host of other things inside that process that can't be sped up:

- Time spent obtaining the block from a peer (even if multiple peers are queried at the same time, the code running the block retrieval runs at least once)
- Writing the block and chainstate to LevelDB
- For each UTXO spent in the block, checking against the UTXO set that it has indeed not been spent already
- All of the checks involved in the CheckBlock and CheckBlockHeader functions must ultimately be performed one way or another, so there will always be a minimum resource consumption associated with them.

Sharding is not going to make any of this go faster if all of this work is already running in parallel, even assuming the system theoretically had enough threads to dedicate one to each transaction in the block.
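To make that point concrete, here is a toy timing simulation. The timings are invented and the function names are placeholders, not real Bitcoin Core code; the point is only that the serial steps set a floor on total block-processing time no matter how wide the thread pool is.

Code:
# Toy timing model of block processing (all numbers invented for illustration;
# these are NOT real Bitcoin Core functions or measurements).
from concurrent.futures import ThreadPoolExecutor
import time

TXS_IN_BLOCK = 2000
VERIFY_TX_SECONDS = 0.0005   # assumed per-transaction verification cost
DOWNLOAD_SECONDS = 0.40      # assumed time to fetch the block from a peer
DB_WRITE_SECONDS = 0.15      # assumed time to flush block/chainstate to disk

def verify_tx(_tx_index):
    time.sleep(VERIFY_TX_SECONDS)   # stand-in for signature/script checks

start = time.time()
time.sleep(DOWNLOAD_SECONDS)                       # serial: block retrieval
with ThreadPoolExecutor(max_workers=64) as pool:   # parallel: tx verification
    list(pool.map(verify_tx, range(TXS_IN_BLOCK)))
time.sleep(DB_WRITE_SECONDS)                       # serial: write to LevelDB
print(f"total: {time.time() - start:.2f}s "
      f"(serial floor: {DOWNLOAD_SECONDS + DB_WRITE_SECONDS:.2f}s)")

Adding more workers only shrinks the middle section; the total can never drop below the two serial sleeps, which is exactly the Amdahl's Law limit described above.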
member
Activity: 78
Merit: 28
I was reading through a very interesting topic on database sharding, which seems to be designed for handling huge datasets by distributing them to various channels so that a single system doesn't have to overload itself.

What I understand from the concept is that you can either distribute the data to various nodes horizontally, increasing processing capacity by adding more devices, with virtually no limit on how far you can go, or you can shard vertically, in which case you have to make the machine itself more powerful.

Such papers often confuse people. Before diving into this topic, you should understand that a database is not a blockchain network. "Database sharding" has nothing to do with "blockchain sharding". Think about this and don't get confused by the plausible-sounding reasoning of prominent blockchain gurus.

Will it help with blockchain scaling?
I think the concept is pretty straightforward. If we are already using...

No, it doesn't work that way.

So if we apply horizontal sharding to the nodes generated on the blockchain, then each node could be further divided into more data sets / nodes, and processing could be accelerated because more "machines" would be working on the same problem.

No, it doesn't work that way. Let's put aside decentralized blockchain networks and consider the original "database sharding". Database sharding is not always an improvement; it's a trade-off that is helpful only in certain cases and certain circumstances. When you split data into "shards", split processes into threads and distribute everything across separate hardware devices, you might get some benefit from parallelism. However, you also get overhead created by managing concurrent threads and the associated delays. Sometimes this overhead is so large that the gains from parallelisation can't outweigh it.

Therefore, database sharding is a trade-off: it might be acceptable or it might be unacceptable. Database sharding is not a magic wand that solves all of a database's problems, and it is not a magic wand in the context of decentralised blockchain networks either. On top of that, as I said, the concept of "database sharding in a blockchain context" is not clear and straightforward.
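To make the trade-off concrete, here is a purely hypothetical latency model of a scatter-gather read across shards. All numbers are invented; the point is only that the fixed coordination cost is paid regardless of data size.

Code:
# Hypothetical latency model (all numbers made up for illustration).
def single_node_latency(rows, per_row_ms=0.01):
    return rows * per_row_ms

def sharded_latency(rows, shards, per_row_ms=0.01, coordination_ms=5.0):
    # Shards scan their slice in parallel, then results are merged;
    # the merge/coordination overhead is paid no matter how small the query is.
    return (rows / shards) * per_row_ms + coordination_ms

for rows in (100, 1_000_000):
    print(rows,
          round(single_node_latency(rows), 2), "ms single vs",
          round(sharded_latency(rows, shards=8), 2), "ms sharded")
# 100 rows:       1.0 ms single vs ~5.13 ms sharded  (overhead dominates)
# 1,000,000 rows: 10000 ms single vs ~1255 ms sharded (parallelism wins)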

If data sharding is already on the blockchain, then how do we know that it is already implemented? Is there any example of such an implementation to relieve the blockchain's burden?

What you call "data sharding" has been already implemented in Bitcoin from the day one. In Bitcoin, transactions within one block often could be processed in parallel by a node.
full member
Activity: 1092
Merit: 227
OK, this means data sharding is a good architecture for data distribution, but the only thing that keeps us from using it is the "invalidation of the tamperproof concept", as mentioned by @Findingnemo.

Hmm, interesting, and it makes complete sense. Anyway, this was just a raw concept that was playing at the back of my mind while reading about database sharding. Since it involves distributing the nodes into various chunks and then solving the problem in parts, it made me think this could be something applicable to the blockchain and thus to scaling away the network issues.

You forget one important thing. If a node only holds a shard (part of the blockchain), that means the node couldn't perform full verification. Without a reward to run a shard node, most people would keep using SPV wallets, while those who care about decentralization would simply keep running full nodes.

If data sharding is already on the blockchain, then how do we know that it is already implemented? Is there any example of such an implementation to relieve the blockchain's burden?

A few altcoins claim they use sharding to "tackle" the scaling problem, but AFAIK only Ethereum is serious about implementing sharding[1].

[1] https://eips.ethereum.org/EIPS/eip-4844
[2] https://github.com/ethereum/pm/blob/master/Breakout-Room/4844-readiness-checklist.md#client-implementation-status

Yes, I am also looking at a project named NEAR, which claims to have developed Nightshade; it is nothing but NEAR's sharding architecture. I am wondering how they run their blockchain with it and make transactions faster and congestion-free.

I don't know, but they claim that their blockchain can literally "handle the millions of users and transactions".

Though I am limited by my knowledge of how this technology actually works, they have a bunch of sharding techniques that are being used to perform the transactions.

I am referring to the NEAR technical paper here: NIGHTSHADE
legendary
Activity: 3038
Merit: 4418
Crypto Swap Exchange
As mentioned, confirmation has nothing to do with data sharding.

Data sharding is great if you can guarantee the authenticity of the data and there is no risk of anyone trying to defraud you. Data sharding is quite complex by nature, and you need fraud proofs to verify that the data you hold is valid and correct. You also run into the issue of security being lower than what we have right now; fraud proofs are not exactly well implemented so far.

Even if you were to look at Ethereum, their sharding scheme is still far from completion: https://eips.ethereum.org/EIPS/eip-4844.
legendary
Activity: 2870
Merit: 7490
Crypto Swap Exchange
You forget one important thing. If a node only holds a shard (part of the blockchain), that means the node couldn't perform full verification. Without a reward to run a shard node, most people would keep using SPV wallets, while those who care about decentralization would simply keep running full nodes.

If data sharding is already on the blockchain, then how do we know that it is already implemented? Is there any example of such an implementation to relieve the blockchain's burden?

A few altcoins claim they use sharding to "tackle" the scaling problem, but AFAIK only Ethereum is serious about implementing sharding[1].

[1] https://eips.ethereum.org/EIPS/eip-4844
[2] https://github.com/ethereum/pm/blob/master/Breakout-Room/4844-readiness-checklist.md#client-implementation-status
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
The so-called "centralized bottleneck" people are talking about in Bitcoin is actually each of the nodes trying to verify the transactions for themselves.

As has been said and rehashed on this board, there are only three viable options for increasing transaction speed:

One, decrease the block generation time, but this will destroy the fee economy and put miners out of business, sending BTC into a death spiral

Two, increase the block size, but this will also destroy the fee economy like (1), and it does not decrease confirmation time, only increase throughput

Three, offload accounts and transactions to a second layer (like the Lightning Network), which the majority of people can use, with their state reflected in that network and then translated into BTC transactions at a throughput appropriate for layer 1.
hero member
Activity: 2366
Merit: 793
Bitcoin = Financial freedom
Individual full blockchain nodes are very important for keeping Bitcoin decentralized and tamperproof, so what you are suggesting goes completely against the concept of Bitcoin.

FYI, you cannot increase the speed of Bitcoin transactions by reducing the load on nodes, because nodes have nothing to do with confirmation time; that depends entirely on the miners. So technically you can only increase transaction speed by increasing the hash rate or increasing the block size.
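For context, the block interval itself falls out of the difficulty retarget. A rough sketch of the arithmetic, with made-up difficulty and hash rate values and ignoring variance and the 2016-block retarget lag:

Code:
# Expected block interval: expected_seconds = difficulty * 2**32 / hashrate.
DIFFICULTY = 80e12   # assumed difficulty (made-up round number)
HASHRATE = 6e20      # assumed network hash rate in H/s (made up)

def expected_block_seconds(difficulty, hashrate):
    return difficulty * 2**32 / hashrate

print(expected_block_seconds(DIFFICULTY, HASHRATE))       # ~573 s with these numbers
# If the hash rate doubles, blocks briefly arrive about twice as fast...
print(expected_block_seconds(DIFFICULTY, 2 * HASHRATE))   # ~286 s
# ...until the next retarget raises the difficulty and pulls the interval
# back toward the 600-second target.
print(expected_block_seconds(2 * DIFFICULTY, 2 * HASHRATE))

So extra hash rate only speeds up confirmations temporarily; the retarget is designed to bring the average interval back to roughly ten minutes.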
full member
Activity: 1092
Merit: 227
I was reading through a very interesting topic on database sharding, which seems to be designed for handling huge datasets by distributing them to various channels so that a single system doesn't have to overload itself.

What I understand from the concept is that you can either distribute the data to various nodes horizontally, increasing processing capacity by adding more devices, with virtually no limit on how far you can go, or you can shard vertically, in which case you have to make the machine itself more powerful.

Will it help with blockchain scaling?
I think the concept is pretty straightforward. If we are already using huge amounts of data, or collecting enormous data and performing confirmations over the blockchain, then the nodes that are created get filled quickly.
So if we apply horizontal sharding to the nodes generated on the blockchain, then each node could be further divided into more data sets / nodes, and processing could be accelerated because more "machines" would be working on the same problem.

If data sharding is already on the blockchain, then how do we know that it is already implemented? Is there any example of such an implementation to relieve the blockchain's burden?

I initially thought that the Lightning Network performs similar processing; however, it seems that if database sharding were implemented it would be a layer-one technology, while LN is a layer-two solution and works "off-chain".


Quote
What is database sharding?
Sharding is a method for distributing a single dataset across multiple databases, which can then be stored on multiple machines. This allows for larger datasets to be split into smaller chunks and stored in multiple data nodes, increasing the total storage capacity of the system. See more on the basics of sharding here.

Similarly, by distributing the data across multiple machines, a sharded database can handle more requests than a single machine can.

Sharding is a form of scaling known as horizontal scaling or scale-out, as additional nodes are brought on to share the load. Horizontal scaling allows for near-limitless scalability to handle big data and intense workloads. In contrast, vertical scaling refers to increasing the power of a single machine or single server through a more powerful CPU, increased RAM, or increased storage capacity.
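To illustrate the quoted definition, a minimal sketch of horizontal sharding is just a deterministic routing function from a record's key to one of several databases. The shard count and key format below are arbitrary, chosen only for illustration.

Code:
# Minimal horizontal-sharding router (illustrative only; shard count is arbitrary).
import hashlib

NUM_SHARDS = 4   # each shard would live on its own machine / database

def shard_for(key):
    """Deterministically map a record key to a shard index."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Every router agrees on where a record lives, so reads and writes for
# different keys can hit different machines in parallel.
for key in ("txid:ab12", "txid:cd34", "account:alice"):
    print(key, "-> shard", shard_for(key))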