Topic: Question on general node performance

full member
Activity: 322
Merit: 151
They're tactical
February 08, 2017, 03:58:24 AM
#14
The reason I'm asking is that I'm wondering about the best overall optimisation policy.

Is it best to focus on UDP and asynchronous I/O and keep the code simple, since, as you said, blockchain processing is mostly sequential and there is no real room for multi-core scaling and parallelisation?

Or is it better to focus on something with truly robust handling of parallel processing, even if it doesn't seem all that useful in the general use case?

I've been studying the code for a few months already (I'm more on Blackcoin than Bitcoin, but the two are still similar), and I don't see that many places where parallelisation can make a huge difference.

Outside of IBD, which is still not a minor issue IMO, because it's a big roadblock pushing toward centralisation on a big blockchain: it takes too many resources and too much time to keep running a full node at home, especially for casual users.

And beyond this, there is the question of "blockchain processing speed" in general.

Also, when dealing with many peers, there may be room for multi-core scaling, but from my experience the best way to handle message-based I/O is asynchronous, interrupt-driven I/O; it's much better than one thread per blocking socket or polling non-blocking sockets. And beyond that I don't see a very good way to scale the processing much.
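To illustrate what I mean by asynchronous, interrupt-driven I/O, here is a minimal sketch of an epoll-based receive loop in C++ (Linux-only; the HandleMessage callback and the socket setup are placeholders of mine, not anything from the actual node code):

Code:
#include <sys/epoll.h>
#include <vector>

// Minimal single-threaded event loop: the kernel wakes us only when a socket
// actually has data, so the CPU stays free for block processing in between.
void EventLoop(const std::vector<int>& sockets, void (*HandleMessage)(int fd))
{
    int ep = epoll_create1(0);
    for (int fd : sockets) {
        epoll_event ev{};
        ev.events = EPOLLIN;                      // notify when readable
        ev.data.fd = fd;
        epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
    }

    epoll_event events[64];
    while (true) {
        int n = epoll_wait(ep, events, 64, -1);   // sleep until activity
        for (int i = 0; i < n; ++i) {
            HandleMessage(events[i].data.fd);     // parse/dispatch one message
        }
    }
}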

Well, clearly checking block header hashes from a getheaders request could be parallelised, and so could computing the tx hashes of a block.

For tx processing it makes sense if many txs have to be processed as a batch, but that means increased latency in the processing even with higher throughput. I'm not sure that's better than processing each tx as it arrives with the lowest latency, even if processing 100 txs one by one takes more total time than waiting to have all 100 and processing them as a batch.

For IBD it clearly makes sense, because it doesn't matter whether two-year-old blocks are processed with zero latency, so syncing by batch seems to give only benefits.

Because the question to me is: for example, I have a 10 MB/s downlink and there is a virtually unlimited number of nodes that have the data, so it should take about an hour to sync. Why does it take three days?

And going even further, with batch processing from multiple nodes instead of the one-block-at-a-time syncing logic, I'm pretty sure there are things that can be done.
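Just to sketch the multi-node idea (Peer and RequestBlockRange here are purely hypothetical helpers, not part of any real protocol or API): split a batch of heights across the available peers instead of walking the chain one block at a time from a single peer.

Code:
#include <algorithm>
#include <vector>

struct Peer { int socket_fd = -1; };                    // hypothetical peer handle

// Stub: a real version would send the actual network request.
void RequestBlockRange(Peer&, int /*start*/, int /*count*/) {}

// Divide a batch of block heights roughly evenly across the available peers.
void ScheduleBatch(std::vector<Peer>& peers, int firstHeight, int batchSize)
{
    if (peers.empty()) return;
    int perPeer = (batchSize + (int)peers.size() - 1) / (int)peers.size();
    int next = firstHeight;
    for (Peer& p : peers) {
        int count = std::min(perPeer, firstHeight + batchSize - next);
        if (count <= 0) break;
        RequestBlockRange(p, next, count);
        next += count;
    }
}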
full member
Activity: 322
Merit: 151
They're tactical
February 06, 2017, 09:01:16 PM
#13
The only thing that is really sequential is the txin checking, and as far as I know a tx in a block can't have an input from a tx in the same block, so on a per-block basis that shouldn't be a problem?
A transaction can in fact have a parent transaction in the same block.

Then only the txin checking needs to be done sequentially. Maybe one could try to find the runs of txs without in-block children (not sure what the average would be); this could be done in a first pass without checking the scripts, and then the script execution could be scaled by batch. Script data is not influenced by the outcome of a previous transaction, is it?
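A rough sketch of that first pass (the Tx/TxIn structs here are simplified placeholders I made up, not the real data structures): collect the block's txids, then any tx whose inputs never reference one of those txids has no in-block parent and can go straight to the parallel batch.

Code:
#include <string>
#include <unordered_set>
#include <vector>

struct TxIn { std::string prevTxid; };                    // txid of the spent output
struct Tx   { std::string txid; std::vector<TxIn> vin; };

// Split a block's transactions into those that spend another tx of the same
// block (must respect ordering) and those that don't (freely parallelisable).
void SplitByInBlockDependency(const std::vector<Tx>& block,
                              std::vector<const Tx*>& independent,
                              std::vector<const Tx*>& dependent)
{
    std::unordered_set<std::string> inBlock;
    for (const Tx& tx : block) inBlock.insert(tx.txid);

    for (const Tx& tx : block) {
        bool hasParentInBlock = false;
        for (const TxIn& in : tx.vin) {
            if (inBlock.count(in.prevTxid)) { hasParentInBlock = true; break; }
        }
        (hasParentInBlock ? dependent : independent).push_back(&tx);
    }
}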

 
staff
Activity: 3374
Merit: 6530
Just writing some code
February 06, 2017, 08:15:05 PM
#12
The only thing that is really sequential is the txin checking, and as far as I know a tx in a block can't have an input from a tx in the same block, so on a per-block basis that shouldn't be a problem?
A transaction can in fact have a parent transaction in the same block.
full member
Activity: 322
Merit: 151
They're tactical
February 06, 2017, 08:09:04 PM
#11
The only thing that is really sequential is the txin checking, and as far as I know a tx in a block can't have an input from a tx in the same block, so on a per-block basis that shouldn't be a problem?

Computing block header hashes and tx hashes doesn't require validation of the previous block.


If all the new blocks to be batched are already in memory, the script data is also available, either from previously stored blocks or from the new ones being batch-checked.

As long as all the data from the txs is in memory, or stored (already validated) on the hard drive, the sig scripts can be checked in parallel, no? Apart from the initial data from the tx, which is available in memory and constant, they are self-contained: the output script and signature are constant, as is the txin data, so that shouldn't be a problem for scaling.
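Something like this is what I picture for the parallel part (a sketch only; TxJob and VerifyScripts are stand-ins of mine for whatever actually carries the data and runs the scripts):

Code:
#include <atomic>
#include <future>
#include <vector>

struct TxJob {
    std::vector<unsigned char> scriptSig, prevScriptPubKey;  // plus sigs, tx data, etc.
};

// Stub: a real version would execute the scripts and check the signatures.
bool VerifyScripts(const TxJob&) { return true; }

// Check every transaction's scripts in parallel; one failure invalidates the batch.
bool VerifyBatch(const std::vector<TxJob>& jobs)
{
    std::atomic<bool> ok{true};
    std::vector<std::future<void>> tasks;
    tasks.reserve(jobs.size());

    for (const TxJob& job : jobs) {
        tasks.push_back(std::async(std::launch::async, [&ok, &job] {
            if (!ok.load()) return;                 // stop early once something failed
            if (!VerifyScripts(job)) ok.store(false);
        }));
    }
    for (auto& t : tasks) t.wait();                 // join all workers
    return ok.load();
}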

The only thing is that it means blocks would have to be rejected by batch too, if one block in the batch has a bad tx. But maybe not all the work spent on the valid blocks would be wasted.

Blocks with bad header hashes would be rejected in the first pass. Maybe the last valid block could be returned.


Then, depending on whether one is optimistic or pessimistic about the ratio of bad blocks, certain checks can be done in the first passes to eliminate bad blocks earlier and avoid wasting batches of long operations.
staff
Activity: 3374
Merit: 6530
Just writing some code
February 06, 2017, 07:53:47 PM
#10
To me it still looks like some sort of batch processing could speed up IBD.

Like processing blocks in batches of 100:

Pass 1: threaded check of block header hashes, and computation of tx hashes

Pass 2: non-threaded check of merkle roots

Pass 3: check of tx inputs/outputs, potentially threaded on a per-block basis

Pass 4: threaded check of all transaction sig scripts

Pass 5: batch write of the blocks


I have the idea it could improve sync time, no? Maybe I'm missing a dependency issue or something, or it has already been thought of and tested, or the checks could miss something, but to me it looks like it could improve sync time quite significantly.
I don't think that would work. Blocks must be processed sequentially. A block can only be valid if the parent block (referenced in the block header) is also valid. So processing 100 blocks at a time means that you could be wasting processing power and time on invalid blocks.

For processing sig scripts, a priori I don't see why it wouldn't scale well. As far as I can tell, scripts are self-contained; they don't reference data outside of themselves. As long as the public keys, signatures and script data are loaded in memory from the txs, I don't see why the script checking wouldn't scale.
Scripts are not necessarily self contained. Checking the validity of a transaction means that you need both the input script of the input and the output script of the output it references. For signatures, you need the output script of the referenced transaction because that is part of the signing procedure. You can't check signatures without knowing the previous transaction.
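In other words, the data needed to check a single input looks roughly like this (a simplified illustration, not the actual Core types):

Code:
#include <cstdint>
#include <vector>

using Script = std::vector<uint8_t>;

// What one input check needs: the scriptSig from the spending tx AND the
// scriptPubKey of the output it spends, which lives in a *previous*
// transaction. The signature hash also commits to that scriptPubKey, so the
// signature cannot be verified from the spending transaction alone.
struct InputCheck {
    Script scriptSig;                // from the spending transaction's input
    Script prevScriptPubKey;         // from the referenced output of the parent tx
    std::vector<uint8_t> spendingTx; // serialized spending tx, needed for the sighash
    uint32_t inputIndex = 0;
};

// Hypothetical checker: it cannot even be called until prevScriptPubKey
// has been looked up from the earlier transaction.
bool CheckInput(const InputCheck& c);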
full member
Activity: 322
Merit: 151
They're tactical
February 06, 2017, 10:28:26 AM
#9
I don't know if accurate profiling data is available somewhere for a recent source tree, with the compiler version, boost version, build options etc., so as to have accurate information on this.

If no such test has been done already, I may get into it.


I'm a programmer, not a trader; I'm not much into speculation, I prefer Turing.


And with threading and parallelisation of batch processing, if it scales well, it can divide the total processing time, and DB batch writes are supposed to greatly decrease the total processing time over 100 or 1000 writes.
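For the DB side, what I have in mind is simply LevelDB's WriteBatch (standard LevelDB API; the key/value layout here is made up for the example, not the real block index format):

Code:
#include <leveldb/db.h>
#include <leveldb/write_batch.h>
#include <string>
#include <utility>
#include <vector>

// Commit a whole batch of records in one atomic write instead of one
// synced write per block.
bool WriteBlockBatch(leveldb::DB* db,
                     const std::vector<std::pair<std::string, std::string>>& records)
{
    leveldb::WriteBatch batch;
    for (const auto& kv : records) {
        batch.Put(kv.first, kv.second);    // buffered in memory
    }
    leveldb::WriteOptions opts;
    opts.sync = true;                      // a single sync for the whole batch
    return db->Write(opts, &batch).ok();
}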

Things like computing block header hashes and tx hashes should scale very well.

For processing sig scripts, a priori I don't see why it wouldn't scale well. As far as I can tell, scripts are self-contained; they don't reference data outside of themselves. As long as the public keys, signatures and script data are loaded in memory from the txs, I don't see why the script checking wouldn't scale.

If an OpenCL algorithm could be made for the hash computation and sig checks on 1000-block batches, that would make a substantial difference. I'm not sure how full script opcode execution would fit with OpenCL, but for simple P2SH it could be worth it; a good proportion of transactions are P2SH or regular multisig without complex scripts.


Merkle root computation can't be scaled easily, I think; the check on the txins can probably be scaled per block.
legendary
Activity: 2053
Merit: 1354
aka tonikt
February 06, 2017, 10:06:06 AM
#8
I'm also interested in this.

Can I check the time it took to verify (and commit) each new block?
full member
Activity: 322
Merit: 151
They're tactical
February 05, 2017, 11:23:41 PM
#7
To me it still looks like some sort of batch processing could speed up IBD.

Like processing blocks in batches of 100:

Pass 1: threaded check of block header hashes, and computation of tx hashes

Pass 2: non-threaded check of merkle roots

Pass 3: check of tx inputs/outputs, potentially threaded on a per-block basis

Pass 4: threaded check of all transaction sig scripts

Pass 5: batch write of the blocks


I have the idea it could improve sync time, no? Maybe I'm missing a dependency issue or something, or it has already been thought of and tested, or the checks could miss something, but to me it looks like it could improve sync time quite significantly. A rough sketch of pass 1 is below.
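Sketch of what pass 1 could look like (assuming a DoubleSha256 helper and 80-byte serialized headers; neither comes from the actual code):

Code:
#include <array>
#include <cstdint>
#include <future>
#include <vector>

using Hash256     = std::array<uint8_t, 32>;
using HeaderBytes = std::array<uint8_t, 80>;   // serialized block header

// Stub: a real implementation would apply SHA-256 twice.
Hash256 DoubleSha256(const uint8_t*, size_t) { return Hash256{}; }

// Pass 1: hash every header of the batch in parallel. Each hash depends only
// on its own 80 bytes, so there is no ordering constraint here; checking that
// hash[i] matches header[i+1].prevBlock is a cheap sequential pass afterwards.
std::vector<Hash256> HashHeaders(const std::vector<HeaderBytes>& headers)
{
    std::vector<std::future<Hash256>> tasks;
    tasks.reserve(headers.size());
    for (const HeaderBytes& h : headers) {
        tasks.push_back(std::async(std::launch::async, [&h] {
            return DoubleSha256(h.data(), h.size());
        }));
    }
    std::vector<Hash256> out;
    out.reserve(tasks.size());
    for (auto& t : tasks) out.push_back(t.get());
    return out;
}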
full member
Activity: 322
Merit: 151
They're tactical
February 05, 2017, 09:22:49 PM
#6
From what I can see, it doesn't seem there is threading for intra-block transaction checking, or that it tries to use AVX or SSE. There are indeed internal functions for 256-bit arithmetic, but they don't look especially designed to exploit SIMD or parallelisation.

I will probably do some testing at some point, also to evaluate the performance of the storage engine, see what takes the most time in block processing, and experiment with UDP.

It doesn't look like it tries to exploit DB batch writing either, which could probably have a significant impact on synchronisation speed.


I saw a post earlier where they said they ran tests using Zlib and other libraries to compress blocks, and they reported a 20 to 30% gain.


https://bitco.in/forum/threads/buip010-passed-xtreme-thinblocks.774/


Datastream compression: Testing various compression libraries such as LZO-1x and Zlib have shown it is possible to further reduce block and transaction sizes by 20 to 30% without affecting response times which could also be applied to thinblocks.
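For what it's worth, measuring this on real block data only takes a few lines with zlib (standard zlib API; the block bytes are whatever serialized block you feed in):

Code:
#include <zlib.h>
#include <cstdint>
#include <vector>

// Compress a serialized block with zlib and return compressed/original size,
// to check figures like the 20-30% claim on real data.
double CompressionRatio(const std::vector<uint8_t>& block)
{
    if (block.empty()) return 1.0;
    uLongf destLen = compressBound(block.size());
    std::vector<uint8_t> dest(destLen);
    if (compress(dest.data(), &destLen, block.data(), block.size()) != Z_OK) return 1.0;
    return static_cast<double>(destLen) / static_cast<double>(block.size());
}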
staff
Activity: 3374
Merit: 6530
Just writing some code
February 05, 2017, 09:02:28 PM
#5
IIRC Bitcoin Core has stuff for benchmarking so you can see how long it takes for it to do certain things. You will need to use command line options to enable it though.

Where can I find more information on this?
I was mistaken. It is not built into bitcoind/bitcoin-qt but rather a separate binary for benchmarking hashing, encoding/decoding base58, signing, verifying, and a few other things.



I'm pretty sure that most, if not all, of the things that you have mentioned have already been done in Bitcoin.

Optimising the signature check: most coins use OpenSSL for signatures, but there are other libraries like micro-ecc and others, often made for embedded systems, which could potentially be faster than OpenSSL.
Bitcoin Core no longer uses OpenSSL for signing and verifying. Rather the Core devs wrote their own library which operates much much faster than OpenSSL. I believe the same thing was done for AES for wallet encryption.
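(For reference, the library in question is libsecp256k1.) A bare verification call with it looks roughly like this (a sketch; the buffers are assumed to already hold a DER signature, a serialized public key and the 32-byte signature hash, and a real program would create the context once, not per call):

Code:
#include <secp256k1.h>
#include <cstddef>
#include <cstdint>

// Verify one ECDSA signature against a 32-byte message hash and a public key.
bool VerifySig(const uint8_t* sigDer, size_t sigLen,
               const uint8_t* pubkeyBytes, size_t pubkeyLen,
               const uint8_t msgHash32[32])
{
    secp256k1_context* ctx = secp256k1_context_create(SECP256K1_CONTEXT_VERIFY);

    secp256k1_pubkey pubkey;
    secp256k1_ecdsa_signature sig;
    bool ok = secp256k1_ec_pubkey_parse(ctx, &pubkey, pubkeyBytes, pubkeyLen) &&
              secp256k1_ecdsa_signature_parse_der(ctx, &sig, sigDer, sigLen) &&
              secp256k1_ecdsa_verify(ctx, &sig, msgHash32, &pubkey) == 1;

    secp256k1_context_destroy(ctx);
    return ok;
}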

Block compression could potentially help too, to slightly reduce download time.
Not really. Blocks don't really have a lot of repeated data that can be easily compressed as a large portion of blocks and transactions consist of hashes and signatures which are inherently random.

Stuff like Compact Blocks (and potentially XThin blocks, depending on certain things) makes block relay faster, but not IBD (initial block download).

As for the other stuff you mentioned, I can't really say as I have not looked into whether those are done.
full member
Activity: 322
Merit: 151
They're tactical
February 05, 2017, 08:37:31 PM
#4
Also, maybe having a UDP version of certain requests could improve performance, since the protocol is overall message-based rather than stream-based and data integrity is already verified at the application layer. I'm not sure what impact it would have.
full member
Activity: 322
Merit: 151
They're tactical
February 05, 2017, 08:29:25 PM
#3
IIRC Bitcoin Core has stuff for benchmarking so you can see how long it takes for it to do certain things. You will need to use command line options to enable it though.

Where can I find more information on this?
staff
Activity: 3374
Merit: 6530
Just writing some code
February 05, 2017, 07:56:26 PM
#2
IIRC Bitcoin Core has stuff for benchmarking so you can see how long it takes for it to do certain things. You will need to use command line options to enable it though.
full member
Activity: 322
Merit: 151
They're tactical
February 05, 2017, 06:28:22 PM
#1
So I was wondering: has any attempt been made to profile overall node performance, to find where the bottlenecks are, and what has been tried to improve it?

I'm not talking about transaction confirmation time or protocol-related things, but about how fast a node can process blocks, mostly in order to speed up the synchronisation process, or node latency in general when serving the different requests.

Synchronisation time is still a big problem, and from what I could see in quick tests, it doesn't seem to come mainly from raw download time or network speed, so I was wondering what has already been thought about or tried to improve it.

Among the few things I'd try:

Optimising operations on hashes, at least hash comparison, possibly with SSE or AVX, as there is probably a good number of those executed for each block processed.
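For example, a 32-byte hash comparison with SSE2 intrinsics could look like the sketch below (whether it actually beats a plain memcmp would need measuring):

Code:
#include <emmintrin.h>   // SSE2 intrinsics
#include <cstdint>

// Compare two 32-byte hashes with two 128-bit loads instead of a byte loop.
bool HashEqualSse2(const uint8_t* a, const uint8_t* b)
{
    __m128i a0 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i a1 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + 16));
    __m128i b0 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    __m128i b1 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + 16));

    // XOR is zero only where the bytes match; OR the two halves and test for all-zero.
    __m128i diff = _mm_or_si128(_mm_xor_si128(a0, b0), _mm_xor_si128(a1, b1));
    return _mm_movemask_epi8(_mm_cmpeq_epi8(diff, _mm_setzero_si128())) == 0xFFFF;
}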

Asynchronous I/O: using asynchronous (interrupt-based) I/O could, I guess, speed things up significantly, since the CPU can process data while the I/O operation takes place; from my experience with video streaming it makes a good difference.

Optimising the storage engine: I have already seen certain coins (bitmonero) experiment with different kinds of storage engines, whether Berkeley DB, LevelDB or other things, or even zero-cache no-DB disk storage like in purenode; I wonder how much of a difference it really makes.

Parallelisation/threading: parallelising as many things as possible to take full advantage of multi-core architectures. It can't be applied to block processing itself, because each block's validation depends on the previous one, but for processing all the transactions and signatures within a block it could make a difference, if the total processing time is long enough to benefit from threading.

Optimising the signature check: most coins use OpenSSL for signatures, but there are other libraries like micro-ecc and others, often made for embedded systems, which could potentially be faster than OpenSSL.

Block compression could potentially help too, to slightly reduce download time.

I read about thin blocks, but from what I understand they wouldn't change much for synchronisation time or request processing.


I guess that's it; I'll add more if more things come to mind.

Also, is there any profiling data available somewhere?