Pages:
Author

Topic: Looking into forking the core wallet to use parallel computing to verify blocks (Read 3542 times)

sr. member
Activity: 467
Merit: 267
I'm not using bitcoin core and my code is performing cross block full validation. In fact, Bitcoin core is taking longer without doing validation.
I know my desktop is not underpowered but 24h+ is not the same as 5 mn. Just saying that validation is not such a big deal.

legendary
Activity: 1512
Merit: 1036
Just signature and Merkle tree analysis of the whole blockchain (which is disabled by checkpoints) would take ~24+ hours of modern CPU processing power. These need to be performed by every user in a trustless environment. Optimizations are beneficial to every user, whether by utilization of available hardware resources, algorithmic shortcuts, or even hand-tuned x64/SSE assembly code routines.

It isn't so bad. On a i7 desktop ivy bridge 8-core machine, I could validate all the scripts up to block #295000 in just about 5 1/2 mn.

Code:
Succeeded #36249675
Failed #0
Elapsed 334s

I don't have a more recent bootstrap file, but by interpolating it shouldn't take more than 10 mn to do the complete chain.


Verification is disabled up to the most recent checkpoint, block 295000. The hash and headers leading to it is known trustworthy, so signature checking is turned off. That's what I said that you just quoted. If you remove all but the first checkpoint from chainparams.cpp and recompile, you will find a quite different result.

(295000, uint256("0x00000000000000004d9b4ef50f0f9d686fd69db2e03af35a100370c64632a983"))

BTW, you have 4 cores with hyperthreading, CPU-Z your machine. Still, a $300+ CPU is not the lowest bar.
sr. member
Activity: 467
Merit: 267
Just signature and Merkle tree analysis of the whole blockchain (which is disabled by checkpoints) would take ~24+ hours of modern CPU processing power. These need to be performed by every user in a trustless environment. Optimizations are beneficial to every user, whether by utilization of available hardware resources, algorithmic shortcuts, or even hand-tuned x64/SSE assembly code routines.

It isn't so bad. On a i7 desktop ivy bridge 8-core machine, I could validate all the scripts up to block #295000 in just about 5 1/2 mn.

Code:
Succeeded #36249675
Failed #0
Elapsed 334s

I don't have a more recent bootstrap file, but by interpolating it shouldn't take more than 10 mn to do the complete chain.
legendary
Activity: 1512
Merit: 1036
But isn't massive parallel computing precisely what the Bitcoin network as a whole does? Why parallelize (SIC) locally?
Just signature and Merkle tree analysis of the whole blockchain (which is disabled by checkpoints) would take ~24+ hours of modern CPU processing power. These need to be performed by every user in a trustless environment. Optimizations are beneficial to every user, whether by utilization of available hardware resources, algorithmic shortcuts, or even hand-tuned x64/SSE assembly code routines.
sr. member
Activity: 467
Merit: 267
It's decentralized, not parallel. Every full node has to verify the whole blockchain.
full member
Activity: 173
Merit: 101
But isn't massive parallel computing precisely what the Bitcoin network as a whole does? Why parallelize locally?
sr. member
Activity: 467
Merit: 267
It's an old video card and there are a few improvements that I can see. It's a baseline.
L
staff
Activity: 4284
Merit: 8808
Phase 1 & 3 are on CPU. Phase 2 is on GPU. 0.067733 ms per verification.
I have a GTX 560 Ti.
So thats 14763 per second?  Thats not favourable compared to running on the cpu but there may be many optimizations still available for you.
legendary
Activity: 3920
Merit: 2349
Eadem mutata resurgo
Quote
I made an openCL version of sipa's secp256k1 library. It's on github if someone is interested.

Link?

AMD's probably be better then Nvidia if there a lot of integer steps.
sr. member
Activity: 467
Merit: 267
I made an openCL version of sipa's secp256k1 library. It's on github if someone is interested. It executes faster but only after you throw hundred of thousands of signatures to verify.
On a sample test run
# of sigs = 262144

Times in ms per phase.

prepare -> 2518
compute -> 15201
check   -> 35
total   -> 17755
           0.067733

Phase 1 & 3 are on CPU. Phase 2 is on GPU. 0.067733 ms per verification.
I have a GTX 560 Ti.
sr. member
Activity: 467
Merit: 267
Do you have recommendations?

My current thinking is that the best bang for the buck is parallalizing the two group multiplications (by G and Q).
They use the field doubling and multiplication that were optimized in asm and they must be put on the device.
The rest that uses GMP isn't very much. It's a modular inverse and a few mult/add. It barely registers on my execution profile.

Thanks
staff
Activity: 4284
Merit: 8808
Could you rephrase that slightly? I am not sure if you mean it will work or,

There is no speed up over what is already implemented? or,
The Bitcoin Core already sneaks some code onto the GPU, and is already fully implemented? or,
The GPU would be of no advantage, the most parallel it can become can be ran on a 2+ Core CPU as fast as possible (High step complexity overall)?
It is already parallel, it does not use the GPU.  Using the GPU effectively will be difficult because there is not that much parallelism available unless you are verifying multiple blocks at once, and communicating with the GPU has huge latencies.  Competing with state of the art cpu code will take state of the art GPU code (my own vanitygen code on the cpu is roughly as fast as the published gpu vanitygen).

I think you should try it, a port of libsecp256k1 to the OpenCL would be pretty interesting and useful. If it were faster we'd probably be willing to do the architectural changes (making it possible to verify multiple blocks at once) to enable using it.
full member
Activity: 160
Merit: 100
Bitcoin core's verification is already parallel.  But there isn't as much parallelism to extract as you might guess, not without extensive architectural changes (e.g. pipelining multiple blocks and accepting partially verified ones). You should not, however, have been seeing it using only one cpu... in fact windows users have frequently complained that it must be broken because it was using all of their cores for minutes at a time. Smiley


Could you rephrase that slightly? I am not sure if you mean it will work or,

There is no speed up over what is already implemented? or,
The Bitcoin Core already sneaks some code onto the GPU, and is already fully implemented? or,
The GPU would be of no advantage, the most parallel it can become can be ran on a 2+ Core CPU as fast as possible (High step complexity overall)?
staff
Activity: 4284
Merit: 8808
Bitcoin core's verification is already parallel.  But there isn't as much parallelism to extract as you might guess, not without extensive architectural changes (e.g. pipelining multiple blocks and accepting partially verified ones). You should not, however, have been seeing it using only one cpu... in fact windows users have frequently complained that it must be broken because it was using all of their cores for minutes at a time. Smiley
full member
Activity: 179
Merit: 151
-
Interested in this also. Maybe begin looking at how parallelisable sipa's libsecp265k1? ...which is basically the crypto verifying engine.

https://github.com/bitcoin/secp256k1

You should be using libsecp256k1 if you are doing verification. It is trivially parallelizable in the sense that you can verify many signatures at once...however, it does not do batch verification (it is not obvious that this is even possible for ECDSA since there is a sign ambiguity in the k value). There are some coming improvements but even today it is something like 6x faster than openssl.
legendary
Activity: 4214
Merit: 1313
As an academic exercise this is an interesting discussion.  From a coding perspective too. I did some code for a hyper-cube (and some other highly parallel machines) in grad school and love the area.
legendary
Activity: 3920
Merit: 2349
Eadem mutata resurgo
Interested in this also. Maybe begin looking at how parallelisable sipa's libsecp265k1? ...which is basically the crypto verifying engine.

https://github.com/bitcoin/secp256k1
member
Activity: 100
Merit: 10
Very intresting conversation! I'll follow up.
full member
Activity: 160
Merit: 100
It's not so much that they are far away from each other (though they are), it's the fact that they can be that hurts parallelizability because you need access to a lot of data. The way to handle this is to maintain an "output set" of all unspent outputs; you add every output for each transaction and delete every output that is referenced in each transaction. Maintaining this data structure (which requires random insert/delete access in roughly equal and enormous proportions, which is a nasty data structure problem ... not sure you can do better than a hashmap) is why you have to go through the song and dance I described.

To see what's involved with verifying the blockchain, you should check out my high level overview and also check out the Bitcoin Wiki pages on transactions and Script. It's a bit involved.


Ahh I see. So I might launch a kernel to verify a block. I would have to pass the block data, along with a pointer to an array of unspent coins in device memory, so that it can do this referencing "[sic]up and down" the chain. The issue is though that the RW bandwidth to the device memory would be massive. You would have to do a lookup on it once per unique address in the blockchain, because at some point all coins are unspent.? (is that correct logic).

To get around this you propose to verify all blocks in parallel, and write any unspent coins for each block into memory. Then do a second pass and verify for each unspent coin that it has some link in the blockchain? Then remove that unspent coin from the list, and if the list is empty after the algorithm runs then the chain is verified?

// I read a lot of the wiki in the past, I need to get back to it. Thanks for the reminder!

Pages:
Jump to: