These are not just thoughts, but a description of a working system that syncs the entire blockchain and creates the above data structures in about half an hour if you have enough bandwidth and 8 cores. It is bandwidth limited, so on a slow home connection of 20 Mbps a full sync takes about 6 hours.
Bandwidth limited = I/O and CPU limited then?
We had a similar discussion in another thread, but anyway I want to write it more concisely here: I really believe scaling can be achieved through tradeoffs between what a user has and what he lacks. Every machine will hit a bottleneck somewhere; the issue from that point onward is how to juggle the tradeoffs to work around that bottleneck.
Examples:
- A user has a slow internet connection but plenty of CPU => he should have the option of downloading (and uploading, if he is running a node) highly compressed blocks from other cooperating nodes that are willing to use compressed blocks (see the compression sketch after this list).
- If the same user has a slow CPU as well => he should be able to tap into GPU resources for parallel compression/decompression (it's feasible - I've seen papers on parallel compression on GPUs).
- If a user has slow I/O but a fast CPU => compress the blockchain on disk for more local throughput.
- If a user has low RAM but a fast CPU => compress data in RAM (for things like a compressed cache).
- If a user has slow I/O or low RAM but will bottleneck on CPU if he goes to a compressed scheme => tap into the GPU.
- If a user wants to increase the cache size to improve performance but lacks RAM => use compressed RAM, tapping into more CPU, to reduce I/O.
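To make the first tradeoff concrete, here is a minimal sketch (not from any actual node software) of trading CPU for bandwidth with zlib: the sender compresses a serialized block before transmitting it, and the receiver pays CPU to decompress. The block buffer and its size are placeholders.

```c
// Minimal sketch of trading CPU for bandwidth with zlib (link with -lz).
// The block buffer and sizes here are placeholders, not real protocol data.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

int main(void)
{
    unsigned char rawblock[100000];                 // pretend this is a serialized block
    memset(rawblock, 0xab, sizeof(rawblock));       // placeholder contents

    uLongf zlen = compressBound(sizeof(rawblock));  // worst-case compressed size
    unsigned char *zbuf = malloc(zlen);
    if ( compress2(zbuf, &zlen, rawblock, sizeof(rawblock), Z_BEST_COMPRESSION) != Z_OK )
        return 1;
    printf("raw %zu bytes -> compressed %lu bytes\n", sizeof(rawblock), (unsigned long)zlen);

    // The peer (or the local disk/RAM cache) pays CPU to get the raw bytes back.
    unsigned char out[100000]; uLongf outlen = sizeof(out);
    if ( uncompress(out, &outlen, zbuf, zlen) != Z_OK || outlen != sizeof(rawblock) )
        return 1;
    printf("decompressed back to %lu bytes\n", (unsigned long)outlen);
    free(zbuf);
    return 0;
}
```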
Essentially you try to trade what you have for what you want to achieve. One-size-fits-all solutions won't work particularly well, because one user has a 1 Gbps connection and another has 4 Mbps. One is running on a Xeon, the other on a laptop. One has 16 GB of memory, another has 2. One can tap into a GPU for parallel work, another can't because he doesn't have one.
IMO the best scaling solutions will be adjustable, so that the bottleneck in a particular system can be identified and the necessary tradeoffs made to compensate for its deficiencies. Compression is definitely one of the tools that can balance these needs, whether one lacks RAM, disk space on an SSD, network bandwidth, etc. And GPUs can definitely help with the processing, whether the load is directly related to BTC operations or to compression/decompression.
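As a purely hypothetical illustration of "adjustable": a node could probe its own resources at startup and pick a tradeoff. The thresholds, the measured link speed, and the strategy names below are made up; _SC_PHYS_PAGES is a Linux-specific sysconf query.

```c
// Hypothetical resource probe: pick a tradeoff based on what the machine has.
// Thresholds and strategy names are illustrative only; _SC_PHYS_PAGES is Linux-specific.
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long cores = sysconf(_SC_NPROCESSORS_ONLN);        // available CPU cores
    long pagesz = sysconf(_SC_PAGESIZE);
    long pages = sysconf(_SC_PHYS_PAGES);
    double ram_gb = (double)pages * pagesz / (1024.*1024.*1024.);
    double mbps = 20.;                                  // assume a measured link speed (placeholder)

    if ( mbps < 50. && cores >= 4 )
        printf("slow link, spare CPU -> request compressed blocks from peers\n");
    else if ( ram_gb < 4. && cores >= 4 )
        printf("low RAM, spare CPU -> keep caches compressed in memory\n");
    else
        printf("no obvious shortage -> run with plain blocks (%ld cores, %.1f GB RAM)\n",
               cores, ram_gb);
    return 0;
}
```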
The problem is that a lot of the data is high entropy and not very compressible. I believe I have identified and extracted most of the redundancy. Compressing the dataset is a one-time operation, and it doesn't really take much time to decompress.
The CPU is a bottleneck only if your connection is faster than 500 megabits/sec (probably half that, as that figure is without verifying the sigs yet), but there is plenty of idle time during the sync.
With bandwidth varying, sync takes anywhere from 30 minutes to 6 hours. The slower connection provides plenty of idle time for the CPU to keep up.
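A quick back-of-the-envelope check of those figures: 20 Mbps for 6 hours works out to roughly 54 GB moved, and pushing the same amount over a ~500 Mbps link takes well under half an hour, which is where the CPU takes over as the limit. The 54 GB figure is only implied by the numbers above, not stated, so treat it as an assumption in this sketch.

```c
// Back-of-the-envelope sync time from link speed, using the figures in the post:
// 20 Mbps -> ~6 hours implies roughly 54 GB of data moved during a full sync (assumption).
#include <stdio.h>

int main(void)
{
    double data_gb = 54.;                      // implied by 20 Mbps * 6 hours
    double speeds_mbps[] = { 20., 100., 500. };
    for (int i = 0; i < 3; i++) {
        double bytes_per_sec = speeds_mbps[i] * 1e6 / 8.;
        double hours = data_gb * 1e9 / bytes_per_sec / 3600.;
        printf("%6.0f Mbps -> %.1f hours of pure download time\n", speeds_mbps[i], hours);
    }
    return 0;
}
```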
I eliminated the HDD as a bottleneck by eliminating most of the seeks and using an append-only set of files written directly into memory-mappable data structures.
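A minimal sketch of that append-only + memory-map pattern (POSIX calls, placeholder record layout, not the actual data structures): records are written sequentially with no seeks, then the finished file is mapped read-only and the structures are used in place.

```c
// Minimal sketch of the append-only file + memory-map pattern (POSIX).
// The record layout here is a placeholder, not the real data structures.
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

struct record { uint32_t height; uint8_t hash[32]; };   // placeholder layout

int main(void)
{
    // 1) Append records sequentially: no seeks, HDD-friendly.
    int fd = open("bundle.dat", O_CREAT | O_WRONLY | O_APPEND, 0644);
    for (uint32_t h = 0; h < 1000; h++) {
        struct record r; memset(&r, 0, sizeof(r)); r.height = h;
        write(fd, &r, sizeof(r));
    }
    close(fd);

    // 2) Map the finished file read-only and use the structures in place.
    fd = open("bundle.dat", O_RDONLY);
    struct stat st; fstat(fd, &st);
    struct record *recs = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if ( recs == MAP_FAILED )
        return 1;
    size_t n = st.st_size / sizeof(struct record);
    printf("mapped %zu records, last height %u\n", n, recs[n-1].height);
    munmap(recs, st.st_size);
    close(fd);
    return 0;
}
```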
Each bundle can then be verified independently, and a chain of the bundle summaries can be used for a quick start, with full validity checked over the next 6 hours. You just need to not allow tx until everything is fully verified, or limit tx to ones that are known to be valid, i.e. fresh addresses having new funds sent to them.
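One way to picture the chain of bundle summaries (a sketch only; the summary contents are placeholders and SHA-256 via OpenSSL is just an assumption for illustration): each link folds the previous chain value together with the next bundle's summary, so a short chain commits to all bundles while the heavy per-bundle verification is deferred.

```c
// Sketch of chaining bundle summaries with SHA-256 (link with -lcrypto).
// The summary contents are placeholders; only the chaining idea matters here.
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <openssl/sha.h>

int main(void)
{
    uint8_t chain[SHA256_DIGEST_LENGTH]; memset(chain, 0, sizeof(chain));
    for (uint32_t bundle = 0; bundle < 3; bundle++) {
        uint8_t summary[64];                       // placeholder bundle summary
        memset(summary, 0, sizeof(summary));
        memcpy(summary, &bundle, sizeof(bundle));  // e.g. bundle index, merkle root, etc.

        // Fold the previous chain value and this bundle's summary into the next link.
        uint8_t buf[sizeof(chain) + sizeof(summary)];
        memcpy(buf, chain, sizeof(chain));
        memcpy(buf + sizeof(chain), summary, sizeof(summary));
        SHA256(buf, sizeof(buf), chain);

        printf("bundle %u -> chain %02x%02x%02x...\n", bundle, chain[0], chain[1], chain[2]);
    }
    return 0;
}
```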
I use adaptive methods so it goes as fast as the entire system is able to. After all, if you only have one core, then it will be the bottleneck for validating 100,000 blocks worth of sigs. The way to have different "sizes" is to have a full node, a validating node, a truncated node (dump the sigs after they are verified), and lite nodes.
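In the simplest terms, "adaptive" can mean scaling the number of verification workers to however many cores exist, so a one-core box just runs the bundles serially. A rough pthreads sketch (verify_bundle is a stand-in for the real sig checking, not the actual code):

```c
// Sketch: scale signature-verification workers to the number of cores (POSIX threads).
// verify_bundle() is a stand-in for the real per-bundle sig checking.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

#define NUM_BUNDLES 100          // pretend the chain is split into 100 bundles

static void verify_bundle(int b) { (void)b; usleep(1000); /* placeholder for real sig checks */ }

struct worker { int id, nworkers; };

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    for (int b = w->id; b < NUM_BUNDLES; b += w->nworkers)   // strided split of the work
        verify_bundle(b);
    return NULL;
}

int main(void)
{
    int n = (int)sysconf(_SC_NPROCESSORS_ONLN);   // adapt to however many cores exist
    if ( n < 1 ) n = 1;
    pthread_t *tids = malloc(n * sizeof(*tids));
    struct worker *ws = malloc(n * sizeof(*ws));
    for (int i = 0; i < n; i++) {
        ws[i].id = i; ws[i].nworkers = n;
        pthread_create(&tids[i], NULL, worker_main, &ws[i]);
    }
    for (int i = 0; i < n; i++)
        pthread_join(tids[i], NULL);
    printf("verified %d bundles using %d worker(s)\n", NUM_BUNDLES, n);
    free(tids); free(ws);
    return 0;
}
```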
James