
Topic: Creating semi-nodes to sustain the network (Read 519 times)

legendary
Activity: 2870
Merit: 7490
Crypto Swap Exchange
I'm not sure how this would change if the node communicates over Tor instead, but I suppose it does alleviate this scenario somewhat.

The connection would be encrypted by the Tor protocol, but it remains vulnerable at the exit node if you're connected to a node with a plain (clearnet) IP address.
It's a different case if you're connected to another node through its .onion address.

that was partly what i had in mind when i asked the question above and it could become concerning, but since nothing is going on against bitcoin in countries that such companies as Amazon are located in nobody worries about these things. which is probably why BIPs such as the one you mentioned aren't pursued either.

Maybe, but it doesn't change the fact that they have the capability to monitor your traffic, which might be used for some bad/controversial purpose.
legendary
Activity: 2870
Merit: 7490
Crypto Swap Exchange
reading the previous comment makes me wonder whether these VPS providers like Amazon,... have the capability to alter anything about the blocks/transactions before they give it back to the user. for example if the user who is running his node on an Amazon server could see his coins spent in a fake transaction that the VPS shows confirmed in a fake block?

They can't because the Linux VPSs they offer aren't authenticated with a password but with a private key that can only be downloaded once, when the VPS is created. Also they can't recover said private key if you lose it, you'd have to make a new one.

CMIIW, but they don't need to interact with the VPS directly since they can perform MITM attack and AFAIK Bitcoin Core connection isn't encrypted (there's BIP 151 to handle encryption, but it's withdrawn).
legendary
Activity: 2128
Merit: 1293
There is trouble abrewing
CMIIW, but they don't need to interact with the VPS directly since they can perform MITM attack and AFAIK Bitcoin Core connection isn't encrypted (there's BIP 151 to handle encryption, but it's withdrawn).

that was partly what i had in mind when i asked the question above and it could become concerning, but since nothing is going on against bitcoin in countries that such companies as Amazon are located in nobody worries about these things. which is probably why BIPs such as the one you mentioned aren't pursued either.
legendary
Activity: 3038
Merit: 4418
Crypto Swap Exchange
reading the previous comment makes me wonder whether these VPS providers like Amazon,... have the capability to alter anything about the blocks/transactions before they give it back to the user. for example if the user who is running his node on an Amazon server could see his coins spent in a fake transaction that the VPS shows confirmed in a fake block?

They can't because the Linux VPSs they offer aren't authenticated with a password but with a private key that can only be downloaded once, when the VPS is created. Also they can't recover said private key if you lose it, you'd have to make a new one.

CMIIW, but they don't need to interact with the VPS directly since they can perform MITM attack and AFAIK Bitcoin Core connection isn't encrypted (there's BIP 151 to handle encryption, but it's withdrawn).
It isn't. That's why ISPs are potentially the ones that could conduct sybil attacks on Bitcoin nodes. There are attacks which service providers could try to execute, with or without private key authentication.

I'm not sure how this would change if the node communicates over Tor instead, but I suppose it does alleviate this scenario somewhat.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
reading the previous comment makes me wonder whether these VPS providers like Amazon,... have the capability to alter anything about the blocks/transactions before they give it back to the user. for example if the user who is running his node on an Amazon server could see his coins spent in a fake transaction that the VPS shows confirmed in a fake block?

They can't because the Linux VPSs they offer aren't authenticated with a password but with a private key that can only be downloaded once, when the VPS is created. Also they can't recover said private key if you lose it, you'd have to make a new one.
legendary
Activity: 2898
Merit: 1823
reading the previous comment makes me wonder whether these VPS providers like Amazon,... have the capability to alter anything about the blocks/transactions before they give it back to the user. for example if the user who is running his node on an Amazon server could see his coins spent in a fake transaction that the VPS shows confirmed in a fake block?


Anything Amazon can alter, if it's even possible, will be validated by the other nodes, and anything invalid/not following the rules will not be relayed.

Amazon can censor "your node" though.
legendary
Activity: 2128
Merit: 1293
There is trouble abrewing
reading the previous comment makes me wonder whether these VPS providers like Amazon,... have the capability to alter anything about the blocks/transactions before they give it back to the user. for example if the user who is running his node on an Amazon server could see his coins spent in a fake transaction that the VPS shows confirmed in a fake block?
legendary
Activity: 2898
Merit: 1823

Using a data center (or a VPS) is a fairly common practice when setting up a full node client due to lower costs; we can only hope that not everyone uses the same hosting provider (e.g. AWS and DigitalOcean).
Well, Chlotide's reply sounds more like talking about the centralization of nodes hence my reply:

~
Indeed, there is the possibility that nodes will be held in the future only in data centers and the network would rely mostly on big data companies. Who knows ?
Maybe they will launch 1000 blockstream satellites, maybe Elon installs a node on his 40.000 satellites
~


Plus it actually defeats the purpose of running a node. Like "Not your keys, not your coins" meme, in this instance, it's "Not your hardware, not your node". You can be censored by Amazon/DigitalOcean.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
I asked some friends about accessing the paper behind the paywall and they introduced me to Sci-Hub, so I managed to read it. Some things I noticed:
1. They mention they used a standard PC with these specifications: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz (4 CPUs), NVIDIA GeForce 840M, 256GB SSD, 8GB RAM, Windows 10 Pro 64-bit and Python 2.7. Unfortunately, they don't mention the time or resource usage of compression.
2. There's no mention of time/resource usage for decompression and indexing, in case you run a full node and need to share blocks with other nodes.
3. In the chapter "Experimental Result", they mention they tested the compression technique without keeping the smart contract feature.

IMO i doubt it's practical, at least without improvement and further testing.

Yes that's the paper, I was hesitant to link the sci-hub page here because I wasn't sure of your point of view regarding the site.

Most research papers are rough drafts of ideas that need a lot of refinement to become practical. If someone were to make a compression of the blocks they would need to take into account the compression time and memory usage, because in sophisticated compression algorithms, the compression memory usage is several times larger than the decompression memory usage. It would limit the machines that can run a full node to the ones which have enough memory for block compression.
legendary
Activity: 3472
Merit: 10611
Say you want to import a private key and the blockchain has to be rescanned - wouldn't that lead to the decompression of all blocks, leading to a VERY long rescan time?

it depends on the method of compression that was used. usually you still save the same raw bytes in a different format like what i previously explained. in which case there isn't any difference in time for rescanning since for example if you import 1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2 (address from wiki) then instead of searching for 76a91477bff20c60e522dfaa3350c39b030a5d004e839a88ac script among outputs in each block you'll search for XX77bff20c60e522dfaa3350c39b030a5d004e839a where XX is a single byte saying this is a P2PKH script.
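A minimal sketch of the point above, assuming a hypothetical one-byte tag (0x01 here, standing in for the XX in the post) that marks a P2PKH output. The names P2PKH_TAG, compress_p2pkh and find_matches are made up for illustration, and this is not how any real node stores scripts. It only shows that a rescan can compare against the compressed form directly, without a decompression step.

```python
# Hypothetical tag byte standing in for "XX"; the value 0x01 is an assumption.
P2PKH_TAG = 0x01


def compress_p2pkh(script: bytes) -> bytes:
    """Turn a 25-byte P2PKH script into tag + 20-byte hash (21 bytes)."""
    is_p2pkh = (
        len(script) == 25
        and script[:3] == bytes.fromhex("76a914")   # OP_DUP OP_HASH160 push-20
        and script[23:] == bytes.fromhex("88ac")    # OP_EQUALVERIFY OP_CHECKSIG
    )
    if not is_p2pkh:
        raise ValueError("not a standard P2PKH script")
    return bytes([P2PKH_TAG]) + script[3:23]


def find_matches(stored_outputs, wanted_script: bytes):
    """Return the indexes of stored (compressed) outputs paying to wanted_script."""
    needle = compress_p2pkh(wanted_script)  # compress the search key once
    return [i for i, out in enumerate(stored_outputs) if out == needle]


# Script from the post (address 1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2).
wiki_script = bytes.fromhex("76a91477bff20c60e522dfaa3350c39b030a5d004e839a88ac")
# A second, made-up P2PKH script so the search has something to skip.
other_script = bytes.fromhex("76a914" + "11" * 20 + "88ac")

# Pretend this is what the node keeps on disk: compressed outputs only.
stored = [compress_p2pkh(other_script), compress_p2pkh(wiki_script)]
matches = find_matches(stored, wiki_script)
```

The search key is compressed once, and every comparison after that is a plain byte comparison against what is already on disk.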
legendary
Activity: 1134
Merit: 1599
no because it is not changing anything about the protocol. and what you do locally can be done to everything that you want from block zero to any other future blocks that are created. there really isn't any limitations. it could be performed on already stored on-disk blocks and transactions and it could also be performed on blocks and transactions that are sent between nodes.
Oh, that makes sense. Thanks :)



I asked some friends about accessing the paper behind the paywall and they introduced me to Sci-Hub, so I managed to read it. Some things I noticed:
1. They mention they used a standard PC with these specifications: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz (4 CPUs), NVIDIA GeForce 840M, 256GB SSD, 8GB RAM, Windows 10 Pro 64-bit and Python 2.7. Unfortunately, they don't mention the time or resource usage of compression.
2. There's no mention of time/resource usage for decompression and indexing, in case you run a full node and need to share blocks with other nodes.
3. In the chapter "Experimental Result", they mention they tested the compression technique without keeping the smart contract feature.

IMO i doubt it's practical, at least without improvement and further testing.
They missed mentioning exactly the most important parts then. :D I think it'd be fine if compression takes long, as long as it works and the blockchain size decreases substantially. The problem might come with decompression though. Say you want to import a private key and the blockchain has to be rescanned - wouldn't that lead to the decompression of all blocks, leading to a VERY long rescan time?
legendary
Activity: 3472
Merit: 10611
But doesn't that mean that only future txs will be compressed this way? Could all the previous ones be compressed too without having to redo all the work?

no because it is not changing anything about the protocol. and what you do locally can be done to everything that you want from block zero to any other future blocks that are created. there really isn't any limitations. it could be performed on already stored on-disk blocks and transactions and it could also be performed on blocks and transactions that are sent between nodes.
legendary
Activity: 1134
Merit: 1599
78.104% space saving sounds good, but IMO it's still too big for many smartphone these days.
True. With a 500 GB blockchain, 78% space saving still means 110GB to store. It's not a bad idea as it'd allow smaller storage devices (cheap laptops with 256GB SSD, for example) to become a full node without having to compromise almost the entire hard disk, but it still wouldn't allow most smaller devices such as smartphones to sustain the network. The thing is, our PCs are only turned on while we're at home while most keep their smartphones turned on 24/7, which would make many semi-nodes/full-nodes run almost non-stop as soon as their phones have internet connection.
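The arithmetic behind that figure, as a quick sketch. The 500 GB size and the 78.104% saving are the numbers used in this thread; the function name is made up.

```python
def remaining_size_gb(original_gb: float, saving_percent: float) -> float:
    """Size left on disk after a given percentage of space is saved."""
    return original_gb * (1 - saving_percent / 100)


# 500 GB chain with the 78.104% saving quoted from the paper.
remaining = remaining_size_gb(500, 78.104)
print(f"{remaining:.2f} GB")
```

That comes out at roughly 109.5 GB, which is where the "110GB" above comes from.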



i believe most ideas depend on the fact that we can safely remove a big chunk of data from each transaction while still knowing what those bytes were. for example we don't have to store OP_DUP, OP_HASH160, 0x14, OP_EQUALVERIFY, OP_CHECKSIG in a P2PKH output (they are currently 60% of daily transactions). all we have to do is to add a single byte saying this is a P2PKH output so we can replace 5 bytes with 1.
or every coinbase transaction input spends byte[32] and -1 (36 bytes that are always the same). with 630k blocks that is 22.68 MB saved!
these 2 examples have virtually zero computation cost.
But doesn't that mean that only future txs will be compressed this way? Could all the previous ones be compressed too without having to redo all the work?
legendary
Activity: 3472
Merit: 10611
I don't know about compression type used on the paper, but wouldn't it introduce trade-off between storage (HDD/SSD), memory (RAM) and computational power (CPU).
78.104% space saving sounds good, but IMO it's still too big for many smartphone these days.

it depends on the methods that were used in such compression techniques. i wasn't able to read the paper linked here since it requires signup/payment but usually most ideas are pretty simple and they don't really require that much computing power (eg. removing the useless DER encoding from signatures) but some ideas could require more computation but still a small one (eg. using compressed public keys everywhere but with a flag indicating the y was dropped for those transactions that had uncompressed pub keys).

i believe most ideas depend on the fact that we can safely remove a big chunk of data from each transaction while still knowing what those bytes were. for example we don't have to store OP_DUP, OP_HASH160, 0x14, OP_EQUALVERIFY, OP_CHECKSIG in a P2PKH output (they are currently 60% of daily transactions). all we have to do is to add a single byte saying this is a P2PKH output so we can replace 5 bytes with 1.
or every coinbase transaction input spends byte[32] and -1 (36 bytes that are always the same). with 630k blocks that is 22.68 MB saved!
these 2 examples have virtually zero computation cost.
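A rough sketch of the two examples, under stated assumptions: the tag values (0x00 for "stored as-is", 0x01 for P2PKH) and all function names are hypothetical, chosen only for illustration. It shows that the P2PKH replacement is lossless (the original script can be rebuilt exactly) and works out the coinbase figure from 36 bytes per block.

```python
RAW_TAG = 0x00     # hypothetical: the script follows unchanged
P2PKH_TAG = 0x01   # hypothetical: only the 20-byte hash follows

P2PKH_PREFIX = bytes.fromhex("76a914")  # OP_DUP OP_HASH160 0x14
P2PKH_SUFFIX = bytes.fromhex("88ac")    # OP_EQUALVERIFY OP_CHECKSIG


def compress_script(script: bytes) -> bytes:
    """Replace the 5 fixed P2PKH bytes with 1 tag byte; tag anything else as raw."""
    if (len(script) == 25 and script.startswith(P2PKH_PREFIX)
            and script.endswith(P2PKH_SUFFIX)):
        return bytes([P2PKH_TAG]) + script[3:23]
    return bytes([RAW_TAG]) + script


def decompress_script(data: bytes) -> bytes:
    """Rebuild the exact original script from its tagged form."""
    tag, body = data[0], data[1:]
    if tag == P2PKH_TAG:
        return P2PKH_PREFIX + body + P2PKH_SUFFIX
    if tag == RAW_TAG:
        return body
    raise ValueError("unknown tag")


def coinbase_outpoint_savings(block_count: int) -> int:
    """Bytes saved by dropping the fixed 32-byte hash + 4-byte index per coinbase."""
    return (32 + 4) * block_count


# A made-up P2PKH script (the 20-byte hash is just 0xab repeated).
script = bytes.fromhex("76a914" + "ab" * 20 + "88ac")
packed = compress_script(script)
saved_bytes = coinbase_outpoint_savings(630_000)
```

Note that in this particular sketch every non-P2PKH script grows by one byte because of the raw tag; a real scheme would need more tags or a smarter encoding to avoid that.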
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
How about instead of having 10% of blocks stored locally at any given time, each block is just compressed after they are downloaded, and uncompressed when a transaction needs to be verified? Something such as what's described in this paper https://ieeexplore.ieee.org/document/8605487. It's behind a paywall but the general idea is that instead of storing all the blocks you store a block that contains a summary of all the block transactions, and the hashes of the blocks it contains, moving around block headers if necessary, all while preserving the block file format so that the blockchain doesn't have to be altered retroactively. It also mentions that since these summary blocks might be bigger than the original blocks they are compressed with Huffman and LZ77.

If this were to be implemented the first summary block would be placed at the current block height and it will summarize the first N blocks that come after it i.e. everything before this block is not summarized and left untouched, then the next summary block will summarize the next N blocks and so on.

We're stuck with the blockchain size that we have today but this causes it to grow more slowly.
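For a feel of the "compress after download, decompress when needed" part only, here is a small sketch using Python's zlib module, whose DEFLATE format combines LZ77 with Huffman coding. It does not reproduce the paper's summary-block scheme, the function names are made up, and the "block" is fake, highly repetitive data.

```python
import zlib


def compress_block(raw_block: bytes, level: int = 9) -> bytes:
    """Compress raw block bytes with DEFLATE (LZ77 + Huffman coding)."""
    return zlib.compress(raw_block, level)


def decompress_block(stored: bytes) -> bytes:
    """Recover the exact raw block bytes, e.g. before verifying or serving them."""
    return zlib.decompress(stored)


# Made-up stand-in for block data: the same 25-byte script repeated 1000 times.
fake_block = bytes.fromhex("76a914" + "00" * 20 + "88ac") * 1000

stored = compress_block(fake_block)
restored = decompress_block(stored)
print(len(fake_block), "->", len(stored), "bytes")
```

Real blocks are mostly hashes, keys and signatures, which look random to a general-purpose compressor, so the ratio on real data would be far worse than on this artificial input.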
legendary
Activity: 2898
Merit: 1823
I believe the OP is talking about something like pruned semi-archival nodes, but with incomplete parts of the blockchain archived in different "slices".

It still downloads and validates everything, with only a "slice" of the blockchain kept in storage, plus the latest blocks of the pruned node of course.

That's right. If 10 users were to be assigned a slice, you would have to download only 10% of the network. If I've been using SPV only until now on my mobile phone, this would make it possible for me to also store a part of the blockchain and support the network. I understand the advantages of running a full node, but having a device with smaller storage space doesn't really give you any choice besides running a wallet in SPV mode (or with pruning).


Your SPV in your mobile phone would store a "slice" for archiving for full nodes to download from you? No, it MUST be a full node that downloaded, and validated the blocks as valid that stores a "slice" for archiving.
legendary
Activity: 1134
Merit: 1599
We really do put some trust in central servers in the codebase through the use of the DNS seeds. Those nodes are normally what a new node will connect to in most cases, so we already have some hard-coded nodes in the codebase to trust when connecting to the network. I think the above is a very interesting idea and is worthy of more discussion. Great topic!
Thank you!

Oh, now I get it! I do remember now there was a list you could check on Electrum where you'd see all accepted/trusted nodes. IIRC, it asked you even upon install to which server you'd like to connect. The years of not using any other wallet besides Ledger are now starting to show.. :D

The problem would be if only a handful of servers (say Blockstream and 2-3 more "big data companies" as Chlotide says) were to hold the full nodes we'd have to trust. As far as I can tell, that could easily lead to a few of them working together on attacking the network.
hero member
Activity: 1241
Merit: 623
OGRaccoon
We really do put some trust in central servers in the codebase through the use of the DNS seeds. Those nodes are normally what a new node will connect to in most cases, so we already have some hard-coded nodes in the codebase to trust when connecting to the network. I think the above is a very interesting idea and is worthy of more discussion. Great topic!
legendary
Activity: 1134
Merit: 1599
I believe the OP is talking about something like pruned semi-archival nodes, but with incomplete parts of the blockchain archived in different "slices".

It still downloads and validates everything, with only a "slice" of the blockchain kept in storage, plus the latest blocks of the pruned node of course.
That's right. If 10 users were to be assigned a slice, you would have to download only 10% of the network. If I've been using SPV only until now on my mobile phone, this would make it possible for me to also store a part of the blockchain and support the network. I understand the advantages of running a full node, but having a device with smaller storage space doesn't really give you any choice besides running a wallet in SPV mode (or with pruning).

But yeah, the idea is easier said than done, as always, and I do get that. It sounds easy to talk about "assigning a slice to every new wallet user choosing semi-nodes", but accomplishing it while keeping everything 100% decentralized and at least a bit convenient is probably much harder.



Using a data center (or a VPS) is a fairly common practice when setting up a full node client due to lower costs; we can only hope that not everyone uses the same hosting provider (e.g. AWS and DigitalOcean).
Well, Chlotide's reply sounds more like talking about the centralization of nodes hence my reply:

~
Indeed, there is the possibility that nodes will be held in the future only in data centers and the network would rely mostly on big data companies. Who knows ?
Maybe they will launch 1000 blockstream satellites, maybe Elon installs a node on his 40.000 satellites
~
legendary
Activity: 2898
Merit: 1823
bitcoin is trustless, which basically means you don't trust anybody else. instead you verify everything there is to verify from the very first block to the last.

another thing you have to keep in mind is what (i believe core calls) chainstate. when you receive a new block you have to know what outputs are still unspent so that you can verify that new block. to do that you have to have verified each block from 1 to now and have created that database. that is not something that you can trust others with. and it is not something that  you can build without downloading EVERY block.

=> that means "not downloading everything" is not an option if you want to be considered a "full node".

=> but that also means that all you need is that UTXO database and everything else is delete-able. which is what pruned mode does. it discards old blocks after they are verified so that the storage requirement is small.

with your idea when a node doesn't download a "slice" it is not capable of updating its UTXO set, now it has to "trust" a third party to update it for them. i'd say use a simple SPV client instead if you want to put trust in another node.


I believe the OP is talking about something like pruned semi-archival nodes, but with incomplete parts of the blockchain archived in different "slices".

It still downloads and validates everything, with only a "slice" of the blockchain kept in storage, plus the latest blocks of the pruned node of course.
full member
Activity: 305
Merit: 106
Holding full nodes on satellites and/or data centers would be centralizing the network, defeating Bitcoin's purpose.

I know, but it would be a plausible reality in the future. I agree it would undermine the basics of decentralization, but it could happen nevertheless.
I don't know what tomorrow will bring, not to mention 10-20 years from now. But the way I see it, we should talk about these types of scenarios rather than be passive about them and be caught by surprise.
legendary
Activity: 1134
Merit: 1599
I see & understand what you guys mean. Thanks for the explanation - I appreciate it. :)



I see it as an "all or nothing" approach. You either use the pruned version, where you keep only the latest blocks (in which case you still download and verify every single block from Satoshi's basement to date and keep only what pruning leaves), or you keep a complete ledger for the reasons mentioned above by mocacinno.

Indeed, there is the possibility that nodes will be held in the future only in data centers and the network would rely mostly on big data companies. Who knows ?
Maybe they will launch 1000 blockstream satellites, maybe Elon installs a node on his 40.000 satellites, maybe martians come with their evolved tech and take control of all our crypto :)) but trusting an almost full node somehow does not resonate well. Not much difference between that and blockchain.com ... still need trust :)

Interesting idea tho  
Holding full nodes on satellites and/or data centers would be centralizing the network, defeating Bitcoin's purpose.
legendary
Activity: 3472
Merit: 10611
bitcoin is trustless, which basically means you don't trust anybody else. instead you verify everything there is to verify from the very first block to the last.
another thing you have to keep in mind is what (i believe core calls) chainstate. when you receive a new block you have to know what outputs are still unspent so that you can verify that new block. to do that you have to have verified each block from 1 to now and have created that database. that is not something that you can trust others with. and it is not something that  you can build without downloading EVERY block.
=> that means "not downloading everything" is not an option if you want to be considered a "full node".

=> but that also means that all you need is that UTXO database and everything else is delete-able. which is what pruned mode does. it discards old blocks after they are verified so that the storage requirement is small.

with your idea when a node doesn't download a "slice" it is not capable of updating its UTXO set, now it has to "trust" a third party to update it for them. i'd say use a simple SPV client instead if you want to put trust in another node.
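A toy sketch of why a skipped block breaks validation. This is a drastically simplified model, not Bitcoin Core's chainstate: a transaction here is just (txid, inputs, output_count), there are no amounts, scripts or signatures, and all names are made up.

```python
def apply_block(utxo, block):
    """Check every input against the UTXO set and return the updated set.

    utxo  : set of (txid, output_index) pairs that are still unspent
    block : list of (txid, inputs, output_count) tuples
    """
    updated = set(utxo)
    for txid, inputs, output_count in block:
        for outpoint in inputs:
            if outpoint not in updated:
                raise ValueError(f"unknown or already spent output: {outpoint}")
            updated.remove(outpoint)          # the output is now spent
        for index in range(output_count):
            updated.add((txid, index))        # new unspent outputs
    return updated


block_1 = [("a", [], 2)]            # coinbase-like: no inputs, creates a:0 and a:1
block_2 = [("b", [("a", 0)], 1)]    # spends a:0, creates b:0
block_3 = [("c", [("b", 0)], 1)]    # spends b:0, creates c:0

# A node that processes every block ends up with a usable UTXO set.
state = set()
for block in (block_1, block_2, block_3):
    state = apply_block(state, block)

# A node that never saw block_2 cannot tell whether block_3 is valid.
partial = apply_block(set(), block_1)
try:
    apply_block(partial, block_3)
    skipped_block_ok = True
except ValueError:
    skipped_block_ok = False
```

Once the set is built, the blocks themselves are no longer needed to validate new ones, which is the part pruned mode relies on.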
full member
Activity: 305
Merit: 106
I see it as an "all or nothing" approach. You either use the pruned version, where you keep only the latest blocks (in which case you still download and verify every single block from Satoshi's basement to date and keep only what pruning leaves), or you keep a complete ledger for the reasons mentioned above by mocacinno.

Indeed, there is the possibility that nodes will be held in the future only in data centers and the network would rely mostly on big data companies. Who knows ?
Maybe they will launch 1000 blockstream satellites, maybe Elon installs a node on his 40.000 satellites, maybe martians come with their evolved tech and take control of all our crypto :)) but trusting an almost full node somehow does not resonate well. Not much difference between that and blockchain.com ... still need trust :)

Interesting idea tho   
legendary
Activity: 3612
Merit: 5297
https://merel.mobi => buy facemasks with BTC/LTC
It's an interesting idea, however, it would require you to trust the other nodes.

One of the major reasons for running a full node is that you can verify every transaction in every block, build your own UTXO db, fill your own mempool, verify every new block, and check each and every transaction funding one of your addresses without trusting somebody else... As soon as you enter a system as described in your OP, you'd be trusting other people again.
legendary
Activity: 1134
Merit: 1599
As far as I know, the more full nodes there are on the network, the more trustless the blockchain becomes. But the large size of Bitcoin's blockchain will sooner or later become a problem when it comes to users' capacity to hold an entire node on a hard drive.

I do get the idea that hard disks will increase in size in time. I mean, we see this with every new phone, PC and other devices. But I think it'd still be a good idea to somehow make it so that a larger percentage of people's devices could become at least semi-sustainers of the trustless manner of BTC.

So what if, besides pruning, using an SPV and owning a full node, we had the option to only hold 10% of the blocks on our device in combination with an SPV node? As far as I know, there's constant communication between the Core users (can't find a more specific term - I mean communication between those who are using the Core wallet), right? Well, then the blocks could be split in 10 slices and whoever chooses this option I'm suggesting would have to only store 10% (a slice) of the blocks downloaded on their device. The newly installed Core wallet would receive the information regarding which slice they should become in order to know which blocks to download and store on their device.

How would it work?
Say the latest block mined on the network is block no. 999,999 (so 1,000,000 blocks, counting block 0). Each of the 10 slices of semi-nodes would have 10% (so 100,000 blocks) downloaded and stored. As soon as block no. 1,000,000 is mined, the last slice (slice no. 10) gets to download it. Every slice except slice no. 1 would delete its first stored block, and every slice except the last would then download the next block after its last one.

For example,
  • Slice 1 (storing blocks 0 - 99,999) will download block no. 100,000.
  • Slice 2 (storing blocks 100,000 - 199,999) will delete block no. 100,000 (as it's now stored by slice 1) and will download block no. 200,000.
  • Slice 3 (storing blocks 200,000 - 299,999) will delete block no. 200,000 (as it's now stored by slice 2) and will download block no. 300,000.
  • ... and so on.
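The shift described in the list above can be sketched as follows. The function names are made up, and the code only applies the rule once, exactly as stated for a single new block; it makes no claim about how slices would be re-split in the long run.

```python
def initial_slices(tip_height: int, slice_count: int = 10):
    """Split blocks 0..tip_height into contiguous (first, last) ranges."""
    total = tip_height + 1                      # block 0 counts too
    size = total // slice_count
    slices = []
    for k in range(slice_count):
        first = k * size
        # the last slice takes whatever remains up to the tip
        last = tip_height if k == slice_count - 1 else first + size - 1
        slices.append((first, last))
    return slices


def on_new_block(slices):
    """Apply the rule from the post for one newly mined block."""
    shifted = []
    for k, (first, last) in enumerate(slices):
        new_first = first if k == 0 else first + 1   # slice 1 deletes nothing
        shifted.append((new_first, last + 1))        # each slice takes one more block
    return shifted


before = initial_slices(999_999)      # tip is block no. 999,999
after = on_new_block(before)          # block no. 1,000,000 arrives

for number, (first, last) in enumerate(after[:3], start=1):
    print(f"slice {number}: blocks {first:,} - {last:,}")
```

Applied repeatedly as written, this rule only ever grows slice 1, since every other slice drops one block for each block it gains. The post does not say how the ranges would be rebalanced over time, so that part is left out.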

Now let's say there are 1,000 semi-nodes split into a perfect balance of 10 slices of 100 users. After a while, 300 users disconnect at the same time. Obviously, these 300 aren't split perfectly into 30 users from each slice so the balance is disrupted. What happens next? Well, the next one who downloads the Core wallet and decides to take only 10% of the blocks will download the slice that has the least users, helping maintain a balance as close to 10% as possible between all slices.
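A small sketch of that balancing rule. The per-slice user counts are invented (they just add up to the 700 users left after 300 disconnect), and how a new wallet would learn these counts in a decentralized way is not covered here, so they are simply passed in.

```python
def assign_slice(slice_users: dict) -> int:
    """Pick the slice with the fewest users; ties go to the lowest slice number."""
    return min(sorted(slice_users), key=lambda s: slice_users[s])


# Made-up distribution after 300 of the 1,000 users left unevenly (sums to 700).
users = {1: 50, 2: 90, 3: 60, 4: 80, 5: 70, 6: 75, 7: 65, 8: 85, 9: 55, 10: 70}

first_choice = assign_slice(users)

# Five new semi-nodes join one after another.
for _ in range(5):
    users[assign_slice(users)] += 1
```

With these numbers the first five newcomers all land in slice 1, which brings it level with slice 9, the next emptiest one.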

I have also made a table (forgot to complete the first row and lost the excel file) if it helps but I am not sure how easy-to-understand it is. It was a big headache as the first block is block "0" and not "1" :D

With a blockchain size of 500GB, a Bitcoin supporter/user could sustain the network even by downloading a slice of 10% on a 64 GB phone, hence becoming a semi-node. We could have the option to download 10, 20, 30, ..., 90% of the blockchain if we have enough storage. For example, if the blockchain had a size of 500GB, maybe I have a 256GB phone and I want to become a semi-node downloading 200GB (so 40% or 4 slices) of the block data.

Is this plausible & a good idea or is there a large flaw I'm missing?