Topic: Fast synchronization idea (Read 1356 times)

newbie
Activity: 14
Merit: 0
September 01, 2014, 11:41:31 AM
#6
When you say "they just have to install Tor", you're recreating the Linux experience, where you have to hunt down countless random libs to make anything work and be a twenty-something wizard to do it. And then it will be as popular as Linux is, 20 years later. It has to be the Apple experience: "here's everything you need", "it just works". It has to be literally a button in the client, or normal people won't use it, because it's moonspeak to them. It's not easy for everyone. A month ago someone asked me to help with his computer; he typed URLs into Google and didn't even know Ctrl-C. But he owned a few buildings. That's the target audience, or it should be. He doesn't know Tor, but he deserves privacy. The other problem is that to support relaying Bitcoin traffic over Tor, it seems you also have to support relaying illegal data. Of course I'm pro-Tor, and maybe I'm grasping at straws, but still. I'll address your criticism; it's great, you're awesome.

- Attacks are possible and surely always will be when it's a matter of sending through proxies. I was indeed imagining a tracker of nodes shared between clients, since it's complex enough already, though it'd be less Sybil-resistant than Tor. The goal is not absolute anonymity, just moderate anonymity: better than nothing, and enough for most uses. If you're al-Qaeda or a big corporation you wouldn't use the fast synchronization idea, just as you wouldn't trust Tor 100%; it's for casual use. Maybe the name should reflect the limitation so it doesn't mislead people, something like 'partial synchronization', with everything that entails. But I agree there'd be people trying to harvest information from the service, because such information surely has value, so it's a sensitive issue; it can't be too easily exploitable. You've convinced me that we could probably only query one address at a time. Maybe it should just go through Tor servers then, integrated in the client, but it's annoying to lose control over how it works when it could be adapted to Bitcoin's needs. I'm worried about address linking. But I guess we shouldn't reinvent the wheel, though it'd be fun. For example, to be ranked as a favored relay in the tracker, nodes could be required to sign with an address that owns bitcoins or has solved a block in the past, and other slots could be filled by mining pools in proportion to a node's hashrate, so there could be some Sybil presence but not an unbounded one. Tricks like that.
- I hadn't thought of that timing issue, good point. Honestly, I'm not an expert on Tor. I'd say requests for multiple addresses shouldn't be synchronous (there's time during block downloading; maybe we'd tick the addresses we care about and fetch the rest later), and there could be constant dummy queries at random intervals to hide the real ones (see the sketch after this list). Apparently that's an actual strategy; there are many. There's little security in obscurity, but this isn't a matter of absolute anonymity, because that probably can't be done.
- The incentive problem was on my mind too. People seed torrents even on The Pirate Bay, so I'm thinking many will turn it on just to help, or because they don't care, or are too lazy to turn it off after using it, as long as it costs little to run, like tx propagation does. It'd just become natural that cryptocurrencies are slightly like torrents: clients participate. Many would turn it on ideologically, because they want to help Bitcoin in the 'fight' against banks & co. but can't mine or run full nodes, which require serious money or bandwidth. They'd handle transaction onion-routing while full nodes do the synchronization; it'd be a beautiful collaborative effort. Enabling the feature also increases utility, hence the value of Bitcoin, so that's some minor incentive to run enough such nodes. At worst it's 3 MB every 10 minutes (3 relay hops x 1 MB block), shared across all nodes, so about 10 kB per 10 minutes each with only 300 nodes, plus negligible synchronization data. Maybe 200 kB per 10 minutes with fake data, around 0.3 kB/s, plus low-level data (rough arithmetic below). It doesn't seem like much, and we'd set a speed limit in the options. At worst, fees, but it'd be beautiful if it could just be free. Integrated proxying between nodes.
- Tor channels data constantly and in volume, so surely it's not the same situation; I think changing paths could be afforded here, though the maths should be done and I'm not the best at networking. But if it's always the same path, then the full node can link addresses too easily. Also, I was mistaken: it would not be a path per output but a path per address, since we'd query by pubKeyHash, so the risk is output linking.
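
Just to make the dummy-query idea concrete, here's a minimal sketch (Python, with hypothetical send_query and random_decoy_address helpers; the 30-second mean gap and the 20% real-query ratio are arbitrary tuning knobs, not anything specified):

```python
import random
import time

MEAN_GAP_SECONDS = 30.0  # average gap between queries, a tuning knob

def query_loop(real_addresses, send_query, random_decoy_address):
    """Hide real lookups inside a stream of decoys at random intervals."""
    pending = list(real_addresses)
    random.shuffle(pending)  # don't reveal any priority order
    while True:
        # exponential gaps make query times memoryless rather than periodic
        time.sleep(random.expovariate(1.0 / MEAN_GAP_SECONDS))
        if pending and random.random() < 0.2:
            send_query(pending.pop())           # a real lookup, occasionally
        else:
            send_query(random_decoy_address())  # a decoy, most of the time
```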
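
And a back-of-the-envelope check of the bandwidth figures above, under the same rough assumptions (1 MB blocks, 3 relay hops, 300 participating nodes, a 20x dummy-traffic multiplier):

```python
BLOCK_BYTES = 1_000_000   # one 1 MB block every ~10 minutes
RELAY_HOPS  = 3           # each block/tx is re-sent over 3 hops
NODES       = 300         # nodes sharing the relaying load
DUMMY_MULT  = 20          # padding factor for fake traffic (a guess)

per_node = BLOCK_BYTES * RELAY_HOPS / NODES  # bytes per 10 minutes
padded   = per_node * DUMMY_MULT

print(f"{per_node / 1e3:.0f} kB per node per 10 min")
print(f"{padded / 1e3:.0f} kB padded, {padded / 600 / 1e3:.2f} kB/s")
# -> 10 kB per node per 10 min
# -> 200 kB padded, 0.33 kB/s
```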

For the meta-argument, overcomplexity: if that's what it takes, then it's what it takes. We want to trustlessly query specific outputs while minimizing who learns what belongs to whom. Maybe it's the naive approach, but there are probably not many ways to go about it. Fast, light clients with minimal loss of trust and privacy are worth a complex solution. And it'd be fun stuff to code, if you ask me; I wish I had what it takes. If someone has an easy solution, I'm all for it, but this is what I would do, and if it existed, I would use it. In fact I don't have a client running because my HDD is full and I have like 0.7 BTC. But if I could just download 5 GB of the chain, for the massive distributed PoW trust it gives, combined with multiple full-node third parties telling me the first ledger hash is valid, it'd lighten the load on full nodes and be just perfect for my use, since an attack combining both vectors would have little effect and is very, very unlikely. Unlikely enough not to bother with 21 GB and counting.
https://blockchain.info/charts/blocks-size?timespan=1year&showDataPoints=false&daysAverageString=1&show_header=true&scale=0&address=
Maybe it's not that much, but at this rate it'll be something like 100 GB in 6 years. I can't afford that; there are just too many games and movies I can't delete. If worst comes to worst I'll buy a hard drive, but still.

edit: Andytoshi, I didn't understand the part about having to check every transaction in the future, and how it's bad and should be avoided. Isn't that already how it works? I should read the code, it seems. I don't think normal nodes check full validity, that's for full nodes, but they do track their own outputs in each block, don't they? So to my understanding it would change little: download a ledger, then update it with the transactions in downloaded blocks (a sketch of that update step is below). It's just that instead of starting from the special-case genesis block with Ledger = empty set, you start from a later block with a non-empty ledger. There's a trade-off of trust for space and speed, but god knows I'd take it; it's surely unavoidable. And like I said, it could just be a way to speed up synchronization, then keep downloading in reverse to the genesis block if people want complete trust.
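
What I mean by "download a ledger and update it", as a minimal sketch; the field names here are illustrative, not Bitcoin's actual structures:

```python
def apply_block(utxo_set, block):
    """Roll a downloaded ledger (UTXO set) forward by one block, the same
    way a full node rolls forward from the genesis block's empty set.
    utxo_set maps (txid, output_index) -> output data."""
    for tx in block["txs"]:
        if not tx.get("coinbase"):
            for txin in tx["inputs"]:
                # each input spends an existing ledger entry
                del utxo_set[(txin["txid"], txin["vout"])]
        for index, txout in enumerate(tx["outputs"]):
            # each output becomes a new ledger entry
            utxo_set[(tx["txid"], index)] = txout
    return utxo_set
```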
full member
Activity: 179
Merit: 151
-
August 31, 2014, 11:31:52 AM
#5
Avoiding IP-linkage is easy by using Tor directly.

Avoiding address linkage with a "Tor-like protocol" has some serious problems:
- How do you avoid isolation or Sybil attacks? Tor itself uses a centralized repository of all nodes on the network. This doesn't prevent some TLA from running a huge chunk of the nodes, but it does make sure that every user can see a full list of nodes (and build circuits spanning a wide IP mask).

- Timing analysis. This is a big problem for Tor and would be absolutely enormous for this application, where every user downloads all their outputs in one shot then nothing ever again.

- How do you incentivize non-attackers to run nodes? There is a strong moral incentive to run Tor, plus running a node helps to shield your own traffic. If you are running a full node, then you don't need this service because you know all the outputs, and presumably you care enough about the network to be providing information about its state...so why do you want to help others who are avoiding doing this?

- Creating an entirely new circuit for every output would be extremely expensive. Tor doesn't do this; it doesn't even create a new circuit for each new connection.

Generally any problem solved with a "Tor-like protocol on top of Bitcoin" would be better solved with Tor itself. In this case, using Tor only solves the first of my objections (Sybil resistance). There is also the meta-argument of this being an extremely complex solution to a specific problem.
newbie
Activity: 14
Merit: 0
August 31, 2014, 05:55:12 AM
#4
Thanks andytoshi, I thought about the privacy problems. The 1st is that the IP is linked to the addresses, the 2nd is that the addresses are linked together, and the 3rd is that it leaks the owner's activity. 1 and 2 could be alleviated with a Tor-like protocol: a node asks a node to ask a node to ask a node to ask the full node about one of the outputs, with a layer of encryption for each node along the way, like an onion (a toy sketch below). And a different path of nodes to different full nodes for the other outputs. It's some work, but it could also be used to send transactions more anonymously. And I don't think bandwidth is such a problem, because synchronizing is a rather rare event and it's only about 24 hashes per output. For the 3rd problem, maybe by default we'd also ask for the balances of some other addresses, to mitigate it. This onion routing would increase bandwidth, but not for everyone: people could click 'Privacy Mode' and enter this Tor-like state with the other clients who did so.
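
A toy illustration of the layered encryption (using the Python cryptography package's Fernet for brevity; a real design would need proper circuit-building crypto, and the address format here is made up):

```python
from cryptography.fernet import Fernet

def wrap_onion(query: bytes, hops):
    """hops: list of (address, key) from first relay to the final full node.
    Returns (first_address, onion) to hand to the first relay."""
    addresses = [addr for addr, _ in hops]
    keys = [key for _, key in hops]
    # innermost layer: the final node's key wraps the bare query
    payload = Fernet(keys[-1]).encrypt(b"QUERY|" + query)
    # each earlier layer wraps "next hop address | inner onion"
    for i in range(len(hops) - 2, -1, -1):
        payload = Fernet(keys[i]).encrypt(addresses[i + 1].encode() + b"|" + payload)
    return addresses[0], payload

def peel_layer(payload: bytes, key):
    # each relay peels one layer and learns only the next hop
    head, _, inner = Fernet(key).decrypt(payload).partition(b"|")
    return head.decode(), inner  # next-hop address, or "QUERY" at the end

# e.g. hops = [(node, Fernet.generate_key()) for node in ("A", "B", "full")]
```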
legendary
Activity: 2053
Merit: 1356
aka tonikt
August 29, 2014, 03:09:49 PM
#3
Yeah
full member
Activity: 179
Merit: 151
-
August 29, 2014, 02:36:45 PM
#2
Some thoughts:

What I believe you are proposing is to store the root of a Merkle Tree of UTXOs in each block. This would be useful for a number of applications, and if it did not require a softfork to implement, I suspect it would have happened already. Such a commitment would allow fast proofs of inclusion or exclusion of individual UTXOs at individual block heights. The cost in blockchain space would be roughly a single hash: the root.

Note that actually using the root to check inclusion/exclusion/insertion/deletion of any UTXO requires a Merkle proof, which as you observe has size logarithmic in the total number of UTXOs. To be able to generate such a proof for an arbitrary output would require access to the entire UTXO set, while verifying it requires only the root. This means that without attaching proofs to transactions (which would be a huge increase in their size!), non-full validators could not follow updates to the Merkle root, nor could they generate their own proofs. This puts a big damper on the usefulness of this proposal, because you are forcing any users who don't have the UTXO set (i.e., everyone who would benefit from having a commitment in the first place) to request proofs from some full-node peer. They don't need to fear false proofs, since they can validate themselves, but there is a privacy cost, not to mention bandwidth.
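
Concretely, the verify side of that asymmetry looks something like this sketch (the double-SHA256 and the left/right ordering convention are just plausible choices, not a spec): checking needs only the committed root plus one sibling hash per tree level, while producing the proof needs the whole UTXO set.

```python
import hashlib

def h(data: bytes) -> bytes:
    # double SHA-256, Bitcoin's usual hash (an assumption for this sketch)
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def verify_proof(utxo_bytes: bytes, proof, root: bytes) -> bool:
    """proof: list of (sibling_hash, sibling_is_left), leaf level first.
    Needs only `root` -- no access to the UTXO set."""
    node = h(utxo_bytes)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```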

It would let users synchronize the blockchain quickly by doing an SPV validation, requesting proofs of their outputs in some recent block, and checking the transactions in every block after that; but they wouldn't be able to produce their own proofs, and they would need to request new proofs for each block if they wanted to avoid checking every transaction in the future.

Oh, and by the way, at the time of this posting there are 13,057,707 possibly spendable UTXOs plus 4,886 provably unspendable ones. (Note that my spendability checker is really weak right now, so maybe this number is quite low.) We rolled over 13 million some time last night.
newbie
Activity: 14
Merit: 0
August 25, 2014, 09:35:56 PM
#1
We add a ledger hash in the block header.

The ledger hash is the root of a tree of hashes of the unspent outputs.
When a client wants to synchronize, we first send the branches containing his outputs, plus the hashes of the other branches, so he can reconstruct the ledger hash.

So the client is quickly synchronized with his balance. He'd select a difficulty (the depth of blocks to download) and check the hash.
Once that's downloaded, the client could keep downloading in reverse, to increase the difficulty and the trust in the ledger hash.
It could be combined with the merkle tree chain idea to be even faster.

I'm not sure how to organize the tree of hashes of outputs.
But for example, if there were 9 outputs, we could make 3 stacks of 3, and only send 1 stack + 2 hashes + the ledger hash.
Does anyone know how many unspent outputs there are, and the size of the whole set? I can't find the information.

I hope you don't mind that I post this suggestion on the side. I find it interesting; let me know what you think.



edit:
I did some maths
Let's say there are 10 million spendable outputs.
We hash each of them.
Then we hash by pairs.
Then hash by pairs of pairs, etc.
Binary-tree fashion.

There'd be log2(10M) ≈ 23.3 levels to the tree, so 24.

So if the user has only one output for his balance, then we'd just have to send him his output and roughly 24 hashes.
One per level: the hash of the sibling branch we didn't explore, so the client can recompute the path and check that it hashes up to the ledger hash.

One output and ~24 hashes is negligible data, so the download size would mostly be the blocks we require for PoW trust.
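
A quick sanity check of that arithmetic, under the binary-tree assumption and 32-byte hashes:

```python
import math

def proof_size(n_outputs: int, hash_bytes: int = 32):
    depth = math.ceil(math.log2(n_outputs))  # sibling hashes per proof
    return depth, depth * hash_bytes

for n in (10_000_000, 13_000_000, 100_000_000):
    depth, size = proof_size(n)
    print(f"{n:>11,} outputs -> {depth} hashes, {size} bytes per proof")
#  10,000,000 outputs -> 24 hashes, 768 bytes per proof
#  13,000,000 outputs -> 24 hashes, 768 bytes per proof
# 100,000,000 outputs -> 27 hashes, 864 bytes per proof
```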

I understand the merkle chain would be very lightweight too, but I'm thinking that eventually even the light chain will get heavy (the number of hashes to send grows linearly).
Whereas this allows fast synchronization even far in the future, by letting the user decide how deep in the chain he polls the ledger (and he can download back to the genesis block to prove its integrity).

edit:
Ah, turns out it's not new: I found the idea in the wiki, deep in the Thin Client page, so this thread is now about why it's not happening. I think it'd be a great change, and probably better done early than late.
But I guess I can understand not wanting to change things until it's desperately needed. Still, it's another thing slowing adoption. And imagine if the 1 MB limit is lifted one day.
I guess allowing such partial synchronization is a trade-off on many levels, but well, it's an optional trade-off.