Currently, all lightweight nodes more or less have to give a list of all of their addresses to someone else in order to find their incoming and outgoing transactions. This destroys their privacy because it links all of their addresses together, even if they're connecting through Tor. Some wallets try to obfuscate this with bloom filters (BIP 37), but research has shown that this isn't very effective at maintaining privacy. Bitcoin Core avoids the problem by downloading the entire block chain, but this uses a lot of bandwidth, and if you don't store the entire block chain, you have to download everything again whenever you want to rescan.
I found that there is a (beta) library called XPIR which may be able to improve this. It uses homomorphic encryption to let a server host a database that clients can query without the server or any eavesdroppers learning which entries the client is querying or receiving. So a client could send the server an encrypted query meaning "please give me all transactions associated with address ____", and the server could answer it without learning anything about the query. This would significantly improve privacy.
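To make "querying a database without revealing the index" concrete, here is a toy private-information-retrieval sketch. It is not XPIR's construction or API: it swaps in the additively homomorphic Paillier cryptosystem with tiny, insecure primes, and every function name is hypothetical. The client sends one ciphertext per database entry (an encryption of 1 at the wanted index and of 0 everywhere else); the server folds them into a single ciphertext of the wanted value without being able to tell which entry was selected.

```python
import math
import secrets

# Toy Paillier primes: far too small to be secure, chosen only so the demo runs instantly.
P, Q = 1_000_003, 999_983


def keygen(p=P, q=Q):
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because we use generator g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # fresh randomness makes encryption probabilistic
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g^m mod n^2 simplifies to 1 + m*n.
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(n, sk, c):
    lam, mu = sk
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def client_query(n, db_size, wanted_index):
    # One ciphertext per entry: E(1) at the wanted index, E(0) everywhere else.
    return [encrypt(n, 1 if i == wanted_index else 0) for i in range(db_size)]


def server_answer(n, query, database):
    # Multiplying ciphertexts adds plaintexts, and raising to x_i multiplies by x_i,
    # so the product encrypts sum_i b_i * x_i = database[wanted_index].
    n2 = n * n
    acc = 1
    for c, value in zip(query, database):
        acc = (acc * pow(c, value, n2)) % n2
    return acc


if __name__ == "__main__":
    n, sk = keygen()
    database = [5012, 7781, 1337, 9090]  # hypothetical entries, each smaller than n
    query = client_query(n, len(database), 2)
    answer = server_answer(n, query, database)
    print(decrypt(n, sk, answer))  # 1337
```

Note that the query is one ciphertext per entry, so its size grows linearly with the database. That is one reason the number of entries per database matters for efficiency; XPIR uses a different and far more optimized scheme.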
Running a server will be pretty resource-intensive, so this isn't something that every full node is going to be doing. However, it would be appropriate for Electrum servers, sites like GreenAddress, etc.
I haven't actually tried the software, but looking at the paper, here's how I think it'd work on a technical level:
- The database to be queried must be structured as index -> value, where the indices are sequential integers starting at 0, and clients can only query for specific indices. So the first step is for the server to publish a mapping between all possible scriptPubKey queries and their indices. This can be done by hashing each scriptPubKey seen in the block chain (maybe with wildcards to handle cases like stealth addresses), sorting the hashes, and giving each one a sequential index. All clients need to download this initial mapping. If the hash is 128 bits and the index is 64 bits (both could perhaps be shortened), that's 24 bytes per entry, so with today's 429k unique addresses the mapping would be around 10.3 MB. Everyone gets the same mapping, so there's no need to download it in any special anonymous way.
- The efficiency of the database goes down as the number of entries grows. Here, "entries" means the number of possible scriptPubKey queries. So each server should actually have several XPIR databases, each containing perhaps around 100,000 possible scriptPubKeys. Clients will also have to download info about which scriptPubKey ranges belong to which database, but this is a negligible amount of data. Each database needs to start its indices at 0, so the client will have to subtract the database's starting offset from the global index. Segregating the scriptPubKeys like this lets the server know that the client has some address in that range, but each range contains enough addresses to make this not very useful. The client could also send dummy queries to disrupt even this.
- The client would just send one query for every address in its wallet. For each one, it would receive all of that address's transactions, each with the merkle branch linking it to the block chain. From page 14 of the XPIR paper, it looks like you can very roughly expect to download your transaction data at 12.5 kB/s, with a latency of 10-500 seconds between the first query and the initial transaction data, depending mainly on the client's upload speed. These speeds seem OK.
- If clients are only interested in addresses that currently have BTC, you can significantly reduce the size of the database, and therefore increase efficiency. Similarly, the database size can be reduced for clients that have been online for a while and are therefore only interested in new/recent blocks.
- I'm not sure how many clients a reasonable server could comfortably handle. If it's not very many, then servers might need to charge for this. Maybe they could have a free queue, and then the possibility of paying to jump the queue. They could use blinded tokens to accept micropayments anonymously. The current XPIR library seems not to be very optimized, especially for multiple simultaneous clients, so performance improvements are probably possible.
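The indexing steps in the list above (hash every scriptPubKey, sort, number the hashes sequentially, split them into fixed-size databases, and turn a global index into a database-local one) could be sketched in Python like this. SHA-256 truncated to 128 bits is an assumption, since the post doesn't name a hash function, and all function names, including the optional dummy-query helper, are hypothetical.

```python
import bisect
import hashlib
import secrets

HASH_BYTES = 16      # 128-bit truncated hash, per the post's suggestion
INDEX_BYTES = 8      # 64-bit index, per the post's suggestion
DB_SIZE = 100_000    # scriptPubKeys per XPIR database (the post's rough figure)


def key_for(script_pubkey: bytes) -> bytes:
    # Assumption: SHA-256 truncated to 128 bits; any agreed-upon hash would do.
    return hashlib.sha256(script_pubkey).digest()[:HASH_BYTES]


def build_mapping(script_pubkeys):
    # Sorted, deduplicated hashes; a hash's position in this list is its global index.
    return sorted({key_for(spk) for spk in script_pubkeys})


def mapping_size_bytes(num_entries):
    # Reproduces the post's estimate of hash + index bytes per entry.
    return num_entries * (HASH_BYTES + INDEX_BYTES)


def locate(mapping, script_pubkey, db_size=DB_SIZE):
    """Return (database number, local index) for a scriptPubKey, or None if unseen."""
    h = key_for(script_pubkey)
    i = bisect.bisect_left(mapping, h)
    if i == len(mapping) or mapping[i] != h:
        return None
    # Database d covers global indices [d * db_size, (d + 1) * db_size),
    # and its own indices start at 0, so subtract that starting offset.
    return divmod(i, db_size)


def plan_queries(mapping, wallet_spks, num_dummies=0, db_size=DB_SIZE):
    # Hypothetical helper: one real query per wallet scriptPubKey found in the mapping,
    # plus optional dummy queries at random positions to blur which ranges are touched.
    plan = [loc for spk in wallet_spks if (loc := locate(mapping, spk, db_size))]
    for _ in range(num_dummies):
        plan.append(divmod(secrets.randbelow(len(mapping)), db_size))
    # Shuffle so real and dummy queries can't be told apart by their order.
    secrets.SystemRandom().shuffle(plan)
    return plan


if __name__ == "__main__":
    chain_spks = [f"script-{i}".encode() for i in range(250_000)]  # stand-in data
    mapping = build_mapping(chain_spks)
    print(locate(mapping, b"script-42"))
    print(mapping_size_bytes(429_000))  # 10296000 bytes, about 10.3 MB
```

Because the hashes are sorted, a client can find a scriptPubKey's global index with a binary search over the downloaded mapping, without asking the server anything.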
Thoughts?