Author

Topic: Revised swarm node proposal (Read 1939 times)

hero member
Activity: 815
Merit: 1002
January 06, 2016, 08:54:39 AM
#9
@Realpra thanks for the answers.

Id also argue that the recent attack with high S signatures was a major concern (quick fixed in 0.11.1) and IIRC seg wit is planed as hard fork in the long run.
Np Smiley Thank you for the interest.

Only if you had logic relying on pre-calculated hashes.

Since at the moment and historically TX hashes are never expected to be un-malleable any good solution should have looked for the expected outputs instead of a specific TX hash.
Outputs are signed and 100% reliable.

The only legitimate cases I know were TX chaining and the lightning network which benefits from virtual transactions with a known specific hash.
At any rate I'm not against fixing that IF done nicely, but it doesn't solve scaling.
hero member
Activity: 815
Merit: 1002
January 06, 2016, 08:46:50 AM
#8
Its not trust as trust between humans - it is just anti-DDOS between machines using the same hashing that Bitcoin uses same as hash cash etc. etc. etc..
You can call it what you like but it's a persistent identity which can be targeted for abuse. If an attacker wants they can farm them and have an advantage against honest users, potentially an avenue for coercion and theft, etc.
I'm calling it what it is DDOS protection.

Please do explain how you plan to steal from or coerce a randomly generated key with 2 minutes of hashing work behind it.

I'll wait.

Quote
The scheme you've described is largely incompatible with pruning too, since to validate nodes randomly need historic data that today full nodes don't even need to retain.
Why?

Nothing prevents nodes from removing TXs with spent outputs in their chunk range although it would require some polling/notifications.

Not that it matters either way because these swarm nodes would cost 1.6 $ a year in HD space...

Quote
If it's sufficiently minor then it provides little protection. In either case it's a concrete cost against running a node.
A concrete cost of 2-10 minutes of CPU? Are you being serious right now?

Even a small thing like that would make spam attacks difficult.

Quote
That it also make it possible to verify blocks fractionally (at your random choice) without having to trust data from third parties is what I was mostly referring to...
How do you verify no double spends "randomly at your choice"? You need the complete history to do that (minus some swarm design ala mine).
I don't see how a minor change in signature storage changes that.
copper member
Activity: 1498
Merit: 1562
No I dont escrow anymore.
January 06, 2016, 07:11:08 AM
#7
@Realpra thanks for the answers.

-snip segwit-
Did I miss something? Seems like a very expensive way to solve malleability and not much else.
You've missed 85% of the benefits of the design. The fact the it increases effective space is largely an incidental (but useful) side-effect. That it also make it possible to verify blocks fractionally (at your random choice) without having to trust data from third parties is what I was mostly referring to...
[/quote]

Id also argue that the recent attack with high S signatures was a major concern (quick fixed in 0.11.1) and IIRC seg wit is planed as hard fork in the long run.

hero member
Activity: 815
Merit: 1002
January 06, 2016, 01:01:53 AM
#6
Quote
Today the order is random and without any effect.
No the transactions are required by the consensus protocol to be dependency ordered. Any other ordering which does not meet this criteria potentially creates some super-linear costs in block validation.
My bad.

However once TXs are sorted by hashes and you have lookup tables for them it would not be linear to make that look up anymore and removing this dependency ordering would be fine.

Quote
Quote
Each node will have an identity such as an ECDSA-AES public key over which all communication and chunk signing occurs.
A nit: "ECDSA-AES" isn't a thing...
Sorry it should have said ECDH-AES (Elliptic Curve Diffie Hellman key exchange using AES symmetric encryption).
Quote
but critically introducing persistent trusted "identity" into the bitcoin protocol would be a radical regression against the insights that Bitcoin brought to the first place. I believe it is neither useful nor desirable to do so, and that it may present considerable risks.
Its not trust as trust between humans - it is just anti-DDOS between machines using the same hashing that Bitcoin uses same as hash cash etc. etc. etc..

Quote
Quote
2. Proof of work burn done for that specific key.
Creating a real monetary costs for operating a node is not a move which seems likely to result in a meaningfully decentralized system.
Same as mining real cost doing "nothing".
100% the same thing, nothing new.

For honest nodes this cost would be minor and onetime only.

Only attackers that keep getting blocked would need to keep burning hash power.

Quote
Quote
Using the table and another node we request and get the unknown TX.
Excepting spam and mistakes (e.g. key loss) all outputs that are created are eventually consumed. A single transaction can create many outputs, and they usually do create multiple ones. They also usually consume multiple inputs.  In the approach presented this fact will result in most transactions being transmitted to most nodes in any case, losing much of the intended advantages.
 
I thought about this, but it is actually not that bad.

You would process on avg. 512 transactions even if you had to get them all and then all their outputs say avg. 3 that would only be 0.68 mb per block. It is trivial to poll for this amount of data.

So yes all nodes would need to get maybe 2048 TXs, but not 1 million. No advantages lost Wink

Quote
Now, instead of sending just a block, you set a part of a block but need to receive every transaction those transactions spend (which will be a considerable multiple, and almost all of them will need to come from remote nodes-- which, if they're offline, will leave you would be stuck and forced to either reject the longest chain temporarily (and potentially begin mining a fork) or accept the block without even verifying your part).
You would always have many backup connections for each chunk range.

Getting stuck would only happen if you have no connection, can't be helped.

Quote
I think its unfortunate that you've dismissed segwitness without, apparently, really understanding what it provides. Segwitness sets up a framework for the kinds of efficiency gains you're hoping to achieve here but without the same problems and without depending on introducing identified bonded nodes.
I understand it quite well.
You take out the signatures from the TXs in a seperate merkle tree which allows 1mb blocks to hold 4 mb tx data - but only if done in a hacky way otherwise a hardfork is needed.
I don't believe in hacky solutions for 5 billion dollar systems so it should be done as a nice hard fork or not at all.

Since you're hard forking anyway just raising the limit to 4mb would do the same thing.

If you're doing full validation seg witness doesn't help because you still have to get the signatures, you're still getting 4mb data, it is not magic.

It DOES solve signature malleability, which is nice, but it was never a major concern unless you were a bad programmer at mt gox.
Did I miss something? Seems like a very expensive way to solve malleability and not much else.
hero member
Activity: 815
Merit: 1002
January 06, 2016, 12:29:15 AM
#5
Firstly I like the general idea and I had thought that pruned nodes work somewhat like this, at least in terms of storage. Im not entirely sure I understand every detail of your suggestion though. E.g., not that its of importance here, so feel free to ignore this question: How is it easy for governments to target companies unless they are all in the same country?
You're right it is not 100% easy for them, but consider this: When Snowden was on the run the USA was able to ground a plane with the Spanish PRIME MINISTER because they though MAYBE Snowden was there.

If Bitcoin was eroding taxation national governments might indeed take global action.

That said I also wrote that centralized Bitcoin hosting would still be better than the fiat we have now.

Quote
5.5:
I dont agree with the permanent blacklisting. You can either blacklist the ECDSA key, which is usueless because the cheating node can just create a new one or you can blacklist the IP. The cheating node will get a new IP and that banned as well. If some spoofed IPs you can completly isolate a node or even render the entire network useless.

"Proof of work burn done"? So I would need to pay (or burn) bitcoin in order to run a node?

Proof of work does not involve burning Bitcoin, just hashing a minute or two as a DDOS prevention. Very standard same as in hash cash or Bitcoin mining. I wrote "work burn" because you would be "wasting" CPU/GPU power.

This loss is what prevents a spam attack using many newly generated keys.

Quote
5.7:

I am not entirely sure, but how do the tables you use to store information about other nodes work if you consider home run nodes? They will have e.g. daily new IP addresses and might be offline for the majority of the day.
If the nodes change IP they would somehow have to announce their presence when going online. Otherwise nodes can store key/IP mappings along with some anti-DDOS trust information.

Each node would know many other nodes for each chunk so some being offline temporarily would not be an issue.

Quote
Also why not validate the entire block yourself, but only keep your chunk instead of requesting data from several peers that not might not be responsive. Block validation is crucial for propagation and if this becomes a slow process we might have a significant increase on orphans.
At 1600 TPS the blocks are 330 mb each it would be slower to wait for all nodes getting that. Wouldn't even be able to cross the firewall of China within 10 min. if sent directly from one node to another. (according to Chinese miners not me)
Much faster to have a "slower", but massively parallel process.

On average you would would check 512 transactions. If you got all their inputs - say avg. 3 - that would only be 4 * 330 bytes * 512 =  0.68mb. This represents most of the data you would need to get per block so maybe 1-2mb over 10 minutes, not a huge load.

Finding online nodes would not take much time.

Quote
6.1:

Im not sure I can follow your arguments on bandwith usage. 156GB per year will not be evenly distributed.
See above, I estimate 1-2 mb on average per block over 10 minutes. Even if this fluctuated to 10 mb sometimes that would not cripple a normal PC connection at all.

Quote
You also as far as I can tell did not account for transaction relaying, which is part of a full nodes work. If we assume 10^6 TX per block this is no small part to handle.
Correct I forgot, however we can just relay based on script range so if the TX claims a TX script with a hash in the range 000-0CF you send it to nodes handling that script range.

Quote
Other things I might have missed/dont understand:
- how are chunk sections select? Do I pick them by hand? Are they random?
Always intervals of 512. This gives a nice trade off in terms of data you need to store and data you need to poll for.
512 also means that your entire chunk will equal exactly one merkle tree node in the block merkle tree.

Running a new node the software should select to store a chunk range that has the fewest other swarm nodes hosting it.
Range 51200-51712 for example.

Quote
- how do you suggest to start with an "all trust" approach considering the current state of the network "dont try anyone"?
Lets say you are a Chinese mining operation and there are multiple openings in and out of China, but each is limited to say half the block size/10 min.
So you run two swarm nodes that trust each other (because you operate both) and place one near each of the firewall openings. Think one at the border of Russia and one at the south west border.

And tada now Chinese miners are ready for bigger block sizes although their individual connections suck.

Quote
- What if there are not enough swarm nodes to cover all chunk ranges? Or, less crucial, what if there are not enough swam nodes to cover all chunk ranges all the time?
Each range should always have 200+ nodes covering at a minimum.

As for ensuring coverage you could do either of:
A. Limit blocks by TX number instead of size.
B. Nodes can cover more or all chunks if they choose, so just have some that cover the range 1mil. to infinity even if this is mostly empty.
C. Calculate the maximum number of TX based on block limit and min. TX size.
copper member
Activity: 1498
Merit: 1562
No I dont escrow anymore.
January 05, 2016, 07:05:05 AM
#4
Firstly I like the general idea and I had thought that pruned nodes work somewhat like this, at least in terms of storage. Im not entirely sure I understand every detail of your suggestion though. E.g., not that its of importance here, so feel free to ignore this question: How is it easy for governments to target companies unless they are all in the same country?

5.5:

I dont agree with the permanent blacklisting. You can either blacklist the ECDSA key, which is usueless because the cheating node can just create a new one or you can blacklist the IP. The cheating node will get a new IP and that banned as well. If some spoofed IPs you can completly isolate a node or even render the entire network useless.

"Proof of work burn done"? So I would need to pay (or burn) bitcoin in order to run a node?

5.7:

I am not entirely sure, but how do the tables you use to store information about other nodes work if you consider home run nodes? They will have e.g. daily new IP addresses and might be offline for the majority of the day. Also why not validate the entire block yourself, but only keep your chunk instead of requesting data from several peers that not might not be responsive. Block validation is crucial for propagation and if this becomes a slow process we might have a significant increase on orphans.

6.1:

Im not sure I can follow your arguments on bandwith usage. 156GB per year will not be evenly distributed. You also as far as I can tell did not account for transaction relaying, which is part of a full nodes work. If we assume 10^6 TX per block this is no small part to handle.

Other things I might have missed/dont understand:
- how are chunk sections select? Do I pick them by hand? Are they random?
- how do you suggest to start with an "all trust" approach considering the current state of the network "dont try anyone"?
- What if there are not enough swarm nodes to cover all chunk ranges? Or, less crucial, what if there are not enough swam nodes to cover all chunk ranges all the time?
hero member
Activity: 815
Merit: 1002
January 04, 2016, 09:58:40 AM
#3
There is another solution. Reward people (other than miners) with micro-payments in return for running full nodes.
Ok, but how to do that?
How do you know 100 "different" nodes aren't all the same person running a single datacenter with some proxy addresses?

And even getting past that you would still have all nodes being big data centers with state of the art fiber optic connections - not very resilient and very easy to take down.

Chinese miners have directly said that due to the "Great Firewall of China" they cannot handle more than 8mb blocks. We need to be able to go to 330 mb blocks to reach near-VISA levels.


Bitcoin as it is now works, I think, because normal Bobs and Joes can and do run the nodes - more or less.
Take that away and it would be a lot more tempting to block addresses that governments don't like if you risk losing your fancy data center otherwise.
legendary
Activity: 2828
Merit: 2472
https://JetCash.com
January 04, 2016, 03:35:59 AM
#2
There is another solution. Reward people (other than miners) with micro-payments in return for running full nodes.
hero member
Activity: 815
Merit: 1002
January 04, 2016, 01:43:37 AM
#1
Let me know what you think.

---BITCOIN SCALING---
-- 1 ABOUT ME--
Martin Clemens Bloch. CEO of BlochsTech and creator of an easy to use Bitcoin smart card hardware wallet.
(www. BlochsTech.com)

-- 2 GOAL--
We want to design a partial node that together with other nodes can do full validation of the chain.
By having many of such 'swarm' nodes the chain will remain secure and duplicated many times.

Because each single swarm node would only store and validate a part of the chain a CRITICAL scaling issue would be solved:
Keeping nodes diverse.

If there are many full nodes, but all of them are big companies it becomes easy for governments to target them.

If there are very few full nodes the network can easily fail.

With small swarm nodes the network could be supported entirely by a little spare capacity on users normal PCs.
Such a network might have a few full nodes, but would not require them and still the entire chain would be duplicated
and validated thousands of times.

Each time the amount of users increased the network would also gain more resources.

-- 3 WHY--
Because if the cost of running a validating node is trivial many many people will do it.

This is not a postulation; all torrent files are shared free of charge by people who are done downloading it
and have every incentive to stop sharing.

However because the cost of sharing a file a few times is trivial they - or enough of the users anyway - keep
sharing.

This is the ONLY true solution. The debates of 1mb vs. 8mb vs. XT are misleading and can never satisfy the demands for price and decentralization at the same time.
Segregated witness does absolutely nothing to solve the issue.

The lightning network is a beautiful idea, and while it has great merit it gives too much power to the payment channel operators. Larger payment channel operators can
pay greater fees than smaller ones which together with a 1mb limit means total centralization.

Bitcoin has only 3 real choices:
1. Do nothing and die/be replaced.
2. Increase the limit and accept centralization towards big miners and big node hosting companies.
This would still be much more decentralized than our central banks so I wouldn't see that as a total failure.
3. Split up validation work among multiple nodes - swarm nodes.

This example goes through how to do the 3 option assuming a standard PC with less than 1mb/s available and 1 million transactions per block - 1600 TX per second.

-- 4 TO BE VALIDATED--
1. Nothing can be withheld from a validated block.
2. Valid scripts and signatures.
3. Correct transaction fee total (coinbase).
4. No double spends.
5. Correct mining reward/mining difficulty (trivial will not discuss).
6. Block size.

-- 5 SOLUTION--
- 5.1 REQUIRED CHANGES-
TXs in blocks should be sorted by their TX hashes. Today the order is random and without any effect.

After sorting a TX hash starting with 000 may be first in the merkle tree leaf index and a hash
starting with FFF last.

This will be needed in order to easily request data from other swarm nodes. (See section 5.7)

Swarm nodes could be made without this change, but it makes data handling easier.

- 5.2 SWARM NODE CHUNKS-
Swarm nodes would break the Bitcoin transactions into chunks based on their index in the block.
For this example we will just say 512. (current block limit corresponds to about 3000 txs)

The block will thus be broken into some number of chunks with exactly 512 TXs each and one last chunk
of 1-512 transactions.

We will go through this example assuming our swarm node validates just one chunk.

Chunk sizes like this would mean we would have 1954 chunk ranges for 1 million TX per block - VISA
levels or 1600 TX/s.

A swarm node would only need 15.6 GB per year to keep up at these levels (see section 6.1).

At even higher volume bigger chunks would be better to reduce connection count, but at that market cap
it would not be a problem if running a swarm node required a single dedicated PC full time.

- 5.3 SWARM NODE SCRIPT RANGES-
In addition to chunks we will also partition data long claim script hashes.
All Bitcoin outputs are in the form of a script, the hashes of all those scripts are unique per script.

Swarm nodes would subscribe to certain ranges. So if hashes could be only 1 or 0 node A might subscribe
to the 0 hashes and node B subscribe to 1 hashes.

If node A sees a 1 hash in its chunk range 0-512 it will send it to B. If B sees a hash 0 in its chunk range
513 to 1024 it will send it to A.
B will give its signature to A that it has sent all 0 hashes - for example by A asking B "Does Tx534 and Tx567
contain all the 0 hashes outs in chunk range 513-1024 for block xyz?" and B signing that + "yes".

It can thus easily be proven later if B withheld data.
How this helps us will become clear later.

- 5.4 NO WITHHOLDING-
Our swarm node gets its chunk from the network as well as a list of the other chunks merkle hashes.

Using this information it can determine:
1. The transactions it received are in the block and that from that chunk nothing is missing.
Otherwise the merkle root would not match.

2. Again if an entire chunk is withheld the merkle root will not match.

In this way if we had 1 million transactions per block this would only mean 62.5 kilobytes of merkle hashes
to validate.

- 5.5 NO WITHOLDING - DISHONEST NODES-
Now while this takes care of our own chunk, the other nodes we connect to could cheat.
Other swarm nodes can do two things:
1. Simply sign an invalid chunk.
2. Withhold TX data that we subscribe to from their chunk.

To limit this risk we rank all the nodes we can find for each chunk range. Each node will have an identity such
as an ECDSA-AES public key over which all communication and chunk signing occurs.

We rank their public keys by the following and in this order:
1. We permanently blacklist* nodes that have signed script ranges that turn out to be invalid.
or that have withheld** TXs from us. Ranking of nodes that have recommended a blacklisted node the
past 2 years is reduced by 95%.
2. Proof of work burn done for that specific key. (top 10 we can connect to)
3. Latency. (top 5)
4. How long we have known that node and gotten messages from it. (top 5)
5. Whether other nodes we trust "recommend" that node. (top 5)
(ie. if someone ran a node on two separate devices they would of course recommend their own nodes)

25 peers for 10^6/512 = 1953 chunk ranges with 2000 bytes of ID data each would be a fixed 100 mb cost.

Before a block would be considered valid the node would require 15 of its 25 peers per chunk to sign off on their
chunk* and at least one peer per peer type group (ie. top history, top burn, top ping and top recommends).

* Blacklisting due to invalid script range would be based on a merkle branch proof showing that a TX is being
double spent and that a node signed its script range as valid.

** Blacklisting due to withholding would be based on a signature by the dishonest node that all information
was shared even though it was not. (Chunk range includes relevant TXs 1, 2 and 3 while dishonest node signed
that 1 and 3 were the only relevant)

- 5.6 BLACKLIST REPORTS-
To validate an error report you would only have to get the offending chunk, ether to check the merkle hashes,
to check for missing data or to check invalid scripts.

To validate a double spend would only require the merkle branch of the earlier spending transaction.

If an error report turns out to be false the false reporter is blacklisted.

Error reports are accepted by the top 95% nodes by hash burn/proof of work.

This prevents cheap DDOS spam while still allowing error reports to quickly propagate through the network
from so much as a single honest node.

- 5.7 VALIDATING SCRIPTS-
To validate scripts we need the output scripts that the containing TX references in its inputs section.

When the referenced outputs are not known to the node it will request those TXs from a node known to process that
chunk range.

Since TXs are sorted by their hashes in the block and since hence each chunk will have a highest hash we can
somewhat easily keep track of which chunk a hash is in.
We do this by maintaining a table of highest chunks per block number (3.3 GB/year).

Using the table and another node we request and get the unknown TX.

If the node refuses to give the data the block validation will simply be delayed. In extreme cases
blacklisting of the withholding node is a possibility.

Once we have all input TXs we can validate the scripts and either sign our script range for the block or reject the block.

- 5.8 VALIDATING BLOCK SIZE -
To validate block sizes nodes check the size of their chunks transactions and signs off on it ie. "chunk 45 1.7 kb signed A".
All the nodes can then quickly count the 1954 sizes add them and see if it is below the limit.

If a node sees different sizes for the same chunk the offending node can be blacklisted after getting the chunk and checking its
size.

- 5.9 VALIDATING TOTAL FEES -
Same mechanism is used as for size in section 5.8.

Total fees are then also compared with the coinbase transaction.

- 5.10 CHECKING FOR DOUBLE SPENDS -
Each node will store a table indexed by script hash containing a list of unspent transaction hashes and spending TX hashes per index.
Ie. a list of tuples ordered by an script hash index.

To save hard disk space nodes will not store the full information about transactions in their script range. Instead they store what is
in their chunk range and ask for all other TX information from other nodes.

Since each swarm node processes a small subset of the blockchain bandwidth is a cheaper resource than disk space and not a big worry.
(A 1 mb/s connection can fill 31 TB of disk in a single year - our swarm nodes use only 15.6GB)

-- 6 NUMBER CRUNCHING--
- 6.1 STORAGE REQUIREMENTS PER YEAR -
These numbers are for 1,000,000 transactions per block and with each swarm node processing 512 transactions
each.
Each node will store exactly 512 transactions from its chunk range and approximately 512 transactions
from its script range per block.

We do NOT store script range TXs, only chunk range. Storing script range data would cost almost 30 gb/yr. Instead we store only a table of TX hashes in our script range
and the TX hashes of their spending transactions.
When needed we fetch the script range from our peers which means extra bandwidth usage. This however is not a problem as our yearly bandwidth usage tracking a small subset
of the blockchain is quite low.

8.9 gb for chunk transactions (avg. 330 bytes/tx * 512 * 6 * 24 * 365).
3.4 gb for spending TX hashes ((32 tx hash + 32 * 3 avg. outputs spending TX hashes) * 512 * 52560 blocks/year script range spending TX hashes.)
(log2(10^6 TXs)*32 = 640 bytes extra per TX in merkle branch information)
3.2 gb for TX hash to block and chunk indexing. (1954 highest chunk hashes in sorted order * 32 bytes per hash * 52560 blocks/year)
~0.1 gb for blacklisting and node reputation information (+~1gb fixed one time cost).

Connection usage will likely be up to 10 times this amount (mostly polling for unknown outputs) so 156 gb per year
or 0.005 mb/s - plenty to spare there.

While the nodes will need many other nodes they need not be connected at all times. Nodes can be notified or
polled at need and then the connection closed again.

In other words a normal PC with a 100$ 1TB HD using negligible broadband could run a swarm node and help run
the Bitcoin network at 1600 TPS for 32(!) years and only use half the disk.

It would only take 1954 of such computers to equate one full node - so a small group of enthusiasts could run the
entire Bitcoin network at VISA levels even if all else failed.

- 6.1 ESTIMATED FULL COPIES AT 1600 TPS -
At such a market cap it would be fair to assume that only a few big companies and miners would be running normal full
nodes while a much larger assortment of smaller companies and individuals would be running swarm nodes.

If there are 2 million bitcoiners today and a resulting 6000 full nodes - that today require about the same as a
swarm node would at 1600 TPS - we can try to extrapolate.
At VISA levels one would expect something like 0,7 billion users and if the ratio holds 0.5-2 million swarm nodes.

At 1600 TPS this would equate 255-1023 full copies of the blockchain hosted in a very diverse, robust and decentralized
way.

-- 7 QUICK SUMMARY--
- 7.1 WHAT SWARM NODES SIGN/VALIDATE -
1. Script validity of script range and chunk range.
2. No withholding claims for chunks.
3. No double spends in script range (based on no withholding signatures by other nodes).
4. Chunk size and total fees.

- 7.2 SWARM NODES ARE DIVIDED BY -
1. Chunk ranges.
2. Script ranges.

- 7.3 HOW TO DO IT -
A first version could be done where all nodes trust each other. This would eliminate the need for blacklists and other code dealing with
trust of peers.

This could then be added later at leisure.
Jump to: