Segwit details? N + 2*numtxids + numvins > N, segwit uses more space than 2MB HF

2112

legendary

Activity: 2128

Merit: 1074

OK, so somebody posted then quickly deleted a follow up to my messages above. I only took a glance before I was interrupted, but the main take-away was that indeed I should clarify what I meant.

So lets roll back to the original Satoshi's transaction design.

There are basically 3 main goals that the transaction format has to fulfill:

1) reference source and destination of funds, as well as amounts
2) cryptographically sign the source references to prove that one has control over them
3) detect (and possibly correct) errors in the transmitted transaction: both intentional (tampering) and unintentional (channel errors)

The original Satoshi's design used single SHA256 hash to cover all tree goals. It was a neat idea to kill 3 birds with one stone. But then it turned out that only 2 birds get killed, the center one is only getting injured. At it has about two lives: low-S and high-S.

So then we start trying to address those 3 main goals using a separate fields in the new transaction format. I'm not really prepared to discuss all the possibilities.

Lets just discuss a possible encoding for single UTxO reference. The current design is an ordered pair of (256 bit transaction id,short integer index of the outputs within that transaction). Lets just also assume that for some reason it becomes extremely important to shorten that reference (e.g. transferring transactions with a QR code or some other ultra-low-power-and-bandwidth radio technology).

It may turn out that the better globally unique encoding is an ordered pair (short integer block number in the blockchain, short integer index to the preorder traversal of the Merkle tree of transactions and their outputs). It may be acceptable to be able to refer only to the confirmed transactions in this format.

I'm not trying to advocate the change of the current UTxO reference format. All I'm trying to convey is that there are various ways to achieve the required goals, with various trade-off in their implementation.

Both the original Satoshi's design as well as the current SegWit design suffer from "just-in-time design" syndrome. The choices were made quickly without properly discussing and comparing the alternates. The presumed target environment is only modern high-power high-speed high-temperature 32-bit and 64-bit processors and high bandwidth communication channels.

Around the turn of the century there was a cryptographic protocol called https://en.wikipedia.org/wiki/Secure_Electronic_Transaction . It was deservedly an unmitigated failure. But they did thing right in their design. The original "Theory of operations" SET document did a thorough analysis of design variants:

1) exact bit counts of various representations and encodings
2) estimated clock counts of the operations on the then-current mainstream 32-bit CPUs
3) estimated clock counts of the operations on the then-current 8-bit micro CPUs like GSM SIM cards
4) estimated line and byte counts of the implementation source and object codes
5) range of achievable gains possible by implementing special-purpose hardware cryptographic instructions with various target gate counts.

Again, I'm definitely not advocating anything like SET and its dual signatures. I'm just suggesting spending more time on balancing various trade-off and possible goal of the completed application.

2112

legendary

Activity: 2128

Merit: 1074

Quote from: gmaxwell on March 18, 2016, 04:25:16 AM

Quote from: JorgeStolfi on March 17, 2016, 03:57:16 AM

* Malleability could be fixed by just skipping the signatures (the same data that SegWit would segregate) when computing the txid.

That _is_ segregation of the signatures up to completely non-normative ordering of data transferred. Segwit could just as well order the data into the same place in the serialized transactions when sending them, but its cleaner to not do so.

The "cleaner" part is true only to subset of people: those that were actually considering the original Satoshi's design as "ideal" or "perfect".

I personally think that the original design where "transaction hash" is both a "transaction identifier" and "transaction checksum" as a sort of a "neat hack".

Edit:

Quote from: gmaxwell on March 16, 2016, 08:44:50 PM

Quote from: 2112 on March 16, 2016, 12:35:49 PM

What about fixing those "other problems" (I don't want to say "hard", because IMO they aren't "hard" by themselves) without the segregation? Impossible or just not worth it?

A strong malleability fix _requires_ segregation of signatures.

A less strong fix could be achieved without it if generality is abandoned (e.g. only works for a subset of script types, rather than all without question) and a new cryptographic signature system (something that provides unique signatures, not ECC signatures) was deployed.

And even with giving up on fixing malleability for most smart contracts, it's very challenging to be absolutely sure that a specific instance is actually non-malleable. This can be seen in the history of BIP62-- where at several points it was believed that it addressed all forms of malleability for the subset of transactions it attempted to fix, only to later discover that there were additional forms. If a design is inherently subject to malleability but you hope to fix it by disallowing all but one possible representation there is a near endless source of ways to get it wrong.

Segregation removes that problem. Segwitness using scripts achieve a strong base level of non-malleability without doubt or risk of getting it wrong, both in design and by script authors. And only segregation applies to all scripts, not just a careful subset of "inherently non-malleable rules".

Getting signatures out from under TXIDs is the natural design to prevent problems from malleability and engineers were lamenting that Bitcoin didn't work that way as far back as 2011/late-2012.

The requirement for segregation is really only for "logical" segregation, not "physical" segregation.

My opinion is that the main point of the contention is that more programmers agree that "logical" (or algebraic) segregation is OK. Only much smaller subset of programmers agree that "physical" segregation (being far away in the serialized bytestream on the wire or on the disk) is the correct way to implement the algebraic segregation.

Edit2:

In addition to the above there is an issue of what is the optimal length of "transaction id" and "witness id". Transaction identifiers have to be globally unique, whereas "witness identifiers" only have to be unique within the block that they refer to. So the optimal length of the witness id could be much lower than 256.

2112

legendary

Activity: 2128

Merit: 1074

Quote from: JorgeStolfi on March 21, 2016, 01:12:56 AM

Quote from: 2112 on March 21, 2016, 12:42:54 AM

You started writing really weird conflated stuff. What do fees have to do with transaction syntax? ... The amount of fees doesn't change the syntax, so doesn't require change of the version.

Sorry, I don't understand your objections.

There are no "meta-rules" that specify what the validity rules can be. They are not limited to "syntax", whatever that means. Any computable predicate on bit strings could in principle be a validity rule, as long as it does not completely break the system.

Right now there are no validiy rules that refer to fees. The minimum fee, like the Pirate Code, "is more what you'd call 'guideline' than actual rule"; each miner decides whether to require it (or even to require more than it). But the minimum could be made into a validity rule. the difference woudl be that each miner would not only impose it on his blocks, but also reject blocks solved by other miners that contain transactions that pay less than that fee.

Quote

The version field should be used to clearly describe syntax rules governing the transaction format.

As I wrote, this cannot be guaranteed. If a fork (rule change) was executed to fix a bug or prevent an attack, the miners cannot continue to use the old rules for transactions that have the old version tag; that would negate the purpose of the fork. They must reject such transactions.

So, it is not safe to retain signed but unconfirmed transactions without broadcasting them.

I'm still unsure why we started talking about fees in this thread. The fees enter the consensus validity rules only when verifying that they aren't negative. The fees have to be positive or zero. The value of fees is only used when priority-sorting the already verified transactions.

Also, I don't believe in the existence of non-fixable bugs in the old rules that "fork (rule change) was executed to fix a bug or prevent an attack, the miners cannot continue to use the old rules for transactions that have the old version tag".

Edit: Getting back to the original argument:

Quote from: 2112 on March 19, 2016, 06:39:45 PM

Quote from: JorgeStolfi on March 19, 2016, 05:48:52 PM

Pre-signed but unbroadcast or unconfirmed transactions seem to be a tough problem.

I disagree on the "tough" part. In my opinion this is less difficult than DOSbox/Wine on Linux or DOS subsystem in Windows 32 (and Itanium editions of Windows 64). It is more of the problem how much energy to spend on scoping the required area of backward compatibility and preparing/verifying test cases.

Perhaps the DOS/Windows argument wasn't the best. The better, but less well known, example would be mainframe disk device drivers. They easily cover old-style devices with interfaces designed in late 1960. The hardware implementations are "frozen" in the sense that nobody changes the relevant hardware logic anymore. It is just a small sub-area in the modern VLSI chips that implements exactly the same logic as the old TTL-style disk interface controller.

Nobody designs or writes an interface that is sprinkled with conditional logic to handle the old protocols (if () then {} else {}). There's one time inquiry to verify the protocol version used and then all operations are handled through indirection (e.g. (*handle_read[versn])(...).

Same idea could be applied to Bitcoin if the version field would be appropriately changed both in blocks and transactions.

2112

legendary

Activity: 2128

Merit: 1074

Quote from: JorgeStolfi on March 20, 2016, 11:55:24 PM

I am not sure if I understood your comment. Miners cannot apply old semantics when the transaction has an old version field, because that field can be faked by the clients to sabotage the change. E.g., suppose that the change imposed a mininum output amount of 0.0001 BTC as a way to reduce spam attacks on the UTXO database. An attacker could frustrate that measure by issuing transactions with the pre-fork version tag. Does that answer your comment?

I don't buy the argument about "frustrating that measure". It is very easy to verify that the "old style" transactions use only "old coins", the coins that were confirmed no later than the effective time of the new transaction format.

Theoretically someone could try to launch the attack using only the "old coins", pretending to have a pre-signed transaction with some rather large n-lock-time. I think that type of attack would be self-extinguishing, it could be launched only once for each "old" UTxO entry.

TooDumbForBitcoin

legendary

Activity: 1638

Merit: 1001

Quote from: amaclin on April 01, 2016, 02:54:05 AM

Quote from: pri3oner on April 01, 2016, 02:47:54 AM

I can't understand it. anyone can explain it to me

How much are you able to pay for this study? Grin

Keep in mind that Amaclin values BTC at $10 each. So if you settle on a $500 fee, he'll want 50 BTC.

amaclin

legendary

Activity: 1260

Merit: 1019

Quote from: pri3oner on April 01, 2016, 02:47:54 AM

I can't understand it. anyone can explain it to me

How much are you able to pay for this study? Grin

pri3oner

full member

Activity: 504

Merit: 100

Bitgesell (BGL) Decentralized Cryptocurrency!

I can't understand it. anyone can explain it to me

CIYAM

legendary

Activity: 1890

Merit: 1086

Ian Knowles - CIYAM Lead Developer

Quote from: jl777 on April 01, 2016, 12:11:47 AM

my question was to point out that since bitcoin is not turing complete, amalcin's claim that it is impossible to verify bitcoin due to halting problem was not making sense.

Apples and oranges - what @amaclin was referring to was the Bitcoin source code itself (not Bitcoin's scripting language).

jl777

legendary

Activity: 1176

Merit: 1134

Quote from: CIYAM on March 31, 2016, 12:48:14 PM

Quote from: jl777 on March 31, 2016, 12:46:57 PM

is uᴉoɔʇᴉq turing complete?

If you supposedly know anything about uᴉoɔʇᴉq then of course you would know the answer to that - wouldn't you?

(btw - are you still using binary floating point for monetary calculations?)

my question was to point out that since bitcoin is not turing complete, amalcin's claim that it is impossible to verify bitcoin due to halting problem was not making sense.

I only use floating point when it is not needing to obtain consensus and where using floating point makes sense to use. I dont dogmatically avoid usage of something all the time just because there are times where it isnt correct to use it.

James

amaclin

legendary

Activity: 1260

Merit: 1019

Quote from: jl777 on March 31, 2016, 12:46:57 PM

is bitcoin turing complete?

What do you mean? Bitcoin script language is not turing complete today.
But we are discussing not about scripts themselves, but a program which can validate script-processor and say is it consensus-compatible (equal to current c++ reference code) or not

Quote from: CIYAM on March 31, 2016, 12:48:14 PM

(btw - are you still using binary floating point for monetary calculations?)

Har du slutat dricka konjak på förmiddagarna?

CIYAM

legendary

Activity: 1890

Merit: 1086

Ian Knowles - CIYAM Lead Developer

Quote from: jl777 on March 31, 2016, 12:46:57 PM

is bitcoin turing complete?

If you supposedly know anything about Bitcoin then of course you would know the answer to that - wouldn't you?

(btw - are you still using binary floating point for monetary calculations?)

jl777

legendary

Activity: 1176

Merit: 1134

Quote from: amaclin on March 31, 2016, 11:15:02 AM

Quote from: 2112 on March 31, 2016, 11:09:54 AM

But it is also possible that you don't understand the difference between the old stopping problem
and the automated logical equivalency verification like the one used by ARM to verify the
implementations of their namesake architectures.

Of course, I understand the difference between turing-complete and non-turing-complete structures.

is bitcoin turing complete?

2112

legendary

Activity: 2128

Merit: 1074

Quote from: amaclin on March 31, 2016, 11:15:02 AM

Of course, I understand the difference between turing-complete and non-turing-complete structures.

Good. For those unfamiliar with that area of science here are good links to start their research:

https://en.wikipedia.org/wiki/High-level_synthesis
https://en.wikipedia.org/wiki/High-level_verification

amaclin

legendary

Activity: 1260

Merit: 1019

Quote from: 2112 on March 31, 2016, 11:09:54 AM

But it is also possible that you don't understand the difference between the old stopping problem
and the automated logical equivalency verification like the one used by ARM to verify the
implementations of their namesake architectures.

Of course, I understand the difference between turing-complete and non-turing-complete structures.

2112

legendary

Activity: 2128

Merit: 1074

Quote from: amaclin on March 31, 2016, 10:46:52 AM

It is not possible to create an algorithm for verifying turing-complete structures.

In other words.
Imagine that you have a tool which produces '1' if the realization matches consensus and '0' if the realization is wrong and can produce hard-forks and earthquakes.
Who is responsible for bugs in this tool? Grin

How would you check it? With another tool?

And the second objection.
We do not need to 'stone' the consensus code. The majority always has a right to change anything in consensus.

From the past experience talking with you I think you are just pretending to be a dumbass. But it is also possible that you don't understand the difference between the old stopping problem and the automated logical equivalency verification like the one used by ARM to verify the implementations of their namesake architectures.

Every CAD/EDA tool vendor has tools to do automated verification and automated test vector generation. The obvious problems are:

1) those tools are closed source
2) those tools are very expensive
3) the input languages are hardware oriented: Verilog & VHDL mostly, with only recent additions of SystemC or similar high-level-synthesis tools.

That isn't even anything new like zk-SNARKs. Those tools were in use and on the market for more than 10 years. I had used some early prototypes of those tools (based on LISP) in school back in 20th century.

amaclin

legendary

Activity: 1260

Merit: 1019

Quote from: 2112 on March 31, 2016, 10:31:47 AM

Of course. Machine verification. It could be even a subset of C++ like SystemC.
But the proper definition of consensus-critical portion should be machine-verifiable.

It is not possible to create an algorithm for verifying turing-complete structures.

In other words.
Imagine that you have a tool which produces '1' if the realization matches consensus and '0' if the realization is wrong and can produce hard-forks and earthquakes.
Who is responsible for bugs in this tool? Grin

How would you check it? With another tool?

And the second objection.
We do not need to 'stone' the consensus code. The majority always has a right to change anything in consensus.

2112

legendary

Activity: 2128

Merit: 1074

Quote from: amaclin on March 31, 2016, 07:06:09 AM

Quote from: JorgeStolfi on March 31, 2016, 05:35:34 AM

Anyone with sufficient knowledge of algorithms and networking should be able to understand how it works without reading the code.
And there should not be "the" code, since the maintainers of that code would be a central authority.

Is there a reason to explain C++ code in any other [human or computer] language?

Of course. Machine verification. It could be even a subset of C++ like SystemC. But the proper definition of consensus-critical portion should be machine-verifiable.

I'm pretty sure that at least gmaxwell understands the importance of that. He's an advocate of zk-SNARKs and those have a synthesis of logic circuit equivalent to the given program as one of the intermediate steps.

The zk-SNARK people in Israel designed and implemented some subset of C (not C++) to facilitate logic synthesis. It is a key step to machine verification of the code.

Jet Cash

legendary

Activity: 2828

Merit: 2472

https://JetCash.com

Have any conclusions about SegWit come out of this thread?

amaclin

legendary

Activity: 1260

Merit: 1019

Quote from: JorgeStolfi on March 31, 2016, 05:35:34 AM

Anyone with sufficient knowledge of algorithms and networking should be able to understand how it works without reading the code.
And there should not be "the" code, since the maintainers of that code would be a central authority.

Is there a reason to explain C++ code in any other [human or computer] language?

CIYAM

legendary

Activity: 1890

Merit: 1086

Ian Knowles - CIYAM Lead Developer

Quote from: JorgeStolfi on March 31, 2016, 05:35:34 AM

I don't plan to read the code, and I should not have to. The payment system of the world cannot be defined by a (messy) program. Bitcoin should be an implementation-independent protocol, like SMTP and HTML. Anyone with sufficient knowledge of algorithms and networking should be able to understand how it works without reading the code.

That is something that even Mike Hearn knew is *simply impossible* for Bitcoin (I remember a topic about this from around 3 years ago in which I had actually argued for the creation of such a protocol specification and he convinced me otherwise).

It is fine for something like SMTP to have implementation faults as about the worst thing that might happen is you miss an email - but if you make any implementation fault with Bitcoin then people will lose money.

The C++ code IS the specification and if you refuse to read it then I'm sorry but you are NEVER going to understand Bitcoin (and it isn't up to us to educate you).

There are non-node implementations written in other languages so maybe you might be able to read the code of one of those (am guessing C++ is perhaps too difficult for you) but you need to understand that only "libconsensus" (written in C++) holds the key to Bitcoin (other language implementations are not used for mining but only for wallets).

The very language C++ itself would need to form a part of any such specification (as every single little C++ nuance is relevant) which would make any such document ridiculously large (which is why you are never going to see that).

You could of course take a look at Mastering Bitcoin (which I believe can even be found online).

Topic: Segwit details? N + 2*numtxids + numvins > N, segwit uses more space than 2MB HF (Read 21433 times)