Using OP_CAT - page 2. | Bitcointalksearch.org

Taras

legendary

Activity: 1386

Merit: 1053

Please do not PM me loan requests!

Well guys, I broke theoretical bitcoin. My lack of relevant knowledge has theoretically doomed us all.
In all seriousness (not that breaking theoretical bitcoin isn't serious), the whole take-down-the-network-in-one-transaction thing is scary as shit. I'd love to be able to use string functions, but I'd rather not advocate risking the network for some silly scriptSigs.

gmaxwell

staff

Activity: 4284

Merit: 8808

Typed data on the stack makes writing correct code much harder; I can't say that I've ever wished for that. I generally prefer the stack be "bytes" and everything "converts" them to the right type. Yes, additional constraints would make things like your provably undependable code easier, but they do so by adding more corner cases that an implementation must get right.

I'm also a fan of analyzability, but that always has to take second seat to consensus safety.

andytoshi

full member

Activity: 179

Merit: 151

-

Quote from: Taras on September 03, 2014, 05:53:28 PM

Well, there can only be one OP_CHECKSIG...

That is false. You could even do threshold signatures with multiple OP_CHECKSIGs if you wanted to be a goof.

Quote

This requires setting a size for each data type. I think it is basically integer, byte array and boolean (which is an int).

Script is not typed; there is only one type, "raw byte data", that is interpreted in various ways by the various opcodes. (This makes accounting quite easy actually.) And today you are required to match the byte representation of all stack objects exactly, since OP_EQUAL requires it, so arguably a total stack size limit would be an easy thing to describe precisely.
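To make the "one type, many interpretations" point concrete, here is a minimal Python sketch of how the same raw bytes can be read as a number or as a boolean. It is modeled on my understanding of Bitcoin Core's script-number encoding (little-endian, sign bit in the top bit of the last byte) and its bool cast (negative zero is false). The function names `as_number` and `as_bool` are made up for illustration, and the sketch leaves out real-world rules such as operand length limits and minimal-encoding checks.

```python
def as_number(data: bytes) -> int:
    """Read a stack element as a little-endian sign-magnitude integer."""
    if not data:
        return 0
    magnitude = int.from_bytes(data, "little")
    # The sign lives in the highest bit of the last (most significant) byte.
    sign_bit = 0x80 << (8 * (len(data) - 1))
    if magnitude & sign_bit:
        return -(magnitude & ~sign_bit)
    return magnitude


def as_bool(data: bytes) -> bool:
    """Read a stack element as a boolean: any nonzero byte means true,
    except that "negative zero" (only the sign bit set) is false."""
    for i, byte in enumerate(data):
        if byte != 0:
            if i == len(data) - 1 and byte == 0x80:
                return False
            return True
    return False


element = b"\x81"
print(as_number(element), as_bool(element))
```

The element itself never carries a type tag; each opcode decides which of these readings to apply.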

My biggest metawish for a script 2.0 would be ease of analysis.... in particular I would like separate types (uint, bool, bytedata) and explicit casts between them. I spent quite a bit of time working on script satisfiability analysis recently, and it seems the best way to describe abstract stack elements is as a bundle of complementary bounds on numeric values, boolean values, length, etc.

Bitcoin-ruby uses a typed script and has each opcode do casts. The idea makes me smile but for consensus code it is really not appropriate, sadly. They have plans one day to replace it with a more-or-less direct port of bitcoind's script parser.

gmaxwell

staff

Activity: 4284

Merit: 8808

Quote from: Taras on September 03, 2014, 05:53:28 PM

Well, there can only be one OP_CHECKSIG...

That's not true.

Quote

Why not make that kind of limit for OP_CAT?
All the string functions, in fact, should be enabled (even if they are "expensive words" like checksig)
What if there was a minimum base transaction fee (rendering a tx with an insufficient base fee invalid) that would be incremented by a certain amount for every OP_CAT in the transaction?

No one is saying that things like OP_CAT cannot be done, or that they're bad or whatever. But making them not a danger requires careful work. Case in point: What you're suggesting is obviously broken. So I write a transaction which pays 100x that (presumably nominal fee) and I crash _EVERY BITCOIN SYSTEM ON THE NETWORK_, and I don't really have to pay the fee at all because a transaction needing a zillion yottabytes of ram to verify will not be mined, so I'll be free to spend it later. Congrats, you added a severe whole network crashing vulnerability to hypothetical-bitcoin.

You should also remove "enabled" from your dictionary: that those opcodes were "disabled" doesn't mean they can just be enabled. They're completely gone; adding them is precisely equivalent to adding something totally novel in terms of the required deployment procedure.

Taras

legendary

Activity: 1386

Merit: 1053

Please do not PM me loan requests!

Well, there can only be one OP_CHECKSIG...
Why not make that kind of limit for OP_CAT?
All the string functions, in fact, should be enabled (even if they are "expensive words" like checksig)
What if there was a minimum base transaction fee (rendering a tx with an insufficient base fee invalid) that would be incremented by a certain amount for every OP_CAT in the transaction?

gmaxwell

staff

Activity: 4284

Merit: 8808

Quote from: TierNolan on September 03, 2014, 04:26:30 PM

Set a maximum total memory for the stack and a script that exceeds that value automatically fails.

Sure, but this requires: a consistent way of measuring it and enforcing it, and being sure that no operation has unlimited intermediate state.

As Bitcoin was originally written, it was thought that it had precisely that: there was a limit on the number of pushes, and a limit on the number of operations. This very clearly makes the stack size "limited", but because of operations that allow exponential growth, the limit wasn't useful. Making the imposed limits effective isn't hard for any fundamental reason, as I keep pointing out: "just have a limit". But being _sure_ that the limit does what you expect is much harder than it seems.

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: andytoshi on September 03, 2014, 03:53:37 PM

And then I thought, "Why did I do that? To stop stack size explosions, I guess...but did I stop that? No idea." because I would have to analyze the interactions with every single other part of the script system, and neither I nor anybody else has a complete model there. (Though I *think* I have a complete model of how everything affects the size of stack objects, at least right now.) So that's where the risk comes from.

Why indirectly solve the problem? The problem is limiting stack size, so why not just directly solve it?

Set a maximum total memory for the stack and a script that exceeds that value automatically fails.

This requires setting a size for each data type. I think it is basically integer, byte array and boolean (which is an int).

A slightly less direct method is to focus on the opcodes that can increase the stack size. Even then, it is still really a global limit.
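A global memory cap of the kind proposed above could look roughly like the following Python sketch. Everything here (the `BudgetedStack` class, its method names, the 1024-byte budget in the usage example) is hypothetical and only meant to show the accounting; it is not how any real node implements script limits.

```python
class ScriptFailure(Exception):
    """Raised when the sketch interpreter aborts a script."""


class BudgetedStack:
    """Stack of byte strings with a cap on the total bytes held."""

    def __init__(self, max_total_bytes: int) -> None:
        self.max_total_bytes = max_total_bytes
        self.items = []
        self.total_bytes = 0

    def push(self, item: bytes) -> None:
        # Refuse the push before storing anything that would break the cap.
        if self.total_bytes + len(item) > self.max_total_bytes:
            raise ScriptFailure("stack memory limit exceeded")
        self.items.append(item)
        self.total_bytes += len(item)

    def pop(self) -> bytes:
        if not self.items:
            raise ScriptFailure("pop from empty stack")
        item = self.items.pop()
        self.total_bytes -= len(item)
        return item

    def op_dup(self) -> None:
        if not self.items:
            raise ScriptFailure("OP_DUP on empty stack")
        self.push(self.items[-1])

    def op_cat(self) -> None:
        if len(self.items) < 2:
            raise ScriptFailure("OP_CAT needs two elements")
        right = self.pop()
        left = self.pop()
        # The joined value is exactly as large as both operands, so the
        # accounted total is unchanged. The temporary copy made here is
        # intermediate state that this accounting does not see.
        self.push(left + right)


stack = BudgetedStack(max_total_bytes=1024)
stack.push(b"A" * 32)
try:
    while True:
        stack.op_dup()
        stack.op_cat()
except ScriptFailure:
    print("script failed with top element of", len(stack.items[-1]), "bytes")
```

In this sketch the failure happens at the OP_DUP that would exceed the budget, not at OP_CAT, which is one small example of why the exact point of failure has to be specified for every opcode.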

andytoshi

full member

Activity: 179

Merit: 151

-

I'd add that the blocksize and scriptsize limits are easy to check -- you just have to look at the data on the wire, not understand it at all. Any limit on script stack objects is going to be nasty: script is an intricate machine with many, many weird corner cases that interact in surprising (at least) ways.

Quote from: gmaxwell

Point here was that realizing you _needed_ a limit and where you needed it was a risk.

This is probably the most important point. I had initially typed up a long reply explaining what needs to be done for OP_PUSH or OP_CAT limits (which are conceptually not so bad, but you have to be exhaustive and expect all implementors to be identically exhaustive). And then I thought, "Why did I do that? To stop stack size explosions, I guess...but did I stop that? No idea." because I would have to analyze the interactions with every single other part of the script system, and neither I nor anybody else has a complete model there. (Though I *think* I have a complete model of how everything affects the size of stack objects, at least right now.) So that's where the risk comes from.

gmaxwell

staff

Activity: 4284

Merit: 8808

Quote from: jl2012 on September 03, 2014, 04:31:15 AM

I'm not a programmer so this may sound very stupid:
[...]
Max OP_CAT output size = 520 bytes: why risky?
I mean, is there any fundamental difference between these cases?

All the limits are risks, as is all complexity: practically every one of them has been implemented incorrectly by one alternative full node implementation or another (or Bitcoin Core itself) at some point. They miss them completely, or count wrong for them, or respond incorrectly when they're violated. E.g. here, what happens if you OP_CAT 520 and 10 bytes? Should the verify fail? Should the result be truncated? But even that wasn't the point here.

Point here was that realizing you _needed_ a limit and where you needed it was a risk. The reasonable and pedantically correct claim was made that OP_CAT didn't increase memory usage, that it just took two elements and replaced them with one which was just as large as the two... and yet having (unfiltered) OP_CAT in the instruction set bypasses the existing limits and allows exponential memory usage.

None of it is insurmountable, but I was answering the question as to why it's not just something super trivial.
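The "OP_CAT 520 and 10 bytes" question can be illustrated with two hypothetical implementations of the same limit. The Python sketch below is not taken from any real node; `cat_strict` and `cat_truncating` are invented names standing in for two implementers who each read "results are limited to 520 bytes" differently.

```python
MAX_ELEMENT_SIZE = 520  # bytes, the limit discussed in the thread


def cat_strict(left: bytes, right: bytes):
    """Implementation A: an oversized result fails the script (None)."""
    if len(left) + len(right) > MAX_ELEMENT_SIZE:
        return None
    return left + right


def cat_truncating(left: bytes, right: bytes):
    """Implementation B: an oversized result is silently cut to the limit."""
    return (left + right)[:MAX_ELEMENT_SIZE]


left, right = b"\x01" * 520, b"\x02" * 10

result_a = cat_strict(left, right)
result_b = cat_truncating(left, right)

# A rejects the script, B carries on with a 520-byte element.
print(result_a is None, len(result_b))
```

Both functions agree on every input that stays under the limit, so ordinary testing would not reveal the difference. A single transaction that crosses the limit would be valid on one implementation and invalid on the other, which is the consensus-splitting risk described above.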

jl2012

legendary

Activity: 1792

Merit: 1111

Quote from: gmaxwell on September 01, 2014, 10:30:40 PM

When behavior like this is fixed via limits great care must be taken to make sure the limits are implemented absolutely consistently everywhere or the result is a consensus splitting risk. Alt full node implementers have repeatedly implemented the limits wrong— even when they're obvious in the code, called out in the comments, documented on the wiki, etc... even by just simply not implementing them (... coverage analysis and testing against the blockchain can't tell you about limits that you're just missing completely).

I'm not a programmer so this may sound very stupid:

MAX_BLOCK_SIZE = 1MB: not risky(?)
Max push size = 520 bytes: not risky(?)
Max script size = 10000 bytes: not risky(?)
Max OP_CAT output size = 520 bytes: why risky?

I mean, is there any fundamental difference between these cases?

gmaxwell

staff

Activity: 4284

Merit: 8808

Sure, sure. As I said above, it's not hard in theory, but there's the lesson: even after I pointed out there was exponential memory usage in a naive implementation, the OP thought otherwise. And before anyone else trips over their ego, it's not that the OP was foolish or anything. There are subtle interactions in the dark corners which make making promises about the behavior difficult. So while the actual safe behavior isn't fundamentally hard, being confident that all the corner cases and interactions are handled is fundamentally hard.

OP_CAT isn't the only "disabled" opcode with those properties.... e.g. multiplying also does it.

When behavior like this is fixed via limits great care must be taken to make sure the limits are implemented absolutely consistently everywhere or the result is a consensus splitting risk. Alt full node implementers have repeatedly implemented the limits wrong— even when they're obvious in the code, called out in the comments, documented on the wiki, etc... even by just simply not implementing them (... coverage analysis and testing against the blockchain can't tell you about limits that you're just missing completely).

Going back to the OP's question. I'm not seeing how OP_CAT (at least by itself) facilitates any of the high level examples there. Can you give me a specific protocol and set of scripts to show me how it would work?

TierNolan

legendary

Activity: 1232

Merit: 1094

Quote from: jl2012 on September 01, 2014, 09:57:27 AM

I can't get it. If we could overflow the output of OP_ADD, why couldn't we do the same for OP_CAT?

OP_CAT allows exponential memory usage.

OP_DUP means duplicate the top item on the stack.

Assume the stack contains a single entry A (length 32 bytes)

Code:

Stack: A  (length=32)

OP_DUP

Stack: A A

OP_CAT

Stack: AA (length=64)

OP_DUP

Stack: AA AA 

OP_CAT

Stack: AAAA (length=128)

OP_DUP

Stack: AAAA AAAA

OP_CAT

Stack: AAAAAAAA (length=256)

Each OP_DUP, OP_CAT pair doubles the size of the top stack element. After 10 pairs, 32 bytes would become 32kB. After 10 more, it would be 32MB, and 10 more would be 32GB, and so on (32TB after 10 more).
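The doubling in the trace above is easy to check numerically. This is a small Python sketch (the function name `top_element_size` is just for illustration) that tracks only the size of the top stack element, not its contents.

```python
def top_element_size(initial_bytes: int, rounds: int) -> int:
    """Size of the top stack element after `rounds` OP_DUP + OP_CAT pairs."""
    size = initial_bytes
    for _ in range(rounds):
        # OP_DUP copies the element; OP_CAT joins the two copies into one.
        size += size
    return size


for rounds in (1, 2, 3, 10, 20, 30, 40):
    print(rounds, "rounds:", top_element_size(32, rounds), "bytes")
```

Starting from 32 bytes, 10 rounds give 32 × 1024 bytes, and each further 10 rounds multiply the size by another 1024.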

An easy fix would be to limit the memory size of the inputs into OP_CAT.

sukhi

newbie

Activity: 12

Merit: 0

Quote from: jl2012 on September 01, 2014, 11:34:45 AM

I surely know what's exponential growth

I was responding to:

Quote from: jl2012 on September 01, 2014, 09:57:27 AM

I can't get it. If we could overflow the output of OP_ADD, why couldn't we do the same for OP_CAT?

...pointing out difference between the two overflows.

Quote from: jl2012 on September 01, 2014, 11:34:45 AM

But the growth could be easily shut down by limiting the length of the output. For example, the script will stop and fail if the output of OP_CAT is bigger than 520 bytes (which is also the upper limit of OP_PUSHDATA2).

We established that already:

Quote from: sukhi on September 01, 2014, 10:41:28 AM

Having a maximum output length for the concat would, I guess, solve it.

Quote from: gmaxwell on August 31, 2014, 05:50:41 PM

It isn't available in testnet either. It isn't just a question of "enabling it": you have to prevent it from being a memory exhaustion attack via exponential growth. (this isn't theoretically hard but it would, you know, require a little bit of work).

(my emphasis)

There is also at least one alternative to concatenation, one being:

Quote from: sukhi on September 01, 2014, 10:41:28 AM

And how about substr? One could specify the already concatenated string(s) and, if needed, numerical values for the separators. That would do the job too.

Can you think of any alternative that doesn't require changing the software? (Or any software that wouldn't need to be changed?)

jl2012

legendary

Activity: 1792

Merit: 1111

Quote from: sukhi on September 01, 2014, 11:06:23 AM

Quote from: jl2012 on September 01, 2014, 09:57:27 AM

Quote from: gmaxwell on September 01, 2014, 07:56:22 AM

Quote from: andytoshi on September 01, 2014, 06:44:48 AM

Quote from: sukhi on August 31, 2014, 11:17:06 PM

As for the exponential growth, I read somewhere something like: Replacing two inputs from the stack with one output that is only as long as the two inputs together...

Use OP_DUP and OP_CAT in succession, and you will double the size of your (single) input.

To complete the lesson, for those who never liked homework: With a 201 cycle limit, OP_CAT lets you use approximately 534,773,760 YiB memory, vs 102510 bytes without it.

Quote

is unlikely to exhaust the memory. And I agreed very much.

And maybe you will realize why all these altcoins worry me so? Or perhaps you've got cheaper sources of ram than I do?

I can't get it. If we could overflow the output of OP_ADD, why couldn't we do the same for OP_CAT?

There is a big difference between overflowing a numerical value and overflowing memory.

Say you have an 8bit unsigned int, then (200+100) would overflow because the highest value is 255. The result is then (200+100-256)=44, with a carry bit, or overflow. In that context, it just means that you passed the maximum numerical value which resets to zero.

Concatenation of strings coupled with duplication would cause the memory (not a number) to overflow, and it could look like this:
abc; OP_DUP;
abc abc; OP_CAT; abcabc; OP_DUP;
abcabc abcabc; OP_CAT; abcabcabcabc; OP_DUP;
abcabcabcabc abcabcabcabc; OP_CAT; abcabcabcabcabcabcabcabc; OP_DUP;
abcabcabcabcabcabcabcabc abcabcabcabcabcabcabcabc; OP_CAT; abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc; OP_DUP;
...and after so long, that would cause a memory overflow, which would result in a CPU fart.

You may read www.redefiningthesacred.com/math4.html which illustrates the power of exponential growth.

I surely know what's exponential growth

But the growth could be easily shut down by limiting the length of the output. For example, the script will stop and fail if the output of OP_CAT is bigger than 520 bytes (which is also the upper limit of OP_PUSHDATA2).
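The output cap suggested above could be sketched like this in Python. The mini-interpreter is purely illustrative: `op_dup`, `op_cat`, `run_dup_cat` and `ScriptFailure` are made-up names, and the only rule modeled is "fail the script if the OP_CAT result would exceed 520 bytes".

```python
MAX_ELEMENT_SIZE = 520  # bytes; the cap proposed in the post above


class ScriptFailure(Exception):
    """Raised when the sketch interpreter aborts a script."""


def op_dup(stack: list) -> None:
    if not stack:
        raise ScriptFailure("OP_DUP on empty stack")
    stack.append(stack[-1])


def op_cat(stack: list) -> None:
    if len(stack) < 2:
        raise ScriptFailure("OP_CAT needs two elements")
    # Check the would-be length BEFORE building the result, so no
    # oversized intermediate value is ever allocated.
    if len(stack[-2]) + len(stack[-1]) > MAX_ELEMENT_SIZE:
        raise ScriptFailure("OP_CAT result exceeds limit")
    right = stack.pop()
    left = stack.pop()
    stack.append(left + right)


def run_dup_cat(initial: bytes, rounds: int) -> list:
    """Run `rounds` OP_DUP + OP_CAT pairs starting from one element."""
    stack = [initial]
    for _ in range(rounds):
        op_dup(stack)
        op_cat(stack)
    return stack


print(len(run_dup_cat(b"A" * 32, 4)[0]))  # four doublings still fit
try:
    run_dup_cat(b"A" * 32, 5)             # the fifth would need 1024 bytes
except ScriptFailure as err:
    print("script failed:", err)
```

With a 32-byte start, the exponential growth is cut off after four doublings. Note that this only bounds each element; bounding the number of elements on the stack is a separate limit.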

sukhi

newbie

Activity: 12

Merit: 0

Quote from: jl2012 on September 01, 2014, 09:57:27 AM

Quote from: gmaxwell on September 01, 2014, 07:56:22 AM

Quote from: andytoshi on September 01, 2014, 06:44:48 AM

Quote from: sukhi on August 31, 2014, 11:17:06 PM

As for the exponential growth, I read somewhere something like: Replacing two inputs from the stack with one output that is only as long as the two inputs together...

Use OP_DUP and OP_CAT in succession, and you will double the size of your (single) input.

To complete the lesson, for those who never liked homework: With a 201 cycle limit, OP_CAT lets you use approximately 534,773,760 YiB memory, vs 102510 bytes without it.

Quote

is unlikely to exhaust the memory. And I agreed very much.

And maybe you will realize why all these altcoins worry me so? Or perhaps you've got cheaper sources of ram than I do?

I can't get it. If we could overflow the output of OP_ADD, why couldn't we do the same for OP_CAT?

There is a big difference between overflowing a numerical value and overflowing memory.

Say you have an 8bit unsigned int, then (200+100) would overflow because the highest value is 255. The result is then (200+100-256)=44, with a carry bit, or overflow. In that context, it just means that you passed the maximum numerical value which resets to zero.
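The 8-bit example above can be written out as a short Python sketch. The helper name `add_u8` is hypothetical; it just mimics fixed-width unsigned addition.

```python
def add_u8(a: int, b: int) -> tuple:
    """Add two values as an 8-bit unsigned integer would.

    Returns (wrapped_result, carry_flag).
    """
    total = a + b
    return total % 256, total > 255


result, carry = add_u8(200, 100)
print(result, carry)
```

However large the operands, the result still occupies exactly one byte; only its value wraps around. That is the difference from concatenation, where the result itself keeps getting bigger.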

Concatenation of strings coupled with duplication would cause the memory (not a number) to overflow, and it could look like this:
abc; OP_DUP;
abc abc; OP_CAT; abcabc; OP_DUP;
abcabc abcabc; OP_CAT; abcabcabcabc; OP_DUP;
abcabcabcabc abcabcabcabc; OP_CAT; abcabcabcabcabcabcabcabc; OP_DUP;
abcabcabcabcabcabcabcabc abcabcabcabcabcabcabcabc; OP_CAT; abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc; OP_DUP;
...and after so long, that would cause a memory overflow, which would result in a CPU fart.

You may read www.redefiningthesacred.com/math4.html which illustrates the power of exponential growth.

sukhi

newbie

Activity: 12

Merit: 0

Quote from: gmaxwell on September 01, 2014, 07:56:22 AM

To complete the lesson, for those who never liked homework: With a 201 cycle limit, OP_CAT lets you use approximately 534,773,760 YiB memory, vs 102510 bytes without it.

Quote

is unlikely to exhaust the memory. And I agreed very much.

And maybe you will realize why all these altcoins worry me so? Or perhaps you've got cheaper sources of ram than I do?

Touché. Actually, going for a new altcoin... worries me too, which is why I am here to try to figure out if something else out there (something that already exists) would allow the type of conditional transaction validation I am talking about.

Having a maximum output length for the concat would, I guess, solve it.

And how about substr? One could specify the already concatenated string(s) and, if needed, numerical values for the separators. That would do the job too.

Quote from: gmaxwell on August 31, 2014, 05:50:41 PM

Care to describe your protocol some? it turns out that a lot of things are possible with a bit of transformation.

That's the point. Maybe bitcoin already has something implemented that allows for what I am trying to do. Not only that, but what I am trying to do might have already been implemented and I just wasn't successful at finding it.

I have no hope of having any more opcodes enabled on the mainnet, and this thread is not about enabling it; it's about finding the tool(s) to achieve the goal. Is there another net that allows not necessarily what I think I need (op_cat or op_substr) but at least tools that would allow the type of verification I described? I very much wish that the best answer is NOT an altcoin.

jl2012

legendary

Activity: 1792

Merit: 1111

Quote from: gmaxwell on September 01, 2014, 07:56:22 AM

Quote from: andytoshi on September 01, 2014, 06:44:48 AM

Quote from: sukhi on August 31, 2014, 11:17:06 PM

As for the exponential growth, I read somewhere something like: Replacing two inputs from the stack with one output that is only as long as the two inputs together...

Use OP_DUP and OP_CAT in succession, and you will double the size of your (single) input.

To complete the lesson, for those who never liked homework: With a 201 cycle limit, OP_CAT lets you use approximately 534,773,760 YiB memory, vs 102510 bytes without it.

Quote

is unlikely to exhaust the memory. And I agreed very much.

And maybe you will realize why all these altcoins worry me so? Or perhaps you've got cheaper sources of ram than I do?

I can't get it. If we could overflow the output of OP_ADD, why couldn't we do the same for OP_CAT?

amaclin

legendary

Activity: 1260

Merit: 1019

It is possible to set a limit on the use of OP_CAT. For example, two such operations per script.
I think that the original poster can do it himself and play with testnet-in-a-box (or even a bitcoin fork!) with extended capabilities.

Later he could share his results with the bitcoin community and everyone will get an answer as to why OP_CAT should be included in mainnet.

gmaxwell

staff

Activity: 4284

Merit: 8808

Quote from: andytoshi on September 01, 2014, 06:44:48 AM

Quote from: sukhi on August 31, 2014, 11:17:06 PM

As for the exponential growth, I read somewhere something like: Replacing two inputs from the stack with one output that is only as long as the two inputs together...

Use OP_DUP and OP_CAT in succession, and you will double the size of your (single) input.

To complete the lesson, for those who never liked homework: With a 201 cycle limit, OP_CAT lets you use approximately 534,773,760 YiB memory, vs 102510 bytes without it.

Quote

is unlikely to exhaust the memory. And I agreed very much.

And maybe you will realize why all these altcoins worry me so? Or perhaps you've got cheaper sources of ram than I do?
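The two figures in the post above can be reproduced with a few lines of arithmetic. The Python sketch below rests on an assumption that is not stated in the thread: a 510-byte starting element, which I picked only because 102510 = 201 × 510 and 534,773,760 YiB = 510 × 2^100 bytes. The breakdown into 100 OP_DUP/OP_CAT pairs is likewise my reading, not something the post spells out.

```python
OP_LIMIT = 201       # operations per script, from the post
ELEMENT_SIZE = 510   # bytes; ASSUMED, inferred from 102510 == 201 * 510
YIB = 2 ** 80        # bytes in one yobibyte

# Without OP_CAT: each operation can at best add one more copy.
without_cat = OP_LIMIT * ELEMENT_SIZE

# With OP_CAT: 100 OP_DUP + OP_CAT pairs fit in 201 operations,
# and each pair doubles the element.
pairs = OP_LIMIT // 2
with_cat = ELEMENT_SIZE * 2 ** pairs

print(without_cat, "bytes without OP_CAT")
print(with_cat // YIB, "YiB with OP_CAT")
```

Under these assumptions the script prints the same two numbers quoted in the post.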

andytoshi

full member

Activity: 179

Merit: 151

-

Quote from: sukhi on August 31, 2014, 11:17:06 PM

I'm trying to look into NXT... and I have a hard time.

Don't waste your energy. They claim to have solved various extremely difficult problems without even acknowledging that they are problems, they are closed-source and they pay a lot of shills. There is not and never has been any evidence of technical innovation from them.

Quote from: gmaxwell on August 31, 2014, 05:50:41 PM

It isn't available in testnet either. It isn't just a question of "enabling it": you have to prevent it from being a memory exhaustion attack via exponential growth. (this isn't theoretically hard but it would, you know, require a little bit of work).

Quote from: Peter Todd on January 30, 2013, 01:36:31 PM

They are enabled already on Testnet so you can try your experiments there.

I think Peter misunderstood the question he was answering and meant "nonstandard transactions are standard on Testnet". Either that or he was just wrong.

Quote

As for the exponential growth, I read somewhere something like: Replacing two inputs from the stack with one output that is only as long as the two inputs together... is unlikely to exhaust the memory. And I agreed very much.

Use OP_DUP and OP_CAT in succession, and you will double the size of your (single) input.
