Some question about OP_RETURN | Bitcointalksearch.org

ABCbits

legendary

Activity: 2870

Merit: 7490

Crypto Swap Exchange

Thank you everyone for answering my questions. I'll keep this thread open in case someone knows of an open-source/self-hosted blockchain explorer which can index and search OP_RETURN data.

Quote from: Coding Enthusiast on January 21, 2021, 11:35:44 AM

Quote from: ABCbits on January 21, 2021, 06:50:57 AM

How is that possible if it's 0 bytes (which means we don't specify the size of the data)? Are there any examples?

Each piece of data (byte array) that is to be pushed to the stack (in the scripts) is preceded by its length. For example, if you want to push bytes {1,2,3}, the script uses the length, which is 3, followed by those bytes: 0x03010203.
If the data can be converted to one of the predefined constants, then that constant should (in the context of standard rules, not consensus rules) be used instead.
For example, if you want to push {5}, we have a constant for it called OP_5 which should be used instead. Meaning that instead of using 0x0105 we use 0x55 (which is equal to OP_5). That way there is no push length anymore, since another OP code was used.

That makes sense, although it'll confuse people since there are many ways to push data.

ABCbits

legendary

Activity: 2870

Merit: 7490

Crypto Swap Exchange

Quote from: vjudeu on January 20, 2021, 08:34:18 AM

Quote

I just remembered there are variations of UTF-8 from 1 to 4 bytes; do you mean 1-byte UTF-8?

UTF-8 has eight in its name for a reason (at least one byte per Unicode code point); this is not UTF-16 (at least 2 bytes per code point) or UTF-32 (always 4 bytes per code point). But OP_RETURN can contain arbitrary bytes; it could just as well be the SHA-256 of something and be meaningful only to the transaction creator.

Quote from: Coding Enthusiast on January 20, 2021, 12:41:36 PM

Quote

I just remembered there are variations of UTF-8 from 1 to 4 bytes; do you mean 1-byte UTF-8?

I don't know much about UTF-8 but I don't think there is such thing as "1 byte UTF-8".
AFAIK the way this encoding works is that you read one byte at a time and decide based on that byte how many more bytes you need to create the first character.
For example if the first byte is 0xxxxxxx (in binary where 0 is zero and x can be 1 or 0) then that byte is the character itself.
If the first byte is 110xxxxx (in binary) then you have to read another byte and the two bytes represent the character (the second byte also has to use 10xxxxxx format).
Similarly 1110xxxx for 3 bytes and 11110xxx for 4.
That means you can end up reading 4 bytes from the stream to be able to represent 1 character (0xF0, 0x90, 0x8D, 0x88 -> 𐍈 Hwair since the binary is 11110000 10010000 10001101 10001000)
You can check the Wikipedia link https://en.wikipedia.org/wiki/UTF-8#Encoding

Or here is the .net source code in C# where it converts the bytes to string when it can't be mapped 1 byte to ASCII chars: https://source.dot.net/#System.Private.CoreLib/Utf8Utility.Validation.cs,250

Thanks for the clarification, I was confused by the various terminology.

Quote from: Coding Enthusiast on January 20, 2021, 12:41:36 PM

Quote

So 2 bytes indicate the size of the data?

It is "up to" 2 bytes. It can be 0, 1 or 2.

How is that possible if it's 0 bytes (which means we don't specify the size of the data)? Are there any examples?

Coding Enthusiast

legendary

Activity: 1044

Merit: 2826

Bitcoin and C♯ Enthusiast

Quote from: ABCbits on January 21, 2021, 06:50:57 AM

How is that possible if it's 0 bytes (which means we don't specify the size of the data)? Are there any examples?

Each piece of data (byte array) that is to be pushed to the stack (in the scripts) is preceded by its length. For example, if you want to push bytes {1,2,3}, the script uses the length, which is 3, followed by those bytes: 0x03010203.
If the data can be converted to one of the predefined constants, then that constant should (in the context of standard rules, not consensus rules) be used instead.
For example, if you want to push {5}, we have a constant for it called OP_5 which should be used instead. Meaning that instead of using 0x0105 we use 0x55 (which is equal to OP_5). That way there is no push length anymore, since another OP code was used.
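
To make the "0 length bytes" case concrete, here is a rough Python sketch of choosing a constant opcode over a length-prefixed push. The function name minimal_push is made up for illustration, and it only covers data shorter than 76 bytes; it is a sketch of the rule described above, not Bitcoin Core's code.

```python
OP_0 = 0x00        # pushes an empty byte array
OP_1NEGATE = 0x4F  # pushes the number -1 (encoded as the single byte 0x81)
OP_1 = 0x51        # OP_1 .. OP_16 occupy 0x51 .. 0x60


def minimal_push(data: bytes) -> bytes:
    """Encode a push of `data`, preferring a constant opcode when one exists.

    Hypothetical helper: handles only data shorter than 0x4c (76) bytes.
    """
    if len(data) == 0:
        return bytes([OP_0])
    if len(data) == 1 and 1 <= data[0] <= 16:
        # A single byte 1..16 has its own opcode, so no length byte is needed.
        return bytes([OP_1 + data[0] - 1])
    if len(data) == 1 and data[0] == 0x81:
        return bytes([OP_1NEGATE])
    if len(data) < 0x4C:
        # Plain push: one length byte followed by the data itself.
        return bytes([len(data)]) + data
    raise ValueError("longer pushes need OP_PUSHDATA1/2/4")


print(minimal_push(bytes([1, 2, 3])).hex())  # 03010203
print(minimal_push(bytes([5])).hex())        # 55
```

So {5} costs one byte in the script (0x55) instead of two (0x0105), and that one byte carries no separate length.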

Coding Enthusiast

legendary

Activity: 1044

Merit: 2826

Bitcoin and C♯ Enthusiast

Quote from: NotATether on January 20, 2021, 01:42:25 PM

How did the protocol end up defining more opcodes that have successively larger bytes limits?

AFAIK these OP codes existed from the beginning and weren't added later. In fact there are a couple of ways that bitcoin encodes numbers (CompactInts for instance) and they all support huge values.

Quote

it would have been less work to simply define OP_PUSHDATA4 and its 64K+ bytes support when the desire for larger amounts of arbitrary data in scripts became known.

It is "cleaner" to have all data push OPs in one place without any gaps between them. Had OP_PushData4 been added later, it would have been assigned a value >= 0xba instead of sitting right after 0x4d (OP_PushData2). Besides, we already have 70 unassigned OPs (out of the 256 possible in one byte) along with a bunch of NOPs and some removed/disabled OPs, so assigning a never-used OP code such as OP_PushData4 is not really a big deal.

NotATether

legendary

Activity: 1568

Merit: 6660

bitcoincleanup.com / bitmixlist.org

Quote from: ABCbits on January 19, 2021, 08:15:19 AM

6. Block explorers can decode OP_RETURN output to text; do they decode hex to ASCII or UTF-8?

When you have a string of Unicode text, it can be represented in bytes using several different encodings. Only the encoding the program used to create the bytes will reliably reproduce the original characters from them; other encodings will generally produce gibberish. In particular, if you encode something in UTF-8 and try to decode it as ASCII, every non-ASCII character turns into gibberish.

As already mentioned, OP_RETURN does not deal with encodings or "characters" in the traditional sense of the word, only raw bytes.

UTF-16 is thankfully rare for stored or transmitted text nowadays, although it survives internally in Windows, Java and JavaScript.

Quote from: Coding Enthusiast on January 20, 2021, 12:41:36 PM

* Any data shorter than 0x4c (76) bytes is pushed with 1 byte equal to its length (byte[10] -> 10 + byte[10]; byte[70] -> 70 + byte[70])
* Any data of 0x4c (76) to 255 bytes is pushed using 0x4c (i.e. OP_PUSHDATA1) followed by a 1-byte length followed by the data (byte[80] -> OP_PUSHDATA1 + 80 + byte[80])
* Any data longer than 255 bytes uses 0x4d (OP_PUSHDATA2) followed by a 2-byte length
* Any data longer than 65535 (0xffff) bytes uses 0x4e (OP_PUSHDATA4) followed by a 4-byte length
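
The four rules above can be sketched in Python roughly like this. The function name push_data is hypothetical; the multi-byte lengths are written little-endian, as Bitcoin script does. It deliberately ignores the OP_N constants and any policy limits.

```python
import struct

OP_PUSHDATA1 = 0x4C
OP_PUSHDATA2 = 0x4D
OP_PUSHDATA4 = 0x4E


def push_data(data: bytes) -> bytes:
    """Prefix `data` with the smallest push opcode that fits its length."""
    n = len(data)
    if n < 0x4C:
        # The length byte itself acts as the opcode.
        return bytes([n]) + data
    if n <= 0xFF:
        return bytes([OP_PUSHDATA1, n]) + data
    if n <= 0xFFFF:
        return bytes([OP_PUSHDATA2]) + struct.pack("<H", n) + data
    if n <= 0xFFFFFFFF:
        return bytes([OP_PUSHDATA4]) + struct.pack("<I", n) + data
    raise ValueError("data too large for a single push")


print(push_data(bytes(10))[:1].hex())   # 0a
print(push_data(bytes(80))[:2].hex())   # 4c50
print(push_data(bytes(300))[:3].hex())  # 4d2c01
```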

How did the protocol end up defining several opcodes with successively larger byte limits? I don't know the chain of events that led to all of these being implemented at different times, but in hindsight it would have been less work to simply define OP_PUSHDATA4 and its 64K+ bytes support when the desire for larger amounts of arbitrary data in scripts became known.

Coding Enthusiast

legendary

Activity: 1044

Merit: 2826

Bitcoin and C♯ Enthusiast

Quote

So 2 bytes indicate the size of the data?

It is "up to" 2 bytes. It can be 0, 1 or 2.
For example it could be OP_Return OP_2 (which passes IsPushOnly()).

Quote from: Coding Enthusiast on January 19, 2021, 09:09:57 AM

Basically as impossible as sending bitcoin through a 0-conf transaction; that means anyone who makes a transaction with multiple OP_RETURN outputs, or with an OP_RETURN whose data is more than 80 bytes, must contact a pool/miner and hope they agree to include the transaction.

Yes.

Quote

I just remembered there are variations of UTF-8 from 1 to 4 bytes; do you mean 1-byte UTF-8?

I don't know much about UTF-8 but I don't think there is such thing as "1 byte UTF-8".
AFAIK the way this encoding works is that you read one byte at a time and decide based on that byte how many more bytes you need to create the first character.
For example if the first byte is 0xxxxxxx (in binary where 0 is zero and x can be 1 or 0) then that byte is the character itself.
If the first byte is 110xxxxx (in binary) then you have to read another byte and the two bytes represent the character (the second byte also has to use 10xxxxxx format).
Similarly 1110xxxx for 3 bytes and 11110xxx for 4.
That means you can end up reading 4 bytes from the stream to be able to represent 1 character (0xF0, 0x90, 0x8D, 0x88 -> 𐍈 Hwair since the binary is 11110000 10010000 10001101 10001000)
You can check the Wikipedia link https://en.wikipedia.org/wiki/UTF-8#Encoding

Or here is the .net source code in C# where it converts the bytes to string when it can't be mapped 1 byte to ASCII chars: https://source.dot.net/#System.Private.CoreLib/Utf8Utility.Validation.cs,250
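
For anyone who wants to try the lead-byte rule, here is a rough Python sketch. The helper names are made up, and it only inspects the first byte; a real decoder also validates the 10xxxxxx continuation bytes and rejects overlong forms, which Python's own decoder does in the final step here.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes the character starting with `lead` occupies."""
    if lead & 0b1000_0000 == 0b0000_0000:  # 0xxxxxxx
        return 1
    if lead & 0b1110_0000 == 0b1100_0000:  # 110xxxxx
        return 2
    if lead & 0b1111_0000 == 0b1110_0000:  # 1110xxxx
        return 3
    if lead & 0b1111_1000 == 0b1111_0000:  # 11110xxx
        return 4
    raise ValueError("not a valid UTF-8 lead byte")


def decode_first_char(stream: bytes) -> str:
    """Read just enough bytes from `stream` to produce its first character."""
    n = utf8_sequence_length(stream[0])
    return stream[:n].decode("utf-8")


hwair = bytes([0xF0, 0x90, 0x8D, 0x88])
print(utf8_sequence_length(hwair[0]))  # 4
print(decode_first_char(hwair))        # 𐍈
```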

vjudeu

copper member

Activity: 909

Merit: 2314

Quote

I just remembered there are variations of UTF-8 from 1 to 4 bytes; do you mean 1-byte UTF-8?

UTF-8 has eight in its name for a reason (at least one byte per Unicode code point); this is not UTF-16 (at least 2 bytes per code point) or UTF-32 (always 4 bytes per code point). But OP_RETURN can contain arbitrary bytes; it could just as well be the SHA-256 of something and be meaningful only to the transaction creator.
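
A quick way to see those per-code-point sizes in Python (the sample characters are picked only for illustration, and encoded_sizes is a made-up helper name):

```python
def encoded_sizes(ch: str) -> tuple:
    """Return the byte length of one character in UTF-8, UTF-16 and UTF-32."""
    # The "-le" variants are used so no byte order mark is counted.
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )


for ch in ["A", "é", "€", "𐍈"]:
    print(ch, encoded_sizes(ch))
# A (1, 2, 4)
# é (2, 2, 4)
# € (3, 2, 4)
# 𐍈 (4, 4, 4)
```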

Skeletron

newbie

Activity: 1

Merit: 2

Quote

1. OP_RETURN has 3 bytes of overhead; is it usually used by sidechains/P2P protocols?

Protocols just mean the first 3 bytes of the OP_RETURN data, which can indicate the protocol, but that is not a rule enforced by Bitcoin consensus.
Omni Layer (#6f6d6e) is responsible for about 40% of all OP_RETURNs; check: https://opreturn.org/op-return-protocols/
OP_RETURN basic structure is : 6a

Quote

2. The OP_RETURN size limit is 80 bytes; does that exclude the 3 bytes of overhead?
3. Is the OP_RETURN size limit enforced at node level, just like minrelayfee?
4. Can you make a transaction with multiple OP_RETURN outputs? If yes, is there any limit? If no, is it because it's considered a non-standard script?

Quote

(3) Bitcoin Core 0.12.0 defaults to relaying and mining null data outputs with up to 83 bytes with any number of data pushes, provided the total byte limit is not exceeded. (4) There must still only be a single null data output and it must still pay exactly 0 satoshis.

The -datacarriersize Bitcoin Core configuration option allows you to set the maximum number of bytes in null data outputs that you will relay or mine

check this famous transaction https://www.blockchain.com/btc/tx/d29c9c0e8e4d2a9790922af73f0b8d51f0bd4bb19940d9cf910ead8fbe85bc9b

Quote

6. Block explorers can decode OP_RETURN output to text; do they decode hex to ASCII or UTF-8?

The embedded data must be supplied hex-encoded. UTF-8 encodes Unicode characters into sequences of 8-bit bytes; ASCII defines 128 character codes, and for the characters covered by those 7-bit ASCII codes the UTF-8 representation is exactly the same as ASCII.

Coding Enthusiast

legendary

Activity: 1044

Merit: 2826

Bitcoin and C♯ Enthusiast

2&3. The size limit is a standard rule and it is 83 bytes: 80 bytes of data, 1 byte for OP_Return itself and up to 2 bytes for the push (opcode and length).
https://github.com/bitcoin/bitcoin/blob/bd6af53e1f8ec9d25cedf0bf36c98b99a8d88774/src/script/standard.h#L96-L99
There is no limit in the protocol for output script size or even validity including OP_Return outputs. Here is a transaction on testnet with 93 byte OP_Return e598de457ebda1638730062efddd9a45d2bc14efb20a5acc83771eed0799f3f59
Here is another one with 542 byte size 7d2922f0b0ee315e7fd0a5f2ba702ee7bfc613cca3dd23cebe52c5aa45609d6c
And another one with 10004 byte size 9b333e6043490472f8e4351008acdd3d16ffce7a90043dc9ebcf1b1739a7cc20
(I have more test vectors with invalid scripts if you're interested)
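
The 83 = 1 + 2 + 80 arithmetic can be checked with a short Python sketch. The function names are made up for illustration, it builds a single-push script only, and the constant simply mirrors the default from the standard.h link above; it does not look at -datacarriersize or any other policy check.

```python
MAX_OP_RETURN_RELAY = 83  # default standardness limit on the whole output script
OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C


def null_data_script(payload: bytes) -> bytes:
    """Build OP_RETURN <payload> with a single push (payload up to 255 bytes)."""
    if len(payload) < 0x4C:
        push = bytes([len(payload)])                # 1 byte: the length itself
    elif len(payload) <= 0xFF:
        push = bytes([OP_PUSHDATA1, len(payload)])  # 2 bytes: opcode + length
    else:
        raise ValueError("this sketch only handles payloads up to 255 bytes")
    return bytes([OP_RETURN]) + push + payload


def within_default_relay_limit(script: bytes) -> bool:
    """True if the script size does not exceed the default limit."""
    return len(script) <= MAX_OP_RETURN_RELAY


print(len(null_data_script(bytes(80))))  # 83
print(within_default_relay_limit(null_data_script(bytes(80))))  # True
print(within_default_relay_limit(null_data_script(bytes(81))))  # False
```

Note that a payload under 76 bytes only needs a 1-byte push, so the script is payload + 2 bytes in that case.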

4. As far as the protocol is concerned you can include as many OP_Returns as you like since they are just output scripts and there is no limit on txout count (except the block size).
As far as standard rules are concerned any transaction (obviously except coinbase) is limited to only one NULL_DATA aka OP_Return output.
https://github.com/bitcoin/bitcoin/blob/4a540683ec40393d6369da1a9e02e45614db936d/src/policy/policy.cpp#L132-L136

6. It depends on the block explorer, but I usually see them use UTF8.
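
As a guess at what such an explorer might do (this is not taken from any particular explorer's source, and payload_to_text is a made-up name), a decoder could try UTF-8 first and fall back to showing the hex:

```python
def payload_to_text(hex_payload: str) -> str:
    """Try to show an OP_RETURN payload as UTF-8 text, else return the hex."""
    raw = bytes.fromhex(hex_payload)
    try:
        # Pure ASCII payloads decode fine here too, since ASCII is a subset of UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Arbitrary binary data (hashes, protocol fields) is usually not valid UTF-8.
        return hex_payload


print(payload_to_text("68656c6c6f"))  # hello
print(payload_to_text("ff00"))        # ff00
```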

ABCbits

legendary

Activity: 2870

Merit: 7490

Crypto Swap Exchange

1. OP_RETURN has 3 bytes of overhead; is it usually used by sidechains/P2P protocols?
2. The OP_RETURN size limit is 80 bytes; does that exclude the 3 bytes of overhead?
3. Is the OP_RETURN size limit enforced at node level, just like minrelayfee?
4. Can you make a transaction with multiple OP_RETURN outputs? If yes, is there any limit? If no, is it because it's considered a non-standard script?
5. I know a few online block explorers can search OP_RETURN outputs, but is there any open-source/self-hosted blockchain explorer for it?
6. Block explorers can decode OP_RETURN output to text; do they decode hex to ASCII or UTF-8?
