Is it possible to keep non-transaction data out of a blockchain?

gmaxwell

staff

Activity: 4326

Merit: 8951

Quote from: hhanh00 on January 11, 2015, 01:04:20 AM

x = SHA-256(offchain data)?
So you can refer to arbitrary size data, right?

Sure.

hhanh00

sr. member

Activity: 467

Merit: 267

x = SHA-256(offchain data)?
So you can refer to arbitrary size data, right?

gmaxwell

staff

Activity: 4326

Merit: 8951

Quote from: TimS on January 10, 2015, 11:28:28 PM

So I'll encode my data in the transaction amounts

Again, you're limited to a fairly small number of bits at considerable costs, even without any additional artificial limitations.

Quote

can link to a larger amount of data: e.g. 80 bits can be an .onion address

Great; and thats outside of the blockchain, which was the goal of the OP's post.

TimS

sr. member

Activity: 250

Merit: 253

Quote from: Sukrim on January 10, 2015, 08:50:35 PM

There might be a way to only allow transactions to public keys and include a clause that unspent transactions can be re-distributed after e.g. a year or so. This way, even if someone spams the chain, it is not really permanent but will be resolved after a fixed time period.

This doesn't stand in the way of storing data in the blockchain at all: you just send 0 or 1 satoshi to a dummy public key. You don't care whether that's locked up forever or can be reclaimed eventually: the data is still in the blockchain (and, for a year, in UTXO).

Quote from: Sukrim on January 10, 2015, 08:50:35 PM

Another way might be to decouple account management and transactions - this way you might have a block chain that only deals with transfers between accounts that would have very tiny transactions and very strict rules about what can be entered and a different one that contains public keys, smart contracts and whatnot that might have non-transaction data too but is expensive to write into as well as slower and larger in size.

So I'll encode my data in the transaction amounts (this can be limited by giving this no-data chain low divisibility, e.g. tenths of BTC instead of satoshis), addresses, or even txids (this last one is more far-fetched, I'll admit).

And don't forget in all of this that a short amount of data can link to a larger amount of data: e.g. 80 bits can be an .onion address, 256 bits can be a magnet URI, etc. Which prompts the question: just what problem are we trying to solve here? If that can be identified, then I can know whether a certain solution is feasible for that or not. I think it will always be possible, but can be made more expensive, given certain tradeoffs.

Sukrim

legendary

Activity: 2618

Merit: 1007

There might be a way to only allow transactions to public keys and include a clause that unspent transactions can be re-distributed after e.g. a year or so. This way, even if someone spams the chain, it is not really permanent but will be resolved after a fixed time period.

Another way might be to decouple account management and transactions - this way you might have a block chain that only deals with transfers between accounts that would have very tiny transactions and very strict rules about what can be entered and a different one that contains public keys, smart contracts and whatnot that might have non-transaction data too but is expensive to write into as well as slower and larger in size.

gmaxwell

staff

Activity: 4326

Merit: 8951

Quote from: TimS on January 10, 2015, 07:36:09 PM

Granted, all of this does make it significantly more expensive. But it's still quite possible.

It basically makes a data storing party have a huge (e.g. dozens fold) disadvantage against normal users, even if you give them large computation, and at that point economic pressures are likely more effective. Pragmatically it may change how things work legally. People might order a newspaper to censor an old classified in its archive, they're not so likely to do ask the censoring of 50 articles each which leak one letter; at some point the "message" is really embedded in the decode instructions rather than the data elsewhere. You've gone from outputs in a transaction being able to code kilobytes with modest overhead, to, say, a couple hundred bytes with huge overhead.

TimS

sr. member

Activity: 250

Merit: 253

Quote from: gmaxwell on January 10, 2015, 04:41:42 PM

This reduces your covert channel to only the small bits that you can afford to grind into it with guess and check. If the values already have to meet some computational criteria (e.g. you always have to grind) then any additional grinding is a multiplication on that, so with a linear cost to honest users that exponential cost channel can be made as computationally expensive as you like.

I had considered a plan similar to this: e.g. that the first x bits of the public key have to be 0. However, similar to Hashcash, I figured that what is too expensive for a legitimate user is still too weak for a serious attacker and his botnets (especially if ASICs can get involved!).

This hypothetical network should also prohibit address reuse. Why? Imagine a scenario where an attacker wishes to use the last byte of the address/P2SH (whatever is visible in the blockchain) to store his data. An attacker only needs to search through (he doesn't even have to generate them himself, just find people's addresses in the blockchain) about 1568 addresses to find 256 distinct-end-byte addresses and use them to represent 1 byte each in an unlimited number of data storage transactions. Or, if having an off-blockchain lookup table is acceptable, just choose or generate any 256 addresses.

How long is acceptable to force a legitimate user on a mobile device to come up with a new address for a transaction? Seconds? Minutes? Whatever it is, someone with a farm of ten desktop computers could be expected to generate 100 addresses in that time. 15 times that is still easy if the data is really worth putting in the blockchain.

Granted, all of this does make it significantly more expensive. But it's still quite possible.

gmaxwell

staff

Activity: 4326

Merit: 8951

Quote from: DeathAndTaxes on January 10, 2015, 02:42:57 PM

To secure the output against misuse one could require that the "pre-hash" be included when a transaction is relayed or when a block is published. For Bitcoin the ScriptHash = RIPEMD160(SHA256(script)). So the RIPEMD160 hash would be in the output and the transmitting node would also include the prehash (output of the SHA256 function) with the transaction. Nodes would take the SHA256 "prehash" hash it and verify it matches the output to confirm the output has a valid hash. Once a txn was sufficiently deep in the blockchain to make re-orgs unlikely, nodes could discard the prehash.

When someone independantly invents the same idea, you know it must be at least somewhat good.

(uh, well actually, maybe not. Tongue

)

I called this P2SH^2: http://sourceforge.net/p/bitcoin/mailman/message/30705609/

Quote from: TimS on January 10, 2015, 02:55:21 PM

I'm pretty sure it's impossible.

The impossible presented. I'm glad to be of service.

Quote from: TimS on January 10, 2015, 02:55:21 PM

So you stick your non-transaction data in there as the "prehash".

Yes, but you can verify this prehash only for blocks on the tip, not the history. and forget about it. So I think that fits at least some definitions of "out of the blockchain". (Though as I just demonstrated even that weakness can be closed.)

Anything here works better with a somewhat narrow definition of "out of a blockchain". Even with outputs fixed signatures still end up with a covert channel (though something with very limited functionality could also make it arbitrarily narrow, again using unique signatures), but most of my thought in this space has been centred around eliminating high bandwidth covert channels for data that all participants are strongly forced by the system to store.

There are also other things that can be done here. A utxo is, roughly, txid:vout -> {scriptPubkey, value}. You could instead store H1(txid:vout) -> Enc(H2(txid:vout), {scriptPubkey, value}) to locally blind yourself to this data until the very moment it's needed (when a block shows up that actually spends the output).

amaclin

legendary

Activity: 1260

Merit: 1019

Quote

Is it possible to keep non-transaction data out of a blockchain?
[...]
Any other ideas?

Alt-coin with strict rules.

DeathAndTaxes

donator

Activity: 1218

Merit: 1080

Gerald Davis

Quote from: TimS on January 10, 2015, 02:55:21 PM

So you stick your non-transaction data in there as the "prehash".

Nodes will discard the prehash once the txn is included in a block (or alternatively once the txn is sufficiently deep in the blockchain).[/quote]

It is only used for temporary validation.

Quote

The only thing I can think of is to make it prohibitively expensive to include a substantial amount of non-transaction data, e.g. via high fees or a proof of work alongside your transaction.

The difficulty is how to distinguish encoded data from actual transactions. Any ideas? It is made difficult because Bitcoin has a rather open scripting language. This provides flexibility but it also makes it difficult to avoid using opcodes for other purposes. For example OP_CHECKMULTISIG is great for encoding bulk data in the blockchain. Instead of 20 bytes per transaction you can put hundeds and it appears indistinguishable from normal transactions and is even considered standard by the network.

Quote

If it's not possible to prevent it, you might as well make it pruneable.

That is another possible route. Imagine if OP_RETURN had no size limit but outputs were hashed first and the txn hash was created from the output hashes. This would allow pruning the OP_RETURN data with just its hash. Not only could OP_RETURN data be pruned out of the blockchain, txns could still be validated so even full nodes could choose validate th txn without the data. Those who wanted to keep the data would need to pay the cost to store and relay the data. Other nodes would relay the hash only. However for that to work it should be difficult or expensive to bypass this and bury data in "fake" txns instead or it does no good.

TimS

sr. member

Activity: 250

Merit: 253

I'm pretty sure it's impossible.

Quote from: DeathAndTaxes on January 10, 2015, 02:42:57 PM

So the RIPEMD160 hash would be in the output and the transmitting node would also include the prehash (output of the SHA256 function) with the transaction.

So you stick your non-transaction data in there as the "prehash".

The only thing I can think of is to make it prohibitively expensive to include a substantial amount of non-transaction data, e.g. via high fees or a proof of work alongside your transaction.

If it's not possible to prevent it, you might as well make it pruneable.

DeathAndTaxes

donator

Activity: 1218

Merit: 1080

Gerald Davis

Is it possible to keep non-transaction data out of a blockchain? This doesn't necessarily need to be Bitcoin for the sake of the thread lets just consider a generic blockchain like system. Also for the sake of the thread lets not get into "if" it is a good idea to limit the blockchain to only transactional data (that is a good debate for another thread).

For partial solutions which make it more difficult lets give higher value to solutions which keep non-transactional data out of the UTXO. Spent or provably unspendable transactions can be pruned.

One idea would be to make all transaction outputs being P2SH. This could even be implicit with no opcodes in the output at all where outputs are simply a hash of the redeem script and a value. As a side note this has an added benefit of making the UTXO (but not the full blockchain) smaller with fixed length records, making lookups simpler and cheaper. Of course this by itself would not significantly limit inclusion of non-transactional data as encoded data (the 20 bytes "hash" could be put any encoded data without verification). This by itself would actually be a step back because unlike OP_RETURN outputs the output would be unspendable but not provably so leading to UTXO bloat.

To secure the output against misuse one could require that the "pre-hash" be included when a transaction is relayed or when a block is published. For Bitcoin the ScriptHash = RIPEMD160(SHA256(script)). So the RIPEMD160 hash would be in the output and the transmitting node would also include the prehash (output of the SHA256 function) with the transaction. Nodes would take the SHA256 "prehash" hash it and verify it matches the output to confirm the output has a valid hash. Once a txn was sufficiently deep in the blockchain to make re-orgs unlikely, nodes could discard the prehash.

Any other ideas?

Topic: Is it possible to keep non-transaction data out of a blockchain? (Read 1278 times)