Author

Topic: Questions about hash_serialized, returned by gettxoutsetinfo (Read 844 times)

legendary
Activity: 2058
Merit: 1416
aka tonikt
Congrats, you win today's failed cryptography trophy. Smiley That kind of structure is trivially to second preimage attacks using wagners algorithm for solutions to the subset sum problem.  

Sorry then.
I've been learning cryptography by reverse engineering your code Wink

Is it also so "trivial" when the final checksum is not 256, but e.g. 65536 bits? or 8388608 bits?
In order to get this, I'd xor each 256/512 bit hash into a different offset of the final checksum.
The offset would depend on some part of the input records (e.g. TXID).
Then, for convenience, I'd calculate and distribute a shorter hash (of the big "final checksum").

EDIT:
And, for extra security, I'd also feed the final hashing with the number of input records.
staff
Activity: 4326
Merit: 8951
The only purpose of it is bitcoin core specific software testing, it's effectively free to compute while returning those other statistics, and it allows rapid isolation of suspected database corruption or inconsistency.

The structure of the hashing is not well suited to other applications.

and xor all the hashes together to get the final value.
Congrats, you win today's failed cryptography trophy. Smiley That kind of structure is trivially to second preimage attacks using wagners algorithm for solutions to the subset sum problem.  Order independent accumulators are a tricky subject, the only that I'm aware of that have any real argument for security have huge hashes and are very slow.

The data hashed here is also _highly_ implementation specific and subject to change in any new version without warning.
legendary
Activity: 2058
Merit: 1416
aka tonikt
I imagine the idea behind hashing it like this was to be able to verify a disk snapshot of the database, using a single command line and some common existing tools.

which keeps making me curious about where is the code that stores the snapshot on the disk or imports it into the client..
legendary
Activity: 2058
Merit: 1416
aka tonikt
I'm not entirely sure how it is calculated, but the functions for getting all of that data is in the code at https://github.com/bitcoin/bitcoin/blob/37d83bb0a980996338d9bc9dbdbf0175eeaba9a2/src/rpc/blockchain.cpp#L633
thanks. I have actually seen that after I sent the post - sorry Smiley

it seems that since 0.12.1 the "ss << key;" was added.

obviously, looking into this code, it's just a hash of a serialized data and the records must come in a sorted order.
I'm not sure how they are sorted, but I'd guess it's given by leveldb, so probably sorted by txid. either little, or big endian..

shame it uses a sorted order, as it becomes very hard to calculate this value with a different utxo db engine (one which does not keep sorted records).
I'd rather calculate a separate hash of each "key+record(s)" and xor all the hashes together to get the final value.
staff
Activity: 3458
Merit: 6793
Just writing some code
I'm not entirely sure how it is calculated, but the functions for getting all of that data is in the code at https://github.com/bitcoin/bitcoin/blob/37d83bb0a980996338d9bc9dbdbf0175eeaba9a2/src/rpc/blockchain.cpp#L633
legendary
Activity: 2058
Merit: 1416
aka tonikt
Recently I've read somewhere that the way hash_serialized gets calculated has changed, but I just want to ask about the last version (not yet released):

How exactly is it calculated?
Specifically, can someone please confirm that I need to have the utxo records sorted, in order to calculate it.

How are they sorted?

Also, are there any plans to make this hash more open and verifiable, for a purpose of setting up fresh nodes with a snapshots of an UTXO database?
I'd imagine setting up a node with a snapshot of the coinstate db, without the need to build it up from the blockchain.
That'd save shit loads of time, bandwidth and disk space.
Jump to: