Rule 30 automaton as hash function - page 6.

grau

hero member

Activity: 836

Merit: 1030

bits of proof

Quote from: Anders on July 23, 2014, 06:18:38 AM

Here is a version of the Rule 30 hash function that skips rows 2.5 times the number of bits in the initial condition: http://jsfiddle.net/d3NS6/

The reason is to ensure that the bits become influenced enough as a protection against cryptanalysis.

This sounds weak. It is the path to perceived security through perceived complexity, which is probably the thinking behind how SHA was constructed.

If there really is something to R30, then the below should be sufficient.

grau

hero member

Activity: 836

Merit: 1030

bits of proof

I think Satoshi's solution to the Byzantine Generals' Problem is the first real-world application of Wolfram's computational irreducibility, since doing an irreducible computation is the ideal proof of work.

My intuition says that known attacks on hash functions are only special cases of reducibility checks, and a computationally irreducible function must be resistant to them by definition.

Maybe SHA256 is also computationally irreducible. It is, however, more likely that the irreducibility of an R30 hash will be rigorously proven than that of SHA256.

It is desirable to use the least complex irreducible computation for proof of work, as that leaves the least room for implementation differences. If R30 is irreducible, then it is also the least complex such computation.

Note that computation of the below area is sufficient.

Anders

full member

Activity: 126

Merit: 100

Another potential problem with R30, other than it being untested, is that it may turn out to be too good!

Imagine criminal organizations including terrorists being able to communicate in a way that is hidden even to intelligence agencies like the NSA. I don't like Orwellian stuff, but I trust the NSA more than I would trust secret communication between criminal elements.

Anders

full member

Activity: 126

Merit: 100

It would probably be too risky to use R30 in Bitcoin today. The algorithm is far too untested. Even if experts judged it likely to be strong, it would still have to be used in real applications and exposed to real, heavy attacks. Still, R30 could be a good backup in case SHA-256 is ever seriously breached.

Anders

full member

Activity: 126

Merit: 100

I created a Google Code project called R30 hash function: https://code.google.com/p/r30-hash-function/

Anders

full member

Activity: 126

Merit: 100

I think the full message version of my Rule 30 hash function has at least the following cryptographic benefits:

Resistant against preimage attacks.
Resistant against second-preimage attacks.
Resistant against collision attacks.
Resistant against chosen-prefix collision attacks.
Resistant against length extension attacks.
Resistant against brute-force attacks.
Resistant against distinguishing attacks.
Resistant against cryptanalysis.
Close approximation to a random oracle.

"preimage resistance: for essentially all pre-specified outputs, it is computationally infeasible to find any input which hashes to that output, i.e., to find any preimage x such that h(x) = y when given any y for which a corresponding input is not known.[1]

second-preimage resistance: it is computationally infeasible to find any second input which has the same output as any specified input, i.e., given x, to find a second preimage x' ≠ x such that h(x) = h(x′).[1]" -- http://en.wikipedia.org/wiki/Preimage_attack

"Collision attack
Find two different messages m1 and m2 such that hash(m1) = hash(m2).

Chosen-prefix collision attack
Given two different prefixes p1, p2 find two appendages m1 and m2 such that hash(p1 ∥ m1) = hash(p2 ∥ m2) (where ∥ is the concatenation operation)." -- http://en.wikipedia.org/wiki/Collision_attack

"In cryptography and computer security, length extension attacks are a type of attack on certain types of hashes which allow inclusion of extra information." -- http://en.wikipedia.org/wiki/Length_extension_attack

"In cryptography, a brute-force attack, or exhaustive key search, is a cryptanalytic attack that can, in theory, be used against any encrypted data[1] (except for data encrypted in an information-theoretically secure manner)." -- http://en.wikipedia.org/wiki/Brute_force_attack

"In cryptography, a distinguishing attack is any form of cryptanalysis on data encrypted by a cipher that allows an attacker to distinguish the encrypted data from random data. [1]" -- http://en.wikipedia.org/wiki/Distinguishing_attack

"In cryptography, a random oracle is an oracle (a theoretical black box) that responds to every unique query with a (truly) random response chosen uniformly from its output domain. If a query is repeated it responds the same way every time that query is submitted." -- http://en.wikipedia.org/wiki/Random_oracle

Anders

full member

Activity: 126

Merit: 100

Here is the Rule 30 hash function with the same algorithm and with fewer calculations: http://jsfiddle.net/GwLp4/

Since it's only the center column from where values are taken, fewer cells need to be calculated at the bottom half of the pyramid.
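The saving described above can be sketched in Python. This is only an illustration under stated assumptions, not the code behind the jsfiddle link: the function names are made up, and cells outside the initial condition are assumed to be 0. Only the cells that can still influence the last center cell are computed, so the working row shrinks by one cell per side on every generation.

```python
def step(row):
    """One Rule 30 generation; cells beyond either end are treated as 0."""
    padded = [0] + row + [0]
    return [padded[i - 1] ^ (padded[i] | padded[i + 1])
            for i in range(1, len(padded) - 1)]


def center_column_full(initial, rows):
    """Center cell of each of the next `rows` generations, computing every cell."""
    row = [0] * rows + list(initial) + [0] * rows  # room for the pattern to grow
    center = rows + len(initial) // 2
    column = []
    for _ in range(rows):
        row = step(row)
        column.append(row[center])
    return column


def center_column_cone(initial, rows):
    """Same result, but only cells that can still reach the final center cell."""
    n = len(initial)
    c = n // 2
    # Start with the cells within distance `rows` of the center (zeros outside).
    window = [initial[i] if 0 <= i < n else 0
              for i in range(c - rows, c + rows + 1)]
    column = []
    for _ in range(rows):
        # Each generation drops one cell on each side of the window.
        window = [window[i - 1] ^ (window[i] | window[i + 1])
                  for i in range(1, len(window) - 1)]
        column.append(window[len(window) // 2])
    return column
```

For R generations on an n-bit initial condition, the cone version touches about R*R cells, while the full version touches (n + 2R)*R.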

Anders

full member

Activity: 126

Merit: 100

The image below shows how the bits for the hash value are taken from the further expansion of the cellular automaton after 2.5 N rows have been calculated. If the bit values were taken from the top of the automaton, the result would be far from random for similar messages.

Anders

full member

Activity: 126

Merit: 100

Here is a version of the Rule 30 hash function that skips rows 2.5 times the number of bits in the initial condition: http://jsfiddle.net/d3NS6/

The reason is to ensure that the bits become influenced enough as a protection against cryptanalysis. The image shows that 2.5 times the message length in bits is enough to make the bits to the far left and right influence the cells back into the center column after N rows.

The next image shows again how protection against cryptanalysis is achieved:

Anders

full member

Activity: 126

Merit: 100

"In cryptography, a random oracle is an oracle (a theoretical black box) that responds to every unique query with a (truly) random response chosen uniformly from its output domain. If a query is repeated it responds the same way every time that query is submitted." -- http://en.wikipedia.org/wiki/Random_oracle

The Rule 30 hash function is perhaps almost like a random oracle.

"In cryptography and computer security, length extension attacks are a type of attack on certain types of hashes which allow inclusion of extra information." -- http://en.wikipedia.org/wiki/Length_extension_attack

The Rule 30 hash function is automatically protected against length extension attacks since the bits for the length of the message are concatenated with the bits of the full message, making the cellular automaton expand with different values for every different message length.

Anders

full member

Activity: 126

Merit: 100

Quote from: Peter R on July 23, 2014, 12:59:24 AM

I agree with DeathAndTaxes and Grau that Rule 30 is probably most interesting as a proof of work. Using it as such, and applying D&T's suggestion

R30(nonce + H(blockheader)) < target

also means you don't have to worry about arbitrary-length messages.

Have you considered what the minimum requirement is for the number of cells in width needed to apply Rule 30 to these fixed-length initial conditions?

I did a quick test now, and skipping a number of rows equal to 1.5 times the number of bits in the initial condition seems enough: http://jsfiddle.net/r3qAd/

And that makes theoretical sense since that's the point where all the bits have influenced each other and reached back to the center column.
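One way to check how many rows a bit needs before it reaches the center column is to flip a single input bit and look for the first row where the center cell changes. The sketch below is an illustration with assumed names and an assumed zero background, not the jsfiddle code. Rows are stored as Python integers, with higher bits further to the left. Since Rule 30 can be written as left XOR (center OR right), a change always spreads one cell to the right per row, while its spread to the left depends on the data and can be slower.

```python
import random


def rule30_step(x: int) -> int:
    """One Rule 30 generation on a zero background.

    Bit i of x is one cell and higher bits are further left. The row grows by
    one cell on each side, so every old cell moves up one bit position.
    """
    return x ^ ((x << 1) | (x << 2))


def first_row_reaching_center(seed: int, n_bits: int, flip_bit: int, max_rows: int):
    """First row at which flipping `flip_bit` of the seed changes the center cell."""
    a, b = seed, seed ^ (1 << flip_bit)
    center = n_bits // 2
    for row in range(1, max_rows + 1):
        a, b = rule30_step(a), rule30_step(b)
        center += 1  # the center cell moves up one bit position per row
        if ((a ^ b) >> center) & 1:
            return row
    return None  # not reached within max_rows


if __name__ == "__main__":
    n = 64
    seed = random.Random(0).getrandbits(n)
    print("leftmost bit :", first_row_reaching_center(seed, n, n - 1, 1000))
    print("rightmost bit:", first_row_reaching_center(seed, n, 0, 1000))
```

Running this for many seeds and both edges gives an empirical feel for whether 1.5 N or 2.5 N skipped rows leaves enough margin.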

Quote

Can the cell width simply be equal to the bit-length of the initial conditions, or do you need additional width to ensure irreversibility (I wouldn't think so)?

I think skipping rows 1.5 times the number of bits in the initial condition is enough. The initial condition is the number of bits in the message plus 32 bits for the concatenation of the length bits. The length bits need to be added or else there would be no difference for different length messages containing only zeros. The cellular automaton in addition must be large enough to contain the additional rows for calculating the hash value, i.e. taking the bit value of every other cell in the center column.
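The initial condition described above could be built roughly as follows. This is a sketch, not the jsfiddle code: the bit order (most significant bit first), measuring the length in bits, and placing the 32 length bits after the message are all assumptions.

```python
def initial_condition(message: bytes) -> list:
    """Message bits followed by the message length in bits as a 32-bit field."""
    msg_bits = [(byte >> (7 - k)) & 1 for byte in message for k in range(8)]
    length = len(msg_bits)
    if length >= 2 ** 32:
        raise ValueError("message too long for a 32-bit length field")
    length_bits = [(length >> (31 - k)) & 1 for k in range(32)]
    return msg_bits + length_bits


if __name__ == "__main__":
    # Two all-zero messages of different lengths now give different rows.
    print(initial_condition(b"\x00")[-32:])
    print(initial_condition(b"\x00\x00")[-32:])
```

Without the length field, both all-zero messages would produce nothing but zero cells, and Rule 30 maps an all-zero row to an all-zero row.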

Quote

Have you given any thought to what a Rule 30 PoW ASIC would look like? I suspect it might be simpler and cheaper than a SHA256 ASIC (although I understand the SHA256 ASICs are fairly simple too).

I guess a hardware implementation of the automaton would be very fast. I have also seen the term "quantum dot cellular automaton".

Peter R

legendary

Activity: 1162

Merit: 1010

I agree with DeathAndTaxes and Grau that Rule 30 is probably most interesting as a proof of work. Using it as such, and applying D&T's suggestion

R30(nonce + H(blockheader)) < target

also means you don't have to worry about arbitrary-length messages.

Have you considered what the minimum requirement is for the number of cells in width needed to apply Rule 30 to these fixed-length initial conditions? Can the cell width simply be equal to the bit-length of the initial conditions, or do you need additional width to ensure irreversibility (I wouldn't think so)? Have you given any thought to what a Rule 30 PoW ASIC would look like? I suspect it might be simpler and cheaper than a SHA256 ASIC (although I understand the SHA256 ASICs are fairly simple too).

Anders

full member

Activity: 126

Merit: 100

There is a risk that any form of reducing the length of the original message will affect the security of the rule 30 hash function (as Peter R already pointed out earlier).

Therefore I made a version that uses the entire message as initial condition: http://jsfiddle.net/8aR3W/

It's very computationally demanding for long messages. On the other hand, computing performance is progressing exponentially year by year, and many practical cryptographic applications deal with short messages.

Anders

full member

Activity: 126

Merit: 100

The initial conditions I have developed are way too amateurish for any serious application. Anyway, it's fun to learn about hash functions, and hopefully others new to this stuff can learn some basics too.

Length extension attacks are a serious threat even to SHA-256. In Bitcoin SHA-256 is used twice in a row. I learned that this technique is actually called SHA-256d:

"You should also consider using a generic length-extension defense such as the "SHA-256d" design by Ferguson and Schneier. SHA256d(x) = SHA256(SHA256(x))." -- http://crypto.stackexchange.com/questions/893/at-the-current-time-is-sha256-the-de-facto-standard-for-strong-cryptographic-ha
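The SHA256d definition in the quote above is short enough to write out with Python's standard hashlib module. The function name is just a label for this sketch.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """SHA256d(x) = SHA256(SHA256(x)), as defined in the quote above."""
    inner = hashlib.sha256(data).digest()
    return hashlib.sha256(inner).digest()


if __name__ == "__main__":
    print(sha256d(b"hello").hex())
```

The outer hash only ever sees a fixed 32-byte input, so an attacker cannot continue the inner hash from the published value.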

With rule 30, length extension defense can be done like in one of my previous posts. As an initial condition, start with the SHA-256 hash and concatenate it with the bits for the length of the message. And then run the rule 30 automaton as I have described earlier. That could be a serious use of the rule 30 hash function.

"Distinguishing H² from a random oracle (essentially an ideal hash) is much cheaper that it should, namely 2⁶⁴ for SHA-256d. This doesn't lead to any practical attacks, but it hurts security proofs relying on indistinguishably. It is easy to avoid this problem by using distinct prefixes for the inner and outer hash, so I see little reason to use H² in practice." -- http://crypto.stackexchange.com/questions/7895/weaknesses-in-sha-256d

Anders

full member

Activity: 126

Merit: 100

"The Davies–Meyer single-block-length compression function feeds each block of the message (m_i) as the key to a block cipher. It feeds the previous hash value (H_i-1) as the plaintext to be encrypted. The output ciphertext is then also XORed (\oplus) with the previous hash value (H_i-1) to produce the next hash value (H_i)." -- http://en.wikipedia.org/wiki/One-way_compression_function#Davies.E2.80.93Meyer

The trick with the Davies–Meyer compression function seems to be the xor of the previous hash value with the ciphertext.

I made a very simple version based on this: http://jsfiddle.net/2mK9R/
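The Davies–Meyer step from the quote can be sketched in Python as below. This is not the jsfiddle version. The block cipher here is a made-up toy Feistel network whose round function borrows SHA-256 from hashlib, chosen only so that the example runs with the standard library; all names and sizes are assumptions.

```python
import hashlib


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, half: bytes, i: int) -> bytes:
    """Keyed round function of the toy cipher (illustrative only)."""
    return hashlib.sha256(key + bytes([i]) + half).digest()[:len(half)]


def toy_encrypt(key: bytes, block: bytes, rounds: int = 4) -> bytes:
    """Toy Feistel cipher on an even-length block."""
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for i in range(rounds):
        left, right = right, _xor(left, _round(key, right, i))
    return left + right


def toy_decrypt(key: bytes, block: bytes, rounds: int = 4) -> bytes:
    """Inverse of toy_encrypt: with the key known, the cipher is reversible."""
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for i in reversed(range(rounds)):
        left, right = _xor(right, _round(key, left, i)), left
    return left + right


def davies_meyer_step(h_prev: bytes, m_block: bytes) -> bytes:
    """H_i = E_{m_i}(H_{i-1}) XOR H_{i-1}: the message block is the cipher key."""
    return _xor(toy_encrypt(m_block, h_prev), h_prev)


if __name__ == "__main__":
    h = bytes(16)  # all-zero starting value, an arbitrary choice
    for block in (b"first block", b"second block"):
        h = davies_meyer_step(h, block)
    print(h.hex())
```

Because the cipher alone can be run backwards by anyone who knows the message block, the XOR with the previous hash value is what stops the step from being trivially reversed.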

Anders

full member

Activity: 126

Merit: 100

Here is a version of my hash function using a kind of Merkle–Damgård construction: http://jsfiddle.net/YkPA3/

Length extension attacks are protected against by concatenating the initial condition with the bits of the length of the message. Any change in length of the message and that bit pattern will become different, making the cellular automaton expand with different cell values.

The compression function is a simple cipher with a key that depends on the previous block (of 640 bits).
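The general shape of a Merkle–Damgård construction with the message length mixed in looks roughly like this in Python. It is a sketch, not the jsfiddle code: the 640-bit block size comes from the post, but the padding layout, the 64-bit length field, the all-zero starting value and the SHA-256-based stand-in for the compression function are assumptions.

```python
import hashlib

BLOCK_BYTES = 80   # 640-bit blocks, as in the post
STATE_BYTES = 32   # size of the chaining value (assumed)


def compress(state: bytes, block: bytes) -> bytes:
    """Stand-in compression function; the post uses a simple cipher instead."""
    return hashlib.sha256(state + block).digest()


def pad(message: bytes) -> bytes:
    """Append a 1 bit, zero bytes, and the message length in bits (64-bit field)."""
    padded = message + b"\x80"
    padded += bytes(-(len(padded) + 8) % BLOCK_BYTES)
    return padded + (len(message) * 8).to_bytes(8, "big")


def md_hash(message: bytes, iv: bytes = bytes(STATE_BYTES)) -> bytes:
    """Chain the compression function over the padded message, block by block."""
    state = iv
    padded = pad(message)
    for offset in range(0, len(padded), BLOCK_BYTES):
        state = compress(state, padded[offset:offset + BLOCK_BYTES])
    return state


if __name__ == "__main__":
    print(md_hash(b"\x00").hex())
    print(md_hash(b"\x00\x00").hex())  # differs because the length differs
```

One caveat worth checking against any concrete design: SHA-256 also appends the message length to its padding and is still open to length extension, so mixing in the length separates messages of different lengths but may not be a complete defense on its own.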

Anders

full member

Activity: 126

Merit: 100

Quote from: DeathAndTaxes on July 21, 2014, 05:23:16 PM

Then the function is no more secure than SHA-2 so why not just use SHA-2 if the goal is security?

Hmm... Yes, the combination would perhaps not be more secure than SHA-2 alone except as a way of obfuscating the initial hash.

Quote

It may still be useful for a PoW by moving the nonce outside of the blockheader.

R30(nonce + H(blockheader)) < target

The security of the PoW still relies on the preimage resistance of H. However, if R30 is irreducible, then it would prevent more efficient work in the PoW. This has the advantage of making mining hardware highly commoditized, which means lower margins (anyone can do it and they work about the same), which is optimal from a security point of view.

Concatenating the Bitcoin nonce and the hash of the blockheader and using the result as the initial condition for an R30 hash would perhaps work. I don't know enough about the Bitcoin protocol to know for sure. As a proof of work to produce a value less than the target, the work required can easily be made harder by increasing the number of rows (generations) that must be calculated before the hash value is extracted.

In the standard rule 30 cellular automaton the initial condition is always only one bit set to 1. The trick I have done is to replace that initial condition with a whole sequence of bits. And the other part of the trick is to skip enough generations in the cellular automaton so that the initial bits influence each other left to right and back again. This ensures that the initial condition is distributed in a highly random way that is unique for each message, when the hash is long enough. The hash value is taken from every other cell in the center column of the cellular automaton after the initial (skipped) rows have been calculated.
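The scheme described in this post, together with the R30(nonce + H(blockheader)) < target idea, can be sketched in Python as below. This is an illustration under assumptions, not the jsfiddle code and not anything from the Bitcoin protocol: the function names, the zero background, the 32-bit nonce, the 32-bit output, the exact choice of center cell, and the use of SHA-256 for H are all made up for the example. Rows are stored as Python integers with higher bits further to the left.

```python
import hashlib


def rule30_step(x: int) -> int:
    """One Rule 30 generation on a zero background.

    The row grows by one cell per side, so old cells move up one bit position.
    """
    return x ^ ((x << 1) | (x << 2))


def r30_hash(seed: int, n_bits: int, hash_bits: int = 32,
             skip_factor: float = 2.5) -> int:
    """Skip skip_factor * n_bits rows, then read every other center cell."""
    x = seed
    center = n_bits // 2
    for _ in range(int(skip_factor * n_bits)):
        x = rule30_step(x)
        center += 1
    digest = 0
    for _ in range(hash_bits):
        for _ in range(2):  # advance two rows per output bit
            x = rule30_step(x)
            center += 1
        digest = (digest << 1) | ((x >> center) & 1)
    return digest


def mine(block_header: bytes, target: int, max_nonce: int = 1 << 20):
    """Search for a nonce with R30(nonce + H(blockheader)) < target."""
    h = int.from_bytes(hashlib.sha256(block_header).digest(), "big")
    for nonce in range(max_nonce):
        seed = (nonce << 256) | h  # 32-bit nonce followed by the 256-bit hash
        if r30_hash(seed, 32 + 256) < target:
            return nonce
    return None


if __name__ == "__main__":
    # About one nonce in sixteen should pass if the output is well distributed.
    print(mine(b"example header", 1 << 28))
```

Difficulty can be raised either by lowering the target or, as suggested above, by increasing skip_factor so that each attempt costs more generations.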

DeathAndTaxes

donator

Activity: 1218

Merit: 1080

Gerald Davis

Quote from: Anders on July 21, 2014, 03:20:06 PM

One solution to make the hash function more general, other than using the entire message as initial condition (which will become computationally very demanding for long messages), is, as I described earlier, to use another known hash function such as SHA-512 to generate the initial condition.

Then the function is no more secure than SHA-2 so why not just use SHA-2 if the goal is security?

It may still be useful for a PoW by moving the nonce outside of the blockheader.

R30(nonce + H(blockheader)) < target

The security of the PoW still relies on the preimage resistance of H. However, if R30 is irreducible, then it would prevent more efficient work in the PoW. This has the advantage of making mining hardware highly commoditized, which means lower margins (anyone can do it and they work about the same), which is optimal from a security point of view.

Quote

Also, the often used Merkle–Damgård construction has problems:

These are known weaknesses of M-D, and they are what cryptographers target when attempting to "break" the hashing function. To date nobody has shown that a preimage attack on SHA-1 or the more complex SHA-2 is possible.

The issue of length extension doesn't apply to PoW, as the header has a fixed length and ordering. Even if you could perform a preimage attack on an existing block via length extension, the resulting block would be invalid regardless of the block hash, because Bitcoin block headers must be exactly 80 bytes and the elements must appear in a specific order.

In applications where the hash will protect variable-length data, using an HMAC over the pure hashing function is preferable. HMACs don't suffer from length extension attacks, and they make collision attacks less effective. Still, this is academic at this point, as most hashing functions are still secure against preimage attacks (even the ancient MD5). A major goal of the SHA-3 competition was to bypass some of the weaknesses of the M-D construction, and as such SHA-3 is not vulnerable to length extension attacks. Still, time trumps all: SHA-2 has been vetted more than SHA-3, at least as of today. Maybe that will change in a decade or so, but SHA-3 is a little ahead of its time, as SHA-2 has held up better than NIST expected it to.
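For the variable-length case, an HMAC is available in Python's standard library. A minimal sketch, with hypothetical helper names and a placeholder key:

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Constant-time comparison of the expected tag and the received tag."""
    return hmac.compare_digest(make_tag(key, message), tag)


if __name__ == "__main__":
    key = b"placeholder key"  # illustrative only
    tag = make_tag(key, b"amount=10")
    print(verify_tag(key, b"amount=10", tag))
    print(verify_tag(key, b"amount=10&extra=1", tag))
```

An attacker who sees the message and tag but not the key cannot extend the message and produce a matching tag the way they could with a bare SHA-256(key + message).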

Anders

full member

Activity: 126

Merit: 100

My current hash function is only potentially good when the message is unknown. When the message is known it's easy to construct collisions, such as manipulating the content in a document so that it gets the same hash value. Not good.

One solution to make the hash function more general, other than using the entire message as initial condition (which will become computationally very demanding for long messages), is, as I described earlier, to use another known hash function such as SHA-512 to generate the initial condition.

It's a bit boring to have to rely on another already existing hash function. I will try to figure out a way to make the rule 30 hash function more general without being dependent on other known hash function standards.

Also, the often used Merkle–Damgård construction has problems:

"Unfortunately, this construction also has several undesirable properties:

Length extension — once an attacker has one collision, he can find more very cheaply.
Second preimage attacks against long messages are always much more efficient than brute force.
Multicollisions (many messages with the same hash) can be found with only a little more work than collisions.[5]
"Herding attacks" (first committing to an output h, then mapping messages with arbitrary starting values to h) are possible for more work than finding a collision, but much less than would be expected to do this for a random oracle.[6][7]
"Extension attacks": Given the hash H(X) of an unknown input X, it is easy to find the value of H(pad(X) || Y), where pad is the padding function of the hash. That is, it is possible to find hashes of inputs related to X even though X remains unknown.[8] A random oracle would not have this property, and this may lead to simple attacks even for natural schemes proven secure in the random oracle model.[9] Length extension attack was actually used to attack a number of commercial web message authentication schemes such as one used by Flickr.[10]

" -- http://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction#Security_characteristics

Anders

full member

Activity: 126

Merit: 100

I tested with messages of up to 10,000 characters and got an average of about 350 attempts before a collision, the same as for fixed messages of length 4.

Attempt 1: 419
Attempt 2: 263
Attempt 3: 185
Attempt 4: 372
Attempt 5: 351
Attempt 6: 434
Attempt 7: 155
Attempt 8: 619
Attempt 9: 319
Attempt 10: 336
Attempt 11: 333
Attempt 12: 442
Attempt 13: 95
Attempt 14: 639
Attempt 15: 290
Attempt 16: 284
Attempt 17: 435
Attempt 18: 514
Attempt 19: 98
Attempt 20: 352
----------------
Average: 347

Source: http://jsfiddle.net/My76F/

So on average it seems that around 350 attempts are needed instead of the theoretical 256 attempts. It also shows that my wrapping method for the initial condition seems to work. And the collision-rate looks good, unless I have messed something up in my test.
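The theoretical figure can be checked with a quick simulation. The sketch below assumes the test used a 16-bit hash value (which is what a figure of 256 = 2^8 suggests, but the post does not state it) and draws ideal random outputs instead of running the hash. For N equally likely outputs, the expected number of attempts until the first repeat is approximately sqrt(pi * N / 2), which is somewhat more than sqrt(N).

```python
import math
import random


def expected_attempts(n_outputs: int) -> float:
    """Birthday approximation for the mean number of draws until a repeat."""
    return math.sqrt(math.pi * n_outputs / 2)


def attempts_until_collision(n_outputs: int, rng: random.Random) -> int:
    """Draw uniformly random outputs until one has been seen before."""
    seen = set()
    attempts = 0
    while True:
        attempts += 1
        value = rng.randrange(n_outputs)
        if value in seen:
            return attempts
        seen.add(value)


def average_attempts(n_outputs: int, trials: int, seed: int = 0) -> float:
    """Mean number of attempts over several independent runs."""
    rng = random.Random(seed)
    total = sum(attempts_until_collision(n_outputs, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    n = 2 ** 16  # assumed 16-bit hash value
    print("approximation:", round(expected_attempts(n)))
    print("simulated    :", round(average_attempts(n, 2000)))
```

Under that assumption the expected value is about 321 rather than 256, and individual runs vary widely, so an average of 347 over 20 attempts is plausible for a well-behaved hash.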
