Rule 30 automaton as hash function - page 8.

Anders

full member

Activity: 126

Merit: 100

Ouch. Professional cryptographers/cryptanalysts, the below article seems to say, usually only have time and interest to look at new algorithms if they have been published in the mainstream professional literature or developed by known cryptographers. Angry

It's understandable since it probably takes a lot of time and effort to look at even simple cryptographic algorithms. Stephen Wolfram's quote I posted earlier can perhaps help a bit, since even though he's not a cryptographer he is a top expert in mathematics. (A cipher is different than a hash function but I think the article implicitly describes cryptography in general.)

"Memo to the Amateur Cipher Designer

Congratulations. You've just invented this great new cipher, and you want to do something with it. You're new in the field; no one's heard of you, and you don't have any credentials as a cryptanalyst. You want to get well-known cryptographers to look at your work. What can you do?

Unfortunately, you have a tough road ahead of you. I see about two new cipher designs from amateur cryptographers every week. The odds of any of these ciphers being secure are slim. The odds of any of them being both secure and efficient are negligible. The odds of any of them being worth actual money are virtually non-existent.

Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break. It's not even hard. What is hard is creating an algorithm that no one else can break, even after years of analysis. And the only way to prove that is to subject the algorithm to years of analysis by the best cryptographers around.

"The best cryptographers around" break a lot of ciphers. The academic literature is littered with the carcasses of ciphers broken by their analyses. But they're a busy bunch; they don't have time to break everything. How do they decide what to look at?

Ideally, cryptographers should only look at ciphers that have a reasonable chance of being secure. And since anyone can create a cipher that he believes to be secure, this means that cryptographers should only look at ciphers created by people whose opinions are worth something. No one is impressed if a random person creates an cipher he can't break; but if one of the world's best cryptographers creates an cipher he can't break, now that's worth looking at." -- https://www.schneier.com/crypto-gram-9810.html#cipherdesign

Anders

full member

Activity: 126

Merit: 100

Quote from: Peter R on July 17, 2014, 02:18:53 PM

... I think Rule 30 would require a lot more memory and that hashing times would be proportional to the square of the message length (instead of linear with the message length with standard hash functions). ...

Yes, if the full message was used as initial condition the algorithm would be O(n²) and require several times the message length storage (in a 1-dimentional array). What I do is to crunch down the message to an initial condition of max 640 bits (or whatever fixed value is chosen). This makes my hash function O(n) and the storage requirement fixed to a small value.

Anders

full member

Activity: 126

Merit: 100

Improved version ~~without~~ with random generator: http://jsfiddle.net/ALV5X/

Very fast for huge messages.

Anders

full member

Activity: 126

Merit: 100

Here is new version with a random generator added to the initial condition: http://jsfiddle.net/Nem6N/

The rationale for that is that messages with repeated patterns can cause collisions when those patterns match the modulo wrap. Now different lengths of messages will produce different initial conditions regardless of how similar the messages are and regardless of what repeated patterns they contain.

Anders

full member

Activity: 126

Merit: 100

Same hash function, just corrected array size: http://jsfiddle.net/n9Ssu/

Anders

full member

Activity: 126

Merit: 100

The reduction of the message into 640 bits I make for the initial condition of the cellular automaton, is itself a hash function. And if that hash sucks in terms of collisions, then those collisions will remain even after the Rule 30 scrambling. The simple xor wrapping could be good enough though. If for example a hash is calculated for a 1 megabyte file, then the initial condition will contain 640 bits of complete mess (in a good sense) because of the xor of all the bytes in the file. And even for a very short message the Rule 30 scrambling will turn the hash into a highly random sequence of 256 bits. So the 256-bit hash in my JavaScript example should possibly work well for all types of messages of all lengths.

Anders

full member

Activity: 126

Merit: 100

Quote from: Peter R on July 17, 2014, 02:18:53 PM

If you wanted to hash a 10 kbyte message with Rule 30, I think your initial conditions would have to be an array of 80,000 squares (encoding the bits in the message). Then I think you'd need to step through at least 80,000 iterations to ensure that the bits on the far left have had a chance to influence the resultant bits on the far right (and vice versa). Your "hash value" would be the N central-most bits.

I forgot to comment on that. Yes, the bits in the original message must be blended like that. That's why I start with only 80 characters (640 bits) as initial condition in the automaton. And it seems that more than 640 iterations are needed initially. I think it has to do with that the cells need to be influenced left to right and then back right to left again.

Edit: But the initial "blending" can be calculated in a 1-dimensional array. Because the bits for the hash are taken from continuing expanding the cellular automaton in one dimension.

Anders

full member

Activity: 126

Merit: 100

Quote from: Peter R on July 17, 2014, 02:18:53 PM

I expect Rule 30 would be cryptographically as strong (or stronger) than SHA2. (Side note: Rule 30 is used by the Mathematica function Random[])

Off the top of my head (and I could be very wrong here), I think Rule 30 would require a lot more memory and that hashing times would be proportional to the square of the message length (instead of linear with the message length with standard hash functions). If you wanted to hash a 10 kbyte message with Rule 30, I think your initial conditions would have to be an array of 80,000 squares (encoding the bits in the message). Then I think you'd need to step through at least 80,000 iterations to ensure that the bits on the far left have had a chance to influence the resultant bits on the far right (and vice versa). Your "hash value" would be the N central-most bits.

The 1-dimensional automata only grow 2 cells per generation (iteration). And if every other row is skipped (for stronger cryptography) then it means that the automaton grows with 4 cells per generation.

In my hash function I first reduce the message to n characters (80 in my JavaScript example).

Anders

full member

Activity: 126

Merit: 100

Here is an improved version (live demo): http://jsfiddle.net/L2eNR/

In the first version a message of for example 160 characters 'a' would generate the same hash as 160 characters 'b' (actually zero). Not good. I simply wrapped the message modulo 80 with xor. In the new version there is an additional xor that is dependent on the characters in the message. And more rows are skipped initially in the new version to allow the bits to blend from left to right of the automaton.

Peter R

legendary

Activity: 1162

Merit: 1010

I expect Rule 30 would be cryptographically as strong (or stronger) than SHA2. (Side note: Rule 30 is used by the Mathematica function Random[])

Off the top of my head (and I could be very wrong here), I think Rule 30 would require a lot more memory and that hashing times would be proportional to the square of the message length (instead of linear with the message length with standard hash functions). If you wanted to hash a 10 kbyte message with Rule 30, I think your initial conditions would have to be an array of 80,000 squares (encoding the bits in the message). Then I think you'd need to step through at least 80,000 iterations to ensure that the bits on the far left have had a chance to influence the resultant bits on the far right (and vice versa). Your "hash value" would be the N central-most bits.

Anders

full member

Activity: 126

Merit: 100

Quote from: dentldir on July 17, 2014, 11:37:46 AM

A relevant cryptanalysis can be found in:

Analysis of Pseudo Random Sequences Generated by Cellular Automata
Willi Meier, Othmar Stdelbach

It doesn't cover every other center bit but does address the CA as a whole.

It seems that using cellular automata for hash functions is a viable option according to that paper: http://download.springer.com/static/pdf/386/chp%253A10.1007%252F3-540-46416-6_17.pdf?auth66=1405789057_31fc1522b59f8110ad57d060b99b9ccc&ext=.pdf I wonder how well my Rule 30 hash function performs. I have no clue how to test it. It would be good to have a general test that can be fed a bulk of sample hashes. I found this: http://hashtest.sourceforge.net but I can't determine whether the test is useful or not. I guess I need to learn more about cryptography.

dentldir

sr. member

Activity: 333

Merit: 250

A relevant cryptanalysis can be found in:

Analysis of Pseudo Random Sequences Generated by Cellular Automata
Willi Meier, Othmar Stdelbach

It doesn't cover every other center bit but does address the CA as a whole.

Anders

full member

Activity: 126

Merit: 100

Here is a JavaScript implementation of my Rule 30 hash function: http://jsfiddle.net/d2g2R/

Anders

full member

Activity: 126

Merit: 100

Here is the reason for why every other cell in the center column should be used instead of each consecutive cell/row:

"So can such rules be used for cryptography? I strongly suspect that they can, and that in fact they allow one to construct systems that are at least as secure to cryptanalysis as any that are known. ... The picture below shows evidence for this. The cells marked by dots have colors that are taken as given, and then the colors of other cells are filled in according to the average that is obtained by starting from all possible initial conditions." -- Stephen Wolfram, A New Kind of Science, pp. 603, 605.

Even fewer cells could be used since the image text says it would make the cryptanalysis even harder.

Anders

full member

Activity: 126

Merit: 100

"Rule 30 is a one-dimensional binary cellular automaton rule introduced by Stephen Wolfram in 1983.[2] Using Wolfram's classification scheme, Rule 30 is a Class III rule, displaying aperiodic, chaotic behaviour." -- http://en.wikipedia.org/wiki/Rule_30

I read somewhere that when taking every other center bit of the Rule 30 cellular automaton, it's very difficult to determine from what initial condition the resulting bit string was generated. That can be used as a cryptographic hash function.

Indeed I found that this has already been done, such as:

A New Cryptographic Hash Function Based on Cellular Automata Rules 30, 134 and Omega-Flip Network -- http://www.ipcsit.com/vol27/32-ICICN2012-N10006.pdf

A new cryptographic hash function based on the Cellular Automaton Rule 30 -- http://www.zimuel.it/talks/Nks2006.pdf

I haven't read much about it yet but my idea is to start with the bits of the message as the initial condition for the Rule 30 cellular automaton. And then run the automaton a fixed number of steps so that the cells in the automaton have highly random values. Then after that every second bit is taken from the middle column of the automaton with 2n iterations, where n is the number of bits of the hash (digest) value.

Topic: Rule 30 automaton as hash function - page 8. (Read 10686 times)