Author

Topic: Hash Phrase - Human Readable Hash Function (Read 3586 times)

staff
Activity: 4284
Merit: 8808
August 01, 2013, 09:53:08 AM
#14
I describe the internals on the github README.  By default, PBKDF2_HMAC_SHA256 (50k iterations, no salt, 32 bytes) is applied to the input data.  The resulting 256-bit hash is converted to an integer and then used to create the phrase.  So it's resistant against second pre-image attacks.
I'm suggesting a per-user secret nonce which would provide immunity to partial second preimages too (and, in fact, complete second preimage immunity even if the attacker isn't computationally bounded). This is applicable to some applications and should be used for them...

legendary
Activity: 1470
Merit: 1006
Bringing Legendary Har® to you since 1952
Yep.

OP:
You should try something like SSH does.
Code:
Generating public/private rsa key pair.
The key fingerprint is:
05:1e:1e:c1:ac:b9:d1:1c:6a:60:ce:0f:77:6c:78:47 you@i
The key's randomart image is:
+--[ RSA 2048]----+
|       o=.       |
|    o  o++E      |
|   + . Ooo.      |
|    + O B..      |
|     = *S.       |
|      o          |
|                 |
|                 |
|                 |
+-----------------+

Generating public/private dsa key pair.
The key fingerprint is:
b6:dd:b7:1f:bc:25:31:d3:12:f4:92:1c:0b:93:5f:4b you@i
The key's randomart image is:
+--[ DSA 1024]----+
|            o.o  |
|            .= E.|
|             .B.o|
|              .= |
|        S     = .|
|       . o .  .= |
|        . . . oo.|
|             . o+|
|              .o.|
+-----------------+
legendary
Activity: 1135
Merit: 1166
newbie
Activity: 28
Merit: 12
I think this is an interesting area to explore, basically taking steps toward solving the problem of 'humanizing' bitcoin addresses. It makes me think of the difference of an internet with ip addresses vs. domain names. Are there any more initiatives going on now aimed at giving us the 'DNS' of Bitcoin?

Regarding your proposal, the first thing that kind of popped in my head was that it was a very english language oriented system. For a foreign language speaker it may be not much better than Base58 gibberish. Like as an english speaker I would find the hungarian phrase...

Nem tudok jól magyarul

...pretty hard to read and remember. Anyway, not trying to discourage you because I really think solving this problem would be a big breakthrough to mass adoption. I'll think about it some more and maybe come back with some more constructive suggestions Smiley

legendary
Activity: 1792
Merit: 1111
i thought about this: what about using a generative grammar to encode the information. that way you could take a hash of decent size and encode it into a grammatically correct sentence. the syntax tree encodes part of the information as well as the verbs, nouns, etc.. this helps with information density but it also helps to memorize it because it is grammatically correct.
you just have to be careful that the generated sentence is always parsed back correctly.

if this works, you could have a working brain wallet with enough entropy.

It should work, but the sentence has to be very long
hero member
Activity: 668
Merit: 501
i thought about this: what about using a generative grammar to encode the information. that way you could take a hash of decent size and encode it into a grammatically correct sentence. the syntax tree encodes part of the information as well as the verbs, nouns, etc.. this helps with information density but it also helps to memorize it because it is grammatically correct.
you just have to be careful that the generated sentence is always parsed back correctly.

if this works, you could have a working brain wallet with enough entropy.
hero member
Activity: 560
Merit: 517
Quote
Are you familiar with RFC2289?
Thanks for the link, retep!  I was not aware of RFC2289, and it's great to see a standardized wordlist like that.

Quote
If you're using them just for the purpose of comparing it can be better to first apply a pseudorandom permutation so that a third party couldn't pick values which you will easily mistake.
I describe the internals on the github README.  By default, PBKDF2_HMAC_SHA256 (50k iterations, no salt, 32 bytes) is applied to the input data.  The resulting 256-bit hash is converted to an integer and then used to create the phrase.  So it's resistant against second pre-image attacks.
staff
Activity: 4284
Merit: 8808
If you're using them just for the purpose of comparing it can be better to first apply a pseudorandom permutation so that a third party couldn't pick values which you will easily mistake.
legendary
Activity: 1120
Merit: 1152
Are you familiar with RFC2289?

It already defines a well thought out, and standardized, word list representing 11-bits/word as well as a standard way to combine those words to get a 66-bit number. It's used in RFC2289 for 64-bit numbers with a 2 bit checksum.

I'd suggest sticking with a standard wordlist. Depending on your attack model 80-bits, or even 64-bits, is considered fine, giving 6-8 words; JANE ISLE TONG TINY BAY FLY HOB DOOM is 88 bits for instance. (ask yourself, is this for pre-image, or second pre-image resistance?)
hero member
Activity: 560
Merit: 517
Quote
While good in theory, and I do like your idea, I must point out that human error will get in the way of this.
Can you give a concrete example of how that kind of tampering could be used by an attacker?

Hash Phrase spits out a series of words.  It's not possible to fiddle with the letters in the words to give a different hash; you would just get an invalid hash doing that.

The applicable use cases occur when you have a known good Hash Phrase, potentially manipulated data, and a trusted way to calculate the Hash Phrase of that data.  The only thing an attacker can really do under that situation is try to brute-force similar phrases.  So if the real phrase is "Random baptism coaches seymour parishes", they might expend computations to brute-force data that hashes to "Random baptism coaches electric parishes."  But doing that would require ~1x10^21 iterations of SHA-256, which would take thousands of years on a modest farm.
member
Activity: 67
Merit: 10
In a similar spirit (originally for ssh, I believe):
http://bohwaz.net/archives/web/Bubble_Babble.html
Nonsense sounds, but easily pronounceable (which may help it stick in people's minds better).
legendary
Activity: 1176
Merit: 1280
May Bitcoin be touched by his Noodly Appendage
Like you said the problem is that you'll never get 64 bits
sr. member
Activity: 392
Merit: 250
♫ A wave came crashing like a fist to the jaw ♫
While good in theory, and I do like your idea, I must point out that human error will get in the way of this.

For instance:

Original:
"This is a human readable passphrase"

Tampered:
"This is a human readeble passphrase"
"This is a human readable paspphrase"


All very similar and if you don't really look at it, would be quite easy for a hacker or malicious user to try and mess with unsuspecting folks.
hero member
Activity: 560
Merit: 517

4ba38d48a60f1b29e9eb726eaff08b2e83d8d81e031666fee50e85900d7dc1ef
versus
Theologian examine writer walking desire


While working on my Hardware Wallet project, and while using Bitcoin in general, I often found myself manually comparing destination Bitcoin addresses.  This is a good practice, to ensure the destination address is not tampered with by malicious software.  However, addresses are base58 gibberish, and thus difficult for a human being to compare; they could be fooled by a partial collision attack.

In addition to that, I had the thought that a browser extension which displays a webpage's hash to the user would be helpful for security critical, static websites like brainwallet.  That would help prevent tampering.  But nobody is going to memorize a hexadecimal hash.

One solution is a human readable hash function; one which is easy for humans to read, memorize, and compare.  My first implementation, Hash Phrase, outputs a phrase composed of common English words.  For example, the Hash Phrase of my sig's Bitcoin address is "Random baptism coaches seymour parishes."  Easy to read, memorize, and/or compare!

I'm interested in discussion on this particular implementation, alternate solutions, and applications.  For example, this implementation suffers from entropy starvation; it can only reasonably represent ~64-bits.  Its phrases are also sometimes confusing and weird, which makes them difficult to remember.  A procedural image based hash function would likely be a better solution, but is more difficult to implement well.
Jump to: