Hash Phrase - Human Readable Hash Function

gmaxwell

staff

Activity: 4326

Merit: 8951

Quote from: fpgaminer on July 31, 2013, 05:45:35 AM

I describe the internals on the github README. By default, PBKDF2_HMAC_SHA256 (50k iterations, no salt, 32 bytes) is applied to the input data. The resulting 256-bit hash is converted to an integer and then used to create the phrase. So it's resistant against second pre-image attacks.

I'm suggesting a per-user secret nonce which would provide immunity to partial second preimages too (and, in fact, complete second preimage immunity even if the attacker isn't computationally bounded). This is applicable to some applications and should be used for them...

ShadowOfHarbringer

legendary

Activity: 1470

Merit: 1006

Bringing Legendary Har® to you since 1952

Quote from: domob on August 01, 2013, 02:43:06 AM

I like SSH's "random art": http://superuser.com/questions/22535/what-is-randomart-produced-by-ssh-keygen

Yep.

OP:
You should try something like SSH does.

Code:

Generating public/private rsa key pair.
The key fingerprint is:
05:1e:1e:c1:ac:b9:d1:1c:6a:60:ce:0f:77:6c:78:47 you@i
The key's randomart image is:
+--[ RSA 2048]----+
| o=. |
| o o++E |
| + . Ooo. |
| + O B.. |
| = *S. |
| o |
| |
| |
| |
+-----------------+

Generating public/private dsa key pair.
The key fingerprint is:
b6:dd:b7:1f:bc:25:31:d3:12:f4:92:1c:0b:93:5f:4b you@i
The key's randomart image is:
+--[ DSA 1024]----+
| o.o |
| .= E.|
| .B.o|
| .= |
| S = .|
| . o . .= |
| . . . oo.|
| . o+|
| .o.|
+-----------------+

domob

legendary

Activity: 1135

Merit: 1166

I like SSH's "random art": http://superuser.com/questions/22535/what-is-randomart-produced-by-ssh-keygen

natb

newbie

Activity: 28

Merit: 12

I think this is an interesting area to explore, basically taking steps toward solving the problem of 'humanizing' bitcoin addresses. It makes me think of the difference of an internet with ip addresses vs. domain names. Are there any more initiatives going on now aimed at giving us the 'DNS' of Bitcoin?

Regarding your proposal, the first thing that kind of popped in my head was that it was a very english language oriented system. For a foreign language speaker it may be not much better than Base58 gibberish. Like as an english speaker I would find the hungarian phrase...

Nem tudok jól magyarul

...pretty hard to read and remember. Anyway, not trying to discourage you because I really think solving this problem would be a big breakthrough to mass adoption. I'll think about it some more and maybe come back with some more constructive suggestions

jl2012

legendary

Activity: 1792

Merit: 1121

Quote from: apetersson on July 31, 2013, 09:36:52 AM

i thought about this: what about using a generative grammar to encode the information. that way you could take a hash of decent size and encode it into a grammatically correct sentence. the syntax tree encodes part of the information as well as the verbs, nouns, etc.. this helps with information density but it also helps to memorize it because it is grammatically correct.
you just have to be careful that the generated sentence is always parsed back correctly.

if this works, you could have a working brain wallet with enough entropy.

It should work, but the sentence has to be very long

apetersson

hero member

Activity: 668

Merit: 501

i thought about this: what about using a generative grammar to encode the information. that way you could take a hash of decent size and encode it into a grammatically correct sentence. the syntax tree encodes part of the information as well as the verbs, nouns, etc.. this helps with information density but it also helps to memorize it because it is grammatically correct.
you just have to be careful that the generated sentence is always parsed back correctly.

if this works, you could have a working brain wallet with enough entropy.

fpgaminer

hero member

Activity: 560

Merit: 517

Quote

Are you familiar with RFC2289?

Thanks for the link, retep! I was not aware of RFC2289, and it's great to see a standardized wordlist like that.

Quote

If you're using them just for the purpose of comparing it can be better to first apply a pseudorandom permutation so that a third party couldn't pick values which you will easily mistake.

I describe the internals on the github README. By default, PBKDF2_HMAC_SHA256 (50k iterations, no salt, 32 bytes) is applied to the input data. The resulting 256-bit hash is converted to an integer and then used to create the phrase. So it's resistant against second pre-image attacks.

gmaxwell

staff

Activity: 4326

Merit: 8951

If you're using them just for the purpose of comparing it can be better to first apply a pseudorandom permutation so that a third party couldn't pick values which you will easily mistake.

Peter Todd

legendary

Activity: 1120

Merit: 1164

Are you familiar with RFC2289?

It already defines a well thought out, and standardized, word list representing 11-bits/word as well as a standard way to combine those words to get a 66-bit number. It's used in RFC2289 for 64-bit numbers with a 2 bit checksum.

I'd suggest sticking with a standard wordlist. Depending on your attack model 80-bits, or even 64-bits, is considered fine, giving 6-8 words; JANE ISLE TONG TINY BAY FLY HOB DOOM is 88 bits for instance. (ask yourself, is this for pre-image, or second pre-image resistance?)

fpgaminer

hero member

Activity: 560

Merit: 517

Quote

While good in theory, and I do like your idea, I must point out that human error will get in the way of this.

Can you give a concrete example of how that kind of tampering could be used by an attacker?

Hash Phrase spits out a series of words. It's not possible to fiddle with the letters in the words to give a different hash; you would just get an invalid hash doing that.

The applicable use cases occur when you have a known good Hash Phrase, potentially manipulated data, and a trusted way to calculate the Hash Phrase of that data. The only thing an attacker can really do under that situation is try to brute-force similar phrases. So if the real phrase is "Random baptism coaches seymour parishes", they might expend computations to brute-force data that hashes to "Random baptism coaches electric parishes." But doing that would require ~1x10^21 iterations of SHA-256, which would take thousands of years on a modest farm.

EmperorBob

member

Activity: 67

Merit: 10

In a similar spirit (originally for ssh, I believe):
http://bohwaz.net/archives/web/Bubble_Babble.html
Nonsense sounds, but easily pronounceable (which may help it stick in people's minds better).

jackjack

legendary

Activity: 1176

Merit: 1280

May Bitcoin be touched by his Noodly Appendage

Like you said the problem is that you'll never get 64 bits

btceic

sr. member

Activity: 392

Merit: 250

♫ A wave came crashing like a fist to the jaw ♫

While good in theory, and I do like your idea, I must point out that human error will get in the way of this.

For instance:

Original:
"This is a human readable passphrase"

Tampered:
"This is a human readeble passphrase"
"This is a human readable paspphrase"

All very similar and if you don't really look at it, would be quite easy for a hacker or malicious user to try and mess with unsuspecting folks.

fpgaminer

hero member

Activity: 560

Merit: 517

Hash Phrase Github

4ba38d48a60f1b29e9eb726eaff08b2e83d8d81e031666fee50e85900d7dc1ef
versus
Theologian examine writer walking desire

While working on my Hardware Wallet project, and while using Bitcoin in general, I often found myself manually comparing destination Bitcoin addresses. This is a good practice, to ensure the destination address is not tampered with by malicious software. However, addresses are base58 gibberish, and thus difficult for a human being to compare; they could be fooled by a partial collision attack.

In addition to that, I had the thought that a browser extension which displays a webpage's hash to the user would be helpful for security critical, static websites like brainwallet. That would help prevent tampering. But nobody is going to memorize a hexadecimal hash.

One solution is a human readable hash function; one which is easy for humans to read, memorize, and compare. My first implementation, Hash Phrase, outputs a phrase composed of common English words. For example, the Hash Phrase of my sig's Bitcoin address is "Random baptism coaches seymour parishes." Easy to read, memorize, and/or compare!

I'm interested in discussion on this particular implementation, alternate solutions, and applications. For example, this implementation suffers from entropy starvation; it can only reasonably represent ~64-bits. Its phrases are also sometimes confusing and weird, which makes them difficult to remember. A procedural image based hash function would likely be a better solution, but is more difficult to implement well.

Topic: Hash Phrase - Human Readable Hash Function (Read 3588 times)