Measuring the randomness of a seed phrase - page 3.

o_e_l_e_o

legendary

Activity: 2268

Merit: 18775

Quote from: ranochigo on July 18, 2023, 09:28:18 PM

Now, back to the topic. Sure, you can reject a result where you have 12 consecutive '0's in your key, but that is extraordinarily rare and it would probably never be executed in any code that you write.

I would just point out that this isn't an accurate statement. Any particular 12 bits are all 0s with a chance of 1 in 4,096, which is definitely not "extraordinarily rare" to start with. But if you take a 24-word seed phrase with 256 bits of entropy, there is actually around a 1 in 34 chance that you get 12 consecutive 0s somewhere in those 256 bits. And of course you can roughly double that chance if you consider 12 consecutive 1s as well.

So roughly one in every seventeen completely random 24-word seed phrases you generate will contain a string of 12 consecutive 0s or 1s. This is why it is difficult to assess randomness the way OP is proposing. Strings which look random may not be random at all, and strings which look predictable can indeed be entirely random.
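
These figures can be computed exactly rather than estimated. Here is a minimal sketch in Python, assuming every bit is independent and uniformly random. The function names are hypothetical helpers, not from any library. It steps through the bits one at a time and tracks the probability that no run of the target length has appeared yet.

```python
from fractions import Fraction


def prob_run_of_zeros(n_bits: int, run_len: int) -> Fraction:
    """Exact probability that n_bits fair random bits contain at least
    one run of run_len consecutive 0s."""
    half = Fraction(1, 2)
    # state[j]: probability that no full run has appeared so far AND the
    # bits seen so far end in exactly j consecutive 0s (0 <= j < run_len).
    state = [Fraction(0)] * run_len
    state[0] = Fraction(1)
    for _ in range(n_bits):
        nxt = [Fraction(0)] * run_len
        nxt[0] = sum(state) * half          # next bit is 1: run resets
        for j in range(run_len - 1):
            nxt[j + 1] = state[j] * half    # next bit is 0: run grows
        # A 0 arriving in state[run_len - 1] completes a run; that
        # probability mass is dropped from the "no run yet" states.
        state = nxt
    return 1 - sum(state)


def prob_run_of_either(n_bits: int, run_len: int) -> Fraction:
    """Exact probability of a run of run_len identical bits (0s or 1s).

    A run of k identical bits is a run of k - 1 zeros in the sequence of
    XORs of adjacent bits, which is itself n_bits - 1 fair random bits.
    """
    return prob_run_of_zeros(n_bits - 1, run_len - 1)


if __name__ == "__main__":
    zeros = prob_run_of_zeros(256, 12)
    either = prob_run_of_either(256, 12)
    print(f"12 zeros in 256 bits:       about 1 in {float(1 / zeros):.1f}")
    print(f"12 zeros or ones in 256:    about 1 in {float(1 / either):.1f}")
```

Run as a script, this should print values close to the 1 in 34 and 1 in 17 quoted above. The second figure is slightly less than double the first, because a few keys contain both kinds of run.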

ranochigo

legendary

Activity: 3038

Merit: 4418

Quote from: jaydee3839 on July 18, 2023, 12:50:11 PM

If you can measure that some are "not random", there ought to be (I would think) some algorithm that captures such combinations and gives them a quantifiable score, which you can then expand towards "less random" combinations. Perhaps you hit a limit at some point, but it seems to me that there should be a mathematical model to represent "badness".

The issue arises when you associate randomness with a uniform distribution. Contrary to popular belief, the two are not synonymous. A Cryptographically Secure Pseudo-Random Number Generator (CSPRNG) has to pass the next-bit test: given the bits output so far, you cannot predict the next bit with a probability meaningfully better than 50%. That requirement is fulfilled by your OS's CSPRNG and thus it qualifies as being sufficiently random.

Now, back to the topic. Sure, you can reject a result where you have 12 consecutive '0's in your key, but that is extraordinarily rare and it would probably never be executed in any code that you write. Hence, there is no good reason for anyone to include test cases which test for this. Going by that, the definition of having entropy would then be that each character has an equal probability of appearing in each position (i.e. non-biased). A counter-example is this:

Code:

52431
43521
24312

Compare these, which were generated with a CSPRNG:

Code:

52440
24595
35269

The former has low entropy even though each character appears exactly once, which by normal standards means each character has an equal probability of occurring. Yet it is predictable. The second is generated with a CSPRNG, which is random even though repeated characters are present. That is unpredictable. Given a large enough set, think infinity, the values would tend towards a uniform distribution. The mathematical model doesn't exist: there is no telling how random something is, because it is designed to be unpredictable. Analysis of any results can only be done on something that can be measured, and is thereby predictable.
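
To illustrate, here is a minimal sketch of the kind of naive score people tend to reach for: Shannon entropy computed purely from character frequencies. `frequency_entropy` is a hypothetical helper written for this example. It only sees how often each character occurs in one string, so it knows nothing about the process that produced the string.

```python
import math
from collections import Counter


def frequency_entropy(s: str) -> float:
    """Shannon entropy in bits per character, computed only from how
    often each character occurs in s (order is ignored)."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


if __name__ == "__main__":
    # First sample: a line from the "each character once" set.
    # Second sample: a line from the CSPRNG set.
    for sample in ("52431", "52440"):
        print(sample, round(frequency_entropy(sample), 3))
```

The constrained string "52431" scores higher than the CSPRNG output "52440", simply because it has no repeated characters. A score like this rewards looking evenly spread, which is not the same thing as being unpredictable.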

There is no need to implement any algorithms to test for this. Your wallet client probably draws on /dev/random, which is a CSPRNG within your OS. /dev/random continually collects entropy from the environment and blocks if it has not collected sufficient entropy. In addition, your wallet also seeds using entropy collected from other sources. Hence, trying to evaluate entropy is unnecessary and provides a false sense of security.
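
For reference, asking the OS CSPRNG for key material takes very little code. This is a minimal sketch using Python's standard `secrets` module, which reads from the operating system's random source. It is not a description of what any particular wallet does, and `generate_entropy` and `ENTROPY_BYTES` are names made up for the example.

```python
import secrets

# 32 bytes = 256 bits, the entropy size behind a 24-word seed phrase.
ENTROPY_BYTES = 32


def generate_entropy(n_bytes: int = ENTROPY_BYTES) -> bytes:
    """Return n_bytes taken from the operating system's CSPRNG."""
    return secrets.token_bytes(n_bytes)


if __name__ == "__main__":
    # Prints 64 hex characters; the value differs on every run.
    print(generate_entropy().hex())
```

Note that there is no filtering step that rejects "bad looking" outputs. Every 256-bit value is equally likely, including ones with long runs of 0s.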

jaydee3839

newbie

Activity: 14

Merit: 34

Thank you for the response.

It seems to be true that you can definitively say that a seed phrase is "bad"/not random (such as 12 repeating words, or sequential forwards or backwards). If you can measure that some are "not random", there ought to be (I would think) some algorithm that captures such combinations and gives them a quantifiable score, which you can then expand towards "less random" combinations. Perhaps you hit a limit at some point, but it seems to me that there should be a mathematical model to represent "badness".

ranochigo

legendary

Activity: 3038

Merit: 4418

No. If there were a method to accurately determine the randomness of strings or cryptographic keys, we wouldn't have so many issues with CSPRNGs. We would be able to just test the entropy using an algorithm. The issue is that there is no way of testing whether a key is truly random: variance could skew your results so that some characters appear more often than others, for example. Even if you introduce a huge sample size, there is no telling whether a bias is inherent or just a coincidence of variance. There are instances where the lack of a CSPRNG is evident, most notably with Bitcoin signatures, but those are attacked in unique ways and are not detected using a fixed algorithm.
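
To put a number on how much variance alone can skew a count, here is a minimal sketch that computes how often a perfectly fair 256-bit key has a lopsided number of 1s. `prob_ones_deviate` is a hypothetical helper, and the threshold of 16 (two standard deviations for 256 fair bits) is just an illustrative choice.

```python
import math


def prob_ones_deviate(n_bits: int, min_deviation: int) -> float:
    """Probability that the number of 1s in n_bits fair random bits
    differs from n_bits / 2 by at least min_deviation."""
    half = n_bits / 2
    # Count the bit strings whose number of 1s is far enough from half.
    favourable = sum(
        math.comb(n_bits, k)
        for k in range(n_bits + 1)
        if abs(k - half) >= min_deviation
    )
    return favourable / 2 ** n_bits


if __name__ == "__main__":
    # Chance that a fair 256-bit key has <= 112 or >= 144 ones.
    print(f"{prob_ones_deviate(256, 16):.4f}")
```

The result is on the order of 5%, so a test that flags such keys as "biased" would reject a noticeable fraction of perfectly good ones, while a generator with a subtle flaw could easily pass.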

If you are using a reputable wallet, one of the key things that is heavily scrutinized is the CSPRNG mechanism used during seed generation. So you're probably in safe hands.

As to how bad humans are at generating entropy: http://www.loper-os.org/bad-at-entropy/manmach.html.

jaydee3839

newbie

Activity: 14

Merit: 34

I was wondering if there are any measurement techniques (software tools) that can quantify the randomness of a seed phrase. I've read numerous times that humans picking their own seed phrase is not advisable, because it would not have the level of randomness that a (quality) computer-generated seed phrase would have. Therefore, there must be some test or method of measuring this. I'm picturing something like a 0-100 scale, where the first word repeated 12 consecutive times would be 0 or extraordinarily close to 0, and the best entropy sources designed for seed phrase generation would be something close to 100, but there may be other ways to measure.

Is there anything like this? I would think there would be, but I haven't come across it, nor have I heard anyone advertise a way to "test the randomness of your phrase", though I get the skepticism, since entering the phrase into such a system introduces a risk (you'd only want to do it on a trusted, air-gapped device).

If nothing else, I'm curious as to "how bad" a human is at generating seed phrases randomly, versus a computer.

Topic: Measuring the randomness of a seed phrase - page 3. (Read 689 times)