Author

Topic: 🖤 (Read 182 times)

full member
Activity: 206
Merit: 450
September 29, 2023, 05:57:12 PM
#8
Quote
So put simply, the only way to see if 2 identical addresses exist in any bit range with different private keys is to have all the 2^160 addresses in a file to compare for collisions when we brute force any range.  There might be other ways but to be 100% sure that's the only way, even to find one collision?


Sorry if I ask stupid questions, my head is not clear past few days.  As always thank you all so much for the time you spent to answer.

Let's first try to see if there is an "impossible" address, that is - no public key maps to that address.

1. When randomly mapping a n-bit number x to n-bit number y, the probability of having k distinct x-es mapping to y is e-1/k!
2. Starting with a private key (256 bit) we map it to public key, then sha256, and finally ripemd160. This is 256 bit to 160 bit random mapping.
3. Fixing the 96 MSB of the private key we get 160 bit to 160 bit mapping. The probability of not reaching y is e-1.
4. We repeat this for every possible 96 MSB. The probability of not reaching y every time is (e-1)296 = e-296 < 2-114302077158074026402637675936.

Well, we could safely say there's no such address.

Let's try to see if there is an address to which points only single public key.
The probability of having zero mappings is the same as having one mapping:
e-1(e-1)296-1 = e-296
We get the same number as above.

There is no address with a single public key.
We expect the number of public keys pointing to that address to be quite close to 296.

---

Now let the private key lays in certain 160-bit interval (or is in a random set of 2160 distinct private keys).
The probability of not reaching an address y is e-1 ≈ 36.8%
The probability of reaching an address only once is the same.
The probability of collision is 1-e-1-e-1 ≈ 26.4%

---

Hopefully this brings some clarity.

For info on the math leading to these results one could look at the article "Random Mapping Statistics", or the book "Analytic Combinatorics".

copper member
Activity: 821
Merit: 1992
September 29, 2023, 01:35:29 PM
#7
Quote
There might be other ways but to be 100% sure that's the only way, even to find one collision?
As long as some hash function is not fully broken, brute force is the only way. However, if you can break some hash function, then you can mount an attack, and potentially reach that answer faster than by brute force. But to do that, you need to understand it on a level, where you can put some hash as your input, and reach the exact number of possible messages, that could produce it, as your output. And it could be possible, if you could dig into internal operations of hash functions, and construct a proof, that a combination of some of them, will allow reaching exactly M inputs from N outputs.

Even if something like that is possible, I guess it is harder than finding a single collision. However, there are some mathematical operations, where you can count something, without checking every possible case. One of those are the number of points on elliptic curve. By having p-value, and the curve equation, it is possible to calculate the exact n-value, without visiting N points, one-by-one. So, there may exist some algorithms, to do some similar things on top of hash functions, but I guess they are not simple, and I guess even if they exist, nobody revealed them yet.

But, even if you could reach something like that, I guess it would be quite useless. Because currently, you can estimate it as 2^96=79228162514264337593543950336. And imagine you could have some algorithm, that could tell you, that for 18oaCk6aLBNTh6f9h6Jh9hGeSQm6S6MRC9, it would be for example exactly 23136222216537552491922889594 matching keys, but for 1EHNa6Q4Jz2uvNExL497mE43ikXhwF6kZm, it would be exactly 164065158840186683045169679 keys. Let's assume you could get that. What then? I guess it could be just some useless information, if that would work in a similar way as counting points on elliptic curves, where generating n-value won't help you in breaking it.

So yes, in theory, there could exist some way, where you could get some results, similar to those ones:
Code:
+------------------------------------+-------------------------------+
| address                            | number of keys                |
+------------------------------------+-------------------------------+
| 1EHNa6Q4Jz2uvNExL497mE43ikXhwF6kZm |   164065158840186683045169679 |
| 1LagHJk2FyCV2VzrNHVqg3gYG4TSYwDV4m | 30532549733365734454036793384 |
| 1NZUP3JAc9JkmbvmoTv7nVgZGtyJjirKV1 |  7267057953132352762409993259 |
| 1MnyqgrXCmcWJHBYEsAWf7oMyqJAS81eC  | 13706592529034439826148909667 |
| 1E1NUNmYw1G5c3FKNPd435QmDvuNG3auYk | 22600489248816315494910839255 |
| 1UCZSVufT1PNimutbPdJUiEyCYSiZAD6n  | 22781771574697636554236098866 |
| 1BYbgHpSKQCtMrQfwN6b6n5S718EJkEJ41 | 53747544880598209823088174875 |
| 1JMcEcKXQ7xA7JLAMPsBmHz68bzugYtdrv | 70146720541911932714723581592 |
| 1CijKR7rDvJJBJfSPyUYrWC8kAsQLy2B2e |  6271389058184764912135711583 |
| 1GDWJm5dPj6JTxF68WEVhicAS4gS3pvjo7 | 62578228669668857908823271124 |
| 1NokMRjkCGBmy8F3JRdX5XHyXqY8Yxvd4i |  7241446013791069824890434332 |
| 1LWsLyY2j2mPtYcG9yG2bDFwTWryiJL6sp | 12168047268834663722106727065 |
| 1Q2ndXEiSc1yevvPzEfQ4wuwKEhyyE1Nsj | 16405252121437584980825029520 |
| 1KgVr2GExouGLAeAt79KwxykCck4k9Cexk | 20174602953627792613246289576 |
| 13xG3mC4TXUn8cRgqJG5eG9TDw4fFcztfE |  2475362220379810925834770649 |
| 18XrReT5ChW8qgXecNgKTU5T6MrMMLnV8H | 20320337312409219138630205687 |
+------------------------------------+-------------------------------+
But then, I have a question: what you can get out of it? Because for me, it looks like something useless, even if you could get that precisely, without checking every single key, and without breaking any hash function. What is the point of having some results, similar to those in this table, even if you could?
hero member
Activity: 862
Merit: 662
September 29, 2023, 12:40:39 PM
#6
You only can get some approximation of the expected number of address, but is only that an approximation. Since the address is generated from a double hash system sha256 and rmd160 the approximation can or can't match because the out of those hash look like random numbers.

I will publish the way to calculate it ASAP
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
September 29, 2023, 10:20:38 AM
#5
Quote
You mean we can't find an estimation of how many probable identical addresses exist in each bit range? So it would be wrong to divide N by 2^96 and then divide any bit range by that result?

You could divide the range, but then it will just be a number and there is no way to know which addresses are on either side of it. The problem is very similar to public keys, where you can't "split" a list of public keys with a line based on how big their private key might be. Only in this case, it's even worse, because there's no other information you can associate with the address hash besides itself.
copper member
Activity: 821
Merit: 1992
September 29, 2023, 08:40:05 AM
#4
Maybe you should start with some simple example on some smaller numbers to better see the whole topic. First, start with the procedure of hashing a public key: https://bitcointalksearch.org/topic/m.62916017 Then, imagine that you have only keys from 1 to 256. And that all of your addresses can contain only a single "one", and one character after that. In this case, you will have 256 valid private keys, and also 256 valid public keys, but only 58 valid addresses (a single one, and one character with base58, and you can optionally include any checksum into that, if you really want).

And then, you are trying to count collisions, and find out, which keys are used to generate which address. So, let's see, what will happen, if you compute it for the first 16 keys and addresses:
Code:
+---------+-------------+------------+
| address | private key | public key |
+---------+-------------+------------+
| 13...   |       ...0f | 04 D792... |
+---------+-------------+------------+
| 18...   |       ...10 | 04 E60F... |
+---------+-------------+------------+
| 1B...   |       ...07 | 04 5CBD... |
+---------+-------------+------------+
| 1C...   |       ...09 | 04 ACD4... |
+---------+-------------+------------+
| 1E...   |       ...01 | 04 79BE... |
| 1E...   |       ...05 | 04 2F8B... |
+---------+-------------+------------+
| 1G...   |       ...0a | 04 A043... |
+---------+-------------+------------+
| 1J...   |       ...08 | 04 2F01... |
+---------+-------------+------------+
| 1K...   |       ...0e | 04 499F... |
+---------+-------------+------------+
| 1L...   |       ...02 | 04 C604... |
| 1L...   |       ...0c | 04 D011... |
+---------+-------------+------------+
| 1M...   |       ...04 | 04 E493... |
+---------+-------------+------------+
| 1N...   |       ...03 | 04 F930... |
| 1N...   |       ...0b | 04 774A... |
+---------+-------------+------------+
| 1Q...   |       ...0d | 04 F287... |
+---------+-------------+------------+
| 1U...   |       ...06 | 04 FFF9... |
+---------+-------------+------------+

01   04 79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798 483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8   1EHNa6Q4Jz2uvNExL497mE43ikXhwF6kZm
02   04 C6047F9441ED7D6D3045406E95C07CD85C778E4B8CEF3CA7ABAC09B95C709EE5 1AE168FEA63DC339A3C58419466CEAEEF7F632653266D0E1236431A950CFE52A   1LagHJk2FyCV2VzrNHVqg3gYG4TSYwDV4m
03   04 F9308A019258C31049344F85F89D5229B531C845836F99B08601F113BCE036F9 388F7B0F632DE8140FE337E62A37F3566500A99934C2231B6CB9FD7584B8E672   1NZUP3JAc9JkmbvmoTv7nVgZGtyJjirKV1
04   04 E493DBF1C10D80F3581E4904930B1404CC6C13900EE0758474FA94ABE8C4CD13 51ED993EA0D455B75642E2098EA51448D967AE33BFBDFE40CFE97BDC47739922   1MnyqgrXCmcWJHBYEsAWf7oMyqJAS81eC
05   04 2F8BDE4D1A07209355B4A7250A5C5128E88B84BDDC619AB7CBA8D569B240EFE4 D8AC222636E5E3D6D4DBA9DDA6C9C426F788271BAB0D6840DCA87D3AA6AC62D6   1E1NUNmYw1G5c3FKNPd435QmDvuNG3auYk
06   04 FFF97BD5755EEEA420453A14355235D382F6472F8568A18B2F057A1460297556 AE12777AACFBB620F3BE96017F45C560DE80F0F6518FE4A03C870C36B075F297   1UCZSVufT1PNimutbPdJUiEyCYSiZAD6n
07   04 5CBDF0646E5DB4EAA398F365F2EA7A0E3D419B7E0330E39CE92BDDEDCAC4F9BC 6AEBCA40BA255960A3178D6D861A54DBA813D0B813FDE7B5A5082628087264DA   1BYbgHpSKQCtMrQfwN6b6n5S718EJkEJ41
08   04 2F01E5E15CCA351DAFF3843FB70F3C2F0A1BDD05E5AF888A67784EF3E10A2A01 5C4DA8A741539949293D082A132D13B4C2E213D6BA5B7617B5DA2CB76CBDE904   1JMcEcKXQ7xA7JLAMPsBmHz68bzugYtdrv
09   04 ACD484E2F0C7F65309AD178A9F559ABDE09796974C57E714C35F110DFC27CCBE CC338921B0A7D9FD64380971763B61E9ADD888A4375F8E0F05CC262AC64F9C37   1CijKR7rDvJJBJfSPyUYrWC8kAsQLy2B2e
0a   04 A0434D9E47F3C86235477C7B1AE6AE5D3442D49B1943C2B752A68E2A47E247C7 893ABA425419BC27A3B6C7E693A24C696F794C2ED877A1593CBEE53B037368D7   1GDWJm5dPj6JTxF68WEVhicAS4gS3pvjo7
0b   04 774AE7F858A9411E5EF4246B70C65AAC5649980BE5C17891BBEC17895DA008CB D984A032EB6B5E190243DD56D7B7B365372DB1E2DFF9D6A8301D74C9C953C61B   1NokMRjkCGBmy8F3JRdX5XHyXqY8Yxvd4i
0c   04 D01115D548E7561B15C38F004D734633687CF4419620095BC5B0F47070AFE85A A9F34FFDC815E0D7A8B64537E17BD81579238C5DD9A86D526B051B13F4062327   1LWsLyY2j2mPtYcG9yG2bDFwTWryiJL6sp
0d   04 F28773C2D975288BC7D1D205C3748651B075FBC6610E58CDDEEDDF8F19405AA8 0AB0902E8D880A89758212EB65CDAF473A1A06DA521FA91F29B5CB52DB03ED81   1Q2ndXEiSc1yevvPzEfQ4wuwKEhyyE1Nsj
0e   04 499FDF9E895E719CFD64E67F07D38E3226AA7B63678949E6E49B241A60E823E4 CAC2F6C4B54E855190F044E4A7B3D464464279C27A3F95BCC65F40D403A13F5B   1KgVr2GExouGLAeAt79KwxykCck4k9Cexk
0f   04 D7924D4F7D43EA965A465AE3095FF41131E5946F3C85F79E44ADBCF8E27E080E 581E2872A86C72A683842EC228CC6DEFEA40AF2BD896D3A5C504DC9FF6A26B58   13xG3mC4TXUn8cRgqJG5eG9TDw4fFcztfE
10   04 E60FCE93B59E9EC53011AABC21C23E97B2A31369B87A5AE9C44EE89E2A6DEC0A F7E3507399E595929DB99F34F57937101296891E44D23F0BE1F32CCE69616821   18XrReT5ChW8qgXecNgKTU5T6MrMMLnV8H
And then, you can add compressed keys into all of that, to get the whole picture. See? As vjudeu said, there are collisions, but nobody knows any simple mathematical relation, that could tell you that "in the first 16 keys, you will have two, that will start with 1E...". Some addresses may be unique, but it is unlikely if the space is sufficiently huge, and the hash function is good enough. Some may be missing, but then, that would mean something is really wrong with the hash function, which was used to generate them. On average, you will get 256/58=4+(24/58), which means, you can expect four "1E..." addresses in all keys from 1 to 256 range. But you can get three or five in practice, and if you generate the full table for some simple cases like that, you will see, how it works.
copper member
Activity: 909
Merit: 2301
September 29, 2023, 05:22:34 AM
#3
Quote
Is there any formula to calculate how many identical base58 addresses exist in each bit range?
No, you can only estimate that, but you will not get the exact numbers, unless you want to do that in a very small range, where you can just check all of them.

Quote
Since each bit is twice the size of it's previous bit, how should I accurately calculate how many of 2^96 collisions could exist in a certain bit range?
Well, if you have 2^256 on one side, and 2^160 on another, you can get 2^(256-160), which is equal to 2^96. It is just size of all public keys, divided by size of all addresses. Nothing more, nothing less, but for such huge numbers, it is only an estimation. It could be 2^95 or 2^97 as well for some specific values, nobody knows the exact number, because nobody counted all of them one-by-one, and nobody found an algorithm, to quickly and accurately count them, without breaking math problems behind them.

Also note that for Script-based hashes, this number is probably much bigger, because then you can have a script, that for example pushes some 520-byte value on the stack.

Quote
You mean we can't find an estimation of how many probable identical addresses exist in each bit range?
Estimation is always possible, but it could be wrong, because you could estimate it as 2^96, while it could be 2^97 or more, if you count compressed and uncompressed keys separately, or if you take other things into account.

Quote
So it would be wrong to divide N by 2^96 and then divide any bit range by that result?
If you want to get the probability of a collision, then it is something like that.

Quote
but is the calculation above correct?
To some extent yes, but you cannot be 100% sure, that the real probability is 5*10^(-29), and not for example 6*10^(-29).

Quote
Meaning there are no collisions in 66 bit
It is almost guaranteed that there are collisions in 66 bit. If there would be absolutely no collisions, that would mean RIPEMD-160 is totally broken.

Quote
or you are saying we can't really tell because there is no mathematical relation?
There is a mathematical relation, but it is unknown, and hard to simplify, because of the avalanche effect. However, if you want to for example count, how many collisions are present, where you have some 32-bit hash function on input (for example reduced SHA-256 from 32-bit internal values to 4-bit internal values), and 20-bit hash function on output (for example RIPEMD-160, also reduced to 4-bit internal values), then you can use brute force, to get the exact results.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
September 29, 2023, 03:21:46 AM
#2
No, you can't do that because an address hash is made with a SHA-256 + Ripemd-160 (plush a double SHA-256 checksum) and the properties of these algorithms are non-trivial and therefore do not allow it to be mapped between any two ranges of an address.
copper member
Activity: 1330
Merit: 899
🖤😏
September 29, 2023, 01:58:37 AM
#1
🖤
Jump to: