This is my attempt to understand SHA-256, Bitcoin's hashing function, at least to the extent needed for a better understanding of Bitcoin. I would appreciate any correction or suggestion from the other members of this forum.
SHA-256 is a function that is used in Bitcoin often. It is used, for example:
- in the Proof of Work (PoW) algorithm,
- to create bitcoin addresses,
- to provide a compact form for denoting transactions.
The name, SHA-256, may suggest something complicated. At least, that was my initial reaction. This very fact may hinder your attempts to understand it. Without involving too many technical terms, let's try to see what it does.
SHA-256 is a
function. As with any other function, F(x)=y
we have 2 important things related to it:
a) the input, x
b) the output, y
Unless you are a mathematician or cryptographer, what's under the hood, the
actual F, is not so important.
All that matters it that SHA-256 takes an input and produces an output.
In this case, our output, y, is also called a
hash.
Unlike any ordinary function (for example a square root) which takes only numbers, SHA-256 can take anything: numbers, letters, text, video, binary file, or anything that can be represented in a digital form.
SHA-256 reduces the input to a well defined output, the hash. The hash is always
256 bits.
This is like a matrix, 16x16, with randomly distributed zeros and ones.
1 0 0 1 0 0 1 0 1 0 1 0 1 0 0 1
0 1 0 1 1 1 0 1 0 0 0 1 0 1 0 1
1 1 1 1 0 1 0 1 0 1 1 1 0 0 0 0
0 0 1 1 1 0 1 1 0 0 0 0 1 1 0 0
1 0 0 0 0 1 1 0 1 0 1 0 0 1 1 1
1 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1
0 1 1 0 0 1 1 0 0 0 1 0 0 1 0 1
0 1 1 1 1 1 0 1 0 1 1 1 1 0 1 0
0 0 1 0 0 1 1 0 1 0 0 0 0 1 0 0
1 0 0 1 1 0 1 1 1 1 0 1 1 1 1 0
1 0 1 0 0 0 1 0 0 0 1 0 0 1 0 1
1 0 1 0 0 1 1 1 0 1 0 0 1 0 1 1
0 0 1 0 1 1 0 1 0 1 0 0 0 0 1 0
0 0 0 1 1 0 1 1 0 1 0 0 1 1 0 0
1 1 1 0 1 0 1 0 1 0 1 0 1 1 0 0
0 1 1 0 1 1 0 1 0 0 1 0 0 0 0 1
The number of different ways these 1s and 0s can be distributed is HUGE. It is precisely 2
256, which, according to
this large numbers calculator, is exactly this many combinations:
115792089237316195423570985008687907853269984665640564039457584007913129639936
How huge the above number is can be illustrated by comparing it to the estimated number of
atoms in the observable universe, which is something like 10
80. It is only several orders of magnitude smaller.
SHA-256 produces a unique hash. The chances that you would ever obtain two identical patterns of ones and zeros for two different inputs is zero (
is it actually zero or a negligible small number?). Because of the uniqueness of our hash function output, we can use it as a digital signature.
In Bitcoin, the binary form of the hash can be represented as a hexadecimal number, in which case, there are 64 digits and a relatively more compact form:
92A95D15F5703B0C86A7944166257D7A26849BDEA225A74B2D421B4CEAAC6D21
Properties of a good hashing function- It takes the same time on average to hash large and small inputs.
- It should be fast to compute.
- The outputs of similar inputs should be distinctly different.
- A good hashing function should be collision resistant. This means, no two different inputs should produce the same output.
- Given the output, to find the input one would have to randomly hash all possible inputs (brute force attack). This implies that if one has no clue about the form of the input, finding it, just based on the output, should be practically an impossible task. How difficult it might be to do a brute force attack on SHA-256 even with quantum computers at one's disposal, is illustrated in this post. We are talking about billions of years.
Sources:
https://medium.com/@ConsenSys/blockchain-underpinnings-hashing-7f4746cbd66bhttps://en.bitcoin.it/wiki/SHA-256Now to the questions:
1. I understand the use of hashing in PoW, but what's the point of using SHA-256 to generate bitcoin addresses? Security doesn't look too important here, as we obtain these addresses from public keys, not the private ones.
2. What are the advantages (or disadvantages) of the newest generation of SHA3 Keccak hashing function vs SHA2-256?
3. Is the only benefit of hashing transactions their compact representation?