Author

Topic: Comprehensive SHA256 implementer's guide (Read 188 times)

copper member
Activity: 909
Merit: 2301
May 30, 2024, 03:58:18 AM
#8
Quote
Can you also do a short explanation on why Satoshi might have decided not to use a stronger algorithm?

Quote
MD5
https://en.wikipedia.org/wiki/MD5
Quote
On 1 March 2005, Arjen Lenstra, Xiaoyun Wang, and Benne de Weger demonstrated construction of two X.509 certificates with different public keys and the same MD5 hash value, a demonstrably practical collision.

Quote
or SHA-1
https://en.wikipedia.org/wiki/SHA-1
Quote
On 17 August 2005, an improvement on the SHA-1 attack was announced on behalf of Xiaoyun Wang, Andrew Yao and Frances Yao at the CRYPTO 2005 Rump Session, lowering the complexity required for finding a collision in SHA-1 to 2^63.

However, Satoshi initially picked SHA-1, and repeated all steps, which were done in Hashcash. You can confirm that, if you download "BitCoin v0.01 ALPHA" version, and read for example "sha.h" file:
Code:
// This file is public domain
// SHA routines extracted as a standalone file from:
// Crypto++: a C++ Class Library of Cryptographic Schemes
// Version 5.5.2 (9/24/2007)
// http://www.cryptopp.com
This is also one of the earliest timestamps, which can confirm, that Satoshi started working on Bitcoin in 2007. Here, you can see 24th September 2007, but maybe some earlier date can be found elsewhere (or maybe not, because the hashcash version is very old, and the code at that time probably didn't contain much more than just hashing).

Quote
or SHA-2
This is what we have.

Quote
or even SHA-512
See file "key.h" in the same 0.01 version:
Code:
// secp160k1
// const unsigned int PRIVATE_KEY_SIZE = 192;
// const unsigned int PUBLIC_KEY_SIZE  = 41;
// const unsigned int SIGNATURE_SIZE   = 48;
//
// secp192k1
// const unsigned int PRIVATE_KEY_SIZE = 222;
// const unsigned int PUBLIC_KEY_SIZE  = 49;
// const unsigned int SIGNATURE_SIZE   = 57;
//
// secp224k1
// const unsigned int PRIVATE_KEY_SIZE = 250;
// const unsigned int PUBLIC_KEY_SIZE  = 57;
// const unsigned int SIGNATURE_SIZE   = 66;
//
// secp256k1:
// const unsigned int PRIVATE_KEY_SIZE = 279;
// const unsigned int PUBLIC_KEY_SIZE  = 65;
// const unsigned int SIGNATURE_SIZE   = 72;
//
// see www.keylength.com
// script supports up to 75 for single byte push

Satoshi used uncompressed public keys, which took 520 bits (expressed in code as 65 bytes). And he concluded, that they are too big to get a convenient address, and applied SHA-256 on that, and then put it through one more hash function, which is RIPEMD-160.

Quote
I am not very sure why Satoshi chose to use SHA256, RIPEMD160, and even secp256k1.
SHA-256: because it is different than what Hashcash used, because it was considered strong at that time, and because it was available by default in secp256k1.

RIPEMD-160: because he needed some shorter hash for addresses, 512-bit (x,y) coordinates were too big, 256-bit shortened addresses were still not small enough, and SHA-1 was too weak, and there were not that much 160-bit hash functions to choose from.

secp256k1: because it was one of those four curves, considered by Satoshi, and because it was the most secure in that group. Picking for example secp160k1 could be also a good choice for short-term transactions, just like picking 160-bit addresses is good enough for that, but in the long term, the collision resistance is only at 2^80, and can be often bring down into 2^64, and then you enter a danger zone. So, to be safe, it is better to pick some 256-bit curve with 128-bit security. Note: at that time, pay-to-IP was the main use case, addresses were used later as a "feature", recommended to be used only if someone is not online, and everything started just from P2PK, called "direct payment" at that time.

Also, those four curves are one of those, which have quite well-explained parameters (except of the Generator).
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
Can you also do a short explanation on why Satoshi might have decided not to use a stronger algorithm?

Some say, it would have been too slow or it would have bloated the Blockchain. (MD5 or SHA-1or SHA-2 or even SHA-512)

To be honest, I am not very sure why Satoshi chose to use SHA256, RIPEMD160, and even secp256k1. I do know that the two hashing algorithms are much stronger than MD5. SHA2 is obviously stronger than SHA1.

MD5 and SHA1 have both been broken since he vanished so there's that.

On the contrary, SHA256 is not slow to run, as newer hardware has opcodes that can run a SHA256 hash directly - x64 cpus have them, ASICs have them, the GPU implementation of SHA256 by JeanLucPons is quite fast, and so on.
legendary
Activity: 3542
Merit: 1965
Leading Crypto Sports Betting & Casino Platform
Can you also do a short explanation on why Satoshi might have decided not to use a stronger algorithm?

Some say, it would have been too slow or it would have bloated the Blockchain. (MD5 or SHA-1or SHA-2 or even SHA-512)

hero member
Activity: 714
Merit: 1298


sha224 and sha256 only use the first 64 of those constants. The other SHA2 functions use all 80 of them:

https://en.wikipedia.org/wiki/SHA-2#Comparison_of_SHA_functions

Interestingly, SHA1 also uses 80 rounds.

Yeah, agreed, as defined by NIST standard,  sha512/256 is sha512 truncated to 256 bits   and as such it uses 80 rounds with 80 constants, while sha256 has 64 rounds and relies on 64 constants.

According to NIST, besides SHA1 the whole 80 rounds are also valid for SHA384 and truncated SHA512/224
copper member
Activity: 909
Merit: 2301
Quote
How does that work, you might ask? Well, let's take the first prime for example: 2. The first digits of the square root of 2 are: 1.4142135... but we are not interested in these decimal digits. We want to convert this into binary first.
There is a better way to see those values. Just square them, and see, how their 64-bit representation looks like.

SHA-1 k-value constants squared:
Code:
5a827999^2 = 1fffffff 4d25fd71
5a82799a^2 = 20000000 022af0a4

6ed9eba1^2 = 2fffffff abd2fb41
6ed9eba2^2 = 30000000 8986d284

8f1bbcdc^2 = 4fffffff 29bbdd10
8f1bbcdd^2 = 50000000 47f356c9

ca62c1d6^2 = 9ffffffe b29c5ee4
ca62c1d7^2 = a0000000 4761e291
Note that in case of SHA-1, we have sqrt(2), sqrt(3), sqrt(5) and sqrt(10). We don't use sqrt(7) here, which would mean using 0xa953fd4e instead of 0xca62c1d6:
Code:
a953fd4e^2 = 6fffffff 373743c4
a953fd4f^2 = 70000000 89df3e61

SHA-256 initialization vector squared:
Code:
16a09e667^2 = 1fffffff d4e9b3d71
16a09e668^2 = 20000000 022af0a40

1bb67ae85^2 = 2fffffff e33ff1119
1bb67ae86^2 = 30000000 1aace6e24

23c6ef372^2 = 4fffffff b8d799ec4
23c6ef373^2 = 50000000 0065785a9

2a54ff53a^2 = 6fffffff e08b41124
2a54ff53b^2 = 70000000 35353fb99

3510e527f^2 = afffffff b7e799b01
3510e5280^2 = b0000000 220964000

39b05688c^2 = cfffffff ec82a0c90
39b05688d^2 = d0000000 5fe34dda9

41f83d9ab^2 = 10fffffff 7e8155839
41f83d9ac^2 = 110000000 0271d0b90

45be0cd19^2 = 12fffffff f56110c71
45be0cd1a^2 = 130000000 80dd2a6a4
Of course, here we have bigger values than 64-bit, because instead of handling only the fractional part, the whole number is handled, so when we have 1.4142135, we square 14142135, instead of squaring 4142135.

Edit: By the way, RIPEMD-160 uses exactly sqrt(7) instead of sqrt(10):
Code:
5a827999^2 = 1fffffff 4d25fd71
5a82799a^2 = 20000000 022af0a4

6ed9eba1^2 = 2fffffff abd2fb41
6ed9eba2^2 = 30000000 8986d284

8f1bbcdc^2 = 4fffffff 29bbdd10
8f1bbcdd^2 = 50000000 47f356c9

a953fd4e^2 = 6fffffff 373743c4
a953fd4f^2 = 70000000 89df3e61

50a28be6^3 = 7ffffff effd7a7193cccb58
50a28be7^3 = 8000000 3c2f7661d85726f7

5c4dd124^3 = bffffff e389014cc2c2e640
5c4dd125^3 = c000000 47611339e737c0dd

6d703ef3^3 = 13ffffff 7658d9473fa5bc6b
6d703ef4^3 = 14000000 02b32dbebe464940

7a6d76e9^3 = 1bffffff bdde5c5c84c38579
7a6d76ea^3 = 1c000000 6d83e8cb9cfcca68
Another interesting fact is potential expansion of SHA-1 from 160-bit into 256-bit hash function, to preserve the first 160 bits "as they are". If anyone needs that kind of change, then this Initialization Vector can be used:
Code:
67452301^66666666=01234567
efcdab89^66666666=89abcdef
98badcfe^66666666=fedcba98
10325476^66666666=76543210
c3d2e1f0^c3c3c3c3=00112233
8796a5b4^c3c3c3c3=44556677
4b5a6978^c3c3c3c3=8899aabb
0f1e2d3c^c3c3c3c3=ccddeeff
And those values are also used elsewhere, for example here: https://www.rfc-editor.org/rfc/rfc7008
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org


There are also 64 round constants K, although we almost always make this an array. Their initial values are the first 32 bits of the cube roots of the first 64 prime numbers.


I had a quick  comparison with the  NIST standard and noticed the discrepancy relevant to those constants.

NIST refers to eighty of them.

Does it make any sense to limit their number in your scheme of calculation or I  have missed something?

sha224 and sha256 only use the first 64 of those constants. The other SHA2 functions use all 80 of them:

https://en.wikipedia.org/wiki/SHA-2#Comparison_of_SHA_functions

Interestingly, SHA1 also uses 80 rounds.
hero member
Activity: 714
Merit: 1298


There are also 64 round constants K, although we almost always make this an array. Their initial values are the first 32 bits of the cube roots of the first 64 prime numbers.


I had a quick  comparison with the  NIST standard and noticed the discrepancy relevant to those constants.

NIST refers to eighty of them.

Does it make any sense to limit their number in your scheme of calculation or I  have missed something?




legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
A lot of processes inside Bitcoin are using SHA256 in various places, from creating an address to signing a transaction to mining a block. So in this post I hope to make it clear how most of the internal procedures around SHA256 works so that you can better understand exactly why it is so resistant to cryptanalysis attacks.

This thread is intended to be a resource for anybody whose trying to implement a custom SHA256 operation inside a CPU, GPU, FPGA, or whatever.

What is it useful for?

- Brainflayer
- Vanitysearch
- Recovering lost private keys
- Mining (were SHA256 is done twice)
- Any application where you need to do SHA256 really fast

Interactive demo (not mine): https://sha256algorithm.com/

Warning - don't try to make an implementation for a secure application. This is for speed only.

SHA256 variables

In SHA256, every variable is 32 bits long except for the message itself. Even the arrays have 32-bit elements.

We have 8 variables denoting state. Usually this would be in an array practically speaking, but theoretically you could call the variables a, b, c, d, e, f, g, and h. they are all 32 bits long. Initially, they are set to the first 32 bits of the square roots of the first 8 prime numbers.

When the SHA256 operation is finished then these variables are all concatenated together and it is returned as the final checksum.

How does that work, you might ask? Well, let's take the first prime for example: 2. The first digits of the square root of 2 are: 1.4142135... but we are not interested in these decimal digits. We want to convert this into binary first.

Also it is worth noting that we don't use the numbers before the decimal point. In this case it is 1, so we subtract the integer part from the number to get only the fractional part.

Well, how do we convert this fractional number into binary, and is it even possible?

The easiest way to do this is to multiply the number by 2**32 and truncate the result in order to get the first 32 bits on their own. You can adjust the exponent if you want to get a different number of bits. e.g. 2**64.

There are also 64 round constants K, although we almost always make this an array. Their initial values are the first 32 bits of the cube roots of the first 64 prime numbers. Calculated in a similar way to above.

Finally we have a list of words. As in, in binary nomenclature "words". And by the way, in the SHA256 algorithm, a word is defined as 32 bits, and not having its own meaning. We call these W and there are 64 of them.

Input preparation

SHA256 works on chunks of 512 bits i.e. 64 bytes. So, the input has to be padded by 64 bytes and this is done automatically by the operation.

SHA256 takes an arbitrary string as an input, even an empty string. However, we always add 0x80 to the end of the string (technically a single 1 bit is all the implementation needs), and then after that we keep adding 0x00 bytes to the end until the length is a multiple of 512 bits, or 64 bytes actually.

Then we are going to 56 more 0x00 bytes at the end, followed by 8 bytes containing the original length of the message. (SHA256 supports messages up to 2**64 bits large).

Operations on chunks

For each 512-bit chunk, we are going to run a particular algorithm on the chunk 64 times. In the process, it is going to update the variables a through h. The reason why these variables are also changed is because they are used in the algorithm as well in the processing of the next chunk. When you have the final result depending on the values of each 512-bit chunk to create a particular result, it becomes much harder to forge hashes.

The first thing that is done before the rounds begins is that we are going to set the values of W. As I have mentioned before, there are 64 W values. The 512-bit chunk is broken into parts of 32-bits each to set the first 16 W values. The other 48 W values are calculated by the following formula, assuming zero based index:

W[i]=σ1(W[i-2])+W[i-7]+σ0(W[i-15])+W[i-16]     16 <= i <= 63

We will get to the sigma functions in a minute. But first I would like to note that this equation can be unrolled out of a loop, can can be ran in batches of 16. Effectively this means that words 16-31 of W can be calculated after the first 16 rounds, words 32-47 of W can be calculated after the second 16 rounds, and words 48-63 after the third 16 rounds which is before the fourth and last 16 rounds.

This process is called expansion in SHA256.

Here is some code (that is not mine) which shows how this is done in practice:

Code:
// uint32_t * w; // Function parameter that contains the 512-bit message chunk i.e. the first 16 words of W

SHA256_RND(0); // This does 16 rounds on a chunk
WMIX(); // generate the next 16 words of W
SHA256_RND(16); // Do another 16 rounds
WMIX(); // etc...
SHA256_RND(32);
WMIX();
SHA256_RND(48);

Code:
// https://github.com/XopMC/CudaBrainSecp/blob/main/GPU/GPUHash.h#L170-#L187
#define WMIX() { \
w[0] += s1(w[14]) + w[9] + s0(w[1]);\
w[1] += s1(w[15]) + w[10] + s0(w[2]);\
w[2] += s1(w[0]) + w[11] + s0(w[3]);\
w[3] += s1(w[1]) + w[12] + s0(w[4]);\
w[4] += s1(w[2]) + w[13] + s0(w[5]);\
w[5] += s1(w[3]) + w[14] + s0(w[6]);\
w[6] += s1(w[4]) + w[15] + s0(w[7]);\
w[7] += s1(w[5]) + w[0] + s0(w[8]);\
w[8] += s1(w[6]) + w[1] + s0(w[9]);\
w[9] += s1(w[7]) + w[2] + s0(w[10]);\
w[10] += s1(w[8]) + w[3] + s0(w[11]);\
w[11] += s1(w[9]) + w[4] + s0(w[12]);\
w[12] += s1(w[10]) + w[5] + s0(w[13]);\
w[13] += s1(w[11]) + w[6] + s0(w[14]);\
w[14] += s1(w[12]) + w[7] + s0(w[15]);\
w[15] += s1(w[13]) + w[8] + s0(w[0]);\
}

As you can see, although the algorithm prescribes that the last 48 words are generated by the formula, it only uses numbers between 1..16 in the subtraction part. Technically that means you can get away with generating 16 words at a time and overwriting the previous words as you go. For example:

Code:
w[0] += s1(w[14]) + w[9] + s0(w[1]);

Here you can define i to be 16 so that W[i-2] is W[14], W[i-7] is W[9], W[i-15] is W[1] and the last one W[i-16] is just W[0] which we take care of by adding it and then overwriting its value via the += operator.

So effectively this means W[0] in the program now contains W[16] from the math.

Subsequent lines that reference W[0] are doing so because their value of t-2, t-7, t-15 or whatever evaluates to 16, which is what is inside W[0] right now.

The s0 and s1 are the lower sigma functions (named so because there are also upper sigma functions - I will explain in a minute) and they are defined like this:

Code:
#define s0(x) (ROR(x,7) ^ ROR(x,18) ^ (x >> 3))
#define s1(x) (ROR(x,17) ^ ROR(x,19) ^ (x >> 10))

ROR is just the Right Rotate function for 32 bits, i.e. ((x>>n)|(x<<(32-n))).

These lower sigma functions are designed in such a way that the expansion process creates words that are as non-linear (unpredictable) as possible.

Anatomy of a round

After initializing the values of a through h, which is done at the beginning of the function by the way,

not at the beginning of a chunk,

we apply another formula to update these eight variables.

Code:
#define S0(x) (ROR(x,2) ^ ROR(x,13) ^ ROR(x,22))
#define S1(x) (ROR(x,6) ^ ROR(x,11) ^ ROR(x,25))

#define Maj(x,y,z) ((x & y) | (z & (x | y))) // Majority function i.e. (x & y) ^ (x & z) ^ (y & z)
#define Ch(x,y,z) (z ^ (x & (y ^ z))) //  Choice function i.e. (x & y) ^ (~x & z)

// SHA-256 inner round
#define S2Round(a, b, c, d, e, f, g, h, k, w) \
    t1 = h + S1(e) + Ch(e,f,g) + k + (w); \
    t2 = S0(a) + Maj(a,b,c); \
    d += t1; \
    h = t1 + t2;

#define SHA256_RND(k) {\
S2Round(a, b, c, d, e, f, g, h, K[k], w[0]);\
S2Round(h, a, b, c, d, e, f, g, K[k + 1], w[1]);\
S2Round(g, h, a, b, c, d, e, f, K[k + 2], w[2]);\
S2Round(f, g, h, a, b, c, d, e, K[k + 3], w[3]);\
S2Round(e, f, g, h, a, b, c, d, K[k + 4], w[4]);\
S2Round(d, e, f, g, h, a, b, c, K[k + 5], w[5]);\
S2Round(c, d, e, f, g, h, a, b, K[k + 6], w[6]);\
S2Round(b, c, d, e, f, g, h, a, K[k + 7], w[7]);\
S2Round(a, b, c, d, e, f, g, h, K[k + 8], w[8]);\
S2Round(h, a, b, c, d, e, f, g, K[k + 9], w[9]);\
S2Round(g, h, a, b, c, d, e, f, K[k + 10], w[10]);\
S2Round(f, g, h, a, b, c, d, e, K[k + 11], w[11]);\
S2Round(e, f, g, h, a, b, c, d, K[k + 12], w[12]);\
S2Round(d, e, f, g, h, a, b, c, K[k + 13], w[13]);\
S2Round(c, d, e, f, g, h, a, b, K[k + 14], w[14]);\
S2Round(b, c, d, e, f, g, h, a, K[k + 15], w[15]);\

The capital S stands for upper sigma function, ignore these for now we will see them later. Ignore the majority and choice functions as well, we will look at  them later.

Each round (which is implemented in S2Round) switches around the variables. So the variables used in the current round will not be in the same order as the previous round. You can see that it wraps around to its starting order after every 8 rounds.

This particular implementation only shows 16 rounds, the reason which I have explained in the previous section (it only calculates 16 words at a time).

At any rate...

The upper sigma functions have a similar definition to the lower sigma functions, except they are not used in the word expansion process, but inside the rounds.
Their purpose is similar- to make it really hard for anyone to guess what the previous value of the input was.

The majority and choice functions are elementary logical functions. The majority function takes three variables, and in each bit position, takes the one which is most occurring in the three variables. For example, if two variables have a one in a bit position but the third has a zero, that bit is going to be a one. And vice versa. So if you have three variables like 0x0010010, 0x10001000, and 0x01010101, the majority function of all these variables is going to be just 0x0000000 because in each bit position more variables have a 0 bit there than a 1 bit.

The choice function also takes three variables, but uses the first one as a conditional, like an if statement or something. So basically for each bit position:

- If the first variable has a 1 bit there, the result has that bit position set to the second variable's bit
- Otherwise if the first variable has a 0 bit there, the result has the bit position set to the third variable's bit.

You don't actually have to understand how all these helper functions work in order to implement SHA256, as long as you have the function definitions you're good.

Going back to the variables themselves, you can see that before the variables are rotated, that d and h are updated as follows:

h = S1(e) + Ch(e, f, g) + k + w + S0(a) + Maj(a, b, c)
d = d + S1(e) + Ch(e, f, g) + k + w

This adds some additional obfuscation to the state variables before the rotation (i.e. b = a, c = b, etc). Changing the order of the variables like this brings about a major security advantage that makes it difficult to reverse-engineer the previous values of the variables from the final SHA256 hash - which, as you recall, are just the a through h variables stuck together.

Under no circumstances are the values of the round constants K changed to new values.

Wrapping up

After the 64 rounds are finished, you have the eight variables for the chunk. After you get these variables for all the chunks, you need to add them together to get the final eight variables.

What I mean is, You need to have another array of size 8 that is initialized to zero so that you can perform addition-assignment with the state variables.As you are not going to be storing the variables after you discard the chunk. Practically speaking, you do something like this:

Code:
output[0] += a;
output[1] += b;
output[2] += c;
output[3] += d;
output[4] += e;
output[5] += f;
output[6] += g;
output[7] += h;

A reminder that since these variables are all 32-bits or 4 bytes, you need to concatenate them together to get the final 256-bit digest.



Credits:
https://github.com/XopMC/CudaBrainSecp
https://sha256algorithm.com/
https://en.wikipedia.org/wiki/SHA-2
https://blog.daisie.com/sha-256-explained-comprehensive-secure-hash-tutorial/
Jump to: