Author

Topic: SHA256 output as mathematical function of input? (Read 439 times)

jr. member
Activity: 49
Merit: 38
November 06, 2019, 08:12:40 AM
#19
I'm no expert but I believe the SHA256 algorithm involves non-mathematical operations,
such as summing or multiplying the lower and upper halves of intermediate values,
followed by appending arbitrarily bit-masked derivatives to the upper or lower ends,
and this is why it's irreversible.
legendary
Activity: 3472
Merit: 10611
then the only remaining  freedom degrees are the nonce's 32 bits.. still many but not the whole 640 bit header

the thing about hash algorithms is that they have a property called "avalanche effect"[1] which means that if you change even a single bit inside a message, the final hash result should change completely.
wikipedia has an interesting example:
The quick brown fox jumps over the lazy dog
d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592
The quick brown fox jumps over the lazy cog
e4c4d8f3bf76b692de791a173e05321150f7a345b46484fe427f6acc7ecc81be


my suggestion is that you should start by debugging the SHA256 algorithm. find an unoptimized SHA256 code and go through it step by step and see how all the intermediate values are being calculated. then change 1 bit and see the effects once again. that can give you a pretty good idea what is going on under the hood and you can see places where you can actually "optimize" things.

[1] https://en.wikipedia.org/wiki/Avalanche_effect
member
Activity: 90
Merit: 91
Thanks for replies, suggestions, and @aliashraf for the "merit".. really appreciated

and sorry for me being late, but I'm involved in a lot of stuff lately

I was thinking that -as underlined by aliashraf- the general problem for sure is huge (stated it seems to still rest unsolved), but I'm in some way confident "that reduction" perhaps could be less impossible Tongue if we consider the mining problem...

I mean, if -for sake of simplicity- we consider:

  • a fixed order of transactions in the block (so fixed merkle root)
  • no misuse of timestamp and version fields from their original meaning

then the only remaining  freedom degrees are the nonce's 32 bits.. still many but not the whole 640 bit header

Ok ok, I know the block hash is the SHA256 applied TWO TIMES to the block header, nevertheless the "whole function" would still apply to just 32 bits input...

And for the PoW we "only" need the beginning 0s, so we would not be interested in all the 256 bits outputs (so we would handle a smaller vector function/system)

When I'll have time I'll try to write a symbolic SHA256 able to mill the numbers and to pass-through variables.. just to check what happen....

..but it won't be soon , for sure  Sad  Grin
legendary
Activity: 3472
Merit: 10611
- to understand what makes an HASH effective against collisions
- to understand if its computation can be optimized in particular circumstances.. a sort of new ASICboost-like optimization let's say

have you studied SHA1?
i think it would be a good idea to first check what happened to SHA1 and how they were able to increase the efficiency of their collision attack so much that it could happen in a much smaller space than 2^80 (i think there was a scientific article about how they could reduce to 2^50). check these articles out and see their approach, it could give you a lot of good information.
legendary
Activity: 1456
Merit: 1176
Always remember the cause!
interesting replies, thank you very much

So, you have restated the problem as:

"per-output-bit" mathematical formula/equation could exist (maybe in the form of a vectorial function to calculate all the output bits at the same time as well), but it would be as long and difficult to be expressed as the number of inputs increase (aka length of hashed string)

and that as... as... wouldn't be polynomial, but exponential  at least...

from an algorithmic point of view this complexity explosion comes from the recursive nature of SHA256 over unbound input... that's why I'm wondering if in your opinion (but I guess the answer is "no"  Sad ) any in-some-way-cyclic mathematical operator like Series or Infinite Products (or something else, maybe in a superset of binary field) could introduce any reduction in formulas, or if you know if this idea has ever been explored.

Thanks!
Good question, but I am afraid it wouldn't be easy to give a rigid answer.
My guess is obviously "NO" but proving it is another issue. The point is if there is any proof for this problem it would be a breakthrough: the security of SHA256 is not proved mathematically because it is ad-hoc and does not belong to the so-called Provably Secure Hash Functions.  Once I got enough time, I'll check it myself but for now, I'm curious if there is any chance to study the security of ad-hoc hash functions by testing the feasibility of such reducibility? My guess is any hash function that is reducible to mathematical functions that their size grows polynomially (instead of exponentially) with the length, is not ultimately secure.
member
Activity: 90
Merit: 91
If I understand correctly your question, what you are asking is if it is actually possible to break the POW Bitcoin algorithm, using a mathematical function to find the nonce to be used in the block header, instead of random guesses, to get the desidered hash value. 
Then I second your request, as a bitcoin enthusiast (and might be investor), but I have a very strong feeling the answer is NO.


Well "to break" maybe is too strong  Grin  let's say I'm interested in SHA256 analysis:

- to understand what makes an HASH effective against collisions

- to understand if its computation can be optimized in particular circumstances.. a sort of new ASICboost-like optimization let's say

I think the first step is to write SHA256 in an alternative form.. algebraical? over which field? as set of relations in graph-db? I don't' know.. I just wanted to explore if someone has already thought about it too
legendary
Activity: 2380
Merit: 17063
Fully fledged Merit Cycler - Golden Feather 22-23
If I understand correctly your question, what you are asking is if it is actually possible to break the POW Bitcoin algorithm, using a mathematical function to find the nonce to be used in the block header, instead of random guesses, to get the desidered hash value. 
Then I second your request, as a bitcoin enthusiast (and might be investor), but I have a very strong feeling the answer is NO.
member
Activity: 90
Merit: 91
interesting replies, thank you very much

So, you have restated the problem as:

"per-output-bit" mathematical formula/equation could exist (maybe in the form of a vectorial function to calculate all the output bits at the same time as well), but it would be as long and difficult to be expressed as the number of inputs increase (aka length of hashed string)

and that as... as... wouldn't be polynomial, but exponential  at least...

from an algorithmic point of view this complexity explosion comes from the recursive nature of SHA256 over unbound input... that's why I'm wondering if in your opinion (but I guess the answer is "no"  Sad ) any in-some-way-cyclic mathematical operator like Series or Infinite Products (or something else, maybe in a superset of binary field) could introduce any reduction in formulas, or if you know if this idea has ever been explored.

Thanks!
legendary
Activity: 1456
Merit: 1176
Always remember the cause!
O_1 = f1(I_1, I_2, ..., I_n)
O_2 = f2(I_1, I_2, ..., I_n)
...
O_256 = f256(I_1, I_2, ..., I_n)

i don't think it is mathematically possible to come up with such a formula because of how hash algorithms work. SHA256 like all hash algorithms don't work on individual bits, they take a block of them and work on all the bits at the same time then use the result in calculation of next block.
...
It is possible but just not in polynomial time and space to come up with such a formula. For example, having an input string with length 1 (one byte), you can feasibly deduct 256 functions

my point is that all the functions that you could come up with will be the same exact function. the most thing you could do is to skip a very small number of operations in the end. for example to get the first bit the only thing you can skip is the final addition after the 64th step of b to h. this that kind of addition could be done in one CPU register, it is not really a big deal to skip them.
Now I get your point and you are absolutely right but it reduces the complexity with a constant 256 factor which is negligible compared to exponential complexity growth with input length.

Briefly speaking, this is theoretically possible to represent O_i with a mathematical expression but the complexity of such expression grows exponentially with input string length.
legendary
Activity: 3472
Merit: 10611
O_1 = f1(I_1, I_2, ..., I_n)
O_2 = f2(I_1, I_2, ..., I_n)
...
O_256 = f256(I_1, I_2, ..., I_n)

i don't think it is mathematically possible to come up with such a formula because of how hash algorithms work. SHA256 like all hash algorithms don't work on individual bits, they take a block of them and work on all the bits at the same time then use the result in calculation of next block.
...
It is possible but just not in polynomial time and space to come up with such a formula. For example, having an input string with length 1 (one byte), you can feasibly deduct 256 functions

my point is that all the functions that you could come up with will be the same exact function. the most thing you could do is to skip a very small number of operations in the end. for example to get the first bit the only thing you can skip is the final addition after the 64th step of b to h. this that kind of addition could be done in one CPU register, it is not really a big deal to skip them.
legendary
Activity: 1456
Merit: 1176
Always remember the cause!
O_1 = f1(I_1, I_2, ..., I_n)
O_2 = f2(I_1, I_2, ..., I_n)
...
O_256 = f256(I_1, I_2, ..., I_n)

i don't think it is mathematically possible to come up with such a formula because of how hash algorithms work. SHA256 like all hash algorithms don't work on individual bits, they take a block of them and work on all the bits at the same time then use the result in calculation of next block.
...
It is possible but just not in polynomial time and space to come up with such a formula. For example, having an input string with length 1 (one byte), you can feasibly deduct 256 functions each with a maximum of O(28) complexity (maximum number of operations), I suppose, but for 80 bytes block headers it is very different because you have O(2640) as the ceiling unless you could find heuristic optimizations and/or analytic solutions which are not feasible because of the algorithm and what you have said above.

So, what makes it ultimately infeasible is the length of the input to an NP-like problem: find a mathematical presentation for SHA256 hash of strings with arbitrary length.
legendary
Activity: 3472
Merit: 10611
O_1 = f1(I_1, I_2, ..., I_n)
O_2 = f2(I_1, I_2, ..., I_n)
...
O_256 = f256(I_1, I_2, ..., I_n)

i don't think it is mathematically possible to come up with such a formula because of how hash algorithms work. SHA256 like all hash algorithms don't work on individual bits, they take a block of them and work on all the bits at the same time then use the result in calculation of next block.
in other words f1, f2, ... are all the same exact "formula" and O_1, O_2,... are bits of the final chunk.
One iteration in a SHA-2 family compression function from wikipedia:


so for example to calculate O_1 you have to first calculate small_sigma0 and small_sigma1 for that block to build the 256 bit working vector then perform the permutation (big_sigma0, big_sigma1, CH, MAJ, ROR) on its words 64 times and get A (word) out and get its first bit. there is no math to simplify that.
legendary
Activity: 1456
Merit: 1176
Always remember the cause!
O_1 = f1(I_1, I_2, ..., I_n)
O_2 = f2(I_1, I_2, ..., I_n)
...
O_256 = f256(I_1, I_2, ..., I_n)

where the f* functions are math equations and not algorithms?

Sure you can! SHA256 is a deterministic function of input bits, undisputable, but:

Firstly, it is not a set of 256 functions, but a series of such functions for n ranging from 1 to infinity (size of the string).

Secondly, Although bitcoin block header is fixed size (80 bytes) and a fixed set of functions hypothetically exist for hashing it, those functions are not feasible to be written down or saved or implemented as circuits like actual "formula"s, because such formulas, each, will be infeasibly long.

I've not made a close assessment but I think it would be interesting to prove such an infeasibility in terms of the number of operations/gates it takes to be represented.
legendary
Activity: 1134
Merit: 1118
O_1 = f1(I_1, I_2, ..., I_n)
O_2 = f2(I_1, I_2, ..., I_n)
...
O_256 = f256(I_1, I_2, ..., I_n)

I might be wrong but surely if you could get neat functions for the mapping between input and output bits it would be relatively straightforward to reverse the hashing process? I don't believe there is anyway to produce a neat set of 256 functions like you're asking, no.
member
Activity: 90
Merit: 91
Thank you everybody replying, and sorry to:

- have originally posted in the wrong forum section (Mining)

- haven't been clear enough


what I was trying to say was:

given SHA256 output bits: O_1, O_2, ..., O_256
and input bits: I_1, I_2, ..., I_n

is it possible to obtain:

O_1 = f1(I_1, I_2, ..., I_n)
O_2 = f2(I_1, I_2, ..., I_n)
...
O_256 = f256(I_1, I_2, ..., I_n)

where the f* functions are math equations and not algorithms?
I mean, for example, given x,y belonging to [0,1]:

XOR = x(1-y)+y(1-x)
AND = xy
OR = x+y-xy
NOT = 1-x

but I don't know for shifts, rotations, mod-additions...

I'm also thinking if recursive structure of SHA256 (deriving from input being unbounded) can -in some way- be substituted by Series, Infinite Product, or other math stuff like that

And maybe considering operators working on binary field is not the best choice, in the same sense complex numbers are sometimes easier to work with than real numbers even when both math function domain and codomain are real.


I'm thinking about those things because I wonder if this "mapping" of SHA256 algorithm into math objects with more workable properties could permit easier hash analysis and optimizations in its computation

Thanks everybody


legendary
Activity: 4522
Merit: 3426
Hi everybody! Is it possible to express SHA256 output bits as MATHEMATICAL function of input bits?
If so, which is the minimum subset of needed mathematical operators? Over which numeric set (e.g. Real or Complex field instead of binary to exploit more freedom degrees)?  Any news about it already out there on the Internet?

The are many explanations of the SHA-256 algorithm. Here is one: https://steemit.com/cryptocurrency/@f4tca7/introduction-to-the-sha-256-hash-function

Keep in mind that the input is arbitrary and unbounded. I suppose that it might be possible to describe each bit of the output as an explicit function of the inputs in some standard mathematical notation, but I doubt it would be practical or even insightful.
legendary
Activity: 2646
Merit: 1722
https://youtu.be/DsAVx0u9Cw4 ... Dr. WHO < KLF
Hi everybody! Is it possible to express SHA256 output bits as MATHEMATICAL function of input bits?
If so, which is the minimum subset of needed mathematical operators? Over which numeric set (e.g. Real or Complex field instead of binary to exploit more freedom degrees)?  Any news about it already out there on the Internet?

Thanks!

I'm also not 100% clear on what you are referring to exactly, so please provide some clarification and further examples.

However, useful proof-of-work and mathematical problem solving / function does already exist in several Bitcoin based alt. coin chains ...

See: gapcoin.club ...

"So the difficulty will simply be the length of the prime gap?

Not exactly. The average length of a prime gap with the starting prime p, is log(p), which means that the average prime gap size increases with lager primes. Then, instead of the pure length, we use the merit of the prime gap, which is the ratio of the gap's size to the average gap size.

Let p be the prime starting a prime gap, then m = gapsize/log(p) will be the merit of this prime gap.

Also a pseudo random number is calculated from p to provide finer difficulty adjustment.

Let rand(p) be a pseudo random function with 0 less than rand(p) less than 1. Then, for a prime gap starting at prime p with size s, the difficulty will be s/log(p) + 2/log(p) * rand(p), where 2/log(p) is the average distance between a gap of size s and s + 2 (the next greater gap) in the proximity of p.

When it actually comes to mining, there are two additional fields added to the Blockheader, named “shift” and “adder”. We will calculate the prime p as sha256(Blockheader) * 2^shift + adder. As an additional criterion the adder has to be smaller than 2^shift to avoid that the PoW could be reused."


The gapcoin blockchain has been producing results and actually breaking world records since late 2014 ...

- gapcoin.club/new-prime-gap-of-maximum-known-merit-press-release.php
- en.wikipedia.org/wiki/Prime_gap

Unfortunately the original developer passed away in 2017.

Also see Primecoin and Riecoin.

...

Can Bitcoin technology / blockchain be utilized to solve and/or calculate other mathematical solutions ... most likely yes ... take your pick ... ?

- en.wikipedia.org/wiki/List_of_unsolved_problems_in_mathematics
legendary
Activity: 1134
Merit: 1118
I'm not sure what you're really asking? That is what SHA256 is, a one-way function that maps any arbitrary data to a fixed-size bit string. Whether you call that 'mathematical' or not is up to debate on what you consider that term to mean, I consider it mathematical because I consider all algorithms/functions to be mathematical. To be honest, I consider pretty much anything descended from Computer Science theory mathematical.

There isn't an 'equation' that will map the input to the output. There is a long series of steps - an algorithm - but that doesn't seem to be what you're asking for.
member
Activity: 90
Merit: 91
Hi everybody! Is it possible to express SHA256 output bits as MATHEMATICAL function of input bits?
If so, which is the minimum subset of needed mathematical operators? Over which numeric set (e.g. Real or Complex field instead of binary to exploit more freedom degrees)?  Any news about it already out there on the Internet?

Thanks!
Jump to: