I'm experimenting with an alternative way to compute SHA-256 -- through tracing boolean functions.
Here's the idea:
1. Represent SHA-256 as a huge-ass DAG (directed acyclic graph) with nodes being boolean operations -- AND, OR, NOT, XOR. (One SHA256 block expansion + transform has 121k nodes after some optimization.)
2. Since we go through ~4 billion different nonces it makes sense to assume all other variables constant and optimize some nodes away. I get 86k nodes with 32 variables left.
3. Pre-compute fringe nodes with few dependencies via binary decision diagrams, normal forms or whatever. Optimize out parts of graph which do not need to be computed with certain configurations of nonce.
4. Now we can compute SHA-256 with about as many CPU instructions as there were nodes in DAG. It is possible to compute many combinations at once in parallel, e.g. on CPU which can do 64-bit operations in parallel it is possible to compute 64 combinations.
I've got some preliminary results, without precomputation it, theoretically, should be about as fast as traditional implementation of SHA-256 on CPUs, probably a bit slower due to required load/store operations. With pre-computation and other optimizations it might be faster. Or maybe not. I don't know.
Anybody wants to discuss? I'm not an expert in discrete math, so maybe some SAT solver or boolean function optimization techniques I don't know would be useful here.
Here's discussion in sci.crypt:
http://groups.google.com/group/sci.crypt/browse_thread/thread/06ffe40905f725feTheoretical considerations:
* traditional implementation makes advantage of native integer addition, it is very expensive to express addition in form of boolean ops
* boolean function implementation, however, does not need rotateright -- it is optimized away at graph building stage
* further, boolean function implementation can take advantage of pre-computation (I see some GPU implementations pre-compute some things, but they cannot go deep).
So this might be of interest for implementation on GPUs which do not have ROTR.