That's a very important topic.
I am totally in favor of the open-hardware initiative. At the Free Software Foundation, chapter Europe, we try to push 'open-v', the world's first open-source RISC-V-based 32-bit μC for general use.
Now imagine what this means in the cryptographic arena.
With regard to not trusting hidden functions in the hardware, I really liked that interview with Richard Stallman about
Let's move on a little bit ...
==Background==
The general idea of this attack is that SHA2-256 is a Merkle-Damgård hash
function which consumes 64 bytes of data at a time.
The Bitcoin mining process repeatedly hashes an 80-byte 'block header' while
incrementing a 32-bit nonce which is at the end of this header data. This
means that the processing of the header involves two runs of the compression
function: one that consumes the first 64 bytes of the header and a
second which processes the remaining 16 bytes and padding.
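As a minimal sketch of that split, the snippet below applies standard SHA-256 padding to an 80-byte message and cuts the result into the two 64-byte inputs of the compression function. The header bytes are a placeholder, not a real block header, and `sha256_pad` is a helper name chosen here for illustration.

```python
import struct

def sha256_pad(message: bytes) -> bytes:
    """Standard SHA-256 padding: 0x80, zero fill, then 64-bit big-endian bit length."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + struct.pack(">Q", len(message) * 8)

# Placeholder header: 80 distinct byte values, not a real block header.
header = bytes(range(80))

padded = sha256_pad(header)
first_chunk, second_chunk = padded[:64], padded[64:]

# The second chunk starts with the last 16 header bytes; the rest is padding.
print(len(padded), second_chunk[:16] == header[64:])
```

The second chunk therefore carries only 16 bytes of header data (which include the nonce); the remaining 48 bytes are fixed by the padding rule.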
The initial 'message expansion' operations in each step of the SHA2-256
function operate exclusively on that step's 64 bytes of input, with no
influence from prior data that entered the hash.
Because of this, if a miner is able to prepare a block header with
multiple distinct first 64-byte chunks but identical 16-byte
second chunks, they can reuse the computation of the initial
expansion for multiple trials. This reduces power consumption.
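The reuse can be illustrated with a small sketch of the SHA-256 message expansion (the 64-word schedule defined in FIPS 180-4). The two headers below are made-up placeholders that differ in their first 64 bytes and share their last 16; `second_block` is a hypothetical helper that builds the padded second chunk for an 80-byte message.

```python
import struct

def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF

def message_schedule(block: bytes) -> list:
    """Expand one 64-byte block into the 64 schedule words W[0..63]."""
    if len(block) != 64:
        raise ValueError("SHA-256 blocks are exactly 64 bytes")
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & 0xFFFFFFFF)
    return w

def second_block(header: bytes) -> bytes:
    """Last 16 header bytes followed by SHA-256 padding for an 80-byte message."""
    return header[64:80] + b"\x80" + b"\x00" * 39 + struct.pack(">Q", 80 * 8)

# Placeholder headers: different first chunks, identical 16-byte tails.
tail = bytes(range(16))
header_a = b"\xaa" * 64 + tail
header_b = b"\xbb" * 64 + tail

schedule_a = message_schedule(second_block(header_a))
schedule_b = message_schedule(second_block(header_b))
print(schedule_a == schedule_b)
```

The schedule takes no input from the chaining state, so it comes out the same for both headers. Since the nonce sits in the tail, the sharing applies per nonce value: one expansion can serve every first chunk tried with that nonce.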
There are two broad ways of making use of this attack. The obvious
way is to try candidates with different version numbers. Beyond
upsetting the soft-fork detection logic in Bitcoin nodes, this has
little negative effect, but it is highly conspicuous and easily
blocked.
The other method is based on the fact that the merkle root
committing to the transactions is contained in the first 64 bytes,
except for the last 4 bytes of it. If the miner finds multiple
candidate root values which have the same final 32 bits, then they
can use the attack.
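To make the byte positions concrete, here is a small sketch that derives the field offsets from the standard 80-byte header layout (version, previous block hash, merkle root, time, bits, nonce). The field names are labels chosen for this example.

```python
# Serialized Bitcoin block header fields and their sizes in bytes.
HEADER_FIELDS = [
    ("version", 4),
    ("prev_block_hash", 32),
    ("merkle_root", 32),
    ("time", 4),
    ("bits", 4),
    ("nonce", 4),
]

def field_offsets(fields):
    """Map each field name to its (start, end) byte offsets in the header."""
    offsets, position = {}, 0
    for name, size in fields:
        offsets[name] = (position, position + size)
        position += size
    return offsets

offsets = field_offsets(HEADER_FIELDS)
root_start, root_end = offsets["merkle_root"]

# The first SHA-256 chunk covers header bytes 0..63.
root_bytes_in_first_chunk = 64 - root_start
root_bytes_in_second_chunk = root_end - 64
print(offsets["merkle_root"], root_bytes_in_first_chunk, root_bytes_in_second_chunk)
```

Only the merkle root's trailing bytes spill into the second chunk, alongside time, bits, and the nonce, which is why matching root tails give identical second chunks.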
To find multiple roots with the same trailing 32 bits, the miner can
use an efficient collision-finding mechanism, which will find a match
with as few as 2^16 candidate roots expected, or 2^24 operations to
find a 4-way hit, though low-memory approaches require more
computation.
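A memory-heavy version of such a search can be sketched with a table keyed by the root's trailing bytes. This is only an illustration under stated assumptions: the candidate roots are stand-ins derived from a counter rather than from real block templates, and the tail is shortened to 2 bytes so the demo finishes quickly. The figures above are consistent with the usual estimate of roughly 2^(n(k-1)/k) candidates for a k-way match on n bits.

```python
import hashlib
from collections import defaultdict
from itertools import count

def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def find_k_way_tail_match(k: int, tail_bytes: int) -> list:
    """Return k distinct candidate roots that share their last tail_bytes bytes."""
    buckets = defaultdict(list)
    for counter in count():
        # Stand-in candidate root; a real miner would derive it from a block template.
        root = sha256d(counter.to_bytes(8, "little"))
        bucket = buckets[root[-tail_bytes:]]
        bucket.append(root)
        if len(bucket) == k:
            return bucket

# Reduced 16-bit tail for the demo; the attack needs a 32-bit (4-byte) tail.
matches = find_k_way_tail_match(k=4, tail_bytes=2)
for root in matches:
    print(root.hex())
```

The table trades memory for time, which is the trade-off the text alludes to: approaches that store fewer candidates need more hashing.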
An obvious way to generate different candidates is to grind the
coinbase extra-nonce, but for non-empty blocks each attempt will
require 13 or so additional sha2 runs, which is very inefficient.
This inefficiency can be avoided by computing a sqrt number of
candidates of the left side of the hash tree (e.g. using extra
nonce grinding), then an additional sqrt number of candidates of
the right side of the tree using transaction permutation or
substitution of a small number of transactions. All combinations
of the left and right side are then combined with only a single
hashing operation, virtually eliminating all tree-related
overhead.
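The combination step can be sketched as follows, assuming the merkle root is the hash of the concatenated left and right top-level nodes (a double SHA-256 in Bitcoin). The node values here are placeholders; in practice the left set would come from extra-nonce grinding and the right set from transaction permutation or substitution.

```python
import hashlib
from itertools import product

def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin merkle tree nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def candidate_roots(left_nodes, right_nodes):
    """Pair every left subtree hash with every right one: one node hash per root."""
    return {(left, right): sha256d(left + right)
            for left, right in product(left_nodes, right_nodes)}

# Placeholder subtree hashes standing in for real left/right candidates.
left_nodes = [sha256d(b"left" + bytes([i])) for i in range(16)]
right_nodes = [sha256d(b"right" + bytes([i])) for i in range(16)]

roots = candidate_roots(left_nodes, right_nodes)
print(len(left_nodes) + len(right_nodes), "subtrees give", len(roots), "candidate roots")
```

With m candidates per side, the miner pays the full tree cost only 2m times yet obtains m^2 roots, each costing a single node hash.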
With this final optimization, finding a 4-way collision with a
moderate amount of memory requires ~2^24 hashing operations
instead of the >2^28 operations that would be required for
extra-nonce grinding, which would substantially erode the
benefit of the attack.
It is this final optimization which this proposal blocks.
From:
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-April/013996.html