Can a hash *ever* repeat?
This question is oddly difficult in bitcoin. In bitcoin, the hashes are used as identifiers.
As pointed out by others already, cryptographic hashing systems are essentially lossy compression. For a given sha256 output, there is at least one input that creates it, and possibly an infinite number. A block header is 80 bytes long, and a sha256 output is 256 bits long. If we assume an even distribution for both the bits of a header (known to be false) and for sha256 (see previous posts, and my addition below), we can expect about 2
384 possible block headers per block ID, with 384 being 640-256.
It gets messy in reality. Large portions of the block header are not evenly distributed, and several of those portions are moving targets. Someone could probably work up reasonable estimates for the actual bits of freedom for a given block header candidate, and from that estimate how many such candidates we can expect per output. I find it interesting enough, but it is late, and I'm tired, so I won't bother.
With that pointless aside out of the way, back to identifiers. We use the block header hashes as identifiers for the block. The network enforces uniqueness of these identifiers in an odd way. Say you are hashing along, and you find a nonce that satisfies a header for the next block, but by a strange twist of fate, that hash just happens to be equal to a freakishly low hash previously found*, perhaps one listed in this thread. Your node announces this to peers by sending them a message with the new block's identifier. They all ignore you, because they already "have" that block and so there is no point asking you for the rest of it. I'm actually not sure how your node would even handle it. I'd have to check the code to see if it would overwrite the old block, or drop the new one. Odds are good that we'll never find out the hard way.
With transactions, it is even worse. The same mechanism exists, but nodes do not (by default) keep full track of all transactions, just unspent ones. It is possible** to create a new transaction that happens to have the same hash as an old transaction. Again, I'm not sure how it would end up without reading the code.
* I'm not sure if this would qualify for impossibly good luck, or impossibly bad luck. Certainly one of these extremes though** Not really. The birthday attack on 2256 is still impossibly huge.I don't actually know that we know if there is a hash with all zeros. The state space of the SHA2-256 compression function is 'only' 768 bits and it's not at all constructed like a permutation on the input. There is clearly internal cancellation, so AFAICT some outputs may be unreachable but we don't know which ones are.
Indeed. The output space of sha256 is currently unknown. We suspect (hope) that it is close to [0-2
256], but we don't exactly
know that. Cryptographic hashes look a hell of a lot like random numbers, by design. One of the standard tests is to generate pairs of hashes from pairs of inputs that differ by a single flipped bit. We expect that about half of the output bits will change between pairs,
on average, and we expect a fairly flat distribution of flip counts for each bit position, again,
on average. The sha family passes these tests, and from this, we gain confidence (but not certainty) about the distribution of outputs.