Is this a typo in the code, which just happens to work because of the way Verilog truncates assignments?
Since you've not managed to coax a reply from the author(s), I'll volunteer a reply, so that you're not just talking to yourself ...
512 bits (64 bytes) is the input to the sha256_transform function (rx_input).
However the altsource_probe (virtual_wire) is limited to 256 bits per channel (or whatever it's called).
Luckily most of the data is constant; in fact we only use 96 bits of it, viz ...
data <= {384'h000002800000000000000000000000000000000000000000000000000000000000000000000000000000000080000000, nonce_next, data_buf[95:0]};
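To make the concatenation above concrete, here's a sketch of the same reconstruction in Python (my reading of it, not code from the miner; the function name is mine). The 384-bit constant is just the SHA-256 padding and length field that never change:

```python
# Rebuild the 512-bit SHA-256 message block from the 96 variable bits
# plus the 32-bit nonce, mirroring the Verilog concatenation
# {384'h..., nonce_next, data_buf[95:0]}. Illustrative names only.

# The 384-bit constant from the Verilog source: the 0x80000000 padding
# bit, zeros, and the 0x280 (640-bit) message-length field.
CONST_384 = 0x000002800000000000000000000000000000000000000000000000000000000000000000000000000000000080000000

def rebuild_data(nonce, data_buf):
    """Concatenate {CONST_384, nonce[31:0], data_buf[95:0]} into 512 bits."""
    low96 = data_buf & ((1 << 96) - 1)    # data_buf[95:0]
    nonce32 = nonce & 0xFFFFFFFF          # 32-bit nonce
    return (CONST_384 << 128) | (nonce32 << 96) | low96

data = rebuild_data(0xDEADBEEF, (1 << 96) - 1)
assert data.bit_length() <= 512
assert (data >> 96) & 0xFFFFFFFF == 0xDEADBEEF  # nonce sits above the 96 bits
```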
The 512 bits in the test harness is the second 64 bytes of the 128-byte getwork data field (see my addendum below).
This is truncated to 256 bits in the mining TCL script, and sent to the FPGA.
Only 96 bits of it are actually used, but since the rest is constant, the full 512 bits are reconstructed (including the variable nonce) and passed to sha256_transform.
Since Verilog automatically truncates a wider value on assignment, the 512-bit data supplied in the test harness for simulation is shoehorned into the 256-bit data_buf (this is either fortuitous or deliberate, I cannot know which, but it works).
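For anyone unfamiliar with the Verilog rule being relied on here: a narrowing assignment silently keeps only the least-significant bits. A quick sketch of that behaviour, simulated in Python with a mask (illustrative names, not the miner's code):

```python
# Verilog truncates on a narrowing assignment, keeping the low-order bits:
#   reg [255:0] data_buf;
#   data_buf <= wide_512;   // keeps wide_512[255:0], drops the rest
# Modelled here by masking to the register width.

def narrowing_assign(value, width):
    """Model Verilog's truncating assignment into a `width`-bit reg."""
    return value & ((1 << width) - 1)

wide_512 = (0xAB << 500) | 0x1234           # junk in the high bits, data low
data_buf = narrowing_assign(wide_512, 256)  # as if: data_buf <= wide_512;
assert data_buf == 0x1234                   # only bits [255:0] survive
```

Since the design only ever reads data_buf[95:0], whatever lands in the upper bits is harmless anyway.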
And if you wanted to optimize the FPGA some more, you could modify the mining TCL script to only send 96 bits to the FPGA in the first place, and save a chunk of registers in the altsource_probe.
[EDITED to fix numerous typos and a mixup between midstate and data]
And one further edit (since I'm making a right dog's dinner of this post, I may as well try to get it correct, if only for my own peace of mind)...
Getwork supplies 128 bytes as "data", which is the 80-byte block header, padded with 0's (and a single 1 bit) plus its (fixed) length value to give a 128-byte message. The SHA-256 digest requires the message to be split into 64-byte chunks for hashing, so the 128 bytes are split into two chunks. The first is precalculated to give midstate and supplied within the getwork packet, so we don't have to do it ourselves. Only the second 64 bytes is used by the FPGA (and much of this is constant, mainly 0's, and is assumed to be so within the Verilog code, hence that 384'h string). To further confuse things, the TCL mining script only sends 32 bytes (256 bits) of those 64 bytes to the FPGA, since this is the most the altsource_probe allows per channel. The nonce is appended within the FPGA, plus the 384'h constant part, and then the full 64 bytes (512 bits) is hashed TWICE.
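The padding described above is just the standard SHA-256 rule applied to an 80-byte message. A sketch of the layout (the header here is a zero-filled placeholder, not real getwork data):

```python
import hashlib

def pad_header(header80):
    """Standard SHA-256 padding for an 80-byte message: append 0x80,
    zero-fill, then the 64-bit big-endian bit length (640 = 0x280),
    giving the 128 bytes that getwork supplies as "data"."""
    assert len(header80) == 80
    return header80 + b'\x80' + bytes(39) + (80 * 8).to_bytes(8, 'big')

header = bytes(80)                 # placeholder 80-byte block header
padded = pad_header(header)
assert len(padded) == 128          # splits into two 64-byte chunks
chunk1, chunk2 = padded[:64], padded[64:]
# chunk2 = 16 variable header bytes + 48 constant padding bytes, i.e.
# the 384'h constant region baked into the Verilog, plus the 0x280 tail.
assert chunk2[-8:] == (640).to_bytes(8, 'big')

# Bitcoin's double SHA-256 of the raw header. hashlib applies exactly
# the padding constructed above internally, so this is equivalent to
# hashing chunk1 (-> midstate), then chunk2, then hashing the 32-byte
# digest once more.
block_hash = hashlib.sha256(hashlib.sha256(header).digest()).digest()
assert len(block_hash) == 32
```

The 16 variable bytes in chunk2 are the tail of the header (merkle root remainder, timestamp, target bits, and the 4-byte nonce), which is where the 96 + 32 variable bits come from.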
That's my take on it anyway, if one of the experts wants to correct me, then feel free.
Mark