The hash function runs 524288 cycles (iterations), if I understood everything correctly.
From the article:
"Two new FPGA designs for the Advanced Encryption Standard
(AES) are presented. The first is believed to be the fastest, achieving 25 Gbps
throughput using a Xilinx Spartan-III (XC3S2000) device. The second is
believed to be the smallest and fits into a Xilinx Spartan-II (XC2S15) device,
only requiring two block memories and 124 slices to achieve a throughput of
2.2 Mbps."
That means an XC3S2000-5, at only $78, could generate 1490.12 h/s: https://www.digikey.com/product-detail/en/xilinx-inc/XC3S2000-5FGG456C/XC3S2000-5FGG456C-ND/1951750
plus SRAM such as https://www2.mouser.com/ProductDetail/GSI-Technology/GS8160Z36DGT-150?qs=sGAEpiMZZMt9mBA6nIyysJQc0%252bdJyBlRwbATLGJ3Hts%3d, which costs $14 for 2.25 MB.
That's about a hundred bucks for one and a half kh/s, and that's with off-the-shelf components from an online store. With an external AES coprocessor it could run even faster.
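The post doesn't show how 1490.12 h/s was obtained. One way to reproduce it, which is my inference and not stated above, is to divide the quoted 25 Gbps by 2^24 bits per hash (524288 × 32 bits, which is also 2 MiB). The sketch below does that and then works out the hardware cost per kh/s from the two listed prices. It assumes the 25 Gbps AES figure carries over unchanged to this hash, which is optimistic.

```python
# Back-of-envelope estimate. BITS_PER_ITERATION is an inferred assumption,
# chosen only because it reproduces the 1490.12 h/s figure from the post.
AES_THROUGHPUT_BPS = 25e9   # quoted Spartan-III (XC3S2000) AES throughput
ITERATIONS = 524288         # iterations per hash, as stated in the post
BITS_PER_ITERATION = 32     # assumption: 524288 * 32 bits = 2**24 bits (2 MiB)

FPGA_PRICE_USD = 78.0       # XC3S2000-5 price quoted in the post
SRAM_PRICE_USD = 14.0       # GS8160Z36 SRAM price quoted in the post


def hashes_per_second(throughput_bps: float, iterations: int, bits_per_iter: int) -> float:
    """Hash rate if every hash must push iterations * bits_per_iter bits through the AES core."""
    return throughput_bps / (iterations * bits_per_iter)


def usd_per_khs(total_usd: float, hashrate: float) -> float:
    """Hardware cost per kh/s."""
    return total_usd / (hashrate / 1000.0)


rate = hashes_per_second(AES_THROUGHPUT_BPS, ITERATIONS, BITS_PER_ITERATION)
total = FPGA_PRICE_USD + SRAM_PRICE_USD
print(f"{rate:.2f} h/s")  # 1490.12 h/s
print(f"${total:.0f} total, ${usd_per_khs(total, rate):.2f} per kh/s")  # $92 total, $61.74 per kh/s
```

So "about a hundred bucks" works out to roughly $62 per kh/s, before boards, power and assembly.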
DwarfMiner, can you tell us more about the architecture of the system?
I think you're completely ignoring memory latency in your implementation of the hash function.
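A rough ceiling set by the memory alone shows why this matters. The assumptions are mine, not the post's: the hash's working memory lives in the external SRAM, the "-150" suffix of the GS8160Z36DGT-150 means a 150 MHz clock, and every iteration needs at least one SRAM bus cycle. `cycles_per_iteration` is a hypothetical parameter; dependent addresses, reads plus writes and pipeline latency would all push it above 1.

```python
# Memory-bound ceiling on hash rate; all parameters are illustrative assumptions.
SRAM_CLOCK_HZ = 150e6   # assumption: "-150" speed grade read as 150 MHz
ITERATIONS = 524288     # iterations per hash, as stated in the post


def memory_bound_hashrate(clock_hz: float, iterations: int, cycles_per_iteration: float) -> float:
    """Upper bound on h/s when each iteration occupies cycles_per_iteration SRAM clocks."""
    return clock_hz / (iterations * cycles_per_iteration)


# Optimistic floor: a single SRAM cycle per iteration, nothing else stalls.
best_case = memory_bound_hashrate(SRAM_CLOCK_HZ, ITERATIONS, 1)
# Hypothetical example: 4 SRAM cycles per iteration.
with_4_cycles = memory_bound_hashrate(SRAM_CLOCK_HZ, ITERATIONS, 4)

print(f"best case: {best_case:.1f} h/s")  # best case: 286.1 h/s
print(f"4 cycles/iteration: {with_4_cycles:.1f} h/s")
```

Under these assumptions even the single-cycle bound is roughly five times below the 1490.12 h/s AES-only figure, so the SRAM, not the AES core, would set the pace.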