Finished adding the other half of the block header, not shown earlier, to the output of the tests. I also put in the final XOR with input block stage of Blake-256, which I forgot about when testing, so that it's a full Decred process that will work, and ported the tests to match. Also changed the protocol to something more Icarus-like, in preparation for CGMiner support, and changed some stuff in the core transform, which helped me drop slices and make the design smaller.
Previously, slice usage was ~1,300 out of the LX9's 1,430 (about 91%). This latest synthesis dropped it to 1,205 (about 84%), but I don't think that's enough free logic yet to let me speed it up.
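For anyone following along, here is a rough Python sketch of what that last stage looks like, assuming the "final XOR" means the standard Blake-256 finalization, where each chaining word is XORed with a salt word and two words of the internal state. The function name `blake256_finalize` is made up for illustration, and this is not the HDL from the design, just the arithmetic it has to reproduce.

```python
MASK32 = 0xFFFFFFFF  # Blake-256 works on 32-bit words


def blake256_finalize(h, v, s=(0, 0, 0, 0)):
    """Finalization step of one Blake-256 compression (hypothetical helper).

    h: the 8 chaining words that went into the compression
    v: the 16 internal state words after the last round
    s: the 4 salt words (assumed all zero here, the default for unsalted hashing)
    """
    if len(h) != 8 or len(v) != 16 or len(s) != 4:
        raise ValueError("expected 8 chaining words, 16 state words, 4 salt words")
    # h'[i] = h[i] ^ s[i mod 4] ^ v[i] ^ v[i + 8]
    return [(h[i] ^ s[i % 4] ^ v[i] ^ v[i + 8]) & MASK32 for i in range(8)]
```

Leaving this step out gives a state that looks like a hash but never matches a reference implementation, which is why the tests had to be ported along with it.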
Sounds promising. Do you expect similar speeds to other Blake algo coins on FPGA devices? I am also curious whether you are using a more modern, readily available FPGA device.
Backstory is, this implementation was done simply because people kept telling me it couldn't be done - the current public implementation of Blake-256 takes up 16k slices on a Spartan-6 LX150 - so supposedly it was definitely impossible to fit on an LX9. So I did it. Performance is not going to be good on the LX9, of course - I'm using a Mojo v3 for testing. The real chip design is in development on my SoCKit, targeting larger chips and the most hashrate I can get out of it - details on the board here:
http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=165&No=816&PartNo=1 It's got a Cyclone V FPGA and a dual-core ARMv7 on the same chip, with 1GB of DDR3 connected to the ARMv7 side and another 1GB connected to the FPGA. It has TONS of wonderful things onboard to program - I've already played with the software side of it, compiling the kernel with Altera's patches, compiling the preloader and u-boot, and putting Arch Linux ARM on it - but now I'll be working with the FPGA more, and connecting the two together.
It can be a fully self-contained miner, too; here's just a short list of fun ideas I've had so far:
- Using the onboard LCD to output hashrate
- Reading temps off the sensor and displaying that
- Connecting some of the buttons/switches to start/stop hashing
- Lighting an LED when hashing is enabled and turning it off when disabled
- LED flash on share found
- Using hardware interrupts to notify the CPU of shares
- Putting the ARMv7 into low-power mode while mining
The last two kind of help each other - you can signal an interrupt from the FPGA and handle it in your own Linux kernel module, which passes it to the userspace miner and lets it check and submit, if need be. The advantage here is no need for polling, which reduces the shit the CPU has to execute - add that to enabling the low-power state and it could be awesome.
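A minimal sketch of the userspace half of that idea, in Python. It assumes the kernel side exposes each interrupt as a 4-byte counter on a readable file descriptor, which is how Linux's generic UIO devices behave; a custom kernel module could hand shares over in some other way entirely. The names `wait_for_share_event` and `demo` are hypothetical, and a pipe stands in for the device so the sketch runs without any hardware.

```python
import os
import select
import struct


def wait_for_share_event(fd, timeout=None):
    """Sleep until fd becomes readable, then return the 32-bit event count.

    Returns None if the timeout expires first. The process is not scheduled
    while it waits, so there is no polling loop burning CPU time.
    """
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return None
    data = os.read(fd, 4)
    if len(data) != 4:
        raise OSError("short read from interrupt source")
    # Native-endian signed 32-bit integer, as UIO reports its interrupt count.
    return struct.unpack("=i", data)[0]


def demo():
    """Stand-in for real hardware: a pipe plays the part of the device node."""
    r, w = os.pipe()
    try:
        os.write(w, struct.pack("=i", 1))  # pretend the FPGA raised one interrupt
        return wait_for_share_event(r, timeout=1.0)
    finally:
        os.close(r)
        os.close(w)
```

On real hardware the miner would open the device node once and call `wait_for_share_event` in a loop, reading the nonce from the FPGA and checking it only after each wakeup.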