If I didn't use the DSP48s, I could only fit 2 copies of the unrolled code. I didn't try to optimize the code in any other way.
Thank you very much for your valuable input. If you have a moment, could you please post a snippet of HDL code that shows how you convinced ISE DS to use DSP48s for adders? Does ISE have a flag that makes it infer DSP48s from additions, or did you have to instantiate them explicitly?
Since my last post in this thread I have learned a lot about the ISE software. The license is node-locked to the Ethernet MAC address using standard FlexLM technology, so I can design on one system and run the design on another. I had been afraid of a node-locking scheme that would require the ML605 board to be connected to the system running ISE so that it could check the license.
Also, would you dare to speculate on the initial pricing of the Kintex-7 KC705 evaluation kit? I hesitate to buy an ML605 right now because I could not start working on it immediately; I first need to reorganize and remodel my physical workspace. On the other hand, I'm completely fascinated with contemporary FPGA design after a long break from any hardware-oriented design.
Hi, here's a section from the sha256_transform.v. I picked one of the larger adders to replace with DSPs to help preserve logic resources.
I replaced a 4 input adder with a cascade of 2 DSPs. I used coregen to generate two different DSP instances: a 2 input adder (dsp_2_input) that drives its dedicated cascade output (pcout), and a 3 input adder (dsp_3_input_cascade) that takes that sum in on its dedicated cascade input (pcin).
I know you can ask ISE to infer DSP48s, but I think that's more a shotgun approach that I've never had much luck with.
//////////////// Begin DSP adder new_w ////////////
//wire [31:0] new_w = s1_w + rx_w[319:288] + s0_w + rx_w[31:0];
wire [31:0] new_w;
wire [47:0] new_w_stage1_pcout;
wire [47:0] new_w_stage2_out;
// Stage 1: s1_w + rx_w[319:288]; the sum leaves only on the cascade port
dsp_2_input new_w_stage1 (
.c(s1_w), // input [31 : 0] c
.concat(rx_w[319:288]), // input [31 : 0] concat
.pcout(new_w_stage1_pcout), // output [47 : 0] pcout
.p()); // output [31 : 0] p (unused)
// Stage 2: pcin + s0_w + rx_w[31:0]
dsp_3_input_cascade new_w_stage2 (
.pcin(new_w_stage1_pcout), // input [47 : 0] pcin
.c(s0_w), // input [31 : 0] c
.concat(rx_w[31:0]), // input [31 : 0] concat
.pcout(), // output [47 : 0] pcout (unused)
.p(new_w_stage2_out)); // output [47 : 0] p
// Keep the low 32 bits; SHA-256 additions are modulo 2^32
assign new_w = new_w_stage2_out[31:0];
/////////////// End DSP adder /////////////
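For comparison, here is a minimal sketch of what the inference route might look like. It assumes XST's use_dsp48 synthesis attribute and is not something I have built for this design. The module name new_w_adder_inferred and the port names w_hi and w_lo are made up for illustration; they stand in for rx_w[319:288] and rx_w[31:0].

```verilog
// Hypothetical wrapper, not part of the original sha256_transform.v.
// Assumption: XST honors use_dsp48 = "yes" for this adder on the target
// device. Whether it really lands in DSP48s has to be checked per build.
(* use_dsp48 = "yes" *)
module new_w_adder_inferred (
    input  wire [31:0] s1_w,
    input  wire [31:0] w_hi,   // stands in for rx_w[319:288]
    input  wire [31:0] s0_w,
    input  wire [31:0] w_lo,   // stands in for rx_w[31:0]
    output wire [31:0] new_w
);
    // All operands and the result are 32 bits wide, so carries out of
    // bit 31 are dropped, matching the modulo 2^32 adds SHA-256 needs.
    assign new_w = s1_w + w_hi + s0_w + w_lo;
endmodule
```

If you try this, look at the device utilization summary in the synthesis report to see whether any DSP48E1s were actually used. The tool is free to map the adder differently than the hand-built cascade, which is part of why I call inference a shotgun approach.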
As for the KC705 boards, my guess is that they will cost about the same as the ML605s, around $2000. Watch out, though: the first runs of the boards are going to carry ES (engineering sample) parts, and those can have bugs.
At work, we have actually already built boards (not for bitcoin of course!) using ES K7 325T devices. I haven't found any huge speed advantages over the V6 devices, so I would not expect any huge frequency increases. The device that's going on the KC705 board is going to be larger than the one on the ML605 (325 vs 240?), so you'll have more room for your design. The number of DSP48s, however, is about the same.
As for porting the code from V6 to K7: if you took the bitcoin code as is, it would build for both just fine, except for pin changes of course. If you started using DSP48s on the V6, you'd probably have to regenerate those cores for the K7, but that's not a big deal.