
Topic: New scalable pipelined FPGA core for SHA-256 - any interest? (Read 14934 times)

full member
Activity: 196
Merit: 100
Hello,

Is there any news, and where can I get the code to try it on newer hardware?

Cheers...


WOW, serious necro there man!

Why not PM the guy, he's still active on the forum.
full member
Activity: 128
Merit: 100
Hello,

Is there any news, and where can I get the code to try it on newer hardware?

Cheers...
newbie
Activity: 42
Merit: 0
Any chance of seeing the code you developed? I am very curious and would like to try it on the LX150 devboard I just purchased.

Also see this thread:
http://forum.bitcoin.org/index.php?topic=29169.0

I see that mpfrank has rolled his own pipelined VHDL... http://forum.bitcoin.org/index.php?topic=22415.0   I'd like to try it, but I can't send him a message!   Can someone let him know I'd like to try his code on a Kintex?

Xilinx_Guy
newbie
Activity: 36
Merit: 0
I've got a Stratix IV dev board (the GX 230 model) that I'd try your code on. It seems like the mining program is where more of the inefficiencies lie. I'm running two cores at 240 MHz and getting around 200-300 Mhash/s.

Using OrphanGland's code and fpgaminer's mining program.

5grand...


Did you try overclocking it to the 500 MHz that it supposedly supports?
member
Activity: 89
Merit: 10

I think you should target the Xilinx Spartan-6 LX150(T): XC6SLX150-2FGG484C
It costs about $170 at Digi-Key for one, but I've heard $120 with some volume.

It does, however, have fewer global routing layers compared to the Virtex family and
would need some "massaging" to extract maximum performance.
My test implementation, without any optimization, got 180 Mhash/s according to the Xilinx ISE tool.

I'm working on a board with several of these, and others are also making Spartan-6 boards.

Next year it seems the Artix-7 series will give us the most "bang for the buck".
newbie
Activity: 9
Merit: 0
I am curious how your design would compare to the current FPGA champ according to https://en.bitcoin.it/wiki/Mining_hardware_comparison#FPGA_Devices

As of this writing, the best performance seems to be around 110 Mhash/s on a $299 (academic) Terasic DE2-115 dev board.

I think a worthy goal would be adapting your design to a lower-cost board and finding a sweet spot for performance/price.
member
Activity: 99
Merit: 10
I've got a Stratix IV dev board (the GX 230 model) that I'd try your code on. It seems like the mining program is where more of the inefficiencies lie. I'm running two cores at 240 MHz and getting around 200-300 Mhash/s.

Using OrphanGland's code and fpgaminer's mining program.
newbie
Activity: 36
Merit: 0
What about the Xilinx Virtex-7? Would that be pretty fast?
hero member
Activity: 518
Merit: 500
Well, you could have simply googled it.
http://www.google.com/search?ie=UTF-8&oe=UTF-8&sourceid=navclient&gfns=1&q=altera+terasic+de3

The first result shows that it ships with a 250 W power supply, so you can guess that without all the extensions it will probably consume around 100-150 watts.
legendary
Activity: 1148
Merit: 1001
Bitcoin Mhash/s per FPGA:                        150 - 165 Mhash/s (temp-dependent)

Consumption?
hero member
Activity: 518
Merit: 500
You could go for an Altera Cyclone II; I'm using those for my fpgaminer implementation. They are cheap (I think the boards should be around $150), consume almost no energy, and are a good starter. I can get around 30 MH/s on my Cyclone IIs; however, you will have to (as I did) work out something for data exchange, as those boards come with minimal interfaces (I implemented a GPIO-to-RS-232 link which gets/sends its work from a modified pushpool instance; no long polling, however). Also, my design needs to search through the entire nonce space to conserve bandwidth.

However, my design is still way too unoptimized to be release-quality, but the Cyclone II is a good starter (oh, and 30 MH/s at ~10 W is also quite attractive).
legendary
Activity: 1428
Merit: 1000
I am very interested in this (not only BTC, I just want to dive into FPGA development).

Can you recommend any PCIe FPGA card (max $1000) to start with?

Mining should work... so it can mine while I am at work...
sr. member
Activity: 247
Merit: 250
Cosmic Cubist
Hi,

What is the end target of your project?

1) Mine on FPGAs (meaning you think you can find a way to optimize the design to the point where it becomes efficient)?
2) This will be a prototype for an ASIC implementation
3) It's just for fun

22$/MHash/s right now (165 MHash/s on a 3777$ chip)...not competitive but you gotta start somewhere (just being able to make something that works from scratch is impressive, good job). I think it does not matter as long as you have a plan in mind to make it efficient enough.

I know it's not cost-competitive with GPUs at this point.  But I think the goal here is:

1) People who already happen to have spare FPGA boards lying around can use them to mine BTC in a power-efficient and reasonably productive way
2) Prototype for an ASIC, as you said (though this takes big capital)
3) For fun and to learn about the technical innards of Bitcoin.  :D
newbie
Activity: 29
Merit: 0
Hi,

What is the end target of your project?

1) Mine on FPGAs (meaning you think you can find a way to optimize the design to the point where it becomes efficient)?
2) This will be a prototype for an ASIC implementation
3) It's just for fun

22$/MHash/s right now (165 MHash/s on a 3777$ chip)...not competitive but you gotta start somewhere (just being able to make something that works from scratch is impressive, good job). I think it does not matter as long as you have a plan in mind to make it efficient enough.
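
The cost figure above is simply price divided by hash rate. A minimal sketch of that arithmetic in Python follows, using only rough prices and rates quoted by posters in this thread (none of them verified measurements). The function name `dollars_per_mhash` and the board labels are made up for illustration. Note that 3777 / 165 comes out closer to 22.9, so the "22" above looks like a truncation.

```python
def dollars_per_mhash(price_usd, mhash_per_s):
    """Cost efficiency: dollars paid per Mhash/s of throughput (lower is better)."""
    return price_usd / mhash_per_s

# (label, approximate price in USD, approximate Mhash/s), all as quoted in this thread.
boards = [
    ("Stratix III EP3SL150 (chip only)", 3777, 165),
    ("Terasic DE2-115 (academic price)", 299, 110),
    ("Spartan-6 LX150 (chip only, ISE estimate)", 170, 180),
    ("Cyclone II board (approximate)", 150, 30),
]

# Print the boards from most to least cost-effective.
for name, price, rate in sorted(boards, key=lambda b: dollars_per_mhash(b[1], b[2])):
    print(f"{name:45s} {dollars_per_mhash(price, rate):6.2f} $/Mhash/s")
```

The comparison is only as good as its inputs: chip-only prices ignore the board, power supply and host interface, and the Spartan-6 rate is a tool estimate rather than a measured result.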
sr. member
Activity: 247
Merit: 250
Cosmic Cubist
You know that the last four rounds of sha256 can be eliminated when mining, right?

No, I didn't know that, but once that insight is combined with my new core it should improve its performance even further.  :D
sr. member
Activity: 247
Merit: 250
Cosmic Cubist
I think you're looking for this thread: http://forum.bitcoin.org/index.php?topic=9047.0;topicseen

Yes, I've already been looking at that system.  I think if my new core is integrated into it, that might improve the fpgaminer's performance.
staff
Activity: 4284
Merit: 8808
There is freely available code for this (In VHDL), you might save some time and modify what is already there.
Or is that what you are working on?

After reading about what was available, I thought I'd roll my own from scratch and see if I could do better.  I think my new core is nearly as efficient as possible. 

You know that the last four rounds of sha256 can be eliminated when mining, right?
member
Activity: 109
Merit: 10
sr. member
Activity: 247
Merit: 250
Cosmic Cubist
There is freely available code for this (In VHDL), you might save some time and modify what is already there.
Or is that what you are working on?

After reading about what was available, I thought I'd roll my own from scratch and see if I could do better.  I think my new core is nearly as efficient as possible. 
hero member
Activity: 770
Merit: 500
There is freely available code for this (In VHDL), you might save some time and modify what is already there.
Or is that what you are working on?
sr. member
Activity: 247
Merit: 250
Cosmic Cubist
Hi, I recently graduated from the newbie board, and thought I'd repost this.  I know there have been a number of FPGA mining threads already, but I thought I'd share my contribution...

I've been developing a new optimized SHA-256 core in VHDL.  The design philosophy of this version revolves around these points:

1) Reorganize and aggressively pipeline the round processor so as to achieve a clock frequency (and hardware efficiency) close to the maximum possible on a given FPGA.  (The critical-path delay of this particular design should be no more than one 32-bit add delay, plus register setup time.)

2) For improved scalability to maximally utilize FPGAs of any size, don't unroll the round loop, but instead build a small iterative, single-round processor, many copies of which can be operated in parallel.  Each of these cores can simultaneously hash as many block candidates as it has pipeline stages (4 in this design).  A properly designed work-dispatch unit (still to be written) can ensure that all cores always stay fully utilized hashing block candidates.
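
To see why a 4-stage iterative core averages 64 cycles per SHA-256 chunk only when four candidates are interleaved, here is a rough timing model in Python (not the VHDL core itself, and not a SHA-256 implementation). It assumes each round of one candidate takes `STAGES` cycles to pass through the pipeline and that a new candidate can enter as soon as a slot frees up. The function name `cycles_to_hash` and the slot model are illustrative assumptions, not taken from the actual design.

```python
from collections import deque

STAGES = 4    # pipeline depth stated in the post
ROUNDS = 64   # SHA-256 rounds per chunk

def cycles_to_hash(num_candidates):
    """Count clock cycles needed to push num_candidates through one core.

    Each pipeline slot holds (candidate_id, rounds_completed) or None (a bubble).
    Every cycle, the item leaving the last stage has completed one more round
    and is fed back into the first stage until all ROUNDS are done.
    """
    pending = deque(range(num_candidates))
    pipeline = deque([None] * STAGES)
    finished = 0
    cycles = 0
    while finished < num_candidates:
        cycles += 1
        leaving = pipeline.pop()            # item at the last stage
        entering = None
        if leaving is not None:
            cid, rounds_done = leaving
            rounds_done += 1
            if rounds_done == ROUNDS:
                finished += 1               # hash complete, slot is free
            else:
                entering = (cid, rounds_done)
        if entering is None and pending:
            entering = (pending.popleft(), 0)  # dispatch a new candidate
        pipeline.appendleft(entering)
    return cycles

for n in (1, 4, 400):
    total = cycles_to_hash(n)
    print(f"{n:4d} candidates: {total} cycles, {total / n:.2f} cycles per hash")
```

In this model a single candidate needs 257 cycles because three of the four slots carry bubbles, while four interleaved candidates finish in 260 cycles in total. With a steady supply of work the average approaches 64 cycles per hash, which is the job of the work-dispatch unit mentioned above.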

To illustrate this approach's performance, here are some stats for the current design, based on its compilation for a Stratix III FPGA (EP3SL150F1152C2N, as found in the Altera/Terasic DE3 board).

Area for 1 core, including test rig:   2,113 cells (plus a little memory)
Maximum frequency:                     385 - 421 MHz (depending on temperature)
Clock cycles per SHA-256 (1 chunk):    64 (on average, if pipeline is kept full)
Clock cycles per double-SHA-256:       128 (ditto)
Bitcoin Mhash/s per core:              3.0 - 3.3 (temp-dependent)
Cores per FPGA:                        At least 50
Bitcoin Mhash/s per FPGA:              150 - 165 Mhash/s (temp-dependent)
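
As a quick sanity check on these figures, the per-core rate is the clock frequency divided by the 128 cycles of a double SHA-256, and the per-FPGA rate is that times the number of cores. A small Python sketch follows; the function names are made up for illustration, and it assumes the pipeline is always kept full and that all 50 cores reach the same clock.

```python
def mhash_per_core(f_mhz, cycles_per_double_sha=128):
    """Hashes per microsecond (= Mhash/s) for one core clocked at f_mhz."""
    return f_mhz / cycles_per_double_sha

def mhash_per_fpga(f_mhz, cores=50, cycles_per_double_sha=128):
    """Aggregate rate, assuming every core runs at the same clock."""
    return cores * mhash_per_core(f_mhz, cycles_per_double_sha)

# The two ends of the temperature-dependent frequency range quoted above.
for f in (385, 421):
    print(f"{f} MHz: {mhash_per_core(f):.2f} Mhash/s per core, "
          f"{mhash_per_fpga(f):.1f} Mhash/s per FPGA")
```

This gives roughly 3.0 to 3.3 Mhash/s per core and about 150 to 164 Mhash/s for 50 cores, in line with the rounded numbers in the table.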

This particular FPGA is rather expensive; I haven't yet researched which FPGA platform would be most cost-effective for this design.  But, if anyone else is interested in exploring this line of work, and helping to integrate this new core into a more complete mining solution, I would be happy to release the code.