FPGA Development (SHA256 core) | Bitcointalksearch.org

tinkerer

newbie

Activity: 1

Merit: 0

I've got a DE0-Nano and have been trying unsuccessfully to get the open source fpga miner to compile for it. I'm new to FPGA development and was wondering if someone already has a port running on the Nano and could give me some advice.

Thanks!

LazarusLong

newbie

Activity: 16

Merit: 0

That one would rock!
http://cgi.ebay.de/Micro-Super-Computer-3-x-Altera-Stratix-III-FPGA-/130500717600

Sadly, it's above my budget.

eusor

newbie

Activity: 9

Merit: 0

Quote from: fpgaminer on June 25, 2011, 06:24:19 PM

DE0-Nano is $80 USD for a Cyclone 4 CE22. I haven't used one before, but it seems like a pretty good price point and the board is very spartan.

There are also small Xilinx boards out there that are cheap; just the chip, USB, and GPIO. I can't remember the name of it ... the FPGA mining thread lists it somewhere, amongst lots of other options.

Thanks, I'll check that out.

fpgaminer

hero member

Activity: 560

Merit: 517

DE0-Nano is $80 USD for a Cyclone 4 CE22. I haven't used one before, but it seems like a pretty good price point and the board is very spartan.

There are also small Xilinx boards out there that are cheap; just the chip, USB, and GPIO. I can't remember the name of it ... the FPGA mining thread lists it somewhere, amongst lots of other options.

eusor

newbie

Activity: 9

Merit: 0

Quote from: fpgaminer on June 25, 2011, 12:35:24 PM

How about this? http://www.fpga4fun.com/ISEQuickStart.html
And once you have that working, they have an assortment of other little projects with tutorials

Nice link, quite interesting!

I've done FPGA work in the past on an old Virtex-II Pro. Could be interesting to start playing again.

Could any of you recommend a good FPGA dev board to start with? Something that has a good power/price ratio?

Thanks

goxed

legendary

Activity: 1946

Merit: 1006

Bitcoin / Crypto mining Hardware.

Quote from: fpgaminer on June 25, 2011, 12:35:24 PM

How about this? http://www.fpga4fun.com/ISEQuickStart.html
And once you have that working, they have an assortment of other little projects with tutorials

Nallatech produces FPGA modules for Xeon sockets. One of these could be used for building FPGA miners
http://www.nallatech.com/Intel-Xeon-FSB-Socket-Fillers/fsb-expansion-module.html

fpgaminer

hero member

Activity: 560

Merit: 517

How about this? http://www.fpga4fun.com/ISEQuickStart.html
And once you have that working, they have an assortment of other little projects with tutorials

redhatzero

full member

Activity: 126

Merit: 100

BTW:
Can anyone recommend a nice FPGA/Verilog/VHDL tutorial?
I have one of these http://www.xess.com/prods/prod047.php and I'm still struggling with simple stuff like making an LED light up...

O_Shovah

sr. member

Activity: 410

Merit: 252

Watercooling the world of mining

I have opened a new thread specifically for the hardware development of a dedicated FPGA mining system: http://forum.bitcoin.org/index.php?topic=22426.0

I'd like to invite everybody, whatever their experience, to help plot out the hardware needed for a prototype of a modular mining system.

I would especially like to ask all of you who are currently developing this FPGA miner to help us determine which FPGA chips are needed at minimum to run one fully unrolled miner.

Thank you for your help

LazarusLong

newbie

Activity: 16

Merit: 0

Ahh, thanks! I always thought Python code was easy to read.

I will publish it when finished. Probably in about 1-2 weeks; I don't have that much spare time.

Bloody Bell

newbie

Activity: 18

Merit: 0

Quote from: LazarusLong on June 23, 2011, 05:56:31 AM

I'm porting miner.py to C because I have no Python on my embedded system.
Can someone explain to me what this Python snippet is all about:

Code:

self.fpga.write(struct.pack("B", 1) + job.state[::-1] + job.data[75:63:-1])

Does it mean bytes 75 to 63 are snipped out in reverse order???

b.r.
LazarusLong

The -1 after the second colon means reverse order, but in Python the number after the first colon is not the last element of the subrange; it's the first element not included. So job.data[75:63:-1] will give you the [75th, 74th, 73rd ... 65th, 64th] elements, i.e. 12 bytes, with index 75 first and index 64 last.
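To make the slice concrete, here is a minimal sketch. The `state` and `data` buffers are made-up stand-ins for `job.state` and `job.data` (their sizes, 32 and 128 bytes, are assumptions for illustration), and `build_packet` is a hypothetical helper name, not something from miner.py.

```python
import struct

# Hypothetical stand-ins for job.state and job.data. Each byte equals its
# own index, so the order of the sliced bytes is easy to read off.
state = bytes(range(32))
data = bytes(range(128))

# Start at index 75, step backwards, stop before index 63.
tail = data[75:63:-1]
assert len(tail) == 12
assert tail[0] == data[75] and tail[-1] == data[64]
# Same thing written as "take bytes 64..75, then reverse them".
assert tail == data[64:76][::-1]


def build_packet(state, data):
    """Mirror the expression from the post: one marker byte with value 1,
    the state reversed, then data bytes 75 down to 64."""
    return struct.pack("B", 1) + state[::-1] + data[75:63:-1]


packet = build_packet(state, data)

# A C port of the slice would be a loop along the lines of:
#   for (i = 75; i > 63; i--) out[n++] = data[i];
print(len(packet), list(tail))
```

With these buffer sizes the packet is 1 + 32 + 12 = 45 bytes long.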

btw, do you plan to make the C version public?

LazarusLong

newbie

Activity: 16

Merit: 0

I'm porting miner.py to C because I have no Python on my embedded system.
Can someone explain to me what this Python snippet is all about:

Code:

self.fpga.write(struct.pack("B", 1) + job.state[::-1] + job.data[75:63:-1])

Does it mean bytes 75 to 63 are snipped out in reverse order???

b.r.
LazarusLong

fpgaminer

hero member

Activity: 560

Merit: 517

Great work OrphanedGland

And thank you for opening a thread in the Newbies section. It will be a pain to keep track of yet another thread, but I guess it is our only option for now.

If you want, I will happily put your code up on the public repo.

njloof

member

Activity: 73

Merit: 10

Quote from: ?? on ??

Subscribe (don't mind me)

Am I smoking crack or is there a "notify" button that does this same thing without the threadbump?

OrphanedGland

member

Activity: 70

Merit: 10

Quote from: mpfrank on June 12, 2011, 09:40:45 AM

Quote from: OrphanedGland on June 12, 2011, 09:11:38 AM

Quote from: makomk on June 12, 2011, 05:56:21 AM

Wow - that's fairly impressive. I guess precalculating must pay off in a big way, though that's probably not really surprising if you think about it. Managed to get it submitting shares yet? (I'm also curious if you've found a clean way to handle the parts of W that can't be precomputed; it's obviously doable but the obvious ways are really messy.)

Also, you're right about the Cyclone FPGAs not being able to combine combinational functions with registers very effectively. All their registers are hard-wired to the output of the LUTs and other logic devices, which means that if you need to feed a register from somewhere else (like from the output of a register) you can't use the LUT attached to that register for anything else.

I wonder if this'd fit into the EP4CE75...

I haven't spent any time on optimizing W calcs, mainly because the worst case path delay is caused by calculation of the A parameter. The H+K+W precalc is the simplest way to improve performance as H, K, W are all known in the previous stage. I get slightly better performance gains by further pipelining the A and E equations, although this seems to benefit Cyclone more than Stratix IV, perhaps because of fast carry chains in the Stratix device? The difficulty with pipelining the unrolled loop stages is that the equations for A/E change, and special cases need to be handled for the first and last few unrolled stages.

Also, I haven't run this on an FPGA card yet, only simulated the core in ModelSim. I still need to create a top-level file similar to fpgaminer's and cascade two of these SHA256 cores. A fully unrolled and pipelined design will not fit in the EP4CE75; you should be going for a partially unrolled solution.

Have you considered using carry-save adders to achieve faster clock speeds? Using carry-save effectively pipelines long carry chains, and usually means you can achieve an adder throughput at the limiting clock speed that is achievable for 1 combinational LUT stage between each stage of pipeline registers. I've found that the adder megafunctions included in Altera's tools cannot run as fast.

Seems like a worthy consideration
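For anyone following along, the H+K+W precalc from the quoted post can be modelled in software. This is only a sketch in Python, not the Verilog core: the function names are made up, and `k` and `w` are arbitrary random words rather than the real SHA-256 constants and message schedule. It relies on one property of the round function: the next round's H is simply this round's G, so H+K+W for stage t+1 can be summed during stage t, off the critical path.

```python
import random

MASK = 0xFFFFFFFF  # all arithmetic is modulo 2**32


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def big_sigma0(a):
    return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)


def big_sigma1(e):
    return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)


def ch(e, f, g):
    return (e & f) ^ (~e & g & MASK)


def maj(a, b, c):
    return (a & b) ^ (a & c) ^ (b & c)


def rounds_plain(state, k, w):
    """Textbook SHA-256 rounds: T1 is a five-operand sum in every stage."""
    a, b, c, d, e, f, g, h = state
    for t in range(len(k)):
        t1 = (h + big_sigma1(e) + ch(e, f, g) + k[t] + w[t]) & MASK
        t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return (a, b, c, d, e, f, g, h)


def rounds_precalc(state, k, w):
    """Same rounds, but H+K+W for each stage is summed one stage early."""
    a, b, c, d, e, f, g, h = state
    n = len(k)
    hkw = (h + k[0] + w[0]) & MASK  # prepared before the first stage
    for t in range(n):
        t1 = (hkw + big_sigma1(e) + ch(e, f, g)) & MASK  # three operands now
        t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
        if t + 1 < n:
            # The next stage's H is this stage's G, so its H+K+W is known now.
            hkw = (g + k[t + 1] + w[t + 1]) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return (a, b, c, d, e, f, g, h)


rng = random.Random(0)
state = tuple(rng.getrandbits(32) for _ in range(8))
k = [rng.getrandbits(32) for _ in range(64)]
w = [rng.getrandbits(32) for _ in range(64)]
assert rounds_plain(state, k, w) == rounds_precalc(state, k, w)
```

In hardware the `hkw` value would sit in a pipeline register, which is where the shorter critical path comes from; the Python model only checks that the two formulations agree.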

phillipsjk

legendary

Activity: 1008

Merit: 1001

Let the chips fall where they may.

Quote from: mjoz on June 12, 2011, 09:29:57 AM

For that kind of money you can buy about 200GH/s through noisy, power-hungry, heat-producing rigs. FPGA has a long way to go unless you're rich and have an irrational desire to go green regardless of the expense.

If you are generating your own power, the start-up costs are cheaper if you can reduce power usage significantly. I did the math for solar power in this post.

Quote

12 Watts is 288 Watt-hours (1.04 MJ) per day. A 12V battery would need a capacity of at least 24 Amp-hours to supply that much load all day (6 Amp-hours for a 48V battery). You will want to be able to fully charge the battery in full sun during the day. To do this, the solar panels must charge the battery within ~8 hours (preferably 6). That will take at least 36 Watts (assuming 100% battery efficiency) + the 12 Watts you are constantly drawing (48 Watts). Round up to a 60 Watt panel. For a 60 Watt load, you need to multiply all those numbers by 5 (30 Amp-hour 48 Volt battery, 300 Watts of panels).

For a 600 Watt load, multiply the 60 Watt load by 10: 300 amp-hour 48V battery, 3000 Watts of panels.
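The arithmetic above can be reproduced with a few lines of Python. This is just a sketch of the same back-of-the-envelope sizing, with the same assumptions as the quote (100% battery efficiency, a full recharge within 8 hours of sun); `size_solar` and its parameter names are made up for illustration.

```python
def size_solar(load_w, battery_v, charge_hours=8.0):
    """Back-of-the-envelope sizing for a constant load running 24 hours a day."""
    energy_wh = load_w * 24             # energy drawn per day
    battery_ah = energy_wh / battery_v  # capacity needed to carry a full day
    charge_w = energy_wh / charge_hours  # power needed to refill the battery
    panel_w = charge_w + load_w         # the load keeps drawing while charging
    return {
        "energy_wh": energy_wh,
        "energy_mj": energy_wh * 3600 / 1e6,
        "battery_ah": battery_ah,
        "charge_w": charge_w,
        "panel_w": panel_w,
    }


for load, volts in [(12, 12), (12, 48), (60, 48), (600, 48)]:
    print(load, "W load,", volts, "V battery:", size_solar(load, volts))
```

For the 12 Watt case this gives 288 Watt-hours, 24 Amp-hours at 12V (6 at 48V) and a 48 Watt panel minimum, matching the quote. The 300 and 3000 Watt panel figures come from rounding 48 up to 60 Watts before scaling; without that rounding the function returns 240 and 2400 Watts.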

The beauty of it is that once the infrastructure is paid off, you have "free" (but limited) power. You can still keep power hungry machines using grid power on standby in case the network hash rate drops for whatever reason.

Edit: batteries would probably need replacing every 5 years.

PS: Solar panels are just an example. Once bitcoin mining goes industrial, we will see the large miners building mega-projects.

mpfrank

sr. member

Activity: 247

Merit: 250

Cosmic Cubist

Quote from: OrphanedGland on June 12, 2011, 09:11:38 AM

Quote from: makomk on June 12, 2011, 05:56:21 AM

Wow - that's fairly impressive. I guess precalculating must pay off in a big way, though that's probably not really surprising if you think about it. Managed to get it submitting shares yet? (I'm also curious if you've found a clean way to handle the parts of W that can't be precomputed; it's obviously doable but the obvious ways are really messy.)

Also, you're right about the Cyclone FPGAs not being able to combine combinational functions with registers very effectively. All their registers are hard-wired to the output of the LUTs and other logic devices, which means that if you need to feed a register from somewhere else (like from the output of a register) you can't use the LUT attached to that register for anything else.

I wonder if this'd fit into the EP4CE75...

I haven't spent any time on optimizing W calcs, mainly because the worst case path delay is caused by calculation of the A parameter. The H+K+W precalc is the simplest way to improve performance as H, K, W are all known in the previous stage. I get slightly better performance gains by further pipelining the A and E equations, although this seems to benefit Cyclone more than Stratix IV, perhaps because of fast carry chains in the Stratix device? The difficulty with pipelining the unrolled loop stages is that the equations for A/E change, and special cases need to be handled for the first and last few unrolled stages.

Also, I haven't run this on an FPGA card yet, only simulated the core in ModelSim. I still need to create a top-level file similar to fpgaminer's and cascade two of these SHA256 cores. A fully unrolled and pipelined design will not fit in the EP4CE75; you should be going for a partially unrolled solution.

Have you considered using carry-save adders to achieve faster clock speeds? Using carry-save effectively pipelines long carry chains, and usually means you can achieve an adder throughput at the limiting clock speed that is achievable for 1 combinational LUT stage between each stage of pipeline registers. I've found that the adder megafunctions included in Altera's tools cannot run as fast.
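As a rough illustration of the carry-save idea, here is a bit-level model in Python. It is a sketch of the general technique, not of any particular HDL implementation or Altera megafunction; `csa` and `add_many` are made-up names. A 3:2 compressor turns three operands into a sum word and a carry word using only per-bit logic, so no carry ripples across the 32 bits until the single final addition.

```python
import random

MASK = 0xFFFFFFFF  # 32-bit words, as in SHA-256


def csa(a, b, c):
    """3:2 compressor: per-bit full adders with no carry propagation.

    Returns (s, cy) such that s + cy == a + b + c (mod 2**32).
    """
    s = a ^ b ^ c                                   # sum bit of each full adder
    cy = (((a & b) | (a & c) | (b & c)) << 1) & MASK  # carry bits, shifted up
    return s, cy


def add_many(operands):
    """Add any number of 32-bit words with a single carry-propagate add.

    Each csa() call replaces three operands with two, so the slow rippling
    addition happens only once, at the very end.
    """
    ops = list(operands)
    while len(ops) > 2:
        s, cy = csa(ops.pop(), ops.pop(), ops.pop())
        ops.extend([s, cy])
    return sum(ops) & MASK


rng = random.Random(0)
words = [rng.getrandbits(32) for _ in range(5)]  # e.g. the five terms of T1
assert add_many(words) == sum(words) & MASK
```

In a pipelined design the sum and carry words could be registered between stages and only resolved where a true binary value is needed, which is presumably the sense in which carry-save "pipelines" the long carry chain.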

ijuz

newbie

Activity: 6

Merit: 0

Quote from: LCID Fire on June 12, 2011, 09:35:24 AM

I would be very interested in that code and perhaps find out whether we can port it to run on GPUs as well.

How do you run Verilog code on a GPU? ;-)

ijuz

newbie

Activity: 6

Merit: 0

Quote from: OrphanedGland on June 12, 2011, 09:26:31 AM

No placement constraints, and virtual pins defined. Clock rate will probably drop when more of these are packed in but I would still expect > 200MHz on a full device.

Nice, Quartus seems to be much better in this regard than ISE.
I built a quarter-sized pipeline (no loopback, just to see how it behaves) for Virtex-6 and it worked decently; a full-sized pipeline already took a _very_ long build time and the reachable frequency was pretty bad or edatastic.

LCID Fire

newbie

Activity: 1

Merit: 0

I would be very interested in that code and perhaps find out whether we can port it to run on GPUs as well.
That's currently the part of the miners that still runs on the CPU.

Topic: FPGA Development (SHA256 core) (Read 13611 times)