Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013) - page 20.

kramble

sr. member

Activity: 384

Merit: 250

Quote from: hardcore-fs on November 02, 2012, 05:52:34 PM

On the FPGA heat, go looking on your PCB for the voltage regulators, then measure the temp on these beasts.

If it is high then COOL the regulator then re-measure the FPGA temp.

When the FPGA "core" voltage drifts, the FPGA gets hot, but that can be due to the switched-mode regulators drifting because they are hot.

Cooling the FPGA in this situation does very little, since the regulators will fail and may still even damage the FPGA because of the drift.

Good point. It's a linear regulator (LP385005D), and since it's dropping from 5V (actually 4.2V under load) to 1.2V (chained via a 3.3V regulator, you can see why Makomk thought it was cheap!), most of the power dissipation is in the regulators rather than the EP4CE22F17C8N. The FPGA itself was getting to around 60C (I reckon that if I can touch it without burning my finger, it's still at a safe junction temperature).

*breathe!*

So what the regulators were doing I hate to guess.
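To put rough numbers on the regulator heat, here is a minimal Python sketch of linear-regulator dissipation. The 4.2V and 1.2V figures come from the post; the 0.5A core current is purely hypothetical (no current was measured in this thread), and the function names are made up for illustration. It ignores regulator quiescent current and anything else hanging off the 3.3V rail.

```python
def linear_regulator_loss_w(v_in, v_out, current_a):
    """Heat dissipated by a linear regulator: dropped voltage times load current."""
    return (v_in - v_out) * current_a


def linear_regulator_efficiency(v_in, v_out):
    """Fraction of the input power that actually reaches the load."""
    return v_out / v_in


V_USB = 4.2           # sagging USB rail, as reported in the post
V_CORE = 1.2          # FPGA core voltage
CORE_CURRENT_A = 0.5  # HYPOTHETICAL load current, not a measured value

core_power_w = V_CORE * CORE_CURRENT_A
# Two linear regulators in series carry the same load current, so the
# total loss depends only on the overall voltage drop.
regulator_loss_w = linear_regulator_loss_w(V_USB, V_CORE, CORE_CURRENT_A)

print(f"FPGA core power : {core_power_w:.2f} W")
print(f"regulator heat  : {regulator_loss_w:.2f} W")
print(f"efficiency      : {linear_regulator_efficiency(V_USB, V_CORE):.0%}")
```

Whatever the real current is, the ratio holds: with a 3.0V drop feeding a 1.2V load, the regulators burn 2.5 times the power the FPGA core does.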

Anyway this board is now designated a test & burn device, and I'll get another couple to develop on (Farnell are great, just order it and it almost turns up straight down the intertubes pipe!)

But I guess I'm going to have to jump ship to Xilinx (shame, I was just getting the hang of Quartus), seeing as the word is they cream on hashing performance. The only slight worry is IP licensing (I was just reading the Altera terms, scary!), but I've seen nothing on the boards about problems from that, so maybe it's just scareware.

TTFN
Mark

hardcore-fs

full member

Activity: 196

Merit: 100

On the FPGA heat, go looking on your PCB for the voltage regulators, then measure the temp on these beasts.

If it is high then COOL the regulator then re-measure the FPGA temp.

When the FPGA "core" voltage drifts, the FPGA gets hot, but that can be due to the switched-mode regulators drifting because they are hot.

Cooling the FPGA in this situation does very little, since the regulators will fail and may still even damage the FPGA because of the drift.

kramble

sr. member

Activity: 384

Merit: 250

Hi, if anyone's following my little saga, just a small confession. My figures for MH/s throughput a couple of posts above were rubbish.

In trying to get the code working, I'd messed with the PLL parameters (I thought, why is the clock specified at 20MHz when it's physically 50MHz?), so I'd changed it. Well, I've got round to reading the documentation, and (as with all things FPGA) it's not quite what it seems. AFAIK (and that's not far, considering I'd never touched one of these things until a week ago), the inclk0_input_frequency is just some sort of fudge factor to get the PLL built with sensible parameters for a real 50MHz clock input. So the PLL was hunting all over the place, giving somewhat variable MH/s rates. I'm amazed I was getting any results at all.

Anyway, I set it back to 20, and I'm now getting a rock steady 12.5MH/s at 100MHz (and a rather hot FPGA ... I need to sort out some cooling, and a proper 1.2V power supply as the USB rail is drooping horrendously). So considering the device is only about half utilized, and with a bit of aggressive cooling, I reckon I might get it up to 25MH/s, or a bit more. It would still take years to pay back the investment (assuming I ramped up the quantity somewhat), but of course that's not the point here (and Bitcoin could be gone tomorrow). Perhaps it would be worth taking a little gamble, if only I could source some cheap parts. Anyone know a good source of scrap mid-range FPGA-based equipment that needs saving from some Chinese (oops, no offence meant) melt-down shop?
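One reading that would reconcile the "20" with a 50MHz oscillator, offered as an assumption rather than something confirmed in this thread: as far as I know, Quartus-generated altpll code gives inclk0_input_frequency as the input clock period in picoseconds, so 50MHz shows up as 20000 (20ns). A small Python sketch of the conversion (the helper names are made up):

```python
def period_ps_to_mhz(period_ps):
    """Convert a clock period in picoseconds to a frequency in MHz."""
    return 1e6 / period_ps


def mhz_to_period_ps(freq_mhz):
    """Convert a frequency in MHz to a clock period in picoseconds."""
    return 1e6 / freq_mhz


# ASSUMPTION: the PLL parameter is a period in ps, so 20000 means 20 ns.
print(period_ps_to_mhz(20000))  # period of a 50 MHz clock
print(mhz_to_period_ps(50))     # and back again
```

If that reading is right, changing the value to match "50" would tell the PLL its input is far faster than it really is, which fits the hunting behaviour described above.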

senseless

hero member

Activity: 1118

Merit: 541

Quote from: BarsMonster on August 17, 2012, 05:51:19 AM

Apparently no one cares :-)

If your shares are getting accepted then everything is fine. But you will have a high rejection rate as there is no long polling in the code.

kramble

sr. member

Activity: 384

Merit: 250

HC

Useful comments.

I must admit that when I started this project, getting involved in circuit design was rather the last thing on my mind. In fact I'd just ordered a Raspberry Pi from Farnell, and having opened an account with them I was looking through their website for interesting stuff when I found the DE0-Nano. I thought to myself that at that price, I just had to have one, and it arrived the next day! So the Pi's gone on the back burner for now (it makes for nice screensaver video wallpaper on the TV though). It's only *then* that I came across Bitcoin, while I was googling for cool things to do with my new toy. So I'm a *real* newbie here!

Yes, I agree that it's not going to pay for itself at 7.6MH/s, which is why the freezer idea came to mind (I recall from my uni days that propagation delays scale with temperature, so it should clock faster at lower temperatures). But I don't intend building a farm of DE0-Nanos, and I would hope to get much higher throughput than that. I gather the sweet spot for device performance per dollar is a bit further up the product range. Unless I can scrounge some free(ish) devices to build with.

Of course the main issue is construction, which is much more difficult these days than in my youth. BGAs look like a bitch to deal with, though you only really need the power pins connected for this application. Unless I can keep the parts and construction costs down, it would never pay for itself.

I'm going to disagree with you about the cooling cost. FPGAs ought to be way more power efficient than GPUs (OK, that's a glib statement, I'd need to back it up with some calculations), and we're only looking at a heat pump here, so I'd expect a COP somewhere around the 2 mark. So we're only doubling the power budget for what, a 25% speed improvement (completely wild-eyed guess there), and power budget is what FPGAs (and ultimately ASICs) excel on. And forget the poly bags, I'm thinking sealed boxes filled with transformer oil (messy, but much simpler than heatsinks and cooling fans).
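The cooling argument can be put into numbers with a short Python sketch. It takes COP in the usual sense (heat removed per unit of electrical work), the 10W device figure mentioned elsewhere in the thread, and the 25% speed-up, which the post itself calls a wild guess. Heat leaking into the box from outside is ignored, and the function names are invented for the example.

```python
def cooling_power_w(heat_w, cop):
    """Electrical power a heat pump needs to remove heat_w watts at a given COP."""
    return heat_w / cop


def total_power_w(device_w, cop):
    """Device power plus the power spent pumping its heat away."""
    return device_w + cooling_power_w(device_w, cop)


def relative_hashes_per_joule(speedup, power_ratio):
    """Efficiency of the cooled setup relative to the uncooled one."""
    return speedup / power_ratio


DEVICE_W = 10.0  # "10Watt FPGA" figure from the thread
COP = 2.0        # guess from the post
SPEEDUP = 1.25   # "wild-eyed guess" from the post

total = total_power_w(DEVICE_W, COP)
ratio = total / DEVICE_W
print(f"total power        : {total:.1f} W ({ratio:.2f}x)")
print(f"hashes per joule   : {relative_hashes_per_joule(SPEEDUP, ratio):.2f}x")
```

Under these assumptions the power budget grows by about half rather than doubling, so "doubling" reads as a conservative upper bound; hashes per joule still drop to roughly five sixths of the uncooled figure.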

Anyway, this is getting to sound too much like a business proposal, and all I was looking for was something to while away the lazy days of my retirement. But things to do ... I need to work out if my performance figures are correct (a jump from 7.6 to 14.9 MH/s for a 50% increase in clock speed is insane, I must have done something wrong there). But the proof's in the pudding, and the 80MHz build has submitted 9 shares in the last 75 minutes, which I reckon is somewhere around the 8MH/s mark. Oh, and I need to get some sleep at some point, it's just gone midnight here.
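The share-count estimate can be checked with a few lines of Python. This sketch assumes the pool hands out difficulty-1 shares, each of which takes 2^32 hashes on average to find; the function names are illustrative only.

```python
import math

HASHES_PER_DIFF1_SHARE = 2 ** 32  # average hashes needed per difficulty-1 share


def estimated_mhs(shares, seconds, difficulty=1.0):
    """Estimate hash rate in MH/s from shares found over a time window."""
    return shares * difficulty * HASHES_PER_DIFF1_SHARE / seconds / 1e6


def relative_uncertainty(shares):
    """Rough one-sigma relative error, treating share arrivals as Poisson."""
    return 1.0 / math.sqrt(shares)


rate = estimated_mhs(shares=9, seconds=75 * 60)
print(f"{rate:.2f} MH/s, give or take {relative_uncertainty(9):.0%}")
```

Nine shares is a small sample, so the estimate carries an uncertainty of roughly a third either way; a longer run is needed before comparing it against a figure derived from the clock speed.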

Mark

hardcore-fs

full member

Activity: 196

Merit: 100

Quote from: kramble on November 01, 2012, 06:18:38 PM

Quote from: hardcore-fs on November 01, 2012, 05:03:14 PM

Quote from: kramble on November 01, 2012, 10:22:52 AM

I've been reading through sha256_transform.v, and while I've got a rough idea of what it's doing, it's going to take me a while to work out whether it's being implemented correctly in the device

Why fuck about?
www.iscturkey.org/2010/2008/2007/pdf/sozlu/10.pdf
http://www.ee.usyd.edu.au/people/philip.leong/UserFiles/File/papers/sha_fpl02.pdf
http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html (check out the Sha2 VHDL source, NOT SHA3)
http://www.iis.ee.ethz.ch/~sha3/index.html

Normally with academic research, you research FIRST then compare hypothesis.

The main way forward for speed is to unroll the calculations and optimize the Expander, then multi-core it.

HC

Many thanks, that's very useful background! Unfortunately maths was never my strongest subject, but I'll take the time to understand it. I'm more the hack it together and see if it works type than the academic type.

I was rather hoping that the fpgaminer code would work "out of the box", but it seems things are never that simple.

I have made some progress though. I've been comparing the different versions and compiled the xilinx branch LX150_makomk_Test ... it needed a little bit of tweaking (GOLDEN_NONCE_OFFSET was out by one), but it's working at LOOP_LOG2=3 and generating valid hashes.

It's bumped up the throughput by 50%, so now I'm getting 7.6 MH/s at 80MHz and 14.9 MH/s at 120MHz (OOPS, belay that remark, it's kicking out bad hashes at 120MHz, not so good).

Multi-core sounds good, perhaps mixing the sizes (say a LOOP_LOG2=3 plus a LOOP_LOG2=4) to fill up the device, however I rather expect throughput to ultimately be thermally bound (the power dissipation will scale with MH/s rather than MHz, at least to a first approximation). I plan to see what performance I can get at -20C (freezer temperatures), as this is far more practical with a 10W FPGA than a 200W GPU! It would be nice to dynamically set the clock speed too, so the devices can self-calibrate and ramp themselves up to a maximum clock speed. As I said in my earlier post, this is going to be fun. And if I can get the kit to pay for itself, then that's just a bonus.

Again, many thanks, hope to stay in touch!

Maths is NOT my strong point either, but I can add up, shift right (divide by 2), 'and' and 'xor':

e0
e1
ch
maj
sigma0
sigma1

Basically the speed 'weakness' in this algorithm is the long chains of additions. The design can be broken down into TWO main sections: the Expander & the Compressor.

Since an addition (x+y)+(p+z) gives the same result whichever way you group it, you can calculate BOTH
(x+y)
(p+z)
at the SAME time, since neither result depends on the other.

consider:

w_out(511 downto 480) <= s1 + w_in(319 downto 288) + s0 + w_in(31 downto 0);

Whilst it executes within a "single clock cycle"
process(clk)
....
.....

The sheer length of the additions DICTATES the number of logic levels and therefore the MINIMUM clock cycle length, due to the physical implementation of the routing. (You cannot go faster than a CLK cycle; all you can do is ensure your logic shortens it.)
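The regrouping idea can be modelled in Python. This is only a behavioural sketch of the four-term sum in the expander line above: it shows that a chained sum and a balanced sum give the same 32-bit result while the balanced form has two adder levels instead of three. The function names are made up, and a real synthesis tool may already rebalance or use other adder structures on its own.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b):
    """32-bit addition with wrap-around, as a hardware adder would do it."""
    return (a + b) & MASK32


def chained_sum(s1, w_a, s0, w_b):
    """((s1 + w_a) + s0) + w_b: three adders in series, depth 3."""
    return add32(add32(add32(s1, w_a), s0), w_b)


def balanced_sum(s1, w_a, s0, w_b):
    """(s1 + w_a) + (s0 + w_b): two adders side by side, then one, depth 2."""
    left = add32(s1, w_a)    # these two sums do not depend on each other,
    right = add32(s0, w_b)   # so hardware can compute them in parallel
    return add32(left, right)


values = (0xFFFFFFF0, 0x00000020, 0x89ABCDEF, 0x7FFFFFFF)
print(hex(chained_sum(*values)), hex(balanced_sum(*values)))
```

Both calls print the same word, because addition modulo 2^32 is associative. What changes in hardware is the longest path a signal has to travel within one clock cycle.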

Also, if you are going to stick shit into the freezer:

1. It ain't going to be a profitable way to mine at 7.6MH/s, since the cooling cost outweighs the bitcoin value.
2. SEAL the device in a PLASTIC bag with some silica gel, because when you bring the stuff out of the freezer, moisture in the air is going to condense on the design and destroy it. (The poly bag prevents condensation until the design reaches ambient, at which time it can be brought OUT of the bag. The silica gel acts as a buffer to ensure the bag is super low humidity.)
3. It's NEVER going to pay for itself at 7.6MH/s.

kramble

sr. member

Activity: 384

Merit: 250

Quote from: hardcore-fs on November 01, 2012, 05:03:14 PM

Quote from: kramble on November 01, 2012, 10:22:52 AM

I've been reading through sha256_transform.v, and while I've got a rough idea of what it's doing, it's going to take me a while to work out whether it's being implemented correctly in the device

Why fuck about?
www.iscturkey.org/2010/2008/2007/pdf/sozlu/10.pdf
http://www.ee.usyd.edu.au/people/philip.leong/UserFiles/File/papers/sha_fpl02.pdf
http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html (check out the Sha2 VHDL source, NOT SHA3)
http://www.iis.ee.ethz.ch/~sha3/index.html

Normally with academic research, you research FIRST then compare hypothesis.

The main way forward for speed is to unroll the calculations and optimize the Expander, then multi-core it.

HC

Many thanks, that's very useful background! Unfortunately maths was never my strongest subject, but I'll take the time to understand it. I'm more the hack it together and see if it works type than the academic type.

I was rather hoping that the fpgaminer code would work "out of the box", but it seems things are never that simple.

I have made some progress though. I've been comparing the different versions and compiled the xilinx branch LX150_makomk_Test ... it needed a little bit of tweaking (GOLDEN_NONCE_OFFSET was out by one), but it's working at LOOP_LOG2=3 and generating valid hashes.

It's bumped up the throughput by 50%, so now I'm getting 7.6 MH/s at 80MHz and 14.9 MH/s at 120MHz (OOPS, belay that remark, it's kicking out bad hashes at 120MHz, not so good).

Multi-core sounds good, perhaps mixing the sizes (say a LOOP_LOG2=3 plus a LOOP_LOG2=4) to fill up the device, however I rather expect throughput to ultimately be thermally bound (the power dissipation will scale with MH/s rather than MHz, at least to a first approximation). I plan to see what performance I can get at -20C (freezer temperatures), as this is far more practical with a 10W FPGA than a 200W GPU! It would be nice to dynamically set the clock speed too, so the devices can self-calibrate and ramp themselves up to a maximum clock speed. As I said in my earlier post, this is going to be fun. And if I can get the kit to pay for itself, then that's just a bonus.

Again, many thanks, hope to stay in touch!

hardcore-fs

full member

Activity: 196

Merit: 100

Quote from: kramble on November 01, 2012, 10:22:52 AM

I've been reading through sha256_transform.v, and while I've got a rough idea of what it's doing, it's going to take me a while to work out whether it's being implemented correctly in the device

Why fuck about?
www.iscturkey.org/2010/2008/2007/pdf/sozlu/10.pdf
http://www.ee.usyd.edu.au/people/philip.leong/UserFiles/File/papers/sha_fpl02.pdf
http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html (check out the Sha2 VHDL source, NOT SHA3)
http://www.iis.ee.ethz.ch/~sha3/index.html

Normally with academic research, you research FIRST then compare hypothesis.

The main way forward for speed is to unroll the calculations and optimize the Expander, then multi-core it.

HC

kramble

sr. member

Activity: 384

Merit: 250

Hi, Newbie here, just started playing with fpgaminer on a DE0-Nano.

I'm using the official code from https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner and I've got the DE2_70_Unoptimized_Pipelined branch working nicely (getting 7.5MH/s at 120MHz ... chip gets too hot if I try running any faster).

I've now started looking at the code in the DE2_115_makomk_mod branch, but I've hit a problem. The code compiles fine at CONFIG_LOOP_LOG2=2, 3 and 4 but it's producing the wrong hashes (I'm just running at 40MHz for testing, not full blast) ... the mine.tcl script submits hashes to the pool, but they are all rejected! (The unoptimised code is fine, I've submitted 200 shares so far).

Now I've put together a test harness so I can mine offline (I don't want to offend the mining pool by submitting junk), and by manually checking the hashes (using a bit of python code that I know works ok), they are all garbage.

So, getting to the main point, has anyone got this code working on a DE0-Nano? I'm a little concerned as Quartus reports it is using RAM as a substitute for some of the shifter logic (I haven't worked out how to disable this ... I'm new to Quartus and it's quite a steep learning curve!). I've been reading through sha256_transform.v, and while I've got a rough idea of what it's doing, it's going to take me a while to work out whether it's being implemented correctly in the device (I've got to get to grips with the simulator first, and add some probes to check the intermediate results). So can anyone help me out by confirming whether Makomk's mod works on a DE0-Nano? And any hints for debugging this would be most welcome.

As an aside, I'm quite looking forward to getting up and running with mining on FPGAs, as GPUs are on their way out and there is going to be a window of opportunity between now and when the ASICs eventually come on stream. I've got some ideas which I'll share as I go along. It's going to be a blast!

hardcore-fs

full member

Activity: 196

Merit: 100

Obviously you have an error some place.
Does the core validate a known hash correctly?

I.e. take a completed hash:

000000013ff476435d97eec040c3e302dd43eb3fca1a26dabaa8f9de0000039e000000000245dc2c7b5c01ccc5c7f7a594c000837eb8dc2c4f9bb287417eeaf224c9e728509219df1a0513c5
000000013ff476435d97eec040c3e302dd43eb3fca1a26dabaa8f9de0000039e000000000245dc2c7b5c01ccc5c7f7a594c000837eb8dc2c4f9bb287417eeaf224c9e728509219df1a0513c5:4d47413a
000000013ff476435d97eec040c3e302dd43eb3fca1a26dabaa8f9de0000039e000000000245dc2c7b5c01ccc5c7f7a594c000837eb8dc2c4f9bb287417eeaf224c9e728509219df1a0513c5:8ca59c46
000000013ff476435d97eec040c3e302dd43eb3fca1a26dabaa8f9de0000039e000000000245dc2c7b5c01ccc5c7f7a594c000837eb8dc2c4f9bb287417eeaf224c9e728509219df1a0513c5:3c32c19e

and check your results match.
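An offline check along these lines could look like the Python sketch below. It is built on assumptions: that the strings above are the first 76 bytes of a block header in getwork form (every 32-bit word byte-swapped) with the nonce after the colon as the last word, and that the nonce needs the same swap. The function names are made up. Rather than guess at results for the data in this post, the sketch validates itself against the Bitcoin genesis block, whose header fields and hash are public.

```python
import hashlib
import struct


def double_sha256(data):
    """SHA-256 applied twice, as Bitcoin does for block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def swap_words(data):
    """Reverse the byte order inside each 32-bit word."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    return b"".join(data[i:i + 4][::-1] for i in range(0, len(data), 4))


def block_hash_hex(header):
    """Hash of an 80-byte header in the usual display order (leading zeros first)."""
    if len(header) != 80:
        raise ValueError("block header must be exactly 80 bytes")
    return double_sha256(header)[::-1].hex()


def header_from_getwork(data_hex, nonce_hex):
    """ASSUMED layout: 76 bytes of word-swapped data plus a word-swapped nonce."""
    raw = bytes.fromhex(data_hex.replace(" ", "") + nonce_hex)
    return swap_words(raw)


def looks_like_share(header):
    """Approximate difficulty-1 test: the top 32 bits of the hash are zero."""
    return block_hash_hex(header).startswith("00000000")


# Known-good reference: the genesis block header, built from its public fields.
genesis = (
    struct.pack("<I", 1)
    + bytes(32)
    + bytes.fromhex(
        "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b"
    )[::-1]
    + struct.pack("<III", 1231006505, 0x1D00FFFF, 2083236893)
)
print(block_hash_hex(genesis))
```

To test a core, feed each data:nonce pair through header_from_getwork and looks_like_share. If every pair fails, the byte-order assumption is the first thing to revisit, before suspecting the hardware.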

senseless

hero member

Activity: 1118

Merit: 541

Does anyone have a working Stratix IV?

I was able to fit 3 cores onto my EP4SGX230 and leave almost enough room for a 4th. Everything compiles fine with the "VHDL_StratixIV_OrphanedGland" project, but all shares produced are rejected.

Any ideas?

iidx

newbie

Activity: 35

Merit: 0

Quote from: Epicblood on October 17, 2012, 10:55:37 PM

So, does this work with the Xilinx ML605 board? because every FPGA miner advertises support for the spartan family, but does not mention the Virtex family.

Yes, it works with the ML605 board. I have been running a modified version of the verilog port for over a year on 6 ML605s.

As stated by a previous poster, you have to make minor changes to account for the different board and chip.

You can see the performance achieved in the bitcoin mining hardware comparison article:

https://en.bitcoin.it/wiki/Mining_hardware_comparison#FPGAs

I added the entry when I got it working over a year ago, and no new ML605 entry has been added or updated since, so I suppose none or not many have tried. I just happened to have access to the boards, so I tried it out.

hardcore-fs

full member

Activity: 196

Merit: 100

Quote from: Epicblood on October 17, 2012, 10:55:37 PM

So, does this work with the Xilinx ML605 board? because every FPGA miner advertises support for the spartan family, but does not mention the Virtex family.

Yep, you can build a miner for the ML605, but I don't know the performance.

HC

lame.duck

legendary

Activity: 1270

Merit: 1000

Since such expensive boards are not so common, it seems nobody has tried it yet. You would have to take the RTL sources, adapt the interface to the outside world to what the ML605 board needs, adapt the PLL and then build a bitstream for this board.

Epicblood

full member

Activity: 126

Merit: 100

So, does this work with the Xilinx ML605 board? because every FPGA miner advertises support for the spartan family, but does not mention the Virtex family.

xiangfu

full member

Activity: 120

Merit: 100

Hi

I have manually soldered a http://en.qi-hardware.com/wiki/Mini-slx9 board. The 'osc', 'power module' and 'led' are connected by wires.

1. The osc is 50MHz.
2. It loads the bitstream over JTAG using this small program: https://github.com/xiangfu/mini-jtag
(I am using an FT2232H JTAG board)
3. I send work to the board using: https://github.com/ngzhang/Icarus/blob/master/miner_software/scripts/payload.py
(only sending payload3, since it needs much less time)
4. Then I get a result in ~25 minutes.

Is that normal, a result only after ~25 minutes? Can I optimize this? What documentation should I read to optimize the bitstream?

Thanks

BTW: where does your 200MHz come from?

Quote from: rph on August 21, 2011, 09:53:05 PM

Quote from: Venkatesh Srinivas on August 21, 2011, 09:13:52 AM

For anyone who has run this design on the LX9 microboard, what sort of hashrate did you get? And how many slices were used (and at what unrolling level?).

200MHz
5034 FF [44%]
3247 LUT6 [56%]
0 BRAM
0 DSP48A1
3.125MH/s

in xc6slx9-2. It finishes 1 SHA256(SHA256(x)) every 64 clocks.
With a few tricks it could probably fit 2 engines, for 6.25 MH/s total.

Not exactly going to beat an ATI GPU, but it's a fun toy. Grin

-rph
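The figures quoted above follow from simple arithmetic: hash rate is the clock frequency divided by the clocks needed per double hash. A small Python sketch, with made-up function names, that also estimates how long a full sweep of the 32-bit nonce space takes:

```python
NONCE_SPACE = 2 ** 32  # number of nonce values in one unit of work


def hashrate_mhs(clock_mhz, clocks_per_hash):
    """Hash rate in MH/s for one engine finishing a hash every N clocks."""
    return clock_mhz / clocks_per_hash


def sweep_minutes(rate_mhs):
    """Minutes needed to try every 32-bit nonce at the given hash rate."""
    return NONCE_SPACE / (rate_mhs * 1e6) / 60


print(hashrate_mhs(200, 64))  # rph's LX9 figure at 200 MHz
print(hashrate_mhs(50, 64))   # the same design at a bare 50 MHz oscillator
print(round(sweep_minutes(hashrate_mhs(50, 64)), 1))
```

Assuming the same 64-clock engine running straight off the 50MHz oscillator, a full sweep takes around an hour and a half, so a known-answer test returning after ~25 minutes is not obviously wrong; it depends on where that payload's nonce falls in the range. The 200MHz presumably comes from multiplying the oscillator up inside the FPGA. The same formula is consistent with other numbers in this thread, for instance 100MHz at 8 clocks per hash giving 12.5 MH/s, though which build produced which figure is my inference.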

makomk

hero member

Activity: 686

Merit: 564

Quote from: BarsMonster on August 11, 2012, 12:54:20 AM

Guys, am I doing something wrong?
I just took DE2_115_makomk_mod, synthesized it for 140Mhz.... and it works.

Going on with further clocks & optimization settings... FPGA heats up quite violently, had to add radiator with thermal grease and active cooling.

Doesn't surprise me. Few people have Cyclone-IV FPGA boards and the most common one apparently can't supply enough power to the FPGA to handle Bitcoin mining at higher clock speeds. Couple that with people being unwilling to risk their expensive boards through overclocking and I don't think anyone's actually tried it yet.

xiangfu

full member

Activity: 120

Merit: 100

watching

BarsMonster

newbie

Activity: 39

Merit: 0

Apparently no one cares :-)

BarsMonster

newbie

Activity: 39

Merit: 0

Guys, am I doing something wrong?
I just took DE2_115_makomk_mod, synthesized it for 140Mhz.... and it works.

Going on with further clocks & optimization settings... FPGA heats up quite violently, had to add radiator with thermal grease and active cooling.

[08/11/2012 09:51:49] 140.03 MH/s (~145.04 MH/s) [Rej: 1/31 (3.23%)]
[08/11/2012 09:51:51] 140.30 MH/s (~144.72 MH/s) [Rej: 1/31 (3.23%)]
[08/11/2012 09:51:53] 139.95 MH/s (~144.41 MH/s) [Rej: 1/31 (3.23%)]
[08/11/2012 09:51:55] 4082551a accepted
[08/11/2012 09:51:57] 140.29 MH/s (~148.42 MH/s) [Rej: 1/32 (3.13%)]
[08/11/2012 09:51:59] 139.95 MH/s (~148.10 MH/s) [Rej: 1/32 (3.13%)]
[08/11/2012 09:52:01] 140.01 MH/s (~147.78 MH/s) [Rej: 1/32 (3.13%)]
[08/11/2012 09:52:03] 140.02 MH/s (~147.47 MH/s) [Rej: 1/32 (3.13%)]

Update: Now 160:

[08/11/2012 23:00:48] 160.01 MH/s (~162.60 MH/s) [Rej: 1/41 (2.44%)]
[08/11/2012 23:00:50] 160.03 MH/s (~162.30 MH/s) [Rej: 1/41 (2.44%)]
[08/11/2012 23:00:52] 159.95 MH/s (~162.00 MH/s) [Rej: 1/41 (2.44%)]
[08/11/2012 23:00:54] 160.03 MH/s (~161.70 MH/s) [Rej: 1/41 (2.44%)]
[08/11/2012 23:00:56] 160.03 MH/s (~161.41 MH/s) [Rej: 1/41 (2.44%)]
[08/11/2012 23:00:58] 159.95 MH/s (~161.11 MH/s) [Rej: 1/41 (2.44%)]
[08/11/2012 23:00:58] e43bb477 accepted
[08/11/2012 23:01:00] efc5a1dd accepted
[08/11/2012 23:01:02] 160.27 MH/s (~168.35 MH/s) [Rej: 1/43 (2.33%)]
[08/11/2012 23:01:04] 160.03 MH/s (~168.05 MH/s) [Rej: 1/43 (2.33%)]
[08/11/2012 23:01:06] 2bdb2392 accepted
[08/11/2012 23:01:08] 160.37 MH/s (~171.33 MH/s) [Rej: 1/44 (2.27%)]

Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013) - page 20. (Read 432972 times)