Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013) - page 20. (Read 432891 times)

sr. member
Activity: 384
Merit: 250

On the FPGA heat: find the voltage regulators on your PCB and measure the temperature of those beasts.

If it is high, COOL the regulators and then re-measure the FPGA temperature.

When the FPGA "core" voltage drifts, the FPGA gets hot, but that can be caused by the switched-mode regulators drifting because they are hot.

Cooling the FPGA in this situation does very little, since the regulators will fail and may still damage the FPGA because of the drift.




Good point. It's a linear regulator (LP385005D), and since it's dropping from 5V (actually 4.2V under load) to 1.2V (chained via a 3.3V regulator, you can see why Makomk thought it was cheap!), most of the power dissipation is in the regulators rather than the EP4CE22F17C8N, and that was getting to around 60C (I reckon that if I can touch it without burning my finger, it's still at a safe junction temperature).
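To put rough numbers on "most of the power dissipation is in the regulators", here is a minimal sketch of the arithmetic for two chained linear regulators. The 1 A core current is a placeholder assumption (not a measured figure), and the sketch ignores quiescent current and anything else hanging off the 3.3V rail.

```python
def linear_regulator_loss(v_in, v_out, current_a):
    """Heat (watts) dissipated by a linear regulator passing current_a amps."""
    return (v_in - v_out) * current_a

# Voltages from the post: 4.2 V under load -> 3.3 V -> 1.2 V core.
# The core current is an assumed placeholder value.
core_current = 1.0

stage1 = linear_regulator_loss(4.2, 3.3, core_current)  # 5 V rail to 3.3 V
stage2 = linear_regulator_loss(3.3, 1.2, core_current)  # 3.3 V to 1.2 V core
regulator_power = stage1 + stage2
core_power = 1.2 * core_current
efficiency = core_power / (core_power + regulator_power)

print(f"regulators: {regulator_power:.2f} W, core: {core_power:.2f} W, "
      f"efficiency: {efficiency:.1%}")
```

Whatever the real current is, the ratio stays the same: the regulators burn about 3.0 V worth of drop for every 1.2 V delivered to the core, so roughly 70% of the input power ends up as heat in the regulators.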

*breathe!*

So what the regulators were doing I hate to guess  Angry

Anyway this board is now designated a test & burn device, and I'll get another couple to develop on (Farnell are great, just order it and it almost turns up straight down the intertubes pipe!)

But I guess I'm going to have to jump ship to Xilinx (shame, I was just getting the hang of Quartus), seeing as the word is they cream on hashing performance. The only slight worry is IP licensing (I was just reading the Altera terms, scary!), but I've seen nothing on the boards about problems from that, so maybe it's just scareware.

TTFN
Mark
full member
Activity: 196
Merit: 100

On the FPGA heat: find the voltage regulators on your PCB and measure the temperature of those beasts.

If it is high, COOL the regulators and then re-measure the FPGA temperature.

When the FPGA "core" voltage drifts, the FPGA gets hot, but that can be caused by the switched-mode regulators drifting because they are hot.

Cooling the FPGA in this situation does very little, since the regulators will fail and may still damage the FPGA because of the drift.


sr. member
Activity: 384
Merit: 250
 Embarrassed

Hi, if anyone's following my little saga, just a small confession. My figures for MH/s throughput a couple of posts above were rubbish.

In trying to get the code working, I'd messed with the PLL parameters (like, I thought, why is the clock specified at 20MHz, when it's physically 50MHz?), so I'd changed it. Well, I've got round to reading the documentation, and (as with all things FPGA) it's not quite what it seems. AFAIK (and that's not far, considering I'd never touched one of these things until a week ago), the inclk0_input_frequency is just some sort of fudge factor to get the PLL built with sensible parameters for a real 50MHz clock input. So the PLL was hunting all over the place, giving somewhat variable MH/s rates. I'm amazed I was getting any results at all  Shocked
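A possible explanation for the "20", offered as an assumption rather than something stated in the post: as far as I recall, Altera's altpll expresses inclk0_input_frequency as the input clock period (in picoseconds) rather than as a frequency, and a 50MHz clock has a 20 ns period. The sketch below just does that conversion; the multiply/divide values are hypothetical and only illustrate how a PLL output frequency is normally derived.

```python
def period_ps(freq_hz):
    """Clock period in picoseconds for a given frequency in Hz."""
    return 1e12 / freq_hz

def pll_output_hz(input_hz, multiply_by, divide_by):
    """Ideal PLL output frequency: f_out = f_in * M / D."""
    return input_hz * multiply_by / divide_by

print(period_ps(50e6))              # period of a 50 MHz input, in ps
print(pll_output_hz(50e6, 2, 1))    # hypothetical M=2, D=1 setting
```

If that reading is right, changing the parameter to "50" would tell the tools the input clock is far faster than the real oscillator, which would fit the unstable behaviour described.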

Anyway, I set it back to 20, and I'm now getting a rock steady 12.5MH/s at 100 MHz (and a rather hot FPGA ... I need to sort out some cooling, and a proper 1.2V power supply as the USB rail is drooping horrendously). So considering the device is only about half utilized, and with a bit of aggressive cooling, I reckon I might get it up to 25MH/s, or a bit more. It would still take years to pay back the investment (assuming I ramped up the quantity somewhat), but of course that's not the point here (and BitCoin could be gone tomorrow). Perhaps it would be worth taking a little gamble, if only I could source some cheap parts. Anyone know a good source of scrap mid-range FPGA-based equipment that needs saving from some Chinese (oops, no offence meant) melt-down shop?  Wink
hero member
Activity: 1118
Merit: 541
Apparently no one cares :-)

If your shares are getting accepted then everything is fine. But you will have a high rejection rate as there is no long polling in the code.

sr. member
Activity: 384
Merit: 250
HC

Useful comments.

I must admit that when I started this project, getting involved in circuit design was rather the last thing on my mind. In fact I'd just ordered a Raspberry Pi from Farnell, and having opened an account with them I was looking through their website for interesting stuff when I found the DE0-Nano. I thought to myself that at that price, I just had to have one, and it arrived the next day! So the Pi's gone on the back burner for now (it makes for nice screensaver video wallpaper on the TV though). It's only *then* that I came across BitCoin, while I was googling for cool things to do with my new toy. So I'm a *real* newbie here!

Yes, I agree that it's not going to pay for itself at 7.6MH/s, which is why the freezer idea came to mind (I recall from my uni days that propagation delays scale with temperature, so it should clock faster at lower temperatures). But I don't intend building a farm of DE0-Nanos, and I would hope to get much higher throughputs than that. I gather the sweet spot for device performance per dollar is a bit further up the product range. Unless I can scrounge some free(ish) devices to build with  Shocked. Of course the main issue is construction, which is much more difficult these days than in my youth. BGAs look like a bitch to deal with, though you only really need the power pins connected for this application. Unless I can keep the parts and construction costs down, it would never pay for itself.

I'm going to disagree with you about the cooling cost. FPGAs ought to be way more power efficient than GPUs (OK, that's a glib statement, I'd need to back it up with some calculations), and we're only looking at a heat pump here, so I'd expect a COP somewhere around the 2 mark. So we're at most doubling the power budget for what, a 25% speed improvement (completely wild-eyed guess there), and power budget is what FPGAs (and ultimately ASICs) excel on. And forget the poly bags, I'm thinking sealed boxes filled with transformer oil (messy, but much simpler than heatsinks and cooling fans).
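The cooling argument can be checked with a few lines of arithmetic. This is only a sketch using the post's own guesses (COP of 2, 25% speed-up) plus an assumed 10 W chip; it also assumes the chip's own power does not rise with the faster clock, which is optimistic. With COP defined as heat removed per watt of electrical input, a COP of 2 adds 50% to the power bill, so "doubling" is the pessimistic end (COP of 1).

```python
def total_power_w(chip_power_w, cop):
    """Chip power plus the electrical power a heat pump needs to remove it."""
    return chip_power_w + chip_power_w / cop

def relative_efficiency(speedup, chip_power_w, cop):
    """MH/s per watt with cooling, relative to the same chip uncooled."""
    power_ratio = total_power_w(chip_power_w, cop) / chip_power_w
    return speedup / power_ratio

print(total_power_w(10.0, 2.0))               # assumed 10 W chip, COP 2
print(relative_efficiency(1.25, 10.0, 2.0))   # assumed 25% speed-up
```

Under these assumptions the cooled setup hashes faster but gets fewer MH/s per watt (about 0.83 of the uncooled figure), so the freezer only makes sense if hardware cost, not electricity, is the limiting factor.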

Anyway, this is getting to sound too much like a business proposal, and all I was looking for was something to while away the lazy days of my retirement. But things to do ... I need to work out if my performance figures are correct (like a jump from 7.6 to 14.9 MH/s for a 50% increase in clock speed is insane, I must have done something wrong there). But the proof's in the pudding and the 80MHz build has submitted 9 shares in the last 75 minutes, which I reckon is somewhere around the 8MH/s mark. Oh and I need to get some sleep at some point, it's just gone midnight here  Undecided
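The "9 shares in 75 minutes" estimate can be reproduced as follows. The sketch assumes the pool hands out difficulty-1 shares (each of which takes 2^32 hashes on average); that is an assumption, as the post does not say which difficulty the pool uses.

```python
import math

HASHES_PER_SHARE = 2 ** 32  # expected hashes per difficulty-1 share

def hashrate_from_shares(shares, seconds, difficulty=1):
    """Average hash rate (hashes/s) implied by a share count over a period."""
    return shares * difficulty * HASHES_PER_SHARE / seconds

shares = 9
rate = hashrate_from_shares(shares, 75 * 60)
# Share arrivals are roughly Poisson, so the relative error is ~1/sqrt(n).
relative_error = 1 / math.sqrt(shares)

print(f"{rate / 1e6:.2f} MH/s, give or take {relative_error:.0%}")
```

With so few shares the uncertainty is around a third, so a share count like this cannot distinguish 7.6 MH/s from 8.6 MH/s; a longer run is needed to confirm the figure.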

Mark
full member
Activity: 196
Merit: 100


Maths is NOT my strong point either, but I can add up, shift (a left shift multiplies by 2, a right shift divides by 2), 'and' and 'xor'.

e0
e1
ch
maj
sigma0
sigma1
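For reference, the six functions listed above are small enough to write out in full. This is a sketch following the SHA-256 definitions in FIPS 180-4; mapping e0/e1 to the "big sigma" functions and sigma0/sigma1 to the "small sigma" functions is my assumption about the naming, not something the post states.

```python
MASK = 0xFFFFFFFF  # all arithmetic is on 32-bit words

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def e0(x):       # big Sigma0, used in the compressor
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)

def e1(x):       # big Sigma1, used in the compressor
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)

def ch(x, y, z):   # "choose": each bit of x selects y or z
    return (x & y) ^ (~x & MASK & z)

def maj(x, y, z):  # "majority" of the three input bits
    return (x & y) ^ (x & z) ^ (y & z)

def sigma0(x):   # small sigma0, used in the expander
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def sigma1(x):   # small sigma1, used in the expander
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)
```

In hardware the rotates and shifts are just wiring, so each of these costs only a layer or two of XOR/AND logic; the expensive part is the 32-bit additions that combine their results.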

Basically the speed 'weakness' in this algorithm is the long chains of additions. The design can be broken down into TWO main sections: the Expander and the Compressor.

Since a sum such as (x+y)+(p+z) gives the same result whichever order you add it in, you can calculate BOTH
(x+y)
(p+z)
at the SAME time, since neither partial result depends on the other.

consider:

w_out(511 downto 480) <= s1 + w_in(319 downto 288) + s0 + w_in(31 downto 0);

Whilst it executes within a "single clock cycle"
process(clk)
....
.....

the sheer length of the addition chain DICTATES the number of logic levels, and therefore the MINIMUM clock period, due to the physical implementation of the routing. (You cannot get a result in less than one CLK cycle; all you can do is shorten the logic so that the cycle itself can be shorter.)
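The point about regrouping the additions can be illustrated in software. The sketch below models the four-term expander sum from the VHDL line above both as a chain (three adders in series) and as a balanced tree (two adders in parallel feeding a third), and checks that the results agree modulo 2^32. The function and argument names are my own; a synthesis tool may already rebalance such sums, so treat this as an illustration of the idea rather than a guaranteed speed-up.

```python
import random

MASK = 0xFFFFFFFF

def add32(a, b):
    """32-bit wrapping addition, like a 32-bit adder in hardware."""
    return (a + b) & MASK

def chained(s1, w7, s0, w16):
    # ((s1 + w7) + s0) + w16: critical path is three adders deep
    return add32(add32(add32(s1, w7), s0), w16)

def balanced(s1, w7, s0, w16):
    # (s1 + w7) and (s0 + w16) are independent, so in hardware they can
    # be computed side by side: critical path is two adders deep
    left = add32(s1, w7)
    right = add32(s0, w16)
    return add32(left, right)

# Addition modulo 2**32 is associative, so both groupings always agree.
for _ in range(1000):
    vals = [random.getrandbits(32) for _ in range(4)]
    assert chained(*vals) == balanced(*vals)
```

Going from three adder delays to two on the critical path is what allows a shorter clock period; pipelining the partial sums across clock cycles takes the same idea further.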

Also if you are going to stick shit into the freezer.

1. It ain't going to be a profitable way to mine at 7.6MH/s, since the cooling cost outweighs the bitcoin value
2. SEAL the device in a PLASTIC bag with some silica gel, because when you bring the stuff out of the freezer, moisture in the air is going to condense on the design and destroy it. (in a poly bag, it prevents condensation until the design reaches ambient , at which time it can be brought OUT of the bag. The silica gel acts as a buffer to ensure the bag is super low humidity)

3. It's NEVER going to pay for itself at 7.6MH/s.
sr. member
Activity: 384
Merit: 250
I've been reading through sha256_transform.v, and while I've got a rough idea of what it's doing, it's going to take me a while to work out whether it's being implemented correctly in the device


Why fuck about?
www.iscturkey.org/2010/2008/2007/pdf/sozlu/10.pdf
http://www.ee.usyd.edu.au/people/philip.leong/UserFiles/File/papers/sha_fpl02.pdf
http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html (check out the Sha2 VHDL source, NOT SHA3)
http://www.iis.ee.ethz.ch/~sha3/index.html


Normally with academic research, you research FIRST then compare hypothesis.

The main way forward for speed, is unroll the calculations and optimize the Expander, then multi-core it.

HC

Many thanks, that's very useful background! Unfortunately maths was never my strongest subject, but I'll take the time to understand it. I'm more the hack it together and see if it works type than the academic type.  Roll Eyes

I was rather hoping that the fpgaminer code would work "out of the box", but it seems things are never that simple.

I have made some progress though. I've been comparing the different versions and compiled the Xilinx branch LX150_makomk_Test ... it needed a little bit of tweaking (GOLDEN_NONCE_OFFSET was out by one), but it's working at LOOP_LOG2=3 and generating valid hashes  Smiley It's bumped up the throughput by 50%, so now I'm getting 7.6 MH/s at 80MHz and 14.9 MH/s at 120MHz (OOPS, belay that remark, it's kicking out bad hashes at 120MHz, not so good).

Multi-core sounds good, perhaps mixing the sizes (say a LOOP_LOG2=3 plus a LOOP_LOG2=4) to fill up the device, however I rather expect throughput to ultimately be thermally bound (the power dissipation will scale with MH/s rather than MHz, at least to a first degree). I plan to see what performance I can get at -20C (freezer temperatures), as this is far more practical with a 10W FPGA than a 200W GPU! It would be nice to dynamically set the clock speed too, so the devices can self-calibrate and ramp themselves up to a maximum clock speed. As I said in my earlier post, this is going to be fun. And if I can get the kit to pay for itself, then that's just a bonus Grin
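The self-calibration idea could look something like the sketch below. Everything here is hypothetical: is_stable stands for whatever routine would reprogram the PLL, hash a block with a known nonce and report whether the right answer came back, and fake_board is a stand-in so the example runs without hardware.

```python
def find_max_clock(is_stable, start_mhz, stop_mhz, step_mhz, backoff_mhz=0):
    """Ramp the clock until is_stable(mhz) fails; return a safe setting.

    Returns the last passing frequency minus backoff_mhz (never below
    start_mhz), or None if even the starting frequency fails.
    """
    best = None
    mhz = start_mhz
    while mhz <= stop_mhz:
        if not is_stable(mhz):
            break
        best = mhz
        mhz += step_mhz
    if best is None:
        return None
    return max(start_mhz, best - backoff_mhz)

# Stand-in for real hardware: pretend this board fails above 110 MHz.
def fake_board(mhz):
    return mhz <= 110

print(find_max_clock(fake_board, 80, 160, 10, backoff_mhz=10))
```

The back-off margin matters because the highest clock that passes a short test on a cold chip may start producing bad hashes once the device warms up, as with the 120MHz build mentioned above.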

Again, many thanks, hope to stay in touch!
full member
Activity: 196
Merit: 100
I've been reading through sha256_transform.v, and while I've got a rough idea of what it's doing, it's going to take me a while to work out whether it's being implemented correctly in the device


Why fuck about?
www.iscturkey.org/2010/2008/2007/pdf/sozlu/10.pdf
http://www.ee.usyd.edu.au/people/philip.leong/UserFiles/File/papers/sha_fpl02.pdf
http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html (check out the Sha2 VHDL source, NOT SHA3)
http://www.iis.ee.ethz.ch/~sha3/index.html


Normally with academic research, you research FIRST then compare hypothesis.

The main way forward for speed, is unroll the calculations and optimize the Expander, then multi-core it.

HC
sr. member
Activity: 384
Merit: 250
Hi, Newbie here, just started playing with fpgaminer on a DE0-Nano.

I'm using the official code from https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner and I've got the DE2_70_Unoptimized_Pipelined branch working nicely (getting 7.5MH/s at 120MHz ... chip gets too hot if I try running any faster).

I've now started looking at the code in the DE2_115_makomk_mod branch, but I've hit a problem. The code compiles fine at CONFIG_LOOP_LOG2=2, 3 and 4 but it's producing the wrong hashes (I'm just running at 40MHz for testing, not full blast) ... the mine.tcl script submits hashes to the pool, but they are all rejected! (The unoptimised code is fine, I've submitted 200 shares so far).

Now I've put together a test harness so I can mine offline (I don't want to offend the mining pool by submitting junk), and by manually checking the hashes (using a bit of python code that I know works ok), they are all garbage.

So, getting to the main point, has anyone got this code working on a DE0-Nano? I'm a little concerned as Quartus reports it is using RAM as a substitute for some of the shifter logic (I haven't worked out how to disable this ... I'm new to Quartus and it's quite a steep learning curve!) I've been reading through sha256_transform.v, and while I've got a rough idea of what it's doing, it's going to take me a while to work out whether it's being implemented correctly in the device (I've got to get to grips with the simulator first, and add some probes to check the intermediate results). So can anyone help me out by confirming whether Makomk's mod works on a DE0-Nano? And any hints for debugging this would be most welcome.

As an aside, I'm quite looking forward to getting up and running with mining on FPGAs as GPUs are on their way out, and there is going to be a window of opportunity between now and when the ASICs eventually come on stream. I've got some ideas which I'll share as I go along. It's going to be a blast!
 Grin
full member
Activity: 196
Merit: 100
Obviously you have an error some place.
Does the core validate a known hash correctly?

I.E take a  completed hash

000000013ff476435d97eec040c3e302dd43eb3fca1a26dabaa8f9de0000039e000000000245dc2c7b5c01ccc5c7f7a594c000837eb8dc2c4f9bb287417eeaf224c9e728509219df1a0513c5
000000013ff476435d97eec040c3e302dd43eb3fca1a26dabaa8f9de0000039e000000000245dc2c7b5c01ccc5c7f7a594c000837eb8dc2c4f9bb287417eeaf224c9e728509219df1a0513c5:4d47413a
000000013ff476435d97eec040c3e302dd43eb3fca1a26dabaa8f9de0000039e000000000245dc2c7b5c01ccc5c7f7a594c000837eb8dc2c4f9bb287417eeaf224c9e728509219df1a0513c5:8ca59c46
000000013ff476435d97eec040c3e302dd43eb3fca1a26dabaa8f9de0000039e000000000245dc2c7b5c01ccc5c7f7a594c000837eb8dc2c4f9bb287417eeaf224c9e728509219df1a0513c5:3c32c19e


and check your results match.
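A software reference for this kind of known-answer check can be sketched in a few lines of Python. Rather than guess at the byte ordering of the work data quoted above, the example uses the Bitcoin genesis block, whose header fields and hash are public; block_hash and meets_difficulty_1 are names made up for this sketch.

```python
import hashlib
import struct

DIFF1_TARGET = 0xFFFF << 208  # difficulty-1 target as a 256-bit integer

def block_hash(version, prev_hash_hex, merkle_root_hex, timestamp, bits, nonce):
    """Double SHA-256 of an 80-byte block header, returned as display hex."""
    header = struct.pack(
        "<L32s32sLLL",
        version,
        bytes.fromhex(prev_hash_hex)[::-1],    # hashes are stored byte-reversed
        bytes.fromhex(merkle_root_hex)[::-1],
        timestamp, bits, nonce)
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()

def meets_difficulty_1(hash_hex):
    """True if the hash would count as a difficulty-1 share."""
    return int(hash_hex, 16) <= DIFF1_TARGET

genesis = block_hash(
    1, "00" * 32,
    "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    1231006505, 0x1d00ffff, 2083236893)
print(genesis, meets_difficulty_1(genesis))
```

Feeding the same header to the FPGA core with the nonce counter started just below the known nonce should make it report that nonce; if it does not, the fault is in the core or its byte ordering rather than in the pool communication.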
hero member
Activity: 1118
Merit: 541

Does anyone have a working Stratix IV?

I was able to fit 3 cores onto my EP4SGX230 and leave almost enough room for a 4th. Everything compiles fine with the "VHDL_StratixIV_OrphanedGland" project, but all shares produced are rejected.

Any ideas?

newbie
Activity: 35
Merit: 0
So, does this work with the Xilinx ML605 board? because every FPGA miner advertises support for the spartan family, but does not mention the Virtex family.

Yes, it works with the ML605 board.  I have been running a modified version of the verilog port for over a year on 6 ML605s.

As stated by a previous poster, you have to make minor changes to account for the different board and chip.

You can see the performance achieved in the bitcoin mining hardware comparison article:

https://en.bitcoin.it/wiki/Mining_hardware_comparison#FPGAs

I added the entry when I got it working over a year ago, and no new ML605 entry has been added/updated, so I suppose none or not many have tried.  I just happened to have access to the boards so I tried it out.
full member
Activity: 196
Merit: 100
So, does this work with the Xilinx ML605 board? because every FPGA miner advertises support for the spartan family, but does not mention the Virtex family.

yep you can build a miner for the ML605 but don't know the performance though.

HC
legendary
Activity: 1270
Merit: 1000
Since such expensive boards are not so common, it seems nobody has tried it yet. You would have to take the RTL sources, fit the interface to the outside world to the ML605 board's needs, adapt the PLL, and then build a bitstream for this board.
full member
Activity: 126
Merit: 100
So, does this work with the Xilinx ML605 board? because every FPGA miner advertises support for the spartan family, but does not mention the Virtex family.
full member
Activity: 120
Merit: 100
Hi

I have manually soldered a http://en.qi-hardware.com/wiki/Mini-slx9 board. The 'osc', 'power module' and 'led' are connected by wires.

1. The osc is 50MHz.
2. It loads the bitstream over JTAG using this small program: https://github.com/xiangfu/mini-jtag
    (I am using a FT2232H JTAG board)
3. I send a work unit to the board using: https://github.com/ngzhang/Icarus/blob/master/miner_software/scripts/payload.py
(only payload3 is sent, since it needs much less time)
4. Then I get a result in ~25 minutes.

Is that normal, returning a result in ~25 minutes? Can I optimize this? What document should I read to optimize the bitstream?

Thanks

BTW: where does your 200MHz come from??


For anyone who has run this design on the LX9 microboard, what sort of hashrate did you get? And how many slices were used (and at what unrolling level?).

200MHz
5034 FF [44%]
3247 LUT6 [56%]
0 BRAM
0 DSP48A1
3.125MH/s

in xc6slx9-2. It finishes 1 SHA256(SHA256(x)) every 64 clocks.
With a few tricks it could probably fit 2 engines, for 6.25 MH/s total.
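The figures above follow directly from the clock rate and the number of clocks per hash, and the same arithmetic gives the time needed to sweep the whole 32-bit nonce range. A minimal sketch (the function names are mine):

```python
def hashrate_hz(clock_hz, clocks_per_hash, engines=1):
    """Hashes per second for engines that each finish one double-SHA256
    every clocks_per_hash cycles."""
    return clock_hz / clocks_per_hash * engines

def nonce_sweep_seconds(rate_hz):
    """Time to try all 2**32 nonces of one work unit at the given rate."""
    return 2 ** 32 / rate_hz

one = hashrate_hz(200e6, 64)             # single engine at 200 MHz
two = hashrate_hz(200e6, 64, engines=2)  # two engines, if they fit
print(one / 1e6, two / 1e6)
print(nonce_sweep_seconds(one) / 60)     # minutes per full nonce sweep
```

At 3.125 MH/s a full sweep takes about 23 minutes, which is in the same range as the ~25 minutes reported above, so that wait looks like normal behaviour for a single engine rather than a fault (assuming the board really runs the core at 200MHz from its 50MHz oscillator). The same formula fits the 12.5MH/s at 100 MHz reported elsewhere in the thread if that build takes 8 clocks per hash, though that is an inference and not something the poster states.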

Not exactly going to beat an ATI GPU, but it's a fun toy.  Grin

-rph

hero member
Activity: 686
Merit: 564
Guys, am I doing something wrong?
I just took DE2_115_makomk_mod, synthesized it for 140MHz.... and it works.

Going on with further clocks & optimization settings... FPGA heats up quite violently, had to add radiator with thermal grease and active cooling.
Doesn't surprise me. Few people have Cyclone-IV FPGA boards and the most common one apparently can't supply enough power to the FPGA to handle Bitcoin mining at higher clock speeds. Couple that with people being unwilling to risk their expensive boards through overclocking and I don't think anyone's actually tried it yet.
newbie
Activity: 39
Merit: 0
Apparently no one cares :-)
newbie
Activity: 39
Merit: 0
Guys, am I doing something wrong?
I just took DE2_115_makomk_mod, synthesized it for 140MHz.... and it works.

Going on with further clocks & optimization settings... FPGA heats up quite violently, had to add radiator with thermal grease and active cooling.

[08/11/2012 09:51:49] 140.03 MH/s (~145.04 MH/s) [Rej: 1/31 (3.23%)]
[08/11/2012 09:51:51] 140.30 MH/s (~144.72 MH/s) [Rej: 1/31 (3.23%)]
[08/11/2012 09:51:53] 139.95 MH/s (~144.41 MH/s) [Rej: 1/31 (3.23%)]
[08/11/2012 09:51:55] 4082551a accepted
[08/11/2012 09:51:57] 140.29 MH/s (~148.42 MH/s) [Rej: 1/32 (3.13%)]
[08/11/2012 09:51:59] 139.95 MH/s (~148.10 MH/s) [Rej: 1/32 (3.13%)]
[08/11/2012 09:52:01] 140.01 MH/s (~147.78 MH/s) [Rej: 1/32 (3.13%)]
[08/11/2012 09:52:03] 140.02 MH/s (~147.47 MH/s) [Rej: 1/32 (3.13%)]


Update: Now 160:

[08/11/2012 23:00:48] 160.01 MH/s (~162.60 MH/s) [Rej: 1/41 (2.44%)]
[08/11/2012 23:00:50] 160.03 MH/s (~162.30 MH/s) [Rej: 1/41 (2.44%)]
[08/11/2012 23:00:52] 159.95 MH/s (~162.00 MH/s) [Rej: 1/41 (2.44%)]
[08/11/2012 23:00:54] 160.03 MH/s (~161.70 MH/s) [Rej: 1/41 (2.44%)]
[08/11/2012 23:00:56] 160.03 MH/s (~161.41 MH/s) [Rej: 1/41 (2.44%)]
[08/11/2012 23:00:58] 159.95 MH/s (~161.11 MH/s) [Rej: 1/41 (2.44%)]
[08/11/2012 23:00:58] e43bb477 accepted
[08/11/2012 23:01:00] efc5a1dd accepted
[08/11/2012 23:01:02] 160.27 MH/s (~168.35 MH/s) [Rej: 1/43 (2.33%)]
[08/11/2012 23:01:04] 160.03 MH/s (~168.05 MH/s) [Rej: 1/43 (2.33%)]
[08/11/2012 23:01:06] 2bdb2392 accepted
[08/11/2012 23:01:08] 160.37 MH/s (~171.33 MH/s) [Rej: 1/44 (2.27%)]