
Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013) - page 11. (Read 432890 times)

hero member
Activity: 560
Merit: 517
I've been asked a few times about a mining script for the current KC705 firmware.  I wrote a plugin for Modular Python Bitcoin Miner.  Here's the message I sent to someone about it:

Quote
I uploaded the custom MPBM module, which is compatible with the current KC705 mining code, here:
https://mega.co.nz/#!Oh5HTDRB!C0RLYW4yZN8gbg38FfgLpzmKFcseOql3Xx1i_gXTfdM

You'll want to download a copy of MPBM's testing branch.  Then extract the above archive into
Code:
modules/fpgamining
such that you end up with:

Code:
modules/fpgamining/kc705_uart/__init__.py
modules/fpgamining/kc705_uart/kc705uartworker.py

Once you start MPBM, you can add a KC705 Worker by opening the MPBM web interface (http://127.0.0.1:8832) and clicking the "Workers" button on the left.  On Windows, I ran MPBM under Cygwin, and the "Port" ended up being /dev/com2 for me.  The Baudrate is 115200.

~fpgaminer

I haven't had a chance to clean it up and put it on the repo yet.
sr. member
Activity: 384
Merit: 250
I'm currently playing with the DE0 Nano code from Kramble.

And I've a question: you said that running it at a higher speed than 40MHz could damage an unmodified DE0 Nano, and I didn't understand why.

According to the Quartus PowerPlay Power Analyzer, the design at 50MHz uses only 328mW; that's around 273mA, right? It's supposed to support 500mA, isn't it?

Did I miss something?

No, I was just being conservative in case someone inexperienced just cranked it up to the max (and following the example of fpgaminer in his original readme). You can run it faster as long as you are happy the power supply will support it (I had a conversation with hardcore_fc a few months back about the regulators, it may be worth you looking back over it). I am currently running one board at 170Mhz (with a hardwired external 1.2V core supply as described at www.makomk.com) and a second at 80MHz on a conventional 3.3V external supply.

You are correct that a USB supply will probably be limited to 500mA, but this is at 5Volts. I haven't played with the Powerplay Analyser, but I would expect that this is reporting the power at the 1.2V fpga core rail. You have to account for the other devices on the DE0-Nano board too.

I just dug out some notes I made of measurements with the 3.3V supply. 40MHz was 0.48A, 80MHz 0.85A, 100MHz 1.0A, 120MHz 1.2A and 140MHz 1.36A, so roughly 10mA per MHz. The regulators were getting very hot at the higher speeds (even though I was pointing a fan at the board), hence my caution at running the DE0-Nano at these sorts of speeds. The regulators themselves are overtemperature protected, but looking at the datasheet, this only kicks in at T(junction) of 175C, while the max operating temperature is 125C. It also quotes 85C/Watt junction-ambient assuming a big chunk of PCB copper dedicated to heatsinking, so you can work out roughly what they can practically support.
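
For anyone who wants to redo that working-out, here is a rough sketch in Python using only the numbers quoted in this thread. The 25C ambient temperature and the function names are assumptions for illustration, and the 1.2V rail is the guess made above about what the PowerPlay figure refers to.

```python
# Supply current measured on the 3.3 V input (MHz -> amps), as quoted in the post.
measurements = {40: 0.48, 80: 0.85, 100: 1.0, 120: 1.2, 140: 1.36}

def ma_per_mhz(points):
    """Average supply current per MHz, in mA, over all measured points."""
    return sum(1000.0 * amps / mhz for mhz, amps in points.items()) / len(points)

def rail_current_ma(power_mw, rail_volts):
    """Current implied by a power figure on a given rail (I = P / V)."""
    return power_mw / rail_volts

def max_dissipation_w(t_junction_max_c, t_ambient_c, theta_ja_c_per_w):
    """Heat a regulator can shed before its junction reaches the given limit."""
    return (t_junction_max_c - t_ambient_c) / theta_ja_c_per_w

print(ma_per_mhz(measurements))          # slope of the measurements, mA per MHz
print(rail_current_ma(328, 1.2))         # the 328 mW PowerPlay figure at 1.2 V
print(max_dissipation_w(125, 25, 85))    # assumed 25 C ambient, 85 C/W from the datasheet
```

With an assumed 25C ambient, the 85C/W figure leaves a bit over one watt of dissipation per regulator before the 125C operating limit, which is why the higher clock rates look uncomfortable.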

Given the tiny returns from mining on the Nano, my opinion was that it's not worth risking the boards at the higher speeds. I'm happy with my current setup (as described above) as nothing is getting above 60C, but it's your call on your own stuff.

[EDIT] I should add that I'm using a serial interface to communicate with the boards, rather than the quartus_stp jtag usb cable, which is why I can get away with a 3.3V external supply. If you are using the usb for communication, then an external 3.3V supply won't work as it will pull current from the usb instead (there are a couple of blocking diodes so no harm should occur). You could use a 5V external supply to supplement the usb's 500mA, but then it's all getting a bit Heath Robinson, and the onboard regulators are under more heat stress at 5V than 3.3V. Oh, and the DE0-Nano manual says the minimum external supply is 3.6V (I just happened to have 3.3V to hand and it worked fine, but it's technically out of spec so YMMV).

Regards
Mark
full member
Activity: 193
Merit: 100
I'm currently playing with the DE0 Nano code from Kramble.

And I've a question: you said that running it at a higher speed than 40MHz could damage an unmodified DE0 Nano, and I didn't understand why.

According to the Quartus PowerPlay Power Analyzer, the design at 50MHz uses only 328mW; that's around 273mA, right? It's supposed to support 500mA, isn't it?

Did I miss something?
newbie
Activity: 15
Merit: 0
Quote
When I replaced dsp_e with adder I got 302 MHz
I find it odd that your Fmax is dropping when you replace the DSPs with LUTs.  You may want to fiddle around with Vivado's settings to make sure register retiming (or whatever Vivado calls it) is enabled.  Alternatively, implement the adders as two stages of 16-bits each.  Since the DSPs that are being replaced are two stage (or three) anyway.

I used 2-stage adders, because DSP adders worked in 2 cycles and I didn't want to debug too much. IP core generator recommended 3 cycles for the best performance - I'll try that next.

After replacing dsp_e, dsp_wp and dsp_t1p I got 46% DSPs used - so it's enough to fit two cores.
hero member
Activity: 560
Merit: 517
Quote
When I replaced dsp_e with adder I got 302 MHz
I find it odd that your Fmax is dropping when you replace the DSPs with LUTs.  You may want to fiddle around with Vivado's settings to make sure register retiming (or whatever Vivado calls it) is enabled.  Alternatively, implement the adders as two stages of 16-bits each.  Since the DSPs that are being replaced are two stage (or three) anyway.

Also, for dsp_t1p, it would be best to replace both dsp_t1p and compressor_t1p with a single LUT adder, since the LUT fabric can implement 3 way additions just as efficiently as 2-way addition.
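
The two-stage split suggested above can be modelled in software to check the carry handling before touching the HDL. This is only a behavioural sketch in Python with made-up function names, not code from the repo; in hardware the values returned by the first stage would sit in pipeline registers for one clock.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def stage1(a, b):
    """First pipeline stage: add the low 16 bits, pass the high halves along."""
    low = (a & MASK16) + (b & MASK16)
    # low sum, carry out of bit 15, and the untouched high halves
    return low & MASK16, low >> 16, a >> 16, b >> 16

def stage2(low_sum, carry, a_hi, b_hi):
    """Second pipeline stage: add the high 16 bits plus the registered carry."""
    high = (a_hi + b_hi + carry) & MASK16
    return (high << 16) | low_sum

def add32_two_stage(a, b):
    """32-bit modular add built from the two 16-bit stages."""
    return stage2(*stage1(a & MASK32, b & MASK32))

print(hex(add32_two_stage(0x0000FFFF, 0x00000001)))
```

The result should always equal (a + b) mod 2^32, which is the addition SHA-256 uses.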
newbie
Activity: 15
Merit: 0
This is a DSP48E1 based design, and I have compiled and run it at 400MH/s.

Have you done any testing as to which adders provide the best increase to the fmax? In order to get multiple cores in there, I'm going to need to pick and choose which adders to replace with DSPs and which not to. I'm currently at 66% LUT usage with 99% memory LUT and 108% DSP usage with 2 unrolled cores (I had one core do even nonces while the other does odd nonces to make life easy). I've been slowly working down the number of DSPs utilized per core to make it fit. I'm thinking it might be possible to get 3 full cores on the A7 200.

Does the DSP performance increase compound? If I change one adder over to DSP utilization and it gives a 10% fmax increase... would changing additional adders down the chain affect that 10%? or will that one adder always give a 10% boost? I'm wondering if it will be possible to go through the adders one by one and calculate the increase in frequency for each one to find which adders would be the most effectively utilized under DSP48 blocks to get the best timing.


I compiled fpgaminer's DSP code on A7 200 and I got 356 MHz on -3 grade, 311 MHz on -2 grade and 262 MHz on -1. The -3 variant only exists in extended temperature version, so it's much more expensive - so the -2 is the best choice in my opinion.

The usage was 20% slice logic, 34% slice logic distribution and 92% DSP.

What were your results? I.e. what maximum clocking do you have without DSP?

Now I'm trying to replace some DSPs with adder IP core - I think best candidates are these that don't use PCIN input (because they are simpler), like dsp_e, dsp_wp and dsp_t1p. When I replaced dsp_e with adder I got 302 MHz (-2 version), 23% logic, 37% distrib, 75% DSP. Then I replaced dsp_wp: 271 MHz, 24% logic, 38% distrib, 63% DSP. Compilation took over 5 hours, while it takes 30 min when using only DSP. Then I replaced dsp_t1p and the compilation takes ages to complete (it hasn't completed yet).

The estimation is that DSP usage will be 49%, so theoretically I should be able to fit two such cores. Even if I have to lower the clock to, say, 200 MHz then total output would be 400 MH/s, which would be better than 311 MH/s with one DSP-only core.
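
The trade-off in that estimate is easy to put into numbers. A minimal sketch in Python, assuming each fully unrolled core delivers one hash per clock (function names are made up for illustration):

```python
def total_mhs(cores, clock_mhz):
    """Total hash rate in MH/s, assuming one hash per clock per core."""
    return cores * clock_mhz

def break_even_clock_mhz(baseline_mhs, cores):
    """Clock each of `cores` cores must exceed to beat a baseline hash rate."""
    return baseline_mhs / cores

print(total_mhs(2, 200))                # two cores at 200 MHz
print(break_even_clock_mhz(311, 2))     # clock needed to match one 311 MHz core
```

So on the -2 part, two cores come out ahead of the single DSP-only core as long as they still close timing above roughly 156 MHz.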
hero member
Activity: 767
Merit: 500
hi, another question from a newbs..


have any of you guys heard of parallella? http://www.parallella.org
what you guys think about it?

Ahh yes, that my friend is a completely different ball game to FPGA
i've been waiting for them to kick off, i want one to play with 64 threads per chip... mmmm
newbie
Activity: 13
Merit: 0
hi, another question from a newbs..


have any of you guys heard of parallella? http://www.parallella.org
what you guys think about it?
hero member
Activity: 1118
Merit: 541
This is a DSP48E1 based design, and I have compiled and run it at 400MH/s.

Have you done any testing as to which adders provide the best increase to the fmax? In order to get multiple cores in there, I'm going to need to pick and choose which adders to replace with DSPs and which not to. I'm currently at 66% LUT usage with 99% memory LUT and 108% DSP usage with 2 unrolled cores (I had one core do even nonces while the other does odd nonces to make life easy). I've been slowly working down the number of DSPs utilized per core to make it fit. I'm thinking it might be possible to get 3 full cores on the A7 200.

Does the DSP performance increase compound? If I change one adder over to DSP utilization and it gives a 10% fmax increase... would changing additional adders down the chain affect that 10%? or will that one adder always give a 10% boost? I'm wondering if it will be possible to go through the adders one by one and calculate the increase in frequency for each one to find which adders would be the most effectively utilized under DSP48 blocks to get the best timing.
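
The even/odd split between the two cores mentioned above generalises to any number of cores by giving each one a fixed offset and a common stride. A small Python sketch of the idea (the function name is hypothetical; in hardware this would simply be each core's nonce counter starting at its index and stepping by the core count):

```python
def nonce_stream(core_index, num_cores, start=0, stop=2**32):
    """Nonces assigned to one core: every num_cores-th value from its offset."""
    return range(start + core_index, stop, num_cores)

# Two cores: core 0 takes the even nonces, core 1 the odd ones.
print(list(nonce_stream(0, 2, stop=8)))
print(list(nonce_stream(1, 2, stop=8)))
```

Together the streams cover the nonce range exactly once, so no work is duplicated or skipped.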




sr. member
Activity: 384
Merit: 250
The makomk_mod version fits using factor 2 (but all works are rejected, I don't know why!). It reports 12MH/s.

I had the same problem with the DE0-Nano (22k LE), this was Makomk's response ...

I've now started looking at the code in the DE2_115_makomk_mod branch, but I've hit a problem. The code compiles fine at CONFIG_LOOP_LOG2=2, 3 and 4 but it's producing the wrong hashes (I'm just running at 40MHz for testing, not full blast) ... the mine.tcl script submits hashes to the pool, but they are all rejected!
Yeah, that branch doesn't work with CONFIG_LOOP_LOG2!=1. You probably want http://www.makomk.com/gitweb/?p=Open-Source-FPGA-Bitcoin-Miner.git;a=summary de0-nano-hax branch, projects/DE2_115_Unoptimized_Pipelined project. The voltage regulators are also indeed horribly inefficient on the DE0-nano.

I can't answer AJRGale's query about the LE's needed for a fully unrolled core as I haven't built anything larger than a one-sixth core which (just) fitted into 22k LE on an EP4CE22 on the Nano.

Regards
Mark
newbie
Activity: 14
Merit: 0
Basically I want to know what a fully unrolled miner fits on and how many LEs it needs; then I'll go to Digi-Key, look something up and go from there.

Hi,

The Altera DE1 has 18K LE.

The non-optimized version fits using the factor 4 in the roll(?), for a total of 16K LE used. I get 3.10 MH/s.
The makomk_mod version fits using factor 2 (but all works are rejected, I don't know why!). It reports 12MH/s.
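
Those two hash rates are roughly what a simple model predicts, if the "factor" is the CONFIG_LOOP_LOG2 setting and a fully unrolled core produces one hash per clock. The 50 MHz clock below is an assumption, since the post does not give one; the sketch is in Python.

```python
def hashrate_mhs(clock_mhz, loop_log2):
    """Hash rate of a partially rolled core: one hash every 2**loop_log2 clocks."""
    return clock_mhz / (2 ** loop_log2)

# Assumed 50 MHz clock (not stated in the post).
print(hashrate_mhs(50, 4))   # factor 4
print(hashrate_mhs(50, 2))   # factor 2
```

Under that assumption factor 4 gives about 3.1 MH/s and factor 2 about 12.5 MH/s, close to the figures reported.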


hero member
Activity: 767
Merit: 500
So the question is, will any 150K gate fpga work with the full miner? or is there something I'm missing (EG: http://www.digilentinc.com/Products/Detail.cfm?NavPath=2,400,790&Prod=BASYS2 with 250K gates, slap on the full miner, and bam, 1hash a clock? )

NO!! Don't confuse gate with LE (logic element). Older fpga's often quoted a gate count (such as the one you linked to Spartan 3E 250K gates). Newer fpga's use a Logic Element (or Logic Cell) count (and google tells me there are 12 gates to a LE). So a Spartan 6 LX150 with 147,443 logic cells roughly equates to 1.7 million gates by my calculation (I can't find any direct quote for the actual figure, so take that as very approximate). You can see the spartan family spec at http://www.xilinx.com/support/documentation/data_sheets/ds160.pdf

The board you linked to will be (almost) useless for mining. You need to look for a purpose-built Spartan LX150 based miner and use the firmware (bitstream) that comes with it (and even then the economics look pretty grim).

If you want to compile your own bitstream for the Spartan series, you can download free software from the Xilinx web site http://www.xilinx.com/products/design-tools/ise-design-suite/ise-webpack.htm but beware that it is limited to the smaller devices (LX75 maximum I think, but do your own due diligence). You need the full (very expensive) version to compile for the LX150.

Regards
Mark

Ah, Sorry for my newbishness, never played with one of these devices (blame the 2 companies for their heavy secretive efforts unless you buy their $5000 suite) 
my mistake, so when a company quotes "Gates" number, i have to look for  ALM, LE, Slice etc?

Basically I want to know what a fully unrolled miner fits on and how many LEs it needs; then I'll go to Digi-Key, look something up and go from there.
sr. member
Activity: 384
Merit: 250
So the question is, will any 150K gate fpga work with the full miner? or is there something I'm missing (EG: http://www.digilentinc.com/Products/Detail.cfm?NavPath=2,400,790&Prod=BASYS2 with 250K gates, slap on the full miner, and bam, 1hash a clock? )

NO!! Don't confuse gate with LE (logic element). Older fpga's often quoted a gate count (such as the one you linked to Spartan 3E 250K gates). Newer fpga's use a Logic Element (or Logic Cell) count (and google tells me there are 12 gates to a LE). So a Spartan 6 LX150 with 147,443 logic cells roughly equates to 1.7 million gates by my calculation (I can't find any direct quote for the actual figure, so take that as very approximate). You can see the spartan family spec at http://www.xilinx.com/support/documentation/data_sheets/ds160.pdf
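
The gate estimate above is a single multiplication. As a sketch in Python, taking the 12 gates per logic element figure at face value (it is a rule of thumb quoted from a search, not a vendor number):

```python
GATES_PER_LE = 12  # rule-of-thumb figure quoted in the post, not a vendor spec

def approx_gates(logic_cells, gates_per_le=GATES_PER_LE):
    """Very rough gate-count equivalent for a given number of logic cells."""
    return logic_cells * gates_per_le

print(approx_gates(147443))   # Spartan 6 LX150 logic cell count from the post
```

Run the other way, 250K gates divided by 12 is only about 21K logic elements, which gives a feel for how far the linked board is from an LX150.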

The board you linked to will be (almost) useless for mining. You need to look for a purpose-built Spartan LX150 based miner and use the firmware (bitstream) that comes with it (and even then the economics look pretty grim).

If you want to compile your own bitstream for the Spartan series, you can download free software from the Xilinx web site http://www.xilinx.com/products/design-tools/ise-design-suite/ise-webpack.htm but beware that it is limited to the smaller devices (LX75 maximum I think, but do your own due diligence). You need the full (very expensive) version to compile for the LX150.

Regards
Mark
newbie
Activity: 19
Merit: 0
Thx for your work you put in the miner!

I ported the Xilinx_VHDL miner to the ml605 dev board.

Actually, it was straightforward ... I replaced the DCM with a newer Virtex-6 equivalent, wired the pins to RS232 and clock, adjusted the baud rate and it ran instantly.

It does 200MHash/sec and the chip is about 85% used ...
hero member
Activity: 767
Merit: 500
the nano i was looking at does have "two CDs with the software necessary to 'compile' and 'upload' code to the board. " but not sure if the EP4CE22F17C6N is usable

The DE0-Nano is great to get started learning about fpga's, but it won't make you any useful coin. 5MHash/sec is about right, it will go faster but not without risk of overheating, and certainly no more than about 25MHash/sec (using Makomk's modified power supply). To put that in context 5MHash/sec will currently earn approx 0.0003 bitcoin per day (and getting less by roughly 20% every 2 weeks as the difficulty increases).

If you do decide to get a DE0-Nano, start with the DE2_70_Unoptimized_Pipelined project. You'll need to increase CONFIG_LOOP_LOG2 to 4 to get it to compile (that's one eighth of a core, I think). I cheated and edited the fpgaminer.qsf file directly to configure it for the EP4CE22, but it's probably safer to create a new project from scratch and add in the source files.

Mark

Heh, thats about $1 a month (at the ~$100/coin mark) so that thing is not going to break even this life time i thinks

So, i might have to go hunt out some 2nd hand Spartan6 with 150K gates (or similar)

So the question is, will any 150K gate fpga work with the full miner? or is there something I'm missing (EG: http://www.digilentinc.com/Products/Detail.cfm?NavPath=2,400,790&Prod=BASYS2 with 250K gates, slap on the full miner, and bam, 1hash a clock? )
sr. member
Activity: 384
Merit: 250
the nano i was looking at does have "two CDs with the software necessary to 'compile' and 'upload' code to the board. " but not sure if the EP4CE22F17C6N is usable

The DE0-Nano is great to get started learning about fpga's, but it won't make you any useful coin. 5MHash/sec is about right, it will go faster but not without risk of overheating, and certainly no more than about 25MHash/sec (using Makomk's modified power supply). To put that in context 5MHash/sec will currently earn approx 0.0003 bitcoin per day (and getting less by roughly 20% every 2 weeks as the difficulty increases).
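
The earnings figure can be reproduced from the usual expected-value formula: a block takes on average difficulty × 2^32 hashes to find. The sketch below is in Python; the difficulty of 8 million and the 25 BTC block reward are assumptions chosen for illustration, since the post states neither, and the function names are made up.

```python
SECONDS_PER_DAY = 86400

def btc_per_day(hashrate_hs, difficulty, block_reward_btc):
    """Expected daily earnings for a given hash rate (hashes per second)."""
    expected_hashes_per_block = difficulty * 2**32
    blocks_per_day = hashrate_hs * SECONDS_PER_DAY / expected_hashes_per_block
    return blocks_per_day * block_reward_btc

def after_drops(btc_day, periods, drop=0.20):
    """Daily earnings after `periods` two-week steps, each cutting income by `drop`."""
    return btc_day * (1.0 - drop) ** periods

# Assumed values: difficulty 8 million, 25 BTC block reward.
today = btc_per_day(5e6, 8e6, 25)
print(today)
print(after_drops(today, 2))   # roughly a month later at 20% per two weeks
```

With those assumed inputs 5 MH/s comes out at a little over 0.0003 BTC per day, in line with the figure quoted.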

If you do decide to get a DE0-Nano, start with the DE2_70_Unoptimized_Pipelined project. You'll need to increase CONFIG_LOOP_LOG2 to 4 to get it to compile (that's one eighth of a core, I think). I cheated and edited the fpgaminer.qsf file directly to configure it for the EP4CE22, but it's probably safer to create a new project from scratch and add in the source files.

Mark
hero member
Activity: 560
Merit: 517
Quote
So I need to figure out what additional logic exists besides the SHA-256 module, how do they interact with each other and how do they interact with the SHA-256 modules?
First, please note that there are multiple "flavors" of the hashing code, and for the most part they are optimized for synthesis to FPGA targets.  I would highly suggest you hire an ASIC engineer who can take the time to understand SHA-256 and the needs of the Bitcoin proof of work mining algorithm himself.

Second, the SHA-256 hashing units do need a controlling unit.  As you can see in the modules you linked, there are a few top-level signals that are expected to be driven by a controller.  Most importantly rx_state and rx_input.  And you need a controller to check the results, and talk to the outside world.

This is the top-level module for one of the projects you linked to.  In there you will find the code that controls the sha256_transform instances, and how they are connected together.
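
To give a feel for what that controlling logic has to do, here is a software model in Python of the control loop around the hashers: load a piece of work, step the nonce, hash twice with SHA-256, check the result and report a hit. The function names are hypothetical and this is not a translation of the Verilog; the FPGA designs typically work from a precomputed midstate and drive signals such as rx_state and rx_input rather than re-hashing the whole header.

```python
import hashlib
import struct

def double_sha256(data):
    """Bitcoin's proof-of-work hash: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def scan_nonces(header76, target, nonces):
    """Return the first nonce whose block hash is <= target, or None.

    header76 is the first 76 bytes of the 80-byte block header; the
    4-byte little-endian nonce is appended for each attempt.
    """
    for nonce in nonces:
        digest = double_sha256(header76 + struct.pack("<I", nonce))
        # The hash is compared to the target as a little-endian 256-bit integer.
        if int.from_bytes(digest, "little") <= target:
            return nonce
    return None

# Dummy all-zero header and a very easy target, just to exercise the loop.
print(scan_nonces(bytes(76), 2**248, range(100000)))
```

Everything outside double_sha256 here is the controller's job in hardware: supplying the work, counting nonces, comparing against the target and talking to the outside world.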
hero member
Activity: 767
Merit: 500
that's outside my budget pricing.. would be nice though
if i can get a cheap and nasty going, getting half a dozen coins over the next few months, then i will get one

My biggest problem was software. If you're going for used chips, make sure whatever chip you buy has free development software for it. Wanting licensed software for a dev board is one of the reason I bought the kit.

the nano i was looking at does have "two CDs with the software necessary to 'compile' and 'upload' code to the board. " but not sure if the EP4CE22F17C6N is usable
hero member
Activity: 1118
Merit: 541
that's outside my budget pricing.. would be nice though
if i can get a cheap and nasty going, getting half a dozen coins over the next few months, then i will get one

My biggest problem was software. If you're going for used chips, make sure whatever chip you buy has free development software for it. Wanting licensed software for a dev board is one of the reason I bought the kit.
hero member
Activity: 767
Merit: 500
I have just pushed the experimental KC705 code to the repo.
....
AJR,

If you're going to get into it I would highly recommend you get the 705 or the 701.
...


that's outside my budget pricing.. would be nice though
if i can get a cheap and nasty going, getting half a dozen coins over the next few months, then i will get one