Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013) - page 37. (Read 432950 times)

hero member
Activity: 560
Merit: 517
Quote
Anyways, my question is this: How fast am I hashing?
80 MHz at CONFIG_LOOP_LOG2=3 should be 10MH/s. If you're seeing 25MH/s at a pool then perhaps you've been lucky? Over what timespan was that average taken?
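For anyone wanting to sanity-check their own build, the expected figure is just the clock divided by 2^CONFIG_LOOP_LOG2. A quick sketch (not part of the miner code itself):

Code:
# Expected hashrate of the loop-folded miner core: each increment of
# CONFIG_LOOP_LOG2 doubles the cycles per hash, halving the throughput.

def expected_hashrate_mhs(clock_mhz, loop_log2):
    """Theoretical hashrate in MH/s for a given clock and folding depth."""
    return clock_mhz / (2 ** loop_log2)

print(expected_hashrate_mhs(80, 3))  # DE2 at 80MHz, LOOP_LOG2=3 -> 10.0
print(expected_hashrate_mhs(80, 0))  # fully unrolled at 80MHz -> 80.0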

Quote
First off, I'd like to say that I totally love this project. I've been out of the loop in terms of hardware design for the last 3-4 years and this project is giving me the motivation to get back into it
That's great! Makes me happy to spread the FPGA love  Grin
hero member
Activity: 504
Merit: 500
FPGA Mining LLC
First off, I'd like to say that I totally love this project. I've been out of the loop in terms of hardware design for the last 3-4 years and this project is giving me the motivation to get back into it Smiley

I'm running makomk's branch on a DE2 (not -115) at 80MHz with CONFIG_LOOP_LOG2=3. I should be able to run at at least 90MHz because I have plenty of slack but I haven't had the time to look into that yet. At the moment, setting a higher frequency causes "Place & Route" to do worse instead of better.

Anyways, my question is this: How fast am I hashing?

If I understand correctly, the frequency that I'm running at is approximately my hashing speed... So assuming fully unrolled, 80 MHz would give me 80 MHashes/s. However, with CONFIG_LOOP_LOG2=3, my hashing power should be 80 * (0.5 ** 3) or 10 MHashes/s. Yet based on shares submitted to a pool, I'm very roughly estimating ~25MHashes/s. Is there a way to get a better idea of how fast I'm hashing?

10MH/s sounds correct, and pool hashrate estimates being massively off is not unusual. Average it over a couple of hours, and you should end up with 8-12MH/s.
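If you'd rather derive the number from your own pool stats than trust the pool's estimate: every accepted difficulty-1 share represents 2^32 hashes on average. A rough sketch (plug in whatever your pool reports):

Code:
# Estimate hashrate from accepted shares: one difficulty-1 share takes
# 2**32 hashes on average, so the estimate is shares * 2**32 / seconds.

def hashrate_from_shares_mhs(accepted, seconds, share_difficulty=1):
    """Estimated hashrate in MH/s; noisy unless averaged over hours."""
    return accepted * 2**32 * share_difficulty / seconds / 1e6

# e.g. 30 accepted difficulty-1 shares over 4 hours:
print(round(hashrate_from_shares_mhs(30, 4 * 3600), 1))  # -> 8.9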
newbie
Activity: 8
Merit: 0
First off, I'd like to say that I totally love this project. I've been out of the loop in terms of hardware design for the last 3-4 years and this project is giving me the motivation to get back into it Smiley

I'm running makomk's branch on a DE2 (not -115) at 80MHz with CONFIG_LOOP_LOG2=3. I should be able to run at at least 90MHz because I have plenty of slack but I haven't had the time to look into that yet. At the moment, setting a higher frequency causes "Place & Route" to do worse instead of better.

Anyways, my question is this: How fast am I hashing?

If I understand correctly, the frequency that I'm running at is approximately my hashing speed... So assuming fully unrolled, 80 MHz would give me 80 MHashes/s. However, with CONFIG_LOOP_LOG2=3, my hashing power should be 80 * (0.5 ** 3) or 10 MHashes/s. Yet based on shares submitted to a pool, I'm very roughly estimating ~25MHashes/s. Is there a way to get a better idea of how fast I'm hashing?
hero member
Activity: 504
Merit: 500
FPGA Mining LLC
Great info TheSeven, thank you Smiley Do you have any more specific information on the clock generators and bypassing caps? I'm pretty new to this level of circuit design. It shouldn't be too hard to build a power supply that can drop +12V down to the 3.3 and 1.8 (and 1.2?) that the FPGA needs. Also good to know I just need two I/O lines for the serial converter. In all honesty I'll probably use an FTDI serial-to-USB converter, since that's what I and pretty much everyone else will need to use after the board anyway.

I planned on basically making it a breakout board so that all pins could be used, but you bring up a valid point with the ESD risks. It would also be much simpler and more straightforward to just throw a couple of status LEDs on there and maybe a switch or two. Perhaps a small 8-pin header for other uses?

JTAG seems like it would be the best way to go so far as a programming interface is concerned.  Thoughts?

Oh, and fpgaminer, I'll gladly start a new thread if this is too far off topic from your original intent Smiley

Thanks for the help guys!

I have never used that FPGA series myself, so you might want to ask someone else who has more experience with them regarding power supply and clocking, for example ArtForz, who is working on building an FPGA cluster and seems to be pretty knowledgeable. Anyway, the 1.2V rail should be designed for at least 10 amps. As you don't need anything >3.3V, the input voltage range would ideally be something like 5-15V (standard barrel connector?) with an LDO for 3.3V and a POL switcher for 1.2V.
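To put rough numbers on that power budget (my own back-of-the-envelope sketch; the converter efficiency and the 3.3V load are assumptions, not measurements):

Code:
# Rough input power for one board: 1.2V/10A core rail from a POL switcher,
# 3.3V rail from an LDO. Efficiency and 3.3V load figures are guesses.

V_IN = 12.0           # barrel connector input (V)
CORE_W = 1.2 * 10.0   # 1.2V rail at the 10A design target -> 12 W
POL_EFF = 0.85        # assumed point-of-load switcher efficiency
I_33 = 0.2            # assumed 3.3V load (A) for I/O and support logic

pol_in_w = CORE_W / POL_EFF        # ~14.1 W drawn from the input for the core
ldo_in_w = V_IN * I_33             # an LDO passes the load current through
ldo_heat_w = (V_IN - 3.3) * I_33   # ~1.7 W burned in the LDO as heat

total_w = pol_in_w + ldo_in_w
print(f"~{total_w:.1f} W in (~{total_w / V_IN:.2f} A at 12V), "
      f"LDO dissipates ~{ldo_heat_w:.1f} W")

The LDO dissipation figure is also why you'd want the 3.3V load kept small, or the LDO fed from the low end of that 5-15V input range.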

Yeah, an FTDI might be the way to go. And JTAG surely is the way to go for programming. There's just no point in spending money on a flash chip for this application, as the end user will need to have a JTAG device anyway.

Oh, and don't forget the trivial things like mounting holes Smiley
inh
full member
Activity: 155
Merit: 100
Would anyone be willing to help me design a barebones dev board for the 150k LUT Spartan 6? I've done plenty of boards before for microcontrollers and whatnot, but I don't have the faintest clue what kind of support circuitry an FPGA would need. This could be a great asset to the community as well, since any retail board with a nice FPGA on it seems to be >$500, when these could be made for substantially less. I'm willing to offer up the design as open source once complete, and possibly even resell the boards (getting them made in bulk isn't expensive at all).

Normally the schematics posted by Xilinx are a good starting point... http://www.xilinx.com/support/documentation/boards_and_kits/XTP095_SP623_SCH_revC.pdf

A good starting point, yes. But remember to add some more bypassing caps and a stronger power supply, as bitcoin mining drives the FPGA close to its TDP, if not even out of spec. Cooling and power supply stability and efficiency will be the most important factors here. Account for at least 10 amps per FPGA!

The FPGA you would want to use is one of the XC6SLX150 types without the T suffix; choose whichever package is best suited. The better speed grade will probably not improve timings enough to be worth it, though.

Regarding I/Os: Apart from a clock generator (I'd think that anywhere from 20MHz to 100MHz should work fine, but that might need a closer look), you'll just need two I/Os connected to a serial port through a MAX232 or similar. This will limit reusability for other purposes, but it cuts costs, which seems to be the goal here. If it doesn't complicate PCB routing, you might want to allow for LED drivers and a couple of status LEDs to be added (or maybe some headers exposing unused I/O pins, but that might increase ESD risks).

Oh, and remember that prototyping this isn't exactly cheap with the FPGAs costing around $200 each.

Great info TheSeven, thank you Smiley Do you have any more specific information on the clock generators and bypassing caps? I'm pretty new to this level of circuit design. It shouldn't be too hard to build a power supply that can drop +12V down to the 3.3 and 1.8 (and 1.2?) that the FPGA needs. Also good to know I just need two I/O lines for the serial converter. In all honesty I'll probably use an FTDI serial-to-USB converter, since that's what I and pretty much everyone else will need to use after the board anyway.

I planned on basically making it a breakout board so that all pins could be used, but you bring up a valid point with the ESD risks. It would also be much simpler and more straightforward to just throw a couple of status LEDs on there and maybe a switch or two. Perhaps a small 8-pin header for other uses?

JTAG seems like it would be the best way to go so far as a programming interface is concerned.  Thoughts?

Oh, and fpgaminer, I'll gladly start a new thread if this is too far off topic from your original intent Smiley

Thanks for the help guys!
hero member
Activity: 504
Merit: 500
FPGA Mining LLC
Would anyone be willing to help me design a barebones dev board for the 150k LUT Spartan 6? I've done plenty of boards before for microcontrollers and whatnot, but I don't have the faintest clue what kind of support circuitry an FPGA would need. This could be a great asset to the community as well, since any retail board with a nice FPGA on it seems to be >$500, when these could be made for substantially less. I'm willing to offer up the design as open source once complete, and possibly even resell the boards (getting them made in bulk isn't expensive at all).

Normally the schematics posted by Xilinx are a good starting point... http://www.xilinx.com/support/documentation/boards_and_kits/XTP095_SP623_SCH_revC.pdf

A good starting point, yes. But remember to add some more bypassing caps and a stronger power supply, as bitcoin mining drives the FPGA close to its TDP, if not even out of spec. Cooling and power supply stability and efficiency will be the most important factors here. Account for at least 10 amps per FPGA!

The FPGA you would want to use is one of the XC6SLX150 types without the T suffix; choose whichever package is best suited. The better speed grade will probably not improve timings enough to be worth it, though.

Regarding I/Os: Apart from a clock generator (I'd think that anywhere from 20MHz to 100MHz should work fine, but that might need a closer look), you'll just need two I/Os connected to a serial port through a MAX232 or similar. This will limit reusability for other purposes, but it cuts costs, which seems to be the goal here. If it doesn't complicate PCB routing, you might want to allow for LED drivers and a couple of status LEDs to be added (or maybe some headers exposing unused I/O pins, but that might increase ESD risks).

Oh, and remember that prototyping this isn't exactly cheap with the FPGAs costing around $200 each.
inh
full member
Activity: 155
Merit: 100
Thanks Smiley I spent all last night looking for exactly that Smiley
member
Activity: 70
Merit: 10
Would anyone be willing to help me design a barebones dev board for the 150k LUT Spartan 6? I've done plenty of boards before for microcontrollers and whatnot, but I don't have the faintest clue what kind of support circuitry an FPGA would need. This could be a great asset to the community as well, since any retail board with a nice FPGA on it seems to be >$500, when these could be made for substantially less. I'm willing to offer up the design as open source once complete, and possibly even resell the boards (getting them made in bulk isn't expensive at all).

Normally the schematics posted by Xilinx are a good starting point... http://www.xilinx.com/support/documentation/boards_and_kits/XTP095_SP623_SCH_revC.pdf
inh
full member
Activity: 155
Merit: 100
Would anyone be willing to help me design a barebones dev board for the 150k LUT Spartan 6? I've done plenty of boards before for microcontrollers and whatnot, but I don't have the faintest clue what kind of support circuitry an FPGA would need. This could be a great asset to the community as well, since any retail board with a nice FPGA on it seems to be >$500, when these could be made for substantially less. I'm willing to offer up the design as open source once complete, and possibly even resell the boards (getting them made in bulk isn't expensive at all).
member
Activity: 70
Merit: 10
just adding my 2 cents to this thread

At work I have a lot of Spartan-3E 1600's laying around - it's our main dev FPGA for our main product.

I initially wasted a day debugging the serial connection.  But after re-reading the thread, I saw you had a 120MHz clock source.  So just some counter adjustments to the UART's clock dividers fixed that.

Unfortunately, the Spartan 3 routing is horrible.  I've only been able to synthesize TheSeven's version of the code with a depth of 2 - and even that barely missed timing analysis at 50MHz (214 signals had a data path delay of 21.5 ns, versus the 20 ns period of a 50MHz clock).

Gatewise, it looks like the 3E 1600 part with a depth of 2 is using 23% of the FFs and 38% of the 4-input LUTs.  So I imagine it could hold a depth of 3, possibly 4.

But routing is another story.  Even a depth of 2 running at 50MHz barely works.  I could probably underclock it a bit more, but then I have to use the clk_dv output on the DCM with some odd scaling factors.

Using pyminer, this system clocked in at a hashrate of 3.24 MH/s.

It may be possible to further optimize this for smaller devices by doing some more pipelining, though I'm not sure how well that would fit into your parameterized SHA rounds.  Looking at the static timing report, the longest path delays have to go through multiple adders.  A possible way to synthesize this better on devices with weaker routing may be to split those adders up across multiple cycles.  You take a hit by using more cycles - but it may make routing easier on the chip, and if that's the case, you may be able to unroll the SHA rounds further and still obtain a good clock rate.

Extra pipelining can be inserted.  I have sent some code to fpgaminer that shows how it is done.
newbie
Activity: 44
Merit: 0
just adding my 2 cents to this thread

At work I have a lot of Spartan-3E 1600's laying around - it's our main dev FPGA for our main product.

I initially wasted a day debugging the serial connection.  But after re-reading the thread, I saw you had a 120MHz clock source.  So just some counter adjustments to the UART's clock dividers fixed that.
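The divider arithmetic itself is trivial, which is what makes this bug so easy to overlook. A generic sketch, not the project's actual UART code; the 100MHz "expected" clock below is just an example:

Code:
# UART divider sanity check: the divisor must match the real clock
# source, or every byte goes out at the wrong baud rate.

def uart_divisor(clock_hz, baud):
    """Integer divisor, actual baud rate, and the resulting error in %."""
    div = round(clock_hz / baud)
    actual = clock_hz / div
    return div, actual, (actual - baud) / baud * 100

# Divider computed for an assumed 100MHz clock, but driven at 120MHz:
div, _, _ = uart_divisor(100_000_000, 115200)
print(f"actual baud: {120_000_000 / div:.0f}")  # ~138249 -> garbage on the wire
print(uart_divisor(120_000_000, 115200))        # (1042, ~115163, ~-0.03%)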

Unfortunately, the Spartan 3 routing is horrible.  I've only been able to synthesize TheSeven's version of the code with a depth of 2 - and even that barely missed timing analysis at 50MHz (214 signals had a data path delay of 21.5 ns, versus the 20 ns period of a 50MHz clock).

Gatewise, it looks like the 3E 1600 part with a depth of 2 is using 23% of the FFs and 38% of the 4-input LUTs.  So I imagine it could hold a depth of 3, possibly 4.

But routing is another story.  Even a depth of 2 running at 50MHz barely works.  I could probably underclock it a bit more, but then I have to use the clk_dv output on the DCM with some odd scaling factors.

Using pyminer, this system clocked in at a hashrate of 3.24 MH/s.

It may be possible to further optimize this for smaller devices by doing some more pipelining, though I'm not sure how well that would fit into your parameterized SHA rounds.  Looking at the static timing report, the longest path delays have to go through multiple adders.  A possible way to synthesize this better on devices with weaker routing may be to split those adders up across multiple cycles.  You take a hit by using more cycles - but it may make routing easier on the chip, and if that's the case, you may be able to unroll the SHA rounds further and still obtain a good clock rate.
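To make that trade-off concrete, here's a toy timing model (the delay numbers are invented, only the shape of the argument matters): in the folded design the extra cycles cost throughput directly, but once the rounds are unrolled into a full pipeline you get one hash per clock regardless of stage count, so shortening the adder chains becomes a pure clock-rate win.

Code:
# Toy model of splitting a round's adder chain across pipeline stages.
# All delay figures are invented; only the relationship matters.

T_ADDER_NS = 6.0   # assumed delay of one 32-bit adder incl. routing
T_FIXED_NS = 3.5   # assumed register clk->Q + setup + net overhead

def fmax_mhz(adders_per_stage):
    """Achievable clock when the critical path crosses N chained adders."""
    return 1000.0 / (adders_per_stage * T_ADDER_NS + T_FIXED_NS)

one_stage = fmax_mhz(3)   # whole round in one stage: ~46.5 MHz
two_stage = fmax_mhz(2)   # adders split across two stages: ~64.5 MHz

# Fully pipelined, both produce one hash per clock, so the clock gain
# is a throughput gain, paid for with extra FFs and latency:
print(f"throughput gain ~{two_stage / one_stage:.2f}x")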
hero member
Activity: 560
Merit: 517
Quote
edit: looks like 45k LUTs isn't nearly enough for a pipelined version. Might have to make my own board for this =/
The current version of the code can unroll as much or as little as you want, so you can make it fit into 45K.
inh
full member
Activity: 155
Merit: 100
This board should be good enough to get playing with this, yea?

http://www.digilentinc.com/Products/Detail.cfm?NavPath=2,400,836&Prod=ATLYS

edit: looks like 45k LUTs isn't nearly enough for a pipelined version. Might have to make my own board for this =/
hero member
Activity: 560
Merit: 517
Quote
Would a more generalized SHA256 ASIC have a dual purpose for a security firm or similar, assuming a controller could handle enough of the logic for hashing?  Or is the whole point of an ASIC to make it as highly specialized as possible to get optimal efficiency?
Probably what kokjo said, but I wouldn't rule out the possibility of making it more generalized. The algorithm can be optimized specifically for Bitcoin, with benefits that include both increased ASIC performance and reduced cost. What one would have to do is determine how much interest there would be in such a chip from non-Bitcoin parties, and whether the cost benefits lost by supporting a more general approach are outweighed by the increased demand.

Also, it might be a good idea to keep the ASIC general simply because of risk. If Bitcoin fails to meet expectations during the production of the ASIC chips, the manufacturer would at least have chips that could possibly be sold into other markets.

On a somewhat related note, I'd like to stress that I strongly believe any ASIC development should be done in the open, with open-source/open-hardware licenses on as much of the process as possible, up to and including the free release of the masks. Now, that's a pretty bold statement, since the masks are very expensive. But if, for example, the entire venture is publicly funded, that might not be such a crazy idea.
legendary
Activity: 1050
Merit: 1000
You are WRONG!
One obvious thing to bear in mind is that, at some point, the pools will inevitably all increase the difficulty of each share in order to reduce the work required to check all the submitted shares. So any ASIC-based mining system needs to be capable of checking hashes against difficulty levels above the minimum, either in the ASIC itself or in the processor controlling it. It looks like BTC Guild is actually making this change right now - there's a notice on their website saying they're doubling the difficulty per share. (This also requires changes to the software controlling FPGA miners.)

Edit: The other is that there's a good chance pools will eventually move to making miners compute midstates locally, again to reduce their resource usage. That's almost certainly best handled on a controller CPU rather than the main hash-computation hardware. What's more, since the rate of midstate computation is roughly proportional to the number of gigahashes/sec, mining ASICs would probably mean both of these happen sooner than they otherwise would.
Both of these can easily be handled on the controller CPU up to an FPGA speed of several gigahashes, so this is just a firmware matter and not relevant for the actual ASIC.
Oh, and yes, my miner ignores the requested difficulty. This just means that it keeps sending difficulty 1 shares, which means that it refuses to reduce the server load and ends up with roughly half of the shares being rejected, but will work perfectly fine apart from that. While it should of course be fixed, this issue isn't critical.

Would a more generalized SHA256 ASIC have a dual purpose for a security firm or similar, assuming a controller could handle enough of the logic for hashing?  Or is the whole point of an ASIC to make it as highly specialized as possible to get optimal efficiency?

to make it highly specialized. Cheesy
full member
Activity: 154
Merit: 100
One obvious thing to bear in mind is that, at some point, the pools will inevitably all increase the difficulty of each share in order to reduce the work required to check all the submitted shares. So any ASIC-based mining system needs to be capable of checking hashes against difficulty levels above the minimum, either in the ASIC itself or in the processor controlling it. It looks like BTC Guild is actually making this change right now - there's a notice on their website saying they're doubling the difficulty per share. (This also requires changes to the software controlling FPGA miners.)

Edit: The other is that there's a good chance pools will eventually move to making miners compute midstates locally, again to reduce their resource usage. That's almost certainly best handled on a controller CPU rather than the main hash-computation hardware. What's more, since the rate of midstate computation is roughly proportional to the number of gigahashes/sec, mining ASICs would probably mean both of these happen sooner than they otherwise would.
Both of these can easily be handled on the controller CPU up to an FPGA speed of several gigahashes, so this is just a firmware matter and not relevant for the actual ASIC.
Oh, and yes, my miner ignores the requested difficulty. This just means that it keeps sending difficulty 1 shares, which means that it refuses to reduce the server load and ends up with roughly half of the shares being rejected, but will work perfectly fine apart from that. While it should of course be fixed, this issue isn't critical.

Would a more generalized SHA256 ASIC have a dual purpose for a security firm or similar, assuming a controller could handle enough of the logic for hashing?  Or is the whole point of an ASIC to make it as highly specialized as possible to get optimal efficiency?
hero member
Activity: 504
Merit: 500
FPGA Mining LLC
One obvious thing to bear in mind is that, at some point, the pools will inevitably all increase the difficulty of each share in order to reduce the work required to check all the submitted shares. So any ASIC-based mining system needs to be capable of checking hashes against difficulty levels above the minimum, either in the ASIC itself or in the processor controlling it. It looks like BTC Guild is actually making this change right now - there's a notice on their website saying they're doubling the difficulty per share. (This also requires changes to the software controlling FPGA miners.)

Edit: The other is that there's a good chance pools will eventually move to making miners compute midstates locally, again to reduce their resource usage. That's almost certainly best handled on a controller CPU rather than the main hash-computation hardware. What's more, since the rate of midstate computation is roughly proportional to the number of gigahashes/sec, mining ASICs would probably mean both of these happen sooner than they otherwise would.
Both of these can easily be handled on the controller CPU up to an FPGA speed of several gigahashes, so this is just a firmware matter and not relevant for the actual ASIC.
Oh, and yes, my miner ignores the requested difficulty. This just means that it keeps sending difficulty 1 shares, which means that it refuses to reduce the server load and ends up with roughly half of the shares being rejected, but will work perfectly fine apart from that. While it should of course be fixed, this issue isn't critical.
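For what it's worth, the host-side fix is cheap. A sketch of the comparison (not MPBM's actual code): a share meets difficulty D when the hash, read as a 256-bit integer, is at most the difficulty-1 target divided by D, which is also why blind difficulty-1 submissions to a difficulty-2 pool get roughly half their shares rejected.

Code:
# Share difficulty check: a hash meets difficulty D iff, interpreted as
# a 256-bit little-endian integer, it is <= diff1_target // D.
# Illustration only, not MPBM's actual code.

DIFF1_TARGET = 0x00000000FFFF0000000000000000000000000000000000000000000000000000

def meets_difficulty(header_hash: bytes, difficulty: int) -> bool:
    """header_hash: the 32-byte double-SHA256 of the block header."""
    return int.from_bytes(header_hash, "little") <= DIFF1_TARGET // difficulty

# A difficulty-1 share clears difficulty 2 only about half the time,
# matching the ~50% reject rate described above.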
hero member
Activity: 686
Merit: 564
One obvious thing to bear in mind is that, at some point, the pools will inevitably all increase the difficulty of each share in order to reduce the work required to check all the submitted shares. So any ASIC-based mining system needs to be capable of checking hashes against difficulty levels above the minimum, either in the ASIC itself or in the processor controlling it. It looks like BTC Guild is actually making this change right now - there's a notice on their website saying they're doubling the difficulty per share. (This also requires changes to the software controlling FPGA miners.)

Edit: The other is that there's a good chance pools will eventually move to making miners compute midstates locally, again to reduce their resource usage. That's almost certainly best handled on a controller CPU rather than the main hash-computation hardware. What's more, since the rate of midstate computation is roughly proportional to the number of gigahashes/sec, mining ASICs would probably mean both of these happen sooner than they otherwise would.
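The midstate load is easy to quantify: one getwork covers a 2^32 nonce space, so a miner consumes one work unit, and hence one midstate, per 2^32 hashes. A quick sketch:

Code:
# One getwork spans a 2**32 nonce range, so midstate demand scales
# linearly with hashrate: midstates/sec = hashrate / 2**32.

for ghs in (0.1, 1.0, 4.0, 25.0):
    print(f"{ghs:5.1f} GH/s -> {ghs * 1e9 / 2**32:6.3f} midstates/sec")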
hero member
Activity: 504
Merit: 500
FPGA Mining LLC
Why have servers at all? For a couple of dollars more, you could equip those boards with an ethernet port and a small ARM processor running linux, with a ready-to-go firmware preinstalled, which would be configured through a web interface. Or possibly a backplane with the ARM processor and ethernet, which has a couple of slots for crypto slave boards containing the ASICs.

I like this idea, however maybe it's a case of walking before you run?  Surely the flexibility of PCIe makes it useful in many cases, and adapting that to an all-in-one arm/linux unit is a small (and logical) next step.

I know I have two computers with spare PCIe slots that I'd use to start with, test it out, make sure it all works well.  Unless the arm/linux all-in-one miner is cheaper (by some measure)...

I think that not everyone who would consider buying an ASIC miner will want to have a dedicated machine for it - just like today not everyone has a dedicated GPU mining rig.

I'd estimate the cost of the all-in-one standalone solution at about the same as a PCIe card (PCIe is a fast but completely overkill and complicated interface, which just doesn't belong here). If you consider the price of the PC mainboard/CPU, the standalone solution will definitely be cheaper.

The ASIC card would be dedicated anyway. So why occupy a PC with it, if you can have the same functionality for the same price without the need for a PC? Yeah, you might save an ethernet switch port Grin
inh
full member
Activity: 155
Merit: 100
Those of you with the xc6slx150's, what board are you using, or is it a custom design? I've yet to find anything (somewhat affordable) with that model on it...