Topic: Modular FPGA Miner Hardware Design Development - page 25. (Read 119320 times)

member
Activity: 70
Merit: 10
[...]
Each board will need its dedicated I2C bus anyway, so why not have a dedicated JTAG bus as well?

We only need one I2C bus; it just needs to be split into separate segments by a switch. I mentioned one example of such a switch before, the NXP PCA9547PW. The reason I2C needs to be partitioned is the limited number of addresses available on the bus. That is not a problem for the JTAG bus, though. Logically, you can make the chain as long as you like. Electrically, you need drivers in the TCK and TMS lines for a design with many chips.
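
To make the partitioning concrete, here is a minimal sketch of how software would pick one segment. It assumes the PCA9547 control register layout as I understand it from the datasheet (bit 3 enables the mux, bits 2..0 select one of eight channels); please check the datasheet before relying on it. The function name is made up for illustration, and the actual I2C write of the byte to the switch is not shown.

```python
def pca9547_control_byte(channel):
    """Return the byte to write to the switch to select one downstream segment.

    channel: 0..7 selects that segment; None disconnects all segments.
    Assumed layout: bit 3 = enable, bits 2..0 = channel number.
    """
    if channel is None:
        return 0x00                      # enable bit cleared: no segment connected
    if not 0 <= channel <= 7:
        raise ValueError("PCA9547 has eight channels, numbered 0..7")
    return 0x08 | channel                # enable bit set plus channel number


# Example: the byte that routes the bus to the DIMM in slot 3
print(hex(pca9547_control_byte(3)))
```

Because only one segment is connected at a time, every DIMM can carry an EEPROM at the same I2C address without conflicts.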

Given that no more than one I2C bus was planned and that no more than one JTAG chain is needed, can you clarify why you think more JTAG chains are needed?

For the cheap boards, you could just connect to the USB pins via the DIMM connector, and basically just have a hub and power supply on the backplane. The more expensive boards might have an ARM and ethernet.

So don't use the JTAG or I2C signals on the bus connector at all, just the USB D+ and D- lines? That is a very interesting idea. If it works, it simplifies the design a lot: none of the non-supply signals I mentioned in my last post are needed, as the backplane can detect the presence of a DIMM the "USB" way. So a simple backplane contains just wires and a couple of mini-USB connectors? Or does it contain a home-grown USB hub? (I am limiting myself to a cheap backplane in this discussion because the intelligent one with a CPU can be built on top of the cheap design in a second step.)

This basically shifts the interface chip completely onto the DIMM, removing (by design, not material cost) the overhead of supporting hybrid DIMMs. Of the different options, it is not the cheapest, but it is certainly elegant:

  • slave-only DIMMs, USB-chip only on backplane: cheapest, JTAG and I2C on bus
  • hybrid DIMMs, USB-chip only on DIMMs: mid price, simple bus with only USB, but needs hub somewhere
  • hybrid DIMMs, USB-chip both on DIMMs and backplane: most expensive, JTAG and I2C on bus

[...]
Oh, and don't forget to add a means for boards to interrupt the backplane, e.g. when a share was found or keyspace was exhausted.

Is that actually needed? I agree that a later backplane that contains a CPU may make good use of the interrupt, but for the USB-based devices it is only a question of how much data to transmit: you still need to poll, because USB has no direct IRQ. I admit that reading the GPIO value of an FT2232 connected to the IRQ signal is quicker than reading the JTAG chain. But how bad is that, even for 10 boards with 16 FPGAs each?
hero member
Activity: 504
Merit: 500
FPGA Mining LLC
  • Auto bridge: The backplane can automatically bridge the JTAG signals over unpopulated slots. If not implemented, jumpers need to be used to bridge open slots.

While this might be sensible (for cost reasons) for some low-cost backplanes, I don't think it will scale well for bigger backplanes with multi-FPGA cards.
Each board will need its dedicated I2C bus anyway, so why not have a dedicated JTAG bus as well?
For the cheap boards, you could just connect to the USB pins via the DIMM connector, and basically just have a hub and power supply on the backplane. The more expensive boards might have an ARM and ethernet.

One additional issue we should discuss, but which has no implication for the current step of the specification process: should the FPGAs also be connected to the I2C bus? If not, we can potentially save one bi-directional level shifter, as the EEPROM can run at a different voltage than the FPGAs.

I think it will be advantageous to connect I2C to the FPGAs, and at least have the option to transmit work/shares that way. A level shifter really doesn't cost much compared to an FPGA, and if we go for a 2.5V interface we can possibly remove it altogether.

Oh, and don't forget to add a means for boards to interrupt the backplane, e.g. when a share was found or keyspace was exhausted.
member
Activity: 70
Merit: 10
While the poll for which FPGA to use is running, we can already decide on the specifics of the DIMM connector. This can probably be split into four steps:
  • Conceptual: see below
  • Electrical: Specify a table of signal name, voltage, current and comments (e.g. where to put pull-up resistors, ...)
  • Mechanical: Specify which DIMM connector to use and how much space to leave between DIMMs and around the DIMM in general
  • Pinout: Specify a table of pin number and signal name

To get the discussion started, here is a suggestion for the conceptual step: which features to include and how to solve certain issues. While I write this in firm language, it is only a suggestion. Everyone on the board should comment on or amend this, and then O_Shovah should probably make the final selection.

The following signals are the minimum that the connector needs for our design of the DIMM:

Signal | Description
+V     | The supply voltage for the FPGAs on the DIMM. Has a high current and a wide voltage range.
+VBUS  | The supply voltage for all logic signals on the bus. Provided to the DIMM to power its interface logic.
GND    | The return for both +V and +VBUS. All logic signals are also relative to this signal.
TCK    | The clock signal for the JTAG bus. Input into the DIMM.
TMS    | The mode select signal for the JTAG bus. Input into the DIMM.
TDI    | The serial data input signal for the JTAG bus. Input into the DIMM.
TDO    | The serial data output signal for the JTAG bus. Output from the DIMM.

The following signals to be included into the connector are not strictly needed in all use cases. Their inclusion depends on the implementation of the feature listed below:

Signal   | Feature      | Description
DET_DIMM | Auto bridge  | Pin to allow the backplane to detect the presence of a DIMM in the slot. Shorted to GND on the DIMM.
DET_BP   | Hybrid board | Pin to allow the DIMM to detect the presence of a backplane. Shorted to GND on the backplane.
SCL      | EEPROM       | The clock signal of an I2C bus. Input into the DIMM.
SDA      | EEPROM       | The serial data signal of an I2C bus. Bidirectional I/O.
LED      | Info-LED     | Signal to enable an LED on the DIMM. Input into the DIMM.

List of features:

  • Auto bridge: The backplane can automatically bridge the JTAG signals over unpopulated slots. If not implemented, jumpers need to be used to bridge open slots.
  • Hybrid board: The DIMM can also operate in a standalone mode without a backplane.
  • EEPROM: The DIMM contains an EEPROM to store details of the DIMM: type and number of FPGAs, batch number, serial number...
  • Info-LED: The backplane can switch on an LED on the edge of the DIMM under software control. May be used to identify defective boards to the user. This feature could also be implemented via I2C.
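
To illustrate the EEPROM feature, here is a minimal sketch of what a fixed-size identification record could look like. The field sizes, their order and the function names are purely hypothetical and not part of any agreed specification; the real layout still has to be decided.

```python
import struct

# Hypothetical record layout (an assumption, not an agreed spec):
#   16 bytes  FPGA type, ASCII, NUL-padded
#    1 byte   number of FPGAs on the DIMM
#    2 bytes  batch number
#    4 bytes  serial number
# "<" selects little-endian byte order with no padding between fields.
DIMM_RECORD = struct.Struct("<16sBHI")


def pack_dimm_record(fpga_type, fpga_count, batch, serial):
    """Encode the DIMM details into the bytes to be written to the EEPROM."""
    name = fpga_type.encode("ascii")
    if len(name) > 16:
        raise ValueError("FPGA type name is longer than 16 bytes")
    return DIMM_RECORD.pack(name, fpga_count, batch, serial)


def unpack_dimm_record(raw):
    """Decode bytes read from the EEPROM back into the DIMM details."""
    name, fpga_count, batch, serial = DIMM_RECORD.unpack(raw)
    return name.rstrip(b"\0").decode("ascii"), fpga_count, batch, serial


raw = pack_dimm_record("XC6SLX150", 16, 1, 42)
print(len(raw), unpack_dimm_record(raw))
```

A fixed-size binary record keeps the EEPROM small and lets the backplane read everything in one short I2C transfer; a text format would be easier to extend but needs more space.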

One additional issue we should discuss, but which has no implication for the current step of the specification process: should the FPGAs also be connected to the I2C bus? If not, we can potentially save one bi-directional level shifter, as the EEPROM can run at a different voltage than the FPGAs.
member
Activity: 70
Merit: 10
[...]
Then give the poll your click ;)

I scrolled down to the end of the discussion too fast, it seems...
sr. member
Activity: 410
Merit: 252
Watercooling the world of mining
Xilinx Spartan 6 XC6SLX150: cheaper, claims to be faster.

Then give the poll your click ;)
member
Activity: 70
Merit: 10
Xilinx Spartan 6 XC6SLX150: cheaper, claims to be faster.
sr. member
Activity: 410
Merit: 252
Watercooling the world of mining
As this basically comes down to Xilinx Spartan 6 LX150 vs. Altera Cyclone IV 75K, I think we should have a poll on that.

I personally would prefer to have a design proven to work in simulation at least. Also, we will be dependent on someone providing us with the bitstream if we use the Spartan.

Please make your decision on your FPGA of choice. This poll will be running until Saturday, 9.07.2011, 22:00.
hero member
Activity: 686
Merit: 564
Good to see you have allowed 3% for interface changes ::)
Worse, actually - 3% for adding an interface other than JTAG at all ;-). I figure that if it's possible to offer a choice between a decent selection of basic interface options, that'll be enough and anything fancier like Ethernet is probably best done in an external microcontroller. Of course, that's a big if!

Edited to add:
If you can compile for the XC6SLX150, can you just take any of the currently available codes and compile it with default settings? Even an unoptimised result is better than nothing! We just want to know whether the FPGA can run a fully unrolled core at more than 84MHz.
I've heard that with the default settings you can't actually get it to pass place-and-route. (The workaround is *probably* quite easy; modifying the Map settings to ignore user timing constraints and run in non-timing driven mode should work, though obviously I can't test this.)
member
Activity: 70
Merit: 10
[...]
What about the Xilinx XC6SLX150-3CSG484C? It's cheaper than the EP4CE75 and will definitely allow for higher hash rates.
As I already mentioned multiple times, ArtForz (a bitcoin early adopter with a huge mining farm) claims to run 190MH/s on that one, and I think we can trust him. Sadly I haven't managed to reproduce this myself so far, as I have neither the time nor the processing power needed to do lots of synthesis runs to optimize it. He considered releasing the source code though... We might just need to poke him a bit more to actually do that.

Xilinx would be my preferred solution, because I have read more of their datasheets. But while 190MHash/s is a very impressive number, I would really like someone to state that this or that available code gives this or that performance, especially since not all of us can compile code for that FPGA.

If you can compile for the XC6SLX150, can you just take any of the currently available codes and compile it with default settings? Even an unoptimised result is better than nothing! We just want to know whether the FPGA can run a fully unrolled core at more than 84MHz.
full member
Activity: 354
Merit: 103
Watching this thread with great interest. I know VHDL better than Verilog :-)
hero member
Activity: 504
Merit: 500
FPGA Mining LLC
makomk convinced me since then that an EP4CE75 is the most efficient way to go. This seems to be supported by Olaf Mandel's table.

What about the Xilinx XC6SLX150-3CSG484C? It's cheaper than the EP4CE75 and will definitely allow for higher hash rates.
As I already mentioned multiple times, ArtForz (a bitcoin early adopter with a huge mining farm) claims to run 190MH/s on that one, and I think we can trust him. Sadly I haven't managed to reproduce this myself so far, as I have neither the time nor the processing power needed to do lots of synthesis runs to optimize it. He considered releasing the source code though... We might just need to poke him a bit more to actually do that.
member
Activity: 70
Merit: 10
Good to see you have allowed 3% for interface changes ::)
hero member
Activity: 686
Merit: 564
I'm not quite sure, so excuse me if I'm wasting your time. You filled in "109.29 MHash/s" although "Fmax=109.29MHz" was reported (MHz instead of MHash/s). Or are they the same when fully unrolled?
The fully unrolled design does one hash per clock cycle, so yeah, they are the same.
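
As a minimal sketch of that relation: the hash rate is the clock frequency divided by the number of clock cycles the core needs per hash, which is 1 for a fully unrolled design. The function name and the `clocks_per_hash` parameter are made up for illustration; a partially unrolled core would simply use a larger value.

```python
def hash_rate_mhash(fmax_mhz, clocks_per_hash=1):
    """Hash rate in MHash/s for a core clocked at fmax_mhz (in MHz).

    clocks_per_hash is 1 for a fully unrolled core (one result per clock);
    a core that reuses its logic over several cycles needs more clocks per hash.
    """
    if clocks_per_hash < 1:
        raise ValueError("a core cannot need less than one clock per hash")
    return fmax_mhz / clocks_per_hash


# Fully unrolled core at the reported Fmax: MHz and MHash/s coincide
print(hash_rate_mhash(109.29))
```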
newbie
Activity: 25
Merit: 0
[...]In that case and optimistically assuming that the missing interface logic can be added to makomk's code, the current table looks like this:[...]

I'm not quite sure, so excuse me if I'm wasting your time. You filled in "109.29 MHash/s" although "Fmax=109.29MHz" was reported (MHz instead of MHash/s). Or are they the same when fully unrolled?

Btw congrats makomk, your result sounds great ;)
full member
Activity: 210
Merit: 100
You guys might have seen my work on a Cyclone IV board on this thread:
http://forum.bitcoin.org/index.php?topic=9047.msg299381#msg299381

makomk convinced me since then that an EP4CE75 is the most efficient way to go. This seems to be supported by Olaf Mandel's table.

Just last night I waded through the documents and made a spreadsheet of the pinout. I'll sell it for 2 BTC, PM me if interested. It will get you started-- and save you several hours of boring, tedious, sometimes confusing work. It's for a JTAG-configured device with one clock input. I might also share a PCB design for several BTC.

Or is this one of those threads where a lot of talk happens, but no action? (excluding makomk, of course)

It's not possible for one set of pads to support both EP4CE75 and EP4CE115, unfortunately- too many different pins.
member
Activity: 70
Merit: 10
New version of the table with lower Altera prices (assuming 1USD=0.6891EUR):

Chip                      | Rate [MHash/s] | Power [W] | Price [EUR] | Rate/Price [MHash/s/EUR] | Rate/Power [MHash/J]
Altera EP4CE75F23C7N      | 109.29         | -         | 156.75      | 0.697                    | -
Altera EP4CE115F23C7N     | 80             | 4.4       | 271.79      | 0.294                    | 18.2
Altera EP4CE115F23C7N     | 109            | -         | 271.79      | 0.401                    | -
Xilinx XC6SLX75-3CSG484C  | ?              | ?         | 67.29       | ?                        | ?
Xilinx XC6SLX100-3CSG484C | ?              | ?         | 83.86       | ?                        | ?
Xilinx XC6SLX150-3CSG484C | ?              | ?         | 120.47      | ?                        | ?
Xilinx XC3S500E-5CPG132C  | 3.125          | 0.78      | 20.38       | 0.153                    | 4
Xilinx XC5VLX110-1FFG676C | 120            | -         | 1126.51     | 0.107                    | -

An intermediate result: in order to beat the Altera EP4CE75 (in terms of Rate/Price), the Xilinx XC6SLX75 must achieve more than 46.9MHash/s, and the XC6SLX150 must beat 84MHash/s.
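
For anyone who wants to check these break-even numbers, here is a minimal sketch of the arithmetic: a chip matches the reference in Rate/Price when its rate equals the reference's Rate/Price multiplied by its own price. The function name is made up; the rates and prices are the ones from the table.

```python
def break_even_rate(ref_rate, ref_price, price):
    """Rate (MHash/s) a chip costing `price` needs to match the reference Rate/Price."""
    return ref_rate / ref_price * price


# Reference: Altera EP4CE75 at 109.29 MHash/s for 156.75 EUR
for chip, price in [("XC6SLX75", 67.29), ("XC6SLX100", 83.86), ("XC6SLX150", 120.47)]:
    print(chip, round(break_even_rate(109.29, 156.75, price), 1))
```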
member
Activity: 70
Merit: 10
[...]
As far as I can see, they both require a 2.5 V and a 1.2 V rail. So the only difference would be how much current each rail needs. (Please, someone verify that.)

AFAIK: Correct.

[...]
If the power supplies of the Spartan 6 series and the Cyclone IV turn out to be really interchangeable, I see no reason not to try a prototype for both based on the same board.

Same BOM, but different boards: the pinout is different. Though much of the development work is identical, redrawing the layout for the other chip is a lot of work.
sr. member
Activity: 410
Merit: 252
Watercooling the world of mining
Cheaper just to choose one.  Looks like LX150 is probably the best bet, but would be nice to have some compilation results.

I had a look at the documentation of both the Spartan 6 and the Cyclone IV series.
Xilinx Spartan 6: http://www.xilinx.com/support/documentation/data_sheets/ds162.pdf
Altera Cyclone IV: http://www.altera.com/literature/hb/cyclone-iv/cyiv-53001.pdf

As far as I can see, they both require a 2.5 V and a 1.2 V rail. So the only difference would be how much current each rail needs. (Please, someone verify that.)
The EPE power calculator can be used to estimate the power needed for each rail.

Edit: Also, after a slightly tedious 4-hour build process, Fmax=109.29MHz and 97% resource usage for the fully-unrolled DE2_115_makomk_mod on the EP4CE75F29C7. Might be able to get it up to 110MHz with the right options, but I wouldn't bet on it.

So I take it as given that a full miner core can run on the Altera Cyclone IV 75k. It shall therefore be one of the final candidates.

For increased performance later on, we might look into the Cyclone IV GX 150k device. Maybe someone could run a compilation for this one. I also checked Altera's website and found the FPGAs in their online shop to be cheaper than at Digikey in some cases. This might improve the economics.

If the power supplies of the Spartan 6 series and the Cyclone IV turn out to be really interchangeable, I see no reason not to try a prototype for both based on the same board.
member
Activity: 70
Merit: 10
[...]
It seems to be standard to list the number of MHash as the number of total sha256(sha256(data)) operations just like you did - certainly that's what I've been doing. I get the impression this dates back to the early days of bitcoin. It's possible others haven't been doing it this way of course.

Very good. In that case, and optimistically assuming that the missing interface logic can be added to makomk's code, the current table looks like this:

Chip                      | Rate [MHash/s] | Power [W] | Price [EUR] | Rate/Price [MHash/s/EUR] | Rate/Power [MHash/J]
Altera EP4CE75F23C7N      | 109.29         | -         | 174.47      | 0.626                    | -
Altera EP4CE115F23C7N     | 80             | 4.4       | 303.69      | 0.263                    | 18.2
Altera EP4CE115F23C7N     | 109            | -         | 303.69      | 0.359                    | -
Xilinx XC6SLX75-3CSG484C  | ?              | ?         | 67.29       | ?                        | ?
Xilinx XC6SLX100-3CSG484C | ?              | ?         | 83.86       | ?                        | ?
Xilinx XC6SLX150-3CSG484C | ?              | ?         | 120.47      | ?                        | ?
Xilinx XC3S500E-5CPG132C  | 3.125          | 0.78      | 20.38       | 0.153                    | 4
Xilinx XC5VLX110-1FFG676C | 120            | -         | 1126.51     | 0.107                    | -

Please fill in what is missing.
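
In case it helps with filling in the gaps, here is a minimal sketch of how the two derived columns follow from the measured ones. The function name is made up, and `None` stands for a value that is still unknown. Note that MHash/s divided by W gives MHash/J.

```python
def derived_columns(rate, power, price):
    """Return (Rate/Price, Rate/Power), or None where an input is missing.

    rate in MHash/s, power in W, price in EUR.
    """
    rate_per_price = None if rate is None or price is None else round(rate / price, 3)
    rate_per_power = None if rate is None or power is None else round(rate / power, 1)
    return rate_per_price, rate_per_power


# Altera EP4CE115 row: 80 MHash/s, 4.4 W, 303.69 EUR
print(derived_columns(80, 4.4, 303.69))
# A row with unknown rate and power stays unknown
print(derived_columns(None, None, 120.47))
```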