Modular FPGA Miner Hardware Design Development - page 25.

Olaf.Mandel

member

Activity: 70

Merit: 10

Quote from: TheSeven on July 05, 2011, 11:48:37 AM

[...]
Each board will need its dedicated I2C bus anyway, so why not have a dedicated JTAG bus as well?

We only need one I²C bus, it just needs to be fragmented into different partitions by a switch. I mentioned one example for such a switch before, the NXP PCA9547PW. The reason why I²C needs to be partitioned is the limited availability of addresses on the bus. That is not a problem for the JTAG bus, though. Logically, you can make it as long as you like. Electrically, you need drivers in the TCK and TMS lines for a design with many chips.
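As a rough sketch of what the partitioning means on the software side: as far as I read the datasheet, the PCA9547 has a single control register, and writing one byte to it selects which downstream segment is connected. The helper below only builds that byte (enable bit B3 plus a 3-bit channel number). The base address 0x70 (address pins tied low) and the slot-to-channel mapping are assumptions, and no actual bus access is shown.

```python
PCA9547_BASE_ADDR = 0x70  # assumed: address pins A2..A0 tied to GND
ENABLE_BIT = 0x08         # B3 of the control register enables the selected channel


def pca9547_control_byte(channel):
    """Return the control byte that connects one of the 8 downstream channels.

    Passing channel=None returns the byte that disconnects all channels.
    """
    if channel is None:
        return 0x00
    if not 0 <= channel <= 7:
        raise ValueError("PCA9547 has channels 0..7")
    return ENABLE_BIT | channel


# Hypothetical mapping: backplane slot n sits behind switch channel n.
for slot in range(3):
    byte = pca9547_control_byte(slot)
    print(f"slot {slot}: write 0x{byte:02X} to device 0x{PCA9547_BASE_ADDR:02X}")
```

With one segment connected at a time, every DIMM can carry an EEPROM at the same I²C address without conflicts.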

Given that there wasn't more than one I²C bus planned and that no more than one JTAG chain is needed, can you clarify why you think more JTAG chains are needed?

Quote from: TheSeven on July 05, 2011, 11:48:37 AM

For the cheap boards, you could just connect to the USB pins via the DIMM connector, and basically just have a hub and power supply on the backplane. The more expensive boards might have an ARM and ethernet.

So not use the JTAG or I²C signals on the bus connector at all, just the USB D+ and D- lines? That is a very interesting idea. If it works, it simplifies the design a lot: none of the non-supply signals I mentioned in my last post are needed in that case, as the backplane can detect the presence of a DIMM the "USB" way. So a simple backplane contains wires and a couple of mini-USB connectors? Or it contains a home-grown USB hub? (I am limiting myself to a cheap backplane in this discussion because the intelligent one with a CPU can be built on top of the cheap design in a second step.)

This basically shifts the interface chip completely onto the DIMM, removing (by design, not material cost) the overhead of supporting hybrid DIMMs. Of the different options, it is not the cheapest, but it is certainly elegant:

slave-only DIMMs, USB-chip only on backplane: cheapest, JTAG and I²C on bus
hybrid DIMMs, USB-chip only on DIMMs: mid price, simple bus with only USB, but needs hub somewhere
hybrid DIMMs, USB-chip both on DIMMs and backplane: most expensive, JTAG and I²C on bus

Quote from: Olaf.Mandel on July 05, 2011, 03:22:44 AM

[...]
Oh, and don't forget to add a means for boards to interrupt the backplane, e.g. when a share was found or keyspace was exhausted.

Is that actually needed? I agree that a later backplane that contains a CPU may make good use of the interrupt, but for the USB based devices it is only a question of how much data to transmit: you still need to use polling because USB does not have a direct IRQ. I admit that reading the GPIO value of an FT2232 connected to the IRQ signal is quicker than reading the JTAG chain. But how bad is that for even 10 boards each with 16 FPGAs?
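To put a rough number on that question, here is a back-of-the-envelope sketch. It assumes that each FPGA contributes a fixed-size status word to one long serial chain and that TCK runs at a constant rate. The 32 bits per FPGA and the 6 MHz TCK are assumed values, and USB latency and TAP state overhead are ignored.

```python
def jtag_scan_time_s(boards, fpgas_per_board, bits_per_fpga, tck_hz):
    """Time to shift one status word per FPGA through a single serial chain."""
    total_bits = boards * fpgas_per_board * bits_per_fpga
    return total_bits / tck_hz


# 10 boards with 16 FPGAs each, as in the question above.
# 32 status bits per FPGA and a 6 MHz TCK are assumptions for illustration.
t = jtag_scan_time_s(boards=10, fpgas_per_board=16, bits_per_fpga=32, tck_hz=6_000_000)
print(f"{t * 1e3:.2f} ms per full poll")
```

With these assumed numbers the raw shifting time stays below a millisecond, so the USB round trips would probably dominate the polling cost rather than the chain length.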

TheSeven

hero member

Activity: 504

Merit: 500

FPGA Mining LLC

Quote from: Olaf.Mandel on July 05, 2011, 03:22:44 AM

Auto bridge: The backplane can automatically bridge the JTAG signals over unpopulated slots. If not implemented, jumpers need to be used to bridge open slots.

While this might be sensible (for cost reasons) for some low-cost backplanes, I don't think it will scale well for bigger backplanes with multi-FPGA cards.
Each board will need its dedicated I2C bus anyway, so why not have a dedicated JTAG bus as well?
For the cheap boards, you could just connect to the USB pins via the DIMM connector, and basically just have a hub and power supply on the backplane. The more expensive boards might have an ARM and ethernet.

Quote from: Olaf.Mandel on July 05, 2011, 03:22:44 AM

One additional issue we should discuss, but which has no implication for the current step of the specification process: Should the FPGAs also be connected to the I²C bus? If not, we can potentially save one bi-directional level shifter, as the EEPROM can run at a different voltage than the FPGAs.

I think it will be advantageous to connect I2C to the FPGAs, and at least have the option to transmit work/shares that way. A level shifter really doesn't cost much compared to an FPGA, and if we go for a 2.5V interface we can possibly remove it altogether.

Oh, and don't forget to add a means for boards to interrupt the backplane, e.g. when a share was found or keyspace was exhausted.

Olaf.Mandel

member

Activity: 70

Merit: 10

While the poll for which FPGA to use is running, we can already decide on the specifics of the DIMM connector. This can probably be split into four steps:

Conceptual: see below
Electrical: Specify a table of signal name, voltage, current and comments (e.g. where to put pull-up resistors, ...)
Mechanical: Specify which DIMM connector to use and how much space to leave between DIMMs and around the DIMM in general
Pinout: Specify a table of pin number and signal name

To get the discussion started, here is a suggestion for the conceptual step: which features to include and how to solve certain issues. Although I phrase this firmly, it is only a suggestion. Everyone on the board should comment on or amend it, and then O_Shovah should probably make a final selection.

The following signals to be included into the connector represent the minimum needed for our design of the DIMM:

Signal	Description
+V	The supply voltage for the FPGAs on the DIMM. Has a high current and a wide voltage range.
+VBUS	The supply voltage for all logic signals on the bus. Provided to the DIMM to power its interface logic.
GND	The return for both +V and +VBUS. All logic signals are also relative to this signal.
TCK	The clock signal for the JTAG bus. Input into the DIMM.
TMS	The mode select signal for the JTAG bus. Input into the DIMM.
TDI	The serial data input signal for the JTAG bus. Input into the DIMM.
TDO	The serial data output signal for the JTAG bus. Output from the DIMM.

The following signals to be included into the connector are not strictly needed in all use cases. Their inclusion depends on the implementation of the feature listed below:

Signal	Feature	Description
DET_DIMM	Auto bridge	Pin to allow the backplane to detect the presence of a DIMM in the slot. Shorted to GND on the DIMM.
DET_BP	Hybrid board	Pin to allow the DIMM to detect the presence of a backplane. Shorted to GND on the backplane.
SCL	EEPROM	The clock signal of an I²C bus. Input into the DIMM.
SDA	EEPROM	The serial data signal of an I²C bus. Bidirectional I/O.
LED	Info-LED	Signal to enable an LED on the DIMM. Input into the DIMM.

List of features:

Auto bridge: The backplane can automatically bridge the JTAG signals over unpopulated slots. If not implemented, jumpers need to be used to bridge open slots.
Hybrid board: The DIMM can also operate in a standalone mode without a backplane.
EEPROM: The DIMM contains an EEPROM to store details of the DIMM: type and number of FPGAs, batch number, serial number...
Info-LED: The backplane can switch on an LED on the edge of the DIMM under software control. May be used to identify defective boards to the user. This feature could also be implemented via I²C.
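To make the EEPROM feature a bit more concrete, here is a minimal sketch of how the details listed above could be stored and read back. The field widths, the byte order and the numeric type code are all made up for illustration; nothing about the layout has been decided.

```python
import struct

# Hypothetical record layout, little-endian, 8 bytes in total:
#   uint8  fpga_type   (numeric code for the FPGA model)
#   uint8  fpga_count  (number of FPGAs on the DIMM)
#   uint16 batch       (batch number)
#   uint32 serial      (serial number)
RECORD = struct.Struct("<BBHI")


def pack_dimm_info(fpga_type, fpga_count, batch, serial):
    """Build the byte string that would be written to the EEPROM."""
    return RECORD.pack(fpga_type, fpga_count, batch, serial)


def unpack_dimm_info(blob):
    """Decode the first RECORD.size bytes read from the EEPROM."""
    fpga_type, fpga_count, batch, serial = RECORD.unpack(blob[:RECORD.size])
    return {"fpga_type": fpga_type, "fpga_count": fpga_count,
            "batch": batch, "serial": serial}


blob = pack_dimm_info(fpga_type=1, fpga_count=16, batch=3, serial=42)
print(len(blob), unpack_dimm_info(blob))
```

A fixed binary record like this keeps the EEPROM small. A version byte in front would probably be wise in a real layout so that later DIMM revisions can extend it.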

One additional issue we should discuss, but which has no implication for the current step of the specification process: Should the FPGAs also be connected to the I²C bus? If not, we can potentially save one bi-directional level shifter, as the EEPROM can run at a different voltage than the FPGAs.

Olaf.Mandel

member

Activity: 70

Merit: 10

Quote from: O_Shovah on July 04, 2011, 03:21:52 PM

[...]
Then give the poll your click ;)

I scrolled down to the end of the discussion too fast, it seems...

O_Shovah

sr. member

Activity: 410

Merit: 252

Watercooling the world of mining

Quote from: Olaf.Mandel on July 04, 2011, 03:18:51 PM

Xilinx Spartan 6 XC6SLX150: cheaper, claims to be faster.

Then give the poll your click ;)

Olaf.Mandel

member

Activity: 70

Merit: 10

Xilinx Spartan 6 XC6SLX150: cheaper, claims to be faster.

O_Shovah

sr. member

Activity: 410

Merit: 252

Watercooling the world of mining

As this basically comes down to Xilinx Spartan 6 LX150 vs. Altera Cyclone IV 75K, I think we should have a poll on that.

Personally, though, I would prefer a design that has at least been proven to work in simulation. Also, if we use the Spartan, we will depend on someone to provide us with the bitstream.

Please make your decision on your FPGA of choice. This poll will be running until Saturday the 9.07.2011 22:00

makomk

hero member

Activity: 686

Merit: 564

Quote from: OrphanedGland on July 03, 2011, 08:15:45 PM

Good to see you have allowed 3% for interface changes (rolls eyes)

Worse, actually - 3% for adding an interface other than JTAG at all ;-). I figure that if it's possible to offer a choice between a decent selection of basic interface options, that'll be enough and anything fancier like Ethernet is probably best done in an external microcontroller. Of course, that's a big if!

Edited to add:

Quote from: Olaf.Mandel on July 04, 2011, 04:23:18 AM

If you can compile for the XC6SLX150, can you just take any of the currently available codes and compile it with default settings? Even an unoptimised result is better than nothing! We just want to know if the FPGA can run a fully unrolled core at more than 84 MHz.

I've heard that with the default settings you can't actually get it to pass place-and-route. (The workaround is *probably* quite easy; modifying the Map settings to ignore user timing constraints and run in non-timing driven mode should work, though obviously I can't test this.)

Olaf.Mandel

member

Activity: 70

Merit: 10

Quote from: TheSeven on July 04, 2011, 03:32:40 AM

[...]
What about the Xilinx XC6SLX150-3CSG484C? It's cheaper than the EP4CE75 and will definitely allow for higher hash rates.
As I already mentioned multiple times, ArtForz (a bitcoin early adopter with a huge mining farm) claims to run 190MH/s on that one, and I think we can trust him. Sadly I haven't managed to reproduce this myself so far, as I have neither the time nor the processing power needed to do lots of synthesis runs to optimize it. He considered releasing the source code though... We might just need to poke him a bit more to actually do that.

Xilinx would be my preferred solution, because I have read more of their datasheets. But while 190MHash/s is a very impressive number, I would really like to have someone state that this or that available code gives this or that performance, especially since not all of us can compile code for that FPGA.

If you can compile for the XC6SLX150, can you just take any of the currently available codes and compile it with default settings? Even an unoptimised result is better than nothing! We just want to know if the FPGA can run a fully unrolled core at more than 84 MHz.

mimarob

full member

Activity: 354

Merit: 103

Watching this thread with great interest. I know VHDL better than Verilog :-)

TheSeven

hero member

Activity: 504

Merit: 500

FPGA Mining LLC

Quote from: newMeat1 on July 03, 2011, 03:35:44 PM

makomk convinced me since then that an EP4CE75 is the most efficient way to go. This seems to be supported by Olaf Mandel's table.

What about the Xilinx XC6SLX150-3CSG484C? It's cheaper than the EP4CE75 and will definitely allow for higher hash rates.
As I already mentioned multiple times, ArtForz (a bitcoin early adopter with a huge mining farm) claims to run 190MH/s on that one, and I think we can trust him. Sadly I haven't managed to reproduce this myself so far, as I have neither the time nor the processing power needed to do lots of synthesis runs to optimize it. He considered releasing the source code though... We might just need to poke him a bit more to actually do that.

OrphanedGland

member

Activity: 70

Merit: 10

Good to see you have allowed 3% for interface changes (rolls eyes)

makomk

hero member

Activity: 686

Merit: 564

Quote from: max3t on July 03, 2011, 03:43:50 PM

I'm not quite sure, so excuse me if I am wasting your time. You filled in "109.29 MHash/s" although "Fmax=109.29MHz" was reported (MHz instead of MHash/s). Or are they the same when fully unrolled?

The fully unrolled design does one hash per clock cycle, so yeah, they are the same.
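As a small sketch of that relationship: the hash rate is the clock frequency divided by the number of clock cycles the core needs per hash. The function name and the half-unrolled example are only illustrations, not taken from any of the miner sources.

```python
def hash_rate_mhash(fmax_mhz, cycles_per_hash=1):
    """MHash/s for a core that needs `cycles_per_hash` clock cycles per hash."""
    return fmax_mhz / cycles_per_hash


# Fully unrolled: one hash per clock cycle, so MHz and MHash/s coincide.
print(hash_rate_mhash(109.29))
# Hypothetical core that needs two clock cycles per hash.
print(hash_rate_mhash(109.29, cycles_per_hash=2))
```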

max3t

newbie

Activity: 25

Merit: 0

Quote from: Olaf.Mandel on July 03, 2011, 12:58:56 PM

[...]In that case and optimistically assuming that the missing interface logic can be added to makomk's code, the current table looks like this:[...]

I'm not quite sure, so excuse me if I am wasting your time. You filled in "109.29 MHash/s" although "Fmax=109.29MHz" was reported (MHz instead of MHash/s). Or are they the same when fully unrolled?

Btw congrats makomk, your result sounds great ;)

newMeat1

full member

Activity: 210

Merit: 100

You guys might have seen my work on a Cyclone IV board on this thread:
http://forum.bitcoin.org/index.php?topic=9047.msg299381#msg299381

makomk convinced me since then that an EP4CE75 is the most efficient way to go. This seems to be supported by Olaf Mandel's table.

Just last night I waded through the documents and made a spreadsheet of the pinout. I'll sell it for 2 BTC, PM me if interested. It will get you started-- and save you several hours of boring, tedious, sometimes confusing work. It's for a JTAG-configured device with one clock input. I might also share a PCB design for several BTC.

Or is this one of those threads where a lot of talk happens, but no action? (excluding makomk, of course)

It's not possible for one set of pads to support both EP4CE75 and EP4CE115, unfortunately- too many different pins.

Olaf.Mandel

member

Activity: 70

Merit: 10

New version of the table with lower Altera prices (assuming 1USD=0.6891EUR):

Chip	Rate [MHash/s]	Power [W]	Price [EUR]	Rate/Price [MHash/s/EUR]	Rate/Power [MHash/J]
Altera EP4CE75F23C7N	109.29	-	156.75	0.697	-
Altera EP4CE115F23C7N	80	4.4	271.79	0.294	18.2
Altera EP4CE115F23C7N	109	-	271.79	0.401	-
Xilinx XC6SLX75-3CSG484C	?	?	67.29	?	?
Xilinx XC6SLX100-3CSG484C	?	?	83.86	?	?
Xilinx XC6SLX150-3CSG484C	?	?	120.47	?	?
Xilinx XC3S500E-5CPG132C	3.125	0.78	20.38	0.153	4
Xilinx XC5VLX110-1FFG676C	120	-	1126.51	0.107	-

An intermediate result: in order to beat the Altera EP4CE75 (in terms of Rate/Price), the Xilinx XC6SLX75 must achieve more than 46.9MHash/s, and the XC6SLX150 must beat 84MHash/s.
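For anyone who wants to redo this with other prices, the break-even numbers can be reproduced as follows. All figures are copied from the table above; only the function and variable names are mine.

```python
# Reference chip: Altera EP4CE75, figures from the table above.
REF_RATE = 109.29    # MHash/s
REF_PRICE = 156.75   # EUR

# Prices in EUR of the Xilinx candidates with unknown hash rate.
PRICES = {"XC6SLX75": 67.29, "XC6SLX100": 83.86, "XC6SLX150": 120.47}


def break_even_rate(price_eur):
    """Hash rate at which a chip matches the EP4CE75's Rate/Price."""
    return REF_RATE / REF_PRICE * price_eur


for chip, price in PRICES.items():
    print(f"{chip}: {break_even_rate(price):.1f} MHash/s")
```

This compares chip prices only. Board, power supply and cooling costs would shift the break-even point.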

Olaf.Mandel

member

Activity: 70

Merit: 10

Quote from: O_Shovah on July 03, 2011, 03:04:59 PM

[...]
As far as I can see, they both require a 2.5 V and a 1.2 V rail. So the only difference would be how much current each rail needs. (Could someone please verify that?)

AFAIK: Correct.

Quote from: O_Shovah on July 03, 2011, 03:04:59 PM

[...]
If the power supplies of the Spartan 6 series and the Cyclone IV turn out to be really interchangeable, I see no reason not to try a prototype for both based on the same board.

Same BOM, but different boards: the pinout is different. Though much of the development work is identical, redrawing the layout for the other chip is a lot of work.

O_Shovah

sr. member

Activity: 410

Merit: 252

Watercooling the world of mining

Quote from: OrphanedGland on July 03, 2011, 06:58:21 AM

Cheaper just to choose one. Looks like LX150 is probably the best bet, but would be nice to have some compilation results.

I had a look into the documentation of both the Spartan 6 and the Cyclone IV series.
Xilinx SP 6: http://www.xilinx.com/support/documentation/data_sheets/ds162.pdf
Altera C IV: http://www.altera.com/literature/hb/cyclone-iv/cyiv-53001.pdf

As far as I can see, they both require a 2.5 V and a 1.2 V rail. So the only difference would be how much current each rail needs. (Could someone please verify that?)
The EPE power calculation makes it possible to estimate the power needed for each rail.

Quote from: makomk on July 03, 2011, 08:42:53 AM

Edit: Also, after a slightly tedious 4-hour build process, Fmax=109.29MHz and 97% resource usage for the fully-unrolled DE2_115_makomk_mod on the EP4CE75F29C7. Might be able to get it up to 110MHz with the right options, but I wouldn't bet on it.

So I take it as a given that it is possible to run a full miner core on the Altera Cyclone IV 75K. Therefore it shall be one final candidate.

For increased performance later on, we might have a look at the Cyclone IV GX 150K device. Maybe someone could run a compilation for this one. I also checked Altera's website and found the FPGAs in their online shop to be cheaper than at Digi-Key in some cases. This might improve the economics.

If the power supplies of the Spartan 6 series and the Cyclone IV turn out to be really interchangeable, I see no reason not to try a prototype for both based on the same board.

Olaf.Mandel

member

Activity: 70

Merit: 10

Quote from: makomk on July 03, 2011, 08:42:53 AM

[...]
It seems to be standard to list the number of MHash as the number of total sha256(sha256(data)) operations just like you did - certainly that's what I've been doing. I get the impression this dates back to the early days of bitcoin. It's possible others haven't been doing it this way of course.

Very good. In that case, and optimistically assuming that the missing interface logic can be added to makomk's code, the current table looks like this:

Chip	Rate [MHash/s]	Power [W]	Price [EUR]	Rate/Price [MHash/s/EUR]	Rate/Power [MHash/J]
Altera EP4CE75F23C7N	109.29	-	174.47	0.626	-
Altera EP4CE115F23C7N	80	4.4	303.69	0.263	18.2
Altera EP4CE115F23C7N	109	-	303.69	0.359	-
Xilinx XC6SLX75-3CSG484C	?	?	67.29	?	?
Xilinx XC6SLX100-3CSG484C	?	?	83.86	?	?
Xilinx XC6SLX150-3CSG484C	?	?	120.47	?	?
Xilinx XC3S500E-5CPG132C	3.125	0.78	20.38	0.153	4
Xilinx XC5VLX110-1FFG676C	120	-	1126.51	0.107	-

Please fill in what is missing.
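In case it helps with filling in the gaps, the two derived columns are just ratios of the first three, so they can be recomputed as soon as a rate or power figure comes in. This is a minimal sketch: the rows are copied from the table, unknown entries ("-" or "?") are represented as None, and the function name is my own.

```python
def derived(rate, power, price):
    """Return (Rate/Price, Rate/Power); None where an input is unknown."""
    rate_per_price = None if rate is None or price is None else rate / price
    rate_per_power = None if rate is None or power is None else rate / power
    return rate_per_price, rate_per_power


# (chip, rate in MHash/s, power in W, price in EUR)
rows = [
    ("Altera EP4CE75F23C7N", 109.29, None, 174.47),
    ("Altera EP4CE115F23C7N", 80, 4.4, 303.69),
    ("Xilinx XC6SLX150-3CSG484C", None, None, 120.47),
]
for chip, rate, power, price in rows:
    print(chip, derived(rate, power, price))
```

Since 1 W = 1 J/s, dividing MHash/s by W gives MHash/J, which is the unit used in the last column.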
