Modular FPGA Miner Hardware Design Development - page 10.

Olaf.Mandel

member

Activity: 70

Merit: 10

Quote from: li_gangyi on July 24, 2011, 03:15:53 PM

It looks good, I've checked and it looks like it has the adequate number of decoupling capacitors, and routed quite well. Now we just need to see how the rest of the parts fit in.

Thanks. I just uploaded a new PSU version. The only important change is the names of the 1.2V nets: those were originally VCCAUX[12], which don't exist in the FPGA schematic. The rest are cosmetic changes. Any comments on getting rid of the through-hole parts?

li_gangyi

full member

Activity: 157

Merit: 100

It looks good, I've checked and it looks like it has the adequate number of decoupling capacitors, and routed quite well. Now we just need to see how the rest of the parts fit in.

Olaf.Mandel

member

Activity: 70

Merit: 10

I uploaded a new version of li_gangyi's PSU schematic and board to github and dropbox. Commit log:

Bugfixes and cosmetic changes to PSU design

Renamed power signal names (to VCCINT[01])
Changed connectors to SMD versions (includes part in library)
Explicitly added PSU_EN signal
Changed labeling of components (1.2K -> 1k2)
Changed paper format from letter to DIN-A4

I thought that going all SMD looks nicer... Undo that if you disagree.

Still TODO: There are over a hundred error messages from the design rule check: the way the traces and polygons are done is a problem for Eagle. Vias to the other layers need to be added. And what about proper heat sink areas?

li_gangyi

full member

Activity: 157

Merit: 100

Quote from: newMeat1 on July 23, 2011, 03:17:49 PM

Well, I'm talking about ceramic surface-mount caps. By bloated I guess you mean an electrolytic capacitor with leads? Why are you using one of those? I would stick to all SMT

Edit: although I agree that high-current buses should have low-ESR caps

I've not seen 330uF ceramic capacitors yet; I think on Digikey the highest you can find is probably 100uF. Electrolytics can be surface-mount types.

We're not trying to cut costs for the initial prototype, we're trying to make sure the chances of it working are high. If we want to, we can start removing bits to test stability after the prototype is done.

newMeat1

full member

Activity: 210

Merit: 100

Well, I'm talking about ceramic surface-mount caps. By bloated I guess you mean an electrolytic capacitor with leads? Why are you using one of those? I would stick to all SMT

Edit: although I agree that high-current buses should have low-ESR caps

li_gangyi

full member

Activity: 157

Merit: 100

Quote from: newMeat1 on July 23, 2011, 02:38:06 PM

You can switch from low-ESR caps to normal caps to save a bit more money. The low-ESR caps will use slightly less energy, but they won't filter the signal any better (unless you can show me solid proof to the contrary). I get my stance from this:
http://www.ultracad.com/mentor/esr%20and%20bypass%20caps.pdf

In the SMPSU, you'd want to have caps rated for the ripple current. You can get away with using standard capacitors, but you'd need multiple caps in parallel. The higher ESR of standard caps will also cause temps to go up, leading to bloated capacitors. I've seen this way too often on cheapass PSUs.
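A rough sketch of the trade-off described above, assuming the ripple current splits evenly between identical parallel caps and that heating is simply I^2 * ESR. All numbers (3.0 A ripple, the ratings and the ESR values) are made-up placeholders, not values from the PSU design or from any datasheet.

```python
import math


def caps_needed(ripple_current_a: float, rated_ripple_a: float) -> int:
    """Smallest number of identical parallel caps that keeps each within its ripple rating."""
    return math.ceil(ripple_current_a / rated_ripple_a)


def heat_per_cap_w(ripple_current_a: float, n_caps: int, esr_ohm: float) -> float:
    """I^2 * ESR dissipation in one cap, assuming the ripple current is shared evenly."""
    per_cap_a = ripple_current_a / n_caps
    return per_cap_a ** 2 * esr_ohm


# Hypothetical example values, for illustration only.
TOTAL_RIPPLE_A = 3.0
CANDIDATES = {
    "low-ESR": {"rated_ripple_a": 2.0, "esr_ohm": 0.02},
    "standard": {"rated_ripple_a": 0.8, "esr_ohm": 0.20},
}

for name, cap in CANDIDATES.items():
    n = caps_needed(TOTAL_RIPPLE_A, cap["rated_ripple_a"])
    heat = heat_per_cap_w(TOTAL_RIPPLE_A, n, cap["esr_ohm"])
    print(f"{name}: {n} caps in parallel, about {heat * 1000:.0f} mW each")
```

With real datasheet values plugged in, this gives a quick idea of how many standard caps would be needed to replace one low-ESR part and how warm each would run.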

TheSeven

hero member

Activity: 504

Merit: 500

FPGA Mining LLC

Quote from: magik on July 23, 2011, 11:39:39 AM

With what I'm used to that sounds perfectly fine. We use a 100 MHz crystal - and the PLL/DCM can easily turn that into 100 or 50 using the least noisy clk0 output, and we can get all kinds of different ranges with the clkfx output - multiply or divide the clock by, IIRC, any ratio of integers from 1-256 over 1-256

I'm just unsure of what clock levels we can obtain with the hashing code. You know, I'll try to run some compiles right now and see what we can get, but now that I think about it yeah, if someone else said they had it running around 100 MHz then that sounds about right, we can tune from there.

There's a lot more projects in that open source fpga miner github now heh.... hrm.... i should probably go read that thread again

btw if anyone wants to read up on it:
Spartan 6 FPGA Clocking Resources
http://www.xilinx.com/support/documentation/user_guides/ug382.pdf

I had a quick try, and without any optimizations it was able to reach 50MHz. 100MHz is pretty certainly doable, and there are even reports that, with proper optimizations, a 190MHz synthesis would succeed in like 5% of the attempts. To reach optimum performance, it's very likely that we'll need CLKFX.
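To get a feel for which CLKFX settings such targets would imply, here is a small sketch that searches for the multiply/divide pair closest to a wanted frequency. The function name is made up, and the default 1-256 ranges are just magik's recollection from the quote; the real limits depend on the clocking primitive and should be checked against UG382.

```python
from fractions import Fraction


def best_clkfx_ratio(f_in_mhz, f_target_mhz, m_range=(1, 256), d_range=(1, 256)):
    """Return (M, D) such that f_in * M / D is as close as possible to f_target.

    The ranges are assumptions taken from the forum post, not from the datasheet.
    Ties are resolved in favour of the smallest divider D.
    """
    target = Fraction(f_target_mhz) / Fraction(f_in_mhz)
    best, best_err = None, None
    for d in range(d_range[0], d_range[1] + 1):
        # Nearest integer multiplier for this divider, clamped to the allowed range.
        m = max(m_range[0], min(m_range[1], round(target * d)))
        err = abs(Fraction(m, d) - target)
        if best_err is None or err < best_err:
            best, best_err = (m, d), err
    return best


for f_in, f_out in [(100, 190), (100, 50), (25, 100)]:
    m, d = best_clkfx_ratio(f_in, f_out)
    print(f"{f_in} MHz in, {f_out} MHz wanted: M={m}, D={d}")
```

Whether a given M/D pair is actually usable also depends on the allowed input and output frequency ranges of the DCM or PLL, which this sketch ignores.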

newMeat1

full member

Activity: 210

Merit: 100

Well I am surprised by this thread. I never expected a group to pull together and finish up a design. I still think you've made some poor design decisions, but that was bound to happen with this group mentality.

Let me know if you want help moving from a 4-layer board to a 2-layer board. That should save you at least $10 for every board.

You can switch from low-ESR caps to normal caps to save a bit more money. The low-ESR caps will use slightly less energy, but they won't filter the signal any better (unless you can show me solid proof to the contrary). I get my stance from this:
http://www.ultracad.com/mentor/esr%20and%20bypass%20caps.pdf

O_Shovah

sr. member

Activity: 410

Merit: 252

Watercooling the world of mining

@li_gangyi:
I'm fine with your offer. This procedure will simplify the money matters.
So everybody who will do the firmware and software programming should get a copy.

Wow, surprise. I didn't know there were such detailed Altium schematics of a commercial board available. Certainly a great source for comparison.
Maybe you want to change the clock source you propose into the MSP/FPGA setup yourself, as I won't be able to do this in the next few days.

magik

newbie

Activity: 44

Merit: 0

With what I'm used to that sounds perfectly fine. We use a 100 MHz crystal - and the PLL/DCM can easily turn that into 100 or 50 using the least noisy clk0 output, and we can get all kinds of different ranges with the clkfx output - multiply or divide the clock by, IIRC, any ratio of integers from 1-256 over 1-256

I'm just unsure of what clock levels we can obtain with the hashing code. You know, I'll try to run some compiles right now and see what we can get, but now that I think about it yeah, if someone else said they had it running around 100 MHz then that sounds about right, we can tune from there.

There's a lot more projects in that open source fpga miner github now heh.... hrm.... i should probably go read that thread again

btw if anyone wants to read up on it:
Spartan 6 FPGA Clocking Resources
http://www.xilinx.com/support/documentation/user_guides/ug382.pdf

li_gangyi

full member

Activity: 157

Merit: 100

I can fund all 5 initial boards, and then someone comes up with a simple test procedure, say a JTAG boundary scan. I run the tests, and if they pass I sell the boards at cost + shipping to whoever needs them. How's that?

Regarding the clk source, I know the Avnet boards that the guys over on the Open Source FPGA thread are using have an external 100 MHz oscillator.

http://www.files.em.avnet.com/files/178/xlx_s6_lx150t_dev-sch-revd090810.pdf

Sheet 20 shows the clocks. (Thanks to Fpgaminer for pointing me in the right direction.)

We should go with this known working solution.

Once Olaf is done, including the MSP430, I can route in the PSU, I think we should leave out the external LDO for the MSP for now and just use suspend, to keep it simple.

Then we'd be able to let every1 vet through the design, spot any major errors and move on to production.

magik

newbie

Activity: 44

Merit: 0

FPGA Power considerations

Spartan 6 FPGA Power Management
http://www.xilinx.com/support/documentation/user_guides/ug394.pdf

Quote

The FPGA can only enter suspend mode if enabled in the configuration bitstream (see
Enable the Suspend Feature and Glitch Filtering, page 14). The SUSPEND pin must be Low
during power up and configuration. Once enabled through the bitstream, and the
SUSPEND_SYNC primitive is not present in the design, when the SUSPEND pin is
asserted, the FPGA unconditionally and quickly enters suspend mode.
...
There are four possible ways to exit suspend mode in a powered system:
• Drive the SUSPEND input Low, exiting suspend mode.
• If multi-pin wake-up mode is enabled, drive the SUSPEND input Low and then assert
any one of the user enabled SCP pins.
• Pulse the PROGRAM_B input Low to reset the FPGA and cause the FPGA to
reprogram.
• Power cycle the FPGA, causing the FPGA to reprogram.

Sounds pretty simple to put this guy into suspend, and it will retain its programming in that state too. All that's needed is to enable it in the bitstream and to assert the SUSPEND pin when you want it to go to sleep. Sounds like you guys want to tie that to the MCP so it can control whether the FPGA is on or not.
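As a way to reason about what the MCP firmware would have to do, here is a toy state model of the suspend behaviour as described in the UG394 excerpt above. The class and method names are invented for illustration, and the model covers only the quoted rules; it leaves out the SUSPEND_SYNC path, glitch filtering and power cycling.

```python
class SuspendModel:
    """Toy model of the suspend rules quoted from UG394 (not a hardware driver)."""

    def __init__(self, suspend_enabled_in_bitstream: bool, multi_pin_wakeup: bool = False):
        self.enabled = suspend_enabled_in_bitstream
        self.multi_pin_wakeup = multi_pin_wakeup
        self.configured = True      # assume the bitstream is already loaded
        self.suspended = False
        self._awaiting_scp = False  # SUSPEND is low again, waiting for an SCP pin

    def set_suspend_pin(self, high: bool) -> None:
        if not (self.configured and self.enabled):
            return  # suspend only works if it was enabled through the bitstream
        if high:
            self.suspended = True
            self._awaiting_scp = False
        elif self.suspended:
            if self.multi_pin_wakeup:
                self._awaiting_scp = True  # stays asleep until an SCP pin is asserted
            else:
                self.suspended = False

    def assert_scp_pin(self) -> None:
        if self.suspended and self._awaiting_scp:
            self.suspended = False
            self._awaiting_scp = False

    def pulse_program_b(self) -> None:
        # Resets the FPGA: it leaves suspend but has to be reprogrammed afterwards.
        self.suspended = False
        self._awaiting_scp = False
        self.configured = False


fpga = SuspendModel(suspend_enabled_in_bitstream=True)
fpga.set_suspend_pin(True)
print("suspended:", fpga.suspended)
fpga.set_suspend_pin(False)
print("suspended:", fpga.suspended)
```

In this reading, the MCP only needs one GPIO on SUSPEND for the basic case; multi-pin wake-up would need a second line to an SCP pin.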

There is also a hibernate mode, but it basically just sounds like a safer way to power up/down in hot-swapping situations.

What is the configuration consensus again? Bit banging the MCP's I/O and a JTAG chain of the 2 FPGAs?

it also says:

Quote

Saving Power
...
The lowest power state is the quiescent state with no inputs toggling, all outputs disabled,
and no pull-up or pull-down resistors in use.

magik

newbie

Activity: 44

Merit: 0

I think another thing to note is the locations of everyone in this project. If you plan on shipping these things around to different people for different things the shipping costs alone may become very large. I'm in the USA, California. But it sounds like a lot of you guys are in Europe or Asia... I hate to say it because it may mean it will be more difficult for me to get a board here in the US, but it almost sounds like you guys in Euro/Asia are closer together - and it might make more sense to get everything done semi-close to you guys, e.g. board fab/assembly/debugging. I'd love to get my hands on one of these myself, so it kind of sucks if us US guys are the minority, but it goes both ways too.... But basically to reduce costs we are forced to centralize things - we can't avoid buying all the boards at once without it costing much more $$$, same goes for the part orders. Assembly may not be as bad, as li_gangyi seems to have that covered for now. But again, if he's doing the assembly, and correct me if I'm wrong, he's in Singapore, that may be part of the decision right there. I could do any soldering/assembly myself, except for the reflow/bga stuff.

I don't mind putting up some money for prototyping, but without getting any hardware myself I'd be much less likely to put money up ( I imagine I'd only want to part with USD$100-$200 if it meant I wouldn't see anything for a while - plus I'd love it if that meant I got some sort of discount later or guaranteed first production run etc... ). If I knew I was going to get hardware for this, I may be more willing to spend around USD$300-$500 for prototyping.

Here's a thought - how hard are LX150s to re-sell? Maybe we should try to acquire enough funds to get the first price break on a batch of 10 of those? It sounds like we're going to need at least 1-2 prototypes, but it's sounding almost more like we might make 5-6 and ship them around to the various devs. Dunno how much of a price savings it is, but if everyone is very convinced that's the FPGA we're going to use, then maybe this might be a good cost savings investment now?

edit: seems like @ digikey there is no bulk discounting at all for these chips

On another note to the FPGA devs out here - what FPGA code are we currently going with? I haven't kept up with the Open Source FPGA Miner project in a bit, have any major improvements been made? I was thinking about setting up some simulation test benches in Xilinx and trying to further optimize what they've done. IIRC the design they have made IMO doesn't perfectly match up with the Xilinx architecture - and that's likely why people have been unable to fully maximize/optimize a hashing core for this chip. I think Xilinx does much better with faster clocks and smaller logic - but I'm by far no guru or expert, this is just my gut feeling - we'd really have to tear deep down into timing analysis and RTL diagrams to really optimize well.

O_Shovah

sr. member

Activity: 410

Merit: 252

Watercooling the world of mining

@li_gangyi:

I'm not sure of the timeframe. It mostly depends on how fast we get a final test board layout, so I would like to return this question to you and Olaf.Mandel ;)

I hope we may finalise the FPGA section and the implementation of all parts next week, but I'm not the one to guarantee that.

The $65 is for the boards alone, without any fingers and without any population (small quantities make them costly).

I hope we may still take advantage of your offer to populate the boards at your place.
In that case we should also define how to get all the parts to you and pay for them.

In my calculation we would roughly get 70$ (board and shipping) + 160$ (Spartan 6 LX150 FGG484) + 100$ (power supply) + 50$ (MSP430, connectors, small parts) = 380$ or 270 €.
These numbers may certainly vary depending on parts sources, so please correct them according to your estimations.
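To make it easy to plug in corrected estimates, the per-board sum above can be written down as a tiny script. The item costs are the ones from this post; the exchange rate is not a quoted rate but simply the one implied by "380$ or 270 €".

```python
# Per-board estimates in USD, as listed in the post above.
costs_usd = {
    "board and shipping": 70,
    "Spartan 6 LX150 FGG484": 160,
    "power supply": 100,
    "MSP430, connectors, small parts": 50,
}

total_usd = sum(costs_usd.values())

# EUR-per-USD rate implied by the post's "380$ or 270 EUR" (not a market quote).
QUOTED_TOTAL_EUR = 270
implied_eur_per_usd = QUOTED_TOTAL_EUR / total_usd

for item, cost in costs_usd.items():
    print(f"{item:35s} {cost:4d} $")
print(f"{'total':35s} {total_usd:4d} $  (about {total_usd * implied_eur_per_usd:.0f} EUR)")
```

Changing any entry in `costs_usd` updates the total, which is handy once real quotes for the parts come in.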

I will fund at least one of these boards and maybe a second one when this month's salary gets in. But I also hope somebody else will participate here.

magik

newbie

Activity: 44

Merit: 0

OK, so it took me a day and a half, but I finally went through and read all 23 pages of this thread. I kind of glossed over it, so I'm not 100% on what all the decisions made are, but I have a much better feel for where you guys are at in terms of the design and progress.

Some things I'd like to add to the discussion for now:

In terms of Xilinx licenses - as someone said above - Xilinx doesn't care too much where you get the software or license - as in the end you will require a Xilinx FPGA to program and that's really where they make their money. If anyone cares I can show you where to obtain a less than reputable ( read: pirate ) ISE license for ISE 13 with everything unlocked ( including the LX150/T design targets )

The unconnected I/O are unimportant for this design. You can tie them all to ground, you can tie them all to Vcc, you can leave them floating. It doesn't matter - personally I'd say just leave them floating. This design is not going to be requiring non-noisy I/O lines as we aren't going to be doing anything like a high speed bus on the I/O. At my workplace we build industrial monitoring equipment and on a lot of our boards we just leave unconnected I/O hanging and set the ISE to either use a weak pull down or leave the I/O floating. It's really not much of an issue unless you are requiring very high speed accurate I/O ( think 100-200MHz clock ranges ). These settings can be found under right clicking on Generate Programming File -> Process Properties -> Configuration Options -> Unused IOB pins ( Pull Down, Pull Up, Floating )

Also, again I'm a bit unclear on how the JTAG is going to be set up. But I would expect a JTAG header on every DIMM, or at least one JTAG somewhere with all FPGAs chained. I would also probably expect a couple of LEDs. You probably are going to want to put one between VCC and GND to show you that the board has power. You will probably also want an LED or two as debug outputs. In my mind, I would also love to have a couple other test points as just unused I/O pins routed out to a TP - these are also very useful for debugging. On the other hand, this design shouldn't require too much debugging as it's pretty dead simple - but in terms of future application/expansion, it may be helpful for debugging new features. Also, typically there will be an LED or two somehow tied to the tx/rx lines of the USB bus so you can tell when communications is occurring.

I would love to be able to route all the unused I/O to pin headers or test points, but I agree with what has been said above, and the cost/complexity to do it is just not worth it. Sure you'd be able to use the board as a spartan6 dev board, but I don't think that's the goal of this project. So to keep PCB complexity down I agree with you guys - just route exactly what's needed, and then possibly add a handful more I/Os for debugging/indicators and then a few more that are brought out for future expansion - either to DIMM pins or test points ( think extra comm protocols etc... ). If not all DIMM pins are going to be routed ( not taken for power or I/O ) - I would also prefer to have a TP for each of these unused DIMM pins - that way we could deadbug new features or bug fixes. If we leave just enough room for error that we can hack something on for this prototype it will save us a lot of time/effort later because it'll likely help us skip re-spinning.

I agree with what was said above that this is a prototype - it should have a slight excess of what's necessary to help us debug/fix any potential problems/errors we may make before the first spin.

I'm unsure about what internal clock rates any said miner design will be able to obtain inside the FPGA. But I would guess it's going to be around 25-50 MHz. I would probably advise against using the same 25 MHz crystal for the MCP and the FPGAs, just because getting from 25 MHz to 100 MHz using a PLL in the FPGA will require using the CLKFX output to multiply the input clock, and this is generally a noisier clock solution. It's definitely doable, but maybe not the best implementation. I don't have any experience with Spartan6 devices either ( we use a lot of Spartan3s ), so I'm unsure if it's possible to get useful computation done in this chip at higher clock rates.

One thing I definitely do know is you will want to route your clock input into one of the global clock pins - basically there are certain pins in each bank that route closer/more directly to the BUFGMUXs that control the quadrant clock lines. These clock lines are the best lines to use for distributing clocks throughout the FPGA as they are built for this function and provide the least amount of clock skew. You can definitely still take a clock in on any I/O pad and route it to one of these BUFGMUXs - it's just that sometimes that trace path ( between non-clock I/O and BUFGMUX ) is not the most optimal path. There are a bunch of different pins you can use, but I would steer away from the quadrant/side locked clock inputs and just use one of the global clock inputs ( there should be at least 4 ). The quadrant/side clocks are useful if you partition your FPGA device into regions based on clock domains - then you can free up global clock resources and only use clocks in one quadrant or one side if needed - but this isn't necessary for our design - I envision one main clock for the hashing engine, and potentially one other clock for communications.

The hashing engine clock is the only one I'm worried about - the comm clock could be derived internally off a counter, as it's not high speed and doesn't touch a lot of resources. If you'd like to read more up on it - ug382.pdf describes clocking mechanisms for the Spartan6 family.

TL;DR So when you are looking to route the clock to the FPGA - make sure you connect to a GCLK I/O pin.

li_gangyi

full member

Activity: 157

Merit: 100

How soon are we proposing to get these boards out? I need to clear some storage space for the parts and boards.

$65 is with or without the cost of populating these 5 boards?

O_Shovah

sr. member

Activity: 410

Merit: 252

Watercooling the world of mining

Regarding the PCB specs I would propose the following:

Pads: 0.5mm, just to be on the safe side regarding the BGA

Drill: 0.3mm. Should provide the necessary current safety and is needed for the 1:5 aspect ratio to get through the 1.2 mm board.

Annulus: 0.1mm, to accomplish the 0.5 mm diameter and clearance

Trace width: 0.15mm

Copper thickness: 35um should be sufficient and is also for testing purposes

I also agree with the layer stackup.
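The drill choice above can be sanity-checked with a two-line calculation. This sketch assumes that the "1:5 aspect ratio" means a fab limit of drill diameter to hole depth of at most 1:5, and that vias go through the full 1.2 mm board; pcbcart's actual limit is not stated in this thread, so treat the default of 5 as a placeholder.

```python
BOARD_THICKNESS_MM = 1.2   # from the proposal above
MAX_ASPECT_RATIO = 5.0     # assumed reading of "1:5" (hole depth / drill diameter)


def min_drill_mm(board_thickness_mm: float, max_aspect: float = MAX_ASPECT_RATIO) -> float:
    """Smallest drill diameter that still respects the aspect-ratio limit."""
    return board_thickness_mm / max_aspect


def drill_ok(drill_mm: float, board_thickness_mm: float,
             max_aspect: float = MAX_ASPECT_RATIO) -> bool:
    """True if a through-hole of this diameter stays within the aspect-ratio limit."""
    return board_thickness_mm / drill_mm <= max_aspect


print(f"minimum drill: {min_drill_mm(BOARD_THICKNESS_MM):.2f} mm")
for drill in (0.1, 0.2, 0.3, 0.4):  # the drill sizes selectable at pcbcart
    verdict = "ok" if drill_ok(drill, BOARD_THICKNESS_MM) else "too deep for this drill"
    print(f"drill {drill:.1f} mm: {verdict}")
```

Under that assumption, 0.3 mm is the smallest of the selectable drill sizes that works for a 1.2 mm board.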

I did a brief calculation at pcbcart for these specs. For a 150mm x 75mm x 1.2mm board I got a 65$ quote for each board if we order 5 boards, which should be more than enough for the testing.

We should also define how many boards we will order in the first run and who is best fit to debug them.

pusle

member

Activity: 89

Merit: 10

Routing:
I would think a T junction when the line has reached the small board splitting to the two fpga's is no problem.
Just put a series resistor at the driver end (MCP).
For several sockets on a motherboard either use individual traces or resistors at the junction points feeding them to prevent ringing/standing waves.
If you want simplicity and the speed is just a few MHz you can just slow the edge rise time with a larger single resistor value at the driver side.

Routing straight out from one side of the fpga was in reference to the I/O's needed (spi etc). I assumed routed to a pcb edge connector? dimm?
All the I/O's you need are there. Rows 1 and 2 you can route without via's, just traces on the top layer.
Rows 3 and 4 with using vias to the bottom layer. Then you only have to route power, clock and jtag under/across the fpga area.

PCB spec, my suggestions:

trace width/clearance: 0.15mm I don't think there are any factories not capable of 0.15mm today and even home made pcb's with decent equipment can manage it.

Copper thickness, inner/outer layers: 35um This seems like industry standard and it's enough for this board. Sure the thicker the better but if you want more you usually have to pay some extra.

Hole diameter: 0.20mm or 0.30mm Please check with the pcb company about the number of layers you can have and board thickness for each diameter.

Annular ring: as big as possible. Without violating the clearance between the fpga balls.
Even if you have to pay the 0.1mm price, you should make it larger so it's easier to manufacture and more factories can manage it.

0.15mm width/clearance is easy for all PCB manufacturers I know of.
A 0.1mm annular ring and small drill sizes like 0.1 and 0.2mm through thick boards are much harder, or rather this is where there is a larger difference from factory to factory.

Using 0.4mm pads for the fpga should not be a problem. But as I stated previously, with our pcb manufacturing clearing house we settled on 0.5mm. 0.45mm is also an option :D

And if you read about soldermask defined pads for fpga just ignore it. Always use copper defined pads and pull the mask away so as not to get any on the pad itself.
But not so much as to expose the via/traces between the pads and increase the risk of solder shorts. Check the solder mask precision they specify.

Olaf.Mandel

member

Activity: 70

Merit: 10

Before I reroute the FPGA again to incorporate the newest suggestions, I would like to get a final consensus on design rules like trace width, clearance, drill size, etc. This would also help li_gangyi and O_Shovah, I am sure.

So first, as we basically decided to go with pcbcart as our board supplier, here are their specs as seen in the ordering form. Stated are the selectable minimum sizes: larger is cheaper, but the price steps are surprisingly small (maybe not so small for the drills: didn't check).

Specification	Possible minimum values
Copper thickness, outer layers	35um, 70um, 105um, 140um
Copper thickness, inner layers	18um, 35um, 70um, 105um
Trace width / Clearance	0.10mm, 0.15mm, 0.20mm
Annular ring	0.10mm, 0.25mm
Hole diameter	0.10mm, 0.20mm, 0.30mm, 0.40mm

Starting from the FPGA BGA (in the FGG housing), which as far as I can see is the smallest footprint and so should drive the specs, we first have to look at the pad diameter: the datasheet specifies 0.4mm as the minimum diameter, but both the default library element and pusle's recommendation are for 0.5mm.

The following table states different combinations that allow placing a via in the middle of four pads for the smaller pad size. The "Via" column is checked if this also works for the larger pad size. The "Routing" column is checked, if it is also possible to route a trace between two pads in the top layer for the larger pad size (this is always possible for the smaller pad size). The table is sorted in descending order by drill size, annular ring and trace width / clearance.

Drill size	Annulus	Trace width	Via	Routing
0.40	0.10	0.20	-	-
0.40	0.10	0.15	x	x
0.40	0.10	0.10	-	-
0.30	0.25	0.10	-	x
0.30	0.10	0.20	x	-
0.30	0.10	0.15	x	x
0.30	0.10	0.10	x	x
0.20	0.25	0.15	-	x
0.20	0.25	0.10	x	x
0.20	0.10	0.20	x	-
0.20	0.10	0.15	x	x
0.20	0.10	0.10	x	x
0.10	0.25	0.20	-	-
0.10	0.25	0.15	x	x
0.10	0.25	0.10	x	x
0.10	0.10	0.20	x	-
0.10	0.10	0.15	x	x
0.10	0.10	0.10	x	x

Any of the combinations given above works if we decide on 0.4mm pads. If we want to be safe and use 0.5mm pads, we need to have at least the Via column checked, better to have even the Routing column checked. Please give me your opinion on pad size and which design rules to use.
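For anyone who wants to re-check the table or try other values, here is a sketch of the geometry I would assume behind it. It takes a 1.0 mm ball pitch for the FGG package (please verify against the datasheet), uses the trace width as the clearance, places the via exactly in the centre of four pads, and requires trace plus two clearances to fit between neighbouring pads. All function names are made up, and this is not necessarily how the table above was produced.

```python
import math
from itertools import product

PITCH_UM = 1000  # assumed 1.0 mm BGA ball pitch, in micrometres

# Selectable minimum values from the pcbcart ordering form, in micrometres.
DRILLS_UM = (400, 300, 200, 100)
ANNULI_UM = (250, 100)
WIDTHS_UM = (200, 150, 100)  # trace width, also used as clearance


def via_fits(pad_um, drill_um, annulus_um, clearance_um, pitch_um=PITCH_UM):
    """Via centred between four pads: is the gap to each pad at least the clearance?"""
    centre_to_centre = pitch_um * math.sqrt(2) / 2
    via_radius = drill_um / 2 + annulus_um
    return centre_to_centre - pad_um / 2 - via_radius >= clearance_um


def trace_fits(pad_um, width_um, pitch_um=PITCH_UM):
    """One trace between two adjacent pads needs trace + 2 * clearance of free space."""
    return pitch_um - pad_um >= 3 * width_um


def build_table(small_pad_um=400, large_pad_um=500):
    """Rows that work for the small pad, with Via/Routing flags for the large pad."""
    rows = []
    for drill, annulus, width in product(DRILLS_UM, ANNULI_UM, WIDTHS_UM):
        if via_fits(small_pad_um, drill, annulus, width):
            rows.append((drill, annulus, width,
                         via_fits(large_pad_um, drill, annulus, width),
                         trace_fits(large_pad_um, width)))
    return rows


print("Drill  Annulus  Width  Via  Routing")
for drill, annulus, width, via, routing in build_table():
    print(f"{drill / 1000:.2f}   {annulus / 1000:.2f}     {width / 1000:.2f}   "
          f"{'x' if via else '-'}    {'x' if routing else '-'}")
```

Under these assumptions the sketch produces the same 18 combinations as the table. The only row that comes out differently is 0.40 / 0.10 / 0.10, which would pass both checks here, so either the model or that row deserves a second look.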

As for layer setup and other things, I think we now are in agreement to use the following setup:

Layer 1: Component and routing layer. Polygons for large components with large currents
Layer 2: Supply layer GND: no routing possible (prevented by Eagle)
Layer 15: Routing layer, dominated by power polygons. VCCINT[01] for the FPGA and PSU, ?? for the MCU
Layer 16: (Small) Component and routing layer. Polygons for areas with many connections (VCCIO for FPGAs, ?? for PSU and MCU)

The placement grid for components should be 50mil or 25mil. I am violating this with the caps on the backside of the FPGAs, at the moment.

Once we have agreement on this, it should be much easier to later merge the individual parts together.

phillipsjk

legendary

Activity: 1008

Merit: 1001

Let the chips fall where they may.

Quote from: O_Shovah on July 22, 2011, 04:29:50 PM

I found a recent development in hardware licensing: the "CERN OHL", i.e. the CERN (European Organization for Nuclear Research) Open Hardware Licence: http://www.ohwr.org/projects/ohr-meta/wiki/CERNOHL

Section 3 reminds me of the infamous BSD advertising clause that was eventually removed when it became too unwieldy.

That license looks like it has potential, but I am not totally comfortable with it. Not that it matters since I have not really contributed to the hardware design yet. I like how it explicitly says the firmware is under a different license.
