
Topic: Algorithmically placed FPGA miner: 255MH/s/chip, supports all known boards - page 49. (Read 119440 times)

hero member
Activity: 504
Merit: 500
I tried to find the AOZ1025 but they seem to be hard to get. Can't find any stock.
Edit: Found it at Arrow... 3000 qty. only.
(Actually I found a couple of good alternates from IR and Fairchild. Higher efficiency (90% under load) and pretty cheap. FAN2108, IR3871. Looking at them now as Digikey has them.)

What happens when you put two AOZ1021 / AOZ1037 in parallel? I thought that would work but then DC-DC regs are new to me. I've only used linear parts before. No longer the plan.

I'll likely drop the 3.3V anyway, as an ATX PSU has regulated 3.3V already. Then just drop 12V to 1.2V, as that seems to be more efficient and most PSUs have more watts available on 12V. So a 20/24-pin adapter with a non-standard onboard connector, so users don't accidentally plug in a Molex and blow it.



I have 260 pcs of the AOZ1025DIL in stock if you do want to use them. Will sell from qty 1 at $1 per unit.
member
Activity: 72
Merit: 10
I have, however, looked at a couple of SASIC platforms that I could port to fairly easily.  If an investor were to fall out of the sky, I know which one to go with.
How much money are we talking here?

Hi,
I would subscribe also....
donator
Activity: 308
Merit: 250
I have, however, looked at a couple of SASIC platforms that I could port to fairly easily.  If an investor were to fall out of the sky, I know which one to go with.
How much money are we talking here?
sr. member
Activity: 410
Merit: 252
Watercooling the world of mining
Have you considered trying to lay this out in an Altera Cyclone IV?
I have, however, looked at a couple of SASIC platforms that I could port to fairly easily.  If an investor were to fall out of the sky, I know which one to go with.

:D I certainly can't fund a complete SASIC platform, I guess, but I wouldn't mind helping. Could you please reveal what you have in mind here?
I'm still studying FPGA designs and am therefore highly interested in what you're doing :)
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
I tried to find the AOZ1025 but they seem to be hard to get. Can't find any stock.
Edit: Found it at Arrow... 3000 qty. only.
(Actually I found a couple of good alternates from IR and Fairchild. Higher efficiency (90% under load) and pretty cheap. FAN2108, IR3871. Looking at them now as Digikey has them.)

What happens when you put two AOZ1021 / AOZ1037 in parallel? I thought that would work but then DC-DC regs are new to me. I've only used linear parts before. No longer the plan.

I'll likely drop the 3.3V anyway, as an ATX PSU has regulated 3.3V already. Then just drop 12V to 1.2V, as that seems to be more efficient and most PSUs have more watts available on 12V. So a 20/24-pin adapter with a non-standard onboard connector, so users don't accidentally plug in a Molex and blow it.

full member
Activity: 180
Merit: 100
I strongly urge you to use a toaster oven instead of a griddle.  I started out using a griddle but the yield was awful.  It will take you longer to get the toaster oven working right, but once it's dialed in your consistency will be really good.
Thank you. I'll do that. Hard to get here so I may work out another oven alternative.

Edit: I have an SMD rework station that I haven't used in 3 years but I'm not sure hot air from above would be good enough to do it and may stress the chip too much. Have you tried something like that?

Distributing power to all of those boards is the hard part.  Putting a DC-DC regulator on every board is expensive.  Distributing high-current 1.2V across the boards requires expensive (beefy) connectors between the boards.  Make sure you think this through.
I'll be doing the on-board regulator method as running 1.2V isn't feasible. I want these to connect together in an array that can grow to 64 units or more. I'm thinking about 2x 5A regs (like the AOZ1037 at $1.41) instead of the more typical high-amp reg seen on other boards so far. (If I use the same one for 3.3V I can buy 3x qty to save cost.) I'm not too happy with the 80% efficiency or less though.

You'd be much better off using the AOZ1021 / AOZ1025 combination that the ztex boards use - if you're set on going the on-board regulator route.  SMPS in parallel is way more of a PITA than most people assume...

Enigma

P.S. Distributing a communications chain (USB, Serial, JTAG) is also a major consideration.  A JTAG chain is simple for 2 or 3 devices, but 64..  Another PITA - especially TCK.
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
I strongly urge you to use a toaster oven instead of a griddle.  I started out using a griddle but the yield was awful.  It will take you longer to get the toaster oven working right, but once it's dialed in your consistency will be really good.
Thank you. I'll do that. Hard to get here so I may work out another oven alternative.

Edit: I have an SMD rework station that I haven't used in 3 years but I'm not sure hot air from above would be good enough to do it and may stress the chip too much. Have you tried something like that?

Distributing power to all of those boards is the hard part.  Putting a DC-DC regulator on every board is expensive.  Distributing high-current 1.2V across the boards requires expensive (beefy) connectors between the boards.  Make sure you think this through.
I'll be doing the on-board regulator method as running 1.2V isn't feasible. I want these to connect together in an array that can grow to 64 units or more. I'm thinking about 2x 5A regs (like the AOZ1037 at $1.41) instead of the more typical high-amp reg seen on other boards so far. (If I use the same one for 3.3V I can buy 3x qty to save cost.) I'm not too happy with the 80% efficiency or less though.
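
To put the efficiency worry into numbers, here is a rough sketch of what a 12V to 1.2V step-down costs at the input. It assumes both 5A regulators are fully loaded (10A total at 1.2V) and treats efficiency as a single fixed number, which real parts only approximate. The function name `buck_input` is made up for illustration.

```python
def buck_input(v_in, v_out, i_out, efficiency):
    """Return (input watts, input amps, watts lost as heat) for a step-down
    regulator modelled with one fixed conversion efficiency."""
    p_out = v_out * i_out          # power delivered to the FPGA core
    p_in = p_out / efficiency      # power drawn from the 12 V rail
    return p_in, p_in / v_in, p_in - p_out

# Assumed load: 2 x 5 A regulators at full load on a 1.2 V core rail.
for eff in (0.80, 0.90):
    p_in, i_in, heat = buck_input(12.0, 1.2, 10.0, eff)
    print(f"efficiency {eff:.0%}: {p_in:.2f} W in, "
          f"{i_in:.2f} A from 12 V, {heat:.2f} W of heat")
```

Under these assumptions, going from 80% to 90% cuts regulator heat per board from about 3 W to about 1.3 W, which adds up quickly across a 64-board array.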
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Would what you have now fit into an 'SLX75 ?

Nope.  The LX75 looks like the "left half" of an LX150.  Since the carry chains run vertically you're forced to orient your words that way, and therefore the LX75 can only fit half as many stages before you have to do a corner-turn.  The headache I'm dealing with now would be multiplied by three.

I prototyped my PCBs using LX45s since they're the smallest chip in the FG484 package, but never ported the design to that chip.

I'm just working on a board design that I'm going to home solder (griddle) and the risk in using a 'SLX75 is lower.

I strongly urge you to use a toaster oven instead of a griddle.  I started out using a griddle but the yield was awful.  It will take you longer to get the toaster oven working right, but once it's dialed in your consistency will be really good.

My goal is to make lego-block miners that have as little overhead cost as possible. So in crazy fashion I'm doing a 2 Layer board with only FPGA +Pwr.

Distributing power to all of those boards is the hard part.  Putting a DC-DC regulator on every board is expensive.  Distributing high-current 1.2V across the boards requires expensive (beefy) connectors between the boards.  Make sure you think this through.
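
A back-of-the-envelope sketch of why the connectors get expensive: for the same total power, bus current scales inversely with bus voltage, and contact heating scales with the square of the current. The per-board power and contact resistance below are hypothetical placeholders, not measurements from anyone's board, and regulator losses on a 12V bus are ignored.

```python
def bus_current(n_boards, watts_per_board, bus_volts):
    """Total current (A) the first connector in the chain must carry."""
    return n_boards * watts_per_board / bus_volts

def contact_loss(current_amps, contact_ohms):
    """I^2 * R heating (W) in a single connector contact."""
    return current_amps ** 2 * contact_ohms

N_BOARDS = 64          # array size mentioned in the thread
WATTS_PER_BOARD = 6.0  # assumed core power per board (hypothetical)
CONTACT_OHMS = 0.010   # assumed 10 milliohm contact resistance (hypothetical)

for volts in (1.2, 12.0):
    amps = bus_current(N_BOARDS, WATTS_PER_BOARD, volts)
    loss = contact_loss(amps, CONTACT_OHMS)
    print(f"{volts:>4} V bus: {amps:6.1f} A, {loss:7.1f} W lost in one contact")
```

With these placeholder numbers the 1.2V bus needs ten times the current of the 12V bus and loses a hundred times as much power in each contact.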

donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Have you considered trying to lay this out in an Altera Cyclone IV?

Yes, I have considered it.  Spartans won out for two reasons:

- A quarter of the LUTs on a Spartan can be turned into rather large shift registers (SRL16), and I use these a lot.

- Much higher register density

Each Spartan SLICE has eight registers, but half of them are very difficult to use (and often ignored by the automatic synthesis tools).  One of the reasons my design is so compact is that I made sure to use those registers.  I have 91% register utilization within the occupied slices of my design (obviously lots of slices are unoccupied so the overall utilization is much lower).

So, a lot of designs port nicely from Spartan to Altera because they were wasting half the registers to begin with.  I'm not, so I would pay a steep penalty.
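
For readers who haven't met the LUT-as-shift-register trick, here is a small behavioural model of the idea: one LUT holds 16 bits, shifts one position per enabled clock, and a 4-bit address picks which tap appears on the output. This is only an illustrative software model (the class and method names are made up), not the miner's code or Xilinx's primitive definition.

```python
class SRL16Model:
    """Behavioural sketch of a 16-deep, address-tapped shift register."""

    def __init__(self):
        self.bits = [0] * 16  # bits[0] is the most recently shifted-in bit

    def clock(self, d, ce=1):
        """One rising clock edge: shift in bit d when clock-enable is set."""
        if ce:
            self.bits = [d & 1] + self.bits[:15]

    def q(self, addr):
        """Output tap selected by a 4-bit address (0..15)."""
        return self.bits[addr & 0xF]

# Usage: with addr = 4, a bit shows up on q five clock edges after it is shifted in.
srl = SRL16Model()
srl.clock(1)            # shift in a single 1 ...
for _ in range(4):
    srl.clock(0)        # ... followed by four 0s
print(srl.q(4), srl.q(3))
```

The point is that a delay line of up to 16 cycles costs one LUT instead of 16 flip-flops, which matters when a pipeline needs many values carried forward by a fixed number of stages.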

I have, however, looked at a couple of SASIC platforms that I could port to fairly easily.  If an investor were to fall out of the sky, I know which one to go with.
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
Would what you have now fit into an 'SLX75 ? I'm curious because for playing around they are cheaper to experiment with.

I'm just working on a board design that I'm going to home solder (griddle) and the risk in using a 'SLX75 is lower. My goal is to make lego-block miners that have as little overhead cost as possible. So in crazy fashion I'm doing a 2 Layer board with only FPGA +Pwr. They will connect together like scrabble tiles and communicate via each other to only one master controller (possibly RaspberryPi since it has Linux and network on board).

When I get further I'll make a thread and explain my idea more fully. I'd like to do testing with a 'SLX75 or even 'SLX25 as the loss would be lower if I screw up. They both come in the same FBGA484 pkg. I've done FPGA design before but I'm still very non-expert at understanding how the hashing rounds work etc.

staff
Activity: 4284
Merit: 8808
Very frustrating.  I know where the wires should go, but I've spent countless hours trying to "trick" Xilinx's tools into doing what I already know how to do.

The very, very, very last resort is to write my own router by scripting fpga_edline.  I know that sounds desperate, but that's what it might come down to.

On the plus side, if you go that route the result should be more stable: small changes won't cause the darn thing to fail or to achieve drastically worse timing in unrelated areas.
full member
Activity: 180
Merit: 100
Have you considered trying to lay this out in an Altera Cyclone IV?  I'm a Xilinx guy from wayyyy back, and I almost always choose a Xilinx part for my projects.. BUT - I have to admit that Altera has some nice things in the Cyclone IV.

Originally, the FPGAMiner code was based upon the Cyclone IV.  It was pretty much abandoned once the (cheaper) Spartan 6 was able to outpace what it could do.  Development on that chip stopped, most likely because of the cost premium, but I'm not entirely sure that the Cyclone couldn't have reached more MH/$ if development had continued.  It has less exciting LUTs than the 6-input Spartan 6 type, but it has better routing resources.

Enigma
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
I for one welcome our new S6-LX150 300MH/s overlord.

Ah, not so fast yet.

As expected, moving up the frequency ladder turns into a game of whack-a-mole... fix one thing, something else becomes the critical path.  I'm back to fighting the corner turn.

What was not expected was how hard it would be to get control over the routing from Xilinx's tools.  I can get them to route the corner turn by itself, and I can get everything-but-the-corner turn to route, and I can show that the routing resources used are disjoint, but I can't get them both to route at once!

The sad reality is that Xilinx really does not provide any mechanism at all that says to the router "you absolutely must route this wire along this path". There are placer directives that can force placement, but even the "DIRT strings" used to try to force routing can be ignored by PAR under some circumstances, and I'm hitting them.  Ditto for SmartGuide.

Very frustrating.  I know where the wires should go, but I've spent countless hours trying to "trick" Xilinx's tools into doing what I already know how to do.

The very, very, very last resort is to write my own router by scripting fpga_edline.  I know that sounds desperate, but that's what it might come down to.
sr. member
Activity: 410
Merit: 252
Watercooling the world of mining
So how is everything going?

Have you had the time to search for some new paths ?
staff
Activity: 4284
Merit: 8808
I see your user icon has changed.

I for one welcome our new S6-LX150 300MH/s overlord.
staff
Activity: 4284
Merit: 8808
The corner turn on the right hand side wasn't all that difficult.  Unfortunately there's a lot of irregular stuff around the first and last stage (in red), and algorithmically placing that stuff is not feasible.  So instead I've arranged for the top row to gradually "jog" upward which leaves a triangular "hole" near the first and last stage, and I let Xilinx's tools autoplace the random crap in that hole.

Makes sense! ... it sure looks like there is actually room to fit another fully unrolled unit, assuming it was wired the other way so that its jog ended up opposite the lower jog.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
At 1-hash-per-clock (two rings) I am at 143 MHz (mining right now) and close to 150 MHz (just one route fails timing).
On a lark I tried out the 1.5-hashes-per-clock (three rings) setup.

Your post is worthless without chip-plot-porn. Gimme gimme.

Ok, fine.



The corner turn on the right hand side wasn't all that difficult.  Unfortunately there's a lot of irregular stuff around the first and last stage (in red), and algorithmically placing that stuff is not feasible.  So instead I've arranged for the top row to gradually "jog" upward which leaves a triangular "hole" near the first and last stage, and I let Xilinx's tools autoplace the random crap in that hole.  This is what lets me get close to 150 MHz.  Right now the hole is way bigger than it needs to be (in the plot a cell is purple even if it's nearly empty); once I get to my target clock speed I'll start shrinking the hole down to a more reasonable size.

The possibility of 225 MH/s on S6/LX150 sounds quite exciting. I assume the power consumption is fairly low compared to the designs running at higher clock rates?

Well, we'll see; no guarantees yet.  I have not taken any careful power measurements; last time I checked, my 100 MHz two-ring design pulled 5W per board, measuring a cluster of 6 boards with a crude kill-a-watt at the wall, so this figure includes inefficiencies introduced by the ATX power supply.

I expect power consumption per-hash-per-second to be similar to any other design with one layer of registers per SHA256-stage, and of course much less than those with two layers of registers per stage.
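
As a rough yardstick, the wall-power figure above can be turned into hashes per joule. This sketch assumes one FPGA per board (the post does not say) and takes the crude 5W-per-board reading at face value; the function name is made up for illustration.

```python
def mhs_per_watt(clock_mhz, hashes_per_clock, watts):
    """Hash rate in MH/s divided by power in W (equivalently MH per joule)."""
    return clock_mhz * hashes_per_clock / watts

# Two-ring design: 1 hash per clock at 100 MHz, about 5 W per board at the wall.
# Assumption: one FPGA per board.
efficiency = mhs_per_watt(100.0, 1.0, 5.0)
print(f"{efficiency:.1f} MH/s per watt, "
      f"{1000.0 / efficiency:.0f} mW per MH/s")
```

Since the reading includes ATX supply losses, the chip-level figure would be somewhat better than this.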
staff
Activity: 4284
Merit: 8808
At 1-hash-per-clock (two rings) I am at 143 MHz (mining right now) and close to 150 MHz (just one route fails timing).
On a lark I tried out the 1.5-hashes-per-clock (three rings) setup.

Your post is worthless without chip-plot-porn. Gimme gimme.

The possibility of 225 MH/s on S6/LX150 sounds quite exciting. I assume the power consumption is fairly low compared to the designs running at higher clock rates?
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
How is your work going ?
Any new results yet ?

At 1-hash-per-clock (two rings) I am at 143 MHz (mining right now) and close to 150 MHz (just one route fails timing).

On a lark I tried out the 1.5-hashes-per-clock (three rings) setup.  It works, but very slowly, mostly because I need to do some extra work to leave space between the rings, and I haven't even begun working on that yet.  Once I have the two-ring design where I want it to be I will backport all of those improvements to the three-ring design.
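
For anyone following the numbers: two rings give one hash per clock and three rings give 1.5, so each ring is worth half a hash per clock. A quick sketch of the resulting hash rates follows; the function name is made up, and the three-ring figure at 150 MHz is a what-if, not something achieved yet.

```python
def hash_rate_mhs(rings, clock_mhz):
    """Hash rate in MH/s, with each ring contributing half a hash per clock."""
    return rings * clock_mhz / 2

print(hash_rate_mhs(2, 143))  # two rings at the current clock
print(hash_rate_mhs(2, 150))  # two rings once the last route meets timing
print(hash_rate_mhs(3, 150))  # three rings, if they could run at the same clock
```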

Still lots of work to be done, and lots on my plate.
sr. member
Activity: 410
Merit: 252
Watercooling the world of mining
Hi big chip,

How is your work going ?

Any new results yet ?

I really would like to see the fully routed design finished.
Man is still superior to machine after all ;)