Algorithmically placed FPGA miner: 255MH/s/chip, supports all known boards - page 46.

eldentyrell

donator

Activity: 980

Merit: 1004

felonious vagrancy, personified

Quote from: nelisky on February 14, 2012, 09:47:30 AM

Any idea what's the smallest (cheapest?) device one could fit a single ring?

Unfortunately all of the smaller devices are "narrower" than the LX150, which really messes with the design. So, at the moment, it's LX150 or nothing.

Dexter770221

legendary

Activity: 1029

Merit: 1000

Earlier he mentioned something about 160MHz. If he managed to sustain that value, then 160*1.5=240 MH/s. New king of the LX150 :) I remember times (8 months ago) when many said that it's impossible to get close to 200MH/s... Never say never.
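For anyone who wants to play with the numbers, here is a minimal sketch of the arithmetic above. It assumes only what the thread states: the design completes 1.5 hashes per clock cycle, so hash rate is simply clock times 1.5. The function names are made up for illustration.

```python
def hash_rate_mhs(clock_mhz: float, hashes_per_cycle: float = 1.5) -> float:
    """Hash rate in MH/s for a design that finishes `hashes_per_cycle` hashes per clock."""
    return clock_mhz * hashes_per_cycle


def clock_needed_mhz(target_mhs: float, hashes_per_cycle: float = 1.5) -> float:
    """Clock in MHz required to reach `target_mhs` at the given hashes per cycle."""
    return target_mhs / hashes_per_cycle


if __name__ == "__main__":
    # 160 MHz is the clock mentioned in the post above.
    print(hash_rate_mhs(160))     # 240.0
    # 255 MH/s is the figure in the thread title.
    print(clock_needed_mhz(255))  # 170.0
```

Read the other way round, the 255MH/s in the thread title would correspond to a 170 MHz clock, if the 1.5 hashes/cycle figure still holds for that build.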

Inspector 2211

sr. member

Activity: 448

Merit: 250

Quote from: nelisky on February 14, 2012, 09:47:30 AM

So you got 3 rings going on in one LX150, right?

Any idea what's the smallest (cheapest?) device one could fit a single ring? And 2? Maybe with your approach we can get a better bang for the buck somewhere else, or at least easier sourcing of FPGAs

Stefan of ZTEX fits one ring (65 rounds) into a Spartan6-75.
If eldentyrell fits 3 rings into a Spartan6-150, it's not inconceivable that one could fit two rings into a Spartan6-100.

I think the main problem with implementing only one ring is that the result coming out of the SHA-256 operation has to be fed back into the other side,
and long interconnects on FPGAs are notoriously slow. Thus, your clock rate suffers, which is probably what eldentyrell is experiencing.

Still, Stefan achieves about 180 or 184 MHz on the Spartan6-75 http://www.ztex.de/btcminer/ and I'm dying to learn what clock rate eldentyrell is getting.

nelisky

legendary

Activity: 1540

Merit: 1002

So you got 3 rings going on in one LX150, right?

Any idea what's the smallest (cheapest?) device one could fit a single ring? And 2? Maybe with your approach we can get a better bang for the buck somewhere else, or at least easier sourcing of FPGAs

nelisky

legendary

Activity: 1540

Merit: 1002

watching

Dexter770221

legendary

Activity: 1029

Merit: 1000

Incredible amount of work. Congrats. We are waiting for some numbers and improvements in speed. Your solution will probably be the fastest.

ZedZedNova

sr. member

Activity: 475

Merit: 265

Ooh La La, C'est Zoom!

Nice work!

- Zed

kano

legendary

Activity: 4634

Merit: 1851

Linux since 1997 RedHat 4

Quote from: eldentyrell on February 13, 2012, 04:23:01 PM

...
This design is running and gets 1.5 hashes every clock cycle. All that's left is to improve the clock rate. Unfortunately the rest of the FPGA designers have a pretty big head start on me there... they've been in performance-optimization mode for 4-5 months now.

Although it probably isn't what you are referring to ... if you do mean normal sha256() code optimisation, it's all quite well known (and I can even provide that info if you need)
If on the other hand you mean optimisation that makes the fpga go faster due to its inherent hardware (which is probably what you do mean) then ignore my comment

Quote from: ursa on February 13, 2012, 05:59:30 PM

eldentyrell,
What's the final MH/s ?

1.5 x the clock

ursa

member

Activity: 72

Merit: 10

eldentyrell,
What's the final MH/s ?

rjk

sr. member

Activity: 448

Merit: 250

1ngldh

Saved to pron folder.

ursa

member

Activity: 72

Merit: 10

Nice job eldentyrell.

eldentyrell

donator

Activity: 980

Merit: 1004

felonious vagrancy, personified

Sorry about going AWOL there... first Real Life got in the way, then a huge pile of 2004-era Xilinx FPGAs fell on my head practically for free, and getting those from 0mhash/sec to something nonzero was a better return-on-time-spent.

Anyways, the layout is finished, all three rings:

Orange stuff is "overhead" shared between the three rings; I let the tools place that wherever they like since none of it is performance-critical.

This design is running and gets 1.5 hashes every clock cycle. All that's left is to improve the clock rate. Unfortunately the rest of the FPGA designers have a pretty big head start on me there... they've been in performance-optimization mode for 4-5 months now.

interlagos

hero member

Activity: 496

Merit: 500

Awesome thread!
It's like programming FPGA in its own assembler language.
It might inspire a lot of people to start looking into these little things, me included.

BkkCoins

hero member

Activity: 784

Merit: 1009

firstbits:1MinerQ

My understanding is that those charts are like maps of the chip resources being used up. The optimization is one of reducing signal paths (propagation delays) thru the huge myriad of gates being used. This propagation delay is what causes one cycle to be limited, and hence the maximum frequency fixed (the next cycle can't start until the current one has propagated thru all functions). I may be wrong here but I'm just applying my limited knowledge of digital design.
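A rough sketch of that relationship, assuming the usual simplification that the slowest register-to-register path sets the clock period (f_max = 1 / t_critical). The path delays below are invented for illustration and are not taken from any real timing report.

```python
def slowest_path_ns(path_delays_ns):
    """The critical path is the longest register-to-register delay in the design."""
    return max(path_delays_ns)


def max_clock_mhz(period_ns: float) -> float:
    """Convert a minimum clock period in nanoseconds to a maximum frequency in MHz."""
    if period_ns <= 0:
        raise ValueError("period must be positive")
    return 1000.0 / period_ns


if __name__ == "__main__":
    # Hypothetical delays (ns) for a handful of paths; only the worst one matters.
    paths = [4.1, 5.0, 6.25, 3.3]
    worst = slowest_path_ns(paths)
    print(worst, "ns ->", max_clock_mhz(worst), "MHz")
```

Shortening any path other than the worst one changes nothing, which is why placement work concentrates on the longest routes.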

I don't know if there are any improvements to be made in the actual logic to do the hashing.

Anyway, for my part I'm still stuck on getting ISE to map/par the design. The mapping still always fails and the only indicator is that 118% of LUT resources are used. I saw the suggestion above about trying SmartXplorer so I'll next try to figure that out. The Ztex code is open source but so far has proven useless to me as it cannot build. It also spits out 58000 warnings... apparently connections being dropped as unneeded.

If nothing else, then I'll re-design my small board to move my multiplex logic into a cheap CPLD on each board, and then have it interface to the pre-made Ztex core. That way I don't have to figure out how to re-build it with my changes. That would add a cost of $1 to fpga board but it's still a very bare bones approach with only about $10 (+heatsink) on top of the SLX150. I have designed the board, and simulated the multiplex logic to handle clusters of fpgas.

kano

legendary

Activity: 4634

Merit: 1851

Linux since 1997 RedHat 4

Hmm, pity I actually have no idea what these graphs show in detail for FPGA processing.

I wrote a program in C that generated the completely unrolled code in C to do the double sha256 from a very simple text file defining the whole process.
It actually ended up with all of the standard GPU optimisations (not by initial design - by result of how I wrote it)

The code worked out the pre calculations and also simplified all possible code to constants or repeated calculations over a nonce range
(there are a LOT of zeroes in the inputs ...)
The result when I compared it to non-assembly code (other C code) was about a 20% speed improvement, since none of the C code I've seen for doing bitcoin hashing is very well optimised (I guess since CPU mining is pointless and assembly coding gives a notable performance increase).
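To make the "pre calculations" idea concrete, here is a small sketch of one well-known instance of it. This is not the generator described above, just the simplest related trick: the first 64 bytes of the 80-byte block header do not change across a nonce range, so the SHA-256 work on them can be done once and reused. It uses `hashlib`'s `copy()` to reuse that work; `make_hasher` and `naive_double_sha` are names invented for this example, and the header bytes are dummy data.

```python
import hashlib
import struct


def make_hasher(header76: bytes):
    """Return a function nonce -> double SHA-256 of the 80-byte header.

    The hash object fed with the first 64 bytes is built once and copied
    for every nonce, so those bytes are never hashed again.
    """
    if len(header76) != 76:
        raise ValueError("expected the 76 header bytes that precede the nonce")
    first_block = hashlib.sha256(header76[:64])  # constant over the nonce range
    tail = header76[64:]                         # 12 bytes before the nonce

    def double_sha(nonce: int) -> bytes:
        h = first_block.copy()                     # reuse the precomputed state
        h.update(tail + struct.pack("<I", nonce))  # nonce is 4 bytes, little-endian
        return hashlib.sha256(h.digest()).digest()

    return double_sha


def naive_double_sha(header76: bytes, nonce: int) -> bytes:
    """Reference version that hashes the whole header from scratch every time."""
    data = header76 + struct.pack("<I", nonce)
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


if __name__ == "__main__":
    header = bytes(range(76))  # dummy header, not a real block
    fast = make_hasher(header)
    assert all(fast(n) == naive_double_sha(header, n) for n in range(1000))
    print("precomputed and naive versions agree")
```

A generated, fully unrolled version would go much further (folding the zero padding and other constants into each round), but the principle is the same: anything that does not depend on the nonce is computed once.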

I'm wondering if the generation of these graphs could be done in a similar manner and if that would help in any way?
The point about human optimisation being better than existing tools relates directly to this. In my case I worked on the code by looking at the output of each version to determine the changes needed in the next, i.e. a cycle of improvement based on previous results, rather than trying to use some tool that someone else had written, which probably has no relevance to the issues you come across.

O_Shovah

sr. member

Activity: 410

Merit: 252

Watercooling the world of mining

Any news on the plot front?

Anything I can do to help move this along?

tinman951

full member

Activity: 149

Merit: 100

A side comment, but earlier in the posts it was mentioned that someone was booting off an SD card. I was wondering how that would work, or even if it could be used like a cheap SSD.

BkkCoins

hero member

Activity: 784

Merit: 1009

firstbits:1MinerQ

I'll have to look into that. All I've done so far is plunk the source .v files into a project and hit run. Today I spent most of my time working on control logic code to interface between fpgas. Ha, that builds ok and takes 22 flip-flops (it says), less than 1%. Probably less than .1%. I just wanted to try my idea out and maybe simulate something to see if it works. I guess I have to make some test harness to do that now.

li_gangyi

full member

Activity: 157

Merit: 100

You'd probably need to run SmartXplorer to try and automate the process. It is running fairly close to the max as it stands.

BkkCoins

hero member

Activity: 784

Merit: 1009

firstbits:1MinerQ

Quote from: Dexter770221 on January 05, 2012, 02:13:24 PM

Are you using licensed ISE? WebPack ISE only supports up to SLX75.

Have full eval version. Definitely supports SLX150.
