
Topic: Algorithmically placed FPGA miner: 255MH/s/chip, supports all known boards - page 46. (Read 119440 times)

donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Any idea what's the smallest (cheapest?) device one could fit a single ring?

Unfortunately all of the smaller devices are "narrower" than the LX150, which really messes with the design.  So, at the moment, it's LX150 or nothing.
legendary
Activity: 1029
Merit: 1000
Earlier he mentioned something about 160 MHz. If he has managed to sustain that value, then 160*1.5=240 MH/s. New king of LX150 :) I remember times (8 months ago) when many said that it's impossible to get close to 200 MH/s... Never say never :)
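The arithmetic in this post can be written out as a small sketch. It assumes each ring contributes 0.5 hashes per clock, which is only inferred from the thread's figures (three rings, 1.5 hashes per cycle); the function names are made up for illustration.

```python
# Sketch only: hash rate of a ring-based design as a function of clock rate.
# Assumption inferred from the thread: 3 rings give 1.5 hashes per clock,
# so each ring is taken to contribute 0.5 hashes per clock.
HASHES_PER_RING_PER_CLOCK = 0.5

def hashes_per_clock(rings: int) -> float:
    """Hashes completed per clock cycle for a given number of rings."""
    return rings * HASHES_PER_RING_PER_CLOCK

def hash_rate_mhs(clock_mhz: float, rings: int = 3) -> float:
    """Hash rate in MH/s: clock in MHz times hashes per clock."""
    return clock_mhz * hashes_per_clock(rings)

if __name__ == "__main__":
    # 160 MHz is the clock mentioned in the post; it is not a measured result.
    print(f"{hash_rate_mhs(160):.0f} MH/s")
```

With three rings the hash rate scales linearly with whatever clock the design can actually sustain.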
sr. member
Activity: 448
Merit: 250
So you got 3 rings going on in one LX150, right?

Any idea what's the smallest (cheapest?) device one could fit a single ring? And 2? Maybe with your approach we can get a better bang for the buck somewhere else, or at least easier sourcing of FPGAs :)

Stefan of ZTEX fits one ring (65 rounds) into a Spartan6-75.
If eldentyrell fits 3 rings into a Spartan6-150, it's not inconceivable that one could fit two rings into a Spartan6-100.

I think the main problem with implementing only one ring is that the result coming out of the SHA-256 operation has to be fed back into the other side,
and long interconnects on FPGAs are notoriously slow. Thus, your clock rate suffers, which is probably what eldentyrell is experiencing.

Still, Stefan achieves about 180 or 184 MHz on the Spartan6-75 http://www.ztex.de/btcminer/ and I'm dying to learn what clock rate eldentyrell is getting.
legendary
Activity: 1540
Merit: 1002
So you got 3 rings going on in one LX150, right?

Any idea what's the smallest (cheapest?) device one could fit a single ring? And 2? Maybe with your approach we can get a better bang for the buck somewhere else, or at least easier sourcing of FPGAs :)
legendary
Activity: 1029
Merit: 1000
Incredible amount of work. Congrats. We are waiting for some numbers and improvements in speed. Your solution will probably be the fastest.
sr. member
Activity: 475
Merit: 265
Ooh La La, C'est Zoom!
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
...
This design is running and gets 1.5 hashes every clock cycle.  All that's left is to improve the clock rate.  Unfortunately the rest of the FPGA designers have a pretty big head start on me there... they've been in performance-optimization mode for 4-5 months now.
Although it probably isn't what you are referring to ... if you do mean normal sha256() code optimisation, it's all quite well known (and I can even provide that info if you need)
If on the other hand you mean optimisation that makes the FPGA go faster due to its inherent hardware (which is probably what you do mean) then ignore my comment :)

eldentyrell,
What's the final MH/s ?
1.5 x the clock :)
member
Activity: 72
Merit: 10
eldentyrell,
What's the final MH/s ?
rjk
sr. member
Activity: 448
Merit: 250
1ngldh
Saved to pron folder.
member
Activity: 72
Merit: 10
Nice job eldentyrell.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Sorry about going AWOL there... first Real Life got in the way, then a huge pile of 2004-era Xilinx FPGAs fell on my head practically for free, and getting those from 0mhash/sec to something nonzero was a better return-on-time-spent.

Anyways, the layout is finished, all three rings:


Orange stuff is "overhead" shared between the three rings; I let the tools place that wherever they like since none of it is performance-critical.

This design is running and gets 1.5 hashes every clock cycle.  All that's left is to improve the clock rate.  Unfortunately the rest of the FPGA designers have a pretty big head start on me there... they've been in performance-optimization mode for 4-5 months now.
hero member
Activity: 496
Merit: 500
Awesome thread!
It's like programming an FPGA in its own assembler language.
It might inspire a lot of people to start looking into these little things, me included.
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
My understanding is that those charts are like maps of the chip resources being used up. The optimization is one of reducing signal paths (propagation delays) thru the huge myriad of gates being used. This propagation delay is what causes one cycle to be limited, and hence the maximum frequency fixed (the next cycle can't start until the current one has propagated thru all functions). I may be wrong here but I'm just applying my limited knowledge of digital design.
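The relationship described above, where the slowest signal path sets the clock period, can be sketched as follows. The path names and delays are invented for illustration and are not taken from any real timing report; setup time and clock skew are ignored.

```python
# Sketch only: the clock period must be at least as long as the slowest
# register-to-register path, so f_max = 1 / (critical path delay).
# Setup time and clock skew are ignored to keep the example short.

def fmax_mhz(path_delays_ns: dict) -> float:
    """Maximum clock in MHz given path delays in nanoseconds."""
    critical_ns = max(path_delays_ns.values())
    return 1000.0 / critical_ns  # 1000 / ns gives MHz

# Hypothetical delays, not measured values.
paths = {
    "adder_chain": 4.8,
    "ring_feedback_route": 6.25,
    "round_constant_mux": 3.1,
}

if __name__ == "__main__":
    print(f"critical path limits the clock to {fmax_mhz(paths):.0f} MHz")
```

Shortening any path other than the slowest one does not raise the limit, which is why placement work concentrates on the longest routes.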

I don't know if there are any improvements to be made in the actual logic to do the hashing.

Anyway, for my part I'm still stuck on getting ISE to map/par the design. The mapping still always fails and the only indicator is that 118% of LUT resources are used. I saw the suggestion above about trying SmartXplorer, so I'll try to figure that out next. The ZTEX code is open source, but so far it has proven useless to me because I cannot get it to build. It also spits out 58000 warnings... apparently connections being dropped as unneeded.

If nothing else, then I'll re-design my small board to move my multiplex logic into a cheap CPLD on each board, and then have it interface to the pre-made Ztex core. That way I don't have to figure out how to re-build it with my changes. That would add a cost of $1 to fpga board but it's still a very bare bones approach with only about $10 (+heatsink) on top of the SLX150. I have designed the board, and simulated the multiplex logic to handle clusters of fpgas.
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
Hmm, pity I actually have no idea what these graphs show in detail for FPGA processing :)

I wrote a program in C that generated the completely unrolled code in C to do the double sha256 from a very simple text file defining the whole process.
It actually ended up with all of the standard GPU optimisations (not by initial design - by result of how I wrote it)

The code worked out the pre calculations and also simplified all possible code to constants or repeated calculations over a nonce range
(there are a LOT of zeroes in the inputs ...)
The result when I compared it to non-assembly code (other C code) was about a 20% speed improvement since none of the C code I've seen for doing bitcoin hashing is very well optimised (I guess since CPU mining is pointless and assembly coding gives a notable performance increase)

I'm wondering if the generation of these graphs could be done in a similar manner and if that would help in any way?
The point of human optimisation being better than tools that exist, relates directly to this since in my case I worked on the code by looking at the output of each version to determine the changes that would be needed in the next - i.e. a cycle of improvement based on previous results, not trying to use some tool that someone else had written which probably has no relevance to the issues you come across to solve.
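A generator of the kind described in the post above might look roughly like this. It is a minimal sketch, not the poster's program: the text format, the names `generate`, `SPEC`, and `CONSTANTS`, and the toy operation sequence are all assumptions, and the operations shown are not a real SHA-256 round. It reads a simple text description, computes everything that depends only on known inputs at generation time, and emits unrolled C statements only for values that depend on the nonce.

```python
# Sketch only: emit unrolled C from a text spec, folding constant work away.
MASK = 0xFFFFFFFF

# op name -> (Python function used for folding, C expression template)
OPS = {
    "add": (lambda a, b: (a + b) & MASK, "({a} + {b})"),
    "xor": (lambda a, b: a ^ b, "({a} ^ {b})"),
    # rotate right; the shift count is assumed to be in 1..31
    "rotr": (lambda a, n: ((a >> n) | (a << (32 - n))) & MASK,
             "(({a} >> {b}) | ({a} << (32 - {b})))"),
}

def value_of(token, known):
    """Return the constant value of a token, or None if it varies."""
    if token in known:
        return known[token]
    try:
        return int(token, 0)  # integer literal such as 7 or 0x280
    except ValueError:
        return None  # a variable such as the nonce

def render(token, known):
    """Render a token as a C operand, inlining folded constants."""
    value = value_of(token, known)
    return token if value is None else f"0x{value:08x}u"

def generate(spec, constants):
    """Return C statements for the parts of spec that are not constant."""
    known = dict(constants)
    lines = []
    for raw in spec.strip().splitlines():
        dest, _, expr = raw.partition("=")
        dest = dest.strip()
        op, a, b = expr.split()
        fold, template = OPS[op]
        va, vb = value_of(a, known), value_of(b, known)
        if va is not None and vb is not None:
            known[dest] = fold(va, vb)  # precomputed, nothing emitted
        else:
            body = template.format(a=render(a, known), b=render(b, known))
            lines.append(f"uint32_t {dest} = {body};")
    return lines

# Hypothetical toy spec: w0, k0, w1 are fixed inputs, nonce varies.
SPEC = """
t0 = add w0 k0
t1 = rotr t0 7
t2 = add t1 nonce
t3 = xor t2 w1
"""
CONSTANTS = {"w0": 0x80000000, "k0": 0x428A2F98, "w1": 0x00000280}

if __name__ == "__main__":
    print("\n".join(generate(SPEC, CONSTANTS)))
```

In this toy run the first two steps depend only on fixed inputs, so they are evaluated once by the generator and only the two nonce-dependent statements reach the output. A fuller generator would presumably also drop identity operations such as adding or XORing with zero.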
sr. member
Activity: 410
Merit: 252
Watercooling the world of mining
Any news on the plot front?

Anything I can do to help move this further?
full member
Activity: 149
Merit: 100
A side comment, but earlier in the posts it was mentioned that someone was booting off of a SD card.  I was wondering how that would work, or even if it could be used like a cheap ssd.
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
I'll have to look into that. All I've done so far is plunk the source v files into a project and hit run. Today I spent most of my time working on control logic code to interface between fpgas. Ha, that builds ok and takes 22 flip-flops (it says), less than 1%. Probably less than .1%. I just wanted to try my idea out and maybe simulate something to see if it works. I guess I have to make some test harness to do that now.
full member
Activity: 157
Merit: 100
You'd probably need to run SmartXplorer to try and automate the process. It is running fairly close to the max as it stands.
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
Are you using licensed ISE? WebPack ISE only supports up to SLX75.
Have full eval version. Definitely supports SLX150.