
Topic: Algorithmically placed FPGA miner: 255MH/s/chip, supports all known boards - page 45. (Read 119440 times)

hero member
Activity: 504
Merit: 500
FPGA Mining LLC
Very interesting results... I'd like to see a bit more information though:
  • Where is the critical path, and how much could that be optimized? (Can you give a best-case estimate of the physical limits of achievable hashrate?)
  • How many pipeline stages does this design have, per core? Are the sha256 rounds doubly registered?
  • This looks pretty much crammed into the FPGA :)
    If you provide this as a hardmacro, is there even sufficient room to easily add a PC interface to it?
  • As the developer of MPBM, and being someone who has done at least a little VHDL design and implemented a miner core, I do understand very well what order of magnitude of effort this is. Especially with this all-broken Xilinx toolchain. However, simple miner software can be written in basically no time (and that's how MPBM started months ago). But if you design something for flexibility like the new MPBM generation or cgminer, it'll take at least 10 times as long. May I ask how much time you have realistically spent on implementing and optimizing this FPGA design and the necessary tools to generate it?
  • Assuming the bitcoin FPGA community (and possibly some board vendors) would want you to optimize this design until you're hitting real roadblocks (300MH/s maybe?), and release everything that's necessary to regenerate and further improve it under an open source license, roughly how much money would we need?
sr. member
Activity: 448
Merit: 250
Potential bidders for the IP are altera, xilinx, possibly others (like terasic, etc.) and the BTC FPGA community. I know very little about the fpga market but

The topology makes use of a few Xilinx-specific features, so it would require effort to port that.  However, the geometry is very Xilinx-specific.  Porting to Altera is as much work as porting to a SASIC platform like eASIC.

I'd guess that big players (altera,xilinx) wouldn't see BTC mining as a big enough market

Correct.  This is still way below Xilinx's radar.

How do you convince anyone that what you have is legit? You'd have to let them see something under NDA? What if they say "no thanks" and go do it themselves based on what they saw.

When there is a need for me to convince people I will be happy to give live, in-person demos here in NorCal.  I'll even let somebody bring their own board but I have to keep the board afterwards.  I'll probably need a ztex board at some point so when I do the demo we'll probably have somebody who doesn't know me bring a ztex board and I'll buy it from them as part of the demo.

EldenTyrell, I'm here in the South Bay (with a home office in north-east San Jose and a business/mining office in Santa Clara next to Nvidia) and I have a ZTEX board and I can sell it to you for what I paid for it, or $50 less, or whatever we agree on.

In case you put your bitstream up on Kickstarter, I'll also make a low-to-mid 3-figure pledge for early access to a 240 MH/s or better bitstream. (Right now, it's running at 209 MH/s and I'm not really interested in paying for, say, 220 MH/s.)
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
I'll even let somebody bring their own board but I have to keep the board afterwards.  I'll probably need a ztex board at some point so when I do the demo we'll probably have somebody who doesn't know me bring a ztex board and I'll buy it from them as part of the demo.
I'm not sure I understand this requirement. Are you somehow burning an irreversible encryption key into the chip first? Is there no way to undo that step?

Large Spartan chips like the 150 have a WRITE-ONLY nonvolatile register that can hold a bitstream decryption key.  There is (supposedly) no way to read the key back from the register; all you can do is hand the device an encrypted bitstream and let it use the key to decrypt+load.

The device also has a unique identity register (DNA).  Unfortunately it is utterly trivial to create a circuit that looks exactly like this unique identity register and then modify an unencrypted design to use that instead of the true DNA register.  So, chip-specific designs must be encrypted.
sr. member
Activity: 360
Merit: 250

The design is very easy to forward-port to the Xilinx 7-series parts; I just haven't had a reason to do that yet.  I've even backwards-ported it to older devices, but the effort/reward tradeoff there doesn't usually work out (it did this time only because I got the chips almost-for-free).  It's also possible to port it to most SASIC platforms, but my "are you serious about this" threshold for exploring that is really really high (and only with people based in the USA since there would be contracts involved).

Congratulations also from me for the great progress in your hard work.

Interesting that you think your design could easily be forward-ported to the new Xilinx 28nm FPGAs. This surprises me a little bit, because I always thought your design was so highly Spartan-6 LX150 optimized/specific. How deeply have you already looked into the Artix architecture, and wouldn't you have to do a lot of work just to newly 'fill up' the bigger chip, independently of the slightly different architecture?

I'm playing with the idea of building an FPGA board with Artix FPGAs.
One of the first ones to come out will be the 352K version of the Artix, but it doesn't look like the first chips will be available in less than 6-8 months :-(
rjk
sr. member
Activity: 448
Merit: 250
1ngldh
I'll even let somebody bring their own board but I have to keep the board afterwards.  I'll probably need a ztex board at some point so when I do the demo we'll probably have somebody who doesn't know me bring a ztex board and I'll buy it from them as part of the demo.
I'm not sure I understand this requirement. Are you somehow burning an irreversible encryption key into the chip first? Is there no way to undo that step?
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
[sarcasm]just make sure you don't use free miners like cgminer where many many hundreds of hours have been spent without the requirement of payment[/sarcasm]

Duh.

I wrote my own miner from scratch; it has longpoll and multipool support.  Just ask Luke-Jr, who has graciously suffered through the pool side of the debugging process :)

I can tell you from first-hand experience that writing a miner requires about 1% of the effort I put into the HDL design.  That's not an exaggeration; I kept a (very coarse) log of how I spent my time and it really does work out to about 100:1.  I suspect ztex has had a similar experience.

I don't mean any disrespect to the authors of cgminer/mpbm/etc.  They've done a great thing for the bitcoin mining community.  But these things aren't even in the same league in terms of time commitment.

Edit: in my comments above, "miner" refers only to the part of the software that runs on the CPU (i.e. the part that gets work from the pool and sends back shares), not the actual hashing code.  I did not mean to imply that writing GPU firmware is easy or trivial.  But I don't think GPU firmware is relevant to this discussion (I don't need any!)
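
For readers unfamiliar with the CPU-side "miner" being described: once work has been fetched from the pool, the only hashing the host has to do is re-check a nonce reported by the hardware before sending it back as a share. The following is a minimal sketch of that check, not the code from this thread. The function names are made up for illustration, and the header is rebuilt from the well-known Bitcoin genesis block fields purely as test data; a real pool would hand out its own work and its own (easier) share target.

```python
import hashlib
import struct

def block_hash(header: bytes) -> bytes:
    """Double SHA-256 of an 80-byte block header, in display (big-endian) order."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()[::-1]

def bits_to_target(bits: int) -> int:
    """Expand the compact 'bits' field into the full 256-bit target."""
    exponent = bits >> 24
    mantissa = bits & 0x00FFFFFF
    return mantissa << (8 * (exponent - 3))

def meets_target(header: bytes, bits: int) -> bool:
    """True if the header hash, read as an integer, does not exceed the target."""
    return int.from_bytes(block_hash(header), "big") <= bits_to_target(bits)

def genesis_header(nonce: int) -> bytes:
    """Illustrative test data: the genesis block header with a caller-chosen nonce."""
    merkle_root = bytes.fromhex(
        "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b"
    )[::-1]  # stored little-endian inside the header
    return (
        struct.pack("<L", 1)            # version
        + bytes(32)                     # previous block hash (all zero)
        + merkle_root
        + struct.pack("<LLL", 1231006505, 0x1D00FFFF, nonce)  # time, bits, nonce
    )

if __name__ == "__main__":
    header = genesis_header(2083236893)
    print(block_hash(header).hex(), meets_target(header, 0x1D00FFFF))
```

Everything else in such a program (fetching work, longpoll, submitting shares) is network plumbing around this check.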
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Potential bidders for the IP are altera, xilinx, possibly others (like terasic, etc.) and the BTC FPGA community. I know very little about the fpga market but

The topology makes use of a few Xilinx-specific features, so it would require effort to port that.  However, the geometry is very Xilinx-specific.  Porting to Altera is as much work as porting to a SASIC platform like eASIC.

I'd guess that big players (altera,xilinx) wouldn't see BTC mining as a big enough market

Correct.  This is still way below Xilinx's radar.

How do you convince anyone that what you have is legit? You'd have to let them see something under NDA? What if they say "no thanks" and go do it themselves based on what they saw.

When there is a need for me to convince people I will be happy to give live, in-person demos here in NorCal.  I'll even let somebody bring their own board but I have to keep the board afterwards.  I'll probably need a ztex board at some point so when I do the demo we'll probably have somebody who doesn't know me bring a ztex board and I'll buy it from them as part of the demo.
sr. member
Activity: 410
Merit: 252
Watercooling the world of mining
Congratulations eldentyrell.

I would gladly source your efforts with coins.

The concept is certainly outstanding.
I am curious about the outcome.
donator
Activity: 1419
Merit: 1015
eldentyrell, do you have a Bitcoin address for donations?
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
[sarcasm]just make sure you don't use free miners like cgminer where many many hundreds of hours have been spent without the requirement of payment[/sarcasm]
member
Activity: 70
Merit: 10
Potential bidders for the IP are altera, xilinx, possibly others (like terasic, etc.) and the BTC FPGA community. I know very little about the fpga market but I'd guess that big players (altera,xilinx) wouldn't see BTC mining as a big enough market but mid-size players that do fpga/asic IP might be.

How do you convince anyone that what you have is legit? You'd have to let them see something under NDA? What if they say "no thanks" and go do it themselves based on what they saw.

At what price will you be content for your investment?
sr. member
Activity: 407
Merit: 250
I think the idea of a community effort to raise the money (like Kickstarter) is really cool. If you had a "top contributors" list, I would hope the top three spots would be taken by the FPGA miner makers (FPGA Mining, ngzhang, ztex). Hopefully there's enough motivation between the three of us to keep the burden off the community in general.

Once you get the clock speed up, name a price. :)

+1 to this.

If you can crowd fund this, I'd definitely contribute.

Can I recommend doing it via something that accepts BTC contributions though? (GLBSE perhaps? Say you want 1000 BTC: issue 10,000 shares at 0.1 BTC each. Once they sell out, you release it. If it doesn't sell out, you can just pay back the raised funds as a dividend, and all the people who contributed would get their share back. Should work relatively well.) You could likely talk to Nefario about it to confirm validity.
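
The arithmetic of that proposal can be sketched as follows. This is only an illustration of the numbers in the post (1000 BTC goal, 0.1 BTC per share); the function name and the returned fields are invented here, and nothing about how GLBSE actually handles dividends is assumed.

```python
from decimal import Decimal

def share_offering(goal_btc: Decimal, share_price_btc: Decimal, shares_sold: int) -> dict:
    """All-or-nothing share sale: release on sell-out, otherwise refund as a dividend."""
    total_shares = int(goal_btc / share_price_btc)
    raised = share_price_btc * shares_sold
    funded = shares_sold >= total_shares
    # If the offering does not sell out, each share gets its purchase price back.
    refund_per_share = Decimal("0") if funded else share_price_btc
    return {
        "total_shares": total_shares,
        "raised_btc": raised,
        "funded": funded,
        "refund_per_share_btc": refund_per_share,
    }

if __name__ == "__main__":
    print(share_offering(Decimal("1000"), Decimal("0.1"), 10_000))
    print(share_offering(Decimal("1000"), Decimal("0.1"), 6_000))
```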

Thanks for all the hard work (which will benefit those of us who are invested in FPGAs a fair bit) lol.
sr. member
Activity: 252
Merit: 250
Inactive


Excellent work, Dr. Tyrell.
hero member
Activity: 784
Merit: 1009
firstbits:1MinerQ
If you put it on Kickstarter (or similar) I'd definitely contribute towards a compatible design. That makes the most sense to me and I'm pretty sure if the speed gain was good there would be many others.
hero member
Activity: 720
Merit: 525
I think the idea of a community effort to raise the money (like Kickstarter) is really cool. If you had a "top contributors" list, I would hope the top three spots would be taken by the FPGA miner makers (FPGA Mining, ngzhang, ztex). Hopefully there's enough motivation between the three of us to keep the burden off the community in general.

Once you get the clock speed up, name a price. :)
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
I've been getting a lot of inquiries about licensing and availability of the design.  Most of these inquiries are not terribly serious.

The big problem here is that I have poured an enormous amount of time into this project, and all it takes is one leaked copy of the bitstream to negate that.  So if I'm going to release this, most of the workable strategies involve me getting compensated in full up-front.

At this point, the most likely outcome is that I will post a bounty on Kickstarter or an equivalent site; if the pledges reach the threshold I will release the design, most likely as ready-to-run bitstreams for the most popular boards (ztex, x6000, icarus, etc) and a Spartan-6 hard macro so it can be made to work on other boards without any remapping fuss.  Releasing the source is probably not all that useful for people; it's written in a custom language that lets me express repetitive geometry and topology simultaneously; the verilog (which is completely illegible) and placement constraints get extracted from that.

A less likely result is that somebody buys an exclusive license for the design.  This is really expensive.  I'm not holding my breath.

An even less likely result is that I sell per-board licenses using encrypted bitstreams.  Unfortunately the only way to do this is for every board to physically pass through my hands in California so I can burn in the decryption key for a design that is specific to that chip's DNA register value; the encryption is symmetric so I can't give out keys.  So this would have to be an "extra option" offered by a board manufacturer.  I don't think the odds of this happening are too great.  It's also incompatible with the Kickstarter bounty option, so there would have to be some sort of minimum-board-production commitment.  Like I said, this option is highly unlikely.

Either way, this is all moot until the hashrate gets significantly above the open source miner (it will; there is tons of headroom).  I'm posting this to help set reasonable expectations.

The design is very easy to forward-port to the Xilinx 7-series parts; I just haven't had a reason to do that yet.  I've even backwards-ported it to older devices, but the effort/reward tradeoff there doesn't usually work out (it did this time only because I got the chips almost-for-free).  It's also possible to port it to most SASIC platforms, but my "are you serious about this" threshold for exploring that is really really high (and only with people based in the USA since there would be contracts involved).
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Here's the map output for the 8-Mar design (see update to first post in thread):


Design Summary
--------------
Number of errors:      0
Number of warnings:    4
Slice Logic Utilization:
  Number of Slice Registers:                94,029 out of 184,304   51%
    Number used as Flip Flops:              94,029
    Number used as Latches:                      0
    Number used as Latch-thrus:                  0
    Number used as AND/OR logics:                0
  Number of Slice LUTs:                     71,380 out of  92,152   77%
    Number used as logic:                   65,646 out of  92,152   71%
      Number using O6 output only:          11,155
      Number using O5 output only:               0
      Number using O5 and O6:               54,491
      Number used as ROM:                        0
    Number used as Memory:                   4,736 out of  21,680   21%
      Number used as Dual Port RAM:              0
      Number used as Single Port RAM:            0
      Number used as Shift Register:         4,736
        Number using O6 output only:           480
        Number using O5 output only:           480
        Number using O5 and O6:              3,776
    Number used exclusively as route-thrus:    998
      Number with same-slice register load:    993
      Number with same-slice carry load:         0
      Number with other load:                    5

Slice Logic Distribution:
  Number of occupied Slices:                18,772 out of  23,038   81%
  Nummber of MUXCYs used:                   30,080 out of  46,076   65%
  Number of LUT Flip Flop pairs used:       74,299
    Number with an unused Flip Flop:         6,862 out of  74,299    9%
    Number with an unused LUT:               2,919 out of  74,299    3%
    Number of fully used LUT-FF pairs:      64,518 out of  74,299   86%
    Number of unique control sets:              95
    Number of slice register sites lost
      to control set restrictions:             203 out of 184,304    1%

  A LUT Flip Flop pair for this architecture represents one LUT paired with
  one Flip Flop within a slice.  A control set is a unique combination of
  clock, reset, set, and enable signals for a registered element.
  The Slice Logic Distribution report is not meaningful if the design is
  over-mapped for a non-slice resource or if Placement fails.

IO Utilization:
  Number of bonded IOBs:                         1 out of     338    1%
    Number of LOCed IOBs:                        1 out of       1  100%

Specific Feature Utilization:
  Number of RAMB16BWERs:                         0 out of     268    0%
  Number of RAMB8BWERs:                          0 out of     536    0%
  Number of BUFIO2/BUFIO2_2CLKs:                 0 out of      32    0%
  Number of BUFIO2FB/BUFIO2FB_2CLKs:             0 out of      32    0%
  Number of BUFG/BUFGMUXs:                       6 out of      16   37%
    Number used as BUFGs:                        6
    Number used as BUFGMUX:                      0
  Number of DCM/DCM_CLKGENs:                     3 out of      12   25%
    Number used as DCMs:                         0
    Number used as DCM_CLKGENs:                  3
  Number of ILOGIC2/ISERDES2s:                   0 out of     586    0%
  Number of IODELAY2/IODRP2/IODRP2_MCBs:         0 out of     586    0%
  Number of OLOGIC2/OSERDES2s:                   0 out of     586    0%
  Number of BSCANs:                              1 out of       4   25%
  Number of BUFHs:                               0 out of     384    0%
  Number of BUFPLLs:                             0 out of       8    0%
  Number of BUFPLL_MCBs:                         0 out of       4    0%
  Number of DSP48A1s:                           30 out of     180   16%
  Number of ICAPs:                               0 out of       1    0%
  Number of MCBs:                                0 out of       4    0%
  Number of PCILOGICSEs:                         0 out of       2    0%
  Number of PLL_ADVs:                            3 out of       6   50%
  Number of PMVs:                                0 out of       1    0%
  Number of STARTUPs:                            1 out of       1  100%
  Number of SUSPEND_SYNCs:                       0 out of       1    0%

  Number of RPM macros:          294
Average Fanout of Non-Clock Nets:                2.31

Peak Memory Usage:  3449 MB
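
For anyone who wants to compare reports between builds, the "X out of Y  Z%" lines above are easy to pull apart with a short script. This is a rough sketch: the regular expression and function name are my own, only four lines from the report are embedded as sample input, and the exact percentages it computes will not always match the report's integer column, since the report appears to truncate (for example, 30 out of 180 is shown as 16%).

```python
import re

# A few lines copied from the map report above, used as sample input.
REPORT = """\
  Number of Slice Registers:                94,029 out of 184,304   51%
  Number of Slice LUTs:                     71,380 out of  92,152   77%
  Number of occupied Slices:                18,772 out of  23,038   81%
  Number of DSP48A1s:                           30 out of     180   16%
"""

# label: used out of total pct%
LINE = re.compile(
    r"^\s*(?P<label>.+?):\s+(?P<used>[\d,]+) out of\s+(?P<total>[\d,]+)\s+(?P<pct>\d+)%"
)

def parse_utilization(text: str) -> list:
    """Return (label, used, total, exact_percent) for every 'X out of Y' line."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            used = int(m["used"].replace(",", ""))
            total = int(m["total"].replace(",", ""))
            rows.append((m["label"].strip(), used, total, 100.0 * used / total))
    return rows

if __name__ == "__main__":
    for label, used, total, pct in parse_utilization(REPORT):
        print(f"{label:32s} {used:>8,} / {total:>8,}  {pct:5.1f}%")
```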
sr. member
Activity: 420
Merit: 250
Any updates on the progress of this design? Just curious.
sr. member
Activity: 410
Merit: 252
Watercooling the world of mining
@ Eldentyrell

Would you accept any assistance on this task?
I have been working with ISE for some time now and I have two LX150 boards.
I would really like to help with the development if you are willing to share some of your knowledge. :)
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
I think the main problem of implementing only one ring is, that the result coming out of the SHA-256 operation has to be fed back into the other side, and long interconnects on FPGAs are notoriously slow. Thus, your clock rate suffers, which is probably what eldentyrell is experiencing.

Nah, because there's no way you can fit 64 stages in the width of the device (and you can't rotate 90 degrees because the carry chain only runs one way).  You have to have at least one "long vertical run" in any design that is more than 32 stages, and any design with less than 128 stages needs some amount of feedback.  So there's no way to avoid it.

Still, Stefan achieves about 180 or 184 MHz on the Spartan6-75 http://www.ztex.de/btcminer/ and I'm dying to learn what clock rate eldentyrell is getting.

Yes, I'm dying to learn that too.  Will report back as soon as I have a number that isn't embarrassing.  I previously got 160 MHz (overclocked up to 170 MHz) with my two-ring design, and I have no reason to believe this one will be any slower.  But it will certainly take me longer to get there.
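
To make the stage-count argument concrete, here is a back-of-the-envelope model of how clock rate and ring length turn into hashrate. It is a simplification, not a description of the actual design: it assumes one SHA-256 round per pipeline stage per clock, 128 rounds per Bitcoin hash (two 64-round passes), and a ring that is always full. The function names and the example stage counts are hypothetical.

```python
ROUNDS_PER_HASH = 128  # assumption: two 64-round SHA-256 passes per Bitcoin hash

def passes_around_ring(stages: int) -> float:
    """How many trips around a ring of `stages` stages one hash needs (feedback if > 1)."""
    return ROUNDS_PER_HASH / stages

def hashrate_mhs(clock_mhz: float, stages: int, rings: int = 1) -> float:
    """Idealized throughput: each ring holds `stages` hashes, each taking 128 clocks."""
    return rings * clock_mhz * stages / ROUNDS_PER_HASH

if __name__ == "__main__":
    for stages in (32, 64, 128):
        print(
            f"{stages:3d} stages: {passes_around_ring(stages):.0f} pass(es), "
            f"{hashrate_mhs(160, stages):.0f} MH/s at 160 MHz"
        )
```

Under this model a design with fewer than 128 stages has to feed results back around the ring, which is the feedback path discussed above, and hashrate scales linearly with whatever clock that path allows.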