
Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013) - page 14.

sr. member
Activity: 262
Merit: 250
Sorry, I thought you were talking about scrolling down in the list of files. I got the file from the download section. Now I see what section you were talking about...
member
Activity: 107
Merit: 13
Error, error, error.....

There is a download section at http://www.ztex.de/btcminer. SCROLL DOWN.
Don't click the "Downloads" link in the left-side menu bar; it takes you to the SDK and example downloads, which is a different downloads section.

Use this:
http://www.ztex.de/btcminer/ZtexBTCMiner-121126.tar.bz2
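If you'd rather see which HDL files are in the btcminer tarball before unpacking it, here is a minimal Python sketch using the standard tarfile module. The suffix list is my own assumption about what counts as HDL or constraint files; I haven't checked the package's actual layout.

```python
import sys
import tarfile
from pathlib import PurePosixPath

# Suffixes treated as HDL or constraint files (assumption; adjust as needed).
HDL_SUFFIXES = {".v", ".vh", ".vhd", ".vhdl", ".ucf", ".xdc"}


def list_hdl_files(archive_path):
    """Return sorted member names in a .tar.bz2 whose suffix looks like HDL."""
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        return sorted(
            member.name
            for member in tar.getmembers()
            if member.isfile()
            and PurePosixPath(member.name).suffix.lower() in HDL_SUFFIXES
        )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_hdl.py ZtexBTCMiner-121126.tar.bz2
    for name in list_hdl_files(sys.argv[1]):
        print(name)
```

Reading the member list directly avoids extracting the whole SDK just to find the Verilog.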
sr. member
Activity: 262
Merit: 250

I'm not familiar with the site. All I can find is some firmware (mostly Java), and the only HDL I can find is for memory tests etc., plus some references to the Leon open source design. They seem to have lots of documentation, so it's most likely hidden somewhere on the site. Does anybody know where?



Maybe try to scroll down:)

Download a source package, extract it, and you can find it in the "fpga" subdirectory.

Code:
$ tar xvfj ztex-121017.tar.bz2
$ cd ztex
$ find . -type d -name fpga
./examples/usb-fpga-1.2/lightshow/fpga
./examples/usb-fpga-1.2/ucecho/fpga
./examples/usb-fpga-1.2/intraffic/fpga
./examples/usb-fpga-1.15y/ucecho/fpga
./examples/usb-fpga-1.15y/intraffic/fpga
./examples/usb-fpga-1.11/1.11c/lightshow/fpga
./examples/usb-fpga-1.11/1.11c/ucecho/fpga
./examples/usb-fpga-1.11/1.11c/intraffic/fpga
./examples/usb-fpga-1.11/1.11c/memtest/fpga
./examples/usb-fpga-1.11/1.11a/lightshow/fpga
./examples/usb-fpga-1.11/1.11a/ucecho/fpga
./examples/usb-fpga-1.11/1.11a/intraffic/fpga
./examples/usb-fpga-1.11/1.11a/memtest/fpga
./examples/usb-fpga-1.11/1.11b/lightshow/fpga
./examples/usb-fpga-1.11/1.11b/ucecho/fpga
./examples/usb-fpga-1.11/1.11b/intraffic/fpga
./examples/usb-fpga-1.11/1.11b/memtest/fpga
./examples/usb-fpga-1.15/1.15a/lightshow/fpga
./examples/usb-fpga-1.15/1.15a/ucecho/fpga
./examples/usb-fpga-1.15/1.15a/intraffic/fpga
./examples/usb-fpga-1.15/1.15a/memtest/fpga
./examples/usb-fpga-1.15/1.15a/mmio/fpga
./examples/usb-fpga-1.15/1.15d/lightshow/fpga
./examples/usb-fpga-1.15/1.15d/ucecho/fpga
./examples/usb-fpga-1.15/1.15d/intraffic/fpga
./examples/usb-fpga-1.15/1.15d/memtest/fpga
./examples/usb-fpga-1.15/1.15d/mmio/fpga
./examples/usb-fpga-1.15/1.15b/lightshow/fpga
./examples/usb-fpga-1.15/1.15b/ucecho/fpga
./examples/usb-fpga-1.15/1.15b/intraffic/fpga
./examples/usb-fpga-1.15/1.15b/memtest/fpga
./examples/usb-fpga-1.15/1.15b/mmio/fpga

All of which contain memory tests etc.

member
Activity: 107
Merit: 13

I'm not familiar with the site. All I can find is some firmware (mostly Java), and the only HDL I can find is for memory tests etc., plus some references to the Leon open source design. They seem to have lots of documentation, so it's most likely hidden somewhere on the site. Does anybody know where?



Maybe try to scroll down:)

Download a source package, extract it, and you can find it in the "fpga" subdirectory.
sr. member
Activity: 262
Merit: 250
I've tried to compile ztex's source

Where is the ztex source? Is it in one of the project directories?

http://www.ztex.de/btcminer/


I'm not familiar with the site. All I can find is some firmware (mostly Java), and the only HDL I can find is for memory tests etc., plus some references to the Leon open source design. They seem to have lots of documentation, so it's most likely hidden somewhere on the site. Does anybody know where?

sr. member
Activity: 262
Merit: 250
I've tried to compile ztex's source

Where is the ztex source? Is it in one of the project directories?
member
Activity: 107
Merit: 13
Correct, something like that. I was thinking on-die memory segments could be used, but anything that separates the hasher clock from the software communicator should be a good thing. I hadn't seen that code, as I was working on the Altera branches. They must be doing something right to achieve 200 MH/s per chip on a Spartan LX150, which in this thread (and on the hardware comparison page) topped out at 100 MH/s on other boards (unless I missed some updates somewhere). The ztex design seems to clock 1 core at 200+ MHz, versus the other designs without hasher/controller separation clocking 1 core at 100 MHz. It would be amazing to double the clock rate of my Altera chips from 220 MHz to 440 MHz with 3 cores!

Separating the clocks will not help you (I've tried it on an xc6slx150). The frequency is limited by the carry chains, not by clock network delays.
As far as I know, the ztex design generally allows 190 MHz (probably as calculated by Xilinx at 85 °C), but voltage/temperature derating allows the frequency to be increased.

I've tried to compile ztex's source, and XST reported a maximum clock frequency of 230 MHz. I made some modifications, after which XST reported 316.312 MHz, so I hope it will reach 190 MHz after PAR.
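To put those XST numbers in perspective, here is the frequency/period arithmetic as a small Python sketch. XST's figure is a pre-route estimate, so treating the positive slack against a 190 MHz target as "budget left for routing delay after PAR" is my simplification, not something XST reports.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(target_mhz, estimated_fmax_mhz):
    """Target period minus estimated critical-path delay; positive means it fits."""
    return period_ns(target_mhz) - period_ns(estimated_fmax_mhz)


TARGET_MHZ = 190.0
for label, fmax in [("original", 230.0), ("modified", 316.312)]:
    print(f"{label}: {fmax} MHz -> {period_ns(fmax):.3f} ns, "
          f"slack at {TARGET_MHZ:.0f} MHz: {slack_ns(TARGET_MHZ, fmax):+.3f} ns")
```

The modification roughly doubles the pre-route margin (about 0.9 ns to about 2.1 ns), which is why reaching 190 MHz after routing looks plausible.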
lbr
sr. member
Activity: 423
Merit: 254
I'll soon receive the board based on Altera Stratix V.
Any advice? ; )
Is this the Stratix V development kit?
Nope..
sr. member
Activity: 262
Merit: 250
The git repository contains several designs in the projects directory in addition to the main src directory.

Which of these contains the fastest hashing core?

It's possible to check them all, but building almost 20 designs with either Quartus or ISE/Vivado would take some time.
sr. member
Activity: 262
Merit: 250
I'll soon receive the board based on Altera Stratix V.
Any advice? ; )

Is this the Stratix V development kit?

It should build pretty much straight out of the box, but you need to define your clock pin. You will probably also have to modify main_pll.v in qmegawiz to match your external clock frequency (osc_clk) and target hash_clk.

But it appears that the Stratix V performance is below that of the Stratix IV, which I find a little odd.
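For the main_pll.v change, the PLL output is essentially osc_clk × multiply / divide. Below is a brute-force Python sketch that picks small integers for a given oscillator and target hash_clk. The search limits are placeholders; a real Stratix V PLL also has VCO-range and counter constraints that this ignores, and the MegaWizard computes the actual settings. Depending on the PLL megafunction, the generated file may express the result as multiply/divide factors or directly as an output frequency, so check your own main_pll.v.

```python
from fractions import Fraction


def best_pll_ratio(osc_mhz, target_mhz, max_mult=64, max_div=64):
    """Search small integers so that osc_mhz * mult / div is as close as
    possible to target_mhz; ties go to the smallest mult + div.

    max_mult/max_div are placeholder limits, and real PLLs also constrain the
    VCO frequency and counters, which this sketch ignores.
    """
    osc, target = Fraction(osc_mhz), Fraction(target_mhz)
    best = None  # (error, mult, div)
    for mult in range(1, max_mult + 1):
        for div in range(1, max_div + 1):
            err = abs(osc * mult / div - target)
            if (best is None or err < best[0]
                    or (err == best[0] and mult + div < best[1] + best[2])):
                best = (err, mult, div)
    _, mult, div = best
    return mult, div, float(osc * mult / div)


# Hypothetical 50 MHz board oscillator and a 200 MHz hash_clk target.
print(best_pll_ratio(50, 200))  # (4, 1, 200.0)
```

Pass non-integer frequencies as strings (e.g. "33.333") so Fraction keeps them exact.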
hero member
Activity: 1118
Merit: 541
I'll soon receive the board based on Altera Stratix V.
Any advice? ; )
Also, if you want to poke around with the board with me, VNC+Skype could be arranged.

Send me a shout on Freenode IRC: "senseless".

lbr
sr. member
Activity: 423
Merit: 254
I'll soon receive the board based on Altera Stratix V.
Any advice? ; )
Also, if you want to poke around with the board with me, VNC+Skype could be arranged.
legendary
Activity: 2128
Merit: 1073
Quartus, ISE, and Vivado all have options to target minimizing power.  I don't know how good they are at it; probably not very.
Yes, I misremembered. There is a "Design Goal" for "Power Optimization" that invokes synthesis with an "Optimization Goal" of "Area" and adds the "Power Reduction" flag. It must have scrolled past so many times on my terminal that I completely forgot about the top-level goal.

I haven't really tried anything on Kintex-7, only on the various Virtex-[456] parts I had available. After what you've said about Vivado and Family-7, I'm getting motivated to upgrade my toyset.

sr. member
Activity: 262
Merit: 250
I was thinking on-die memory segments could be used.

FIFOs are usually implemented using embedded memory on FPGAs. Even if you claim not to be an FPGA designer/coder, you think like one :)

But if you run your miner clock domain well above fmax, it will quite often work, since most devices are faster than their marked speed grade. When you get your next board or batch, however, it might fail constantly because you got slower devices. You also have to be careful that timing errors in the faster clock domain do not propagate into the slower clock domains, e.g. the FIFO enqueue signal being stuck asserted due to a timing error. That can potentially be a lot worse than just a bad nonce.
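One cheap safety net when running above fmax is to re-check every returned nonce on the host before submitting it. A minimal Python sketch, assuming the FPGA returns a 32-bit nonce for a known 80-byte Bitcoin block header (first 76 bytes fixed, nonce little-endian in the last 4). The function names are mine, not from any miner's code, and the genesis block header is used only as a known-good test vector. A pool miner would pass the pool's share target instead of the network target.

```python
import hashlib
import struct


def bits_to_target(bits):
    """Expand Bitcoin's compact 'bits' field into the full 256-bit target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF  # sign bit ignored
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def block_hash(header80):
    """Double SHA-256 of an 80-byte block header (raw digest bytes)."""
    return hashlib.sha256(hashlib.sha256(header80).digest()).digest()


def nonce_is_valid(header76, nonce, target):
    """Re-check a nonce reported by the FPGA against a target."""
    if len(header76) != 76:
        raise ValueError("expected the first 76 bytes of the header")
    header80 = header76 + struct.pack("<I", nonce)  # nonce is little-endian
    # The digest is compared as a little-endian 256-bit integer.
    return int.from_bytes(block_hash(header80), "little") <= target


# Known test vector: the Bitcoin genesis block header minus its nonce.
GENESIS_76 = bytes.fromhex(
    "01000000"  # version
    + "00" * 32  # previous block hash
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"  # merkle root
    + "29ab5f49"  # timestamp
    + "ffff001d"  # bits = 0x1d00ffff
)

if __name__ == "__main__":
    target = bits_to_target(0x1D00FFFF)
    print(nonce_is_valid(GENESIS_76, 2083236893, target))  # genuine nonce: True
    print(nonce_is_valid(GENESIS_76, 2083236894, target))  # corrupted nonce: False
```

This only filters bad results; it does nothing about the stuck-enqueue case, which has to be handled in the clock-crossing logic itself.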
sr. member
Activity: 262
Merit: 250
Quote
I would imagine getting the fabric to run at 500MHz in a Kintex-7 device is also a challenge. Running the design as-is through Vivado with a 325 speed grade -2 target does not meet timing closure at 250MHz.
The rest of the fabric only needs to handle registers, routing signals, and the non-linear math.  Obviously the Kintex 7 fabric is capable of handling these frequencies for modest logic

This was part of my concern. Getting that part to run at 500 MHz is a challenge, especially with multiple cores, when utilization goes up.

The issue is that FPGAs are routing-constrained, especially the Spartan 6s, and the tools aren't designed to handle these sorts of long chains. The Kintex 7 chips are much nicer with respect to routing resources and consistency.

Yes. The CLB is pretty similar to the Spartan6, but it seems like the new switching matrix is quite effective when it comes to this type of logic/routing.

Altera Stratix V does not seem to suit this type of logic very well, at least with the current tools, as the Stratix IV seems to outperform the Stratix V. I don't understand why, as the ALM does not seem to be radically different from the Stratix IV's.
hero member
Activity: 1118
Merit: 541
Quote
I had this idea for a bitcoin fpga design. I'm not an fpga designer/coder, but since you guys are talking about new designs I thought I would throw this in there.
Thank you for sharing your idea, Senseless.  I love getting people engaged in this field of engineering.

Quote
The code would be split into 2 segments on different clocks/plls.
Forgive me if I misunderstand your design, but I believe you have replicated what the current FPGA mining designs are already doing.  For example, on the X6500 board, the jtag_comm module communicates with the mining core in the rx_hash_clk clock domain, and communicates with the outside world in the jtag clock domain.  You can see the Asynchronous FIFO that shuttles golden nonces from rx_hash_clk clock to jtag clock here.

There is certainly work that could be done there, though.  JTAG is not a good communication method for this sort of task.  On the X6500 it was simply chosen to reduce cost and complexity.

Correct, something like that. I was thinking on-die memory segments could be used, but anything that separates the hasher clock from the software communicator should be a good thing. I hadn't seen that code, as I was working on the Altera branches. They must be doing something right to achieve 200 MH/s per chip on a Spartan LX150, which in this thread (and on the hardware comparison page) topped out at 100 MH/s on other boards (unless I missed some updates somewhere). The ztex design seems to clock 1 core at 200+ MHz, versus the other designs without hasher/controller separation clocking 1 core at 100 MHz. It would be amazing to double the clock rate of my Altera chips from 220 MHz to 440 MHz with 3 cores!

Quote
Slightly related:  I would not recommend fully unrolled cores for an ASIC design.  It will certainly result in higher performance per area due to optimizations unique to the unrolled designs, but it means higher failure rates and lower clock speeds due to intra-die variations.  Fully rolled cores that can be individually enabled and clocked (or clocked in regions) should give better yield and overclocking.

What sort of pipelining would you recommend? I suppose 64 cycles per hash would give the smallest footprint and the highest-clocked design? At some point routing issues will become a concern, so I guess I'll need to optimize the pipeline unrolling per chip. Pipelining would also allow greater use of the available space (on an sASIC at least). I would love to be able to better utilize all of the logic available on my chip (I'm about 8% short on MLABs for a 4th fully unrolled core).
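The throughput side of that trade-off is simple arithmetic: a fully unrolled pipeline accepts one nonce per clock, while a core folded by a factor k needs k clocks per hash. A Python sketch; the function name and the folding factor k are my own framing, not taken from any particular design in the repository.

```python
def hashrate_mhs(cores, clock_mhz, cycles_per_hash=1):
    """Aggregate hash rate in MH/s.

    cycles_per_hash=1 models a fully unrolled pipeline that accepts a new
    nonce every clock; a core folded by a factor k (each round stage reused
    k times) needs k clocks per hash.
    """
    return cores * clock_mhz / cycles_per_hash


print(hashrate_mhs(1, 200))      # one unrolled core at 200 MHz: 200.0
print(hashrate_mhs(3, 220))      # 660.0
print(hashrate_mhs(3, 440))      # 1320.0
print(hashrate_mhs(1, 200, 64))  # 3.125
```

Folding only pays off if the smaller core lets you fit proportionally more cores or clock noticeably higher; otherwise the unrolled core wins on MH/s per chip.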
hero member
Activity: 560
Merit: 517
Quote
I had this idea for a bitcoin fpga design. I'm not an fpga designer/coder, but since you guys are talking about new designs I thought I would throw this in there.
Thank you for sharing your idea, Senseless.  I love getting people engaged in this field of engineering.

Quote
The code would be split into 2 segments on different clocks/plls.
Forgive me if I misunderstand your design, but I believe you have replicated what the current FPGA mining designs are already doing.  For example, on the X6500 board, the jtag_comm module communicates with the mining core in the rx_hash_clk clock domain, and communicates with the outside world in the jtag clock domain.  You can see the Asynchronous FIFO that shuttles golden nonces from rx_hash_clk clock to jtag clock here.

There is certainly work that could be done there, though.  JTAG is not a good communication method for this sort of task.  On the X6500 it was simply chosen to reduce cost and complexity.
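For anyone who hasn't seen one: asynchronous FIFOs commonly pass their read/write pointers across clock domains in Gray code, so a pointer sampled mid-update is off by at most one entry rather than garbage. Here is a Python model of that pointer logic (the classic Gray-pointer scheme, with the synchronizer flops not modeled). I have not checked that the X6500 jtag_comm FIFO is built exactly this way.

```python
def bin_to_gray(n):
    """Binary to reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g):
    """Gray code back to binary (XOR of all right shifts)."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def fifo_empty(rd_gray, wr_gray_synced):
    """Read side: empty when its pointer equals the synchronized write pointer."""
    return rd_gray == wr_gray_synced


def fifo_full(wr_gray, rd_gray_synced, addr_bits):
    """Write side: pointers are addr_bits + 1 wide; in Gray code 'full' means
    the two MSBs differ from the synchronized read pointer and the rest match."""
    msb_mask = 0b11 << (addr_bits - 1)
    return wr_gray == (rd_gray_synced ^ msb_mask)


# Every increment (including the wrap) flips exactly one bit, which is what
# makes a pointer sampled mid-transition safe to use.
WIDTH = 4
for i in range(1 << WIDTH):
    nxt = (i + 1) % (1 << WIDTH)
    assert bin(bin_to_gray(i) ^ bin_to_gray(nxt)).count("1") == 1
    assert gray_to_bin(bin_to_gray(i)) == i
```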
hero member
Activity: 560
Merit: 517
Quote
I would imagine getting the fabric to run at 500MHz in a Kintex-7 device is also a challenge. Running the design as-is through Vivado with a 325 speed grade -2 target does not meet timing closure at 250MHz.
In this case, the DSP48E1s are taking care of the heavy lifting: three- and two-way 32-bit additions. The rest of the fabric only needs to handle registers, routing signals, and the non-linear math. Obviously the Kintex 7 fabric is capable of handling these frequencies for modest logic, otherwise the DSPs would be unusable in the first place :P
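For reference, here is where those additions sit in a SHA-256 round, as a plain-Python model checked against hashlib. Which sums a given design maps to DSP48E1 adders, and how, is not shown here; the comment about pre-adding h + K + W describes a common trick, not necessarily this implementation.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def _primes(count):
    found, c = [], 2
    while len(found) < count:
        if all(c % p for p in found if p * p <= c):
            found.append(c)
        c += 1
    return found


def _icbrt(x):
    """Integer cube root: largest r with r**3 <= x."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


# FIPS 180-4 constants: fractional bits of cube/square roots of the first primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
H0 = [math.isqrt(p << 64) & MASK for p in _primes(8)]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def sha256_round(state, k, w):
    a, b, c, d, e, f, g, h = state
    sigma1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
    ch = (e & f) ^ (~e & g)
    # T1 is a five-operand 32-bit sum. This round's h is the previous round's g,
    # so h + k + w can be pre-added a stage earlier, leaving a three-way add here.
    t1 = (h + k + w + sigma1 + ch) & MASK
    sigma0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
    maj = (a & b) ^ (a & c) ^ (b & c)
    t2 = (sigma0 + maj) & MASK  # two-way add
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)


def compress(state, block):
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    v = tuple(state)
    for t in range(64):  # 64 rounds: one pipeline stage each when unrolled
        v = sha256_round(v, K[t], w[t])
    return [(x + y) & MASK for x, y in zip(state, v)]


def sha256(msg):
    bit_len = len(msg) * 8
    msg = bytes(msg) + b"\x80" + b"\x00" * ((55 - len(msg)) % 64) + struct.pack(">Q", bit_len)
    state = list(H0)
    for i in range(0, len(msg), 64):
        state = compress(state, msg[i:i + 64])
    return b"".join(struct.pack(">I", x) for x in state)


assert sha256(b"abc") == hashlib.sha256(b"abc").digest()
```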

But, as I said before, I already did a rudimentary implementation of this design and synthesized/routed it.  Timing reported ~400MHz on the devkit.

Quote
What you should really aim is power optimization. To my knowledge none of the popular toolchains has such a goal available.
Quartus, ISE, and Vivado all have options to target minimizing power.  I don't know how good they are at it; probably not very.

Quote
I guess working with the two unrolled copies of SHA-256 produces such a wild mess of primitives that it is possible to lose one's bearings in the jungle of signals.
Actually, unrolled cores are very straightforward designs. The issue is that FPGAs are routing-constrained, especially the Spartan 6s, and the tools aren't designed to handle these sorts of long chains. The Kintex 7 chips are much nicer with respect to routing resources and consistency. Also, the newer Vivado Design Suite does a much better job than ISE in my experience with it so far. It's a shame Vivado does not support S6.

Quote
Could easily take this design into an Avalon-style one-chip-per-core layout, but that seems like an awful waste of PCB space.
In my opinion, Avalon was smart in this regard and did it right. Using lots of chips is a very good thing for these early mining ASICs; I would not have recommended it any other way. This is because there are rather large Minimum Order Quantities when producing ASICs. If you sell 1000 units, each with 4 chips, you aren't going to reach the necessary MOQs, which are at least 50K chips. Selling 1000 units, each with 240 chips, puts you in that beautiful quantity where the fabs and factories start giving you the time of day, and the cost of everything else goes down. In the long run, yes, bigger chips are a better idea, since they require less overall supporting circuitry and PCB space.
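The MOQ arithmetic behind that, as a tiny Python sketch using the 50K-chip figure from the post (the helper names are illustrative):

```python
MOQ_CHIPS = 50_000  # "at least 50K chips", per the post


def chips_ordered(units, chips_per_unit):
    return units * chips_per_unit


def units_to_reach_moq(chips_per_unit, moq=MOQ_CHIPS):
    # Integer ceiling division: smallest unit count whose chips cover the MOQ.
    return -(-moq // chips_per_unit)


for per_unit in (4, 240):
    total = chips_ordered(1000, per_unit)
    status = "meets" if total >= MOQ_CHIPS else "below"
    print(f"{per_unit} chips/unit: 1000 units -> {total} chips ({status} MOQ); "
          f"units needed: {units_to_reach_moq(per_unit)}")
```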

Slightly related:  I would not recommend fully unrolled cores for an ASIC design.  It will certainly result in higher performance per area due to optimizations unique to the unrolled designs, but it means higher failure rates and lower clock speeds due to intra-die variations.  Fully rolled cores that can be individually enabled and clocked (or clocked in regions) should give better yield and overclocking.
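A rough illustration of the yield half of that argument, using the textbook Poisson defect model (my choice, not from the post): if any defect kills the core it lands in, splitting the same hashing area into n individually disable-able cores keeps far more of it working. The defect density and area below are made-up units, and the clock-speed variation the post also mentions is not modeled.

```python
import math


def zero_defect_probability(defect_density, area):
    """Poisson yield: chance that a region of `area` contains no defects."""
    return math.exp(-defect_density * area)


def working_fraction(defect_density, total_core_area, n_cores):
    """Expected fraction of hashing area still usable when that area is split
    into n equal cores and any defective core can simply be disabled."""
    return zero_defect_probability(defect_density, total_core_area / n_cores)


# Made-up units: on average one defect lands in the hashing area per die.
D, A = 1.0, 1.0
for n in (1, 4, 16, 64):
    print(f"{n:>2} cores: {working_fraction(D, A, n):.3f} of hashing area usable")
```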
tbd
newbie
Activity: 45
Merit: 0
Maybe some sort of non-profit co-op to collect funds to get the initial design conversion, mask printing and chips made? It could then just sell chips on an as-needed basis, close to cost.

I like this idea.