
Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013) - page 13. (Read 432965 times)

member
Activity: 107
Merit: 13
The chip in question is the xq7a200. Try the dual core design. Also, you should be able to fit 1 core into DSP slices.

Is that the device on the Artix eval board?

Sorry not XQ, but XC.

XC7A200T

http://www.xilinx.com/products/boards-and-kits/EK-A7-AC701-G.htm



-3 speed grade?
sr. member
Activity: 262
Merit: 250
Quote
Would you mind posting what you have in the git tree?
Sure.  I want to finish the UART comm and then I'll make a push.  I'm quite interested to see how the Artix chips work out.

Did you push it (or intend to push it) into the current tree at git://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner.git ?
sr. member
Activity: 262
Merit: 250
Sorry not XQ, but XC.

XC7A200T

Do you know how much the FPGA itself cost?

sr. member
Activity: 262
Merit: 250
Quote
Is that a single hashing core?
Yes, a single, fully pipelined DSP48E1 core.

Impressive!
hero member
Activity: 560
Merit: 517
Quote
Is that a single hashing core?
Yes, a single, fully pipelined DSP48E1 core.
hero member
Activity: 1118
Merit: 541
The chip in question is the xq7a200. Try the dual core design. Also, you should be able to fit 1 core into DSP slices.

Is that the device on the Artix eval board?

Sorry not XQ, but XC.

XC7A200T

http://www.xilinx.com/products/boards-and-kits/EK-A7-AC701-G.htm

sr. member
Activity: 262
Merit: 250
The chip in question is the xq7a200. Try the dual core design. Also, you should be able to fit 1 core into DSP slices.

Is that the device on the Artix eval board?
full member
Activity: 126
Merit: 100
Would this miner work on the Xilinx Virtex-7 dev/eval kit?
sr. member
Activity: 262
Merit: 250
Quote
Would you mind posting what you have in the git tree?
Compile and test for 400MH/s just finished.  KC705 officially beats the X6500.  Quad boards, you're next.

Is that a single hashing core?
hero member
Activity: 1118
Merit: 541
I'm quite interested to see how the Artix chips work out.

I ran the files in the rtl directory (I don't know which of the project directories contains the best-performing hashing core) through Vivado and got 59790 slice LUTs (94%) and a -1.558ns setup violation on a 5ns (200MHz) clock, which works out to roughly 150MHz, in an xq7a100tfg484-2I device. I have no clue how much this chip costs, though...

The chip in question is the xq7a200. Try the dual core design. Also, you should be able to fit 1 core into DSP slices.

sr. member
Activity: 262
Merit: 250
I'm quite interested to see how the Artix chips work out.

I ran the files in the rtl directory (I don't know which of the project directories contains the best-performing hashing core) through Vivado and got 59790 slice LUTs (94%) and a -1.558ns setup violation on a 5ns (200MHz) clock, which works out to roughly 150MHz, in an xq7a100tfg484-2I device. I have no clue how much this chip costs, though...
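A quick way to turn a reported setup violation into an achievable clock: with a negative worst slack, the critical path is roughly the constrained period plus the violation, so 5ns + 1.558ns ≈ 6.56ns, or about 152MHz. This is a first-order sketch only; it assumes the critical path would stay the same at a relaxed constraint, which place and route does not guarantee, and the helper name is hypothetical.

```python
def achievable_fmax_mhz(target_period_ns: float, worst_slack_ns: float) -> float:
    """First-order fmax estimate from a timing report.

    A negative setup slack means the critical path is longer than the
    constrained period by |slack|, so path delay = period - slack.
    """
    critical_path_ns = target_period_ns - worst_slack_ns
    return 1000.0 / critical_path_ns


# Figures from the post: 5 ns constraint, -1.558 ns worst setup slack.
requested_mhz = 1000.0 / 5.0
fmax = achievable_fmax_mhz(5.0, -1.558)
print(f"requested {requested_mhz:.0f} MHz, achievable ~{fmax:.1f} MHz")
```

With a fully pipelined core producing one hash per clock, that clock estimate would also be the rough MH/s figure.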
hero member
Activity: 560
Merit: 517
Quote
Would you mind posting what you have in the git tree?
Sure.  I want to finish the UART comm and then I'll make a push.  I'm quite interested to see how the Artix chips work out.

Compile and test for 400MH/s just finished.  KC705 officially beats the X6500.  Quad boards, you're next.
hero member
Activity: 1118
Merit: 541
I downloaded the latest Vivado IDE, and finally hammered out the code for my DSP48E1 miner.  It is now working happily on my KC705 devkit, which has a Kintex 7 on it.  I haven't pushed the clock rate up yet, so for now it's only running at 300MH/s.  Should be able to get between 400 and 450MH/s out of a fully pipelined DSP48E1 hashing core, depending on how close to the DSP48E1's max spec I can get on this speed grade (-2).  No accurate power measurements yet.  Back of the napkin says 11W, but that seems a bit high; probably a lot of static power usage.

The design is currently using 80% of the DSP48E1s on that chip, and about 25% of other resources.  My goal is to get at least 1GH/s out of this chip, ideally 2GH/s.  Regardless, even 400MH/s will beat the ole X6500s, which needed two chips to get 400MH/s.


On a slightly related note, I released my FPGA-based vanitygen code today: https://bitcointalksearch.org/topic/open-source-vanitygen-for-fpgas-152444.

EDIT: By the way, I'm pretty happy with the KC705 so far. Lots of great bells and whistles to play with, and most importantly ... they included long USB cables. I can't tell you how many times I get developer-grade equipment with dinky pig-tail USB cables.  Beyond that, the kit comes with an on-board USB-UART bridge, on-board USB-JTAG, and a heatsink-fan combo for the Kintex 7 which I will be sure to cook breakfast on.

Would you mind posting what you have in the git tree? I'm going to be getting a KC705 and an AC701. The A7 200K has nearly as many DSPs as the K7 325K. I'll be sure to let you know of any optimizations I find. I'm starting to think that the A7 200K will be the most cost-effective of the latest-gen Xilinx chips.
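A rough check of whether one DSP48E1 core, as built for the KC705, would fit on the A7 200T. The DSP counts below (840 for the XC7K325T, 740 for the XC7A200T) are taken from Xilinx's 7-series product tables and are worth re-checking against the current datasheet; the 80% figure is the approximate utilization reported for the KC705 build, and the helper is hypothetical.

```python
# DSP48E1 slice counts per device (from Xilinx 7-series product tables).
DSP48E1_COUNT = {
    "XC7K325T": 840,  # Kintex-7 on the KC705
    "XC7A200T": 740,  # Artix-7 on the AC701
}


def core_dsp_usage(device: str, utilization: float) -> int:
    """Approximate DSP48E1 slices used by one core, from a utilization fraction."""
    return round(DSP48E1_COUNT[device] * utilization)


core_dsps = core_dsp_usage("XC7K325T", 0.80)   # ~80% reported on the KC705
spare = DSP48E1_COUNT["XC7A200T"] - core_dsps  # DSPs left over on the A7 200T
ratio = DSP48E1_COUNT["XC7A200T"] / DSP48E1_COUNT["XC7K325T"]
print(f"core needs ~{core_dsps} DSPs; A7 200T would have {spare} spare; DSP ratio {ratio:.2f}")
```

On these numbers one core would fit with a small margin, but not two, so the remaining throughput would have to come from LUT-based cores.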

hero member
Activity: 560
Merit: 517
I downloaded the latest Vivado IDE, and finally hammered out the code for my DSP48E1 miner.  It is now working happily on my KC705 devkit, which has a Kintex 7 on it.  I haven't pushed the clock rate up yet, so for now it's only running at 300MH/s.  Should be able to get between 400 and 450MH/s out of a fully pipelined DSP48E1 hashing core, depending on how close to the DSP48E1's max spec I can get on this speed grade (-2).  No accurate power measurements yet.  Back of the napkin says 11W, but that seems a bit high; probably a lot of static power usage.

The design is currently using 80% of the DSP48E1s on that chip, and about 25% of other resources.  My goal is to get at least 1GH/s out of this chip, ideally 2GH/s.  Regardless, even 400MH/s will beat the ole X6500s, which needed two chips to get 400MH/s.


On a slightly related note, I released my FPGA-based vanitygen code today: https://bitcointalksearch.org/topic/open-source-vanitygen-for-fpgas-152444.

EDIT: By the way, I'm pretty happy with the KC705 so far. Lots of great bells and whistles to play with, and most importantly ... they included long USB cables. I can't tell you how many times I get developer-grade equipment with dinky pig-tail USB cables.  Beyond that, the kit comes with an on-board USB-UART bridge, on-board USB-JTAG, and a heatsink-fan combo for the Kintex 7 which I will be sure to cook breakfast on.
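For a fully pipelined core that retires one double-SHA-256 nonce per clock (an assumption that matches how these unrolled cores are usually described), MH/s equals the clock in MHz times the number of cores. The sketch below applies that to the figures in the post: 300MHz today, the ~11W napkin estimate, and the 1GH/s and 2GH/s goals at 400MHz. The function names are hypothetical.

```python
import math


def hashrate_mhs(clock_mhz: float, cores: int = 1, hashes_per_clock: float = 1.0) -> float:
    """Hashrate of pipelined cores, each retiring hashes_per_clock nonces per cycle."""
    return clock_mhz * cores * hashes_per_clock


def cores_needed(target_mhs: float, clock_mhz: float) -> int:
    """Fully pipelined cores (1 hash/clock each) required to reach target_mhs."""
    return math.ceil(target_mhs / clock_mhz)


def efficiency_mh_per_joule(mhs: float, watts: float) -> float:
    """MH/s per watt, which is the same thing as MH per joule."""
    return mhs / watts


print(hashrate_mhs(300))                                 # 300.0
print(round(efficiency_mh_per_joule(300, 11), 1))        # 27.3
print(cores_needed(1000, 400), cores_needed(2000, 400))  # 3 5
```

So at 400MHz, 1GH/s would take the equivalent of three cores and 2GH/s five, which is why the remaining 20% of DSPs and 75% of other resources matter.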
hero member
Activity: 1118
Merit: 541
does not give you the same routing and performance as
CLKFX_DIVIDE => 5,
CLKFX_MULTIPLY =>6,

I noticed when working in Quartus II that it always reduces my clock settings to lowest terms. Right now, just for simplicity's sake, I'm using a multiplier of 1000 and a divisor of 217, which allows me to step up in small increments. Whenever I compile, I notice a little line saying it's adjusting the settings to some other figures (which give a nearly identical clock; sometimes it does change by 0.1MHz or so). I'm guessing ISE doesn't have that kind of feature?
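Reducing a multiply/divide pair to lowest terms never changes the output frequency, so the small shifts Quartus reports presumably come from the PLL's counter or VCO limits forcing a nearby ratio instead. The sketch below shows both effects. The 50MHz reference clock and the divider limit of 32 are hypothetical values chosen for illustration, not the actual Altera PLL constraints.

```python
from fractions import Fraction
from math import gcd


def reduce_ratio(multiplier: int, divisor: int) -> tuple[int, int]:
    """Reduce a PLL multiply/divide pair to lowest terms; the output frequency is unchanged."""
    g = gcd(multiplier, divisor)
    return multiplier // g, divisor // g


def nearest_ratio(multiplier: int, divisor: int, max_divisor: int) -> Fraction:
    """Closest ratio whose divisor fits a (hypothetical) counter limit."""
    return Fraction(multiplier, divisor).limit_denominator(max_divisor)


F_IN_MHZ = 50.0  # hypothetical reference clock; the post does not state it

print(reduce_ratio(12, 10))     # (6, 5)
print(reduce_ratio(1000, 217))  # (1000, 217): gcd is 1, already in lowest terms
approx = nearest_ratio(1000, 217, 32)
exact_mhz = F_IN_MHZ * 1000 / 217
approx_mhz = F_IN_MHZ * float(approx)
print(f"{approx}: {approx_mhz:.3f} MHz vs {exact_mhz:.3f} MHz requested")
```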

sr. member
Activity: 399
Merit: 250
Sorry but it can and does.
There is no correct measurement of delay UNTIL the device is through the place and route stage. Sometimes "map" can come close, but I've had Bitcoin designs that showed timing closure at over 300MHz, only to hit stupidly low figures after final P&R,

i.e. 87MHz... or 87MH/s,

which is just embarrassing for a V5 or V6. (HOT TIP coming up...)

The absolutely F***** stupid thing is that changing a single DCM_BASE or DCM_ADV parameter can trash your results...
The unit is supposed to be a totally self-contained clock multiplier (which it is...), but what they don't tell you is that when you configure it, the configuration pins are either grounded or tied high, and depending on the combination you choose, it can screw up the routing resources to such an extent that you lose massive throughput.

CLKFX_DIVIDE => 10,
CLKFX_MULTIPLY =>12,

does not give you the same routing and performance as
CLKFX_DIVIDE => 5,
CLKFX_MULTIPLY =>6,

Even though the final CLKFX frequency that feeds your logic is the same!!!!!
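Since CLKFX = CLKIN × CLKFX_MULTIPLY / CLKFX_DIVIDE, every pair with the same reduced ratio yields the same frequency, yet per the observation above each one may route differently. A minimal sketch for listing the candidates worth building and comparing, assuming multiply values of 2 to 32 and divide values of 1 to 32 (check the DCM user guide for your device family):

```python
from fractions import Fraction


def equivalent_dcm_settings(multiply: int, divide: int,
                            mult_range=range(2, 33), div_range=range(1, 33)):
    """All (CLKFX_MULTIPLY, CLKFX_DIVIDE) pairs giving the same CLKFX/CLKIN ratio.

    The default ranges are assumptions; adjust them to your DCM's documented limits.
    """
    target = Fraction(multiply, divide)
    return [(m, d) for d in div_range for m in mult_range
            if Fraction(m, d) == target]


print(equivalent_dcm_settings(12, 10))
# [(6, 5), (12, 10), (18, 15), (24, 20), (30, 25)]
```

Each pair in the list could then be run through a full P&R to see which one actually gives the best timing.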

The way I tackled it was to find the DCM configuration that gave the BEST routing, then externally CHANGE the crystal to match the internal clock rate I wanted, which defeats the purpose of the DCM...
The other solution would be to use the DCM_ADV and dynamically reconfigure it while running.
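The crystal-swap workaround is the same DCM equation solved for the input: CLKIN = target × CLKFX_DIVIDE / CLKFX_MULTIPLY. A small sketch; the 240MHz target and the 6/5 setting are hypothetical examples, not figures from the post.

```python
def required_clkin_mhz(target_clkfx_mhz: float, multiply: int, divide: int) -> float:
    """Input (crystal) frequency so that CLKFX = CLKIN * multiply / divide hits the target."""
    return target_clkfx_mhz * divide / multiply


def clkfx_mhz(clkin_mhz: float, multiply: int, divide: int) -> float:
    """Forward check: DCM synthesized output for a given input clock."""
    return clkin_mhz * multiply / divide


# Hypothetical: the 6/5 setting routed best and 240 MHz is the wanted core clock.
crystal = required_clkin_mhz(240.0, 6, 5)
print(crystal)                   # 200.0
print(clkfx_mhz(crystal, 6, 5))  # 240.0
```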



member
Activity: 107
Merit: 13
Problem is that XST ALWAYS reports shit hot timings for the simulation, but once the design is mapped into the actual device, then the timings go to pot because of the way the interconnects work (some of the XST tools just look at the 'pure logic' chains for timing).

Yes, but the ratio between the XST-reported maximum clock and the PAR-reported maximum clock will not change a lot.
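The scaling argument can be made concrete with figures quoted in this thread: the ZTEX design reports about 230MHz in XST and is generally run at about 190MHz after PAR. Assuming that XST-to-PAR ratio carries over to a modified design (the very point disputed elsewhere in the thread), a 316.312MHz XST estimate would scale to roughly 261MHz. The helper below is a hypothetical sketch of that arithmetic, not a timing tool.

```python
def estimate_par_fmax(xst_mhz: float, ref_xst_mhz: float, ref_par_mhz: float) -> float:
    """Scale an XST estimate by a reference design's post-PAR/XST ratio.

    Only meaningful if the ratio really carries over between the two designs.
    """
    return xst_mhz * (ref_par_mhz / ref_xst_mhz)


# Reference figures quoted in the thread for the ZTEX design.
ZTEX_XST_MHZ = 230.0
ZTEX_PAR_MHZ = 190.0

est = estimate_par_fmax(316.312, ZTEX_XST_MHZ, ZTEX_PAR_MHZ)
print(f"estimated post-PAR fmax: {est:.1f} MHz")  # 261.3
```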
sr. member
Activity: 399
Merit: 250
Correct, something like that. I was thinking on-die memory segments could be used. But anything that would separate the hasher clock from the software communicator should be a good thing. I hadn't seen that code as I was working on the Altera branches. They must be doing something right to achieve 200MH/s per chip on a Spartan LX150, which in this thread (and on the hardware comparison page) topped out at 100MH/s on other boards (unless I missed some updates somewhere). The ZTEX design seems to be clocking 1 core at 200+MHz, versus the other designs without hasher/controller separation clocking at 100MHz with 1 core. It would be amazing to double the clock rate of my Altera chips from 220 to 440 w/ 3 cores!

Separating the clocks will not help you (I've tried it on an xc6slx150). The frequency is limited by carry chains, not by clock network delays.
As far as I know, the ZTEX design generally allows 190MHz (probably calculated by Xilinx at 85°C), but voltage/temperature derating allows the frequency to be increased.

I've tried to compile ZTEX's source, and XST reported a 230MHz maximum clock frequency. I made some modifications and XST now reports 316.312MHz, so I hope it will reach 190MHz after PAR.

Problem is that XST ALWAYS reports shit hot timings for the simulation, but once the design is mapped into the actual device, then the timings go to pot because of the way the interconnects work (some of the XST tools just look at the 'pure logic' chains for timing).

Also, as regards splitting the clocks: it is a bad idea and there is no need for it (in this design), because once you have more than one clock you have to deal with crossing clock domains, and then you have to deal with shitty situations of clock lag and jitter over multiple clock sources.
As regards heat: the hotter it runs, the shorter it lives. Xilinx starts to shut down at just over 85°C die temperature (it's designed into the die).

I've taken the XUPV5 to over 350MH/s, but it has required a very special power supply design and a special PCB (which smells like cooked hairy crab when it's running full pelt), plus six 17CFM fans... (those are special 'maglev' designs, not like the shitty stuff with the oil and the shitty washer holding the spindle in the housing)

BUT it's a bitch: yesterday it worked fine, but today there was heavy rain and it's getting bad shares. Dropping it back 20MH/s fixed it until the rain became heavier...

It is purely a research project, as both Tom & BFL have screwed me on my ASIC deliveries.

Finally, as regards the 'main delay' being in communication... actually that is unlikely; rather, it is in some FPGA designs that don't allow the current block to be interrupted when the block changes.
I suspect this because of the FIFOs and the increased USART rate (230400 baud) I designed in to deal with other 'idle time'.
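To put the 230400-baud figure in perspective, the wire time for one unit of work can be estimated from payload size and framing. This sketch assumes 8N1 framing (10 bits on the wire per byte) and a hypothetical 44-byte work unit (a 32-byte midstate plus the last 12 bytes of the block header); the design's actual protocol may differ. It also converts idle time into hashes not computed at 400MH/s.

```python
def uart_transfer_ms(payload_bytes: int, baud: int, bits_per_byte: int = 10) -> float:
    """Wire time in ms; 8N1 framing sends 1 start + 8 data + 1 stop bit per byte."""
    return payload_bytes * bits_per_byte / baud * 1000.0


def hashes_lost(hashrate_mhs: float, idle_ms: float) -> float:
    """Nonces not tried while the hashing core sits idle."""
    return hashrate_mhs * 1e6 * idle_ms / 1000.0


WORK_BYTES = 32 + 12  # hypothetical: 256-bit midstate + last 12 header bytes

for baud in (115200, 230400):
    print(f"{baud} baud: {uart_transfer_ms(WORK_BYTES, baud):.2f} ms per work unit")
print(f"10 ms idle at 400 MH/s = {hashes_lost(400, 10):,.0f} hashes")
```

Doubling the baud rate halves the transfer (about 3.82ms down to 1.91ms here), but that only helps if the core keeps hashing the old block, or is interrupted cleanly, while new work arrives.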

Plus there are a number of shortcuts (nope, not the well-documented SHA256 ones); even saving a few tens of ms per round all adds up...

member
Activity: 107
Merit: 13
Sorry, I thought you were talking about scrolling down in the list of files. I got the file from the download section. Now I see what section you were talking about...

No problem.