Topic: Algorithmically placed FPGA miner: 255MH/s/chip, supports all known boards - page 27. (Read 119429 times)

donator
Activity: 367
Merit: 250
ZTEX FPGA Boards
Inspector 2211 already mentioned it and it is also hidden in the datasheet ("Simultaneous switching" issue):

Err, isn't "Simultaneous switching" issue about I/O pins?  not internal core logic.

Yes, it is -- at least in the Xilinx datasheets (SSO = Simultaneous Switching Outputs).

Could you clarify your comment, ztex?  Also, do you have a link to Inspector2211's comment?

The internal GND traces of the S6 seem to be a little bit weak.

I suspect so as well (or that the VCCINT traces are weak).  However, any details from Xilinx on this would be useful -- at least an acknowledgement that XPA isn't fully aware of the device's limitations.

According to the Xilinx docs, SSOs *do* influence internal logic / other components (especially the MCB).  Did you ever ask why?

One possible explanation would be too large an internal GND resistance. If there are large currents (e.g. from I/Os), the voltage at internal GND rises too much and the voltage between VCCINT and GND falls too much ...
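
As a rough illustration of that explanation, here is the Ohm's-law arithmetic. The symbol names and every number below are placeholders chosen for the example, not Xilinx figures; the real internal resistance and current are not published in this thread.

```latex
% Voltage seen by the core logic = supply minus the drop across the internal GND path
V_{\text{core}} = V_{\text{CCINT}} - I \cdot R_{\text{GND}}

% Illustrative values only (assumed): V_CCINT = 1.2 V, I = 5 A, R_GND = 10 m\Omega
V_{\text{core}} = 1.2\,\text{V} - 5\,\text{A} \times 0.010\,\Omega = 1.15\,\text{V}
```

With those assumed values the logic loses 50 mV of headroom, and the loss scales linearly with current, which is consistent with the problem only showing up when most of the fabric toggles at once.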

donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Inspector 2211 already mentioned it and it is also hidden in the datasheet ("Simultaneous switching" issue):

Err, isn't "Simultaneous switching" issue about I/O pins?  not internal core logic.

Yes, it is -- at least in the Xilinx datasheets (SSO = Simultaneous Switching Outputs).

Could you clarify your comment, ztex?  Also, do you have a link to Inspector2211's comment?



The internal GND traces of the S6 seem to be a little bit weak.

I suspect so as well (or that the VCCINT traces are weak).  However, any details from Xilinx on this would be useful -- at least an acknowledgement that XPA isn't fully aware of the device's limitations.
hero member
Activity: 592
Merit: 501
We will stand and fight.
What if you shift the clock of the middle ring?
Maybe the voltage internally in the middle of the chip drops too much on each clock edge.

The trade-off is that the added GCLKs will consume more power,
but I think that's worth trying. :)
member
Activity: 89
Merit: 10

Err, isn't "Simultaneous switching" issue about I/O pins?  not internal core logic.
donator
Activity: 367
Merit: 250
ZTEX FPGA Boards
Bitfury experienced a similar thing.

Yeah, I know… once I have fewer things on my to-do list I think me and him and anybody else interested ought to heckle forums.xilinx.com until they own up to this issue.  I have been seeing the very same "center of the fabric drops out first" phenomenon, but until I read about his experiences I had it chalked up to my crappy homemade boards.  Now that I'm seeing it on ztex's boards too I am kinda disappointed with X.

Inspector 2211 already mentioned it and it is also hidden in the datasheet ("Simultaneous switching" issue): The internal GND traces of the S6 seem to be a little bit weak.


hero member
Activity: 1596
Merit: 502
What if you shift the clock of the middle ring?
Maybe the voltage internally in the middle of the chip drops too much on each clock edge.
legendary
Activity: 1162
Merit: 1000
DiabloMiner author
Ok, so, it appears that I can get the top and bottom rings running at the rated speed (I'm still using 150 MHz builds because they finish fast).  But the middle ring only runs at 60% of expected speed unless the top+bottom rings are switched off (or running super slow).

If it runs stable overnight I will launch a high-frequency build and post those bitstreams when they finish.  It won't be the predicted hashrate, but it should still be an improvement over what people have right now.  And no commissions until I figure out wtf is really going on here.

Sounds like you need prime numbers.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
quit and never go back - simplicity at its best :)
I like it.

;)

Or, at least, manual intervention required to go back.

I suppose a better idea would be a 3-line script that emails the operator to let him/her know that it has "downshifted".
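
A minimal sketch of that idea, building on the three-line script quoted in this thread. The `notify_operator` hook, the environment variable names, and the use of a local `mail` command are assumptions for illustration; nothing here is part of the actual miner.

```shell
#!/bin/bash
# Hypothetical sketch: run the primary miner; when it quits, tell the
# operator and start the old miner. All names below are placeholders.

PRIMARY_CMD="${PRIMARY_CMD:-run-tml-miner}"     # name taken from the quoted script
FALLBACK_CMD="${FALLBACK_CMD:-run-old-miner}"   # name taken from the quoted script
NOTIFY_CMD="${NOTIFY_CMD:-notify_operator}"     # hypothetical hook, defined below

notify_operator() {
    # Assumption: a working `mail` (mailx) command and OPERATOR_EMAIL set.
    echo "$1" | mail -s "miner downshifted" "${OPERATOR_EMAIL:-root}"
}

run_with_failover() {
    local status
    $PRIMARY_CMD          # blocks until the primary miner exits
    status=$?
    $NOTIFY_CMD "primary miner exited with status $status; starting fallback"
    $FALLBACK_CMD         # no automatic return to the primary miner
}

# Only start mining when explicitly asked, so the file can also be sourced.
if [ "${RUN_MINER:-0}" = "1" ]; then
    run_with_failover
fi
```

Something like `RUN_MINER=1 OPERATOR_EMAIL=you@example.com ./failover.sh` would start it (the file name is made up). Going back to the primary miner still needs manual intervention, as suggested above.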
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Bitfury experienced a similar thing.

Yeah, I know… once I have fewer things on my to-do list I think me and him and anybody else interested ought to heckle forums.xilinx.com until they own up to this issue.  I have been seeing the very same "center of the fabric drops out first" phenomenon, but until I read about his experiences I had it chalked up to my crappy homemade boards.  Now that I'm seeing it on ztex's boards too I am kinda disappointed with X.

Xilinx never designed their FPGAs in such a way that 95% of all flip-flops could switch at the same time.
They just didn't.

Maybe, but they steadfastly refuse to post maximum current ratings for their devices, and say over and over "run our power analysis tools, and if the tool says it's ok, it's ok".

Well, all my designs pass their power analyses.  Yet the voltage near the center of the chip is clearly sagging.

Basically, the power analysis tools are effectively "part of the datasheet" and Xilinx has a serious datasheet error here.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Ok, so, it appears that I can get the top and bottom rings running at the rated speed (I'm still using 150 MHz builds because they finish fast).  But the middle ring only runs at 60% of expected speed unless the top+bottom rings are switched off (or running super slow).

If it runs stable overnight I will launch a high-frequency build and post those bitstreams when they finish.  It won't be the predicted hashrate, but it should still be an improvement over what people have right now.  And no commissions until I figure out wtf is really going on here.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Hook up an oscilloscope to VCCINT, as close as possible to the FPGA.

Y'know, I was never any good with an oscilloscope.  One of these days….

-Make sure new midstate load etc. doesn't result in spikes.

Check.  I deliberately don't stop the rings when loading nonces for this very reason; I just let garbage fly out the back end due to half-loaded work.  The noise caused by that huge change in power consumption is not worth it.

-Stagger the rings start time/midstate load/nonce wrap

-Use phase offset to interleave clock transitions for the different rings

Well, they're on different clocks.  However, I will build one where they all use the same clock so I can try this -- good ideas.


-Ramp the clocks up gradually from idle

Yes, already doing this.

It could also be the PLL suffering from too much noise. Try changing the loopfilter/bandwidth of the PLL.

Since it's only a jitter filter I have it on the lowest bandwidth setting.

I'm also going to try dropping it altogether after finding a comment by Austin saying that Xilinx's PLLs are very sensitive to activity in nearby logic.

Might be hard but try an external high-speed clock source (connection/termination to the board is critical)

Unfortunately I don't have boards that can do that (SMA connectors, right?)
hero member
Activity: 686
Merit: 564
On ZTEX boards, the FPGA's JTAG signals are not even connected to the Cypress FX2 microcontroller.
That's unfortunate. I'm guessing he's not broken out the appropriate pins to allow the two to be connected either.
donator
Activity: 543
Merit: 500
I don't know if I missed a statement for this question completely or it has not yet been answered...

What about eldentyrell's ("tricone mining") bitstream? Are you going to support it? Will it be the first bitstream working* with CM? Or are you focusing on your own bitstream?

(* working means more than 700 MH/s)

You really should be asking eldentyrell this question.  Given his plans for a commission structure, it makes no sense for anyone other than himself to work on implementations.
So, eldentyrell, what do you say? ;) (CM = Cairnsmore board by Enterpoint)
member
Activity: 89
Merit: 10

You have probably thought of this stuff already but here goes:

Hook up an oscilloscope to VCCINT, as close as possible to the FPGA.
Make it look smooth on the scope at all times by:

-Make sure new midstate load etc. doesn't result in spikes.

-Stagger the rings start time/midstate load/nonce wrap

-Use phase offset to interleave clock transitions for the different rings

-Ramp the clocks up gradually from idle


It could also be the PLL suffering from too much noise. Try changing the loopfilter/bandwidth of the PLL.
Might be hard but try an external high-speed clock source (connection/termination to the board is critical)
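
The "ramp gradually" and "stagger the rings" suggestions could be sketched on the host side roughly like this. `SET_CLOCK_CMD` is a stand-in for whatever actually programs a ring's clock; no real miner command-line interface is implied, and the ring names, step size, and delay are assumptions.

```shell
#!/bin/bash
# Hypothetical sketch: step one ring's clock from idle up to its target
# instead of jumping there in one go.

SET_CLOCK_CMD="${SET_CLOCK_CMD:-echo set-clock}"  # placeholder; prints instead of programming hardware
RAMP_DELAY="${RAMP_DELAY:-1}"                     # seconds to wait between steps (assumed value)

ramp_clock() {
    # usage: ramp_clock RING START_MHZ TARGET_MHZ STEP_MHZ
    local ring=$1 mhz=$2 target=$3 step=$4
    while [ "$mhz" -lt "$target" ]; do
        $SET_CLOCK_CMD "$ring" "$mhz"
        sleep "$RAMP_DELAY"
        mhz=$((mhz + step))
    done
    # Finish exactly on the target even if the step does not divide evenly.
    $SET_CLOCK_CMD "$ring" "$target"
}
```

Calling `ramp_clock top 50 170 50`, then `ramp_clock middle 50 170 50`, then `ramp_clock bottom 50 170 50` one after another also staggers the rings' start times, since each ramp finishes before the next begins.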


sr. member
Activity: 448
Merit: 250
Hrm.

So, I have a bitstream that will run error-free on the ztex board at 170 MHz as long as I only use one of the three rings.  I can also run any one ring at 170 MHz and the other two really slow (like 50 MHz slow).  But if I use all three rings at full speed, I get errors all the way down to some pretty embarrassingly poor hash rates.  I experienced a similar phenomenon on my own boards, but it wasn't nearly this severe and the optimal clock frequencies were still giving me 245+MH/s on my SG-2 boards (ztex uses faster SG-3 chips).

I'll be doing some more experiments on the clock-rate/error relationship this evening, but the important questions require a new build in order to answer, and that's going to take 24-48 hours (sorry, folks).  Still lots of tricks up my sleeve, but they take (build) time.

Bitfury experienced a similar thing.
It's probably ground bounce INTERNALLY to the FPGA.
Or something like that.
Xilinx never designed their FPGAs in such a way that 95% of all flip-flops could switch at the same time.
They just didn't.
But that's what a miner does.
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
As of today, such a bitstream change would have to be manually handled.

No.  There is an option to quit the miner if it is unable to contact the signcryption server.  So you launch it from a three-line shell script:


#!/bin/bash
run-tml-miner
run-old-miner


problem solved.

When you submit free patches to all the major mining software packages to support automatic failover to backup bitstreams I will agree with you.

I hereby open-source the above three-line shell script.
quit and never go back - simplicity at its best :)
I like it.
rjk
sr. member
Activity: 448
Merit: 250
1ngldh
What are the specs on the box you are building on? Always good to know for comparison.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Hrm.

So, I have a bitstream that will run error-free on the ztex board at 170 MHz as long as I only use one of the three rings.  I can also run any one ring at 170 MHz and the other two really slow (like 50 MHz slow).  But if I use all three rings at full speed, I get errors all the way down to some pretty embarrassingly poor hash rates.  I experienced a similar phenomenon on my own boards, but it wasn't nearly this severe and the optimal clock frequencies were still giving me 245+MH/s on my SG-2 boards (ztex uses faster SG-3 chips).

I'll be doing some more experiments on the clock-rate/error relationship this evening, but the important questions require a new build in order to answer, and that's going to take 24-48 hours (sorry, folks).  Still lots of tricks up my sleeve, but they take (build) time.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
That quote's from a thread about a Virtex-5 device. As I recall those have metal heatspreaders. So a different chip, different packaging

No.

The whole reason for using junction temperature is that it's package-independent.

The package determines the relationship between the air/board/case temperature and the junction temperature.  Junction temperature is all that matters, but you can't measure it directly, so you compute it using thermal constants from the package.
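
For reference, the usual way that computation is written is the standard junction-to-ambient relation. The numbers in the worked line are placeholders picked for the example, not values from any Xilinx datasheet or from the boards discussed here.

```latex
% T_J: junction temperature, T_A: ambient temperature,
% \theta_{JA}: junction-to-ambient thermal resistance of the package (with its cooling),
% P: power dissipated by the device
T_J = T_A + \theta_{JA} \cdot P

% Illustrative values only (assumed): T_A = 40 °C, \theta_{JA} = 10 °C/W, P = 4 W
T_J = 40\,^{\circ}\text{C} + 10\,\tfrac{^{\circ}\text{C}}{\text{W}} \times 4\,\text{W} = 80\,^{\circ}\text{C}
```

The package only enters through the thermal resistance term; the limit you compare the result against is the junction temperature rating itself.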

, and built on a different process (65nm rather than 45nm).

Good point, but if the 45nm process really was more easily damaged by temperature you'd see that reflected in lower maximum junction temperatures in the datasheet.  The fact that Xilinx didn't change them means it's unlikely there has been a major change in temperature tolerance.  And we're only talking about one generation difference in process here -- it's not like 180nm vs 22nm or anything like that.
donator
Activity: 980
Merit: 1004
felonious vagrancy, personified
Well, I am sitting here staring at PAR grinding slowly along.  I don't know if I'll be able to stay awake until it finishes.

Assuming nothing goes wrong (big if), preview bitstreams in the morning.