Algorithmically placed FPGA miner: 255MH/s/chip, supports all known boards - page 27.

ztex

donator

Activity: 367

Merit: 250

ZTEX FPGA Boards

Quote from: eldentyrell on June 18, 2012, 08:31:09 PM

Quote from: ztex on June 18, 2012, 08:54:14 AM

Inspector 2211 already mentioned it and it is also hidden in the datasheet ("Simultaneous switching" issue):

Quote from: pusle on June 18, 2012, 09:57:03 AM

Err, isn't the "Simultaneous switching" issue about I/O pins, not internal core logic?

Yes, it is -- at least in the Xilinx datasheets (SSO = Simultaneous Switching Outputs).

Could you clarify your comment, ztex? Also, do you have a link to Inspector2211's comment?

Quote from: ztex on June 18, 2012, 08:54:14 AM

The internal GND traces of the S6 seem to be a little bit weak.

I suspect so as well (or that the VCCINT traces are weak). However, any details from Xilinx on this would be useful -- at least an acknowledgement that XPA isn't fully aware of the device's limitations.

According to the Xilinx docs, SSO *does* influence internal logic and other components (especially the MCB). Did you ever ask why?

One possible explanation would be an internal GND resistance that is too large. If there are large currents (e.g. from I/Os), the voltage at the internal GND rises too much and the voltage between VCCINT and GND falls too much ...
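To put rough numbers on the ground-resistance argument, here is a back-of-the-envelope sketch in Python. The resistance and current figures are made-up illustrations, not values from any Xilinx document; only the 1.2 V nominal VCCINT is a real Spartan-6 figure. The function name and parameters are hypothetical.

```python
def effective_core_voltage(vccint_v, current_a, r_gnd_ohm, r_vcc_ohm=0.0):
    """Voltage actually seen by the logic after IR drop on both rails."""
    gnd_rise = current_a * r_gnd_ohm  # internal GND lifts above board GND
    vcc_sag = current_a * r_vcc_ohm   # internal VCCINT sags below board VCCINT
    return vccint_v - vcc_sag - gnd_rise


# Hypothetical numbers: 1.2 V rail, 10 mOhm of internal ground resistance.
for amps in (2.0, 5.0, 8.0):
    volts = effective_core_voltage(1.2, amps, r_gnd_ohm=0.010)
    print(f"{amps:.0f} A -> {volts:.3f} V at the logic")
```

Even a few milliohms matter at these currents: under the assumed values, every extra ampere costs 10 mV of headroom, and the same reasoning applies if the weak spot is the VCCINT distribution rather than GND.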

eldentyrell

donator

Activity: 980

Merit: 1004

felonious vagrancy, personified

Quote from: ztex on June 18, 2012, 08:54:14 AM

Inspector 2211 already mentioned it and it is also hidden in the datasheet ("Simultaneous switching" issue):

Quote from: pusle on June 18, 2012, 09:57:03 AM

Err, isn't the "Simultaneous switching" issue about I/O pins, not internal core logic?

Yes, it is -- at least in the Xilinx datasheets (SSO = Simultaneous Switching Outputs).

Could you clarify your comment, ztex? Also, do you have a link to Inspector2211's comment?

Quote from: ztex on June 18, 2012, 08:54:14 AM

The internal GND traces of the S6 seem to be a little bit weak.

I suspect so as well (or that the VCCINT traces are weak). However, any details from Xilinx on this would be useful -- at least an acknowledgement that XPA isn't fully aware of the device's limitations.

ngzhang

hero member

Activity: 592

Merit: 501

We will stand and fight.

Quote from: pieppiep on June 18, 2012, 01:47:35 AM

What if you shift the clock of the middle ring?
Maybe the internal voltage in the middle of the chip drops too much on each clock edge.

One trade-off: the added GCLKs will consume more power.
But I think that's worth trying.

pusle

member

Activity: 89

Merit: 10

Err, isn't the "Simultaneous switching" issue about I/O pins, not internal core logic?

ztex

donator

Activity: 367

Merit: 250

ZTEX FPGA Boards

Quote from: eldentyrell on June 18, 2012, 12:45:21 AM

Quote from: Inspector 2211 on June 16, 2012, 09:13:52 PM

Bitfury experienced a similar thing.

Yeah, I know… once I have fewer things on my to-do list I think me and him and anybody else interested ought to heckle forums.xilinx.com until they own up to this issue. I have been seeing the very same "center of the fabric drops out first" phenomenon, but until I read about his experiences I had it chalked up to my crappy homemade boards. Now that I'm seeing it on ztex's boards too I am kinda disappointed with X.

Inspector 2211 already mentioned it and it is also hidden in the datasheet ("Simultaneous switching" issue): The internal GND traces of the S6 seem to be a little bit weak.

pieppiep

hero member

Activity: 1596

Merit: 502

What if you shift the clock of the middle ring?
Maybe the internal voltage in the middle of the chip drops too much on each clock edge.

DiabloD3

legendary

Activity: 1162

Merit: 1000

DiabloMiner author

Quote from: eldentyrell on June 18, 2012, 12:42:32 AM

Ok, so, it appears that I can get the top and bottom rings running at the rated speed (I'm still using 150 MHz builds because they finish fast). But the middle ring only runs at 60% of expected speed unless the top+bottom rings are switched off (or running super slow).

If it runs stable overnight I will launch a high-frequency build and post those bitstreams when they finish. It won't be the predicted hashrate, but it should still be an improvement over what people have right now. And no commissions until I figure out wtf is really going on here.

Sounds like you need prime numbers.

eldentyrell

donator

Activity: 980

Merit: 1004

felonious vagrancy, personified

Quote from: kano on June 16, 2012, 08:07:04 PM

quit and never go back - simplicity at its best

I like it.

Or, at least, manual intervention required to go back.

I suppose a better idea would be a 3-line script that emails the operator to let him/her know that it has "downshifted".
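A minimal sketch of that idea in Python, assuming the miner quits on its own when it cannot reach the server (as described elsewhere in this thread). `run_with_fallback` and the `notify` callback are hypothetical names; the callback stands in for whatever actually reaches the operator (an email hook, for instance), which is not shown here.

```python
import subprocess


def run_with_fallback(primary_cmd, fallback_cmd, notify):
    """Run primary_cmd; once it exits, tell the operator and run fallback_cmd."""
    primary = subprocess.run(primary_cmd)
    # The primary miner has quit: report the "downshift" before falling back.
    notify(f"miner downshifted: primary exited with code {primary.returncode}")
    fallback = subprocess.run(fallback_cmd)
    return primary.returncode, fallback.returncode
```

Usage would look something like `run_with_fallback(["run-tml-miner"], ["run-old-miner"], notify=print)`, with `print` swapped for a real notifier. Like the shell version, it never goes back to the primary without manual intervention.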

eldentyrell

donator

Activity: 980

Merit: 1004

felonious vagrancy, personified

Quote from: Inspector 2211 on June 16, 2012, 09:13:52 PM

Bitfury experienced a similar thing.

Yeah, I know… once I have fewer things on my to-do list I think me and him and anybody else interested ought to heckle forums.xilinx.com until they own up to this issue. I have been seeing the very same "center of the fabric drops out first" phenomenon, but until I read about his experiences I had it chalked up to my crappy homemade boards. Now that I'm seeing it on ztex's boards too I am kinda disappointed with X.

Quote from: Inspector 2211 on June 16, 2012, 09:13:52 PM

Xilinx never designed their FPGAs in such a way that 95% of all flip-flops could switch at the same time.
They just didn't.

Maybe, but they steadfastly refuse to post maximum current ratings for their devices, and say over and over "run our power analysis tools, and if the tool says it's ok, it's ok".

Well, all my designs pass their power analyses. Yet the voltage near the center of the chip is clearly sagging.

Basically, the power analysis tools are effectively "part of the datasheet" and Xilinx has a serious datasheet error here.

eldentyrell

donator

Activity: 980

Merit: 1004

felonious vagrancy, personified

Ok, so, it appears that I can get the top and bottom rings running at the rated speed (I'm still using 150 MHz builds because they finish fast). But the middle ring only runs at 60% of expected speed unless the top+bottom rings are switched off (or running super slow).

If it runs stable overnight I will launch a high-frequency build and post those bitstreams when they finish. It won't be the predicted hashrate, but it should still be an improvement over what people have right now. And no commissions until I figure out wtf is really going on here.
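For a rough sense of what the 60% figure means for the whole chip, here is a small Python sketch. It assumes the three rings contribute equally to the total hash rate at rated speed, which is an assumption made for illustration and not something stated in this post; the function name is hypothetical.

```python
def relative_hashrate(ring_fractions):
    """Fraction of the all-rings-at-rated-speed hash rate, assuming equal rings."""
    return sum(ring_fractions) / len(ring_fractions)


# Top and bottom rings at rated speed, middle ring at 60%.
print(f"{relative_hashrate([1.0, 0.6, 1.0]):.1%}")  # prints 86.7%
```

Under that assumption the chip lands at roughly 87% of the predicted rate, which fits the "not the predicted hashrate, but still an improvement" expectation.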

eldentyrell

donator

Activity: 980

Merit: 1004

felonious vagrancy, personified

Quote from: pusle on June 17, 2012, 04:59:34 AM

Hook up an oscilloscope to VCCINT, as close as possible to the FPGA.

Y'know, I was never any good with an oscilloscope. One of these days….

Quote from: pusle on June 17, 2012, 04:59:34 AM

-Make sure new midstate load etc. doesn't result in spikes.

Check. I deliberately don't stop the rings when loading nonces for this very reason; I just let garbage fly out the back end due to half-loaded work. The noise caused by that huge change in power consumption is not worth it.

Quote from: pusle on June 17, 2012, 04:59:34 AM

-Stagger the rings start time/midstate load/nonce wrap

-Use phase offset to interleave clock transitions for the different rings

Well, they're on different clocks. However, I will build one where they all use the same clock so I can try this -- good ideas.

Quote from: pusle on June 17, 2012, 04:59:34 AM

-Ramp the clocks up gradually from idle

Yes, already doing this.

Quote from: pusle on June 17, 2012, 04:59:34 AM

It could also be the PLL suffering from too much noise. Try changing the loopfilter/bandwidth of the PLL.

Since it's only a jitter filter I have it on the lowest bandwidth setting.

I'm also going to try dropping it altogether after finding a comment by Austin saying that Xilinx's PLLs are very sensitive to activity in nearby logic.

Quote from: pusle on June 17, 2012, 04:59:34 AM

Might be hard but try an external high-speed clock source (connection/termination to the board is critical)

Unfortunately I don't have boards that can do that (SMA connectors, right?)

makomk

hero member

Activity: 686

Merit: 564

Quote from: Inspector 2211 on June 14, 2012, 01:46:50 PM

On ZTEX boards, the FPGA's JTAG signals are not even connected to the Cypress FX2 microcontroller.

That's unfortunate. I'm guessing he's not broken out the appropriate pins to allow the two to be connected either.

ShadesOfMarble

donator

Activity: 543

Merit: 500

Quote from: Entropy-uc on June 17, 2012, 01:16:07 PM

Quote from: ShadesOfMarble on June 17, 2012, 12:12:19 PM

I don't know if I missed a statement for this question completely or it has not yet been answered...

What about eldentyrell's ("tricone mining") bitstream? Are you going to support it? Will it be the first bitstream working* with CM? Or are you focusing on your own bitstream?

(* working means more than 700 MH/s)

You really should be asking eldentyrell this question. Given his plans for a commission structure, it makes no sense for anyone other than himself to work on implementations.

So, eldentyrell, what do you say? ;)

(CM = Cairnsmore board by Enterpoint)

pusle

member

Activity: 89

Merit: 10

You have probably thought of this stuff already but here goes:

Hook up an oscilloscope to VCCINT, as close as possible to the FPGA.
Make it look smooth on the scope at all times by:

-Make sure new midstate load etc. doesn't result in spikes.

-Stagger the rings start time/midstate load/nonce wrap

-Use phase offset to interleave clock transitions for the different rings

-Ramp the clocks up gradually from idle

It could also be the PLL suffering from too much noise. Try changing the loopfilter/bandwidth of the PLL.
Might be hard but try an external high-speed clock source (connection/termination to the board is critical)
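As an illustration of the phase-offset suggestion above, this Python sketch computes evenly spaced offsets so that the rings' clock edges do not coincide. It assumes all rings run from one common clock frequency; the three rings and 150 MHz come from this thread, while the function name and the even-spacing policy are assumptions for illustration.

```python
def ring_phase_offsets(n_rings, clock_mhz):
    """Evenly spaced (degrees, nanoseconds) clock offsets, one pair per ring."""
    period_ns = 1000.0 / clock_mhz  # one clock period in nanoseconds
    return [(360.0 * i / n_rings, period_ns * i / n_rings)
            for i in range(n_rings)]


# Three rings sharing a 150 MHz clock.
for degrees, delay_ns in ring_phase_offsets(3, 150.0):
    print(f"{degrees:5.1f} deg  {delay_ns:.3f} ns")
```

With three rings this gives 0, 120 and 240 degrees, so each ring's active edge falls a third of a period after the previous one instead of all three drawing current at the same instant.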

Inspector 2211

sr. member

Activity: 448

Merit: 250

Quote from: eldentyrell on June 16, 2012, 07:22:16 PM

Hrm.

So, I have a bitstream that will run error-free on the ztex board at 170 MHz as long as I only use one of the three rings. I can also run any one ring at 170 MHz and the other two really slow (like 50 MHz slow). But if I use all three rings at full speed, I get errors all the way down to some pretty embarrassingly-poor hash rates. I experienced a similar phenomenon on my own boards, but it wasn't nearly this severe and the optimal clock frequencies were still giving me 245+MH/s on my SG-2 boards (ztex uses faster SG-3 chips).

I'll be doing some more experiments on the clock-rate/error relationship this evening, but the important questions require a new build in order to answer, and that's going to take 24-48 hours (sorry, folks). Still lots of tricks up my sleeve, but they take (build) time.

Bitfury experienced a similar thing.
It's probably ground bounce INTERNALLY to the FPGA.
Or something like that.
Xilinx never designed their FPGAs in such a way that 95% of all flip-flops could switch at the same time.
They just didn't.
But that's what a miner does.

kano

legendary

Activity: 4592

Merit: 1851

Linux since 1997 RedHat 4

Quote from: eldentyrell on June 16, 2012, 02:22:44 AM

Quote from: Entropy-uc on June 09, 2012, 09:51:16 AM

As of today, such a bitstream change would have to be manually handled.

No. There is an option to quit the miner if it is unable to contact the signcryption server. So you launch it from a three-line shell script:

#!/bin/bash
run-tml-miner
run-old-miner

problem solved.

Quote from: Entropy-uc on June 09, 2012, 09:51:16 AM

When you submit free patches to all the major mining software packages to support automatic failover to backup bitstreams I will agree with you.

I hereby open-source the above three-line shell script.

quit and never go back - simplicity at its best

I like it.

rjk

sr. member

Activity: 448

Merit: 250

1ngldh

What are the specs of the box you are building on? Always good to know for comparison.

eldentyrell

donator

Activity: 980

Merit: 1004

felonious vagrancy, personified

Hrm.

So, I have a bitstream that will run error-free on the ztex board at 170 MHz as long as I only use one of the three rings. I can also run any one ring at 170 MHz and the other two really slow (like 50 MHz slow). But if I use all three rings at full speed, I get errors all the way down to some pretty embarrassingly-poor hash rates. I experienced a similar phenomenon on my own boards, but it wasn't nearly this severe and the optimal clock frequencies were still giving me 245+MH/s on my SG-2 boards (ztex uses faster SG-3 chips).

I'll be doing some more experiments on the clock-rate/error relationship this evening, but the important questions require a new build in order to answer, and that's going to take 24-48 hours (sorry, folks). Still lots of tricks up my sleeve, but they take (build) time.

eldentyrell

donator

Activity: 980

Merit: 1004

felonious vagrancy, personified

Quote from: makomk on June 11, 2012, 06:40:21 AM

That quote's from a thread about a Virtex-5 device. As I recall those have metal heatspreaders. So a different chip, different packaging

No.

The whole reason for using junction temperature is that it's package-independent.

The package determines the relationship between the air/board/case temperature and the junction temperature. Junction temperature is all that matters, but you can't measure it directly, so you compute it using thermal constants from the package.
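The usual first-order estimate is junction temperature = ambient temperature + dissipated power × junction-to-ambient thermal resistance (theta-JA). A small Python sketch of that calculation follows; the numbers are hypothetical, and a real theta-JA has to come from the package's thermal data and depends on airflow and heatsinking.

```python
def junction_temp_c(ambient_c, power_w, theta_ja_c_per_w):
    """Estimate junction temperature from ambient, power, and theta-JA."""
    return ambient_c + power_w * theta_ja_c_per_w


# Hypothetical: 35 C ambient, 8 W dissipated, 5 C/W effective theta-JA.
print(junction_temp_c(35.0, 8.0, 5.0))  # prints 75.0
```

Two different packages with different theta-JA values will reach the same junction temperature at different ambient or case temperatures, which is why the datasheet limit is given at the junction.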

Quote from: makomk on June 11, 2012, 06:40:21 AM

, and built on a different process (65nm rather than 45nm).

Good point, but if the 45nm process really was more easily damaged by temperature you'd see that reflected in lower maximum junction temperatures in the datasheet. The fact that Xilinx didn't change them means it's unlikely there has been a major change in temperature tolerance. And we're only talking about one generation difference in process here -- it's not like 180nm vs 22nm or anything like that.

eldentyrell

donator

Activity: 980

Merit: 1004

felonious vagrancy, personified

Well, I am sitting here staring at PAR grinding slowly along. I don't know if I'll be able to stay awake until it finishes.

Assuming nothing goes wrong (big if), preview bitstreams in the morning.

Topic: Algorithmically placed FPGA miner: 255MH/s/chip, supports all known boards - page 27. (Read 119440 times)