Topic: Hacking The KNC Firmware: Overclocking - page 4.

full member
Activity: 138
Merit: 100
sr. member
Activity: 266
Merit: 250
EDIT:
Temporarily unthwarted: I got a gizmo that kinda works on the Altera JTAG
(but not with the Altera Quartus IDE yet).
I think I must forge some 'USB identification papers' for the thing.
Using OpenOCD at the moment (built-in pun in the name!).
Also discovered that KnC made it trivial to do JTAG interfacing WITH the BBB
(it 'might only' require a cable from one end of the PCB to the other).
Thx KnCMiner!
The bar is kinda high for my skillset but I blunder on! Smiley
Prolly another instance of,
'tolip opens mouth, changes feet'

That went a clear 8 feet over my head; how can I help?
sr. member
Activity: 386
Merit: 250
My master plan has been thwarted once again Smiley

An AVR Dragon JTAG programmer can program Altera parts.
Programming a part is less complicated than debugging a running part in circuit.

It is _NOT_ recognised by the Altera software that I was going to use to try and snoop.

I threw $17.00 to the wind,
I ordered a clone of the Altera cable.
In ~20 days I'll know if that was wise or not.

The real deal 'cable' is $300 USD. What I want to try may need the real deal.

Whether the FPGA can be snooped at all depends on whether KnC set it up that way.
I gambled $17 that the early FW still leaves access open for KnC's own debugging.


Other clock-related data:
The oscillator on the ASIC PCB that (I think) the ASIC uses as the input reference for its PLL is 25MHz.
The 28nm board has 4 oscillators at 25 or 250MHz.
I cross-referenced them to 250MHz back when, but now I have doubts after seeing the 25MHz part on the 20nm PCB.

For the very bold: you could try a slightly different oscillator,
ideally a frequency synthesizer instead of a fixed oscillator, to explore.
One might expect terrible results down that rabbit hole:
if it is the input clock for the ASIC, it's part of a tuned system on the ASIC.
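Suppose the ASIC's PLL simply multiplies the reference by a fixed ratio. That is an assumption: the post only says the oscillator is part of a tuned system. In that case every configured clock would scale with the oscillator. Here is a minimal Python sketch of that arithmetic. It uses the 25MHz reference from the post and a hypothetical 26MHz replacement part:

```python
def scaled_clock(configured_mhz: float, ref_mhz: float, new_ref_mhz: float) -> float:
    """Clock the PLL would produce if only the reference oscillator changed.

    ASSUMPTION: configured_mhz = ref_mhz * ratio, and the ratio (selected via
    the Advanced page / FPGA) stays the same after the oscillator swap.
    """
    if ref_mhz <= 0 or new_ref_mhz <= 0:
        raise ValueError("reference frequencies must be positive")
    ratio = configured_mhz / ref_mhz  # effective multiply ratio of the PLL
    return ratio * new_ref_mhz


if __name__ == "__main__":
    # 25 MHz reference per the post; 26 MHz is a hypothetical replacement part.
    for setting in (500.0, 525.0):
        print(f"{setting:.0f} MHz setting -> {scaled_clock(setting, 25.0, 26.0):.1f} MHz")
```

Under this model a 4% faster reference gives every die the same 4% bump, whether or not that die can handle it.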

Additionally,
I think the R/C components for the on-die PLL are on the bottom of the ASIC PCB.
If you examine the bottom of the PCB closely, you can see 2 very tiny parts (per die) that do not fit the pattern of the filter caps.
Again, someone very bold can try the 'pencil trick' on the PLL resistors, if that is what they are.
You will need a microscope, a VERY SHARP PENCIL, and a very steady hand
(this assumes you can tell which part is the cap and which is the resistor).

The 'pencil trick' is just using graphite from a pencil to lower the resistance of an SMD resistor.
You literally draw across the top of the resistor with the pencil.
Lowering the resistance of an R/C circuit speeds it up at the cost of more current.
Slight changes can be too much; it depends on the design.
A pencil of different hardness can give a different result.
It is often easily reversible with spit and a finger.
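The graphite line sits on top of the resistor. To a first approximation it acts as a second resistor in parallel, which is why the effective resistance can only go down. The Python sketch below models this with hypothetical values: a 10 kOhm part, a 1 nF cap and a 100 kOhm graphite trace. None of these values come from the KnC board. One caveat applies: in a typical charge-pump PLL, the loop-filter R and C mainly set loop bandwidth and stability rather than the locked output frequency. So "faster" here refers to the RC time constant, not necessarily to the hash clock.

```python
import math


def parallel(r1_ohms: float, r2_ohms: float) -> float:
    """Equivalent resistance of two resistors in parallel."""
    return r1_ohms * r2_ohms / (r1_ohms + r2_ohms)


def time_constant(r_ohms: float, c_farads: float) -> float:
    """RC time constant tau = R * C, in seconds."""
    return r_ohms * c_farads


def corner_frequency(r_ohms: float, c_farads: float) -> float:
    """-3 dB corner of a first-order RC filter, f = 1 / (2 * pi * R * C)."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


if __name__ == "__main__":
    r_part = 10e3       # hypothetical SMD resistor, 10 kOhm
    c_part = 1e-9       # hypothetical capacitor, 1 nF
    r_graphite = 100e3  # hypothetical pencil trace, 100 kOhm
    r_eff = parallel(r_part, r_graphite)
    print(f"R: {r_part:.0f} -> {r_eff:.0f} ohm")
    print(f"tau: {time_constant(r_part, c_part) * 1e6:.2f} -> "
          f"{time_constant(r_eff, c_part) * 1e6:.2f} us")
    print(f"corner: {corner_frequency(r_part, c_part) / 1e3:.1f} -> "
          f"{corner_frequency(r_eff, c_part) / 1e3:.1f} kHz")
```

Even a faint 100 kOhm trace moves a 10 kOhm part down by about 9%, which fits the warning that slight changes can be too much.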

Stalled clocks can be a bad thing!
I DO NOT ADVOCATE changing parts on the PCB!!!
Food for thought though.

YMMV
Smiley

full member
Activity: 203
Merit: 100
By the way: is anyone having a dumping cgminer on FW_1.0? My system stops and restarts one to three times daily; I didn't have this with the previous rc9.
I wonder if I'm the only one?

This is what my 'cat /var/log/monitordcdc.log' shows (it dumped and restarted at 08:14:18 after running in one go since 15:38:20 the day before):

[2014-07-13 15:38:20] Restarting cgminer
[2014-07-14 08:12:47] Die 5-3 came DOWN
[2014-07-14 08:13:09] waas re-run
[2014-07-14 08:13:55] Die 5-3 came UP
[2014-07-14 08:14:17] waas re-run
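You can parse the log lines above to see how often this happens without eyeballing the file. Here is a minimal Python sketch. It assumes every line follows the `[YYYY-MM-DD HH:MM:SS] message` shape shown here. It also assumes die events always read `Die X-Y came UP/DOWN`; all other monitordcdc messages are kept as plain text.

```python
import re
from datetime import datetime

LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.*)$")
DIE_RE = re.compile(r"^Die (\S+) came (UP|DOWN)$")


def parse(lines):
    """Return [(timestamp, message), ...] for lines in the log format above."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((ts, m.group(2)))
    return events


def die_outages(events):
    """Map each die to a list of DOWN -> UP durations in seconds."""
    down_since, outages = {}, {}
    for ts, msg in events:
        m = DIE_RE.match(msg)
        if not m:
            continue
        die, state = m.groups()
        if state == "DOWN":
            down_since[die] = ts
        elif die in down_since:
            start = down_since.pop(die)
            outages.setdefault(die, []).append((ts - start).total_seconds())
    return outages


def run_before_first_down(events):
    """Seconds from the last 'Restarting cgminer' to the first die going DOWN."""
    started = None
    for ts, msg in events:
        if msg == "Restarting cgminer":
            started = ts
        elif msg.endswith("came DOWN") and started is not None:
            return (ts - started).total_seconds()
    return None


SAMPLE = """\
[2014-07-13 15:38:20] Restarting cgminer
[2014-07-14 08:12:47] Die 5-3 came DOWN
[2014-07-14 08:13:09] waas re-run
[2014-07-14 08:13:55] Die 5-3 came UP
[2014-07-14 08:14:17] waas re-run
""".splitlines()

if __name__ == "__main__":
    ev = parse(SAMPLE)
    print(die_outages(ev))
    print(run_before_first_down(ev) / 3600, "hours")
```

On the Neptune itself, pass `open('/var/log/monitordcdc.log')` to `parse` instead of `SAMPLE`.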


Coming back to this dumping/restarting:

During the past few days, while following Tolip's instructions (to create a build environment and play with the waas code), I noticed that not only is the web interface Status page not working (expected), but monitordcdc acts differently as well. Instead of the logging above, it has hundreds of rows with only the text "starting"..... and the nice thing: it did not dump and auto-restart anymore, and everything ran fine.

This morning my Neptune had a power-down, and I restarted with the default environment again. Guess what: I have the same dumping described above again! So it's monitordcdc in release 1.0 giving me the trouble (not present in rc9).

Does anyone have an idea why the other build environment changes this behavior?




Another clue emerges!

With the build environment, sleepers keep sleeping and monitordcdc ignores them.

I had about 40 hours of uninterrupted goodness from cgminer.
I did find 2 VRMs sleeping today and did the stop, waas, restart thing.
cgminer was hashing away and the log file was nuthin' but 'Start'.

monitordcdc restarting cgminer is preferable to a sleeping die requiring manual intervention,
provided it brings them all back into production.

YMMV
Smiley

hmmm... but I didn't have any sleepers, just a lot of "Start" lines in the log....

Do we have the source code of monitordcdc? (I cannot find it in the git repo.)
Edit: Oh... sorry... found it!

sr. member
Activity: 386
Merit: 250

No help from KnC as yet other than a link to the trouble shooting guide Kurt posted on their forum.

Even when I have it up and running properly and with halfway decent cooling I'm not sure what help I could offer?

u27

I have never dealt with a sleeping die. <-(HUGE DISCLAIMER)

If it is VRM related,
AND
the VRM can be awakened,

This might work
------------------------
Stop cgminer.

'/etc/init.d/cgminer.sh stop'

WAIT 20 seconds.

'waas'

WAIT 10-20 seconds AFTER it finishes

'/etc/init.d/cgminer.sh restart'

Give cgminer a minute to get going,
then look for results.

'waas -g all-asic-info | grep I'
--------------------------------------------
Look for 2 low-current VRMs at the sleeping die's position in the resulting list.
It might take 2-3 tries to get all the VRMs awake.
Supposedly there is a way the VRM can sleep that needs a power cycle to cure.
If this is true and it applies to you, get an RMA ASAP!!!
I have never seen it; I have not power cycled in days.
I regularly sleep and awaken VRMs with the method above.
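The stop / wait / waas / wait / restart routine above is easy to fumble by hand. The Python sketch below replays it using exactly the commands from the post and its suggested waits: 20 s before waas, the upper end of the 10-20 s after, and a minute for cgminer. It defaults to a dry run that only returns the plan. The `low_current_vrms` helper assumes you have already turned the `waas -g all-asic-info | grep I` output into a {label: amps} dict. That parsing step and the 50%-of-median threshold are hypothetical, because the output format isn't shown here.

```python
import statistics
import subprocess
import time

# Steps from the post: ("run", argv) or ("wait", seconds).
STEPS = [
    ("run", ["/etc/init.d/cgminer.sh", "stop"]),
    ("wait", 20),
    ("run", ["waas"]),
    ("wait", 20),  # post says 10-20 s AFTER waas finishes; upper end used
    ("run", ["/etc/init.d/cgminer.sh", "restart"]),
    ("wait", 60),  # give cgminer a minute to get going
]


def wake_vrms(dry_run=True, sleep=time.sleep):
    """Replay STEPS; with dry_run=True nothing runs and only the plan is returned."""
    plan = []
    for kind, arg in STEPS:
        if kind == "wait":
            plan.append(f"wait {arg}s")
            if not dry_run:
                sleep(arg)
        else:
            plan.append(" ".join(arg))
            if not dry_run:
                # subprocess.run blocks until the command (e.g. waas) finishes
                subprocess.run(arg, check=True)
    return plan


def low_current_vrms(currents, fraction=0.5):
    """Labels whose current is below fraction * median (hypothetical rule)."""
    if not currents:
        return []
    threshold = fraction * statistics.median(currents.values())
    return sorted(label for label, amps in currents.items() if amps < threshold)


if __name__ == "__main__":
    for step in wake_vrms(dry_run=True):
        print(step)
```

The post says it can take 2-3 tries, so you would rerun `wake_vrms(dry_run=False)` until `low_current_vrms` comes back empty.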

The above procedure is also needed if you apply frequency changes from the Advanced page with FW 1.0;
doing so often results in sleeping VRMs.

As usual
YMMV
Smiley
sr. member
Activity: 266
Merit: 250
If you are just sitting waiting for others to do it,
assume it will not happen.

YMMV
Smiley

All sounds good, but I haven't even got cooling sorted on my Neptune yet... Can't really open my boxes up because it's still running with a dead die and I don't want to give KnC any excuses not to fix it.

sr. member
Activity: 386
Merit: 250
So long story short; no overclocking?

ok

nope, no OC by me yet.

If one of my JTAG interfaces is compatible with Altera parts,,,
I'm going to try snooping the FPGA memory through the JTAG interface.
Low probability of success, but interesting.

There may also be a way to get past the frequency filter in the FPGA with a raw ASIC request,
possibly with a correctly crafted 'i2cset' command.
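For reference, `i2cset` (from i2c-tools) takes the bus number, the 7-bit chip address, the data (register) address, the value and an optional mode letter. Its `-y` flag skips the interactive confirmation. The Python sketch below only builds and sanity-checks such a command; it does not run anything. The bus, chip and register numbers in it are hypothetical placeholders. Nothing in this thread says which address, if any, would carry a raw ASIC request past the FPGA.

```python
def build_i2cset(bus: int, chip: int, register: int, value: int, mode: str = "b") -> list:
    """Build an i2cset argv list; does not execute anything.

    mode "b" writes a byte (0x00-0xFF), mode "w" writes a word (0x0000-0xFFFF).
    """
    if bus < 0:
        raise ValueError("bus number must be non-negative")
    if not 0x03 <= chip <= 0x77:
        raise ValueError("7-bit chip address must be 0x03-0x77")
    if not 0x00 <= register <= 0xFF:
        raise ValueError("data address must be 0x00-0xFF")
    limits = {"b": 0xFF, "w": 0xFFFF}
    if mode not in limits:
        raise ValueError("only byte ('b') and word ('w') modes are handled here")
    if not 0 <= value <= limits[mode]:
        raise ValueError(f"value out of range for mode {mode!r}")
    width = 2 if mode == "b" else 4
    return ["i2cset", "-y", str(bus), f"0x{chip:02x}", f"0x{register:02x}",
            f"0x{value:0{width}x}", mode]


if __name__ == "__main__":
    # HYPOTHETICAL numbers, for illustration only.
    print(" ".join(build_i2cset(2, 0x10, 0x01, 0x20)))
```

Keeping the builder separate from execution makes it easy to log or review the exact command before anything touches the bus.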

EDIT
My AVR Dragon (for OC of BFL) and the Altera ByteBlaster are compatible with each other.
All I need to do is solder a 2x5 connector onto the controller (JP6).
Here's hoping the JTAG is still enabled on FW 1.0.
Would not be my first dead end! Wink
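Once the cable is on JP6, the first sign of life is usually the 32-bit IDCODE that the JTAG tool (e.g. OpenOCD) reads from each TAP. Per IEEE 1149.1, the IDCODE is laid out as follows:

- bit 0 is always 1;
- bits 1-11 hold the JEDEC JEP106 manufacturer code;
- bits 12-27 hold the part number;
- bits 28-31 hold the version.

Altera's JEP106 code is 0x06E, so Altera IDCODEs end in 0x0DD. Below is a Python sketch of that decode. The example IDCODE is illustrative, not read from a KnC board.

```python
ALTERA_JEP106 = 0x06E  # Altera's JEDEC JEP106 manufacturer code


def decode_idcode(idcode: int) -> dict:
    """Split an IEEE 1149.1 IDCODE into its fields."""
    if not 0 <= idcode <= 0xFFFFFFFF:
        raise ValueError("IDCODE must be a 32-bit value")
    if (idcode & 1) == 0:
        raise ValueError("bit 0 of a valid IDCODE is always 1")
    return {
        "version": (idcode >> 28) & 0xF,
        "part": (idcode >> 12) & 0xFFFF,
        "manufacturer": (idcode >> 1) & 0x7FF,
    }


def is_altera(idcode: int) -> bool:
    """True if the manufacturer field matches Altera's JEP106 code."""
    return decode_idcode(idcode)["manufacturer"] == ALTERA_JEP106


if __name__ == "__main__":
    example = 0x020F30DD  # illustrative value ending in 0x0DD
    fields = decode_idcode(example)
    print({k: hex(v) for k, v in fields.items()}, "Altera:", is_altera(example))
```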
sr. member
Activity: 266
Merit: 250
What is the viscosity of that stuff?

Very low, thinner than water I would say.

The stuff I posted earlier was a bit too optimistic.
(tolip opens mouth changes feet)
A part of the puzzle is complete but,,,
The FPGA quietly filters out requests other than ones matching the stock Advanced page speeds.
It rounds down to the next available frequency.
That is a cool feature:
errors (like the ones I introduced)
are fixed on the fly at the next lower speed.
It keeps kids like me hashin'.
YMMV
Smiley
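The "round down to the next available frequency" behaviour is a floor lookup in a sorted table. Below is a Python sketch with a HYPOTHETICAL table of speeds. The real list lives in the FPGA image and the Advanced page and isn't given in this thread.

```python
import bisect

# HYPOTHETICAL valid speeds in MHz; the real table is not known here.
VALID_MHZ = (300, 350, 400, 450, 500)


def fpga_round_down(requested_mhz: float, valid=VALID_MHZ) -> int:
    """Return the highest valid speed <= requested_mhz (floor lookup)."""
    table = sorted(valid)
    i = bisect.bisect_right(table, requested_mhz)
    if i == 0:
        raise ValueError("request is below the lowest valid speed")
    return table[i - 1]


if __name__ == "__main__":
    for req in (512, 500, 449):
        print(req, "->", fpga_round_down(req))
```

This is why a non-standard request is "fixed on the fly" instead of rejected: anything between two table entries silently lands on the lower one.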

sr. member
Activity: 386
Merit: 250

EDIT:
Other VRMs eventually get much hotter, and some clocks are reduced to keep VRM temps in check; but the board in Novec can simply be adjusted for max speed up to 51A(ish):

He needs to get the clocks running faster, but neither he nor I have any idea where to start; can we get a noob's guide to 525MHz?

u27

sr. member
Activity: 266
Merit: 250
Yes but.. Smiley Surface area is still relevant; a greater surface area could boil (or convect) more fluid if there is enough heat to dissipate.

This stuff is so efficient that surface area would not come into play unless you're wanting to push something like (wildly guessing here) 1kW through a 25x25mm chip.

Is that the stuff in the heat pipes on heatsinks?

Similar in that it's a phase change system, yea; not sure on the specifics.

Does it require pressure lid or just cooling to recapture?

This is known as an open bath, so in theory no lid is required (but in practice an almost airtight lid is advisable). Where water is 1kg per 1L, Novec 7100 is about 1.5kg per 1L, and the same goes for the gas: once it evaporates, it is much heavier than air. Once the layer of HFE gas is thick enough to reach the cooling, it condenses and drips back down to replenish the bath.
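The "much heavier than air" part follows from the ideal gas law: at the same temperature and pressure, gas density is proportional to molar mass. Novec 7100 is methyl nonafluorobutyl ether, C4F9OCH3 (C5H3F9O). Below is a Python sketch of the arithmetic. It treats both the vapor and air as ideal gases, which is an approximation.

```python
# Standard atomic weights (g/mol) for the elements involved.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "F": 18.998, "O": 15.999}
AIR_MOLAR_MASS = 28.97  # g/mol, dry air

# Novec 7100: C4F9OCH3, i.e. C5H3F9O
NOVEC_7100 = {"C": 5, "H": 3, "F": 9, "O": 1}


def molar_mass(formula: dict) -> float:
    """Sum of atomic weights times atom counts, in g/mol."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in formula.items())


def vapor_to_air_ratio(formula: dict) -> float:
    """Ideal-gas density ratio vs. air at equal temperature and pressure."""
    return molar_mass(formula) / AIR_MOLAR_MASS


if __name__ == "__main__":
    print(f"M = {molar_mass(NOVEC_7100):.1f} g/mol")
    print(f"vapor is ~{vapor_to_air_ratio(NOVEC_7100):.1f}x denser than air")
```

At roughly 8-9x the density of air, the vapor blanket stays on the bath, which is what lets an open bath work.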

Would be expensive fun to watch that stuff in a traditional water cooling system.
Potential catastrophe and all.
More pics please please please Smiley
YMMV
Smiley

His rig has seen some changes since, but I took these when I first got eyes on his test setup. Not really showing anything different from smracer's video:

http://videobam.com/MduAt
member
Activity: 67
Merit: 10
and you guys didn't believe me Tongue said it cpl times in here Smiley hehe

I did! Just confirming your findings match my friend's results.

@ user27
Is the warmer ASIC with the cooler VRM the submerged one?
If so, a small heatsink on the ASIC, also submerged, might help lower current.

That's not how evap cooling works; the bath is at 60 degrees and as soon as anything gets above that the liquid boils carrying the heat away. Heat sinks go at the other end of the system to cool the water in the loop.

u27

Regardless... Awesome work on this, very cool!
sr. member
Activity: 386
Merit: 250

Some people mentioned swapping out the VRMs to gain 20A per die.
Adding a single 40A VRM per die would require half the total number of VRMs and provide twice the gain (40A vs 20A):
$500 vs $1000 in VRMs for a whole Neptune.
Labor is similar for both options.
A reflow operator likely has both skill sets already.
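Measured per amp gained, the two options differ by more than the 2x in the headline. The Python sketch below uses the post's figures for the whole machine: $1000 for the 20A swap and $500 for the 40A add-on. The die count is a free (hypothetical) parameter because it cancels out of the comparison.

```python
def cost_per_added_amp(total_cost: float, gain_per_die_amps: float, n_dies: int) -> float:
    """Dollars per amp of extra per-die capacity, summed over the whole machine."""
    if gain_per_die_amps <= 0 or n_dies <= 0:
        raise ValueError("gain and die count must be positive")
    return total_cost / (gain_per_die_amps * n_dies)


if __name__ == "__main__":
    n = 8  # HYPOTHETICAL die count; it does not affect the ratio below
    swap = cost_per_added_amp(1000, 20, n)   # replace existing VRMs: +20 A/die
    stack = cost_per_added_amp(500, 40, n)   # add one 40 A VRM: +40 A/die
    print(f"swap: ${swap:.2f}/A, stack: ${stack:.2f}/A, ratio {swap / stack:.1f}x")
```

Half the cost and twice the gain makes stacking about 4x cheaper per added amp, whatever the die count.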

Not as crazy as it sounds, if you ignore the huge cost.
It's almost as easy to stack parts with traditional soldering skills as to replace huge SMD power components.
VRM addresses can go up to 12 manually, and VRMs can be ganged up to 7 units per die.
The code already handles slaves, so adding an additional slave is not too bad.

The only downside is then you won't get to sell
'pampered, only hashed on cool Sundays, used 40A VRM'
on ebay.

YMMV
Smiley
sr. member
Activity: 386
Merit: 250
Thx, I'm now slightly less ignorant on the subject.
Had I engaged my brain, the answer was in the data.

sr. member
Activity: 386
Merit: 250
I'm working on cooling, but a friend beat me to it with the Novec 7100... Screenshot below showing a comparison between stock cooling and evaporative cooling.

He says the VRMs shut down from time to time at the voltage shown, which confirms 51A is the limit: regardless of temps, the OCP cuts in. He is running at 500MHz with one voltage setting above stock now without any problems.



@crashoveride
I agree with ya.

It's why I adjusted volts in both directions to determine lowest current with my setup.
Clicking on Apply changes on advanced page does not typically sleep VRM.
It's a low impact test.
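The "adjust volts in both directions and keep the lowest current" test boils down to a minimum over measured readings. It also helps to check those readings against the ~51A OCP limit reported above. Below is a Python sketch. You would fill in the readings dict (setting label to measured amps) from your own observations; the numbers in the example are made up, and so is the 2A safety margin.

```python
OCP_LIMIT_A = 51.0  # approximate trip point reported in this thread


def lowest_current_setting(readings: dict):
    """Return the voltage setting label with the lowest measured current."""
    if not readings:
        raise ValueError("no readings")
    return min(readings, key=readings.get)


def near_ocp(readings: dict, margin_a: float = 2.0) -> list:
    """Settings whose current is within margin_a of the OCP limit, or over it."""
    return sorted(s for s, amps in readings.items() if amps >= OCP_LIMIT_A - margin_a)


if __name__ == "__main__":
    # Made-up example readings: setting relative to stock -> amps
    readings = {"-1": 47.5, "stock": 46.8, "+1": 49.6}
    print("lowest current at:", lowest_current_setting(readings))
    print("too close to OCP:", near_ocp(readings))
```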

YMMV
Smiley
sr. member
Activity: 386
Merit: 250

This is interesting if it actually works. You have always been able to set whatever you want, but waas itself is closed source in the Nov Jupiters; while you can pass it anything you like, it will not actually honor the settings.. I can't imagine they open sourced the code for waas itself for the Neptune, but if they did, then it's an easy fix..

On my firmware I need to set things manually by flipping bits because they're out of the range of waas. I actually intercept the speed (if anyone has looked at the code): if waas can handle it I let it, and anything over the October and November waas ranges automatically gets set by flipping bits..

In any case if they now allow waas to do it directly it will be much easier..

hno (KnC engineer) informed me recently that the FPGA rounds invalid clocks down to the next available one.
FPGA code 'spimux.rbf' is the file it's in.

FPGA and ASIC are black box to me once you get on chip.

The 'power of 2 clocks' above would have been rounded down to next valid clock.
Drat

KnC has the keys in the FPGA code.

Oh well. still things to test.

There is a program called 'asic' that allows command line setting of various items including freq.

YMMV
Smiley