
Topic: Hacking The KNC Firmware: Overclocking - page 8.

sr. member
Activity: 386
Merit: 250

Wow.. I'm inspired to pull out my soldering iron! I'm definitely going to implement the terminal blocks, I've got way too much heat coming off of my power leads, even with a box fan. It's just asking for trouble.

Really great job!

I recommend leaving the WBs facing out. If you do get a failure, facing out should reduce the chance of fluid reaching the other boards.

The blocks I used are (RELATIVELY) inexpensive GROUND strips.
Notice the hazardous wood and tywraps insulating it? I have a cover too.
Actually, with formed 12AWG wire I'll have trouble perma-mounting it anywhere except exactly there.
They come in different lengths depending on power panel size.
One strip could be cut in two to serve as both 12V and ground for half the price.
Keep in mind they are NOT copper.
I do not put all the PSUs at one end and all the ASICs at the other.
I have four 12V wires per PSU, interleaved so one wire from each PSU lands together every 4 holes.
(The layout is left over from an abandoned plan for an 8kW rail.)
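For a rough sense of what each of those 12V wires carries, here is the arithmetic as a minimal Python sketch. It assumes a 2kW PSU (the size in the 2kW x 2 setup mentioned elsewhere in this thread) running at its full rating, exactly 12.0V, and current splitting perfectly evenly across the four wires. Real sharing is never perfectly even, so read the result as an idealized figure, not a rating for your wire.

```python
def current_per_wire(psu_watts: float, volts: float, wires: int) -> float:
    """Idealized current per 12V wire, assuming the output splits evenly."""
    total_amps = psu_watts / volts  # I = P / V
    return total_amps / wires


# Assumed: one 2kW PSU at full rating, 12.0V, four 12V wires.
per_wire = current_per_wire(2000, 12.0, 4)
print(f"{per_wire:.1f} A per wire")  # 41.7 A per wire
```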


Regarding solder and ASIC PCB power pins:
Go big (with iron wattage and iron tip) or stay home.

Although my fancy-dancy 18AWG run to each VRM was planned exactly as pictured,
it was not scheduled for when it was installed.
And I brought a big iron to the game, plus preheated the PCB.

One of my PCIe connector 12V pins was not soldered well originally. Fact.
(Many, in fact, did not flow to the top of the PCB along the pin. Do they scrimp on silver too?)
In egotistical self-defense I might claim the hole was full of slag and wouldn't clean up. Conjecture.
Everyone else could safely assume the 12V pins might require caution and TLC.
I still had a connection, but no trust that it was not compromised, which risked creating exactly what I sought to avoid: HEAT.
If I had a do-over, I would have cut or melted the plastic out of the way to preserve the pins and vias.
Or left it alone to begin with.
The ground pins are quite sturdy, even with half-filled holes.

A third decent option exists for applying 12V to the PCB.
Between the VRM PCB and the outer caps on the ASIC PCB there is 12V, on the side of each cap away from the edge of the ASIC PCB.
(Exception: on C26 & C27 the hot side is toward the edge of the ASIC PCB.)
It looks like in all instances 12V is the side opposite the silkscreen designators.
It's a tight space, but you could get by with a lower-wattage iron if you just attach small wires to the inside of some 12V caps.
If you have enough heat to wick them clean on one side, you have a good sturdy corner to put a wire on and resolder the caps.
Only clean one side of the caps and the caps will provide a mount point. Dirty would also work if you have no wick.
Reflow one cap at a time and the neighbors can provide oops support.
You would not need the wire to extend past more than 2-3 caps.
I have no idea what signals are on the VRM PCB in that area, so I'd avoid getting too close.
2 sacrificed 18AWG PCIe cables, cut, give you 6 wires; you could skip the VRMs by the connector and plug in more power (or not).
You could probably skip and reasonably backfeed half the VRMs from their neighbors.
A single sacrificed 18AWG PCIe cable to the 3 farthest VRMs would help a LOT.
Some 28nm folks might already have unmelted blue extenders just begging to be cut.

I'm anal about potential points of failure that create lots of HEAT when they fail unattended.
Add crimps, consider the 2 ends of every wire on a modular PSU, and it becomes a puckerful motivation for me:
30+ potential points of failure per ASIC (PPOFPA).
I'll never use a modular-cable PSU with high current again if I can help it.


RE: fluid on PCB
Tested.
It takes a couple of hours to reawaken some VRMs;
their neighbors bake them awake.
Choose fluid and additives wisely!
If it leaks after construction, in a static state, I've got other issues, inside or outside.

Life is full of options!

YMMV
:)
sr. member
Activity: 386
Merit: 250

Draw a matrix with 5 rows and 4 columns on paper, and label the 5 rows with each of your ASIC numbers minus one. I had to remove the display panel, so my ASICs on the advanced tab were 1, 2, 3, 5 and 6 (port 4 doesn't work on this card), so I labeled my rows 0, 1, 2, 4 and 5. Label the columns 0, 1, 2 and 3.


I have done something similar with the Oct ASICs.

So much, in fact, that I slightly automated it (still partly manual, but at the keyboard instead of with a pencil).

Preface:
DANGER WILL ROBINSON
The /config folder can save things past a reboot,
just like a normal disk drive.
It is NOT a disk drive, and it wears out faster.
If you log on a Jup or Nep, you can preserve 'disk' life by using /tmp.
It's RAM, though: it only preserves data if you copy it to /config before a reboot.
That's only suitable for need-it-now logging, but it saves lots of write cycles on the 'disk'.
The advice is not "don't do it"; it is "don't do it always"!
ecaferP

Here is another "what I did, what I wish I had done, and why" story.

I ran SSH through 'tee' to save all output into a file for x minutes,
then I used 'sed' and 'grep' to pull out the details I wanted.
My file required cleanup before it was even close to what was on screen
(because unseen cursor control characters were also captured by the 'tee' method).
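Stripping those control characters can be sketched in Python as below. This is a swapped-in technique, not the sed commands used here: the regex covers the common ANSI/VT100 forms (CSI sequences such as ESC[2J, ESC[12;1H or ESC[31m, plus two-character escapes such as ESC M), which is what a curses-style screen like the miner's typically emits. A screen drawn with cursor positioning may still not come out as perfectly clean line-by-line text.

```python
import re

# Common ANSI/VT100 control sequences: CSI forms such as ESC[2J, ESC[12;1H,
# ESC[31m, plus two-character escapes (ESC followed by @-Z or \-_, e.g. ESC M).
ANSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]|\x1b[@-Z\\-_]")


def strip_ansi(text: str) -> str:
    """Remove terminal control sequences that tee captured along with the text."""
    return ANSI_RE.sub("", text)


def clean_capture(raw: str) -> list[str]:
    """Strip escapes and carriage returns, then drop lines left empty."""
    lines = strip_ansi(raw).replace("\r", "").split("\n")
    return [line for line in lines if line.strip()]
```

Usage would be something like `clean_capture(open("capture.txt", errors="replace").read())`, where `capture.txt` is a hypothetical name for the tee output.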

I should have used the miner program's own logging feature instead.

Either way, at some point you get a file that looks similar to the scrolling data from the miner program.

Then I use 'grep' to separate out only the lines containing " KnC: " (the errors) and save them to a new file.

Then play:

You can 'sort' on different fields in each line to see them in core order (and save to another file).
A glance at that indicates frequency: some cores have fewer entries.

You can 'sort' by 2 fields, core then time, etc.
You can count lines with a specific core or die or whatever.
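The grep, sort and count steps can be sketched in Python like this. It assumes only what the thread shows: error lines contain " KnC: " and an asic.die.core triple such as 0.2.36. The rest of the line layout is unknown, so the regex simply takes the first dotted triple on the line; adjust it if your lines also carry version numbers or IP addresses.

```python
import re
from collections import Counter

# Assumption: an error line contains " KnC: " plus an asic.die.core triple
# such as 0.2.36; nothing else about the line layout is relied on.
CORE_RE = re.compile(r"\b(\d+)\.(\d+)\.(\d+)\b")


def knc_errors(lines):
    """The grep step: keep only lines containing ' KnC: '."""
    return [line for line in lines if " KnC: " in line]


def core_key(line):
    """Numeric (asic, die, core) tuple for the first triple on the line, or None."""
    m = CORE_RE.search(line)
    return tuple(int(g) for g in m.groups()) if m else None


def sort_by_core(lines):
    """Numeric sort by core; stable, so log (time) order is kept within a core."""
    keyed = [(core_key(line), line) for line in lines]
    keyed = [pair for pair in keyed if pair[0] is not None]
    keyed.sort(key=lambda pair: pair[0])
    return [line for _, line in keyed]


def count_per_core(lines):
    """Number of error lines per (asic, die, core)."""
    return Counter(key for key in map(core_key, lines) if key is not None)
```

Sorting on integer keys avoids the plain text-sort trap where 0.2.100 lands before 0.2.36. Because Python's sort is stable, lines already in log order stay in time order within each core, which gives the "core then time" view without a second key.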

You can clean up lines with 'sed'.
You can switch spaces to commas (with 'sed') and then import the CSV into a spreadsheet.
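The space-to-comma step, sketched with Python's `csv` module instead of sed (a swapped-in technique, not what was used here). Splitting on runs of whitespace avoids the empty fields that a one-for-one space-to-comma substitution produces wherever columns are padded with several spaces, and `csv.writer` quotes any field that itself contains a comma.

```python
import csv
import io


def lines_to_csv(lines):
    """Split each line on runs of whitespace and emit the fields as CSV rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for line in lines:
        writer.writerow(line.split())
    return buf.getvalue()
```

To save it for the spreadsheet, write the returned string to a file opened with `newline=""`.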

If you save to a unique, self-explanatory filename at each step, you can rebranch from anywhere in the chain for new details :)

Not easier, just different.
Save a tree, waste 10 hours learning what you may never use again ;)

I still grab a pen and paper and move the keyboard aside.
It's why I always find envelopes with weird numbers and hieroglyphs on them.
I musta been outta envelopes.

YMMV
:)
member
Activity: 67
Merit: 10
Elenelen

The pic I promised.

Beware NSFW!
Shows VRMs running nekkid in the breeze while quaffing bootleg current at an open bar ;)

http://i.imgur.com/LhXa182.jpg

YMMV
:)

Nice soldering job and wire management :) Yep, no burnt PCIe connectors for you... now you just have to figure out how to OC above 500MHz and you're golden... shouldn't take you too long though :)

Thanks, it works well.

The other end of the 12AWG wires:

http://i.imgur.com/epKz23R.jpg

2kW x 2, load sharing on a single rail.
It runs on one PSU in a pinch, but at the limit and down to only 89% efficiency.
I've got space and more PSUs for the other Nep, but will replicate the 2kW x 2 setup.
An 8kW single rail is something I don't need to meet personally at my home.
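What 89% efficiency means at that load, as a minimal Python sketch. It assumes the single PSU is delivering its full 2kW (the "at the limit" case); efficiency at other loads differs, and these figures are arithmetic, not measurements.

```python
def wall_draw(output_watts: float, efficiency: float) -> tuple[float, float]:
    """Return (input watts, watts lost as heat inside the PSU)."""
    input_watts = output_watts / efficiency
    return input_watts, input_watts - output_watts


# Assumed: one PSU delivering its full 2kW rating at the quoted 89% efficiency.
p_in, p_loss = wall_draw(2000, 0.89)
print(f"{p_in:.0f} W from the wall, {p_loss:.0f} W of heat in the PSU")
# 2247 W from the wall, 247 W of heat in the PSU
```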

It has a jacuzzi. (top center)
http://i.imgur.com/jeIRRcw.jpg

Also note (lower right) the easy access for small children to experience 120VAC firsthand.
Not TOO easy, though: short ones must use the available step stool.

And a heated outdoor patio. (or garage)
http://i.imgur.com/itLOTSL.jpg

The business end of the loop running during construction.
http://i.imgur.com/r2Qjrig.jpg

I constructed a pentagon and recited the verse and all that appeared was an empty V8 bottle.
http://i.imgur.com/9T8lRc2.jpg

I might flip the PCB so the WBs are inside the pentagon, with straight barbs, for a smaller footprint.

I usually refrain from posting prototype pics, but this prototype is less ugly than most of my monstrosities.
I hope to approach the 'pair of WC Saturns complete with PSUs in one case' level of fit and finish.

YMMV
:)

Wow.. I'm inspired to pull out my soldering iron! I'm definitely going to implement the terminal blocks, I've got way too much heat coming off of my power leads, even with a box fan. It's just asking for trouble.

Really great job!

I recommend leaving the WBs facing out. If you do get a failure, facing out should reduce the chance of fluid reaching the other boards.
member
Activity: 67
Merit: 10
Well, after two very frustrating weeks of tinkering and working with tech support, today I sent back one malfunctioning Neptune controller card and two non-hashing boxes to KNC. I've repurposed one Jupiter controller card to run the 3 orphaned Neptune boxes and was able to recover some of my hashing power. :(

Thank you to those that posted about being able to flash the Neptune image onto the Jupiter board!

In the interim I've been tinkering with the power settings on my "good" Neptune to try to optimize power, reduce VRM temps, and reduce HW errors. It's more difficult and time-consuming without BFGMiner, but here's what I did, as a guide for your own tinkering pleasure:

Draw a matrix with 5 rows and 4 columns on paper, and label the 5 rows with each of your ASIC numbers minus one. I had to remove the display panel, so my ASICs on the advanced tab were 1, 2, 3, 5 and 6 (port 4 doesn't work on this card), so I labeled my rows 0, 1, 2, 4 and 5. Label the columns 0, 1, 2 and 3.

Set all of your voltages down one notch to -0.0439 and apply. Or lower, if you are feeling lucky. :)

Use PuTTY to log in and bring up the CGMiner screen. I recommend maximizing the window to show as many rows as you can. Restart CGMiner from the terminal screen. Now watch for about 8-10 minutes for cores to be disabled due to HW errors. Don't rush this part; take at least 8-10 minutes. When you see a core disabled, it will show in the format:

asic.die.core

So 0.2.36 means ASIC 1, die 3, core 37. Put one tally mark in the matrix for every error you see.

After about 10 minutes, some of your boxes will still be empty, most will have a few marks, and a few will probably have a lot of marks. Increase the voltage on any die that has 4 or more cores that have shut off due to HW errors. The dies tie directly to the VRM pairs on the advanced page. If you get a lot of errors for 0.2.xxx, then increase the voltage on ASIC 1, die 3.
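The paper matrix can be sketched in Python as below, for anyone who would rather type the sightings in. It follows the mapping given above (the logged asic.die index plus one gives the ASIC and die numbers on the advanced tab, so 0.2.x is ASIC 1, die 3) and the "4 or more cores" rule. One choice here is an assumption: it counts distinct cores per die, so a core that trips twice is counted once. The sightings in the example are hypothetical.

```python
import re
from collections import defaultdict

TRIPLE_RE = re.compile(r"\b(\d+)\.(\d+)\.(\d+)\b")


def tally(sightings):
    """The paper matrix: key (asic, die) as logged -> set of cores seen disabled."""
    matrix = defaultdict(set)
    for text in sightings:
        m = TRIPLE_RE.search(text)
        if m:
            asic, die, core = (int(g) for g in m.groups())
            matrix[(asic, die)].add(core)
    return matrix


def dies_needing_more_voltage(matrix, threshold=4):
    """Dies with `threshold`+ distinct disabled cores, as one-based
    (ASIC, die) labels: logged 0.2.x -> ASIC 1, die 3."""
    return sorted(
        (asic + 1, die + 1)
        for (asic, die), cores in matrix.items()
        if len(cores) >= threshold
    )


# Hypothetical sightings typed in from the CGMiner screen.
seen = ["0.2.36", "0.2.40", "0.2.41", "0.2.50", "0.2.36", "4.1.7"]
for asic, die in dies_needing_more_voltage(tally(seen)):
    print(f"Raise voltage on ASIC {asic}, die {die}")  # Raise voltage on ASIC 1, die 3
```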

Also, keep an eye on the advanced tab while you do this. If a die shuts off, increase the voltage one increment and apply the settings to reinitialize/restart that die.

I was able to lower voltages on 5 dies. I actually had to increase voltages above -0.0366 on 6 dies. I was able to reduce HW errors by about 60%. Also, just about every die that is running at -0.0439 is also running at 500MHz. :)

My advanced page shows power consumption between 1470W and 1475W. My packages were thrown about during shipment and some of the metal boxes have dents... as a result I have several unhappy VRMs near 100C and needed to drop some dies as low as 425MHz to keep the VRMs at 90C or less until I can better tend to their cooling needs.

Despite the setbacks, I'm currently getting 3.3-3.4TH/s at the pool with my "good" Neptune.

newbie
Activity: 35
Merit: 0
Are you using a general PC water cooling system or a GPU one? Or something else? I am interested in doing this as well.
full member
Activity: 203
Merit: 100
sr. member
Activity: 386
Merit: 250
Only one comment: if we cannot OC it to 550MHz or higher, we cannot earn back the costs of trying (the added hardware costs have to be earned back by the extra OC hashes)... But we have a nice hobby anyway ;)


From what hno (KnCMiner) and Luke-Jr (bfgminer author) were discussing last week in IRC
about a bfgminer driver for Neptune hardware,
the required details are in the published open source cgminer Neptune driver code.
Lots has changed (my interpretation of KnC's statement).
Luke-Jr implemented many core details for the Oct and Nov products and seemed to already understand the impact.
I got the impression he has much of the foundation already in place.
I rely on bfgminer to maximize output on 28nm boxes.
Per-core HW error reporting is essential for hunting down the last few %.

I'll point out that Luke-Jr did not say whether or when he would do it.
Evidently Luke-Jr had already inquired with KnC and got a non-answer answer regarding whom best to discuss it with.
That question got answered; I have no idea what Luke-Jr's future plans or timeline are.
A test/development platform may have been discussed ;)
I recall the compatibility of a Nep cube with the existing BBB in Luke-Jr's possession was discussed.
One can hope, and politely show interest to both parties.
Luke-Jr had a throw-BTC-at-him option for priority adjustment; he may still, if anyone is so inclined.

Also this tidbit:

"cgminer 4.4.1-knc3.4. Major CPU performance improvement ........"
From 7 days ago.
The rc10 source is sitting ready to compile on git for the adventurous.

https://github.com/KnCMiner/meta-kncminer/tree/neptune/recipes-kncminer

Having an IRC client logging #kncminer (for personal use only) is often helpful.
It's a good second source for solutions to KnC-related questions.
I happened to catch the above exchange live.

YMMV
:)
full member
Activity: 203
Merit: 100
Wow! That's professional league...

Only one comment: if we cannot OC it to 550MHz or higher, we cannot earn back the costs of trying (the added hardware costs have to be earned back by the extra OC hashes)... But we have a nice hobby anyway ;)
sr. member
Activity: 386
Merit: 250
Thanks, it works well.

The other end of the 12AWG wires:

http://i.imgur.com/epKz23R.jpg

2kW x 2, load sharing on a single rail.
It runs on one PSU in a pinch, but at the limit and down to only 89% efficiency.
I've got space and more PSUs for the other Nep, but will replicate the 2kW x 2 setup.
An 8kW single rail is something I don't need to meet personally at my home.

It has a jacuzzi. (top center)
http://i.imgur.com/jeIRRcw.jpg

Also note (lower right) the easy access for small children to experience 120VAC firsthand.
Not TOO easy, though: short ones must use the available step stool.

And a heated outdoor patio. (or garage)
http://i.imgur.com/itLOTSL.jpg

The business end of the loop running during construction.
http://i.imgur.com/r2Qjrig.jpg

I constructed a pentagon and recited the verse and all that appeared was an empty V8 bottle.
http://i.imgur.com/9T8lRc2.jpg

I might flip the PCB so the WBs are inside the pentagon, with straight barbs, for a smaller footprint.

I usually refrain from posting prototype pics, but this prototype is less ugly than most of my monstrosities.
I hope to approach the 'pair of WC Saturns complete with PSUs in one case' level of fit and finish.

YMMV
:)
hero member
Activity: 784
Merit: 504
Dream become broken often
Nice soldering job and wire management :) Yep, no burnt PCIe connectors for you... now you just have to figure out how to OC above 500MHz and you're golden... shouldn't take you too long though :)
sr. member
Activity: 386
Merit: 250
Elenelen

The pic I promised.

Beware NSFW!
Shows VRMs running nekkid in the breeze while quaffing bootleg current at an open bar ;)

http://i.imgur.com/LhXa182.jpg

YMMV
:)
hero member
Activity: 784
Merit: 504
Dream become broken often
I am very closely watching this thread and your posts as well.

If you had a thermal camera, you could understand what I meant.

Check the temperatures of both the heatsink and the VRM of the problematic ones again.
If you see a huge variation, then again it is a BAD contact.
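The check being suggested, each VRM's temperature against its own heatsink's, sketched in Python. The threshold and the readings are hypothetical (taken, say, with a laser/IR thermometer); the point is only that a large VRM-to-heatsink gap points at the thermal contact, whereas a VRM whose heatsink is nearly as hot points somewhere else.

```python
def suspect_contacts(readings, max_delta=10.0):
    """Labels of VRMs running more than max_delta C above their own heatsink.

    readings: {vrm_label: (vrm_temp_c, heatsink_temp_c)}.
    max_delta is a hypothetical threshold; choose your own.
    """
    return sorted(
        label for label, (vrm, sink) in readings.items() if vrm - sink > max_delta
    )


# Hypothetical readings for three VRMs.
print(suspect_contacts({1: (85.0, 80.0), 4: (98.0, 70.0), 5: (92.0, 88.0)}))  # [4]
```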


Are you sure you are checking the right VRMs?

I do not know about the Neptune boards, but the Jups have a strange numbering.

Check here: [image not preserved]

Which would then explain why the front 4 are the cooler ones and the back 4 the hotter ones :)

tzortz, you wouldn't by chance happen to know the numbering for the Oct model, would you? And for the 8-VRM one?
sr. member
Activity: 386
Merit: 250
I am sorry, I do not own any Neptune, so I cannot check it.

But yes, there is one way you can check it yourself.
Disable mining (e.g. put in a false pool URL).

Take a soldering iron and heat each VRM separately. When you notice a temperature increase in the Advanced tab, you know which one it is.

Interesting suggestion. I might try that... (I don't need to disable mining... I can just heat it, as I have more than enough headroom.)

You might also try lowering the clock on a die to 50 and using the finger detector to find the cooler VRMs.
They cool very fast, but might not with massive heatsinks.
You can determine pairs but not individual VRMs this way.
#1 is definitely the end of the line, based on the foam dents alone.
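Both identification tricks (warming one VRM with an iron, or dropping one die's clock and feeling what cools) come down to changing one thing and seeing which Advanced-tab reading moves most. A minimal Python sketch of that comparison; the sensor labels and snapshot values are hypothetical, and as noted above, the down-clocking variant only narrows it to a VRM pair.

```python
def most_changed(before, after):
    """Label of the sensor whose reading moved most between two snapshots.

    before/after: {sensor_label: temp_c}, copied from the Advanced tab before
    and after heating one VRM (readings rise) or down-clocking one die
    (readings fall); the sign of the change is ignored.
    """
    common = before.keys() & after.keys()
    if not common:
        raise ValueError("snapshots share no sensor labels")
    return max(common, key=lambda label: abs(after[label] - before[label]))
```

Change one thing at a time and let the readings settle before taking the second snapshot, or neighboring VRMs will muddy the result.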

It may be the 12V power runs on the PCB getting hotter because they are longer on that side.
I doubt they go under the ASIC.
A thermal image might show details.
I have a PCB with 14AWG right to each VRM (pics soon).
If it lacks the symptoms when I finish the VRM cooling solution, that will add data.

Regardless of the cause, you might also try one of the foam/clay doodads on its side along that edge of the PCB's bottom side, to transfer some heat to the case. I forgot how high your lift kit is, so it might not work, but it's an easy test.
I glued heatsinks to the bottom of a 28nm PCB. It helps a bit.

YMMV
:)

EDIT (forgot to add):
Your mods ROCK! Well done!
:)
full member
Activity: 203
Merit: 100
IS IT SAFE TO HAVE ONE 16 AWG CABLE PER MODULE,
OR IS IT BETTER TO HAVE 2 CABLES PER ASIC?

My advice: don't try 500MHz. And use a fan to cool your cables and plugs (also the plugs on your PSU, if you don't use 2 cables per ASIC).
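Why splitting one ASIC's current over two cables helps, as a rough Python sketch of resistive (I^2*R) heating in the wire itself. The assumptions are loud ones: about 1470W spread over 5 ASICs at 12V (the figure from the Neptune tuning post in this thread, so about 24.5A per ASIC), standard solid-copper AWG resistances, and only the 12V conductor counted (the ground return carries the same current and roughly doubles the total). Connector and crimp resistance, which this thread treats as the usual hot spot, is not modeled at all, so this says nothing about whether a given plug is safe.

```python
# Approximate DC resistance of solid copper wire at 20 C, in ohms per meter
# (standard AWG table values; stranded wire runs slightly higher).
OHMS_PER_M = {18: 0.02095, 16: 0.01317, 14: 0.008286, 12: 0.005211}


def loss_per_meter(amps, awg, conductors=1):
    """Watts dissipated per meter of 12V conductor when `amps` is split
    evenly across `conductors` parallel wires of the given gauge."""
    per_wire = amps / conductors
    return conductors * per_wire ** 2 * OHMS_PER_M[awg]


# Assumed load: ~1470 W across 5 ASICs at 12 V -> 24.5 A per ASIC.
amps = 1470 / 5 / 12
one = loss_per_meter(amps, 16, 1)
two = loss_per_meter(amps, 16, 2)
print(f"one 16AWG: {one:.1f} W/m, two 16AWG: {two:.1f} W/m")
# one 16AWG: 7.9 W/m, two 16AWG: 4.0 W/m
```

Halving the current per wire quarters each wire's loss, so the two wires together dissipate half as much as one.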
legendary
Activity: 2408
Merit: 1004
IS IT SAFE TO HAVE ONE 16 AWG CABLE PER MODULE,
OR IS IT BETTER TO HAVE 2 CABLES PER ASIC?
full member
Activity: 203
Merit: 100
Interesting suggestion. I might try that... (I don't need to disable mining... I can just heat it, as I have more than enough headroom.)
hero member
Activity: 728
Merit: 500
I am sorry, I do not own any Neptune, so I cannot check it.

But yes, there is one way you can check it yourself.
Disable mining (e.g. put in a false pool URL).

Take a soldering iron and heat each VRM separately. When you notice a temperature increase in the Advanced tab, you know which one it is.

full member
Activity: 203
Merit: 100
I am very closely watching this thread and your posts as well.

If you had a thermal camera, you could understand what I meant.

Check the temperatures of both the heatsink and the VRM of the problematic ones again.
If you see a huge variation, then again it is a BAD contact.

I sadly have no thermal camera (I do have a laser meter, and will use it tomorrow when one of the covers is off). But I'm very, very sure that this has nothing to do with the terminal pads or the heatsinks themselves. It might be the airflow from the main fan (twisted to the left side)... but even that I cannot explain...

Oh... I just saw that you edited a photo into your last post... No, I'm not sure, actually... the only thing I know for sure is that #1 (or zero, if you start counting from zero) is on the top right side... I more or less assumed that the numbering goes from there round to the other side (clockwise).
Do you have a way to be sure how the Neptune is numbered?
hero member
Activity: 728
Merit: 500
I am very closely watching this thread and your posts as well.

If you had a thermal camera, you could understand what I meant.

Check the temperatures of both the heatsink and the VRM of the problematic ones again.
If you see a huge variation, then again it is a BAD contact.

Are you sure you are checking the right VRMs?

I do not know about the Neptune boards, but the Jups have a strange numbering.

Check here: [image not preserved]
full member
Activity: 203
Merit: 100
I believe it is a difference in how your setup is built.

If you had a thermal camera, you could see it.

Or measure (with a beam/IR meter?) the heatsink and the VRM temps of all of them.
If you see a great difference between VRM and heatsink, then there is bad thermal contact.

I am sick of KNC; it needs so much babysitting.


Have you looked at my photos? All VRMs are more or less modified with heatsinks in the same way; #4 and #5 especially are right in front of the fan.
And NO, there is no bad terminal contact: it would be very odd for 4 x 5 boxes = 20 VRMs, all on the same PCB side, to have bad terminal contacts and the other 20 on the left side not... so that's not a logical explanation.

[image not preserved]

I know this photo is from one of the boxes that has a few more heatsink fins on VRM-5 than on VRM-4, but on my other 4 boxes both heatsink sizes are exactly the same, and all those boxes have the same problem: all four VRMs on the right side are 10 degrees hotter than the ones on the left side... I have no logical explanation...