
Topic: Hacking The KNC Firmware: Overclocking - page 56. (Read 144343 times)

hero member
Activity: 812
Merit: 502
December 27, 2013, 10:03:37 AM
I just wanted to report that my Oct Jupiter has been running at ~950GH/s (6-modules) and pulling between 57-60 amps per VRM for about a week now. Temps range between 57-78.5 degrees Celsius. So far so good!

You are a brave man for pulling much more than the recommended max safe current :o

You use 211 I assume?
full member
Activity: 226
Merit: 100
December 27, 2013, 08:43:56 AM
I just wanted to report that my Oct Jupiter has been running at ~950GH/s (6-modules) and pulling between 57-60 amps per VRM for about a week now. Temps range between 57-78.5 degrees Celsius. So far so good!
legendary
Activity: 1260
Merit: 1008
December 26, 2013, 03:49:43 PM

From my limited experience I think that both points apply to everyone. For what it's worth, I've been able to verify only the second one: every time I overclock the miner, a good number of cores get disabled during the first two minutes of the new cgminer session. Usually they're concentrated in one or two dies.
In my case dies are being disabled (1 die = 48 cores). If just cores are disabled then that is easy to fix - just apply more voltage to stabilize them.

ok got it

On the other hand, I've never tried to verify the first point, mainly because I don't know how to do it. How do you know for sure that the clock has been reset? Do you look at the Amps, or is there any way to read the value of a PLL register?

Yes, voltage drops. For example you apply the overclock and you see a VRM working at 54A. You change the voltage setting just 1-2 values lower and after you hit apply the Amps drop from 54 to 44, which means the higher overclock frequency is no longer applied.

ok.

Next time I have to do it I will pay attention to the Amps values just after changing the voltage. In the meantime I'll also try to find a way to read the PLL register that contains the clock settings.

Instead of increasing the voltage to the maximum value, I just set it a little bit higher, taking into account how many cores are disabled in a particular die. Then I restart cgminer and wait 1 minute to see if the changes make any difference.

I use this approach because I don't want to cook my ASICs/VRMs.

I've tried that, but it doesn't work as effectively as applying max voltage. With max voltage the sleepy dies usually kick in immediately.
Also I have a 2nd theory, which I haven't tested a lot: if you supply sufficient voltage to a sleepy die, it might wake up after 2-6 hours. But I don't think there is any consistency in the results with this method, and I prefer to wake them up immediately with stress/shock voltage rather than waiting hours for them to wake up naturally, which is not guaranteed to happen.

Since the other methods you have tried have failed, why not give this theory of yours a try?

I've also increased the SPI frequency because of what 'orama said in one of his last posts:
How much? I played with this and I usually stick to 256000Mhz. I tried even more, but I can't see any correlation between this and any results.

Actually I'm overclocking only slightly, using 1F1, with an SPI frequency of 299707; previously I used 256000. I haven't seen any difference, though.

Another thing I do is take note of all the changes I apply along the way (a good habit is copying /config/advanced.conf at different moments in time).

To check the distribution of disabled cores I use a modified version of a Perl script included in bertmod. It is an ASCII version; it only outputs temps and disabled cores per die, e.g.




How exactly did you do that?

The bertmod 0.2.X bin file contained a Perl script, asic_status.pl, used to generate a modified version of the status page. A btctalk user (I don't remember the name, sorry) changed it to just print a few pieces of info on standard output.
 
Quote
You created an additional page within the lighttpd server?

No, I've just placed the modified script into /config (just to make it persistent across reboots). To use it I log in using ssh and execute from the command line:

Code:
perl /config/asice_status_ascii.pl 

if you find it useful I could upload the file somewhere...


hero member
Activity: 812
Merit: 502
December 26, 2013, 01:31:32 PM

From my limited experience I think that both points apply to everyone. For what it's worth, I've been able to verify only the second one: every time I overclock the miner, a good number of cores get disabled during the first two minutes of the new cgminer session. Usually they're concentrated in one or two dies.
In my case dies are being disabled (1 die = 48 cores). If just cores are disabled then that is easy to fix - just apply more voltage to stabilize them.

On the other hand, I've never tried to verify the first point, mainly because I don't know how to do it. How do you know for sure that the clock has been reset? Do you look at the Amps, or is there any way to read the value of a PLL register?

Yes, voltage drops. For example you apply the overclock and you see a VRM working at 54A. You change the voltage setting just 1-2 values lower and after you hit apply the Amps drop from 54 to 44, which means the higher overclock frequency is no longer applied.

Instead of increasing the voltage to the maximum value, I just set it a little bit higher, taking into account how many cores are disabled in a particular die. Then I restart cgminer and wait 1 minute to see if the changes make any difference.

I use this approach because I don't want to cook my ASICs/VRMs.
I've tried that, but it doesn't work as effectively as applying max voltage. With max voltage the sleepy dies usually kick in immediately.
Also I have a 2nd theory, which I haven't tested a lot: if you supply sufficient voltage to a sleepy die, it might wake up after 2-6 hours. But I don't think there is any consistency in the results with this method, and I prefer to wake them up immediately with stress/shock voltage rather than waiting hours for them to wake up naturally, which is not guaranteed to happen.

I've also increased the SPI frequency because of what 'orama said in one of his last posts:
How much? I played with this and I usually stick to 256000Mhz. I tried even more, but I can't see any correlation between this and any results.

Another thing I do is take note of all the changes I apply along the way (a good habit is copying /config/advanced.conf at different moments in time).

To check the distribution of disabled cores I use a modified version of a Perl script included in bertmod. It is an ASCII version; it only outputs temps and disabled cores per die, e.g.

Code:
 Board 0: Temperature sensor: 47.5C
 DIE 0 ON: 46 OFF: 2  95.8% OK
 DIE 1 ON: 48 OFF: 0  100% OK
 DIE 2 ON: 48 OFF: 0  100% OK
 DIE 3 ON: 48 OFF: 0  100% OK
Board 2: Temperature sensor: 64.0C
 DIE 0 ON: 47 OFF: 1  97.9% OK
 DIE 1 ON: 48 OFF: 0  100% OK
 DIE 2 ON: 48 OFF: 0  100% OK
 DIE 3 ON: 48 OFF: 0  100% OK
Board 3: Temperature sensor: 55.0C
 DIE 0 ON: 48 OFF: 0  100% OK
 DIE 1 ON: 48 OFF: 0  100% OK
 DIE 2 ON: 48 OFF: 0  100% OK
 DIE 3 ON: 48 OFF: 0  100% OK
Board 4: Temperature sensor: 49.0C
 DIE 0 ON: 48 OFF: 0  100% OK
 DIE 1 ON: 48 OFF: 0  100% OK
 DIE 2 ON: 48 OFF: 0  100% OK
 DIE 3 ON: 48 OFF: 0  100% OK



How exactly did you do that?
You created an additional page within the lighttpd server?
legendary
Activity: 1260
Merit: 1008
December 26, 2013, 11:54:38 AM


So you're implying that setting the clock to the default value lowers the Amps, despite the fact that you're increasing the voltage per die?
I have 2 problems with the current state of overclocking:
Problem 1 applies to everyone:
Setting the voltage on the Advanced tab resets the clock.
Problem 2 applies to me only (and maybe other people):
Setting any overclock in cgminer.sh kills a number of dies. I think it happens because when you increase the frequency of the chip the current required increases, which creates a change in the voltage/current values and so this change makes the dies sleepy.

From my limited experience I think that both points apply to everyone. For what it's worth, I've been able to verify only the second one: every time I overclock the miner, a good number of cores get disabled during the first two minutes of the new cgminer session. Usually they're concentrated in one or two dies. On the other hand, I've never tried to verify the first point, mainly because I don't know how to do it. How do you know for sure that the clock has been reset? Do you look at the Amps, or is there any way to read the value of a PLL register?
 


If the SPI frequency is too low then there is not enough bandwidth to collect all the good nonces found. So you want to find an equilibrium whereby the SPI frequency is high enough not to miss any of the nonces found, but low enough to retain a healthy signal-to-noise ratio and thus minimise hardware errors.


Another thing I do is take note of all the changes I apply along the way (a good habit is copying /config/advanced.conf at different moments in time).
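
A minimal sketch of that habit, assuming a POSIX shell on the miner. The function name snapshot_conf is made up for illustration and is not part of the firmware or bertmod; by default the copy lands next to the original, so a file in /config stays on persistent storage.

```shell
# Hypothetical helper: copy a config file to a timestamped backup.
# Usage: snapshot_conf /config/advanced.conf [backup_dir]
snapshot_conf() {
    src="$1"
    # Default to the directory the source file lives in.
    dest_dir="${2:-$(dirname "$src")}"
    [ -f "$src" ] || { echo "missing: $src" >&2; return 1; }
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$dest_dir/$(basename "$src").$stamp"
    # Print the backup path on success so it can be logged or diffed later.
    cp "$src" "$dest" && echo "$dest"
}
```

Run it as `snapshot_conf /config/advanced.conf` before each change; two snapshots can then be compared with diff to see what a given Apply actually altered.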

To check the distribution of disabled cores I use a modified version of a Perl script included in bertmod. It is an ASCII version; it only outputs temps and disabled cores per die, e.g.

Code:
 Board 0: Temperature sensor: 47.5C
 DIE 0 ON: 46 OFF: 2  95.8% OK
 DIE 1 ON: 48 OFF: 0  100% OK
 DIE 2 ON: 48 OFF: 0  100% OK
 DIE 3 ON: 48 OFF: 0  100% OK
Board 2: Temperature sensor: 64.0C
 DIE 0 ON: 47 OFF: 1  97.9% OK
 DIE 1 ON: 48 OFF: 0  100% OK
 DIE 2 ON: 48 OFF: 0  100% OK
 DIE 3 ON: 48 OFF: 0  100% OK
Board 3: Temperature sensor: 55.0C
 DIE 0 ON: 48 OFF: 0  100% OK
 DIE 1 ON: 48 OFF: 0  100% OK
 DIE 2 ON: 48 OFF: 0  100% OK
 DIE 3 ON: 48 OFF: 0  100% OK
Board 4: Temperature sensor: 49.0C
 DIE 0 ON: 48 OFF: 0  100% OK
 DIE 1 ON: 48 OFF: 0  100% OK
 DIE 2 ON: 48 OFF: 0  100% OK
 DIE 3 ON: 48 OFF: 0  100% OK
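
Output in this shape is easy to post-process. A rough sketch, assuming the script prints exactly the "Board N:" and "DIE N ON: x OFF: y" lines shown above; summarise_dies is a hypothetical name, not something shipped with bertmod.

```shell
# Hypothetical helper: read the ASCII status output on stdin and print
# the total number of enabled and disabled cores for each board.
summarise_dies() {
    awk '
        # Print the totals collected for the current board, if any.
        function emit() {
            if (board != "") printf "board %s: %d on, %d off\n", board, on, off
        }
        # "Board 0: Temperature sensor: 47.5C" starts a new board.
        $1 == "Board" { emit(); board = $2; sub(/:$/, "", board); on = 0; off = 0 }
        # "DIE 0 ON: 46 OFF: 2 ..." has the counts in fields 4 and 6.
        $1 == "DIE"   { on += $4; off += $6 }
        END           { emit() }
    '
}
```

It would be fed from the script itself, e.g. by piping the perl command's output into `summarise_dies`, which makes it simple to log one line per board after every voltage or clock change.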

hero member
Activity: 812
Merit: 502
December 25, 2013, 10:10:47 PM


So you're implying that setting the clock to the default value lowers the Amps, despite the fact that you're increasing the voltage per die?
I have 2 problems with the current state of overclocking:
Problem 1 applies to everyone:
Setting the voltage on the Advanced tab resets the clock.
Problem 2 applies to me only (and maybe other people):
Setting any overclock in cgminer.sh kills a number of dies. I think it happens because when you increase the frequency of the chip the current required increases, which creates a change in the voltage/current values and so this change makes the dies sleepy.

This is what I do: on boards with sleepy dies I increase voltage to the max and when I see they kick in and are alive I immediately lower it to safe values, but not too low so they don't fall asleep again.
Then I restart cgminer.sh, where the overclock values are, and this makes the dies fall asleep again even though the voltage hasn't changed (just the current changes, as higher frequency results in higher current).


nothing.

your overclock settings will stay in place, but your change to voltage won't get applied :/

edit:

give IRC a try and see if hno is hanging around..


So it won't work. To be honest I don't see any certain way to achieve the same result twice. I'm just fiddling with all the settings/values I can change hoping it will work.
2-3 days ago this same miner was happily hashing overclocked at around 600GH/s
Now it is hard even to make it hash at stock, as some dies are very hard to keep awake.
legendary
Activity: 1260
Merit: 1008
December 25, 2013, 08:52:01 PM

So every time you click the "Apply" button on the "Advanced" tab, the clock is reset to default?

edit:

I've gone through the code and it seems that what happens when you click on Apply is the execution of these commands:

Code:
    waas -c /config/advanced.conf > /dev/null
    killall monitordcdc

In /config/advanced.conf there's a JSON representation of the data contained in the Advanced tab.

This means that the waas command resets the PLL to its default values. Unfortunately there's no source code for the waas executable.


Yes, I believe so as the Amps drop.


So you're implying that setting the clock to the default value lowers the Amps, despite the fact that you're increasing the voltage per die?

So what would happen if I remove that command?

nothing.

your overclock settings will stay in place, but your change to voltage won't get applied :/

edit:

give IRC a try and see if hno is hanging around..
hero member
Activity: 812
Merit: 502
December 25, 2013, 08:36:05 PM
So I have this very temperamental miner with around 6-7 sleepy dies on 3 boards with the 4th board being OK.
At stock clock I can easily awaken them by applying max voltage to all 4 dies of a single board and when I see them kick in I quickly lower them back to safe values.

But whenever I try to overclock it the dies fall asleep, and because I need to change voltage settings to awaken them the overclock disappears :(
And that same miner used to be overclocked with all dies working, but I don't remember how I did it.

Any ideas? Is there a way to NOT remove the overclock when changing voltage settings?

So every time you click the "Apply" button on the "Advanced" tab, the clock is reset to default?

edit:

I've gone through the code and it seems that what happens when you click on Apply is the execution of these commands:

Code:
    waas -c /config/advanced.conf > /dev/null
    killall monitordcdc

In /config/advanced.conf there's a JSON representation of the data contained in the Advanced tab.

This means that the waas command resets the PLL to its default values. Unfortunately there's no source code for the waas executable.


Yes, I believe so as the Amps drop.

So what would happen if I remove that command?
legendary
Activity: 1260
Merit: 1008
December 25, 2013, 07:51:02 PM
So I have this very temperamental miner with around 6-7 sleepy dies on 3 boards with the 4th board being OK.
At stock clock I can easily awaken them by applying max voltage to all 4 dies of a single board and when I see them kick in I quickly lower them back to safe values.

But whenever I try to overclock it the dies fall asleep, and because I need to change voltage settings to awaken them the overclock disappears :(
And that same miner used to be overclocked with all dies working, but I don't remember how I did it.

Any ideas? Is there a way to NOT remove the overclock when changing voltage settings?

So every time you click the "Apply" button on the "Advanced" tab, the clock is reset to default?

edit:

I've gone through the code and it seems that what happens when you click on Apply is the execution of these commands:

Code:
    waas -c /config/advanced.conf > /dev/null
    killall monitordcdc

In /config/advanced.conf there's a JSON representation of the data contained in the Advanced tab.

This means that the waas command resets the PLL to its default values. Unfortunately there's no source code for the waas executable.
hero member
Activity: 812
Merit: 502
December 25, 2013, 06:13:45 PM
So I have this very temperamental miner with around 6-7 sleepy dies on 3 boards with the 4th board being OK.
At stock clock I can easily awaken them by applying max voltage to all 4 dies of a single board and when I see them kick in I quickly lower them back to safe values.

But whenever I try to overclock it the dies fall asleep, and because I need to change voltage settings to awaken them the overclock disappears :(
And that same miner used to be overclocked with all dies working, but I don't remember how I did it.

Any ideas? Is there a way to NOT remove the overclock when changing voltage settings?
hero member
Activity: 812
Merit: 502
December 24, 2013, 07:30:30 PM
I found something interesting, and I think it's my problem. Please check on yours (October miner): ls /sys/class/gpio/. What is the last gpiochip? I have only 96, which matches my enabled cores; there is no directory for the others (there should be 192).
How can I copy /sys/class/gpio/gpiochipXXX with a new name, or from another device, to /sys/class/gpio/*.*?

Those are the GPIO pins of the BBB; they have nothing to do with ASICs, dies, or cores.
GPIO = General Purpose Input/Output

The miner only uses a few of them.

For anyone still wondering how to get bfgminer + overclocking, the easiest way is to install bertmod and then make a copy of cgminer.sh:

Code:
cp /etc/init.d/cgminer.sh /config/bfg1.sh


Then you need to edit the newly created file by:
Code:
vi /config/bfg1.sh

Delete everything by pressing this on your keyboard: 120dd
Make sure all text is gone.
Then press i and paste everything from this into your file: http://pastebin.com/fFCWngnq
Then press :x! and then Enter
Finally do /config/bfg1.sh restart

The reason BFGminer didn't want to start with the original cgminer.sh file even after editing it is this part. It basically checks if the BFGminer setting is checked in the web interface and then starts the appropriate software accordingly:

Code:
MINING_SW=`ls -l /usr/bin/cgminer`
if [ "`echo $MINING_SW | grep bfgminer`" != "" ] ; then
        export LD_LIBRARY_PATH=/usr/bfgminer/
        start-stop-daemon -b -S -x screen -- -S cgminer -t cgminer -m -d "$DAEMON" --api-listen -c /config/cgminer.conf -S knc:auto
else
        start-stop-daemon -b -S -x screen -- -S cgminer -t cgminer -m -d "$DAEMON" --api-listen --default-config /config/cgminer.conf
fi
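
To see which branch a given box would take without restarting anything, the same test can be done by hand. This is only a sketch: it assumes /usr/bin/cgminer is a symlink whose target mentions bfgminer when that option is selected (which is what the grep on the ls -l output suggests), and mining_sw is a hypothetical name. It uses readlink instead of parsing ls output.

```shell
# Hypothetical helper: report which miner a symlink resolves to.
# The default path is the one the init script inspects.
mining_sw() {
    link="${1:-/usr/bin/cgminer}"
    # readlink fails on a regular file; fall back to the path itself.
    target=$(readlink "$link") || target="$link"
    case "$target" in
        *bfgminer*) echo bfgminer ;;
        *)          echo cgminer ;;
    esac
}
```

Calling `mining_sw` with no argument checks the default path; if it prints cgminer, the stock init script will take the else branch and start cgminer.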

What I've noticed is that in order to stabilize bad dies with cores shutting off and on, you need to increase the voltage so that the total current (Amps) is around 50. Again, according to an engineer from KNC we should not go above 50A per VRM, but Bitcoinorama said 64A max. Until this is settled, don't go too far. Remember that increasing the frequency also increases the current, so depending on your miner you might have to reduce it at the Advanced tab.

If you are not sure what you are doing, better not to start.
sr. member
Activity: 386
Merit: 250
December 24, 2013, 06:45:57 PM
I found something interesting, and I think it's my problem. Please check on yours (October miner): ls /sys/class/gpio/. What is the last gpiochip? I have only 96, which matches my enabled cores; there is no directory for the others (there should be 192).
How can I copy /sys/class/gpio/gpiochipXXX with a new name, or from another device, to /sys/class/gpio/*.*?

Those are the GPIO pins of the BBB; they have nothing to do with ASICs, dies, or cores.
GPIO = General Purpose Input/Output

The miner only uses a few of them.
legendary
Activity: 1260
Merit: 1008
December 24, 2013, 05:03:29 PM
I found something interesting, and I think it's my problem. Please check on yours (October miner): ls /sys/class/gpio/. What is the last gpiochip? I have only 96, which matches my enabled cores; there is no directory for the others (there should be 192).
How can I copy /sys/class/gpio/gpiochipXXX with a new name, or from another device, to /sys/class/gpio/*.*?

I don't know what you mean, but this is the content of the aforementioned dir on my October miner.

Code:
root@mine:~# ls -l /sys/class/gpio/
--w-------    1 root     root          4096 Jan  1  2000 export
lrwxrwxrwx    1 root     root             0 Jan  1  2000 gpio49 -> ../../devices/virtual/gpio/gpio49
lrwxrwxrwx    1 root     root             0 Jan  1  2000 gpio59 -> ../../devices/virtual/gpio/gpio59
lrwxrwxrwx    1 root     root             0 Jan  1  2000 gpio66 -> ../../devices/virtual/gpio/gpio66
lrwxrwxrwx    1 root     root             0 Jan  1  2000 gpio67 -> ../../devices/virtual/gpio/gpio67
lrwxrwxrwx    1 root     root             0 Jan  1  2000 gpio69 -> ../../devices/virtual/gpio/gpio69
lrwxrwxrwx    1 root     root             0 Jan  1  2000 gpio70 -> ../../devices/virtual/gpio/gpio70
lrwxrwxrwx    1 root     root             0 Jan  1  2000 gpio71 -> ../../devices/virtual/gpio/gpio71
lrwxrwxrwx    1 root     root             0 Jan  1  2000 gpio76 -> ../../devices/virtual/gpio/gpio76
lrwxrwxrwx    1 root     root             0 Dec 24 20:57 gpiochip0 -> ../../devices/virtual/gpio/gpiochip0
lrwxrwxrwx    1 root     root             0 Dec 24 20:57 gpiochip32 -> ../../devices/virtual/gpio/gpiochip32
lrwxrwxrwx    1 root     root             0 Dec 24 20:57 gpiochip64 -> ../../devices/virtual/gpio/gpiochip64
lrwxrwxrwx    1 root     root             0 Dec 24 20:57 gpiochip96 -> ../../devices/virtual/gpio/gpiochip96
--w-------    1 root     root          4096 Dec 24 20:57 unexport

Anyway, I don't think you can just "create/copy" something in /sys. sysfs is the way modern Linux kernels export information about hardware to user space, so if something is missing there, it is probably because there's no such thing on the hardware side, or at least the kernel is not able to deal with it.
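
To see what those gpiochip entries actually describe, each one exposes a `base` (first GPIO number) and `ngpio` (number of lines) file in the sysfs GPIO interface. A small sketch, with list_gpiochips being a hypothetical name and the optional argument existing only so the function can be pointed at another directory:

```shell
# Hypothetical helper: list each GPIO controller with its first line
# number (base) and how many lines it provides (ngpio).
list_gpiochips() {
    root="${1:-/sys/class/gpio}"
    for chip in "$root"/gpiochip*; do
        # Skip the unexpanded pattern when no controller is present.
        [ -d "$chip" ] || continue
        printf '%s base=%s ngpio=%s\n' \
            "$(basename "$chip")" "$(cat "$chip/base")" "$(cat "$chip/ngpio")"
    done
}
```

If gpiochip96 reports ngpio=32, it is simply the controller for lines 96 to 127 on the board, so the 96 in its name is a starting offset and not a count of enabled cores.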
legendary
Activity: 1260
Merit: 1008
December 24, 2013, 04:54:30 PM

Looks good. But please, don't help add a couple hundred THs to the network by being the hero and writing an overclocking tutorial!


wow this is bold. I have no words.

Imagine what would have happened if tolip_wen had applied the same reasoning...

We would have had to wait until January for the new firmware with built-in overclocking :)

You can't be sure, because tolip_wen's findings could have influenced KnC's choice to release an OC-ready firmware.

The thing that really annoys me is the attitude. Without the sharing of knowledge almost all the bitcoin ecosystem would not exist at all.


What is stopping you from providing such a tutorial?

I'm not knowledgeable enough, otherwise I would have done it. As simple as that.
newbie
Activity: 12
Merit: 0
December 24, 2013, 04:46:12 PM
I found something interesting, and I think it's my problem. Please check on yours (October miner): ls /sys/class/gpio/. What is the last gpiochip? I have only 96, which matches my enabled cores; there is no directory for the others (there should be 192).
How can I copy /sys/class/gpio/gpiochipXXX with a new name, or from another device, to /sys/class/gpio/*.*?
hero member
Activity: 812
Merit: 502
December 24, 2013, 03:03:21 PM

Looks good. But please, don't help add a couple hundred THs to the network by being the hero and writing an overclocking tutorial!


wow this is bold. I have no words.

Imagine what would have happened if tolip_wen had applied the same reasoning...

We would have had to wait until January for the new firmware with built-in overclocking :)

You can't be sure, because tolip_wen's findings could have influenced KnC's choice to release an OC-ready firmware.

The thing that really annoys me is the attitude. Without the sharing of knowledge almost all the bitcoin ecosystem would not exist at all.


What is stopping you from providing such a tutorial?
legendary
Activity: 1260
Merit: 1008
December 24, 2013, 02:14:34 PM

Looks good. But please, don't help add a couple hundred THs to the network by being the hero and writing an overclocking tutorial!


wow this is bold. I have no words.

Imagine what would have happened if tolip_wen had applied the same reasoning...

We would have had to wait until January for the new firmware with built-in overclocking :)

You can't be sure, because tolip_wen's findings could have influenced KnC's choice to release an OC-ready firmware.

The thing that really annoys me is the attitude. Without the sharing of knowledge almost all the bitcoin ecosystem would not exist at all.
hero member
Activity: 812
Merit: 502
December 24, 2013, 02:09:49 PM

Looks good. But please, don't help add a couple hundred THs to the network by being the hero and writing an overclocking tutorial!


wow this is bold. I have no words.

Imagine what would have happened if tolip_wen had applied the same reasoning...

We would have had to wait until January for the new firmware with built-in overclocking :)
legendary
Activity: 1260
Merit: 1008
December 24, 2013, 02:04:48 PM

Looks good. But please, don't help add a couple hundred THs to the network by being the hero and writing an overclocking tutorial!


wow this is bold. I have no words.

Imagine what would have happened if tolip_wen had applied the same reasoning...
legendary
Activity: 1260
Merit: 1008
December 24, 2013, 02:00:01 PM
DC/DC is off after running the diagnostic tool for November :( I tried enable-core, recovery and different firmware. :(

Did you run the Nov diagnostic on an October miner?