Topic: Ubuntu Nvidia Overclocking, Heat and Power Management (Read 918 times)

newbie
Activity: 49
Merit: 0
Can I use nvidia-smi to set the GPU core clock, memory clock and fan speed? I ask because nvidia-settings needs Xorg to be running first, and Xorg consumes a small amount of VRAM.
member
Activity: 223
Merit: 21
DCAB
Hey guys,

Can anyone tell me why I am not able to set individual fan speeds on my Ubuntu 7x 1060 rig?

Not working (any #):
Code:
rig1@rig1:~$ nvidia-settings -a [gpu:#]/GPUTargetFanSpeed=70
rig1@rig1:~$ nvidia-settings -a '[gpu:#]/GPUTargetFanSpeed=70'
rig1@rig1:~$ nvidia-settings -a "[gpu:#]/GPUTargetFanSpeed=70"


rig1@rig1:~$

Working:
Code:
rig1@rig1:~$ nvidia-settings -a GPUTargetFanSpeed=70

  Attribute 'GPUTargetFanSpeed' (rig1:0[fan:0]) assigned value 70.
  Attribute 'GPUTargetFanSpeed' (rig1:0[fan:1]) assigned value 70.
  Attribute 'GPUTargetFanSpeed' (rig1:0[fan:2]) assigned value 70.
  Attribute 'GPUTargetFanSpeed' (rig1:0[fan:3]) assigned value 70.
  Attribute 'GPUTargetFanSpeed' (rig1:0[fan:4]) assigned value 70.
  Attribute 'GPUTargetFanSpeed' (rig1:0[fan:5]) assigned value 70.
  Attribute 'GPUTargetFanSpeed' (rig1:0[fan:6]) assigned value 70.
rig1@rig1:~$


BTW: setting
Code:
nvidia-settings -a [gpu:1]/GPUFanControlState=1
works (for any #).

best regards
Code:
$ nvidia-settings -a [fan:1]/GPUTargetFanSpeed=70

I'm pretty sure they've already tried that, just were expressing the number as #.
[fan:#] instead of [gpu:#].

Ah yes, you are correct. Thank you
hero member
Activity: 630
Merit: 502
[fan:#] instead of [gpu:#].
member
Activity: 223
Merit: 21
DCAB
I'm pretty sure they've already tried that, just were expressing the number as #.
hero member
Activity: 630
Merit: 502
Code:
$ nvidia-settings -a [fan:1]/GPUTargetFanSpeed=70
member
Activity: 223
Merit: 21
DCAB
What issues did you see with the driver? I'd like to update the primary post with that as a common pitfall.
newbie
Activity: 9
Merit: 0
Which driver version are you using? I remember having a similar issue once when I was building my first rig.
member
Activity: 223
Merit: 21
DCAB
Not sure, could you go through the steps you tried then run "$ history" so I can validate your command sequence?
newbie
Activity: 12
Merit: 0
Hey guys,

Can anyone tell me why I am not able to set individual fan speeds on my Ubuntu 7x 1060 rig?

Not working (any #):
Code:
rig1@rig1:~$ nvidia-settings -a [gpu:#]/GPUTargetFanSpeed=70
rig1@rig1:~$ nvidia-settings -a '[gpu:#]/GPUTargetFanSpeed=70'
rig1@rig1:~$ nvidia-settings -a "[gpu:#]/GPUTargetFanSpeed=70"


rig1@rig1:~$

Working:
Code:
rig1@rig1:~$ nvidia-settings -a GPUTargetFanSpeed=70

  Attribute 'GPUTargetFanSpeed' (rig1:0[fan:0]) assigned value 70.
  Attribute 'GPUTargetFanSpeed' (rig1:0[fan:1]) assigned value 70.
  Attribute 'GPUTargetFanSpeed' (rig1:0[fan:2]) assigned value 70.
  Attribute 'GPUTargetFanSpeed' (rig1:0[fan:3]) assigned value 70.
  Attribute 'GPUTargetFanSpeed' (rig1:0[fan:4]) assigned value 70.
  Attribute 'GPUTargetFanSpeed' (rig1:0[fan:5]) assigned value 70.
  Attribute 'GPUTargetFanSpeed' (rig1:0[fan:6]) assigned value 70.
rig1@rig1:~$


BTW: setting
Code:
nvidia-settings -a [gpu:1]/GPUFanControlState=1
works (for any #).

best regards
hero member
Activity: 630
Merit: 502
Reducing power limit does affect the core voltage. I don't think you can see the voltage using nvidia-smi in Linux but with Windows software you can see how many mV the GPU is using before and after changing power limit.
hero member
Activity: 687
Merit: 511
Technically I don't think the power limiting in the NVIDIA driver is the same as undervolting, which you can do pretty easily on ATI cards or on some ASICs. On NVIDIA, you can limit the maximum number of watts the card will use, but I believe it's still working at the same core voltage... With something like ATI, you can detune off of 12v, and that basically allows the same performance with less power consumption.

So, short answer: everyone on the NVIDIA side seems to get their efficiency gains from power limiting, which you already documented. If you take something like WhatToMine, their calculations assume clocking up the core and memory and reducing the power limit.
member
Activity: 223
Merit: 21
DCAB
I guess technically it could be since Watts = Volts * Amps (depends how it’s managed under the hood, which I’m not familiar with), but that knowledge alone is worth adding to the guide for someone new. I personally haven’t gone down that road so I don’t feel qualified to provide information on the subject. If you’re willing to help add what you know I’d be happy to contribute the mining power to you.
hero member
Activity: 630
Merit: 502
Isn't reducing the power limit already undervolting the GPU?
member
Activity: 223
Merit: 21
DCAB
Undervolting the cards. I haven’t gone down that path yet, but I’ve heard you can get better results that way.
hero member
Activity: 630
Merit: 502
Doesn't
Code:
sudo nvidia-smi -i 0 -pl 150
already increase or reduce the power limit for individual cards? What do you think needs elaborating?
member
Activity: 223
Merit: 21
DCAB
Any takers?
member
Activity: 223
Merit: 21
DCAB
If anyone can help flesh out the knowledge base here on undervolting cards on Ubuntu, I’m willing to offer a bounty of 24 hours of mining time on any equihash, ethash or cryptonight based coin, to the wallet/pool of your choice, with the following rig:

1x GTX 1080ti
3x GTX 1070
member
Activity: 223
Merit: 21
DCAB
Thank you for the kind words! I was hoping I could help save some people some time and clarify some of the more esoteric concepts with Ubuntu overclocking. Glad it helped you and thank you for the merit :)
hero member
Activity: 687
Merit: 511
Wow, great writeup - you get merit for that! :) I actually haven't seen a good writeup on the coolbits part before and always wondered about the significance of the values (never a huge fan of magic numbers).
member
Activity: 223
Merit: 21
DCAB
It is also worth noting that the nvidia driver install will sometimes fail to blacklist nouveau, in which case that needs to be done manually.
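
For anyone who hits this, a minimal sketch of the manual blacklist is below. The helper name write_nouveau_blacklist and the file name blacklist-nouveau.conf are my own choices for illustration (any .conf file under /etc/modprobe.d/ should work), so treat this as one common approach and not the only one.

```shell
# Write a modprobe blacklist for nouveau.
# write_nouveau_blacklist is a hypothetical helper name; the default target
# is an assumed, conventional location under /etc/modprobe.d/.
write_nouveau_blacklist() {   # usage: write_nouveau_blacklist [TARGET_FILE]
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' \
        > "${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
}

# As root:
#   write_nouveau_blacklist
#   update-initramfs -u     # rebuild the initramfs so the blacklist applies at boot
#   reboot
# Afterwards, "lsmod | grep nouveau" should print nothing.
```

The function only writes the file; rebuilding the initramfs and rebooting are left as manual steps so nothing happens by surprise.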
member
Activity: 223
Merit: 21
DCAB
Let me preface this by saying I've had massive amounts of headaches scrounging the internet trying to get my nvidia overclocking/power management working on ubuntu. I'm sure this guide is going to miss some edge cases I didn't hit myself. If you spot one please let me know and I'll be sure to update this post.

Before you begin, please note that one of the most frustrating pitfalls when enabling manual control of your GPUs is trying to use integrated graphics. This causes gpu-manager to treat your machine as a "hybrid system", which nullifies your generated configuration. Unless you really know what you're doing and can write an xorg.conf file by hand, DO NOT USE INTEGRATED GRAPHICS. In fact, it is generally safer to disable the integrated graphics altogether in the BIOS.

1) Driver
For starters, you'll need a driver. Currently, I rely on the Ubuntu optimized apt package 'nvidia-384' which can be pulled by running:

Code:
$ sudo apt-get install nvidia-384

The exact driver version doesn't really matter; you just need one that is compatible with Ubuntu. I would recommend installing via either apt or a deb package, since the runfile can cause headaches down the line when you need to upgrade the drivers. For now, this guide is aimed at people relatively new to Ubuntu, and I will not be covering voltage control. If someone wants to help with that, please DM me or post here.

Once the driver is installed, reboot.

2) Enabling Manual Control
On Ubuntu and other Debian-based systems, you'll need to set the Coolbits option in your xorg.conf file. This is a bitmask in which each bit enables control of a different aspect of the graphics card. You can find the breakdown online, but I will provide it here as well alongside an explanation.

For those unfamiliar with binary, or those who only want to run the recommended settings, please skip section 2.1 and go straight to section 2.2, Recommended Settings.

2.1) Coolbits Bit Sequence
[0/1]  [0/1]  [0/1]  [0/1]  [0/1]
  4      3      2      1      0

Bit 0 (value 1): Enables overclocking of pre-Fermi cards
Bit 1 (value 2): Attempts to initialize SLI for cards with different memory amounts (this one can and should be left unset; enable it at your own peril)
Bit 2 (value 4): Enables manual configuration of fan speed
Bit 3 (value 8): Enables core/memory clock overclocking
Bit 4 (value 16): Enables voltage control

2.2) Recommended Settings
For most cards, you're going to want a Coolbits value of "28", which is bits 2, 3, and 4 set (binary 11100, i.e. 4 + 8 + 16). If you do have pre-Fermi cards, your Coolbits value would be "21", which is bits 0, 2 and 4 set (binary 10101, i.e. 1 + 4 + 16).
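
If you want to work out a different combination, the arithmetic can be sketched in the shell. The function name coolbits_value is hypothetical and only there for illustration; it ORs together one bit per position you pass it.

```shell
# Compute a Coolbits value from a list of bit positions (0-4).
# coolbits_value is a hypothetical helper name used only for this example.
coolbits_value() {
    local value bit
    value=0
    for bit in "$@"; do
        value=$(( value | (1 << bit) ))   # set the bit at this position
    done
    echo "$value"
}

coolbits_value 2 3 4   # fan control + overclocking + voltage control, prints 28
coolbits_value 0 2 4   # pre-Fermi variant from section 2.2, prints 21
```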

2.3) Setting the xorg.conf file
To set the Coolbits option, run the following command with your recommended or calculated value from sections 2.1 and 2.2. I'm using "28" as it is the most common.

Code:
$ sudo nvidia-xconfig -a --cool-bits 28

This command is using both the '-a' flag (enables all GPUs in the xorg.conf file) and the '--cool-bits' flag to set the coolbits. If you need to do any manual editing or validation, the generated file will be located at '/etc/X11/xorg.conf'.
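
A quick way to validate the generated file is to search it for the option. This is a sketch that assumes nvidia-xconfig wrote the setting as an Option "Coolbits" "28" line; has_coolbits is a hypothetical helper name, and the pattern tolerates any amount of spacing.

```shell
# Check that a Coolbits option with the expected value is present.
# has_coolbits is a hypothetical helper; usage: has_coolbits VALUE [FILE]
has_coolbits() {
    grep -Eiq "^[[:space:]]*Option[[:space:]]+\"Coolbits\"[[:space:]]+\"$1\"" \
        "${2:-/etc/X11/xorg.conf}" 2>/dev/null
}

if has_coolbits 28; then
    echo "Coolbits 28 is set"
else
    echo "Coolbits 28 not found"
fi
```

With '-a' there should be one such line per GPU, so counting the matches with grep -c is another useful check.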

Now either log out and back in, or reboot the machine (either one will reread the xorg.conf file). Validate that your xorg.conf file has not changed when you get back to your desktop. If it has changed, please refer to section 4 for troubleshooting.
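
One way to check whether the file changed is to record a checksum before you log out and compare it afterwards. This is only a sketch; snapshot_file and file_unchanged are hypothetical helper names, and the snapshot location is up to you.

```shell
# Record a checksum of a file and later report whether it changed.
# snapshot_file and file_unchanged are hypothetical helper names.
snapshot_file() {    # usage: snapshot_file FILE SNAPSHOT
    sha256sum "$1" > "$2"
}
file_unchanged() {   # usage: file_unchanged SNAPSHOT (succeeds if still identical)
    sha256sum -c "$1" > /dev/null 2>&1
}

# Before logging out:
#   snapshot_file /etc/X11/xorg.conf "$HOME/xorg.conf.sha256"
# After logging back in:
#   file_unchanged "$HOME/xorg.conf.sha256" && echo "unchanged" || echo "xorg.conf was rewritten"
```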

3) Managing your cards
You will now need to go through and manage the manual configuration of your cards. This is done through two different commands:
Code:
nvidia-settings
for overclocking and fan control
Code:
nvidia-smi
for power management

Please note that 'nvidia-smi' needs to be run as root to change settings, while 'nvidia-settings' should be run as the regular user who owns the X session, not as root.

3.1) Core/Mem Overclocking and Fan Control
In order to overclock the core and memory use the following commands:
Code:
nvidia-settings -c :0 -a "[gpu:0]/GPUMemoryTransferRateOffset[3]=800"
Code:
nvidia-settings -c :0 -a "[gpu:0]/GPUGraphicsClockOffset[3]=100"

In order to control the fan speed, use the following commands:
Code:
nvidia-settings -c :0 -a "[gpu:0]/GPUFanControlState=1"
Code:
nvidia-settings -c :0 -a "[fan:0]/GPUTargetFanSpeed=85"

To break these commands down: the GPU ID follows the same order as the list you get by running 'nvidia-smi' in your terminal, and it is given by the '[gpu:N]' prefix of each setting string (the fan speed setting uses a '[fan:N]' prefix instead). The next index is the '[3]' just before the equals sign, which selects the performance level that receives the offset. GPUs have a hierarchy of performance levels from 0 to 3, with 0 being the slowest and 3 the fastest. If your cards get too hot they'll drop from level 3 to 2, from 2 to 1, and so on down to 0. I would not recommend setting overclocks for any level other than 3; work on heat management instead. The part after the equals sign is the value to set.
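
For a multi-card rig, the same four commands can be generated per card. The sketch below only prints the commands so you can review them first. It assumes the fan index matches the GPU index (one fan per card) and that performance level 3 is the one you want; oc_commands is a hypothetical helper name.

```shell
# Print the overclock and fan commands for GPU indexes 0..GPU_COUNT-1.
# oc_commands is a hypothetical helper; it assumes fan N belongs to GPU N.
oc_commands() {   # usage: oc_commands GPU_COUNT CORE_OFFSET MEM_OFFSET FAN_PERCENT
    local i
    i=0
    while [ "$i" -lt "$1" ]; do
        printf 'nvidia-settings -c :0 -a "[gpu:%s]/GPUGraphicsClockOffset[3]=%s"\n' "$i" "$2"
        printf 'nvidia-settings -c :0 -a "[gpu:%s]/GPUMemoryTransferRateOffset[3]=%s"\n' "$i" "$3"
        printf 'nvidia-settings -c :0 -a "[gpu:%s]/GPUFanControlState=1"\n' "$i"
        printf 'nvidia-settings -c :0 -a "[fan:%s]/GPUTargetFanSpeed=%s"\n' "$i" "$4"
        i=$((i + 1))
    done
}

# Review the commands for a two-card rig:
oc_commands 2 100 800 85
# Once they look right, they can be run with:  oc_commands 2 100 800 85 | sh
```

Printing first is deliberate: a wrong offset can hang a card, so it is worth reading the list before piping it to a shell.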

3.2) Maximum Power Draw Setting
In order to set the maximum power draw for each card use the following commands:
Code:
sudo nvidia-smi -i 0 -pm 1
Code:
sudo nvidia-smi -i 0 -pl 150

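To break these down, the first command turns on persistence mode for the card with ID 0 ('-pm' is persistence mode, which keeps the driver loaded so the settings stay in effect even when no program is using the card). The second command sets the power limit for the card with ID 0 to 150 watts. If no -i flag is specified, the setting applies to all cards.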
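
If your cards need different limits, or you only want to touch some of them, the per-card commands can be generated the same way. This is a sketch that only prints the commands; power_commands is a hypothetical helper name and 150 is just the example wattage from above.

```shell
# Print the persistence-mode and power-limit commands for the given GPU IDs.
# power_commands is a hypothetical helper name used only for this example.
power_commands() {   # usage: power_commands WATTS GPU_ID...
    local watts i
    watts="$1"
    shift
    for i in "$@"; do
        printf 'sudo nvidia-smi -i %s -pm 1\n' "$i"
        printf 'sudo nvidia-smi -i %s -pl %s\n' "$i" "$watts"
    done
}

power_commands 150 0 1
# The current limits can be read back without root using:
#   nvidia-smi --query-gpu=index,power.limit --format=csv
```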

4) Troubleshooting

xorg.conf being overwritten
If your xorg.conf is being overwritten when you log back in, the most likely culprit is gpu-manager. There are many reasons why this could happen. The most common one I have seen is trying to use integrated graphics instead of connecting the display directly to the primary GPU. If that is not the case, you can look in '/var/log/gpu-manager.log' to see whether it is overwriting the xorg.conf.


If you believe this guide is incomplete or if you have any questions, please feel free to follow up here or over DM and I'd be happy to help.