Topic: Managing unstable overclock behavior, GPU errors under Linux / XP / Win7 (Read 3940 times)

sr. member
Activity: 476
Merit: 250
Power to the people!
Asus 5870 reference, 1.288 V
donator
Activity: 1731
Merit: 1008
I've found I can run my cards at insane clocks, 1100 MHz on a 5870 and 950 MHz on a 5970, but if I'm on my PC much they will freeze, whereas if I don't use the PC they are fine.  I have to run my clocks much lower to avoid crashing while I'm doing stuff on my PC.
1100, that is insane indeed. What brand/voltage?

I wonder what the best test is for finding stability issues.
sr. member
Activity: 476
Merit: 250
Power to the people!
I've found I can run my cards at insane clocks, 1100 MHz on a 5870 and 950 MHz on a 5970, but if I'm on my PC much they will freeze, whereas if I don't use the PC they are fine.  I have to run my clocks much lower to avoid crashing while I'm doing stuff on my PC.
full member
Activity: 133
Merit: 100
Is your miner dedicated to mining or are you using it for other tasks?

My experience is with Linux, and my cards go to 975 MHz and 1020 MHz stably at stock voltage (Sapphire HD5850 Xtremes).  I thought my slow card could handle 980 MHz but it crashed after a few days.

The most important factors for me in achieving stability were:
  • Not running a GUI,
  • Keeping the temperatures low.

On the first point, people find their cards are more stable after disabling Flash hardware acceleration for example.  Removing desktop effects can help too.  I've removed everything: no mouse, no console mode, not even a black screen (when a monitor is connected to a card it reports no signal).  I don't know how to do this in Windows but thought it worth mentioning as it made a big difference to me.

I don't know anything about "speed vs. voltage sweet spots".  I just fix a voltage and manually search for the max stable clock (which can take days).  Currently my cards are both undervolted to 0.9875V and are clocked to 847/308 and 899/327 but with only 48 hours continuous mining so far I can't be too sure of the stability of this.

I don't think that Linux is any better at handling crashes.  When one of my cards crashes I've not found a way of restarting it without rebooting the system!

How did you do this? Can you explain more? I'd like to do this too.
donator
Activity: 1731
Merit: 1008
  I know that you don't want to hear it but perhaps you should try Google.

I was expecting that people here with large-scale GPU operations would have figured out ways around these problems.  Automatic overclocking tools exist, but they're not very good.  There are ATI Catalyst and ATITool, which scan for artifacts.

I had thought of setting up a mechanical timer on each rig that would hard reset the computer every day.
full member
Activity: 182
Merit: 100
Unfortunately, or perhaps fortunately, every piece of hardware is unique, even units of the same brand, model and batch.  I dare say they are almost human in that regard.  The bottom line is that you have to manually inch your way up (or down, if you will, regarding memory and/or efficiency).  Test for long periods, and test a wide array of drivers, SDKs and, for some people, even OSes.  On top of that, there are BIOS settings, which may or may not affect your PCIe lanes.

There is already a wealth of information out there, and as a result too many variables to list here.  I know that you don't want to hear it but perhaps you should try Google.
legendary
Activity: 1344
Merit: 1004
I've found that trial and error is the only reliable process for overclocking. If you're underclocking memory, I've found that all 58xx cards have stability problems with memory at 350-400+ MHz. Set memory speed to at most 330 MHz to get rid of "Hardware problem?" errors; the gain of 0.1 Mhash isn't worth it. If you're scared about voltages, a small bump from stock (1.163 V -> 1.175 V) can go a long way. I managed to push my 5830 from 985 MHz core to 1016 MHz with that small bump, and no hardware errors! Driver hangs and restarts are another matter, and seem to result purely from core problems (memory doesn't seem to affect driver hang and reset; it just causes "Hardware problem?" errors and bad shares). Core voltage and memory aside, that just leaves core speed. I generally start around 975 MHz and go from there, increasing by 5 MHz until the driver hangs or crashes. Then, when I get around to noticing it, I go back by 2 MHz, and keep doing that until I forget the card is even running because it has stopped crashing.
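The step-up and back-off routine described above can be sketched roughly as follows. This is only an illustration of the search order, not a tool: `find_stable_core_clock`, the `is_stable` callback, and the `max_mhz`/`min_mhz` bounds are hypothetical names, and in practice each `is_stable` check means running the miner at that clock for hours or days and watching for a driver hang.

```python
from typing import Callable


def find_stable_core_clock(
    start_mhz: int,
    is_stable: Callable[[int], bool],
    step_up: int = 5,
    step_down: int = 2,
    max_mhz: int = 1100,   # assumed upper bound, not from the thread
    min_mhz: int = 500,    # assumed lower bound, not from the thread
) -> int:
    """Climb in step_up increments until a clock fails, then back off
    in step_down decrements until a clock passes again."""
    clock = start_mhz

    # Phase 1: raise the clock while each tested value holds up.
    while is_stable(clock):
        if clock + step_up > max_mhz:
            return clock  # reached the ceiling without a failure
        clock += step_up

    # Phase 2: 'clock' is the first value that failed; step back down.
    clock -= step_down
    while clock >= min_mhz:
        if is_stable(clock):
            return clock
        clock -= step_down

    raise RuntimeError("no stable clock found above the minimum")
```

For example, with a pretend card that is stable up to 1016 MHz, `find_stable_core_clock(975, lambda mhz: mhz <= 1016)` climbs to 1020, fails there, and backs off through 1018 to 1016.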
newbie
Activity: 42
Merit: 0
If using Linux, write a script that checks for GPU hangs. If a hang is detected, it sets the default overclock on that card to 5 MHz lower than the current clock (you might want to set a floor here so you don't end up with a GPU running at something like 200 MHz), then cold reboots.
At the same time, the script lowers the clocks if temperatures go over predefined limits, and raises them again as the card cools. Use other temperature sensors (such as the motherboard's) to trigger presets.
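A minimal sketch of the decision logic such a script might use is below. Everything here is an assumption for illustration: the `Limits` fields, the threshold values and the function names are made up, and the parts that depend on your driver and tools (detecting the hang, reading temperatures, applying the clock, triggering the cold reboot) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    # All values are illustrative placeholders; tune them per card.
    min_clock_mhz: int = 700    # floor, so repeated hangs never reach e.g. 200 MHz
    hang_step_mhz: int = 5      # how much to drop the saved overclock after a hang
    temp_step_mhz: int = 10     # how much to move the clock per temperature check
    temp_high_c: float = 85.0   # above this, lower the clock
    temp_low_c: float = 75.0    # below this, raise the clock again


def clock_after_hang(current_mhz: int, limits: Limits) -> int:
    """New default overclock to save before rebooting after a hang."""
    return max(limits.min_clock_mhz, current_mhz - limits.hang_step_mhz)


def clock_for_temperature(
    current_mhz: int, temp_c: float, ceiling_mhz: int, limits: Limits
) -> int:
    """Throttle when hot, recover when cool, never above the saved
    default overclock (ceiling_mhz) and never below the floor."""
    if temp_c > limits.temp_high_c:
        return max(limits.min_clock_mhz, current_mhz - limits.temp_step_mhz)
    if temp_c < limits.temp_low_c:
        return min(ceiling_mhz, current_mhz + limits.temp_step_mhz)
    return current_mhz  # inside the comfortable band: leave the clock alone
```

The gap between `temp_low_c` and `temp_high_c` is there so the clock doesn't bounce up and down on every reading.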
hero member
Activity: 588
Merit: 500
If you want to minimize your risk, DO NOT OVERCLOCK.

There's no automated way to find a "sweet spot." Overclocking, at least to find the maximum "stable" performance, is all 100% manual tuning and a lot of crashing.
legendary
Activity: 1246
Merit: 1011
Is your miner dedicated to mining or are you using it for other tasks?

My experience is with Linux, and my cards go to 975 MHz and 1020 MHz stably at stock voltage (Sapphire HD5850 Xtremes).  I thought my slow card could handle 980 MHz but it crashed after a few days.

The most important factors for me in achieving stability were:
  • Not running a GUI,
  • Keeping the temperatures low.

On the first point, people find their cards are more stable after disabling Flash hardware acceleration for example.  Removing desktop effects can help too.  I've removed everything: no mouse, no console mode, not even a black screen (when a monitor is connected to a card it reports no signal).  I don't know how to do this in Windows but thought it worth mentioning as it made a big difference to me.

I don't know anything about "speed vs. voltage sweet spots".  I just fix a voltage and manually search for the max stable clock (which can take days).  Currently my cards are both undervolted to 0.9875V and are clocked to 847/308 and 899/327 but with only 48 hours continuous mining so far I can't be too sure of the stability of this.

I don't think that Linux is any better at handling crashes.  When one of my cards crashes I've not found a way of restarting it without rebooting the system!
donator
Activity: 1731
Merit: 1008
I would like to know a way to get a high and stable overclock without risking a hung GPU that can go unnoticed for days.

An unstable card can show so many different behaviors that I now run my cards as stable and safe as possible.

I only have experience with Win7 but would gladly move to Linux if GPU errors are better handled.

Here are some of the errors I encounter.

Windows locks up
Windows reboots
Windows BSOD
Windows stops poclbm.exe
Driver downclocks the GPU to a ridiculously slow speed

The worst is when guiminer.exe goes to 100% CPU and all the other miners stall at ~100 Khash (this happens on a single-core CPU).
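One way to keep a stall like this from going unnoticed is to compare recent hash rate readings against what the card normally does. The sketch below only shows that check; `is_stalled` and its parameters are hypothetical, and how you collect the readings depends entirely on your miner.

```python
from typing import Sequence


def is_stalled(
    samples_mhash: Sequence[float],
    expected_mhash: float,
    min_fraction: float = 0.5,   # assumed threshold: below half of normal
    consecutive: int = 3,        # assumed: require several bad readings in a row
) -> bool:
    """True when the last `consecutive` readings are all below
    `min_fraction` of the card's normal hash rate."""
    if len(samples_mhash) < consecutive:
        return False  # not enough data yet to call it a stall
    threshold = expected_mhash * min_fraction
    return all(s < threshold for s in samples_mhash[-consecutive:])
```

Requiring several low readings in a row avoids reacting to a single slow sample, such as a brief pause while the miner fetches new work.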

I found out the hard way that any Ghashes I try to gain by fine-tuning an OC for an extra 10-20 MHz are always lost compared to having a rig hung for days in the above conditions.
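The arithmetic behind that can be sketched as below, under the simplifying assumption that hash rate scales linearly with core clock (real cards won't follow this exactly). The function names and the 300 Mhash baseline are made up for illustration.

```python
def average_hashrate(
    base_mhash: float, clock_mhz: float, base_clock_mhz: float, downtime_fraction: float
) -> float:
    """Long-run average hash rate, assuming hash rate scales linearly with
    core clock and the rig produces nothing while it is hung."""
    return base_mhash * (clock_mhz / base_clock_mhz) * (1.0 - downtime_fraction)


def break_even_downtime(clock_mhz: float, base_clock_mhz: float) -> float:
    """Fraction of time the overclocked rig can sit hung before it
    averages no better than the base clock with zero downtime."""
    return 1.0 - base_clock_mhz / clock_mhz


# Going from 830 MHz to 850 MHz leaves a margin of 20/850, about 2.4%.
margin = break_even_downtime(850, 830)
hours_per_30_days = margin * 30 * 24  # roughly 17 hours

# A rig hung for 2 days out of 30 averages less than the stable one.
overclocked = average_hashrate(300.0, 850, 830, downtime_fraction=2 / 30)
stable = average_hashrate(300.0, 830, 830, downtime_fraction=0.0)
```

Under that assumption, a single hang that sits unnoticed for a day already wipes out a month of gains from the extra 20 MHz.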

I keep hearing about people overclocking their cards much higher than I can run mine stably.
Personally I run my 5850s no faster than 830 MHz stock, and some of them aren't even stable at that.

Do you know of a reliable automated tool for finding speed vs. voltage sweet spots?

What are your experiences with other OSes / software / miners / drivers?