5850 sudden runaway temperature | Bitcointalksearch.org

P4man

hero member

Activity: 518

Merit: 500

It has nothing to do with the (Windows) registry. ArtForz (correctly) mentioned a register; that's a hardware address on the video card.

SlaveInDebt

hero member

Activity: 699

Merit: 500

Your Minion

Yep, registry glitch :rolleyes:

ArtForz

sr. member

Activity: 406

Merit: 257

Quote from: SlaveInDebt on January 11, 2012, 01:14:00 PM

When voltage control is unlocked and two or more monitoring programs are running, the registry glitches and you get that, and yes, that is indeed your actual voltage. This has been known for well over a year; be wary and stick to one app.

EDIT: Figures allinvain necros this thread. How's that 500K doing you?

Actually the problem comes from 2+ programs trying to bit-bang the internal power management I2C bus simultaneously via direct GPU register access.
The resulting corrupted I2C transactions have a good chance of writing 0xFFs to random registers; on a VT1165 that results in setting ~2.03V core.
A hardwired limit in the VT1165 caps that to 1.65V.
Fix is simple: don't run multiple programs that try to directly talk to the VRM controller at the same time.
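
To illustrate the point above, here is a minimal sketch of what a slave device would clock in when two bit-banging programs share one bus without coordination. It is a toy model, not the actual GPU or VT1165 behaviour: it ignores start/stop conditions, addressing and ACK bits, it does not try to reproduce why 0xFF in particular tends to get written, and the function names and byte values are made up for illustration.

```python
from itertools import zip_longest

def to_bits(data):
    """MSB-first bit list, the order in which I2C shifts bits out."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]

def from_bits(bits):
    """What a slave would reassemble: one byte per 8 clocked bits."""
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)

def serialized(tx_a, tx_b):
    """Only one program drives the bus at a time (e.g. behind a shared lock)."""
    return from_bits(to_bits(tx_a) + to_bits(tx_b))

def interleaved(tx_a, tx_b):
    """Worst case: the two programs alternate clock pulses bit by bit."""
    mixed = []
    for a, b in zip_longest(to_bits(tx_a), to_bits(tx_b)):
        if a is not None:
            mixed.append(a)
        if b is not None:
            mixed.append(b)
    return from_bits(mixed)

# Hypothetical payloads: neither program ever intended to send 0x55 or 0xAA.
print(serialized(b"\x0f", b"\xf0").hex())   # 0ff0
print(interleaved(b"\x0f", b"\xf0").hex())  # 55aa
```

With serialized access each transaction arrives intact; with interleaved access the device receives bytes that neither program sent, which is why running a single monitoring tool avoids the problem.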

SlaveInDebt

hero member

Activity: 699

Merit: 500

Your Minion

When voltage control is unlocked and two or more monitoring programs are running, the registry glitches and you get that, and yes, that is indeed your actual voltage. This has been known for well over a year; be wary and stick to one app.

EDIT: Figures allinvain necros this thread. How's that 500K doing you?

P4man

hero member

Activity: 518

Merit: 500

Quote from: ?? on ??

Quote from: Remember remember the 5th of November on January 11, 2012, 08:58:13 AM

Sorry pal, but your VRMs don't even come close to my 123C VRM temps

123C VRM? That card will be dead within a year. Hell thermal throttling should be kicking in to interrupt the card before allowing sustained >120C temps.

He's had the same symptom I had: temps suddenly, for no apparent reason, shooting up to the throttle point. I'm pretty certain it has the same cause, and then temps are the least of his problems. 1.65V likely kills the GPU a lot faster than 120C on the VRMs. My 5850 only suffered an hour or so of that, and it was enough to kill it. A week after I started this thread it was dead.

P4man

hero member

Activity: 518

Merit: 500

Lol? Look at the chart again, not the GPU-Z shot; that's only for max voltage.

Remember remember the 5th of November

legendary

Activity: 1862

Merit: 1014

Reverse engineer from time to time

Sorry pal, but your VRMs don't even come close to my 123C VRM temps

allinvain

legendary

Activity: 3080

Merit: 1083

Quote from: P4man on January 11, 2012, 02:31:28 AM

Quote from: allinvain on January 10, 2012, 07:37:08 PM

What version of MSI Afterburner were you using? Is this issue something that can be fixed with software (ie MSI is aware of it)?

As I understand it, the problem is not really with the software but with the video card. The problem occurs if two simultaneous attempts are made to read voltage or temps, so it probably doesn't really matter what apps or versions you use. But I suppose MSI are aware of it, as they disabled VRM and voltage monitoring by default. Which doesn't help when running GPU-Z and Everest like I was.

Damn, I will have to make sure I never run more than one monitoring app. At the moment only 1 of my miners is windows based, and on it I use GPU-Z and only one instance. But man what a horrible bug to run into! I wonder how far back this bug was discovered.

P4man

hero member

Activity: 518

Merit: 500

Quote from: allinvain on January 10, 2012, 07:37:08 PM

What version of MSI Afterburner were you using? Is this issue something that can be fixed with software (ie MSI is aware of it)?

As I understand it, the problem is not really with the software but with the video card. The problem occurs if two simultaneous attempts are made to read voltage or temps, so it probably doesn't really matter what apps or versions you use. But I suppose MSI are aware of it, as they disabled VRM and voltage monitoring by default. Which doesn't help when running GPU-Z and Everest like I was.

allinvain

legendary

Activity: 3080

Merit: 1083

What version of MSI Afterburner were you using? Is this issue something that can be fixed with software (ie MSI is aware of it)?

P4man

hero member

Activity: 518

Merit: 500

Quote from: Jack of Diamonds on September 19, 2011, 04:16:05 AM

Holy, 1.65v is beyond insane. Even 1.4v on liquid helium is pushing the extreme limits of any highest-end GPU.
It's a small miracle the card didn't start smoking or the VRM melting at that point.

Fortunately you had a custom air cooler

Not sure if the cooler actually had any effect; it looks like the card throttled to save its life. The GPU was at a very constant 100C, which seems like a hardware throttle point. The throttling of the GPU probably also prevented the VRMs from overheating even more, unless they are also protected somehow. I think they were at 125 or 130C but also completely constant (see graph). Not sure if that's some VRM overheat protection or simply the upper end of the temperature sensor's scale, but I also believe it's the maximum temperature they are designed for, so I don't think it's a coincidence.

Still, even with throttling keeping the GPU from frying, sending that kind of voltage through the chip would cause electromigration and kill it pretty fast, no matter if you could keep it "cool".

Jack of Diamonds

sr. member

Activity: 252

Merit: 251

Holy, 1.65v is beyond insane. Even 1.4v on liquid helium is pushing the extreme limits of any highest-end GPU.
It's a small miracle the card didn't start smoking or the VRM melting at that point.

Fortunately you had a custom air cooler

P4man

hero member

Activity: 518

Merit: 500

More googling. Doh! It seems measuring the voltage is what causes it to spike randomly, according to this thread:

http://www.overclock.net/amd-ati/648462-hd-5870-random-voltage-bump-1-a.html

I've seen this before - it has happened on my 5870.

It's apparently an issue that happens when (a) you have a 5xxx series GPU under stress [it happened to me when running FurMark as well], AND (b) you're running more than one (or in some cases ONLY one) GPU voltage monitoring tool at the same time [i.e., GPU-Z + Afterburner, or Everest]. The voltage spikes to the max unlocked core voltage; mine was 1.5V.

There was a thread on this over at the Everest forums. Check it out.

I only run Afterburner now, and if I need to open GPU-Z, I close afterburner first if the GPU will be under stress. Bizarre? Yes. Verifiable? Also yes.

PS: You'll notice that in Afterburner voltage MONITORING is off by default. It's because voltage monitoring is what is causing the issue. Run just afterburner with voltage monitoring off, and you won't have the problem.

I think I was running Afterburner when it happened the first time, and I did run Everest for sure. I guess that also explains why it never gave any trouble in Ubuntu, where it ran stable for weeks at 100% load and at higher clocks to boot. Maybe the card isn't dead yet lol

and three cheers for my accelero twin turbo that keeps the card from catching fire even at 1.65v!

ssateneth

legendary

Activity: 1344

Merit: 1004

WOOOOOOOSH VOLTAGE! LOL

P4man

hero member

Activity: 518

Merit: 500

I tested again, this time with GPU-Z open. Indeed, after a few minutes, the voltage suddenly spikes from 1.085 to 1.65v!

No surprise, VRM and core temperature skyrocket when that happens. Seems like a dead VRM to me :(
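
A rough back-of-the-envelope check on why the temperatures jump so hard: switching (dynamic) power in CMOS scales roughly with C * V^2 * f, so at an unchanged clock the power ratio goes with the square of the voltage ratio. The sketch below uses the two voltages from this post; the function name is made up, and it ignores leakage current (which also rises with voltage) and any throttling, so treat the result as an order-of-magnitude estimate rather than a measurement.

```python
def dynamic_power_ratio(v_old, v_new):
    """Ratio of switching power at v_new vs v_old, same clock and load (P ~ C * V^2 * f)."""
    return (v_new / v_old) ** 2

ratio = dynamic_power_ratio(1.085, 1.65)
print(f"{ratio:.2f}x")  # 2.31x
```

So going from 1.085V to 1.65V would more than double the switching power in the core, all of which has to come through the VRMs.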

P4man

hero member

Activity: 518

Merit: 500

Quote from: ?? on ??

Can't find any other reason for the VRM temp skyrocketing than temporary airflow/fan failure

Simple, VRM failure (causing voltage spike)
Fans are working fine, and even without any GPU fan my temps otherwise remained in spec almost forever in my rig (lots of case airflow and a big GPU cooler). It's not a fan problem. If your VRM spike was as bad and as sudden as mine, consider the card dead or dying :(

P4man

hero member

Activity: 518

Merit: 500

Quote from: ?? on ??

Happened to my Sapphire 5850s all the time back when I mined.
The reason was the fan speed resetting to 50% and the memory clock resetting to 1000 MHz.
I found no fix for it, it just kept happening.

That can't be the same thing; for one, even at 50% fan and full clock, I (used to) not get anywhere near those temps with my Accelero.
Secondly, it would still not cause a temperature spike like that in like 1 second. I don't know how fast yours went, but I've got a HUGE cooler on there; even with the fans off the temperature climbs slowly, particularly the GPU (VRMs do go pretty fast). Not +40C in like 2 seconds.
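
The distinction made here (slow warm-up versus a jump of tens of degrees within seconds) can be checked mechanically on a logged temperature trace. Below is a small sketch that flags any interval where the temperature rises faster than a chosen limit. The function name, the 5 C/s threshold and the sample log are all made up for illustration; they are not values from the actual graph in this thread.

```python
def find_spikes(samples, max_rise_per_s=5.0):
    """Return (t_prev, t_cur, rate) for each consecutive pair of samples
    whose temperature rises faster than max_rise_per_s (degrees C per second).

    samples: iterable of (seconds, degrees_C) tuples in time order.
    """
    spikes = []
    prev_t = prev_temp = None
    for t, temp in samples:
        if prev_t is not None and t > prev_t:
            rate = (temp - prev_temp) / (t - prev_t)
            if rate > max_rise_per_s:
                spikes.append((prev_t, t, rate))
        prev_t, prev_temp = t, temp
    return spikes

# Hypothetical log: a slow warm-up over ten minutes, then +40C in 2 seconds.
log = [(0, 40.0), (300, 55.0), (600, 62.0), (602, 102.0), (604, 102.0)]
print(find_spikes(log))  # [(600, 602, 20.0)]
```

A fan or clock reset would show up as a gradual climb that never crosses the limit, while a voltage jump shows up as a single steep step.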

deslok

sr. member

Activity: 462

Merit: 250

It's all about the game, and how you play it

Come to think of it, this sounds a lot like when I had a 5770 burn out (the board turned a very pretty color too) after we had a power outage.

P4man

hero member

Activity: 518

Merit: 500

Quote from: ?? on ??

Get a utility which allows you to chart core/memory frequency and voltage. I wonder if the card is spontaneously changing a setting leading to a massive increase in current = heat.

Seems extremely unlikely an increase in clockspeed would cause this. Notice how slowly the temperatures ramp up going from idle (which is like 300 MHz and doing nothing) to full load when I start mining, which is 725 MHz and FULL load. It still takes 10+ minutes for temps to go up.

Also, I only checked for like 1 second before shutting down the miner, but it was still working, though much slower than usual (IIRC around 200 MH/s instead of 300). Guess that's the throttling kicking in.

I suspect the voltage regulation is the problem: at some point the VRM gives out and causes a voltage spike, which in turn leads to a temperature spike that is instantaneous for the VRM and very fast for the rest of the GPU.

edit: doh, you mentioned current. I agree. Probably a VRM crapping out.

The card still seems to work for gaming (CPU temps under 35-40C and VRMs under 50C). I think I'll just stop mining and prepare for the card to die entirely.

ssateneth

legendary

Activity: 1344

Merit: 1004

Are the fans still spinning? Going to guess yes, because I see the motherboard temperature went up too. Monitor the voltage of your GPU. My guess is that something is causing it to shoot up to max voltage.

Topic: 5850 sudden runaway temperature (Read 4215 times)