
Topic: 5850 sudden runaway temperature (Read 4200 times)

hero member
Activity: 518
Merit: 500
January 11, 2012, 02:52:05 PM
#21
It has nothing to do with the (Windows) registry. ArtForz (correctly) mentioned a register; that's a hardware address on the video card.
hero member
Activity: 699
Merit: 500
Your Minion
January 11, 2012, 02:31:11 PM
#20
Yep registry glitch  Roll Eyes
sr. member
Activity: 406
Merit: 257
January 11, 2012, 01:37:28 PM
#19
With voltage control unlocked and two or more monitoring programs running, the registry glitches and you get that, and yes, that is indeed your actual voltage. This has been known for well over a year; be wary and stick to one app.

EDIT: figures allinvain necros this thread. How's that 500K doing you?
Actually the problem comes from 2+ programs trying to bitbang the internal power management I2C bus simultaneously via direct GPU register access.
The resulting corrupted I2C transactions have a good chance of writing 0xFFs to random registers; on a vt1165 that results in setting ~2.03V core.
A hardwired limit in the vt1165 caps that to 1.65V.
Fix is simple: don't run multiple programs that try to directly talk to the VRM controller at the same time.
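
A toy illustration of the failure mode described above, not real I2C: it assumes a simplified bus where each write is just 8 register bits followed by 8 data bits, with no clock line, start/stop conditions, or ACKs. The register numbers and values are made up for the example. It only shows that when two programs' bit streams interleave, the device decodes writes neither program intended, while serialized access stays intact.

```python
def bits(byte):
    """Yield the 8 bits of a byte, most significant bit first."""
    for i in range(7, -1, -1):
        yield (byte >> i) & 1


def transaction(register, value):
    """Bit stream one program drives onto the toy bus: register, then data."""
    yield from bits(register)
    yield from bits(value)


def decode(stream):
    """What the toy device sees: each 16-bit frame as (register, value)."""
    stream = list(stream)
    frames = []
    for i in range(0, len(stream) - 15, 16):
        word = int("".join(str(b) for b in stream[i:i + 16]), 2)
        frames.append((word >> 8, word & 0xFF))
    return frames


def serialized(a, b):
    """One program finishes its transaction before the other starts."""
    return list(a) + list(b)


def interleaved(a, b):
    """Both programs toggle the bus at once; their bits alternate."""
    out = []
    for x, y in zip(a, b):
        out += [x, y]
    return out


# Hypothetical writes from two monitoring programs.
intended = [(0x15, 0x2A), (0x1F, 0x30)]

good = decode(serialized(transaction(*intended[0]), transaction(*intended[1])))
bad = decode(interleaved(transaction(*intended[0]), transaction(*intended[1])))

print("serialized: ", [(hex(r), hex(v)) for r, v in good])
print("interleaved:", [(hex(r), hex(v)) for r, v in bad])
```

In this model the serialized case delivers both writes unchanged, while the interleaved case lands different values in different registers. That is the same kind of damage as a stray write to the voltage register.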
hero member
Activity: 699
Merit: 500
Your Minion
January 11, 2012, 01:14:00 PM
#18
With voltage control unlocked and two or more monitoring programs running, the registry glitches and you get that, and yes, that is indeed your actual voltage. This has been known for well over a year; be wary and stick to one app.

EDIT: figures allinvain necros this thread. How's that 500K doing you?
hero member
Activity: 518
Merit: 500
January 11, 2012, 11:25:19 AM
#17
Sorry pal, but your VRMs don't even come close to my 123C VRM temps

123C VRM? That card will be dead within a year. Hell, thermal throttling should be kicking in to interrupt the card before allowing sustained >120C temps.

He's had the same symptom I had: temps suddenly, for no apparent reason, shooting up to the throttle point. I'm pretty certain it has the same cause, and then temps are the least of his problems. 1.65v likely kills the GPU a lot faster than 120C on the VRMs. My 5850 only suffered an hour or so of that, and it was enough to kill it. A week after I started this thread it was dead.
hero member
Activity: 518
Merit: 500
January 11, 2012, 09:31:04 AM
#16
Lol? Look at the chart again, not the GPU-Z shot; that's only for max voltage.
legendary
Activity: 1862
Merit: 1011
Reverse engineer from time to time
January 11, 2012, 08:58:13 AM
#15
Sorry pal, but your VRMs don't even come close to my 123C VRM temps
legendary
Activity: 3080
Merit: 1080
January 11, 2012, 08:55:18 AM
#14
What version of MSI Afterburner were you using? Is this issue something that can be fixed with software (i.e., is MSI aware of it)?

As I understand it, the problem is not really with the software but with the video card. The problem occurs if two simultaneous attempts are made to read voltage or temps, so it probably doesn't really matter which apps or versions you use. But I suppose MSI are aware of it, as they disabled VRM and voltage monitoring by default. Which doesn't help when running GPU-Z and Everest like I was :).

Damn, I will have to make sure I never run more than one monitoring app. At the moment only 1 of my miners is windows based, and on it I use GPU-Z and only one instance. But man what a horrible bug to run into! I wonder how far back this bug was discovered.
hero member
Activity: 518
Merit: 500
January 11, 2012, 02:31:28 AM
#13
What version of MSI Afterburner were you using? Is this issue something that can be fixed with software (i.e., is MSI aware of it)?

As I understand it, the problem is not really with the software but with the video card. The problem occurs if two simultaneous attempts are made to read voltage or temps, so it probably doesn't really matter which apps or versions you use. But I suppose MSI are aware of it, as they disabled VRM and voltage monitoring by default. Which doesn't help when running GPU-Z and Everest like I was :).
legendary
Activity: 3080
Merit: 1080
January 10, 2012, 07:37:08 PM
#12
What version of MSI Afterburner were you using? Is this issue something that can be fixed with software (i.e., is MSI aware of it)?
hero member
Activity: 518
Merit: 500
September 19, 2011, 04:24:33 AM
#11
Holy, 1.65v is beyond insane. Even 1.4v on liquid helium is pushing the extreme limits of any highest-end GPU.
It's a small miracle the card didn't start smoking or the VRM melting at that point.

Fortunately you had a custom air cooler

Not sure if the cooler actually had any effect; it looks like the card throttled to save its life. The GPU was at a very constant 100C, which seems like a hardware throttle point. The throttling of the GPU probably also prevented the VRMs from overheating even more, unless they are also protected somehow. I think they were at 125 or 130C but also completely constant (see graph). Not sure if that's some VRM overheat protection or simply the upper limit of the temperature sensor's scale, but I also believe it's the maximum temperature they are designed for, so I don't think it's a coincidence.

Still, even with throttling keeping the GPU from frying, sending that kind of voltage through the chip would cause electromigration and kill it pretty fast, no matter if you could keep it "cool".
sr. member
Activity: 252
Merit: 251
September 19, 2011, 04:16:05 AM
#10
Holy, 1.65v is beyond insane. Even 1.4v on liquid helium is pushing the extreme limits of any highest-end GPU.
It's a small miracle the card didn't start smoking or the VRM melting at that point.

Fortunately you had a custom air cooler
hero member
Activity: 518
Merit: 500
September 18, 2011, 04:11:32 PM
#9
More googling. Doh! It seems measuring the voltage is what causes it to spike randomly, according to this thread:

http://www.overclock.net/amd-ati/648462-hd-5870-random-voltage-bump-1-a.html

I've seen this before - it has happened on my 5870.

It's apparently an issue that happens when (a) you have a 5xxx series GPU under stress [it happened to me when running FurMark as well], AND (b) you're running more than one (or in some cases ONLY one) GPU voltage monitoring tool at the same time [e.g., GPU-Z + Afterburner, or Everest]. The voltage spikes to the max unlocked core voltage - mine was 1.5v.

There was a thread on this over at the Everest forums. Check it out.

I only run Afterburner now, and if I need to open GPU-Z, I close afterburner first if the GPU will be under stress. Bizarre? Yes. Verifiable? Also yes.

PS: You'll notice that in Afterburner voltage MONITORING is off by default. It's because voltage monitoring is what is causing the issue. Run just afterburner with voltage monitoring off, and you won't have the problem.


I think I was running Afterburner when it happened the first time, and I was running Everest for sure. I guess that also explains why it never gave any trouble in Ubuntu, where it ran stable for weeks at 100% load and at higher clocks to boot. Maybe the card isn't dead yet lol

and three cheers for my accelero twin turbo that keeps the card from catching fire even at 1.65v!
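
A minimal sketch of the "stick to one app" rule from the quoted advice, written as a check you could run before starting a stress load. The executable names below are assumptions (actual process names may differ on your system), and the caller supplies the list of running process names however it likes; nothing here queries the system itself.

```python
# Hypothetical executable names for the tools mentioned in this thread.
MONITORING_TOOLS = {"gpu-z.exe", "msiafterburner.exe", "everest.exe"}


def running_monitors(process_names):
    """Return the known monitoring tools found among the given process names."""
    running = {name.lower() for name in process_names}
    return sorted(running & MONITORING_TOOLS)


def is_safe(process_names):
    """True when at most one known monitoring tool is running."""
    return len(running_monitors(process_names)) <= 1


processes = ["explorer.exe", "GPU-Z.exe", "MSIAfterburner.exe"]
if not is_safe(processes):
    print("Close all but one of:", ", ".join(running_monitors(processes)))
```

This only covers the two-or-more-tools case; per the quoted post, a single tool with voltage monitoring enabled may still trigger the spike on some cards.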
legendary
Activity: 1344
Merit: 1004
September 18, 2011, 04:10:48 PM
#8
WOOOOOOOSH VOLTAGE! LOL
hero member
Activity: 518
Merit: 500
September 18, 2011, 03:52:33 PM
#7
I tested again, this time with GPU-Z open. Indeed, after a few minutes, the voltage suddenly spikes from 1.085 to 1.65v!



No surprise, VRM and core temperature skyrocket when that happens. Seems like a dead VRM to me :(
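
For a rough sense of why the temperatures jump so fast, here is a back-of-the-envelope sketch. It assumes dynamic power scales with the square of core voltage at a fixed clock (the usual CMOS approximation, P proportional to C * V^2 * f) and ignores leakage, which also rises with voltage. The function name is made up for illustration; the two voltages are the ones reported above.

```python
def dynamic_power_ratio(v_old, v_new):
    """Relative dynamic power at the same clock, assuming P scales with V^2."""
    return (v_new / v_old) ** 2


ratio = dynamic_power_ratio(1.085, 1.65)
print(f"{ratio:.2f}x the dynamic power at the same clock")
```

Under that assumption the core would be dissipating roughly 2.3 times its normal dynamic power the moment the voltage jumps, before counting extra leakage.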
hero member
Activity: 518
Merit: 500
September 18, 2011, 01:37:16 PM
#6
Can't find any other reason for the VRM temp skyrocketing than temporary airflow/fan failure

Simple, VRM failure (causing voltage spike)
Fans are working fine, and even without any GPU fan my temps otherwise remained in spec almost forever in my rig (lots of case airflow and a big GPU cooler). It's not a fan problem. If your VRM spike was as bad and as sudden as mine, consider the card dead or dying :(
hero member
Activity: 518
Merit: 500
September 18, 2011, 01:18:52 PM
#5
Happened to my Sapphire 5850s all the time back when I mined.
The reason was the fan speed resetting to 50% and the memory clock resetting to 1000 MHz.
I found no fix for it, it just kept happening.

That can't be the same thing; for one, even at 50% fan and full clock, I (used to) not get anywhere near those temps with my Accelero.
Secondly, it would still not cause a temperature spike like that in like 1 second. I don't know how fast yours went, but I've got a HUGE cooler on there; even with the fans off, the temperature climbs slowly, particularly the GPU (VRMs do go pretty fast). Not +40C in like 2 seconds.
sr. member
Activity: 462
Merit: 250
It's all about the game, and how you play it
September 18, 2011, 10:04:47 AM
#4
Come to think of it, this sounds a lot like when I had a 5770 burn out (the board turned a very pretty color too) after we had a power outage.
hero member
Activity: 518
Merit: 500
September 18, 2011, 06:41:25 AM
#3
Get a utility which allows you to chart core/memory frequency and voltage.  I wonder if the card is spontaneously changing a setting leading to a massive increase in current = heat.

Seems extremely unlikely an increase in clockspeed would cause this. Notice how slowly the temperatures ramp up going from idle (which is like 300 MHz and doing nothing) to full load when I start mining, which is 725 MHz and FULL load. It still takes 10+ minutes for temps to go up.

Also, I only checked for like 1 second before shutting down the miner, but it was still working, though much slower than usual (IIRC around 200 MH/s instead of 300). Guess that's the throttling kicking in.

I suspect the voltage regulation is the problem: at some point the VRM gives out and causes a voltage spike, which in turn leads to a temperature spike that is instantaneous for the VRM and very fast for the rest of the GPU.

edit: doh, you mentioned current. I agree. Probably a VRM crapping out.

The card still seems to work for gaming (CPU temps under 35-40C and VRMs under 50C). I think I'll just stop mining and prepare for the card to die entirely.
legendary
Activity: 1344
Merit: 1004
September 17, 2011, 08:04:01 PM
#2
Are the fans still spinning? Going to guess yes, because I see the motherboard temperature went up too. Monitor the voltage of your GPU. My guess is that something is causing it to shoot up to max voltage.