Topic: How to tell if GPU is damaged or dying or near end of its life?

sr. member
Activity: 364
Merit: 250
Can you recommend a stress testing application?  I'll go and google it too.

Try Kombustor.
sr. member
Activity: 395
Merit: 250
Artifacts during a FurMark run are a good indication that the card is failing.
member
Activity: 71
Merit: 11
It's a lifetime/power trade-off. Below 80°C is OK, but a GPU is not going to last 5 years running 24/7 like that.


Sure, but like I said, the card is only about 4 months old, so I would be surprised if it really has been killed in such a short space of time.

Quote from: shorena
Well, that sounds like it's not a problem with the GPU. Maybe it's the system memory? memtest86+ would be the tool to test that.

Yes, might need to test memory, although all components were brand new when installed, and branded and good quality.
copper member
Activity: 1498
Merit: 1528
No I dont escrow anymore.
Thanks for the reply shorena.

The card's never produced any HW errors, I think I was quite conservative with the settings.

Regarding temps, from reading around, I thought anything under 80°C was ok.  I know lower is better, but it's been a struggle as it's getting warmer.  That's why I actually wanted to switch to mining X11 as it runs cooler.

It's a lifetime/power trade-off. Below 80°C is OK, but a GPU is not going to last 5 years running 24/7 like that. This is especially true when the card had some minor issues from the beginning, e.g. from production or transport.
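
To see how much time a card really spends on either side of the 80°C mark, the logged temperatures can be grouped into bands. The following is only a minimal Python sketch: it assumes the rig already records temperature samples somewhere, and the function name and band labels are made up for illustration. The 80°C and 85°C cut-offs are simply the figures mentioned in this thread, not manufacturer limits.

```python
from typing import Dict, Iterable


def temperature_bands(samples_c: Iterable[float]) -> Dict[str, float]:
    """Return the fraction of temperature samples (in °C) in each band.

    Hypothetical helper: the band edges (80 and 85) are the values discussed
    in the thread, not official limits for any particular card.
    """
    samples = list(samples_c)
    if not samples:
        raise ValueError("no temperature samples")

    counts = {"below_80": 0, "80_to_85": 0, "above_85": 0}
    for temp in samples:
        if temp < 80:
            counts["below_80"] += 1
        elif temp <= 85:
            counts["80_to_85"] += 1
        else:
            counts["above_85"] += 1

    # Convert raw counts into fractions of the total sample count.
    return {band: n / len(samples) for band, n in counts.items()}


if __name__ == "__main__":
    # Made-up readings for illustration, not data from a real card.
    print(temperature_bands([75, 78, 79, 82]))
```

If the samples are taken at regular intervals, the fractions approximate the share of running time spent in each band.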

I just tried something, and I'm not sure what conclusion to draw now.  I unplugged the monitor from that card (which is in the primary 16x PCIe slot) and plugged it into the card sitting in the 4x slot.  I'm actually not sure if this should even work (does the monitor HAVE to be plugged into the card in the primary slot?), but regardless, I still couldn't remote desktop into it; effectively, it behaved the same as when the monitor was plugged into the first card.  It booted, but hung, and was unresponsive.  It would respond to pings, and remote desktop would begin loading, but then hang too.

Can you recommend a stress testing application?  I'll go and google it too.

Thanks again.

Well, that sounds like it's not a problem with the GPU. Maybe it's the system memory? memtest86+ would be the tool to test that.
member
Activity: 71
Merit: 11
Thanks for the reply shorena.

The card's never produced any HW errors, I think I was quite conservative with the settings.

Regarding temps, from reading around, I thought anything under 80°C was ok.  I know lower is better, but it's been a struggle as it's getting warmer.  That's why I actually wanted to switch to mining X11 as it runs cooler.

I just tried something, and I'm not sure what conclusion to draw now.  I unplugged the monitor from that card (which is in the primary 16x PCIe slot) and plugged it into the card sitting in the 4x slot.  I'm actually not sure if this should even work (does the monitor HAVE to be plugged into the card in the primary slot?), but regardless, I still couldn't remote desktop into it; effectively, it behaved the same as when the monitor was plugged into the first card.  It booted, but hung, and was unresponsive.  It would respond to pings, and remote desktop would begin loading, but then hang too.

Can you recommend a stress testing application?  I'll go and google it too.

Thanks again.
copper member
Activity: 1498
Merit: 1528
No I dont escrow anymore.
Hi,

I've tried googling this, but haven't managed to find much on the topic.  I thought folks here might be able to offer some concrete tips and advice.

Basically, I suspect one of the GPUs in my mining rig is close to failing, but I have no way of confirming (or disproving) this.  Are there any "tests", or anything else I can do, to determine fairly accurately whether the card is dying, or is it impossible to tell with any real degree of certainty?

Stress testing is the way to go. Since mining is basically stress testing, you might raise the intensity and see how many HW errors you get. Then lower the intensity until the card runs with no HW errors, or very few.

For info, the card is a Sapphire R9 270 (non-X), and it's only about 4 months old.  It has been mining nearly 24/7 for those 4 months, at temperatures between 75 and 80°C for 99% of the time; it has run just over 80°C (but never over 85°C) for only a few hours in its entire life.

Sounds like bad cooling. Sure, the GPU can take the heat, but it's not going to last long under these conditions.

Further info: the reason I suspect the card (which is the primary card in my rig, with the monitor connected to it) is that, although it still mines stably, the GUI on this BAMT-based rig often hangs.  I recently tried switching over to PiMP, and again, when booting the rig from a freshly imaged, brand new USB stick, the GUI just hangs for about 10-15 minutes, then sort of comes back to life, but it's still laggy and largely unusable.  Someone in the PiMP IRC support channel suggested my card might be dying, and that got me wondering whether that really might be the case and whether the hanging GUI could be a symptom.  I find that odd though: displaying the GUI (in my non-expert opinion) should be a fairly basic and simple task for the card, far easier than mining, which the card can still do.

Thanks in advance.


Yes, the GUI is "easier", but that's not the point. Usually the memory dies first, and rendering the GUI uses the same memory as calculating hashes. Try cranking up the intensity and see how much the card can still handle. You should also improve the cooling.
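
The step-down approach described above (start at a high intensity, then lower it until HW errors disappear or nearly so) could be sketched roughly like this in Python. This is only an illustration under assumptions: `count_hw_errors` is a hypothetical callback that would run the miner at a given intensity for a fixed period and report the HW errors seen, and how to obtain that number depends on the mining software in use.

```python
from typing import Callable, Optional


def highest_stable_intensity(
    count_hw_errors: Callable[[int], int],
    start: int,
    floor: int,
    max_errors: int = 0,
) -> Optional[int]:
    """Step intensity down from `start` until HW errors are within tolerance.

    `count_hw_errors` is a hypothetical callback supplied by the caller; it is
    not part of any real miner's API. Returns the first (highest) intensity
    whose error count is <= `max_errors`, or None if even `floor` fails.
    """
    for intensity in range(start, floor - 1, -1):
        if count_hw_errors(intensity) <= max_errors:
            return intensity
    return None


if __name__ == "__main__":
    # Made-up error counts per intensity, for illustration only.
    fake_results = {20: 35, 19: 12, 18: 3, 17: 0, 16: 0}
    print(highest_stable_intensity(lambda i: fake_results[i], start=20, floor=16))
```

A card that only stays error-free at a much lower intensity than it used to handle would point towards degradation; one that fails even at the floor is worth testing in another system.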
member
Activity: 71
Merit: 11
Hi,

I've tried googling this, but haven't managed to find much on the topic.  I thought folks here might be able to offer some concrete tips and advice.

Basically, I suspect one of the GPUs in my mining rig is close to failing, but I have no way of confirming (or disproving) this.  Are there any "tests", or anything else I can do, to determine fairly accurately whether the card is dying, or is it impossible to tell with any real degree of certainty?

For info, the card is a Sapphire R9 270 (non-X), and it's only about 4 months old.  It has been mining nearly 24/7 for those 4 months, at temperatures between 75 and 80°C for 99% of the time; it has run just over 80°C (but never over 85°C) for only a few hours in its entire life.

Further info: the reason I suspect the card (which is the primary card in my rig, with the monitor connected to it) is that, although it still mines stably, the GUI on this BAMT-based rig often hangs.  I recently tried switching over to PiMP, and again, when booting the rig from a freshly imaged, brand new USB stick, the GUI just hangs for about 10-15 minutes, then sort of comes back to life, but it's still laggy and largely unusable.  Someone in the PiMP IRC support channel suggested my card might be dying, and that got me wondering whether that really might be the case and whether the hanging GUI could be a symptom.  I find that odd though: displaying the GUI (in my non-expert opinion) should be a fairly basic and simple task for the card, far easier than mining, which the card can still do.

Thanks in advance.
