Undervolting 7950 -> dead PCIe slots ? ? ?

eroxors

legendary

Activity: 924

Merit: 1000

Think. Positive. Thoughts.

Quote from: bizaro on May 02, 2013, 01:12:19 AM

Quote from: eroxors on May 02, 2013, 12:23:00 AM

It's sounding like a software issue? Have you tried on another platform/OS installation? Also, you might check your bios version and flash it to latest. What is the specific motherboard model and power supply model?

I think it could be two things:
My MSI Z77A-GD80 has 3/7 slots which only see my cards as 'Standard VGA Adapters' - I've tested the mobo on 2 PSUs, and with multiple GPUs which can each hash on their own for 10-20m without crashing. These slots worked previously. Also, I reset the BIOS and tried the "B" BIOS on this motherboard, with the exact same result.
Current plan for tomorrow: check drivers are current to be thorough. Otherwise I'm considering calling MSI...
+

My GPU BIOSes might be corrupted? I'm thinking this because I wiped my OS on rig #2 (all PCIe slots work, no fishy business with the core system) now that I had free time today. After hooking up my GPUs, which each hashed fine for ~10-12m at stock, I let 2 of them hash away and after ~2 hours one crashed. I reset the drivers, and the other card crashed after a similar amount of time.

For the MSI system - I also reinstalled win7 today (I had a lot of background time) to ensure I had clean 13.1 drivers, and swapped PSUs -> same problem as before with 3/7 slots, and even on the 'good' slots GPUs crash after a while.

The strange thing is that if the GPU BIOSes are corrupted, it happened to all cards on both systems nearly simultaneously. Both systems were running through a surge protector, but I'm at a loss as to the initial cause, unless these particular cards (Sapphire 7950 Vapor-X) are vulnerable to BIOS corruption when underclocked.

I'm kind of running with the noob assumptions of GPU bios so I could be waaayyy off.

Mobo: MSI Z77A-GD80
PSU: Rosewill LIGHTNING-1300
PSU2: Kingwin 850 (one i just pulled out today)

Thanks again for all your help it's extremely appreciated!

RAM and power are two primary causes of instability... the mobo and cards were proven working before, so I would examine those areas as well as software (i.e. try Linux or some other OS).

Best of luck.

bizaro

newbie

Activity: 18

Merit: 0

Quote from: eroxors on May 02, 2013, 12:23:00 AM

It's sounding like a software issue? Have you tried on another platform/OS installation? Also, you might check your bios version and flash it to latest. What is the specific motherboard model and power supply model?

I think it could be two things:
My MSI Z77A-GD80 has 3/7 slots which only see my cards as 'Standard VGA Adapters' - I've tested the mobo on 2 PSUs, and with multiple GPUs which can each hash on their own for 10-20m without crashing. These slots worked previously. Also, I reset the BIOS and tried the "B" BIOS on this motherboard, with the exact same result.
Current plan for tomorrow: check drivers are current to be thorough. Otherwise I'm considering calling MSI...
+

My GPU BIOSes might be corrupted? I'm thinking this because I wiped my OS on rig #2 (all PCIe slots work, no fishy business with the core system) now that I had free time today. After hooking up my GPUs, which each hashed fine for ~10-12m at stock, I let 2 of them hash away and after ~2 hours one crashed. I reset the drivers, and the other card crashed after a similar amount of time.

For the MSI system - I also reinstalled win7 today (I had a lot of background time) to ensure I had clean 13.1 drivers, and swapped PSUs -> same problem as before with 3/7 slots, and even on the 'good' slots GPUs crash after a while.

The strange thing is that if the GPU BIOSes are corrupted, it happened to all cards on both systems nearly simultaneously. Both systems were running through a surge protector, but I'm at a loss as to the initial cause, unless these particular cards (Sapphire 7950 Vapor-X) are vulnerable to BIOS corruption when underclocked.

I'm kind of running with the noob assumptions of GPU bios so I could be waaayyy off.

Mobo: MSI Z77A-GD80
PSU: Rosewill LIGHTNING-1300
PSU2: Kingwin 850 (one i just pulled out today)

Thanks again for all your help it's extremely appreciated!

eroxors

legendary

Activity: 924

Merit: 1000

Think. Positive. Thoughts.

It's sounding like a software issue? Have you tried on another platform/OS installation? Also, you might check your bios version and flash it to latest. What is the specific motherboard model and power supply model?

bizaro

newbie

Activity: 18

Merit: 0

Quote from: gpudude on May 01, 2013, 03:39:45 PM

Don't think it would damage your motherboard... have you checked the PSU rails with a voltmeter?

No I haven't, but I do have a voltmeter. Do you have any recommended guides? If not, I'm looking at this one: http://forums.extremeoverclocking.com/t137886.html

Which rails would you measure (same as the guide: 12v, 5v, 3.3v)?
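
For reference, the ATX specification allows the +12 V, +5 V and +3.3 V rails to deviate by up to 5% from nominal, ideally measured while the cards are under load. A minimal sketch of that check follows; the `rail_in_spec` helper and the sample readings are hypothetical, not measurements from this thread.

```python
# Nominal positive ATX rails and the +/-5% tolerance the ATX spec allows on them.
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05


def rail_in_spec(rail, measured):
    """Return True if a measured voltage is within +/-5% of the rail's nominal value."""
    nominal = NOMINAL_RAILS[rail]
    return abs(measured - nominal) <= nominal * TOLERANCE


if __name__ == "__main__":
    # Hypothetical voltmeter readings, for illustration only.
    readings = {"+12V": 11.9, "+5V": 5.08, "+3.3V": 3.28}
    for rail, volts in readings.items():
        status = "OK" if rail_in_spec(rail, volts) else "OUT OF SPEC"
        print(f"{rail}: {volts:.2f} V {status}")
```

A rail that reads fine at idle but sags out of range once the GPUs start hashing would point at the PSU rather than the motherboard.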

gpudude

full member

Activity: 126

Merit: 100

GPUDude

Don't think it would damage your motherboard... have you checked the PSU rails with a voltmeter?

sinergy

newbie

Activity: 14

Merit: 0

Bizaro, this is taken from cgminer:

GPU 0: 63.0C 3519RPM | 633.0M/623.9Mh/s | A:28539 R: 91 HW:0 U: 8.69/m I:13
GPU 1: 67.0C 3471RPM | 590.2M/587.0Mh/s | A:26510 R: 54 HW:0 U: 8.07/m I:10
GPU 2: 66.0C 3481RPM | 635.1M/626.0Mh/s | A:28425 R:103 HW:0 U: 8.66/m I:13

GPU 1 has lower rates because it is underclocked to 1150 MHz core and slightly undervolted as well (Trixx voltage set to 1200 and GPU-Z is showing 1.144). I did this because it sits directly on the mobo and cannot get very good cooling. The other two cards are raised on PCIe 1x-to-16x risers and get better cooling.

We have been mining a single block on Slush's pool for almost 5 hours now. The worker statistics show 1851.842 MHash/s (all three cards) and a total of 7598 shares for this block.
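
As a rough cross-check, the per-GPU averages in the cgminer output above can be summed and compared with the pool's figure. The sketch below assumes the status lines have exactly the layout pasted above; the regular expression and the `parse_status` helper are illustrative, not part of cgminer.

```python
import re

# The three status lines exactly as pasted from cgminer above.
STATUS = """\
GPU 0: 63.0C 3519RPM | 633.0M/623.9Mh/s | A:28539 R: 91 HW:0 U: 8.69/m I:13
GPU 1: 67.0C 3471RPM | 590.2M/587.0Mh/s | A:26510 R: 54 HW:0 U: 8.07/m I:10
GPU 2: 66.0C 3481RPM | 635.1M/626.0Mh/s | A:28425 R:103 HW:0 U: 8.66/m I:13
"""

# Assumed layout: temperature, fan RPM, current/average hashrate, accepted, rejected, HW errors.
LINE_RE = re.compile(
    r"GPU\s+(?P<gpu>\d+):\s+(?P<temp>[\d.]+)C\s+(?P<rpm>\d+)RPM\s+\|\s+"
    r"(?P<cur>[\d.]+)M/(?P<avg>[\d.]+)Mh/s\s+\|\s+"
    r"A:\s*(?P<acc>\d+)\s+R:\s*(?P<rej>\d+)\s+HW:\s*(?P<hw>\d+)"
)


def parse_status(text):
    """Parse cgminer per-GPU status lines into a list of dicts."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip anything that is not a per-GPU status line
        rows.append({
            "gpu": int(m["gpu"]),
            "temp_c": float(m["temp"]),
            "avg_mhs": float(m["avg"]),
            "accepted": int(m["acc"]),
            "rejected": int(m["rej"]),
            "hw_errors": int(m["hw"]),
        })
    return rows


if __name__ == "__main__":
    rows = parse_status(STATUS)
    total = sum(r["avg_mhs"] for r in rows)
    print(f"total average: {total:.1f} Mh/s")
    for r in rows:
        reject_pct = 100.0 * r["rejected"] / (r["accepted"] + r["rejected"])
        print(f"GPU {r['gpu']}: {reject_pct:.2f}% rejected, HW errors: {r['hw_errors']}")
```

The averages add up to 1836.9 Mh/s, close to the 1851.842 MHash/s the pool reports, and the HW column stays at 0 on all three cards, which is the figure to watch when judging whether an undervolt is stable.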

When undervolting, just test whether the cards are stable. For instance, when I was testing mine, an unstable card would get restarted (the monitor blinks and the card loses its clock values). You can also test with 3DMark or other stress tests. As for whether the mobo is fried, unfortunately I am not sure how we can really test this without putting the cards on a different motherboard. But what PSU are you using? It would be easier if you could swap the PSU and see what happens. I generally do not believe in coincidences, but I just do not see how this could come from undervolting.

bizaro

newbie

Activity: 18

Merit: 0

Thanks Sinergy,

What is your hash speed with those settings (just curiosity)?

I mainly need help determining whether my mobo is fried.

If so, any hypothesis as to how undervolting could have done it?

And what test would you run on your cards (how long do they need to hash at stock settings, or some other test) to determine if they're damaged?

Any other suggestions for hardware testing/troubleshooting are welcome too.

I was mining stably for a few weeks and never had overheating, driver crashes, etc., until I undervolted the cards for half a day and both my rigs crashed.

sinergy

newbie

Activity: 14

Merit: 0

Quote from: esenminer on April 30, 2013, 05:55:28 PM

Just slightly off topic, but what settings - if I may ask - are you using for the Sapphire 7950s? The one in my rig is running at an avg of 603 kh/s; getting it to 630 would be nice. Currently I am using cgminer with

shaders = 1792
intensity = 20
engine = 1075
memory = 1500
w = 256
thread-concurrency = 25088
lookup-gap = 2
threads = 1
auto-fan

and nothing else. I experimented for a day or two with all sorts of settings but this was the fastest I could come up with. Have I missed something or is it just my card? The model I am using is

http://www.newegg.com/Product/Product.aspx?Item=N82E16814202026

EDIT

Just found a tip somewhere else on the forums here - setting powertune to +20 effectively turns it off and stops the cards from trying to save power. It increased my hashrate from 603 kh/s to an avg of around 635 kh/s - thanks roy7 for that tip.

I am currently mining Bitcoin (SHA-256) with my memory underclocked, so I can't really tell for Litecoin. In cgminer my settings are -I 13 -w 256. The cards are stable at 1200 core / 300 memory (no CrossFire), and for power GPU-Z is showing:

vddc 1.156 with peaks to 1.163
vddc current ~56 A
vddc power ~67 W
temp - from 64 to 70 at most, but they are not in a case

I can also post a screenshot. The GPU is the Sapphire with Boost @925 core stock and device ID 1002-679A (I can post a link to the Sapphire site, but I am not sure whether that would violate the forum rules).

Originally I also tested overclocking the memory, and it reached about 1550 MHz stable with 1200 core, so the cards should be good for LTC as well. I am just mining BTC at the moment because I had some issues with cgminer and Windows.

Hope this helps.

bizaro

newbie

Activity: 18

Merit: 0

Quote from: eroxors on April 30, 2013, 05:16:05 PM

I've never heard of undervolting hurting motherboards. I would reset your bios (unplug from wall, remove battery, switch cmos switch, wait 30, reverse) and try system restore or last-known good on bootup. Should fix the issue.

So, reset BIOS, reinstalled drivers. Tested all GPUs to see if they could hash for 10m at stock (probably not a long enough test to determine health): 7 passed, 1 causes the system to boot into some 'corrupted' / must-repair mode; I don't remember specifically, but I've never seen it before - sounds dead.

For the motherboard, I get 4 slots that recognize 'passed' cards, and 3 that only recognize them as 'Standard VGA Adapter'. Ideas?

Also, I took two 'passed' cards at stock and set them to mine overnight. One crashed at ~2 hours, the other at ~3 hours. So frustrating.

In a few hours I'll be able to start QC on my other rig and can cross-compare GPUs on that one.

Quote from: eroxors on April 30, 2013, 05:16:05 PM

+1 This is where I would start as well, unless the real issue is the SO jamming a fork into it while he is at work...

Good thinking, but 1) I know she was out of town, 2) I trust her.

Quote from: esenminer on May 01, 2013, 01:00:01 AM

I'm using cgminer directly on a headless machine running Xubuntu 12 with Catalyst 13.3 drivers. You can try adding --gpu-powertune 20 to the cgminer parameters, but it seems like GUIMiner is already doing this since you're hitting 635. And if you're using GUIMiner, I'm assuming you have a screen connected - with intensity 20, that might be what is causing the errors. Try using cgminer directly: unplug the monitor, ssh into the machine and run it. Or maybe just run GUIMiner, unplug the monitor and see if your pool results change.

I unplug the monitor once the system starts hashing. I'm in the process of trying to decide on a good headless method I can monitor remotely. Your system sounds reasonable. Do you use anything to monitor remotely?
I'll keep doing my research, as I haven't used ssh before and am new to networking, but I already run Mint on my main system so crossing over to Linux wouldn't be too bad.

esenminer

full member

Activity: 126

Merit: 100

Quote from: bizaro on April 30, 2013, 11:55:16 PM

Quote from: esenminer on April 30, 2013, 05:55:28 PM

Just slightly off topic, but what settings - if I may ask - are you using for the Sapphire 7950s? The one in my rig is running at an avg of 603 kh/s; getting it to 630 would be nice. Currently I am using cgminer with

shaders = 1792
intensity = 20
engine = 1075
memory = 1500
w = 256
thread-concurrency = 25088
lookup-gap = 2
threads = 1
auto-fan

and nothing else. I experimented for a day or two with all sorts of settings but this was the fastest I could come up with. Have I missed something or is it just my card? The model I am using is

http://www.newegg.com/Product/Product.aspx?Item=N82E16814202026

EDIT

Just found a tip somewhere else on the forums here - setting powertune to +20 effectively turns it off and stops the cards from trying to save power. It increased my hashrate from 603 kh/s to an avg of around 635 kh/s - thanks roy7 for that tip.

Can you share your powertune resource?

I was getting 635khs with:

intensity = 20
engine = 1075
memory = 1500 (looks like we both found the magic numbers for these cards)
w = 256
thread-concurrency = 21712
vectors = 1
threads = 1
auto-fan
1.25v
in GUIminer using cgminer

lopheaded, eroxors -- testing is in progress; I'll update once I have some data

I'm using cgminer directly on a headless machine running Xubuntu 12 with Catalyst 13.3 drivers. You can try adding --gpu-powertune 20 to the cgminer parameters, but it seems like GUIMiner is already doing this since you're hitting 635. And if you're using GUIMiner, I'm assuming you have a screen connected - with intensity 20, that might be what is causing the errors. Try using cgminer directly: unplug the monitor, ssh into the machine and run it. Or maybe just run GUIMiner, unplug the monitor and see if your pool results change.

bizaro

newbie

Activity: 18

Merit: 0

Quote from: esenminer on April 30, 2013, 05:55:28 PM

Just slightly off topic, but what settings - if I may ask - are you using for the Sapphire 7950s? The one in my rig is running at an avg of 603 kh/s; getting it to 630 would be nice. Currently I am using cgminer with

shaders = 1792
intensity = 20
engine = 1075
memory = 1500
w = 256
thread-concurrency = 25088
lookup-gap = 2
threads = 1
auto-fan

and nothing else. I experimented for a day or two with all sorts of settings but this was the fastest I could come up with. Have I missed something or is it just my card? The model I am using is

http://www.newegg.com/Product/Product.aspx?Item=N82E16814202026

EDIT

Just found a tip somewhere else on the forums here - setting powertune to +20 effectively turns it off and stops the cards from trying to save power. It increased my hashrate from 603 kh/s to an avg of around 635 kh/s - thanks roy7 for that tip.

Can you share your powertune resource?

I was getting 635khs with:

intensity = 20
engine = 1075
memory = 1500 (looks like we both found the magic numbers for these cards)
w = 256
thread-concurrency = 21712
vectors = 1
threads = 1
auto-fan
1.25v
in GUIminer using cgminer

lopheaded, eroxors -- testing is in progress; I'll update once I have some data

lopheaded

member

Activity: 70

Merit: 10

Quote from: eroxors on April 30, 2013, 05:16:05 PM

I've never heard of undervolting hurting motherboards. I would reset your bios (unplug from wall, remove battery, switch cmos switch, wait 30, reverse) and try system restore or last-known good on bootup. Should fix the issue.

+1 This is where I would start as well, unless the real issue is the SO jamming a fork into it while he is at work...

esenminer

full member

Activity: 126

Merit: 100

Just slightly off topic, but what settings - if I may ask - are you using for the Sapphire 7950s? The one in my rig is running at an avg of 603 kh/s; getting it to 630 would be nice. Currently I am using cgminer with

shaders = 1792
intensity = 20
engine = 1075
memory = 1500
w = 256
thread-concurrency = 25088
lookup-gap = 2
threads = 1
auto-fan

and nothing else. I experimented for a day or two with all sorts of settings but this was the fastest I could come up with. Have I missed something or is it just my card? The model I am using is

http://www.newegg.com/Product/Product.aspx?Item=N82E16814202026

EDIT

Just found a tip somewhere else on the forums here - setting powertune to +20 effectively turns it off and stops the cards from trying to save power. It increased my hashrate from 603 kh/s to an avg of around 635 kh/s - thanks roy7 for that tip.
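
For anyone running cgminer directly rather than through a front end, the settings listed above map onto command-line options roughly as sketched below. This is only an illustration: the option names are assumed to be the long-form ones from a cgminer 3.x build with scrypt support, `build_cgminer_args` is a hypothetical helper, and the pool URL and worker credentials are placeholders.

```python
# Settings from the post, keyed by the (assumed) cgminer long option names.
SETTINGS = {
    "shaders": 1792,
    "intensity": 20,
    "gpu-engine": 1075,          # core clock in MHz
    "gpu-memclock": 1500,        # memory clock in MHz
    "worksize": 256,             # "w = 256" in the post
    "thread-concurrency": 25088,
    "lookup-gap": 2,
    "gpu-threads": 1,            # "threads = 1" in the post
    "gpu-powertune": 20,         # the powertune tip from the EDIT
}


def build_cgminer_args(settings, pool_url, user, password):
    """Hypothetical helper: build a cgminer argument list for scrypt mining."""
    args = ["cgminer", "--scrypt", "-o", pool_url, "-u", user, "-p", password]
    for name, value in settings.items():
        args += [f"--{name}", str(value)]
    args.append("--auto-fan")  # flag without a value
    return args


if __name__ == "__main__":
    # Placeholder pool and credentials, not taken from the thread.
    cmd = build_cgminer_args(SETTINGS, "stratum+tcp://pool.example:3333", "worker", "x")
    print(" ".join(cmd))
```

Printing the command before running it makes it easy to compare two rigs setting by setting, for example the 25088 versus 21712 thread-concurrency values mentioned in this thread.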

eroxors

legendary

Activity: 924

Merit: 1000

Think. Positive. Thoughts.

I've never heard of undervolting hurting motherboards. I would reset your bios (unplug from wall, remove battery, switch cmos switch, wait 30, reverse) and try system restore or last-known good on bootup. Should fix the issue.

sinergy

newbie

Activity: 14

Merit: 0

I am also using 3 x Sapphire 7950, but with a cheap GB mobo. The voltage is very good (about 1.156 each), and I undervolted one of the cards to 1.119 in order to lower its temperature (it runs stable at 1150/300 core/memory and the other two are @1200 core). This has been running stable for almost 10 days now without any issues. What you describe definitely looks like a mobo/PCIe issue, but I do not see how undervolting would burn the mobo. My personal guess is that it is a coincidence that it happened after you undervolted the cards. Some motherboards cannot constantly run their PCIe slots at higher voltage, and over time the traces simply burn out. Since yours has several PCIe x16 slots, the chance of this happening is small (the vendor should have done the voltage calculations), but my bet is still on this.

bizaro

newbie

Activity: 18

Merit: 0

Hello all,

I have 2 rigs; both ran stable with 3 Sapphire 7950s at 1.25v, ~635khs scrypt mining, until ~2 days ago. I got a complaint from a significant other about the 'intense heat' blah blah; I retorted that my cards were only at 65C and not to worry. That only made the situation worse, go figure. So in an attempt to appease, I decided to undervolt my cards to improve their thermal efficiency. I tested one card and got it to hash stable at 1.09v with minor khs loss (final ~620khs). Having to go to work, I undervolted the rest of my cards with Trixx and waited for 30 min. All seemed good so I left; while I was at work both systems crashed within a few hours of each other.
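
To put the undervolt in numbers: dynamic power in a chip scales roughly with voltage squared times clock frequency. The sketch below applies that rule of thumb to the 1.25v to 1.09v change and the 635 to 620 khs figures above. It assumes unchanged clocks and ignores leakage, memory and VRM losses, so treat the result as a ballpark only; `dynamic_power_ratio` is a hypothetical helper.

```python
def dynamic_power_ratio(v_new, v_old, f_new=1.0, f_old=1.0):
    """Approximate dynamic power ratio using the P ~ V^2 * f rule of thumb."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


if __name__ == "__main__":
    power = dynamic_power_ratio(1.09, 1.25)   # roughly 0.76 of the original core power
    hashrate = 620.0 / 635.0                  # fraction of the original hashrate kept
    print(f"core power:    {power:.2f}x")
    print(f"hashrate:      {hashrate:.3f}x")
    print(f"hashes per W:  {hashrate / power:.2f}x")
```

By this estimate the cards shed about a quarter of their core power for a loss of only around 2% in hashrate, which is why undervolting is attractive when heat is the complaint.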

On sys one (my MSI mobo system), I ran some tests: reinstalled drivers, clean OS wipe -> neither fixed it.

Reinstalled drivers again, and tested the slots individually with one known-working GPU. 3/7 slots won't let me boot using the GPU as the video source.

Well, at least I have 3+ PCIe slots on my machine with 3 GPUs... WRONG. It turns out 2 cards will hash at stock settings, but a third (even at stock settings) causes my driver to crash, regardless of the slot combo.

It looks like it's the mobo?

How can undervolting damage a mobo?

I never had any temp problems (always around 65C) or stability issues before this epic crash. I haven't even had time to check out the other comp, except that I know it won't run its three cards after a driver reinstall either - haven't played with slots.

ANY other ideas I can try? Super bummed I might have killed a mobo by undervolting (I know there is risk, but I believed undervolting couldn't do that)!

Much thanks in advance. Even drunken ramblings (roughly on topic) are appreciated!
