Topic: Nvidia GPU Rig Goes Down Periodically - ASUS H170 Pro Gaming (Read 184 times)

member
Activity: 155
Merit: 11
Okay, so it happened again after I changed some BIOS settings. This time, when I hit restart in Windows, a BSOD occurred with the following error code:

Video_TDR_Failure (nvlddmkm.sys) on Windows 10
https://www.drivereasy.com/knowledge/nvlddmkm-sys-video_tdr_failure-blue-screen-error-solved-on-windows-10/

Do note that I've clean reinstalled the drivers numerous times yet it continues to happen.

I just now reset GPU clocks to factory defaults and I'll see if that keeps it from happening.

I had them set at:
GTX 1080 Ti +50core +200mem
GTX 1070 Ti +100core +400mem
GTX 1070 +100core +400mem
all at 75% power limit

I don't want to run them at 100% power, so I only set clocks to default.

I'll also be running checkdisk and memtest to see if that's an issue.

*Checkdisk and memtest were good.
jr. member
Activity: 168
Merit: 2
If you are running overclocked or with a modded BIOS, try resetting back to the default (factory) config.
Test it for several hours or days and observe how it behaves.
There's also no harm in checking your power supply; it might be starting to fail and giving your rig inconsistent amps and voltages.
hero member
Activity: 756
Merit: 560
Run Linux for a stable mining rig. None of this DDU and which-driver-to-run nonsense.
legendary
Activity: 2310
Merit: 2073
Your power supply should be enough. I read somewhere that the latest NVIDIA drivers are unstable. Try installing driver version 23.21.13.9077, but first remove the old driver using Display Driver Uninstaller (DDU). Another possibility is that your overclock is too aggressive.
member
Activity: 155
Merit: 11
Yeah, I was wondering if the PSU is the cause.

I've got it set up like this:

850W PSU=
System
2x 1080 Ti

750W=
2x 1070 Ti mini
1x 1070

I have an energy meter; I'll see how much each is pulling later this evening.

I was wanting to buy that HX1200i because they are in stock and on sale. Nothing was in stock when I bought the bronze units. Everything was overpriced. The efficiency isn't that big of a deal, like $30/year or something like that. But if it's causing instability then it's worth the upgrade IMO.
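
As a software cross-check on the wall-meter reading mentioned above, per-card power draw can also be read from `nvidia-smi`, which ships with the NVIDIA driver. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH; the function names are hypothetical helpers, not part of any tool. Note that it reports board power as seen by the driver, so it will read lower than a wall meter, which also includes PSU losses and the rest of the system.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; all four are standard --query-gpu properties.
QUERY = "index,name,power.draw,temperature.gpu"


def read_gpu_stats():
    """Run nvidia-smi once and return its raw CSV output as text."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_gpu_stats(text):
    """Turn nvidia-smi CSV text into a list of dicts, one per readable GPU."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        index, name, power, temp = (f.strip() for f in fields)
        try:
            rows.append(
                {
                    "index": int(index),
                    "name": name,
                    "watts": float(power),
                    "temp_c": int(temp),
                }
            )
        except ValueError:
            # Some cards report a placeholder instead of a number; skip them.
            continue
    return rows


def total_watts(rows):
    """Sum the reported board power across all parsed GPUs."""
    return sum(row["watts"] for row in rows)
```

Calling `parse_gpu_stats(read_gpu_stats())` every minute or so and logging the result would show whether one PSU's group of cards spikes before a crash.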
legendary
Activity: 4256
Merit: 8551
'The right to privacy matters'
Problem is the past 2 days my rig stopped mining completely after I left for work. Running EWBF Miner 0.3.4b and it would say "GPU 0,1,3 stopped working". It would require a restart of the system to work again. I've reinstalled drivers several times.

Also, before I was using Nicehash and only 1 GPU would stop working pretty much every day at some time. That's why I switched to mining ZEC.

It doesn't really happen with only 4x GPUs, but with 5x it seems to have issues and restarting or reinstalling the driver is a temporary fix.

My idea is the motherboard has some temp monitoring feature that doesn't work properly with so many GPUs. I have a 7x AMD mining rig on a GA-Z270XP-SLI and it will run non-stop without issue.

Think I should buy another GA-Z270XP-SLI??

Or any other suggestions?

Asus H170 Pro Gaming
4GB RAM
G3900
2x EVGA GTX 1080 Ti SC2
2x Zotac GTX 1070 Ti Mini
1x EVGA GTX 1070 SC
1x EVGA 850 B2
1x EVGA 750 B2
24-Pin Dual Power Supply Adapter Cable For PC ATX Motherboard
Windows 10
Latest Nvidia Driver (also previous drivers did same thing)



A PSU issue is likely, and you have 1600 watts of Bronze.

2x 1080 Ti should be at 180 watts each, or 70%: 360
2x 1070 Ti should be at 105 watts each, or 70%: 210
1x 1070 should be at 105 watts, or 70%: 105

That is 675 watts; add 75 more for 750 watts.

Which means the PSU below is what you should be using:



https://www.corsair.com/us/en/Categories/Products/Certified-Refurbished/Power-Supplies/RMx-Series%E2%84%A2-RM1000x-%E2%80%94-1000-Watt-80-PLUS%C2%AE-Gold-Certified-Fully-Modular-PSU-%28NA%29-%28Refurbished%29/p/CP-9020094-NA/RF

If you want more headroom, use this one:

https://www.corsair.com/us/en/Categories/Products/Certified-Refurbished/Power-Supplies/HXi-Series%E2%84%A2-HX1200i-High-Performance-ATX-Power-Supply-%E2%80%94-1200-Watt-80-Plus%C2%AE-PLATINUM-Certified-PSU-%28Refurbished%29/p/CP-9020070-NA/RF
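
The wattage estimate in the post above can be rechecked with a few lines of Python. This is only a sketch of that arithmetic: the per-card figures and the 75 W allowance are the poster's estimates, not measurements, and the single card is listed as a GTX 1070 to match the rig's hardware list.

```python
# (card name, count, estimated watts per card at ~70% power limit)
# These numbers are the estimates quoted in the post, not measured values.
CARDS = [
    ("GTX 1080 Ti", 2, 180),
    ("GTX 1070 Ti", 2, 105),
    ("GTX 1070", 1, 105),
]

# Extra allowance the post adds on top of the GPU total.
HEADROOM_WATTS = 75


def gpu_load_watts(cards):
    """Sum the estimated draw of all cards."""
    return sum(count * watts for _, count, watts in cards)


def budget_watts(cards, headroom=HEADROOM_WATTS):
    """GPU total plus the fixed allowance."""
    return gpu_load_watts(cards) + headroom


if __name__ == "__main__":
    for name, count, watts in CARDS:
        print(f"{count} x {name}: {count * watts} W")
    print(f"GPU total: {gpu_load_watts(CARDS)} W")
    print(f"With allowance: {budget_watts(CARDS)} W")
```

Swapping in measured per-card values would show how much margin a 1000 W or 1200 W unit actually leaves.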
newbie
Activity: 1
Merit: 0
Double-check your PSUs. I had a recurring crash on an Nvidia rig; I changed risers and tried countless fixes before finally realising my problem was the cable feeding power to the 1080 Ti.
My Corsair RM850x cable with 2x 6+2 pin couldn't feed the 1080 Ti properly and the rig crashed every few hours or days. I plugged the same card into another 2x 6+2 pin cable from an EVGA SuperNOVA 750 G3 PSU and the rig has been hashing non-stop for 30+ days now. Good luck!
member
Activity: 155
Merit: 11
Temps are good. I just changed some settings in BIOS and hopefully it fixes the issue.

*I just remembered a setting in the BIOS from when I first set it up that said it could cause issues when monitoring temps, but I can't remember what or where it is exactly.
sr. member
Activity: 1008
Merit: 297
Grow with community
Have you tried lowering the clocks a bit?

How are your GPU temps?

Did you try DSTM miner?
member
Activity: 155
Merit: 11
Problem is the past 2 days my rig stopped mining completely after I left for work. Running EWBF Miner 0.3.4b and it would say "GPU 0,1,3 stopped working". It would require a restart of the system to work again. I've reinstalled drivers several times.

Also, before I was using Nicehash and only 1 GPU would stop working pretty much every day at some time. That's why I switched to mining ZEC.

It doesn't really happen with only 4x GPUs, but with 5x it seems to have issues and restarting or reinstalling the driver is a temporary fix.

My idea is the motherboard has some temp monitoring feature that doesn't work properly with so many GPUs. I have a 7x AMD mining rig on a GA-Z270XP-SLI and it will run non-stop without issue.

Think I should buy another GA-Z270XP-SLI??

Or any other suggestions?

Asus H170 Pro Gaming
4GB RAM
G3900
2x EVGA GTX 1080 Ti SC2
2x Zotac GTX 1070 Ti Mini
1x EVGA GTX 1070 SC
1x EVGA 850 B2
1x EVGA 750 B2
24-Pin Dual Power Supply Adapter Cable For PC ATX Motherboard
Windows 10
Latest Nvidia Driver (also previous drivers did same thing)