Nvidia GPU Mining Problems - page 5.

JaredKaragen

legendary

Activity: 1848

Merit: 1166

My AR-15 ID's itself as a toaster. Want breakfast?

https://bitcointalksearch.org/topic/m.15551454

tbearhere

legendary

Activity: 3164

Merit: 1003

A note 7-21-2016
My rig mined all day with all cards with MC.

Room temp 92f humidity 35%
For the four days before that the rig crashed at only 87f humidity 85%
Possibly the humidity played some role in the crashing with the 970 GTX, which mines in the P2 state at a default 1413 core clock.
Maybe micro dust.
This week's temps are supposed to be a record-breaking 100f... humidity?
Going slowly, one thing at a time, to pin down the 3 things contributing to the crashing.
Due to heat I will have to shut down the rig for some time during the day for the next 5 days.

EDIT 7-22-16: The rig mined all day today with MC. Room temp max 95f, humidity 45%. The only changes I made were turning off oc'ing and changing "delay": 15 to "delay": 30.
So one problem is heat related: when changing algos the cards go from the P2 state to the P8 state before they mine again, which some call spin down time.
The hotter it is, the longer the spin down takes.
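To line up the room readings in the notes above with the GPU temperatures discussed elsewhere in the thread (which are in Celsius), here is a minimal sketch that converts them. The `readings` list only restates the three days logged above; the function name `f_to_c` is made up for this illustration.

```python
def f_to_c(deg_f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


# (outcome, room temp in F, relative humidity in %) as logged in the notes above
readings = [
    ("crashed", 87, 85),
    ("mined all day", 92, 35),
    ("mined all day, oc off, delay 30", 95, 45),
]

for outcome, temp_f, humidity in readings:
    print(f"{temp_f}f = {f_to_c(temp_f):.1f}c, humidity {humidity}%: {outcome}")
```

Laid out this way, the crash days were the cooler but more humid ones, which is consistent with room temperature alone not being the whole story.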

tbearhere

legendary

Activity: 3164

Merit: 1003

Quote from: antonio8 on July 17, 2016, 08:01:49 PM

Not sure if this will help you at all but I was having a issue similar.

I had one 970, two 960 and one 750ti in a rig and I had one card that would always crash while mining certain algos. It was always the same 960 and sometimes a few minutes and sometimes a few hours.

I used Nvidia Inspector to determine this as it always showed the same card with the fan down to 0% after the crash. I tried new risers (powered usb) and the same thing. I was starting to think I had a bad card so I switched the 960's only in each riser to narrow it down. Low and behold the card that was having no issues started crashing and the card that was crashing would not anymore. Same thing after switching and same algos. The card that was crashing would not at all any more.

I was perplexed. ran out of ideas and I had one last thought. I just switched the riser connector on the mother board that was crashing with my 750ti pci-e slots. I looked in Nvidia Inspector and noticed my 960 that was crashing had switched PCI Interface numbers. Both 960 cards read [email protected] before the switch and after the card that was crashing read [email protected]. Now it reads the same as my 750ti. So both 960's read different values and I have not had a single issue since.

I am no expert and have no idea why it fixed it but it did.

Hope that might help and hope it made sense.

Thank you antonio8.
Glad you found your problem.

I have done the switching but I need to look into it in detail.
But I have had 3 970s, I think all from the same batch or lot, and none of them mined in the P2 state at 1178 core. They ran at about 1423 core clock. Going to call Gigabyte again asap.
Will look into switching slots again.
It's as if they were oc'd without oc'ing them.
Thx

tbearhere

legendary

Activity: 3164

Merit: 1003

Still doing great on lyra2v2, straight mining for almost 20 hours. Next is to run MinerControl without the 970 in the config file.
Due to things I must attend to, I may not be able to post until Thursday.
Thx

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: antonio8 on July 17, 2016, 08:01:49 PM

Not sure if this will help you at all but I was having a issue similar.

I had one 970, two 960 and one 750ti in a rig and I had one card that would always crash while mining certain algos. It was always the same 960 and sometimes a few minutes and sometimes a few hours.

I used Nvidia Inspector to determine this as it always showed the same card with the fan down to 0% after the crash. I tried new risers (powered usb) and the same thing. I was starting to think I had a bad card so I switched the 960's only in each riser to narrow it down. Low and behold the card that was having no issues started crashing and the card that was crashing would not anymore. Same thing after switching and same algos. The card that was crashing would not at all any more.

I was perplexed. ran out of ideas and I had one last thought. I just switched the riser connector on the mother board that was crashing with my 750ti pci-e slots. I looked in Nvidia Inspector and noticed my 960 that was crashing had switched PCI Interface numbers. Both 960 cards read [email protected] before the switch and after the card that was crashing read [email protected]. Now it reads the same as my 750ti. So both 960's read different values and I have not had a single issue since.

I am no expert and have no idea why it fixed it but it did.

Hope that might help and hope it made sense.

I'm just guessing, but it looks like the notation means @. That would mean the slots were running at PCIe v1.1 even though the slots and cards support v3. I have no idea why swapping slots would change that.
Glad you got it fixed.
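For anyone who wants to check the negotiated PCIe link without Nvidia Inspector, here is a rough sketch that reads it through `nvidia-smi`. The query fields (`pcie.link.gen.current`, `pcie.link.gen.max`, `pcie.link.width.current`) are standard `nvidia-smi --query-gpu` fields; the function names and the sample text are made up for illustration and are not readings from anyone's rig in this thread.

```python
import csv
import io
import subprocess

QUERY = "index,name,pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current"


def parse_links(csv_text):
    """Parse 'csv,noheader' output of nvidia-smi into one dict per GPU."""
    rows = []
    for rec in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(rec) != 5:
            continue  # skip blank or malformed lines
        index, name, gen_cur, gen_max, width = (field.strip() for field in rec)
        rows.append({
            "index": int(index),
            "name": name,
            "gen_current": int(gen_cur),
            "gen_max": int(gen_max),
            "width": int(width),
        })
    return rows


def read_links():
    """Run nvidia-smi and return the parsed link info (needs the NVIDIA driver installed)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_links(out)


# Made-up sample text, only to show the expected shape of the output.
SAMPLE = "0, GeForce GTX 960, 1, 3, 16\n1, GeForce GTX 750 Ti, 3, 3, 1\n"
for gpu in parse_links(SAMPLE):
    print(gpu["index"], gpu["name"], f"gen {gpu['gen_current']} of {gpu['gen_max']}", f"x{gpu['width']}")
```

Call `read_links()` while the miner is running. Cards can drop the link to a lower generation at idle to save power, so a low reading on an idle card is not necessarily a fault.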

antonio8

legendary

Activity: 1400

Merit: 1000

Not sure if this will help you at all but I was having a similar issue.

I had one 970, two 960 and one 750ti in a rig and I had one card that would always crash while mining certain algos. It was always the same 960, sometimes after a few minutes and sometimes after a few hours.

I used Nvidia Inspector to determine this as it always showed the same card with the fan down to 0% after the crash. I tried new risers (powered usb) and got the same thing. I was starting to think I had a bad card, so I swapped only the 960s between their risers to narrow it down. Lo and behold, the card that was having no issues started crashing and the card that was crashing would not anymore. Same thing after switching and same algos. The card that was crashing would not at all any more.

I was perplexed, ran out of ideas, and had one last thought. I just switched the riser connector on the motherboard that was crashing with my 750ti's PCI-E slot. I looked in Nvidia Inspector and noticed my 960 that was crashing had switched PCI Interface numbers. Both 960 cards read [email protected] before the switch, and after it the card that was crashing read [email protected]. Now it reads the same as my 750ti. So both 960s read different values and I have not had a single issue since.

I am no expert and have no idea why it fixed it but it did.

Hope that might help and hope it made sense.

tbearhere

legendary

Activity: 3164

Merit: 1003

Quote from: joblo on July 17, 2016, 05:50:17 PM

Quote from: tbearhere on July 17, 2016, 04:49:12 PM

Quote from: joblo on July 17, 2016, 03:52:59 PM

Quote from: tbearhere on July 17, 2016, 02:06:57 PM

Quote from: Spiffy_1 on July 17, 2016, 01:35:01 PM

I'm wondering if it is a heat issue. Can you run a temp monitoring program and observe the temperatures up to and during a crash? Try also not using miner control and just mine one select coin and see if you can make 12 hours. One card at a time, repeat process. If you're using a scrypt like JK's, then you aren't the only one getting crashes every 6 hours(mine does as well on 1.03). My theory on the scrypt switching is it fails to close the non profitable miner before opening up the other, and with high intensities(which is another issue you might be running into(if you're mining with -i intensity try removing that altogether))it causes a memory overlap and crash.

I'm running lyra2v2 right now without the 970 gtx in the bat file. Running great. Take a look on the OP.
The 970 gtx is at 1413 core clock.. should be 1178..I think that is one of the problems.
I'm just running the 2 980ti and 1 750ti test run.

If the 970 is not mining, why is it so hot?

It's in a room.. temp 95 f 35 c and next to the other cards..
At room temp that card mines very cool about 68c on some algo's.
If I run the other 970 gtx I have all by itself all cards removed .. it will crash immediately.
I think I have a bad batch of 970 gtx cards.

Which temp is for which card? It looks like the third card is at 79, isn't that ccminer GPU #2?
Anyway if the cards crash at stock settings and reasonable temps they are defective.
Double faults can easily get you going in circles trying to troubleshoot.

The 3rd card is a 980ti. 79c is what I set as max, going to lower it.

tbearhere

legendary

Activity: 3164

Merit: 1003

Doing good.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: tbearhere on July 17, 2016, 04:49:12 PM

Quote from: joblo on July 17, 2016, 03:52:59 PM

Quote from: tbearhere on July 17, 2016, 02:06:57 PM

Quote from: Spiffy_1 on July 17, 2016, 01:35:01 PM

I'm wondering if it is a heat issue. Can you run a temp monitoring program and observe the temperatures up to and during a crash? Try also not using miner control and just mine one select coin and see if you can make 12 hours. One card at a time, repeat process. If you're using a scrypt like JK's, then you aren't the only one getting crashes every 6 hours(mine does as well on 1.03). My theory on the scrypt switching is it fails to close the non profitable miner before opening up the other, and with high intensities(which is another issue you might be running into(if you're mining with -i intensity try removing that altogether))it causes a memory overlap and crash.

I'm running lyra2v2 right now without the 970 gtx in the bat file. Running great. Take a look on the OP.
The 970 gtx is at 1413 core clock.. should be 1178..I think that is one of the problems.
I'm just running the 2 980ti and 1 750ti test run.

If the 970 is not mining, why is it so hot?

It's in a room.. temp 95 f 35 c and next to the other cards..
At room temp that card mines very cool about 68c on some algo's.
If I run the other 970 gtx I have all by itself all cards removed .. it will crash immediately.
I think I have a bad batch of 970 gtx cards.

Which temp is for which card? It looks like the third card is at 79, isn't that ccminer GPU #2?
Anyway if the cards crash at stock settings and reasonable temps they are defective.
Double faults can easily get you going in circles trying to troubleshoot.

Spiffy_1

full member

Activity: 235

Merit: 100

After it crashes your cards are just producing errors. It may seem silly, but try stress testing your individual cards with FurMark. If the cards themselves are bad, they should produce artifacts. Plus you can monitor the temperatures. There is a bootable USB Linux for mining called KopiemTu that you could try as well. It isn't as user friendly, but if you can get it mining then that eliminates the operating system.

antantti

legendary

Activity: 1176

Merit: 1015

I haven't been following this thread but if my afterburner panel would look like that I would first go to fan tab.

And then sell those 750ti's. I sold after 500 days of mining thinking that aftermarket price would tank. I was wrong.

tbearhere

legendary

Activity: 3164

Merit: 1003

Quote from: joblo on July 17, 2016, 03:52:27 PM

Quote from: tbearhere on July 17, 2016, 01:16:06 PM

Yes exactly.... it crashed... card #2 again 970 gtx now going to mine without it in the bat file on lyra2v2 only no switching. But the crash .....it was mining with only the 970 and scrolling fast but mining.
I have a lot of things that must get done... so I will be on and off the thread for the next day or 2.

If it still crashes I will remove the card and try that next.

Thx

This is not good hashing, that card is sick.

That is after it crashed the drivers. Scrolling.

tbearhere

legendary

Activity: 3164

Merit: 1003

Quote from: joblo on July 17, 2016, 03:52:59 PM

Quote from: tbearhere on July 17, 2016, 02:06:57 PM

Quote from: Spiffy_1 on July 17, 2016, 01:35:01 PM

I'm wondering if it is a heat issue. Can you run a temp monitoring program and observe the temperatures up to and during a crash? Try also not using miner control and just mine one select coin and see if you can make 12 hours. One card at a time, repeat process. If you're using a scrypt like JK's, then you aren't the only one getting crashes every 6 hours(mine does as well on 1.03). My theory on the scrypt switching is it fails to close the non profitable miner before opening up the other, and with high intensities(which is another issue you might be running into(if you're mining with -i intensity try removing that altogether))it causes a memory overlap and crash.

I'm running lyra2v2 right now without the 970 gtx in the bat file. Running great. Take a look on the OP.
The 970 gtx is at 1413 core clock.. should be 1178..I think that is one of the problems.
I'm just running the 2 980ti and 1 750ti test run.

If the 970 is not mining, why is it so hot?

It's in a room.. temp 95 f 35 c and next to the other cards..
At room temp that card mines very cool about 68c on some algo's.
If I run the other 970 gtx I have all by itself all cards removed .. it will crash immediately.
I think I have a bad batch of 970 gtx cards.

Spiffy_1

full member

Activity: 235

Merit: 100

38 degrees is something you would see under water at idle. Unless you're getting valid hash messages (yay!, Yes!), your card isn't doing what it is supposed to be doing. Your card can be overclocking itself due to boost clocking. If the card is spitting out numbers faster than you can see them scroll, you're producing nothing but errors and garbage. Perhaps you have a corrupt version of ccminer or cudaminer? Try getting the newest 1.8 or redownloading it.

My cards under load hit 58 degrees, and that's under water with 7 120sx radiators and an ambient temperature of 24 degrees. 75 to 80 degrees under load for air cooling would be as high as I would push your cards. Since you're using MSI Afterburner, you can tune each card individually to stock settings if you like. I have seen issues with shady power supplies not providing enough voltage to cards, as well as motherboards that don't like 3 cards on the same motherboard.

I know it's frustrating, but my suggestion is to start with the lowest video card. Take every other card out, and get that one stable. Then repeat for the other cards one at a time. We are checking to see if your cards are bad. If the cards are the same, you can put them together after verifying that they work individually. And double check to make sure SLI isn't enabled by accident. That will pock up mining. I found that one out the hard way.

You could also try updating your Nvidia drivers. You're a few revisions behind.
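To put a number on the boost clocking point, here is a small sketch of the arithmetic using the clocks reported in this thread (1413 and 1423 MHz observed against an expected 1178 MHz). The function name `percent_over` is made up for this example.

```python
def percent_over(observed_mhz: float, expected_mhz: float) -> float:
    """How far the observed core clock sits above the expected one, in percent."""
    return (observed_mhz / expected_mhz - 1.0) * 100.0


EXPECTED_MHZ = 1178  # the clock the 970 was expected to mine at, per the thread

for observed in (1413, 1423):
    print(f"{observed} MHz is {percent_over(observed, EXPECTED_MHZ):.1f}% over {EXPECTED_MHZ} MHz")
```

Both observed clocks come out at roughly 20% above the expected one, which is about what a manual overclock would look like even though none was applied.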

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: tbearhere on July 17, 2016, 02:06:57 PM

Quote from: Spiffy_1 on July 17, 2016, 01:35:01 PM

I'm wondering if it is a heat issue. Can you run a temp monitoring program and observe the temperatures up to and during a crash? Try also not using miner control and just mine one select coin and see if you can make 12 hours. One card at a time, repeat process. If you're using a scrypt like JK's, then you aren't the only one getting crashes every 6 hours(mine does as well on 1.03). My theory on the scrypt switching is it fails to close the non profitable miner before opening up the other, and with high intensities(which is another issue you might be running into(if you're mining with -i intensity try removing that altogether))it causes a memory overlap and crash.

I'm running lyra2v2 right now without the 970 gtx in the bat file. Running great. Take a look on the OP.
The 970 gtx is at 1413 core clock.. should be 1178..I think that is one of the problems.
I'm just running the 2 980ti and 1 750ti test run.

If the 970 is not mining, why is it so hot?

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: tbearhere on July 17, 2016, 01:16:06 PM

Yes exactly.... it crashed... card #2 again 970 gtx now going to mine without it in the bat file on lyra2v2 only no switching. But the crash .....it was mining with only the 970 and scrolling fast but mining.
I have a lot of things that must get done... so I will be on and off the thread for the next day or 2.

If it still crashes I will remove the card and try that next.

Thx

This is not good hashing, that card is sick.

tbearhere

legendary

Activity: 3164

Merit: 1003

Quote from: Spiffy_1 on July 17, 2016, 01:35:01 PM

I'm wondering if it is a heat issue. Can you run a temp monitoring program and observe the temperatures up to and during a crash? Try also not using miner control and just mine one select coin and see if you can make 12 hours. One card at a time, repeat process. If you're using a scrypt like JK's, then you aren't the only one getting crashes every 6 hours(mine does as well on 1.03). My theory on the scrypt switching is it fails to close the non profitable miner before opening up the other, and with high intensities(which is another issue you might be running into(if you're mining with -i intensity try removing that altogether))it causes a memory overlap and crash.

I'm running lyra2v2 right now without the 970 gtx in the bat file. Running great. Take a look on the OP.
The 970 gtx is at 1413 core clock.. should be 1178..I think that is one of the problems.
I'm just running the 2 980ti and 1 750ti test run.

Spiffy_1

full member

Activity: 235

Merit: 100

I'm wondering if it is a heat issue. Can you run a temp monitoring program and observe the temperatures up to and during a crash? Also try not using Miner Control: just mine one select coin and see if you can make 12 hours. One card at a time, repeat the process. If you're using a script like JK's, then you aren't the only one getting crashes every 6 hours (mine does as well on 1.03). My theory on the script switching is that it fails to close the non-profitable miner before opening up the other, and with high intensities it causes a memory overlap and crash. High intensity is another issue you might be running into: if you're mining with -i intensity, try removing that altogether.
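One way to watch temperatures up to a crash is to have `nvidia-smi` append a reading every few seconds and look at the tail of the log afterwards. Below is a rough sketch of the summarizing side. It assumes the log was produced with the standard query fields `timestamp,index,temperature.gpu,clocks.sm,pstate` in `csv,noheader,nounits` format; the function name and the sample lines are invented for illustration.

```python
def summarize(log_text):
    """Return (peak temperature per GPU, last reading per GPU) from a nvidia-smi CSV log."""
    peak = {}
    last = {}
    for line in log_text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 5:
            continue  # blank or truncated line, e.g. cut off by the crash
        timestamp, index, temp_c, sm_mhz, pstate = fields
        try:
            gpu, temp, clock = int(index), int(temp_c), int(sm_mhz)
        except ValueError:
            continue  # a crashed card may report non-numeric values
        peak[gpu] = max(peak.get(gpu, temp), temp)
        last[gpu] = (timestamp, temp, clock, pstate)
    return peak, last


# Invented sample lines, only to show the expected log shape.
SAMPLE_LOG = """\
2016/07/17 13:35:01.000, 0, 66, 1178, P2
2016/07/17 13:35:01.000, 1, 79, 1413, P2
2016/07/17 13:35:06.000, 0, 68, 1178, P2
2016/07/17 13:35:06.000, 1, 78, 1413, P2
"""

peak, last = summarize(SAMPLE_LOG)
for gpu in sorted(peak):
    print(f"GPU {gpu}: peak {peak[gpu]}c, last reading {last[gpu]}")
```

A log like this could be collected with something along the lines of `nvidia-smi --query-gpu=timestamp,index,temperature.gpu,clocks.sm,pstate --format=csv,noheader,nounits -l 5 > gpu_log.csv`, left running next to the miner.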

tbearhere

legendary

Activity: 3164

Merit: 1003

Quote from: DrkLvr_ on July 17, 2016, 12:33:48 PM

i would recommend to install linux.. try with a separate hard drive as you are bound to have problems the first time.. but there are guides on installing the drivers, cuda and all that.. my linux rigs run for months without crashing

apart from that you can try booting in safe mode, i had a 6x750ti rig running windows7 that would (surprisingly) mine in safe mode without any issues. otherwise it would crash every few hours for different reasons. however in other rigs using windows 2012 hashing in safe mode does not work

Thx DrkLvr ... some time in the future I may do that.. thx

tbearhere

legendary

Activity: 3164

Merit: 1003

Quote from: joblo on July 17, 2016, 12:17:30 PM

Quote from: tbearhere on July 17, 2016, 11:51:26 AM

Quote from: joblo on July 17, 2016, 10:56:53 AM

You're juggling too many things at once, start isolating cards. If you have multiple faults you have to separate them.

You seem to have an issue triggered by heat. You need to find out which card it is, and you need to monitor the GPU temperatures to confirm it's temp related.

That's what I was doing... now I'm mining one algo only and so far no crashes.

Room temp 86 f.
I'm also posting extra info for my own notes.

OK, so you're following the algo switching lead. Following up on that, has it failed immediately after a reboot, i.e. the first algo starts up but crashes after a few minutes? Or does it run fine for a while, algo switching and not crashing, until suddenly it crashes? Monitor the GPU temperatures, not the room temperatures, to see if there is a correlation.

You will also need to identify which card is crashing. If you don't want to test one card at a time you can test minus 1 card at a time: remove one card, test, reinstall, remove a different card, repeat.

Yes exactly.... it crashed... card #2 again 970 gtx now going to mine without it in the bat file on lyra2v2 only no switching. But the crash .....it was mining with only the 970 and scrolling fast but mining.
I have a lot of things that must get done... so I will be on and off the thread for the next day or 2.

If it still crashes I will remove the card and try that next.

Thx
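The "minus 1 card at a time" approach can also be done in software before pulling any hardware, by leaving one device out of the miner's device list per test run. Here is a minimal sketch that generates those lists. It assumes a four-card rig numbered 0 to 3 and uses ccminer's `-d` (devices) option; the helper names are made up for this example.

```python
def leave_one_out(device_ids):
    """One test configuration per card, each with that card left out."""
    return [[d for d in device_ids if d != skipped] for skipped in device_ids]


def device_flag(devices):
    """Format a device list as a ccminer -d argument."""
    return "-d " + ",".join(str(d) for d in devices)


DEVICES = [0, 1, 2, 3]  # assumed numbering for a four-card rig

for skipped, devices in zip(DEVICES, leave_one_out(DEVICES)):
    print(f"test without card {skipped}: ccminer {device_flag(devices)} ...")
```

If the rig only stays up in the run that leaves out one particular card, that card (or its slot or riser) is the first suspect. Physically removing it is still the stronger test, since an idle card remains on the bus.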

Topic: Nvidia GPU Mining Problems - page 5. (Read 7019 times)