RESTORING DRIVER--
I use PrecisionX 16 to restore the driver without rebooting the system. These are my steps for a card on Windows that has low hash (crashed driver):
1) Start or re-open PrecisionX 16.
2) Turn off K-Boost with the toggle switch (upper right-hand corner).
3) Turn on K-Boost with the same toggle switch.
4) Re-select the boost profile that you prefer (important!).
5) Verify that the fan profile and boost settings are again in place, and that temperature is appropriate.
6) Close or minimize PrecisionX 16.
7) Restart the miner; it should again show the appropriate hash rate for your boost/overclock settings.
You may also have to open the NVIDIA Control Panel and reset the display resolution if your graphics now "look odd". It doesn't happen every time, but I frequently have to reset the display resolution to get normal graphics. This is for my work computer, Win 7 x64, with a GTX 960 that I mine with when not playing games. The GTX 960 gets 10.6 MH/s on Quark, but if the miner crashes with a segmentation fault, it only gets 3 MH/s on restart. I then perform the steps above and restart the miner.
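A watchdog script could decide when the restore steps are needed by comparing the current hash rate with the normal one. The following is only a minimal sketch: the function name `is_degraded` and the 50% threshold are my own assumptions, not part of any miner, and the script does not read the rate from the miner itself.

```python
def is_degraded(current_rate, normal_rate, threshold=0.5):
    """Return True when the hash rate has fallen below threshold * normal.

    The 0.5 default is an arbitrary assumption; tune it for your card.
    """
    if normal_rate <= 0:
        raise ValueError("normal_rate must be positive")
    return current_rate < threshold * normal_rate


# Figures from the post above: 10.6 MH/s normal, about 3 MH/s after the crash.
print(is_degraded(3.0, 10.6))   # True: the crashed state
print(is_degraded(10.4, 10.6))  # False: ordinary fluctuation
```

Both rates just need to be in the same unit; how you obtain the current rate (log scraping, miner API) is left open here.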
There is a memory leak somewhere, but I suspect poorly programmed flash-media websites, like my local news site. I need to reboot about once a day because of increasing memory bloat. --scryptr
I'm using an ancient ccminer, so I'm not sure about the issue, but it sounds like a simple soft crash to me, where the card reverts to a lower P-state at 405 MHz. That is how my cards crash if I set the OC or the intensity too high, or accidentally mine on the same card with two instances. A memory leak would imply excess memory usage and/or slow performance degradation over time.
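One way to check for the low P-state described above is to query nvidia-smi and look at the performance state and core clock. This is a rough sketch, not a tested tool: the helper names and the 500 MHz cutoff are assumptions of mine, and some GeForce cards or driver versions may report fields as unsupported, which the parser simply maps to None.

```python
import subprocess

# Real nvidia-smi query fields; availability can vary by card and driver.
QUERY_FIELDS = "pstate,clocks.sm,clocks.mem,utilization.gpu"


def _to_int(field):
    """Convert a numeric field to int; return None for '[Not Supported]' etc."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_state(csv_line):
    """Parse one line of 'csv,noheader,nounits' output for QUERY_FIELDS."""
    pstate, sm_clock, mem_clock, util = csv_line.split(",")
    return {
        "pstate": pstate.strip(),
        "sm_mhz": _to_int(sm_clock),
        "mem_mhz": _to_int(mem_clock),
        "util_pct": _to_int(util),
    }


def looks_soft_crashed(state, min_sm_mhz=500):
    """Hypothetical heuristic: GPU is busy but stuck at a low core clock."""
    if state["sm_mhz"] is None or state["util_pct"] is None:
        return False  # not enough information to judge
    return state["util_pct"] > 90 and state["sm_mhz"] < min_sm_mhz


def query_gpu_state(index=0):
    """Run nvidia-smi for one GPU and return the parsed state."""
    out = subprocess.check_output(
        ["nvidia-smi", "-i", str(index),
         "--query-gpu=" + QUERY_FIELDS,
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_gpu_state(out.strip().splitlines()[0])
```

Calling `looks_soft_crashed(query_gpu_state(0))` while the miner is running would then flag a card sitting at something like 405 MHz under full load. The sample lines used to test the parser are made up for illustration, not captured output.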
I hadn't checked the P-state when my card degrades, but I think you're right about that. I don't know anything about
a "soft crash". When I set the intensity too high, ccminer errors out with an out-of-memory error. When I start two instances
on the same card, they each hash at lower rates, but the card doesn't crash and never gets stuck in a degraded state. I have also seen
driver crashes due to too high an OC, where I lose the display for a few seconds. If the degradation is the result of some sort of soft crash
or exception, why leave the GPU degraded? Why not reset automatically, as it does for a hard crash? (Rhetorical questions;
I don't expect an answer.)
And how do you do that exactly?
I have a different solution if the driver crashes and sets the GPU to some lower state: I just go into Device Manager, disable the problematic GPU, and re-enable it.
That restores the default GPU state. Then you just have to click the profile in MSI Afterburner and you have it back.
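The disable/re-enable step could in principle be scripted instead of clicked through. The sketch below assumes a recent Windows 10 or later, where `pnputil` supports `/disable-device` and `/enable-device`; older systems such as Windows 7 would need a different tool. The instance ID shown is a placeholder, not a real device, and the function names and five-second pause are my own choices. It must run from an elevated prompt, and it does not re-apply the Afterburner profile.

```python
import subprocess
import time


def toggle_commands(instance_id):
    """Build the two pnputil commands that disable, then re-enable, a device."""
    return [
        ["pnputil", "/disable-device", instance_id],
        ["pnputil", "/enable-device", instance_id],
    ]


def toggle_device(instance_id, dry_run=True, pause_s=5):
    """Disable and re-enable one device; with dry_run=True only return the commands."""
    commands = toggle_commands(instance_id)
    if not dry_run:
        for i, cmd in enumerate(commands):
            subprocess.run(cmd, check=True)
            if i == 0:
                time.sleep(pause_s)  # assumed settle time before re-enabling
    return commands


# Placeholder instance ID; look up the real one in Device Manager
# (Details tab, "Device instance path").
PLACEHOLDER_ID = "PCI\\VEN_10DE&DEV_0000\\PLACEHOLDER"

for cmd in toggle_device(PLACEHOLDER_ID, dry_run=True):
    print(" ".join(cmd))
```

Leaving `dry_run=True` only prints the commands, which is a safe way to check them before letting the script touch the display adapter.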
I had another degradation on my Fedora 20 / GTX 970 rig while mining neoscrypt and took some notes.
- The performance level was still at 2 and the GPU clock was still around 1500 (+120 OC)
- GPU utilization was 100%
- memory utilization was 69%, normally 41% (of 4 GB)
- hash rate was 61K, normally 540K
This rules out a memory leak, at least to the point of memory exhaustion.
The GPU was pegged but only hashing at about 1/9 of normal; interesting that it is
almost an exact multiple.
Was it only using 1/9 of the CUDA cores? If so, what were the other cores doing?
Maybe a runaway process; a crash seems unlikely.
Could there be a problem with the way CUDA processes are killed when the host process
dies, leaving an orphan still running?
nvidia-smi has a reset command and a process monitoring command, but I'm not sure
if they work on lowly GeForce cards. I'll try that next time.
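To test the orphan-process idea, the compute processes that nvidia-smi reports can be compared against the miner PIDs you expect. This is a sketch under assumptions: the helper names are mine, the sample output in the test is invented, and, as noted above, the query may not be supported on GeForce cards. The reset itself would be `nvidia-smi --gpu-reset -i 0`, which needs root and no processes still using the GPU, and may also be refused on consumer cards.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name' lines from nvidia-smi CSV output (noheader)."""
    apps = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or unsupported-field lines
        apps.append({"pid": int(parts[0]), "name": parts[1]})
    return apps


def find_orphans(apps, expected_pids):
    """Return GPU compute processes whose PID is not one we started."""
    return [a for a in apps if a["pid"] not in expected_pids]


def list_compute_apps():
    """Ask nvidia-smi which processes currently hold a compute context."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name",
         "--format=csv,noheader"],
        text=True,
    )
    return parse_compute_apps(out)
```

If `find_orphans(list_compute_apps(), {miner_pid})` returns anything during a degraded run, a leftover process sharing the GPU would be a plausible explanation for the reduced hash rate.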