
Topic: Reset GPU without resetting rig

sr. member
Activity: 854
Merit: 253
l0tt0.com
May 14, 2013, 09:58:58 PM
#24
Sheesh, how hard do I have to try and convince you? It wedges in there WHEN IT CRASHES.

Oh, now I see what you mean.

Did you ever look into trying to find a cure for this (short of getting AMD to fix the problem)?
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
May 14, 2013, 09:55:07 PM
#23
The driver is absolutely wedged into the operating system kernel and unremovable. Getting the picture yet?

I do understand what you're saying, but I was able to unload it when it was not attached to cgminer, and then load it again. So it becomes unremovable only when one of the cards gets sick and/or dies.
Sheesh, how hard do I have to try and convince you? It wedges in there WHEN IT CRASHES.
sr. member
Activity: 854
Merit: 253
l0tt0.com
May 14, 2013, 09:10:59 PM
#22
The driver is absolutely wedged into the operating system kernel and unremovable. Getting the picture yet?

I do understand what you're saying, but I was able to unload it when it was not attached to cgminer, and then load it again. So it becomes unremovable only when one of the cards gets sick and/or dies.
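Whether the driver can be unloaded at all can be checked before trying. Here is a minimal Python sketch that reads the module's use count from `/proc/modules` (third field). A module normally cannot be removed with `rmmod`/`modprobe -r` while that count is nonzero, which matches being able to unload it only when cgminer is not attached. The module name `fglrx` (AMD's proprietary Linux driver of this era) and the helper names are assumptions. Per the replies in this thread, a zero count is necessary but not sufficient after a crash, because the driver may still be wedged.

```python
def module_refcount(proc_modules_text, name):
    """Return the use count of a loaded module from /proc/modules text, or None.

    Each line looks like: name size refcount deps state address [taints]
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return int(fields[2])
    return None


def can_unload(name="fglrx", path="/proc/modules"):
    """Hypothetical helper: True if the module is loaded and nothing holds it.

    "fglrx" is an assumed module name; adjust it for your driver.
    """
    with open(path) as f:
        count = module_refcount(f.read(), name)
    return count == 0
```

A watchdog would call `can_unload()` before attempting `modprobe -r`, and fall back to a reboot when the count never drops to zero.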
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
May 14, 2013, 09:00:29 PM
#21
But the system needs rebooting and the device is unusable. What is the purpose of detaching cgminer from the device?

To try to unload the driver and reload it.
The driver is absolutely wedged into the operating system kernel and unremovable. Getting the picture yet?
sr. member
Activity: 854
Merit: 253
l0tt0.com
May 14, 2013, 02:03:08 PM
#20
That's why I wrote a watchdog script for Windows that checks for dead/sick GPUs through the API; it simply restarts cgminer on the Windows machine, and mining continues as normal.

Yes, you are quite right; I am trying to do something similar in Linux.
legendary
Activity: 2450
Merit: 1002
May 14, 2013, 11:53:35 AM
#19
There is no way to reset it in Linux if the driver crashes. It requires a reboot.

What is the reason that this is possible in Windows? Is the Linux driver different, or is it something to do with cgminer?
Cgminer can't keep mining on Windows either: even if your driver does reset, the existing cgminer is tied to the old driver. The difference is that the Windows AMD driver is better than the Linux AMD driver, because the Windows driver uses a different model in which it can detach a crashed driver and then attach a fresh one. On Linux, because AMD do not work with the Linux kernel crew and provide a driver that is not supported by the Linux kernel development process, the driver wedges itself into the kernel as best it can. That means when the driver crashes, the Linux kernel actually becomes corrupted, and any process attached to it (such as cgminer) is hung in an unrecoverable state that only a reboot will fix.

That's why I wrote a watchdog script for Windows that checks for dead/sick GPUs through the API; it simply restarts cgminer on the Windows machine, and mining continues as normal.
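A watchdog of that kind could look roughly like the following minimal Python sketch; it is not the poster's actual script. It assumes cgminer was started with its API enabled (`--api-listen`) on the default port 4028, that it accepts a JSON request such as `{"command":"devs"}`, and that each GPU entry in the `DEVS` reply carries a `GPU` number and a `Status` of `Alive`, `Sick` or `Dead`. Check those field names against the API-README for your cgminer version. The function names are hypothetical.

```python
import json
import socket

# Assumption: cgminer runs with --api-listen on the default API port.
API_HOST = "127.0.0.1"
API_PORT = 4028


def query_cgminer(command, host=API_HOST, port=API_PORT, timeout=5.0):
    """Send one JSON API command to cgminer and return the decoded reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"command": command}).encode())
        chunks = []
        while True:  # cgminer closes the connection after replying
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # The reply is NUL-terminated; strip that before parsing the JSON.
    raw = b"".join(chunks).rstrip(b"\x00")
    return json.loads(raw.decode())


def unhealthy_gpus(devs_reply):
    """Return the GPU numbers whose Status is reported as Sick or Dead."""
    return [dev.get("GPU") for dev in devs_reply.get("DEVS", [])
            if dev.get("Status") in ("Sick", "Dead")]
```

Run periodically, `unhealthy_gpus(query_cgminer("devs"))` yields the failed GPUs. On Windows the script would then kill and relaunch cgminer; on Linux, per this thread, only a reboot brings the card back.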
sr. member
Activity: 854
Merit: 253
l0tt0.com
May 14, 2013, 11:16:04 AM
#18
But the system needs rebooting and the device is unusable. What is the purpose of detaching cgminer from the device?

To try to unload the driver and reload it.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
May 14, 2013, 02:15:50 AM
#17
What exactly would you have cgminer do? It can usually keep adjusting fan speeds and GPU engine speeds, but it is attached to the driver in such a way that it can't detach itself, and since the machine needs a reboot to fix the corruption, I'm not sure what you expect cgminer to do about it.

I guess you already answered the question: "it is attached to the driver in such a way that it can't detach itself".

I was wondering if this can be changed.
But the system needs rebooting and the device is unusable. What is the purpose of detaching cgminer from the device?
sr. member
Activity: 854
Merit: 253
l0tt0.com
May 13, 2013, 11:13:00 PM
#16
What exactly would you have cgminer do? It can usually keep adjusting fan speeds and GPU engine speeds, but it is attached to the driver in such a way that it can't detach itself, and since the machine needs a reboot to fix the corruption, I'm not sure what you expect cgminer to do about it.

I guess you already answered the question: "it is attached to the driver in such a way that it can't detach itself".

I was wondering if this can be changed.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
May 13, 2013, 10:52:00 PM
#15
Cgminer can't keep mining on Windows either: even if your driver does reset, the existing cgminer is tied to the old driver. The difference is that the Windows AMD driver is better than the Linux AMD driver, because the Windows driver uses a different model in which it can detach a crashed driver and then attach a fresh one. On Linux, because AMD do not work with the Linux kernel crew and provide a driver that is not supported by the Linux kernel development process, the driver wedges itself into the kernel as best it can. That means when the driver crashes, the Linux kernel actually becomes corrupted, and any process attached to it (such as cgminer) is hung in an unrecoverable state that only a reboot will fix.

I see. Getting anything changed on AMD's end is almost hopeless, I guess, so I do wonder: is there a way to fix this (at least in part) on cgminer's end?
What exactly would you have cgminer do? It can usually keep adjusting fan speeds and GPU engine speeds, but it is attached to the driver in such a way that it can't detach itself, and since the machine needs a reboot to fix the corruption, I'm not sure what you expect cgminer to do about it.
sr. member
Activity: 854
Merit: 253
l0tt0.com
May 13, 2013, 09:51:34 PM
#14
Cgminer can't keep mining on Windows either: even if your driver does reset, the existing cgminer is tied to the old driver. The difference is that the Windows AMD driver is better than the Linux AMD driver, because the Windows driver uses a different model in which it can detach a crashed driver and then attach a fresh one. On Linux, because AMD do not work with the Linux kernel crew and provide a driver that is not supported by the Linux kernel development process, the driver wedges itself into the kernel as best it can. That means when the driver crashes, the Linux kernel actually becomes corrupted, and any process attached to it (such as cgminer) is hung in an unrecoverable state that only a reboot will fix.

I see. Getting anything changed on AMD's end is almost hopeless, I guess, so I do wonder: is there a way to fix this (at least in part) on cgminer's end?
hero member
Activity: 504
Merit: 500
May 13, 2013, 09:27:10 PM
#13
The issue is that each card is assigned an ID. When a card fails and is reset, it gets a new ID, but cgminer keeps trying to talk to the old ID. (Thus, the old ID appears dead.)

If it is failing, you don't want it to keep running; you want to make adjustments so it does not fail. (Otherwise, if it keeps rebooting and using the "crashing" settings, you will kill your hardware for good.)

Turn down your clock a few notches, bump up your volts a bit, or try raising your memclock if you have it set really low. (Unless it is a temperature or PSU issue... then you need to reduce voltage and clocks so it draws less power and runs cooler.)
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
May 13, 2013, 09:26:35 PM
#12
There is no way to reset it in Linux if the driver crashes. It requires a reboot.

What is the reason that this is possible in Windows? Is the Linux driver different, or is it something to do with cgminer?
Cgminer can't keep mining on Windows either: even if your driver does reset, the existing cgminer is tied to the old driver. The difference is that the Windows AMD driver is better than the Linux AMD driver, because the Windows driver uses a different model in which it can detach a crashed driver and then attach a fresh one. On Linux, because AMD do not work with the Linux kernel crew and provide a driver that is not supported by the Linux kernel development process, the driver wedges itself into the kernel as best it can. That means when the driver crashes, the Linux kernel actually becomes corrupted, and any process attached to it (such as cgminer) is hung in an unrecoverable state that only a reboot will fix.
sr. member
Activity: 420
Merit: 250
May 13, 2013, 09:16:35 PM
#11
It's mostly that the AMD drivers suck.

I'm guessing that if you check your dmesg output, you'll see something about an ASIC hang. That's pretty much it for a GPU until it's re-initialized with a reboot.
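A Linux watchdog could look for that message automatically. This minimal Python sketch scans the kernel log for lines containing "ASIC hang" (case-insensitive). The exact wording the AMD driver logs is an assumption, so adjust the match to what your dmesg actually shows. `read_dmesg()` and `asic_hang_lines()` are hypothetical helper names, and reading dmesg may require root on some distributions.

```python
import subprocess


def asic_hang_lines(kernel_log):
    """Return the kernel-log lines that mention an ASIC hang (case-insensitive)."""
    return [line for line in kernel_log.splitlines()
            if "asic hang" in line.lower()]


def read_dmesg():
    """Capture the kernel ring buffer as text (may need root on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

If `asic_hang_lines(read_dmesg())` returns anything, the watchdog can skip restarting cgminer and go straight to a reboot, since the card will not recover until then.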
sr. member
Activity: 854
Merit: 253
l0tt0.com
May 13, 2013, 08:53:50 PM
#10
There is no way to reset it in Linux if the driver crashes. It requires a reboot.

What is the reason that this is possible in Windows? Is the Linux driver different, or is it something to do with cgminer?
legendary
Activity: 2450
Merit: 1002
May 13, 2013, 08:35:57 PM
#9
There is no way to reset it in Linux if the driver crashes. It requires a reboot.
Didn't know that; well, there's another good reason to mine on Windows =P
which is all I've ever done =)
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
May 13, 2013, 08:09:26 PM
#8
There is no way to reset it in Linux if the driver crashes. It requires a reboot.
sr. member
Activity: 854
Merit: 253
l0tt0.com
May 13, 2013, 05:21:12 PM
#7
Generally, if the GPU is sick/dead, that means the system has already survived an AMD driver reset; hence the GPU stopped responding in cgminer. I wrote a script in Windows to restart cgminer when this happens because, after a restart, cgminer continues mining with all GPUs once again.

In Linux, I find that it is impossible to completely quit (or kill) cgminer in such situations.
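One plausible reason the process cannot be killed: a process blocked inside a kernel driver call usually sits in uninterruptible sleep (state `D` in `ps`), and even SIGKILL is not acted on until that call returns. That cgminer is in this state after an AMD driver crash is an assumption, not something confirmed in this thread. The minimal Python sketch below reads the state letter from `/proc/<pid>/stat`, so a Linux watchdog could tell a stuck cgminer (reboot needed) from one that can simply be restarted. Both helper names are hypothetical.

```python
def process_state(stat_line):
    """Extract the one-letter state field from a /proc/<pid>/stat line.

    Field 2 is the command name in parentheses and may itself contain
    spaces or ')', so parse from the *last* ')' onward.
    """
    return stat_line[stat_line.rindex(")") + 1:].split()[0]


def is_uninterruptible(pid):
    """True if the process is in 'D' (uninterruptible sleep) state."""
    with open(f"/proc/{pid}/stat") as f:
        return process_state(f.read()) == "D"
```

If `is_uninterruptible(pid)` stays true for cgminer across several checks, sending it further signals is pointless and the watchdog should schedule a reboot instead.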
legendary
Activity: 2450
Merit: 1002
May 13, 2013, 02:50:53 PM
#6
If it was just the AMD driver failure, then YES! I've found with my setup that if only the driver has crashed after playing with o/c settings, enabling and then disabling CrossFire resets it.

I am referring to a case where the GPU gets locked up, and it is declared SICK/DEAD by cgminer.

Generally, if the GPU is sick/dead, that means the system has already survived an AMD driver reset; hence the GPU stopped responding in cgminer. I wrote a script in Windows to restart cgminer when this happens because, after a restart, cgminer continues mining with all GPUs once again.
sr. member
Activity: 854
Merit: 253
l0tt0.com
May 13, 2013, 10:34:04 AM
#5
If it was just the AMD driver failure, then YES! I've found with my setup that if only the driver has crashed after playing with o/c settings, enabling and then disabling CrossFire resets it.

I am referring to a case where the GPU gets locked up, and it is declared SICK/DEAD by cgminer.