Good morning everyone,
I am looking for ideas and help. I was debugging one of my unstable rigs running 7 RX580s.
Suddenly I noticed that one of the fans was set to 0%. Whatever I tried, I was unable to start it again. It looked to me as though the driver had decided to refuse any fan speed change.
Here is what fanspeed.sh reports when I try to adjust the speed:
cat: /sys/class/drm/card3/device/hwmon/hwmon3/pwm1: Invalid argument
/root/utils/fanspeed_RX.sh: line 84: echo: write error: Invalid argument
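For anyone who hits the same error, here is a minimal Python sketch of the sequence I would expect to work through the standard hwmon sysfs interface, assuming the amdgpu driver. First switch the fan to manual mode via pwm1_enable (1 means manual in the hwmon ABI), and only then write a value between pwm1_min and pwm1_max to pwm1. On at least some kernel versions amdgpu rejects a pwm1 write with EINVAL when the fan is not in manual mode, so this is worth ruling out, although it may not explain the failed read. The helper names (find_hwmon_dir, set_fan_percent) are hypothetical. The hwmon directory is looked up rather than hard-coded because hwmon3 belonging to card3 on this rig is not guaranteed in general.

```python
import glob
import os


def _read_int(path, default):
    """Read an integer sysfs attribute, falling back to a default if it is absent."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return default


def _write(path, value):
    """Write a value to a sysfs attribute (requires root on a real system)."""
    with open(path, "w") as f:
        f.write(str(value))


def find_hwmon_dir(card_dir):
    """Return the hwmon directory under a DRM card (…/device/hwmon/hwmonN).

    The hwmon index is assigned independently of the card index,
    so it is looked up instead of being assumed.
    """
    matches = sorted(glob.glob(os.path.join(card_dir, "device", "hwmon", "hwmon*")))
    if not matches:
        raise FileNotFoundError(f"no hwmon directory under {card_dir}")
    return matches[0]


def set_fan_percent(hwmon_dir, percent):
    """Put the fan into manual mode and set it to `percent` of its PWM range."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Assumption: pwm1_min/pwm1_max exist on amdgpu; default to the usual 0..255.
    lo = _read_int(os.path.join(hwmon_dir, "pwm1_min"), 0)
    hi = _read_int(os.path.join(hwmon_dir, "pwm1_max"), 255)
    value = lo + round(percent * (hi - lo) / 100)
    # hwmon ABI: pwm1_enable = 1 selects manual control; set it before pwm1.
    _write(os.path.join(hwmon_dir, "pwm1_enable"), 1)
    _write(os.path.join(hwmon_dir, "pwm1"), value)
    return value
```

On the rig above this would be called as set_fan_percent(find_hwmon_dir("/sys/class/drm/card3"), 80) as root. If pwm1_enable itself refuses the write, that would point more toward the driver or firmware than toward the script.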
wolframd reports the following when I run it with --set-fanspeed:
Failed to read a fan sysfs entry for GPU 3.
The card is otherwise fully operational and responds to every other command, but without the fan it will of course overheat within seconds.
Can anyone offer some advice?
Finally, a note to tytanick: I hope you are aware that GPU ordering is not always sequential! On this particular system I have two different batches of RX580s, and there is *a gap* in the card numbering: there is no card with index 1. Cards are enumerated starting at index 0, index 1 does not exist, and the remaining cards are indexed 2 to 7. I am, of course, talking about the system card index, not the GPU numbers assigned by the Claymore miner, which are sequential (0-6).
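To avoid assuming sequential indices, a script can enumerate the card directories that actually exist. Below is a minimal Python sketch, assuming the standard /sys/class/drm layout. The function names are hypothetical. It skips connector entries such as card0-DP-1 and reports each card's PCI address, which is a stable key for matching cards against another tool's numbering. It makes no assumption about how Claymore orders its GPUs.

```python
import os
import re

# Matches "card0", "card12", but not connector entries like "card0-DP-1".
_CARD_RE = re.compile(r"^card(\d+)$")


def list_drm_cards(drm_root="/sys/class/drm"):
    """Return the DRM card indices that actually exist, in ascending order.

    The list can have gaps (e.g. [0, 2, 3, ...]), so iterate over it
    instead of assuming range(number_of_gpus).
    """
    indices = []
    for name in os.listdir(drm_root):
        m = _CARD_RE.match(name)
        if m:
            indices.append(int(m.group(1)))
    return sorted(indices)


def card_pci_addresses(drm_root="/sys/class/drm"):
    """Map each existing card index to the PCI address its device link resolves to."""
    return {
        idx: os.path.basename(
            os.path.realpath(os.path.join(drm_root, f"card{idx}", "device"))
        )
        for idx in list_drm_cards(drm_root)
    }
```

On the rig described above, list_drm_cards() would be expected to return [0, 2, 3, 4, 5, 6, 7] rather than 0 to 6.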
I wish everyone a nice day.