I'm having a problem when running more than one GPU on an old algo like x15: the second GPU does not run at full speed, but it's not throttling, it simply sits there at a low core clock. Why is this?
When I restart the machine it runs at full speed for the first few seconds, then the miner closes, and when I start it again it runs at the slow speed. I also tried the Yimp fork, but no luck.
Oh dear, I've seen this too many times. Here is what I know.
Sometimes when a miner stops and restarts, usually to switch algos, it will exhibit any of the following symptoms:
- it will fail to start with the same parameters
- it will start with reduced intensity (-i)
- it will start in a degraded state, low clock
A reboot will correct the problem.
The occurrence appears to be random. Sometimes a card will get hit multiple times per day, sometimes not for several days.
Only some cards are affected, some never exhibit the problem.
There is no pattern to the affected cards, Maxwell and Kepler have both been affected.
It occurs on both Windows and Linux.
It does not appear to be driver related.
It appears to affect only specific individual cards.
There was discussion about this several months ago. A search on "degraded" might find something.
The bottom line is it was never solved. It's possibly a hardware issue.
I do have some new information: it may be heat related, possibly heat stressing a marginal component.
I recently moved an affected card to the basement where it's cooler. It had been crashing often when profit switching. Since the move it has been more stable, but I have not been profit switching with it; I only switch manually every few days. I may set up profit switching on that PC to try to recreate the problem.
Well there ya go. It's one of those things that has been nagging me for a while. I feel better now.
You could try swapping the cards to see if the problem moves.
What you're describing is basically a card crashing and the driver not recovering.
The GPU clocks are stuck at 405 MHz, right?
If so, then yes, the card crashed and switched to its lowest and safest power state (P8), and driver recovery didn't reset it.
Driver recovery usually only kicks in if the card doesn't respond for a while (TDR timeout), but that's not the case here, and the driver doesn't care which power state a card is in.
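A quick way to confirm it is to ask nvidia-smi for the power state and clocks while the miner is running (works on Windows and Linux; the query fields below should be available on any reasonably recent driver):

nvidia-smi --query-gpu=index,pstate,clocks.sm,clocks.mem --format=csv

A crashed card will report P8 with the core around 405 MHz while the healthy ones sit in P0/P2 at full clocks.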
Basically the card poops itself because of one of the following:
- too much OC;
- not enough GPU memory (usually from too high intensity);
- too high intensity;
- low quality or faulty riser or power cable or PSU;
- low quality card (increasing voltage a bit helps stabilize the card)
and then it goes into safe mode.
To solve this you have to restart the PC or, more conveniently, disable the card in Device Manager (if it's not your primary card with a monitor plugged in) and re-enable it to force the driver to reinitialize the card.
Or you can use a bat with:
nvidiainspector.exe -restartdisplaydriver
to reset all cards.
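If it happens a lot you can go one step further and wrap it in a small .bat that resets the driver before relaunching the miner. Rough sketch only; ccminer and the pool line are just placeholders, swap in whatever you actually run, and it assumes nvidiainspector.exe sits next to the script:

@echo off
rem kill the miner first so nothing is using the cards (ccminer.exe is just an example)
taskkill /IM ccminer.exe /F
rem reset the display driver to force all cards to reinitialize
nvidiainspector.exe -restartdisplaydriver
rem give the driver a few seconds to come back up
timeout /t 10
rem relaunch the miner - replace algo, pool and wallet with your own
start "" ccminer.exe -a x15 -o stratum+tcp://yourpool:port -u yourwallet -p x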
It's mostly caused by too much overclock and it can crash even after days of being stable. I have a couple of cards that I know can't handle the same OC as the others so I always have lower clocks on them. Otherwise they'd always crash eventually.
Also, different algos tax the GPU to different degrees, so the same card will not necessarily handle every algo at the same overclock.
Some algos are overclock friendly while others will crash the card even with a mild overclock.
So if you're using a profit switching solution you have to use an overclock that works for all cards.
This is why it'd be nice to be able to set clock speeds from within the .bat file for each algo, but nvidia-smi only changes the memory clocks for some reason.
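One workaround on Windows is to call nvidiainspector from the same .bat right before launching each algo, since it can apply clock offsets per GPU. I'm writing this from memory, so double check the parameter syntax against its help output before relying on it; the numbers are just examples:

rem sketch from memory: apply a milder OC to GPU 0 before starting an OC-unfriendly algo
rem the format should be gpuIndex,pState,offset - verify it on your version
nvidiainspector.exe -setBaseClockOffset:0,0,50 -setMemoryClockOffset:0,0,300 -setPowerTarget:0,80

That way each algo's .bat can carry its own overclock instead of one compromise setting for everything.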