Sometimes BAMT will create a file to disable overclocking. You have to go to "/live/image/BAMT/CONTROL/ACTIVE" and delete the file there.
In my experience I have never seen any card but 0 get flagged there, so it can be difficult to figure out which card is the culprit.
Does anybody know of a log or something which may indicate which card f'ed up? Has anybody seen any other card but 0 get flagged?
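For anyone who wants to check the directory from a script rather than by hand, here is a minimal sketch. The directory path is the one quoted above. Everything else is an assumption for illustration: the helper names `list_flag_files` and `clear_flag_files` are made up, and the guess that flag files can be recognised by "nooc" appearing in their name is mine, not something BAMT documents here. It only prints what it finds unless you explicitly turn off `dry_run`.

```python
import os
import time

# Path quoted in the post above; all other names here are hypothetical.
CONTROL_DIR = "/live/image/BAMT/CONTROL/ACTIVE"


def list_flag_files(control_dir=CONTROL_DIR, marker="nooc"):
    """Return (name, mtime) pairs for files whose name contains marker, oldest first.

    The marker is an assumption about how the flag files are named.
    """
    if not os.path.isdir(control_dir):
        return []
    entries = []
    for name in os.listdir(control_dir):
        path = os.path.join(control_dir, name)
        if os.path.isfile(path) and marker in name.lower():
            entries.append((name, os.path.getmtime(path)))
    return sorted(entries, key=lambda entry: entry[1])


def clear_flag_files(control_dir=CONTROL_DIR, marker="nooc", dry_run=True):
    """Return the matching file names; delete them only when dry_run is False."""
    names = [name for name, _ in list_flag_files(control_dir, marker)]
    if not dry_run:
        for name in names:
            os.remove(os.path.join(control_dir, name))
    return names


if __name__ == "__main__":
    # Show when each flag file was written, which may help match it to a crash.
    for name, mtime in list_flag_files():
        print(time.ctime(mtime), name)
```

The modification time is printed because it is the only hint the file itself carries about when the lockup happened; it does not tell you which card actually caused it.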
The logic is very simple. If a phoenix process goes "zombie", i.e. the Linux kernel can't get the process to respond anymore, then the GPU associated with that phoenix instance is flagged with a noOC file and the system is cold booted. Until the file is removed, BAMT will not overclock that GPU any more.
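To make that flow concrete, here is a rough sketch of the idea in Python. It is not mother's actual code. The assumptions: a hung process is detected by reading the state letter from `/proc/<pid>/stat`, both `D` (uninterruptible sleep) and `Z` (zombie) count as "not responding", the state has to persist across several samples, and the flag file is named `noOC-<gpu>`. All function names and the file name are hypothetical, and the cold boot step is left out.

```python
import os

# Assumption: treat uninterruptible sleep (D) and zombie (Z) as "hung".
HUNG_STATES = {"D", "Z"}


def parse_state(stat_text):
    """Extract the one-letter process state from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may contain spaces,
    # so split on the last ')' instead of on whitespace.
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def read_state(pid):
    """Read the current state letter of a running process (Linux only)."""
    with open("/proc/%d/stat" % pid) as fh:
        return parse_state(fh.read())


def looks_hung(samples):
    """True only if every sampled state is a hung state.

    A single 'D' can be ordinary I/O wait, so one sample is not enough.
    """
    return bool(samples) and all(state in HUNG_STATES for state in samples)


def flag_gpu(control_dir, gpu_index):
    """Create the (hypothetically named) noOC flag file for one GPU."""
    path = os.path.join(control_dir, "noOC-%d" % gpu_index)
    with open(path, "w"):
        pass  # an empty file is enough to act as a flag
    return path
```

A caller would sample `read_state(pid)` a few times for each phoenix instance, and if `looks_hung` returns true, call `flag_gpu` for the GPU that instance drives before rebooting. Note that the flag goes on the GPU whose miner hung, which is not necessarily the GPU that caused the hang.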
Sometimes an overclocked or malfunctioning card will cause some other GPU to lock up. Sometimes it will cause *all* the GPUs to lock up. In this situation, GPU 0 or (more rarely) other GPUs may be flagged when they are not actually the culprit.
There is no good solution here. Before I made mother do this, people were constantly having "bamt problems" that were nothing more than overzealous overclocking. Take a stroll through the old 0.4 thread sometime if you want to see just how bad it was. 9 out of 10 or more posts were from people who simply overclocked their cards too much.
Now, we get the same people with questions about why BAMT won't overclock their GPUs. Is this better? Probably. At least the rigs keep mining and it calls attention to the fact there is a problem.
In a perfect world, I wouldn't have to make mother babysit.
But I don't see that happening. People will always think they can just apply whatever settings they used under some other OS, some other mining client, or using some other kernel/settings without doing any testing. When things crash, mother just tries to keep you mining.
Remember this: If mother is disabling overclocking on your GPU(s), *a GPU is locking up*. The only thing that triggers mother to place the noOC file is a hung phoenix, and the only thing that hangs phoenix is a locked-up GPU (at least, no one has suggested or proven otherwise yet). So, you will have to make some change to fix the problem. It is highly likely the change to fix this will be: reducing your overclocking settings on one or more cards.
(The rest isn't aimed at anyone in particular, just to address what I know will be in the head of some readers at this point.) "But but but I know these cards are stable at XXX MHz because blah blah".
No.
You know that they were stable at that speed using some particular OS, mining client, kernel and settings. That is all. It doesn't mean much if you're no longer using that exact OS, client, kernel and settings.
Some platforms use a GUI that can use 10% or more of a GPU's time just by being loaded (Ubuntu I am looking at you and your crapmaster "Unity". Windows isn't much better). BAMT's GUI uses so few resources that I haven't been able to measure it with any certainty. This means your GPU may have been getting an awful lot of breaks from mining while the OS took over every so many ms. That just isn't happening anymore. Your GPU is now spending all of its time mining without those tiny constant breaks.
BAMT also strives to include the very best kernels for Phoenix. We often use patched kernels with additional improvements over the stock versions. This means that not only is your GPU working constantly, it may well be working harder at the same clock rate.
Put it all together and surely you can understand why the rate your GPU was stable at in (whatever) doesn't mean much when you move to BAMT, and you may very likely find a drop of some MHz is required for stability.
The good side is that ultimately, you will find the same or more shares submitted, and at lower clock rates, which generally means less power. After all, you're now wasting less time and power doing stuff that doesn't get you paid.