
Topic: Troubleshooting OC & Stability

hero member
Activity: 2506
Merit: 645
September 21, 2017, 12:07:43 PM
#10
The same way you OC in any other situation: clock up until there are issues, then slowly back down. This is true whenever you OC. It is one of the simplest concepts in computer hardware, and crypto mining does not change that at all.
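A minimal Python sketch of that step-up-then-back-off procedure. `is_stable` is a hypothetical callback standing in for "apply this offset, mine for a while, and see whether it crashes"; the step size, limit, and one-step safety margin are illustrative assumptions. Real cards are noisy enough that a setting which passed once can still fail later, so treat the result as a starting point.

```python
from typing import Callable


def find_stable_offset(
    is_stable: Callable[[int], bool],
    start: int,
    step: int,
    limit: int,
    margin_steps: int = 1,
) -> int:
    """Step the clock offset up until a test fails, then back down.

    is_stable(offset) is a hypothetical callback: in practice it would apply
    the offset, mine for a few hours, and report whether the card stayed up.
    """
    if not is_stable(start):
        raise ValueError("the starting offset is already unstable")
    last_good = start
    offset = start + step
    # Clock up one step at a time until a test fails or the limit is hit.
    while offset <= limit and is_stable(offset):
        last_good = offset
        offset += step
    # Back down from the highest passing step to leave a safety margin.
    return max(start, last_good - margin_steps * step)
```

For a card that holds +150 but not +175, `find_stable_offset(lambda o: o <= 150, start=0, step=25, limit=500)` returns 125 with the default one-step margin.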
full member
Activity: 140
Merit: 100
September 20, 2017, 02:16:44 PM
#9
Quote:
I know that PiMP's next release will attempt to have a unified logging structure to see what exactly is going on, but for right now you have to make do with Claymore's logging. For the fan issues, make sure you have nothing set in the Claymore config file if you want the fans to be managed by the JSON settings. A lot of people miss that and leave the Claymore settings in, which conflict with the system settings.

Yeah, I removed those bits from the Claymore config and was able to find the Claymore logs (I think that's good enough for now). I'm still trying to determine whether it's worth the work to set different OC rates for each card or just set a conservative number for all of them. We're only talking about a 1-2 MH/s difference for the whole rig, but if it goes down for a few hours, then that extra MH/s isn't worth it.
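A rough way to put numbers on that trade-off, assuming the conservative setting never crashes and the aggressive one loses a fixed number of hours per day on average. The 180 vs. 178 MH/s figures in the usage note are hypothetical, chosen only to match the 1-2 MH/s gap mentioned above.

```python
def effective_hashrate(rate_mhs: float, downtime_hours_per_day: float) -> float:
    """Average MH/s over a day once downtime is counted."""
    uptime_fraction = (24.0 - downtime_hours_per_day) / 24.0
    return rate_mhs * uptime_fraction


def break_even_downtime(aggressive_mhs: float, conservative_mhs: float) -> float:
    """Hours of downtime per day at which the aggressive OC stops paying off.

    Assumes the conservative setting has no downtime at all.
    """
    return 24.0 * (1.0 - conservative_mhs / aggressive_mhs)
```

With those numbers, `break_even_downtime(180, 178)` is about 0.27 hours, roughly 16 minutes of downtime per day. A single 3-hour outage per week averages about 0.43 hours per day, so in that case the conservative clocks come out ahead.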

Another thing is that 2 of my rigs went down around the same time last night. My only guess is that the internet went out or something, I don't know. I heard someone here mention using a cellular hotspot as a backup; latency would obviously be higher and more shares would go stale, but if the internet went down, it'd be better than nothing.
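One way to check the "internet went out" theory next time is to log reachability of the pool from the rig itself. A minimal Python sketch; the host and port you pass in are placeholders for your own pool's address, and a successful TCP connect only shows the network path is up, not that the pool is accepting shares.

```python
import socket
import time
from datetime import datetime
from typing import Callable, Optional


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def log_connectivity(
    host: str,
    port: int,
    interval_s: float = 60.0,
    rounds: Optional[int] = None,
    log: Callable[[str], None] = print,
) -> None:
    """Check host:port every interval_s seconds and log only state changes.

    rounds=None runs forever; a number limits the checks (handy for testing).
    """
    previous = None
    done = 0
    while rounds is None or done < rounds:
        up = can_reach(host, port)
        if up != previous:
            stamp = datetime.now().isoformat(timespec="seconds")
            log(f"{stamp} {host}:{port} {'reachable' if up else 'UNREACHABLE'}")
            previous = up
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)
```

Run it with your pool's host and port (e.g. `log_connectivity("pool.example.com", 4444)`, both placeholders); comparing its timestamps with the time the rigs dropped shows whether the outage lined up.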
hero member
Activity: 756
Merit: 560
September 20, 2017, 01:20:41 PM
#8
I know that PiMP's next release will attempt to have a unified logging structure to see what exactly is going on, but for right now you have to make do with Claymore's logging. For the fan issues, make sure you have nothing set in the Claymore config file if you want the fans to be managed by the JSON settings. A lot of people miss that and leave the Claymore settings in, which conflict with the system settings.
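A small sketch for checking a Claymore config for leftover fan options before handing fan control to the JSON settings. It assumes the config holds command-line style options separated by whitespace, and the option list is my reading of Claymore's ETH miner readme (-tt, -ttdcr, -ttli, -tstop, -fanmin, -fanmax); verify it against the readme for your version.

```python
from typing import List, Optional, Tuple

# Assumption: fan/temperature-control options from Claymore's ETH miner
# readme; check the readme that ships with your version.
FAN_OPTIONS = {"-tt", "-ttdcr", "-ttli", "-tstop", "-fanmin", "-fanmax"}


def find_fan_options(config_text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (option, value) pairs for fan-related options in a config."""
    found = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        tokens = line.split()
        # Skip '#' and batch-file 'REM' comment lines.
        if line.startswith("#") or tokens[0].upper() == "REM":
            continue
        for i, token in enumerate(tokens):
            if token.lower() in FAN_OPTIONS:
                value = tokens[i + 1] if i + 1 < len(tokens) else None
                found.append((token, value))
    return found
```

An empty result means no fan options were found that could fight with the system settings.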
full member
Activity: 140
Merit: 100
September 20, 2017, 01:16:49 PM
#7

Quote:
What miner are you using? If it's Claymore for ETH, you can make a reboot.bat and add -r 1 to your miner config .bat.

Wouldn't I put this in claymore.dual.pcfg?

Another thing I've noticed is that every time I reboot my rigs, I have to run "gputool --config" to get all the cards' fan speeds set again. Usually it's just 1 or 2 cards that end up at something other than what's in my config, but it's annoying, especially since "gputool --config" is already in the startup file. I followed https://getpimp.org/faq/fancontrols/ and have a feeling it has something to do with the part that says "The old oc.sh, nvidia-config.json, nvidia-fanspeed.sh, etc stuff can be removed from /root/startup", but those aren't there in the first place, so maybe something elsewhere is trying to set default fan speeds.
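A sketch for hunting down whatever else is touching the fans: it lists every line in the given files that mentions the legacy script names from the PiMP FAQ or the word "fan", with line numbers, so you can see whether something runs after `gputool --config`. Which files to scan beyond /root/startup is up to you, and the keyword list is an assumption.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Legacy scripts named in the PiMP fan-control FAQ, plus a generic keyword.
KEYWORDS = ("oc.sh", "nvidia-config.json", "nvidia-fanspeed.sh", "fan")


def scan_text(name: str, text: str) -> List[Tuple[str, int, str]]:
    """Return (file, line number, line) for lines that mention a keyword."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(k in line.lower() for k in KEYWORDS):
            hits.append((name, lineno, line.strip()))
    return hits


def scan_files(paths: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Scan each existing file; missing paths are skipped silently."""
    hits = []
    for p in paths:
        path = Path(p)
        if path.is_file():
            hits.extend(scan_text(str(path), path.read_text(errors="ignore")))
    return hits
```

For example, `scan_files(["/root/startup"])` prints nothing suspicious if only `gputool --config` is there, which would point the search at other startup locations.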
I like that PiMP is plug & play; under 5 minutes from first boot to mining is great. Long term, though, I might just run Ubuntu and install everything I need myself. I guess that's kind of what nvOC is.
full member
Activity: 140
Merit: 100
September 20, 2017, 01:03:04 PM
#6

Quote:
What miner are you using? If it's Claymore for ETH, you can make a reboot.bat and add -r 1 to your miner config .bat.

Right. That will at least keep me mining if something happens at 2 AM and I don't catch it till morning. But ideally I'd like to know what happened and when, so a log of each card's errors would be perfect.
full member
Activity: 140
Merit: 100
September 20, 2017, 01:00:57 PM
#5
Quote:
As far as logs or finding out problems go, you haven't even mentioned which miner or OS you are using. Nobody will be able to answer your questions without a few more specifics.

Good catch. I'm using PiMP at the moment, with Claymore. The best I've found is /var/log/minelog, but it only shows the miner starting/stopping etc.
newbie
Activity: 12
Merit: 0
September 20, 2017, 01:00:10 PM
#4

Quote:
Also, isn't there a way to just have a watchdog restart the miner if it goes unstable? That way, even if it's close to stable but crashes every few days, you don't stop mining until you notice or get notified.

What miner are you using? If it's Claymore for ETH, you can make a reboot.bat and add -r 1 to your miner config .bat.

Code:
-r   Restart miner mode. "-r 0" (default) - restart miner if something wrong with GPU. "-r -1" - disable automatic restarting. -r >20 - restart miner if something
   wrong with GPU or by timer. For example, "-r 60" - restart miner every hour or when some GPU failed.
   "-r 1" closes miner and execute "reboot.bat" file ("reboot.bash" or "reboot.sh" for Linux version) in the miner directory (if exists) if some GPU failed.
   So you can create "reboot.bat" file and perform some actions, for example, reboot system if you put this line there: "shutdown /r /t 5 /f".
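The same help text restated as a small Python lookup, mainly to make the documented cases explicit. Values from 2 to 20 are not described in the quoted text, so the sketch rejects them rather than guessing what Claymore does with them.

```python
def restart_mode(r: int) -> str:
    """Describe a Claymore -r value, paraphrasing the quoted help text."""
    if r == -1:
        return "automatic restarting disabled"
    if r == 0:
        return "restart miner if something is wrong with a GPU (default)"
    if r == 1:
        return "on GPU failure, close miner and run reboot.bat/reboot.sh if present"
    if r > 20:
        # The timer value is in minutes: -r 60 restarts every hour.
        return f"restart miner every {r} minutes, or sooner if a GPU fails"
    # The quoted help text does not say what other values do.
    raise ValueError(f"-r {r} is not covered by the quoted help text")
```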
hero member
Activity: 756
Merit: 560
September 20, 2017, 12:46:18 PM
#3
First things first: since hardware quality varies from card to card, there is no exact science other than testing, as you said. If you have a large enough farm, you can go with some very conservative clocks and see how everything reacts.

As far as logs or finding out problems go, you haven't even mentioned which miner or OS you are using. Nobody will be able to answer your questions without a few more specifics.
full member
Activity: 140
Merit: 100
September 20, 2017, 12:31:53 PM
#2
Woke up again today to my wife calling to say "I think the rigs are down again".

I usually catch it pretty quickly, but Ethermine takes about an hour before it sends an inactive-worker notification. I might need to wire up a loud-ass alarm or something. This shouldn't be happening!
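A sketch of a local check that could raise the alarm well before the pool's one-hour notice. It assumes you already collect, per GPU, the time of its last hashrate report and the MH/s it reported (for example by parsing the miner's output, which this sketch does not do); the 5-minute and 1 MH/s thresholds are arbitrary defaults.

```python
import time
from typing import Dict, List, Optional, Tuple


def stalled_gpus(
    reports: Dict[int, Tuple[float, float]],
    now: Optional[float] = None,
    max_age_s: float = 300.0,
    min_mhs: float = 1.0,
) -> List[int]:
    """Return GPUs whose last report is too old or whose hashrate is too low.

    reports maps GPU index -> (unix timestamp of last report, MH/s).
    """
    now = time.time() if now is None else now
    bad = []
    for gpu, (stamp, mhs) in reports.items():
        if now - stamp > max_age_s or mhs < min_mhs:
            bad.append(gpu)
    return sorted(bad)
```

Calling this every minute and sounding an alarm (or sending a message) whenever the list is non-empty is the part left to you.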

Can someone please tell me where (or whether) there is a log file I can review to determine exactly which card(s) are having issues?
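Assuming you can get at Claymore's own log file, a rough per-card tally of problem lines can point at the weak card. The line formats and keywords below are assumptions, not copied from a real log; adjust the regular expression and word list to what your log actually contains.

```python
import re
from collections import Counter
from typing import Iterable

# Assumptions: problem lines name the card as "GPU #2", "GPU 2" or "GPU2"
# and contain one of these words. Adjust both to what your log really shows.
PROBLEM_WORDS = ("incorrect", "error", "hangs", "failed")
GPU_RE = re.compile(r"GPU\s*#?\s*(\d+)", re.IGNORECASE)


def problem_counts(lines: Iterable[str]) -> Counter:
    """Count problem lines per GPU index (each line counts once per GPU)."""
    counts: Counter = Counter()
    for line in lines:
        if any(word in line.lower() for word in PROBLEM_WORDS):
            for gpu in {int(m.group(1)) for m in GPU_RE.finditer(line)}:
                counts[gpu] += 1
    return counts
```

For example, `with open(log_path, errors="ignore") as f: print(problem_counts(f).most_common())`, where `log_path` is a placeholder for your miner's log file, lists the GPUs with the most problem lines first.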
full member
Activity: 140
Merit: 100
September 19, 2017, 05:21:05 PM
#1
I've been overclocking and undervolting my GPUs for a while now, and it's not an exact science. Try something for a while and see if it crashes; if not, try to squeeze a bit more out of it until it does, and then back it off a bit. That's really the best approach I've figured out, but I'm wondering if there's a better process to follow. I could set different memory/core clocks for every GPU, but that's a hassle with more than a few rigs, and it may or may not be worth it.

What do you do? Find a decent stable rate and apply that to all (or most) GPUs?
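If you have tested each card individually, the "one setting for all" option is simply the weakest card's result. A sketch, assuming clock offsets in MHz and a fixed safety margin (both illustrative). Clock headroom does not translate linearly into MH/s, so the headroom figure is only a rough guide to what a single shared setting gives up.

```python
from typing import Dict, Tuple


def uniform_vs_per_card(
    max_stable: Dict[str, int], margin: int
) -> Tuple[int, Dict[str, int], int]:
    """Compare one shared offset with per-card offsets.

    max_stable maps card name -> highest offset (MHz) that passed testing.
    Returns (shared offset, per-card offsets, total headroom given up).
    """
    per_card = {card: best - margin for card, best in max_stable.items()}
    # A single setting has to suit the weakest card.
    shared = min(per_card.values())
    given_up = sum(offset - shared for offset in per_card.values())
    return shared, per_card, given_up
```

For cards that tested stable at +150, +100 and +175 with a 25 MHz margin, the shared offset is +75 and the per-card settings would recover 125 MHz of headroom in total.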

For the life of me, I cannot find a good log file that tells me exactly what went wrong and when. Unless I'm watching the miner and see an issue with a card, I have no idea which card did what.

Also, isn't there a way to just have a watchdog restart the miner if it goes unstable? That way, even if it's close to stable but crashes every few days, you don't stop mining until you notice or get notified.
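For a miner without a built-in restart option, a generic supervisor loop is one approach. This is a minimal Python sketch, not a PiMP or Claymore feature: it restarts the command whenever it exits (for any reason, including a clean exit) and logs each exit with a timestamp, which also gives you a record of when crashes happened. The restart cap and delay are arbitrary.

```python
import subprocess
import sys
import time
from datetime import datetime
from typing import Callable, Sequence


def supervise(
    cmd: Sequence[str],
    max_restarts: int = 5,
    delay_s: float = 10.0,
    log: Callable[[str], None] = print,
) -> int:
    """Run cmd, restart it whenever it exits, and log every exit.

    Gives up after max_restarts restarts; returns the total number of runs.
    """
    runs = 0
    while True:
        started = datetime.now()
        result = subprocess.run(list(cmd))
        runs += 1
        ended = datetime.now()
        log(
            f"{ended.isoformat(timespec='seconds')} exit code {result.returncode} "
            f"after {ended - started} (run {runs})"
        )
        if runs > max_restarts:
            return runs
        time.sleep(delay_s)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python watchdog.py <miner command and its arguments>
    supervise(sys.argv[1:])
```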

I'm thinking there should be a program that can benchmark your GPUs, push them a little, and basically spit out ideal numbers to clock them to. That would be cool.