Topic: Do any miners support auto kill on hardware fail or dropped Mh/s rate?

sr. member
Activity: 378
Merit: 250
Watch the temps. It's probably just overheating and throttling down to avoid damaging the GPU. If you're running at over 90C for any length of time, you'll likely run into problems. You really want to stay in the 60-70C range.
sr. member
Activity: 280
Merit: 252
I'm using a mix of Phoenix and poclbm-mod on my machines, but unfortunately my main rig is having hardware issues and will error out after several hours. I tried resetting my BIOS to factory settings (no overclocking) and maxing out my fan speed. That seemed to help a little, but after maybe 5 hours the hash rate will still inevitably drop by a factor of 10 (!). Obviously I don't want to keep it mining at that rate, and there don't seem to be any ill effects when I restart it. The only thing I haven't tried is underclocking the GPU, which I'd rather not do since it should be running fine at stock settings. My case has very good airflow, so I don't know what's going on. Anyway, this is a separate issue, so...

I'm this close to creating a Python/Ruby wrapper just for this machine that will kill and auto-restart the miner when this happens, but it would be nice to have some built-in support (warnings, stderr) if we get a hardware failure or our Mh/s is very low. Is there any tool that does this?

If I have to write one myself, I'll release it to the forum in hopes it will help other people with hardware woes.
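For the "Mh/s is very low" half of that wrapper, here is a minimal Python sketch of how the drop could be detected from the miner's status lines. The line format is an assumption (a number followed by something like "Mhash/sec", "MH/s" or "khash/s"), and parse_mhash and rate_collapsed are made-up helper names, so check the pattern against what your miner actually prints.

```python
import re

# Assumed status-line format: a number followed by a hash-rate unit, e.g.
# "[250.1 Mhash/sec]" or "31000 khash/s". Adjust the pattern to your miner.
RATE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([kM])(?:hash|H)/s", re.IGNORECASE)


def parse_mhash(line):
    """Return the hash rate in Mh/s found in a status line, or None."""
    match = RATE_RE.search(line)
    if match is None:
        return None
    value = float(match.group(1))
    # Convert khash/s to Mh/s; anything else is already in Mh/s.
    return value / 1000.0 if match.group(2).lower() == "k" else value


def rate_collapsed(rate, baseline, factor=10.0):
    """True when the rate has fallen to baseline/factor or lower."""
    return rate is not None and baseline > 0 and rate <= baseline / factor
```

The factor of 10 matches the drop described above. You would take the baseline from the first few minutes of healthy mining and restart once rate_collapsed returns True for several readings in a row, so a single odd reading doesn't trigger it.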

What card brand and model are you using?

What do you have clocks set to?
newbie
Activity: 21
Merit: 0
Thanks for the advice. I am using Cygwin on Windows, and I would rather use those utilities than AutoIt, or write a Python script or similar, so that it is cross-platform.

I would probably have to grep the stdout for the "hardware failed" message because I haven't been able to pinpoint a temp at which it gets unstable - it just shuts down after X hours.

Right now I am trying the easier route and underclocking my card slightly. Temp seems stable at 52.
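The "grep the stdout" idea could look roughly like this in Python, which keeps it cross-platform. This is only a sketch: the marker text is an assumption (copy the exact message your miner prints), the function names are made up, and some miners buffer their output when it is piped, so test that lines actually arrive.

```python
import subprocess
import sys
import time

FAILURE_MARKER = "hardware failed"  # assumed text; use your miner's real message


def line_signals_failure(line, marker=FAILURE_MARKER):
    """Case-insensitive check for the failure message in one output line."""
    return marker.lower() in line.lower()


def run_until_failure(cmd):
    """Start the miner, echo its output, and kill it once the marker shows up.

    Returns True if the marker was seen, False if the miner exited on its own.
    """
    # universal_newlines=True gives text lines and also treats "\r" as a line
    # ending, which helps with miners that redraw their status line in place.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    try:
        for line in proc.stdout:
            sys.stdout.write(line)
            if line_signals_failure(line):
                return True
        return False
    finally:
        if proc.poll() is None:
            proc.kill()
        proc.stdout.close()
        proc.wait()


def watchdog(cmd, cooldown_seconds=60):
    """Restart the miner forever, pausing between runs so the card can cool."""
    while True:
        run_until_failure(cmd)
        time.sleep(cooldown_seconds)
```

You would call it with the miner's command line as a list, for example watchdog(["poclbm", "--myoptionsgohere"]). The cooldown between runs is a guess, so tune it to however long your card needs to recover.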
copper member
Activity: 56
Merit: 0

If you were using Linux, you could run a script against aticonfig to get the temperature and suspend mining (or reboot) for a few seconds or minutes until the card recovers. The sleep command accepts "sleep 5m" (sleep 5 minutes), which would work pretty well.
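As a rough sketch of the aticonfig approach in Python: the code below assumes "aticonfig --odgt" prints lines like "Sensor 0: Temperature - 64.00 C", which may differ between driver versions. The helper names are made up, and the 90C limit is just the figure mentioned earlier in this thread.

```python
import re
import subprocess

# Assumed output of `aticonfig --odgt`, e.g. "Sensor 0: Temperature - 64.00 C".
TEMP_RE = re.compile(r"Temperature\s*-\s*(\d+(?:\.\d+)?)\s*C")


def parse_temperature(output):
    """Return the hottest temperature (Celsius) in the output, or None."""
    temps = [float(t) for t in TEMP_RE.findall(output)]
    return max(temps) if temps else None


def read_gpu_temperature():
    """Run aticonfig and parse its output (Linux with the ATI driver only)."""
    output = subprocess.check_output(["aticonfig", "--odgt"],
                                     universal_newlines=True)
    return parse_temperature(output)


def should_pause(temp_c, limit_c=90.0):
    """True when the reading is above the limit; None means unknown, not hot."""
    return temp_c is not None and temp_c > limit_c
```

A loop would call read_gpu_temperature() every minute or so, stop the miner when should_pause() is True, wait with time.sleep(300) (the Python version of "sleep 5m"), and then start the miner again.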

For Windows, you might check out something like AutoIt. Run the script every so often and see if you can grab the temperature from the form components of one of the many Windows monitoring tools. If the number is past your threshold, you can use something like -
taskkill /f /fi "Imagename eq poclbm.exe"

To stop the miner, then restart later with something like -
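tasklist /fi "Imagename eq poclbm.exe" | find /i "poclbm.exe" >nul
rem find sets errorlevel 1 when poclbm.exe is not in the task list, so start it again
if errorlevel 1 poclbm --myoptionsgohere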

To simulate a sleep in Windows, I think most people use this -
ping -w 1000 -n 5 localhost

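Here -n 5 sends five pings about a second apart, so this pauses for roughly four seconds; raise the count for a longer sleep. The -w 1000 is only the per-reply timeout in milliseconds, and it has little effect against localhost, which answers immediately.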

You can also fall back to doing a timed reboot on the hour every hour, I suppose.