Topic: cgmon - mining monitor for Linux - auto restart, reboot, sick gpu, ASIC, &more - page 2. (Read 48363 times)

sr. member
Activity: 269
Merit: 250
Perhaps you can fix your X.org startup problem?  Sounds like that would make it work reliably.
newbie
Activity: 6
Merit: 0
Hi! Thanks for the great script!
But it is very sad to see this on my rig in the mornings:

Quote
Mar 03 20:00:02 bamt-miner - cgmon 1.0.8 - sgminer running and all GPUs healthy.
Mar 03 20:02:01 bamt-miner - cgmon 1.0.8 - sgminer running and all GPUs healthy.
Mar 03 20:04:03 bamt-miner - cgmon 1.0.8 - sgminer running and all GPUs healthy.
Mar 03 20:06:11 bamt-miner - sgminer is not responding.  Rebooting.
Mar 03 20:06:21 bamt-miner -
Mar 03 20:08:44 bamt-miner - sgminer not running, starting via this command:
Mar 03 20:08:44 bamt-miner - /opt/miners/sph-sgminer/sgminer --api-listen   -c /opt/miners/sph-sgminer/sgminer.conf
Mar 03 20:08:45 bamt-miner - sgminer started successfully.  Use 'screen -r' to attach to sgminer and Control-a-d to detach.
Mar 03 20:10:01 bamt-miner - sgminer not running, starting via this command:
Mar 03 20:10:01 bamt-miner - /opt/miners/sph-sgminer/sgminer --api-listen   -c /opt/miners/sph-sgminer/sgminer.conf
Mar 03 20:10:02 bamt-miner - sgminer failed to start.  Try running the mining command above to find the error.  Also, double check your GPU options.
Mar 03 20:12:01 bamt-miner - sgminer not running, starting via this command:
Mar 03 20:12:01 bamt-miner - /opt/miners/sph-sgminer/sgminer --api-listen   -c /opt/miners/sph-sgminer/sgminer.conf
Mar 03 20:12:02 bamt-miner - sgminer failed to start.  Try running the mining command above to find the error.  Also, double check your GPU options.
Mar 03 20:14:01 bamt-miner - sgminer not running, starting via this command:
Mar 03 20:14:01 bamt-miner - /opt/miners/sph-sgminer/sgminer --api-listen   -c /opt/miners/sph-sgminer/sgminer.conf
Mar 03 20:14:02 bamt-miner - sgminer failed to start.  Try running the mining command above to find the error.  Also, double check your GPU options.

Sometimes X does not start after a reboot, and cgmon then restarts sgminer in an endless cycle. Maybe rebooting the rig after N failed starts would be a good solution?

I bet there's an error in your .conf file.  What happens when you run the command manually?

Code:
/opt/miners/sph-sgminer/sgminer --api-listen   -c /opt/miners/sph-sgminer/sgminer.conf

Are you running as root or another user?  Could also be a permissions issue.  Either way, running it by hand should show the problem.


Hi!
The .conf is OK; the rig runs default BAMT with different mining software (sgminer). The problem occurs when X.org does not start automatically at boot (this sometimes happens) and the AMD driver is not working. You can try to reproduce this situation: sgminer exits and prints nothing. If X.org is running, everything works well.
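
A minimal sketch of the "reboot after N failed starts" idea raised in this exchange. It is written in Python for illustration only and is not part of cgmon. Because cgmon is launched fresh from cron on every run, the counter has to live somewhere between runs; the state file, the function name `next_action`, and the default of 3 attempts are all assumptions.

```python
from pathlib import Path


def next_action(started_ok: bool, state_file: Path, max_failed_starts: int = 3) -> str:
    """Return 'ok', 'retry', or 'reboot' based on consecutive failed starts.

    Hypothetical helper: state_file and max_failed_starts are assumptions,
    not cgmon options.
    """
    # Load the number of consecutive failures recorded by earlier runs.
    try:
        failed = int(state_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        failed = 0

    if started_ok:
        failed = 0
        action = "ok"
    else:
        failed += 1
        if failed >= max_failed_starts:
            # Give up restarting the miner and ask for a full reboot instead.
            action = "reboot"
            failed = 0
        else:
            action = "retry"

    # Persist the counter for the next cron-driven run.
    state_file.write_text(str(failed))
    return action
```

Resetting the counter when a reboot is requested means a rig whose X.org keeps failing would reboot once every N attempts rather than on every run.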
sr. member
Activity: 269
Merit: 250
Is there any way to add a feature?
If a card's temperature reaches a specific threshold, it stops mining for about 5 minutes and then reboots! This could be turned on or off.

That would not be difficult to add.  Perhaps a donation would provide some motivation to make it happen :)
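
For illustration, the decision behind the requested feature could look roughly like the sketch below. It is Python, not cgmon's Tcl, and nothing here exists in cgmon: the function name, the on/off switch, and the return values are assumptions. Temperatures are passed in as plain numbers; reading them from the miner is left out.

```python
from typing import Sequence, Tuple

# "like 5 minutes" from the feature request
COOLDOWN_SECONDS = 5 * 60


def overheat_action(temps_c: Sequence[float], limit_c: float,
                    enabled: bool = True) -> Tuple[str, int]:
    """Decide what to do for the given GPU temperatures (degrees Celsius).

    Returns ("stop_then_reboot", cooldown_seconds) when any card is at or
    above limit_c, otherwise ("continue", 0). Hypothetical helper.
    """
    if not enabled or not temps_c:
        # Feature switched off, or no readings available: do nothing.
        return ("continue", 0)
    if max(temps_c) >= limit_c:
        # The hottest card crossed the limit: stop mining, wait, then reboot.
        return ("stop_then_reboot", COOLDOWN_SECONDS)
    return ("continue", 0)
```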
newbie
Activity: 62
Merit: 0
Is there any way to add a feature?
If a card's temperature reaches a specific threshold, it stops mining for about 5 minutes and then reboots! This could be turned on or off.
sr. member
Activity: 269
Merit: 250
Can you please paste the contents of /tmp/accepted_count

Code:
bomberb17@bomberb17-ltcminer:~$ cat /tmp/accepted_count
1127 1397104922
1151 1397104922
1146 1397104922

I have restarted my rig and now cgmon works ok again.
However I have seen cgminer hanging many times without cgmon catching it, and I suspect that this "bug" (which happens once in a while) is the reason.

Sounds like it.  There's only so much we can do in this case.  You could try running sgminer instead.  It's a fork of a later version of cgminer, with scrypt support.
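
To make the /tmp/accepted_count output above easier to read, here is a rough Python sketch of how such a file could be interpreted. It assumes, without confirmation from the script, that each line is one GPU, the first column is that GPU's accepted-share count, and the second is a Unix timestamp. The function names and the "previous" snapshot in the usage note are hypothetical.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int]  # (accepted_shares, unix_timestamp), assumed layout


def parse_accepted_count(text: str) -> List[Optional[Entry]]:
    """Parse one 'count timestamp' pair per line; None marks a bad line.

    Bad lines are kept as None so list positions still match GPU numbers.
    """
    entries: List[Optional[Entry]] = []
    for line in text.splitlines():
        parts = line.split()
        try:
            count, stamp = parts
            entries.append((int(count), int(stamp)))
        except ValueError:
            # Wrong number of fields or a non-numeric value.
            entries.append(None)
    return entries


def shares_per_minute(previous: Entry, current: Entry) -> Optional[float]:
    """Accepted shares per minute between two snapshots of one GPU."""
    elapsed = current[1] - previous[1]
    if elapsed <= 0:
        return None
    return (current[0] - previous[0]) / (elapsed / 60)
```

With a made-up earlier snapshot of `(1062, 1397104204)` and the first line shown above, `shares_per_minute` gives about 5.43, the same form as the "shares/min" figures cgmon writes to its log.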

sr. member
Activity: 269
Merit: 250
Hi, do I need to add this to say BAMT/SMOS/NotSMOS 1.3 or PiMP to have the rig reboot when sick devices are found?

I haven't used those, but I believe so.
hero member
Activity: 812
Merit: 1000
Hi, do I need to add this to say BAMT/SMOS/NotSMOS 1.3 or PiMP to have the rig reboot when sick devices are found?
newbie
Activity: 62
Merit: 0
If I run the script manually it works, but it won't run on its own! How do I fix this? :D
newbie
Activity: 62
Merit: 0
when running it manually:

http://prntscr.com/38nlyx

Ninja edit: I solved this error; the path to the log was missing a "/".

newbie
Activity: 62
Merit: 0
Hi there,
I have this script running on one rig and I am trying to put it on the others.
I run them with BAMT 1.3. When I first installed it I had this error, but I don't remember how I fixed it!

I got this information in the log:
http://prntscr.com/38nkfi

In the config file I have the right path:
http://prntscr.com/38nkt4

Any help? :D
hero member
Activity: 773
Merit: 528
Can you please paste the contents of /tmp/accepted_count

Code:
bomberb17@bomberb17-ltcminer:~$ cat /tmp/accepted_count
1127 1397104922
1151 1397104922
1146 1397104922

I have restarted my rig and now cgmon works ok again.
However I have seen cgminer hanging many times without cgmon catching it, and I suspect that this "bug" (which happens once in a while) is the reason.
sr. member
Activity: 269
Merit: 250
Can you please paste the contents of /tmp/accepted_count
hero member
Activity: 773
Merit: 528
When I run it manually, I don't receive any mail (I have Gmail).
Waiting to see if I receive when cgminer crashes..

Check your configuration options.  This is the default:

Code:
# send email when running script by hand (no or yes)
set mail(notify_on_manual_runs) "no"

Yes I have that option enabled. Mail is coming when cgminer hangs though.
I also have another question: I added the line
*/2 * * * *        root    /home/user/cgmon.tcl >/dev/null 2>&1
in /etc/crontab as in the instructions. While the script runs OK most of the time, I see that sometimes it does not. For example, this is the output of the log:
Code:
bomberb17@bomberb17-ltcminer:~$ tail -n 20 cgmon.log
Apr 09 02:42:02 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 02:44:02 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 02:46:02 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 02:48:01 bomberb17-ltcminer - GPU 0 Shares accepted since last run:  65  (5.43 shares/min)
Apr 09 02:48:01 bomberb17-ltcminer - GPU 1 Shares accepted since last run:  68  (5.68 shares/min)
Apr 09 02:48:01 bomberb17-ltcminer - GPU 2 Shares accepted since last run:  62  (5.18 shares/min)
Apr 09 02:48:01 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 02:50:01 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 02:52:02 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 02:54:01 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 02:56:01 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 02:58:01 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 03:00:02 bomberb17-ltcminer - GPU 0 no accepted shares in 721 seconds. GPU probably hung.
Apr 09 03:02:02 bomberb17-ltcminer - cgminer API is not enabled/responding.  Restart cgminer with '--api-listen' or check the status of mining pools.
Apr 09 03:04:04 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 03:06:03 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 03:08:04 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 03:10:04 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 03:12:03 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 03:14:03 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
And I received the email at 03:00 am.
While it restarted my computer and cgminer continued to run fine, the script stopped running every 2 minutes for some reason. I also tried adding the line via crontab -e, but nothing changed.
Any ideas?
My OS is Xubuntu 12.10


See this also
Code:
bomberb17@bomberb17-ltcminer:~$ ./cgmon.tcl
invalid bareword "exited"
in expression "229 - exited";
should be "$exited" or "{exited}" or "exited(...)" or ...
    (parsing expression "229 - exited")
    invoked from within
"expr  $current_accepted($n) -  $previous_accepted($n)"
    (procedure "check_status" line 197)
    invoked from within
"check_status"
    (file "./cgmon.tcl" line 602)
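
The trace above shows the subtraction receiving the word "exited" where the previous accepted count should be (`229 - exited`), so one of the two values was not a number. Where that word came from is not visible in the trace. As a hedged illustration in Python (not the script's Tcl, and not the author's fix), a guard around the subtraction could look like this; `accepted_delta` is a hypothetical name.

```python
from typing import Optional


def accepted_delta(current: str, previous: str) -> Optional[int]:
    """Return current - previous, or None if either value is not an integer.

    Mirrors the failing expression from the trace, but refuses to compute
    when a value such as "exited" shows up instead of a share count.
    """
    try:
        return int(current) - int(previous)
    except ValueError:
        # A non-numeric value: treat as "no data" instead of crashing.
        return None
```

For example, `accepted_delta("229", "exited")` returns `None` rather than raising. In Tcl, a comparable check could be built on `string is integer -strict` before calling `expr`.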
hero member
Activity: 773
Merit: 528
When I run it manually, I don't receive any mail (I have Gmail).
Waiting to see if I receive when cgminer crashes..

Check your configuration options.  This is the default:

Code:
# send email when running script by hand (no or yes)
set mail(notify_on_manual_runs) "no"

Yes I have that option enabled. Mail is coming when cgminer hangs though.
I also have another question: I added the line
*/2 * * * *        root    /home/user/cgmon.tcl >/dev/null 2>&1
in /etc/crontab as in the instructions. While the script runs OK most of the time, I see that sometimes it does not. For example, this is the output of the log:
Code:
bomberb17@bomberb17-ltcminer:~$ tail -n 20 cgmon.log
Apr 09 02:42:02 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 02:44:02 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 02:46:02 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 02:48:01 bomberb17-ltcminer - GPU 0 Shares accepted since last run:  65  (5.43 shares/min)
Apr 09 02:48:01 bomberb17-ltcminer - GPU 1 Shares accepted since last run:  68  (5.68 shares/min)
Apr 09 02:48:01 bomberb17-ltcminer - GPU 2 Shares accepted since last run:  62  (5.18 shares/min)
Apr 09 02:48:01 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 02:50:01 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 02:52:02 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 02:54:01 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 02:56:01 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 02:58:01 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 03:00:02 bomberb17-ltcminer - GPU 0 no accepted shares in 721 seconds. GPU probably hung.
Apr 09 03:02:02 bomberb17-ltcminer - cgminer API is not enabled/responding.  Restart cgminer with '--api-listen' or check the status of mining pools.
Apr 09 03:04:04 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 03:06:03 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 03:08:04 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 03:10:04 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 03:12:03 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
Apr 09 03:14:03 bomberb17-ltcminer - cgmon 1.0.8 - cgminer running and all GPUs healthy.
And I received the email at 03:00 am.
While it restarted my computer and cgminer continued to run fine, the script stopped running every 2 minutes for some reason. I also tried adding the line via crontab -e, but nothing changed.
Any ideas?
My OS is Xubuntu 12.10
sr. member
Activity: 269
Merit: 250
Thanks for the advice jdape. Will look into the SSL option.
Noted on the GPU engine clock. Sometimes the GPUs can run for 48+ hours and be absolutely fine, and other times they will run for only an hour. Frustrating, as they appear to be stable and then throw a curveball.

Now that spring is here, I notice that whether certain windows are closed or open makes the difference between stability and constant hangs.  Will have to power down (depending on price) or break out the AC units soon! :)

sr. member
Activity: 588
Merit: 251
Thanks for the advice jdape. Will look into the SSL option.
Noted on the GPU engine clock. Sometimes the GPUs can run for 48+ hours and be absolutely fine, and other times they will run for only an hour. Frustrating, as they appear to be stable and then throw a curveball.
sr. member
Activity: 269
Merit: 250
Just been testing this script for the past 24 hours! It's caught 3 SICK GPUs so far and solved with an automatic reboot. Thank you to the developers, no more midnight panics!

I did have one question: when the system is told to reboot, cgmon starts cgminer in the background. I then have to open the terminal and type "screen -r" to see the readout from the miners. Is it possible to have this happen automatically following an auto-reboot? Thank you

So you're saying you want to see your miner in a terminal window after a reboot?  Hmm...   I'm sure that's possible but it would involve ugly hacks and wouldn't work reliably.   I think you'll have to type screen -r as needed for now.   That being said - you can use any computer to SSH (remotely login) into your miner computer and then type screen -r.   You don't have to physically go to your miner and pull it up on the big screen.  For example, I can open five terminal windows on my laptop, each with an SSH session to a miner and see all five at once from anywhere.

When I get sick GPUs over and over, I lower the clock speed (gpu-engine) by 10 and start it again.  Repeat until the card is stable.
sr. member
Activity: 269
Merit: 250
When I run it manually, I don't receive any mail (I have Gmail).
Waiting to see if I receive when cgminer crashes..

Check your configuration options.  This is the default:

Code:
# send email when running script by hand (no or yes)
set mail(notify_on_manual_runs) "no"
sr. member
Activity: 588
Merit: 251
Just been testing this script for the past 24 hours! It's caught 3 SICK GPUs so far and solved with an automatic reboot. Thank you to the developers, no more midnight panics!

I did have one question: when the system is told to reboot, cgmon starts cgminer in the background. I then have to open the terminal and type "screen -r" to see the readout from the miners. Is it possible to have this happen automatically following an auto-reboot? Thank you
hero member
Activity: 773
Merit: 528
When I run it manually, I don't receive any mail (I have Gmail).
Waiting to see if I receive when cgminer crashes..