
Topic: cgmon - mining monitor for Linux - auto restart, reboot, sick gpu, ASIC, &more - page 14. (Read 48345 times)

newbie
Activity: 48
Merit: 0
Try adding an extra line:

At line 102:
# cgminer is not running, restart it.
notice "$cgminer_exec not running, starting..."
notice "$mining_command"

With the extra line, you can see the cgminer command when you run cgmon. This makes it a little easier to debug invalid arguments.
sr. member
Activity: 324
Merit: 250
I think I found what the problem was. I added api-allow to the cgminer extra-options, and it seems it didn't like that for some reason: it would crash after about 5 minutes.

Should I put that in Gpu_options instead, or somewhere else?
sr. member
Activity: 324
Merit: 250
 Status: PHP Warning:  socket_read(): unable to read from socket [104]: Connection reset by peer in /tmp/cgmon-api.php on line 35
 
I have this error happening. I believe it's because I run Anubis on the side, and Anubis makes socket_read() fail on the cgmon side from time to time. Is there any quick way to solve this?

Thanks

*Not Anubis's fault. I stopped pointing Anubis at that particular miner and it crashed again. I cannot go 10 minutes without it crashing because of that. Not sure what's wrong; I will disable it in the meantime.
newbie
Activity: 48
Merit: 0
No problem guys, it's great we can make this work for everyone!

jdape:
I was thinking: how do you feel about using my implementation of the settings?
I think it is a lot easier to get an overview of, and you can also use commented lines to make notes and "disable" options.

Also, more pools would be great, I have 5 pools in my version, which I change between.

I know my implementation is very static, but I guess it could be made a lot prettier by using some lists (I am a TCL noob).
member
Activity: 110
Merit: 10
I noticed just now after downloading the new cgmon that my log file says it is 0.1b8.  Did you do another release or is it mislabelled?
newbie
Activity: 9
Merit: 0
Hey, I found this post a while ago, and as I am using sgminer and also wanted some easier customization, I have reworked the script quite a lot.

It still needs a little more work, though; then I will share it here.

I highly appreciate your work!

Sgminer is the only way to go for newer GPUs (R9 290, R9 290X, R9 280X, etc.).
It is very stable and has lots of bugs fixed... just my opinion.
newbie
Activity: 9
Merit: 0

For me at least, when a GPU goes SICK or DEAD, a full reboot of the computer is always required to get the GPU working again. This has been the case with all my rigs and GPUs. It could still be just me, though, since my rigs are running mostly identical software... Anyhow, that's why it restarts the entire computer.



It's fine; rebooting the computer isn't a bad idea for dead/sick GPUs, considering Ubuntu is really quick to reboot, unlike Windows. Linux is the only way to go for continuous mining, IMO. Thx for everything, jdape! It is getting closer to CGwatcher...

sr. member
Activity: 269
Merit: 250
# 0.1b7
#   Added detection of cgminer/AMD crash aka 'asic hang' (thanks dr00g!)
#   Fixed rebooting on BAMT.
#   Default share timer changed from 5 to 10 minutes.
#   Accepted share counts and rate added to logfile.  Example below.

Code:
Jan 30 17:55:02 miner8 GPU 0 Shares accepted since last run:  16  (1.07 shares/min)
Jan 30 17:55:02 miner8 GPU 1 Shares accepted since last run:  29  (1.93 shares/min)
Jan 30 17:55:02 miner8 GPU 2 Shares accepted since last run:  1  (0.07 shares/min)
Jan 30 17:55:02 miner8 GPU 3 Shares accepted since last run:  17  (1.13 shares/min)
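The rates in this log are consistent with shares × 60 / seconds over a 15-minute (900 s) interval: 16 × 60 / 900 ≈ 1.07 and 29 × 60 / 900 ≈ 1.93. A minimal bash sketch of that arithmetic, assuming that formula; the function name is hypothetical and cgmon's actual calculation and interval are not shown here:

```shell
#!/bin/bash
# Hypothetical helper: accepted shares per minute over an interval in seconds.
# The formula (shares * 60 / seconds) is an assumption inferred from the log above.
shares_per_min() {
  local shares=$1 seconds=$2
  # awk does the floating-point division; %.2f rounds to two decimals
  awk -v s="$shares" -v t="$seconds" 'BEGIN { printf "%.2f\n", s * 60 / t }'
}

# Example: 16 shares accepted over 900 seconds (15 minutes)
shares_per_min 16 900   # prints 1.07
```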
sr. member
Activity: 269
Merit: 250
I have a question, though. The reason I was looking for a monitoring solution in the first place is that my cgminer randomly freezes and stops responding. I am able to SSH in from my phone or computer and cold-reboot the rig. Will this script do the equivalent, or does it ONLY restart/reboot when cgminer quits completely?

Typically cgminer will still be running when a situation requiring a reboot occurs...  So no, if a GPU crashes, it will reboot whether cgminer is running or not.

If cgminer itself crashes and stops responding, cgmon is not set up to deal with that situation. Let me know if that's actually what's going on, and if so I can probably add a feature to reboot in that case.

-j
sr. member
Activity: 269
Merit: 250


Thx dr00g!

I had already figured out the set_sgminer = "no" part; the issue was with the path.
I got everything working now.

For those who may struggle, here is what to put, since the new 0.1b6 has two fields (path and exec), which threw me off a little.

set cgminer_exec "cgminer"
set cgminer_path "/your/path/to/cgminer/"


Thank you.  I've now added this explanation for cgminer_path:
Code:
# this is required only if your cgminer_exec binary is not installed in your mining_user PATH (like /usr/bin/ or whatever)
# example:  /home/user/cgminer-3.1.1/       (must end with a '/')
set cgminer_path ""
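The "must end with a '/'" rule suggests the path and executable name are simply concatenated; that is an inference, not something shown in cgmon's code. A minimal bash sketch of a join that tolerates a missing trailing slash and falls back to PATH lookup when the path is empty; the function name is hypothetical:

```shell
#!/bin/bash
# Hypothetical sketch: join a cgminer_path-style directory and an executable name.
join_exec_path() {
  local path=$1 exec_name=$2
  if [ -z "$path" ]; then
    # Empty path: rely on the binary being in the user's PATH (e.g. /usr/bin)
    printf '%s\n' "$exec_name"
  else
    # Drop one trailing '/' if present, then add exactly one separator
    printf '%s/%s\n' "${path%/}" "$exec_name"
  fi
}

join_exec_path /home/user/cgminer-3.1.1/ cgminer
```

With this approach, "/home/user/cgminer-3.1.1/" and "/home/user/cgminer-3.1.1" produce the same command path.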


About the disabled GPU: yes, I waited 3-4 minutes and it did detect my DEAD/SICK GPU. One little problem: it rebooted my computer instead of restarting cgminer. Is this normal?

Here is the output message i got...

"GPU 0 no accepted shares in 458 seconds. GPU probably hung.
cgminer not running, starting....."

and it rebooted my computer. Is this normal?

A small suggestion: maybe you should only restart that particular GPU. I was looking at the API commands for cgminer, and the command for restarting a GPU is gpurestart|N, where N is the GPU number. So gpurestart|0 would restart GPU 0.

Just a suggestion, since GPU restarts are much quicker than cgminer restarts.

For me at least, when a GPU goes SICK or DEAD, a full reboot of the computer is always required to get the GPU working again. This has been the case with all my rigs and GPUs. It could still be just me, though, since my rigs are running mostly identical software... Anyhow, that's why it restarts the entire computer.
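For anyone who wants to try the gpurestart|N suggestion anyway, a minimal bash sketch of sending it to cgminer's API. It assumes cgminer runs with --api-listen on its default port 4028 and with an --api-allow rule granting write access (for example W:127.0.0.1), since gpurestart is a privileged command. The function names are hypothetical, and per the reply above a GPU restart may not recover a SICK/DEAD GPU:

```shell
#!/bin/bash
# Build a plain-text cgminer API request of the form command|parameter.
api_request() {
  local cmd=$1 param=$2
  if [ -n "$param" ]; then
    printf '%s|%s' "$cmd" "$param"
  else
    printf '%s' "$cmd"
  fi
}

# Send a request via bash's /dev/tcp redirection and print the reply.
# If the connection fails, the redirection error makes the group return non-zero.
cgminer_api() {
  local request=$1 host=${2:-127.0.0.1} port=${3:-4028}
  { printf '%s' "$request" >&3; cat <&3; } 3<>"/dev/tcp/$host/$port"
}

# Usage on a rig with the API enabled:
#   cgminer_api "$(api_request gpurestart 0)"
```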



sr. member
Activity: 269
Merit: 250
Right, here is a little "bugfix":

The variable is not initialized:
# uncomment for sgminer
#set use_sgminer "yes"

Which leads to:
can't read "use_sgminer": no such variable
    while executing
"if {$mine_for == "litecoin" && $use_sgminer == "no"} {set cgminer_option1 "--scrypt" } else { set cgminer_option1 ""}"
    (file "./cgmon.tcl" line 287)

It is fixed by initializing the variable:
# uncomment for sgminer
set use_sgminer "no"
#set use_sgminer "yes"

If you uncomment the line below it, the variable will be overridden. This way, you can control sgminer just by commenting/uncommenting that line.
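The same "default first, override after" pattern, sketched in bash as an illustration only (cgmon itself is TCL, where `info exists use_sgminer` is another way to guard the read). The option rule mirrors the TCL line quoted in the error above; the function name is hypothetical:

```shell
#!/bin/bash
# Default is assigned before anything reads the variable,
# like: set use_sgminer "no"
: "${use_sgminer:=no}"
#use_sgminer="yes"        # uncomment to switch to sgminer

# Same condition as the quoted TCL: add --scrypt only for litecoin with cgminer
scrypt_option() {
  local mine_for=$1
  if [ "$mine_for" = "litecoin" ] && [ "$use_sgminer" = "no" ]; then
    printf '%s\n' "--scrypt"
  else
    printf '\n'
  fi
}

scrypt_option litecoin
```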

Apart from that, the script runs for me. Did you change anything else?


Also, in:
set cgminer_path ""
you only have to set the directory where the sgminer binary is located, like /home/user/sgminer/. You don't have to fill in the path at all if you have installed cgminer into your PATH, like I did by compiling and installing with "sudo make install".

Thanks for this.  I've fixed the uninitialized variable error in the latest version on the website.
sr. member
Activity: 269
Merit: 250
2) I'm also having another problem: even if I disable all my GPUs manually from cgminer, I still get the "GPU's running healthy" message from cgmon.

Am I doing something wrong? Shouldn't it detect DEAD GPUs and restart cgminer?

Thx
 

If your GPUs are disabled, cgmon won't 'see' them, so it will assume everything is running smoothly. If one of the GPUs is marked as sick or dead, or stops outputting accepted shares, cgmon will log it, notify you, and then reboot the server.
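A minimal bash sketch of the per-GPU decision described above. The function name, its arguments and the 600-second default (matching the 10-minute share timer from 0.1b7) are assumptions rather than cgmon's code; Alive, Sick and Dead are status values cgminer reports for its GPUs:

```shell
#!/bin/bash
# Hypothetical sketch: decide what to do for one GPU.
#   $1 = status reported by cgminer, $2 = seconds since the last accepted share,
#   $3 = share timer in seconds (assumed default 600)
gpu_action() {
  local status=$1 secs_since_share=$2 max_secs=${3:-600}
  case "$status" in
    Sick|Dead) echo reboot; return ;;
  esac
  if [ "$secs_since_share" -gt "$max_secs" ]; then
    echo reboot   # no accepted shares within the timer: GPU probably hung
  else
    echo ok
  fi
}

gpu_action Alive 458 300
```

With a 300-second limit (the pre-0.1b7 default), the "no accepted shares in 458 seconds" case quoted earlier would trigger a reboot.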

-j
newbie
Activity: 48
Merit: 0
newbie
Activity: 48
Merit: 0
You are welcome; promise you will teach other people mining too!

It depends on what the problem is; sometimes a reboot is required because it is the actual driver that has crashed.

What do you need that for? Are you experiencing hashrate drops that a cgminer restart fixes?
member
Activity: 110
Merit: 10
How can I get the e-mail notifications to work?  I tried using smtp.gmail.com:25 but that doesn't send anything.  Does anyone know of an smtp setting I could use so the e-mail gets sent?  Thanks!
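Port 25 is often blocked by ISPs, and Gmail expects authenticated submission on port 587 (STARTTLS) or 465 (SSL). How cgmon's own mail settings handle TLS and login isn't shown here, so a quick way to test the account and port independently is curl's SMTP support. A minimal sketch; the addresses and password are placeholders, Gmail may require an app password, and the function names are hypothetical:

```shell
#!/bin/bash
# Write a minimal RFC 5322 message (CRLF line endings) to stdout.
make_message() {
  local from=$1 to=$2 subject=$3 body=$4
  printf 'From: %s\r\nTo: %s\r\nSubject: %s\r\n\r\n%s\r\n' "$from" "$to" "$subject" "$body"
}

# Send a message file through Gmail on port 587.
# smtp:// together with --ssl-reqd makes curl upgrade the connection with STARTTLS.
send_mail() {
  local from=$1 to=$2 password=$3 file=$4
  curl --silent --show-error --ssl-reqd \
    --url 'smtp://smtp.gmail.com:587' \
    --user "$from:$password" \
    --mail-from "$from" --mail-rcpt "$to" \
    --upload-file "$file"
}

# Usage:
#   make_message me@gmail.com me@gmail.com "cgmon alert" "GPU 0 is SICK" > /tmp/cgmon-mail.txt
#   send_mail me@gmail.com me@gmail.com 'app-password' /tmp/cgmon-mail.txt
```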
newbie
Activity: 9
Merit: 0
Try changing the values to:
set cgminer_exec "cgminer"
set cgminer_path "/opt/miners/cgminer/"

Did you remember to change the username in:
set mining_user "user"

Thx dr00g!

I had already figured out the set_sgminer = "no" part; the issue was with the path.
I got everything working now.

For those who may struggle, here is what to put, since the new 0.1b6 has two fields (path and exec), which threw me off a little.

set cgminer_exec "cgminer"
set cgminer_path "/your/path/to/cgminer/"


About the disabled GPU: yes, I waited 3-4 minutes and it did detect my DEAD/SICK GPU. One little problem: it rebooted my computer instead of restarting cgminer. Is this normal?

Here is the output message i got...

"GPU 0 no accepted shares in 458 seconds. GPU probably hung.
cgminer not running, starting....."

and it rebooted my computer. Is this normal?

A small suggestion: maybe you should only restart that particular GPU. I was looking at the API commands for cgminer, and the command for restarting a GPU is gpurestart|N, where N is the GPU number. So gpurestart|0 would restart GPU 0.

Just a suggestion, since GPU restarts are much quicker than cgminer restarts.

Thx! The only thing missing now would be:

1) Set a minimum hash rate; if it falls below that, restart cgminer

Thx Guys!!
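The requested minimum-hash-rate check can be sketched in bash as below. Hash rates are decimals (for example the MHS av value from cgminer's API summary), and `[ -lt ]` only compares integers, so awk does the comparison. The function name, the threshold and restart_miner are hypothetical:

```shell
#!/bin/bash
# Hypothetical sketch: succeed (exit 0) when the current hash rate is below the minimum.
below_min_hashrate() {
  local current=$1 minimum=$2
  # awk compares the values numerically; exit 0 means "below minimum"
  awk -v c="$current" -v m="$minimum" 'BEGIN { exit !(c < m) }'
}

# Usage, with restart_miner standing in for whatever restart logic is used:
#   if below_min_hashrate "$mhs" 0.60; then restart_miner; fi
```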

newbie
Activity: 48
Merit: 0
I had that problem as well; in my case it turned out my GPU apparently didn't like my thread-concurrency setting...

Anyway, to solve it I used a script I found here a while ago: http://forum.feathercoin.com/index.php?topic=5989.0

Add this to a file, like /home/username/amd_crash_monitor.sh

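#!/bin/bash
# Reboot if the AMD Catalyst driver has logged an "ASIC hang" in the kernel log.
# grep -q only sets the exit status, so the matching line is not printed.
if dmesg | grep -q "ASIC hang happened"; then
  echo "Catalyst has crashed! Rebooting..."
  /sbin/shutdown -r now   # needs root; may need changing on BAMT
fi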

Add it to crontab the same way as cgmon; the extra redirection flags work the same way, depending on whether you want to see output or not.

After the driver has crashed, it can take a few minutes before the message appears in the log and the rig reboots.

Remember that you will likely need to change the reboot line to work with BAMT.
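A minimal sketch of the crontab step, assuming the script path above and a 5-minute check interval (both placeholders). Because /sbin/shutdown needs root, the entry belongs in root's crontab; after a reboot the kernel log starts fresh, so the same hang message won't trigger a second reboot:

```
# Make the script executable, then open root's crontab
chmod +x /home/username/amd_crash_monitor.sh
sudo crontab -e

# Crontab entry: check every 5 minutes, discarding output
*/5 * * * * /home/username/amd_crash_monitor.sh >/dev/null 2>&1
```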
newbie
Activity: 4
Merit: 0
Oh, ">/dev/null 2>&1" is there precisely to suppress all output. I guess it should be removed, both from the command you run and from the cronjob line, to actually get log output...
When you run it manually, type:
sudo ./cgmon.tcl
That should trigger it with some output.

Thanks dr00g! Got it working! PHP wasn't in the correct directory; once the log started showing, I was able to troubleshoot.

I have a question, though. The reason I was looking for a monitoring solution in the first place is that my cgminer randomly freezes and stops responding. I am able to SSH in from my phone or computer and cold-reboot the rig. Will this script do the equivalent, or does it ONLY restart/reboot when cgminer quits completely?
newbie
Activity: 48
Merit: 0
Oh, ">/dev/null 2>&1" is there precisely to suppress all output. I guess it should be removed, both from the command you run and from the cronjob line, to actually get log output...
When you run it manually, type:
sudo ./cgmon.tcl
That should trigger it with some output.
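To illustrate why ">/dev/null 2>&1" hides everything: it sends stdout to /dev/null and then points stderr (2) at the same place. Appending to a log file instead keeps both streams. In this sketch, fake_monitor is a hypothetical stand-in for cgmon.tcl:

```shell
#!/bin/bash
# Stand-in for a monitor script that writes to both output streams.
fake_monitor() {
  echo "GPU 0 running healthy"   # normal output (stdout)
  echo "PHP not found" >&2       # error output (stderr)
}

log=$(mktemp)
fake_monitor >/dev/null 2>&1     # prints nothing at all
fake_monitor >>"$log" 2>&1       # both lines are appended to the log file
cat "$log"
```

The same idea applies to the cronjob: replacing ">/dev/null 2>&1" with something like ">>/home/user/cgmon.log 2>&1" (the log path is a placeholder) keeps the output for troubleshooting.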
newbie
Activity: 4
Merit: 0
Does it produce any output when you run it?
Is it running in screen? Type: screen -r
To detach, press Ctrl+A, then D

Did you remember to install the required packages?
# 1) Install PHP, TCL and screen. 
# CentOS: yum install php53 tcl screen
# Ubuntu: apt-get install php5 tcl screen

I type "/home/user/cgmon.tcl >/dev/null 2>&1" and I get no output, just a new line...

php5, tcl, and screen are all installed and updated.