Folks,
I'd like to share a lesson learned that may be useful to others. I have a Z9mini here that I just checked, and it had a hash rate of 0. I immediately went to the logs to see what was going on and found the following:
Nov 28 21:33:52 (none) local0.err cgminer[23558]: bm1740_verify_nonce_integrality CRC error. cal-crc=374c, chip-crc=60bf
Nov 28 21:33:52 (none) local0.warn cgminer[23558]: receive a error nonce. total = 8908
Nov 28 21:33:52 (none) local0.err cgminer[23546]: bm1740_verify_nonce_integrality CRC error. cal-crc=ac2d, chip-crc=3f77
Nov 28 21:33:52 (none) local0.warn cgminer[23546]: receive a error nonce. total = 8705
The key here is that these are happening constantly, every second, and the count is up to 8000+. (If these are infrequent, every few minutes to every few hours, they can be ignored.) What's going on? To figure that out, let's take a look at the process list ("System" -> "Monitor")... and what do I find?
23546 23545 root S < 225m 98% 50% /usr/bin/cgminer --version-file=/usr/bin/compile_time --config=/config/cgminer.conf -T --syslog
23558 23557 root S < 257m 111% 40% /usr/bin/cgminer --version-file=/usr/bin/compile_time --config=/config/cgminer.conf -T --syslog
Two copies of cgminer running! (Their PIDs, 23546 and 23558, are the same two PIDs that appear in the log lines above.) How could that happen? The answer is in this little program right here:
1012 1 root S 2152 1% 0% {monitorcg} /bin/sh /sbin/monitorcg
This is a factory process that tries to be a "watchdog" for cgminer and restart it if it is not running. From the factory it ran every 20 seconds, but I modified it to sleep for 60 seconds to try to limit the possibility of this race condition.
What happens is this: if you change the frequency or pool configuration, cgminer is stopped and restarted. While that stop/start is occurring, monitorcg has a chance to see that cgminer is not running and start one itself. End result: two cgminers stepping on each other.
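To get a rough feel for why a longer sleep shrinks this race but does not close it, here is a back-of-the-envelope sketch in Python. It assumes monitorcg checks at a single instant once per sleep interval, and that a restart leaves cgminer absent for a fixed window that begins at a random point in that cycle. The 5-second window and the function name are hypothetical, chosen only for illustration.

```python
def overlap_probability(window_s: float, interval_s: float) -> float:
    """Chance that a periodic check lands inside a restart window.

    Assumes the check is instantaneous, runs once every interval_s seconds,
    and the restart begins at a uniformly random point in that cycle.
    """
    if window_s < 0 or interval_s <= 0:
        raise ValueError("window_s must be >= 0 and interval_s must be > 0")
    # A window longer than the interval is always hit by at least one check.
    return min(1.0, window_s / interval_s)


if __name__ == "__main__":
    RESTART_WINDOW_S = 5.0  # hypothetical: how long cgminer is absent during a restart
    for interval_s in (20, 60):  # factory sleep vs. the modified sleep
        p = overlap_probability(RESTART_WINDOW_S, interval_s)
        print(f"sleep {interval_s}s: {p:.1%} chance per restart")
```

Under these assumptions the chance per restart drops by a factor of three when going from 20 to 60 seconds, but it never reaches zero, which fits with the race still showing up occasionally.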
I may end up removing /sbin/monitorcg from the firmware, as I've attempted to fix this particular race a myriad of ways... but when two separate processes (web interface actions and monitorcg) are both touching the same resource ("cgminer"), there is no good way to prevent them from stepping on each other unless they are talking to each other constantly to achieve what is called "quorum".
What's the lesson here? Many times the errors that you may see are a function of this particular race condition... and if you have two cgminer processes running, the fix is to kill/restart them. The simplest way to do that is just to go to the frequency page and click submit. That will terminate both cgminers and hopefully restart one before monitorcg tries to help. A guaranteed way to fix it is to reboot, but I am not a fan of unnecessary reboots.
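If you would rather check for the duplicate without eyeballing the output, the idea can be sketched in Python by pasting in text copied from the log and from the process list. The function names and patterns below are illustrative assumptions, matched only against the lines shown in this post; other firmware versions may format these lines differently.

```python
import re
from typing import Set

# Matches the "cgminer[23558]:" tag as it appears in the syslog lines in this post.
LOG_PID = re.compile(r"\bcgminer\[(\d+)\]:")


def cgminer_pids_in_log(log_text: str) -> Set[int]:
    """Return the distinct cgminer PIDs that wrote lines to the log."""
    return {int(m.group(1)) for m in LOG_PID.finditer(log_text)}


def cgminer_pids_in_process_list(ps_text: str) -> Set[int]:
    """Return the PIDs of processes whose command is a cgminer binary.

    Assumes the first column is the PID and that the command is the first
    field beginning with "/", as in the process list shown in this post.
    """
    pids = set()
    for line in ps_text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue
        command = next((f for f in fields[1:] if f.startswith("/")), "")
        if command.rsplit("/", 1)[-1] == "cgminer":
            pids.add(int(fields[0]))
    return pids


if __name__ == "__main__":
    sample_ps = (
        "23546 23545 root S < 225m 98% 50% /usr/bin/cgminer --config=/config/cgminer.conf -T --syslog\n"
        "23558 23557 root S < 257m 111% 40% /usr/bin/cgminer --config=/config/cgminer.conf -T --syslog\n"
        "1012 1 root S 2152 1% 0% {monitorcg} /bin/sh /sbin/monitorcg\n"
    )
    running = cgminer_pids_in_process_list(sample_ps)
    if len(running) > 1:
        print("More than one cgminer is running:", sorted(running))
```

More than one PID from either function is the sign to resubmit the frequency page, or to reboot if that does not clear it.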
Hopefully this bit of information will be useful to someone. I've been meaning to write posts like this explaining various scenarios for a while.
Thank you,
Jason