
Topic: CGWatcher 1.4.0, a GUI/monitor for CGMiner & BFGMiner to help minimize downtime - page 28.

member
Activity: 75
Merit: 10
Looks pretty interesting.  I might give it a try.  Thanks.
newbie
Activity: 29
Merit: 0
Are you also running two instances of cgwatcher, one for each port?
newbie
Activity: 50
Merit: 0
I think there's a bug in that CGWatcher only ever looks on port 4028.

I have specified two instances of cgminer with --api-port 4029 and --api-port 4030.

I got "[2013-06-09 00:47:30] API running in IP access mode on port 4029 (13436)" (success), and have CGWatcher set up like so: http://i.imgur.com/14Fw78y.png.

CGWatcher reports no access.

Using the latest 1.1.15 with the June 6th hotfix.

Edit.
I tried editing the .ini, and it works fine in 1.1.13 and 1.1.14, although those versions do not allow multiple instances, which kind of defeats the point.
hero member
Activity: 574
Merit: 500
freedomainradio.com
Looks nice. I will try it out!  Grin
newbie
Activity: 29
Merit: 0
I logged on about ten minutes before taking the screenshot, and the "Failed to reinit GPU thread" message was repeating at regular intervals. I too suspected that my logging on might somehow have caused this, so I observed it for a while, thinking that if that were the case, the system should theoretically reboot soon. I was just surprised to see that it didn't, and that nothing was logged or being logged, so I couldn't check exactly when it initially died either. :/

Mapping works correctly now whether or not --gpu-platform is specified; --no-adl was causing the mapping problem. Also, no more random sick/dead GPUs since removing it, hmm. Smiley

Seems to be in order now... Smiley
sr. member
Activity: 434
Merit: 251
CGWatcher & CGRemote
Looking at the screenshot again, it looks as though the GPU died right before you took it, correct? The screenshot was taken at 19:24, and cgminer shows:

Quote
[2013-06-06 19:23:44] Failed to reinit GPU thread 1
...
[2013-06-06 19:24:46] Thread 1 no longer exists
[2013-06-06 19:24:46] Error -6: Creating Comamnd Queue. (clCreateCommandQueue)
[2013-06-06 19:24:46] Failed to reinit GPU thread 1

So unless there is more than what is shown, it doesn't appear the GPU was dead for 12 hours, but rather for ~1 minute. While there is usually a small delay between cgminer showing the sick or dead GPU and it reporting it through its API (plus up to 10 additional seconds because of the CGWatcher monitor interval), I think CGWatcher should have at least reported it by the time the screenshot was taken, so there was still a CGWatcher issue. But I believe this would have been caused by incorrect GPU mapping, which may have been a result of the bugs that were found, or may have been resolved with manual mapping (although the bugs still didn't help the situation). The fact that the GPUs showed as INACTIVE means there was a mapping problem. I'd be interested to know whether GPUs are correctly mapped with "--gpu-platform 1", and if not, whether manually mapping them in CGWatcher corrects this. I'm not sure if the --no-adl option played a part in this... I want to say it didn't, but I would need to test it some more, and I'm getting ready to leave at the moment.

CGWatcher gets all mining data from cgminer (including GPU status) - it sends a command and waits for a response, then processes the response data. If there were problems that prevented a response, it would consider this as API access loss and react accordingly (although it may require several consecutive failures before taking action to make sure there is in fact a problem.) With 1.1.5.0, it also checks the status of the cgminer process, and if it is not responding for three consecutive checks it kills and restarts it.
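For anyone curious what "send a command, wait for a response, act only after several consecutive failures" might look like, here is a minimal Python sketch. It assumes cgminer's JSON API over TCP as used by cgminer's own api-example.py: send {"command": ...}, read until cgminer closes the connection, and strip the trailing NUL byte. The names query_cgminer, ApiWatchdog and poll_once are hypothetical, and the threshold of 3 just mirrors the three consecutive checks mentioned above; this is an illustration, not CGWatcher's actual code.

```python
import json
import socket


def query_cgminer(command, host="127.0.0.1", port=4028, timeout=5.0):
    """Send one JSON API command to cgminer and return the decoded reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"command": command}).encode("ascii"))
        chunks = []
        while True:  # cgminer closes the connection after replying
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # The reply ends with a NUL byte; strip it before parsing the JSON.
    raw = b"".join(chunks).rstrip(b"\x00").decode("utf-8", errors="replace")
    return json.loads(raw)


class ApiWatchdog:
    """Treat API access as lost only after several consecutive failures."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    def record(self, ok):
        """Record one poll result; return True when action should be taken."""
        if ok:
            self.failures = 0  # any good reply resets the count
            return False
        self.failures += 1
        return self.failures >= self.max_failures


def poll_once(watchdog, host="127.0.0.1", port=4028):
    """Poll cgminer once; refused connections, timeouts and bad JSON count as failures."""
    try:
        query_cgminer("summary", host=host, port=port)
        ok = True
    except (OSError, ValueError):
        ok = False
    return watchdog.record(ok)
```

With two instances on --api-port 4029 and --api-port 4030, you would keep one ApiWatchdog per port and call poll_once for each, so a hang in one instance doesn't count against the other.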

As for your increase in sick or dead GPUs, I can't explain that, but I can say that CGWatcher should have no effect on it. While CGWatcher reads ADL data, it doesn't actually change anything, so I don't think that would cause issues (and it hasn't during testing). If it did, any hardware monitoring software like HWINFO, GPU-Z, AIDA64, etc. would also cause problems with cgminer. The same goes for OpenCL, and CGWatcher only polls OpenCL devices once on startup. When you change a GPU setting in CGWatcher, it just tells cgminer to change the setting, so it isn't actually changing anything itself.

And as far as the cgminer logging options... I've never used any of those so I'm not sure why they don't work. When it comes to these types of options, I chose to include them in the Config File Editor because even though I don't use them, other people may and they may work in other versions of cgminer. If it doesn't work even outside of CGWatcher then it's not a CGWatcher problem.  Grin
newbie
Activity: 28
Merit: 0
newbie
Activity: 29
Merit: 0
I've disabled --gpu-platform 1 and it seems to work fine now. I can't tell when or where the problem started, since I've been using it through almost all versions of cgminer ever since I first had the problem with the CPU showing up. Only GPUs are visible now.

I've moved all my settings into .conf as it's much easier to manage. Why didn't I do this in day one... xD

I thought about logging any info I can get, just in case, and it seems cgminer (with or without cgwatcher) will instantly crash if any of these are used:

"debug" : true,
"per-device-stats" : true,
"verbose" : true,

Weird. Ehh, no biggie.

Lastly, I was thinking about that time a few posts above when my miner was running with one working GPU and one dead one for ~12 hours and CGWatcher didn't reboot the system (or log anything), which got me thinking. Does CGWatcher read everything directly from cgminer? I'm not a programmer whatsoever, but wouldn't this cause issues in unforeseen circumstances where it just waits for an answer from cgminer and cgminer isn't responding, or behaves in some other unpredictable way? Perhaps it could copy those values from cgminer to RAM and then read everything only from RAM, to provide some redundancy? That way, no matter what happens, as long as the watcher is alive and responsive, reboots should always execute. Just a stray thought, though. Tongue
sr. member
Activity: 282
Merit: 250
Is this because of the different cryptos, or do you suspect bad configuration?
Different crypto.
full member
Activity: 213
Merit: 100
Question about my hash rate.

I have cgminer connected to an LTC pool and to BitMinter for BTC.

BitMinter gets 26 MH/s, while cgminer is at about 35 kH/s.

Is this because of the different cryptos, or do you suspect bad configuration?
sr. member
Activity: 434
Merit: 251
CGWatcher & CGRemote
Interesting that --gpu-fan works with --no-adl. Maybe I don't understand it correctly, or maybe it still uses adl just for the fan. Or maybe it uses something else to control the fan. If you find that you have to go back to --no-adl or --gpu-platform 1, let me know if there are any issues with CGWatcher and I'll try to get them worked out. I figured there would be some issues with GPU mapping but I wouldn't know for sure until it was used on many different systems and configurations. It worked without incident on the 5 or so test PCs.

I've read that each vendor's OpenCL implementation will result in its own OpenCL platform, so the CPU being on a separate platform makes sense... but it is not something I encountered on any of the systems I tested on... even if they used Intel CPUs and AMD GPUs. None of the CPUs were Ivy Bridge though (all were either AMD or pre-Sandy Bridge Intel.)

And for the last question, the miner will use both a config file and arguments at the same time if it is given both. If the same option is specified in both with different values, it appears to give priority to the config file. The only exception is when using CGWatcher, --api-port in arguments will override api-port in the config file because I think it makes more sense that way and I wanted to make it easier to manage multiple instances running (each requires a unique port). Changing arguments seems easier than editing a config file (although I think CGWatcher makes them both pretty easy.) I guess that's just based on how I personally use them, but I initially expected arguments to override the config file. I suppose I could have it check for duplicates, but at some point I have to stop trying to handle every situation and give the user a little responsibility.  Wink
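To make the precedence rules above concrete, here is a small Python sketch that treats both sources as dicts keyed by option name (without the leading "--"). The function name effective_options is hypothetical, and "the config file wins" reflects the behaviour as observed here, not anything guaranteed by cgminer; only the api-port exception is CGWatcher's stated rule.

```python
def effective_options(config, args):
    """Merge config-file and command-line options into one dict.

    Duplicates: the config-file value wins (as observed with cgminer),
    except api-port, where the command-line value wins (CGWatcher's rule).
    """
    merged = dict(args)
    merged.update(config)  # config-file values override argument values
    if "api-port" in args:
        merged["api-port"] = args["api-port"]  # argument wins for api-port
    return merged


config = {"intensity": "13", "api-port": "4028"}
args = {"intensity": "11", "api-port": "4029", "gpu-fan": "100"}
print(effective_options(config, args))
# {'intensity': '13', 'api-port': '4029', 'gpu-fan': '100'}
```

This is also why a typo in one source can silently lose to the other: a conflicting duplicate isn't flagged, just resolved by priority.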
newbie
Activity: 29
Merit: 0
Thanks for the clarifications.

Yeah, --gpu-fan works with the --no-adl option. Even though I set the fans to 100% via Afterburner on boot, sometimes after a cgminer instance is closed (or every time?) they drop back to Auto for some reason, so I figured why not; the setting seems to stick well this way. I used --no-adl because cgminer can't adjust my voltages or read VRM temps (which I was more concerned about than core temps), and I think it *may* have caused some issues with either screen locking or LogMeIn, though I can't recall the specific details. I've updated to the new version and removed --no-adl for now to see how it goes; at first glance, the GPUs are mapped correctly this way.

I think the CPU used to pop up on my platform 0 (dual-core Ivy)... I'll remove the setting one of these days and see what happens; I'm too lazy to test this at the moment, 5:20AM. o_O

I've changed the temp thresholds in the config to suit my needs. Regarding this, if the profile manager has arguments/parameters in the first field and a cfg file linked in the second, will everything be merged into one command line that is fed to cgminer when it starts? What if there are duplicates that may or may not specify the same value(s), due to a typo or so, and thus conflict? Just curious how that's handled.

I'll let it dig for silver for now, go to sleep and see how things go.

Thanks again for all your input. Smiley
sr. member
Activity: 434
Merit: 251
CGWatcher & CGRemote
sydameton: I'm sorry to hear you're having problems. I'm not sure why you're using the no-adl option as that prevents cgminer from being able to get GPU temperature, clock speeds, voltage, fan speeds, etc. Does the "--gpu-fan 100" work with ADL disabled? I think the only time it may be necessary is if you need the miner to continue running while switching users... but I would think it is preferable to keep ADL enabled and just stop the miner, switch users, then restart it. I see you also have "gpu-platform 1" set. Out of curiosity, do you know which devices are on OpenCL's platform 0? In my testing I've never seen OpenCL report more than one platform, although I was only able to test on five systems.

I found two bugs that would cause the problems shown in the log. I've fixed those and added two "Device Reports" buttons to the Tests tab that may help find bugs if the problem persists. "Detected" will list all of the GPUs (and CPUs) detected by CGWatcher. "OpenCL" will list all GPUs (and CPUs) reported by OpenCL, or in the case the miner reports a GPU that for some reason OpenCL did not, it gets added to the OpenCL GPU list. I did not find any reason the log would not have entries for those hours other than nothing occurred during that time that created a log event.

Even if these fixes work in correcting that problem, it still may be necessary to manually map cgminer's GPUs with CGWatcher's GPUs. Unfortunately the libraries involved do not provide a way to do this with 100% accuracy, which is why cgminer has a gpu-map option. CGWatcher will attempt to copy the gpu-map option (if used), but because CGWatcher may detect GPUs that cgminer does not, you may have to tell it which GPU is which as far as mining is concerned. You can map GPUs by clicking the 'GPU Map' option in the Settings tab, or the 'Incorrect hashrates for this GPU?' link in the GPU tab. I've tried to provide as much relevant information as I could for mapping them correctly, but as I said the libraries do not provide any common properties.

You will need to download 1.1.5.0 again to test if it is working any better. If you still have problems after updating, please send both the "Detected" and "OpenCL" lists reported by clicking each button and copying the results. Any relevant log entries (including debug entries) and the text from the Report tab will also be helpful. It may be better to email me this info if it gets too long.


In case anyone is curious as to what GPU mapping is and why it is a problem sometimes (even for cgminer)... this is how I understand it, although admittedly I'm still learning this myself:

OpenCL is the language cgminer uses to actually perform the hashing on your GPU. OpenCL reports all OpenCL-supported devices (including CPUs and GPUs.) For each of these devices, it provides some information like codename (e.g. "Tahiti"), vendor, and a bunch of OpenCL-related information like OpenCL version, OpenCL device ID, hardware info, etc.

ADL is a library provided by AMD to get model/series name, temperature, clock speeds, voltage, fan speeds, etc. for AMD GPUs.

The problem is that the two don't provide any information that can be used to map an OpenCL GPU to an ADL GPU with 100% accuracy. This is why cgminer has a gpu-map option. Normally it tries to match in the order they are returned: OpenCL GPU 0 is mapped to ADL GPU 0. But this isn't always correct, especially if you get into some more unusual hardware configurations that miners may have.

So now that CGWatcher uses ADL on its own, it also has to try to correctly map the GPUs it detects to the GPU mapping in cgminer. It cannot do this as easily as you'd think, because cgminer doesn't return ADL or OpenCL information through its API that would be useful in mapping, especially if you have two or more of the same model card. So it looks at the gpu-map option and tries to copy it, but again it is not always guaranteed to map perfectly, so I tried to allow the ability to manually map a CGWatcher GPU to a cgminer GPU. Once you do this, it is remembered, so you won't have to do it again unless you add/move/remove cards.
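A hedged Python sketch of the default in-order mapping and a gpu-map override, assuming the paired comma-separated "opencl:adl" format that cgminer's help text describes for --gpu-map (so "0:1,1:0" would swap two cards, as in the two-7950 case in this thread). build_gpu_map is a hypothetical helper, and rejecting a map that sends two OpenCL GPUs to the same ADL GPU is a choice made for this sketch, not documented cgminer behaviour.

```python
def build_gpu_map(num_gpus, gpu_map=None):
    """Return {opencl_index: adl_index} for num_gpus GPUs.

    Without a gpu-map, GPUs are matched in order (OpenCL 0 -> ADL 0, ...).
    A gpu-map string of "opencl:adl" pairs overrides individual entries.
    """
    mapping = {i: i for i in range(num_gpus)}
    if gpu_map:
        for pair in gpu_map.split(","):
            opencl_str, adl_str = pair.strip().split(":")
            opencl_idx, adl_idx = int(opencl_str), int(adl_str)
            if not (0 <= opencl_idx < num_gpus and 0 <= adl_idx < num_gpus):
                raise ValueError(f"gpu-map entry out of range: {pair!r}")
            mapping[opencl_idx] = adl_idx
    # Sketch-specific check: each ADL GPU may be used by only one OpenCL GPU.
    if len(set(mapping.values())) != num_gpus:
        raise ValueError("gpu-map assigns two OpenCL GPUs to the same ADL GPU")
    return mapping


print(build_gpu_map(2))             # {0: 0, 1: 1}
print(build_gpu_map(2, "0:1,1:0"))  # {0: 1, 1: 0}
```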

To add to this, cgminer has the gpu-reorder option, which, when enabled, causes cgminer to list GPUs sorted by their bus, so that is another thing CGWatcher has to watch for and try to copy. This gets even more complicated if you have something like an integrated Nvidia GPU and a discrete AMD card (which was the case on one system I tested on), because cgminer may only detect the OpenCL-supported AMD GPU, while CGWatcher will detect both: CGWatcher also uses NVAPI, so it finds the Nvidia GPU even if that GPU does not support OpenCL.
(Along with trying to explain why this is sometimes necessary, I'm hoping that if someone more knowledgeable sees that I am mistaken here, they will correct me.)
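The effect of gpu-reorder and of a non-OpenCL GPU on the numbering can be sketched in Python as follows. DetectedGpu and watcher_to_cgminer are hypothetical, and the model is deliberately simplified: it assumes cgminer mines only on OpenCL-capable GPUs, in detection order unless gpu-reorder sorts them by PCI bus, while CGWatcher's list contains every GPU it detected.

```python
from dataclasses import dataclass


@dataclass
class DetectedGpu:
    name: str
    pci_bus: int
    opencl: bool  # True if the GPU is exposed to cgminer through OpenCL


def watcher_to_cgminer(detected, gpu_reorder=False):
    """Map each detected GPU's index to cgminer's GPU index, or None if not mined on."""
    mining = [i for i, gpu in enumerate(detected) if gpu.opencl]
    if gpu_reorder:
        mining.sort(key=lambda i: detected[i].pci_bus)  # list sorted by bus
    mapping = {i: None for i in range(len(detected))}
    for cg_index, watcher_index in enumerate(mining):
        mapping[watcher_index] = cg_index
    return mapping


gpus = [
    DetectedGpu("Radeon HD 7950", pci_bus=3, opencl=True),
    DetectedGpu("integrated Nvidia GPU", pci_bus=0, opencl=False),
    DetectedGpu("Radeon HD 7950", pci_bus=1, opencl=True),
]
print(watcher_to_cgminer(gpus))                    # {0: 0, 1: None, 2: 1}
print(watcher_to_cgminer(gpus, gpu_reorder=True))  # {0: 1, 1: None, 2: 0}
```

Note how the two identical HD 7950s swap cgminer indices when gpu-reorder is toggled, with nothing in the names to tell them apart, which is the situation manual mapping is meant to handle.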


More information on how overheat protection works:

With CGWatcher's overheat protection in 1.1.5.0, it looks at all of the GPUs that are mining and if cgminer is capable of getting their temperatures, it leaves them alone. In your case, using the 'no-adl' option means cgminer is not capable of providing any type of overheat protection (as far as I know). So CGWatcher attempts to provide it since it does have the temperatures... but as of right now it is pretty basic and limited to adjusting intensity or disabling the GPU.

If you want to change the temperatures used in overheat protection (temp-target, temp-overheat, and temp-cutoff), you can do so by setting them in the config file or as arguments - CGWatcher uses the same temperature settings as cgminer. By default (meaning these values are used if you don't explicitly set them), the temperatures for these options are 75, 85, and 95 respectively. So if you find you don't want CGWatcher changing intensity until temps reach 92, you would need to change temp-overheat to 92. Temp-cutoff is the temperature at which cgminer or CGWatcher will disable the GPU. I'm not sure how cgminer handles this, but CGWatcher will re-enable the GPU once temperatures have returned to temp-target for 30 seconds or so. Also, you may find the overheat protection changing intensity quite often at first, but it should ultimately level off between two intensities as it tries to find the highest intensity it can while staying under temp-overheat.
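A simplified Python sketch of the overheat logic described above, meant for GPUs whose temperatures cgminer cannot read (CGWatcher leaves the others to cgminer). The defaults 75/85/95 and the ~30 second re-enable delay come from this post; the one-step intensity changes, the intensity bounds 8 to 13, and raising intensity whenever the GPU is below temp-overheat are assumptions, chosen so that it settles between two intensities as described. OverheatPolicy, GpuState and step are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OverheatPolicy:
    temp_target: float = 75.0    # cgminer defaults, per the post
    temp_overheat: float = 85.0
    temp_cutoff: float = 95.0
    reenable_after: float = 30.0  # seconds at or below temp_target
    min_intensity: int = 8       # hypothetical bounds
    max_intensity: int = 13


@dataclass
class GpuState:
    intensity: int
    enabled: bool = True
    cool_since: Optional[float] = None  # when a disabled GPU first cooled to target


def step(policy, gpu, temp, now):
    """Apply one monitor tick to one GPU and return the action taken."""
    if not gpu.enabled:
        if temp > policy.temp_target:
            gpu.cool_since = None  # not cool enough yet, restart the timer
            return "none"
        if gpu.cool_since is None:
            gpu.cool_since = now
        if now - gpu.cool_since >= policy.reenable_after:
            gpu.enabled = True
            gpu.cool_since = None
            return "enable"
        return "none"
    if temp >= policy.temp_cutoff:
        gpu.enabled = False
        return "disable"
    if temp > policy.temp_overheat and gpu.intensity > policy.min_intensity:
        gpu.intensity -= 1
        return "lower"
    if temp < policy.temp_overheat and gpu.intensity < policy.max_intensity:
        gpu.intensity += 1  # probe upward while under temp-overheat
        return "raise"
    return "none"
```

Because it raises below temp-overheat and lowers above it, a GPU that runs hot at one intensity and cool at the next will alternate between those two, which matches the "levels off between two intensities" behaviour described above.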

If you want to disable CGWatcher's overheat protection completely, you can change EnableOverheatProtection=True to EnableOverheatProtection=False in CGWatcher's INI file.
newbie
Activity: 10
Merit: 0
Hi, I'm using CGW with CGMiner and it works flawlessly.
GOD bless you, milone!!!
newbie
Activity: 29
Merit: 0
I found a bug, at least in my particular case. I use two 7950s. cgminer sorts them as they are physically installed in the system, as does HWINFO, but CGWatcher sees gpu0 as gpu1 and gpu1 as gpu0. Because of this, and because my lower card runs much hotter than the upper one, it tries to lower the intensity of what it thinks is gpu0, which slows down the actual first card and, of course, doesn't help with the temperature (on rare occasions my bottom card's core touches 91 °C or so, only when mining LTC). Also, I'm using the --no-adl argument, if that helps. Smiley

L.E. Got more info, putting it together now.

last 40 or so hours' log: http://pastebin.com/7JCNbsKe and latest screenshot: http://imageshack.us/photo/my-images/849/screenshotjfv.jpg/

For some reason, CGWatcher also stopped logging for the past 12 hours (notice 7AM in the log compared to 7PM in the tray), although it was otherwise responsive to menus. I'm guessing this is also why the restart-computer-when-sick/dead option didn't kick in, even though it was set.

Furthermore, while mining, the GPUs in the Devices tab appear as "INACTIVE", even though their values are reported and updated accordingly (I rechecked this after a system reboot). I haven't tried clicking ENABLE while cgminer is running, though. I don't understand why I've been getting SICK/DEAD GPUs lately; the parameters haven't been touched for a month, and it had been running 24/7 without issues since then. I think these problems started surfacing after 1.1.4, but I don't know if that's related; it's just an observation.

Meh. Undecided
newbie
Activity: 29
Merit: 0
The entire community's grateful for your work on cgw so far -- especially with v1.1.5; compared to not having it at all, lack of cgremote is meh. All in due time. Smiley
sr. member
Activity: 434
Merit: 251
CGWatcher & CGRemote
Sorry, I took on another project that will last 6-8 weeks, so I will not be able to work on it during that time. My plan was to release 1.1.5 fixing any known problems before starting the new project, and it seems to be working pretty well (I'll still try to fix bugs if they are reported.) So once this project is finished I will be working primarily on CGRemote, but unfortunately my best estimate right now is 2.5 to 3 months.
sr. member
Activity: 282
Merit: 250
Can we get an ETA for CGRemote, please? Cheers.
member
Activity: 112
Merit: 10
Feature request:
An option to notify (by email) when a block is found/accepted.
This could be useful for solo miners.
member
Activity: 87
Merit: 10
I check every other day.  Smiley  This is the greatest piece of software for mining since cgminer.  I would consider cgminer incomplete without cgwatcher.  Let's pile some BTC on the effort, folks--kindly consider donations.