
Topic: OFFICIAL CGMINER mining software thread for linux/win/osx/mips/arm/r-pi 4.11.0 - page 614. (Read 5805874 times)

legendary
Activity: 4634
Merit: 1851
Linux since 1997 RedHat 4
poclbm should give you the best performance on a 7970 if using 12.2 and 2.6 SDK
If you are using something else - change to 12.2/2.6 :)
(Installing 12.2 includes the 2.6 SDK)
To select the poclbm kernel you add: -k poclbm
full member
Activity: 210
Merit: 100
Hi, I'm sure I am doing something really stupid but I can't get my 3x 7970s over 200 MHash/S each.
Running 12.2 driver and the newest version of cgminer.
Is there anything extra I need to do for 7970? I used cgminer for all my other cards. Thanks.
Wow, with that much information I can only suggest that you change ANYTHING, starting with cgminer kernel configuration and ending with the whole OS.

EDIT: In case you missed it, a lot of GCN-related info can be found here and here.
hero member
Activity: 630
Merit: 500
Alright, my previous attempt at giving info on new drivers in this thread was squashed...let's try again. ;)

AMD Catalyst 12.4 OpenCL 1.2 (8.960.0 March 15) AMD Official BETA

http://forums.guru3d.com/showthread.php?t=360362
http://developer.amd.com/Downloads/OpenCL1.2betadriversWindows.exe

Edit:  amdocl(64).dll is version 10.0.923.1
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
As a follow up... I decided to try a different pool aside from gpumax.  I've run stable with overclocks for about 5 hours now.  Seems like some problem with cgminer and gpumax, but only on my 5970 rigs.  I have the same version of cgminer running on the same linux version on my two 7970 rigs and it seems to work fine....  so it appears to be isolated to my dual gpu 5970 rigs...
Auto fan control was known to be broken for the 5970 in the last release of cgminer, and this was causing spontaneous restarts from unexpected overheats for people on 5970s. I don't think I have released a newer version with the fix. If you download the latest git tarball and build from that, you get the fix. Alternatively, disabling auto fan control should have the same effect.
legendary
Activity: 4634
Merit: 1851
Linux since 1997 RedHat 4
Am I supposed to put some switch before the number 4?

Code:
# echo -n "config" | nc -4 127.0.0.1 4028 ; echo
nc: invalid option -- '4'
nc -h for help

Well you can ignore the -4 option (remove it) - but I guess you must have a really old version of nc?
What does "uname -r" say on that computer? (and what OS version is it?)
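A minimal sketch of a query helper that copes with both old and new versions of nc, assuming the API listens on 127.0.0.1 port 4028 as in the command above. The function name api_query is made up for illustration. It tries -4 first and retries without it if nothing comes back, and it strips NUL bytes in case the reply carries a terminating one.

```shell
#!/bin/sh
# api_query COMMAND [HOST] [PORT]
# Hypothetical helper: send one API command to cgminer and print the reply.
api_query() {
    cmd=$1
    host=${2:-127.0.0.1}
    port=${3:-4028}
    # First try with -4 (force IPv4) and hide the "invalid option" complaint.
    reply=$(printf '%s' "$cmd" | nc -4 "$host" "$port" 2>/dev/null | tr -d '\000')
    # An empty reply means nc rejected -4 or could not connect: retry without it.
    if [ -z "$reply" ]; then
        reply=$(printf '%s' "$cmd" | nc "$host" "$port" | tr -d '\000')
    fi
    printf '%s\n' "$reply"
}

# Usage (needs a running cgminer started with --api-listen):
#   api_query config
```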
sr. member
Activity: 309
Merit: 250
Am I supposed to put some switch before the number 4?

Code:
# echo -n "config" | nc -4 127.0.0.1 4028 ; echo
nc: invalid option -- '4'
nc -h for help
legendary
Activity: 4634
Merit: 1851
Linux since 1997 RedHat 4
Then also the simple check about ADL being enabled - you do see GPU Temp/RPM as you said before?

If you don't have network API access enabled, but do have just --api-listen (like your command line shows) then on each machine:
Code:
echo -n "config" | nc -4 127.0.0.1 4028 ; echo

... and compare all 3 computers.

I'm actually only asking this coz it seems strange that the GPUs are running amok and it really does sound similar to ADL not working.
But of course you have already said it is working but ... well ... that command is another way to verify it.
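To make the comparison across the 3 computers less error-prone, the reply can be split into one field per line. This is only a sketch: api_field is a made-up helper, and the sample reply below is invented to illustrate the comma/pipe layout as I understand it, not captured from a real miner.

```shell
#!/bin/sh
# api_field NAME
# Hypothetical helper: read an API reply on stdin and print the value of NAME.
api_field() {
    # Turn both separators ('|' between sections, ',' between fields)
    # into newlines, then pick the first line whose key matches exactly.
    tr '|,' '\n\n' | awk -F= -v key="$1" '$1 == key { print $2; exit }'
}

# Invented sample reply, for illustration only.
sample='STATUS=S,Code=33,Msg=CGMiner config|CONFIG,GPU Count=6,ADL=Y,ADL in use=Y,Strategy=Failover|'

printf '%s' "$sample" | api_field 'ADL in use'

# Against a live miner (assumes --api-listen on the default port):
#   echo -n "config" | nc 127.0.0.1 4028 | api_field 'ADL in use'
```

Running the last line on each machine gives one short value per rig, which is easier to compare by eye than the full reply.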

I guess the other possibility is that when gpumax does let the cards go idle from a getwork perspective of having nothing to do, something in cgminer could be getting confused about the cards' status ... but that is just a guess since (as I've mentioned before) I've not looked closely at the internal Driver/ADL code that handles the card status/problems (and I got that idle bit wrong before as ckolivas pointed out).
If I get a chance today to add storing some of that info and making it available in the API I may know a bit more about it then :)
sr. member
Activity: 309
Merit: 250
As a follow up... I decided to try a different pool aside from gpumax.  I've run stable with overclocks for about 5 hours now.  Seems like some problem with cgminer and gpumax, but only on my 5970 rigs.  I have the same version of cgminer running on the same linux version on my two 7970 rigs and it seems to work fine....  so it appears to be isolated to my dual gpu 5970 rigs...
sr. member
Activity: 309
Merit: 250
I ran at stock gpu clocks for 14 minutes and had restarts on half (3) of my gpus during this short time, increasing ram used from 175 meg to 260 meg.

Before I stopped it, I checked GPU status and only GPU 5 had its "Last Initialized" time updated.... From this screen, all other gpus appeared to never have restarted.  However, going through the log, I can see that gpu 0, 2, and 5 restarted.   The entire log file is about 1 meg... 100k zipped.  I can email it or post it somewhere if you want to look at the whole thing.

Code:
root@skynet:~# cat run.20120321105206.19993.log | grep idle
[2012-03-21 10:55:22] Device 0 idle for more than 60 seconds, GPU 0 declared SICK!
[2012-03-21 10:56:52] Device 2 idle for more than 60 seconds, GPU 2 declared SICK!
[2012-03-21 10:57:11] Device 0 idle for more than 60 seconds, GPU 0 declared SICK!
[2012-03-21 11:03:52] Device 5 idle for more than 60 seconds, GPU 5 declared SICK!
root@skynet:~# cat run.20120321105206.19993.log | grep restart
[2012-03-21 10:55:22] Attempting to restart GPU
[2012-03-21 10:55:23] Thread 0 restarted
[2012-03-21 10:55:24] Thread 1 restarted
[2012-03-21 10:56:52] Attempting to restart GPU
[2012-03-21 10:56:53] Thread 4 restarted
[2012-03-21 10:56:54] Thread 5 restarted
[2012-03-21 10:57:11] Attempting to restart GPU
[2012-03-21 10:57:12] Thread 0 restarted
[2012-03-21 10:57:13] Thread 1 restarted
[2012-03-21 11:03:52] Attempting to restart GPU
[2012-03-21 11:03:53] Thread 10 restarted
[2012-03-21 11:03:54] Thread 11 restarted

GPU 0: 331.0 / 283.9 Mh/s | A:54  R:0  HW:0  U:3.90/m  I:8
73.5 C  F: 60% (3630 RPM)  E: 725 MHz  M: 240 Mhz  V: 1.050V  A: 99% P: 0%
Last initialised: [2012-03-21 10:52:09]
Intensity: 8
Thread 0: 165.6 Mh/s Enabled ALIVE
Thread 1: 165.0 Mh/s Enabled ALIVE

GPU 1: 329.8 / 335.7 Mh/s | A:62  R:0  HW:0  U:4.48/m  I:8
71.5 C  F: 60% (3630 RPM)  E: 725 MHz  M: 240 Mhz  V: 1.050V  A: 99% P: 0%
Last initialised: [2012-03-21 10:52:12]
Intensity: 8
Thread 2: 164.4 Mh/s Enabled ALIVE
Thread 3: 165.3 Mh/s Enabled ALIVE

GPU 2: 328.8 / 307.9 Mh/s | A:60  R:0  HW:0  U:4.34/m  I:8
73.0 C  F: 54% (3329 RPM)  E: 725 MHz  M: 240 Mhz  V: 1.050V  A: 99% P: 0%
Last initialised: [2012-03-21 10:52:14]
Intensity: 8
Thread 4: 163.9 Mh/s Enabled ALIVE
Thread 5: 165.3 Mh/s Enabled ALIVE

GPU 3: 329.7 / 334.3 Mh/s | A:60  R:0  HW:0  U:4.34/m  I:8
74.0 C  F: 54% (3326 RPM)  E: 725 MHz  M: 240 Mhz  V: 1.050V  A: 99% P: 0%
Last initialised: [2012-03-21 10:52:16]
Intensity: 8
Thread 6: 164.9 Mh/s Enabled ALIVE
Thread 7: 164.9 Mh/s Enabled ALIVE

GPU 4: 335.3 / 336.7 Mh/s | A:61  R:0  HW:0  U:4.41/m  I:8
73.0 C  F: 79% (4378 RPM)  E: 735 MHz  M: 240 Mhz  V: 1.050V  A: 99% P: 0%
Last initialised: [2012-03-21 10:52:19]
Intensity: 8
Thread 8: 168.0 Mh/s Enabled ALIVE
Thread 9: 165.6 Mh/s Enabled ALIVE

GPU 5: 334.1 / 309.8 Mh/s | A:61  R:0  HW:0  U:4.41/m  I:8
74.5 C  F: 79% (4378 RPM)  E: 735 MHz  M: 240 Mhz  V: 1.050V  A: 99% P: 0%
Last initialised: [2012-03-21 11:03:54]
Intensity: 8
Thread 10: 177.9 Mh/s Enabled ALIVE
Thread 11: 158.3 Mh/s Enabled ALIVE
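Since the status screen only shows the last initialisation time, counting the SICK lines per GPU in the log gives a quicker overview of which cards restarted and how often. A small sketch, assuming the log line format shown above; count_sick is a made-up name.

```shell
#!/bin/sh
# count_sick [LOGFILE...]
# Hypothetical helper: count "declared SICK" lines per GPU.
# Reads the named files, or stdin when none are given.
count_sick() {
    awk '/declared SICK/ {
        # The GPU number is the word right after "GPU".
        for (i = 1; i < NF; i++)
            if ($i == "GPU") { n[$(i + 1)]++; break }
    }
    END { for (g in n) print "GPU " g ": " n[g] }' "$@" | sort -k2,2n
}

# Usage:
#   count_sick run.20120321105206.19993.log
```

On the four SICK lines in the log above this reports GPU 0 twice and GPUs 2 and 5 once each, which matches the restarts seen in the thread messages.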


legendary
Activity: 4634
Merit: 1851
Linux since 1997 RedHat 4

Obviously, something is wrong since 2> and 2>> are the official ways to log cgminer activity.
Do you know what the mistake is?
You're galloping headfirst, launching screen and cgminer in a single step. Care to take a guess which of the two subroutines you're logging the output of?
Gotta slow down a bit there, cowboy :D


Thanks Jake!  Surprisingly, that started logging :P

That is in a file for a startup script...  How do I start cgminer attached to screen x in two lines?

My cgminer.sh looks like this:
Code:
#!/bin/sh
#
now="`date +%Y%m%d%H%M%S`"
#
./cgminer-231j -S /dev/ttyUSB0 -S /dev/ttyUSB1 -Q 4 --api-port 4028 --api-listen --api-allow W:127.0.0.1,W:192.168.7.0/24 --api-description Subaru -I 9 --submit-stale --auto-fan --auto-gpu --gpu-engine 900 --gpu-memclock 775 --gpu-memdiff -125 --temp-target 70 "$@" 2> run.$now.$$.log
Just in case you wanted a hint :)

You must of course also
Code:
chmod +x cgminer.sh

The "$@" means I can add more arguments - like pool configurations files e.g. "./cgminer.sh -c pool.json"
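A tiny illustration of why the quotes around $@ matter: each extra argument is passed through as its own word, even if it contains spaces. The function names are made up, and show_args merely stands in for cgminer.

```shell
#!/bin/sh
# Stand-in for cgminer: print every argument it receives on its own line.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Stand-in for cgminer.sh: fixed options first, then whatever the caller adds.
wrapper() {
    show_args --api-listen "$@"
}

wrapper -c "my pool.json"
```

Here the file name with a space arrives as one argument. With an unquoted $@ it would be split in two.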

Edit: so in case it wasn't obvious :)
Code:
/usr/bin/screen -dmS cgminer /opt/miners/cgminer/cgminer.sh
and of course cgminer.sh would have all your normal options before the "$@"
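For those who would rather not keep a separate script, the same effect can probably be had by letting an inner shell do the redirect, so that 2> applies to cgminer and not to screen. This is a sketch under assumptions: start_in_screen, CGMINER and LOGDIR are names invented here, with defaults taken from the paths used in this thread.

```shell
#!/bin/sh
# Defaults follow the paths seen in the thread; override as needed.
CGMINER=${CGMINER:-/opt/miners/cgminer/cgminer}
LOGDIR=${LOGDIR:-/root}

# start_in_screen [CGMINER OPTIONS...]
# Hypothetical helper: run cgminer in a detached screen session named
# "cgminer", with stderr redirected by the inner sh, not by the caller.
start_in_screen() {
    # The single quotes keep $(date ...) and $$ from being expanded by the
    # outer shell. exec replaces the inner sh, so $$ is also cgminer's PID.
    screen -dmS cgminer sh -c '
        bin=$1; dir=$2; shift 2
        exec "$bin" "$@" 2> "$dir/run.$(date +%Y%m%d%H%M%S).$$.log"
    ' sh "$CGMINER" "$LOGDIR" "$@"
}

# Usage:
#   start_in_screen -D --verbose --api-listen --auto-fan --temp-target 75 -I 8
```

The wrapper script is still the simpler option to maintain. This variant just shows where the redirect has to live.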
sr. member
Activity: 309
Merit: 250

Obviously, something is wrong since 2> and 2>> are the official ways to log cgminer activity.
Do you know what the mistake is?
You're galloping headfirst, launching screen and cgminer in a single step. Care to take a guess which of the two subroutines you're logging the output of?
Gotta slow down a bit there, cowboy :D


Thanks Jake!  Surprisingly, that started logging :P

That is in a file for a startup script...  How do I start cgminer attached to screen x in two lines?
full member
Activity: 210
Merit: 100
...
Code:
/usr/bin/screen -dmS cgminer /opt/miners/cgminer/cgminer -D --verbose -Q 2 --api-listen --auto-fan --temp-target 75 -I 8 -o http://x http://x 2> "/root/run.`date +%Y%m%d%H%M%S`.$$.log"
...
The logfile was created, but is empty...

Quote
-rw-r--r-- 1 root root       0 Mar 20 16:20 run.20120320162057.3911.log
-rw-r--r-- 1 root root       0 Mar 20 16:33 run.20120320163321.3713.log
-rw-r--r-- 1 root root       0 Mar 20 23:04 run.20120320230402.8150.log

Obviously, something is wrong since 2> and 2>> are the official ways to log cgminer activity.
Do you know what the mistake is?
You're galloping headfirst, launching screen and cgminer in a single step. Care to take a guess which of the two subroutines you're logging the output of?
Gotta slow down a bit there, cowboy :D
sr. member
Activity: 392
Merit: 250
Is it efficient to use balance or rotation mode? I would like to get text alerts from a pool but I don't like putting on my hashing power there. Balance seems like a good option if LP still works correctly. Does it matter how many pools are used?
sr. member
Activity: 309
Merit: 250
When it comes to the GPU Driver/ADL I'm not too helpful.

As I mentioned before I'd guess that turning on debug (-D) and verbose (--verbose) and logging the output might help shed some light

As I sorta suggested before I'll look into making a change to get the actual numbers of bad events recorded against each device and probably also the details of the last event for each device if that is possible.
Then make this available through the API with a new command (e.g. something like 'events')
This may be of help to more easily see problems like this (though it may not help resolve them)

Edit: Though - it will of course have to be on my git until ckolivas is back in action (and give me a couple of days also)
I turned on -D and --verbose using this string and running at stock:

Code:
/usr/bin/screen -dmS cgminer /opt/miners/cgminer/cgminer -D --verbose -Q 2 --api-listen --auto-fan --temp-target 75 -I 8 -o http://x http://x 2> "/root/run.`date +%Y%m%d%H%M%S`.$$.log"

cgminer started at 175 meg resident and 415 meg virtual RAM utilized.  After 4 hours, it's at 284 meg resident and 607 meg virtual...  The gpu5 threads were the only ones restarted according to the timestamp.

The logfile was created, but is empty...

Quote
-rw-r--r-- 1 root root       0 Mar 20 16:20 run.20120320162057.3911.log
-rw-r--r-- 1 root root       0 Mar 20 16:33 run.20120320163321.3713.log
-rw-r--r-- 1 root root       0 Mar 20 23:04 run.20120320230402.8150.log


Is a stderr message supposed to be generated when the thread is restarted?
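To put a number on the memory growth, the readings above can be turned into an average rate. A rough sketch: growth_per_hour and rss_kb are made-up helper names, and the rate is a plain average between two readings, nothing more.

```shell
#!/bin/sh
# growth_per_hour FIRST_MB SECOND_MB HOURS
# Average growth in MB per hour between two readings.
growth_per_hour() {
    awk -v a="$1" -v b="$2" -v h="$3" 'BEGIN { printf "%.2f\n", (b - a) / h }'
}

# rss_kb PID
# Resident set size of a process in kB, as reported by ps.
rss_kb() {
    ps -o rss= -p "$1" | tr -d ' '
}

growth_per_hour 175 284 4    # resident memory over the 4 hour run
growth_per_hour 415 607 4    # virtual memory over the same run

# To sample a live miner every 10 minutes (PID looked up by hand):
#   while sleep 600; do echo "$(date +%T) $(rss_kb 19993)"; done
```

For the 4 hour run this works out to 27.25 meg per hour resident and 48.00 meg per hour virtual.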
legendary
Activity: 4354
Merit: 3614
what is this "brake pedal" you speak of?
Well the API 10004 error is interesting ...

According to Microsoft it can't happen coz the function that causes it has been removed (and I don't call it anyway) :P

Though, it looks exactly like a network problem with the computer?

they are all on gigabit, hardwired. all can see each other fine, matter of fact my HTPC (with the 6770) has a lot of media on the 6870 computers drives as the HTPC is low on drive space. network (seems to anyway) run flawlessly. terracopy (which verifies transfers with MD5) copies multi gig files across the network perfectly all the time.

and, no transfers were in progress, so network saturation shouldnt be it. this latest crash was while we were at work, and previously its crashed in the middle of the night while nothing was using network except the miners. not that multigig transfers have ever bothered the miners before.

router and the 6770/6870 rigs are on UPSs too.

damn thing is the 5830 is on a POS computer, no UPS, longest ethernet run with the lowest quality cable. and it runs perfect. but with 2.2.7

go figure, eh?

switched the 6870 back to 2.2.7. left the 6770 on 2.3.1. Ill see what happens tomorrow
legendary
Activity: 4634
Merit: 1851
Linux since 1997 RedHat 4
been getting this error on 2 of my miners lately. they run about 6 hours every time too. been crashing like clockwork for a few days now

[regular log stuff, then suddenly this]
[2012-03-20 14:04:09] Failed to create submit_work_thread
[lots of statistics]
[2012-03-20 14:04:09] API failed (Socket Error: (10004) Interrupted system call) - API will not be available
[2012-03-20 14:04:09] longpoll failed for http://api.bitcoin.cz:8408, sleeping for 30s
and thats the end, they are then sitting at "press any key to continue . . ."

same exact error on both

they are both running 2.3.1, no special flags aside from autoclock and autofan settings.

my 6870 is win7 64 bit, 11.11 driver, 2.5 sdk, and the 6770 is vista 32, 12.1, 2.3 sdk. clean installs, I never reuse bins, and delete bins when upgrading drivers and/or sdks. cgminer is always installed fresh, never over itself.

my 5830 on 11.4, 2.1 and XP with cgminer 2.2.7 runs perfect. same pool settings as the 6870 and 6770, so if its the longpoll fail thats killing the 2.3.1 miners the 2.2.7 version is OK with whatever it is.. is longpoll handled differently between 2.2.7 and 2.3.1?

any ideas?
Well the API 10004 error is interesting ...
According to Microsoft it can't happen coz the function that causes it has been removed (and I don't call it anyway) :P

Though, it looks exactly like a network problem with the computer?
legendary
Activity: 4354
Merit: 3614
what is this "brake pedal" you speak of?
been getting this error on 2 of my miners lately. they run about 6 hours every time too. been crashing like clockwork for a few days now

[regular log stuff, then suddenly this]
[2012-03-20 14:04:09] Failed to create submit_work_thread
[lots of statistics]
[2012-03-20 14:04:09] API failed (Socket Error: (10004) Interrupted system call) - API will not be available
[2012-03-20 14:04:09] longpoll failed for http://api.bitcoin.cz:8408, sleeping for 30s
and thats the end, they are then sitting at "press any key to continue . . ."

same exact error on both

they are both running 2.3.1, no special flags aside from autoclock and autofan settings.

my 6870 is win7 64 bit, 11.11 driver, 2.5 sdk, and the 6770 is vista 32, 12.1, 2.3 sdk. clean installs, I never reuse bins, and delete bins when upgrading drivers and/or sdks. cgminer is always installed fresh, never over itself.

my 5830 on 11.4, 2.1 and XP with cgminer 2.2.7 runs perfect. same pool settings as the 6870 and 6770, so if its the longpoll fail thats killing the 2.3.1 miners the 2.2.7 version is OK with whatever it is.. is longpoll handled differently between 2.2.7 and 2.3.1?

any ideas?
sr. member
Activity: 309
Merit: 250
When it comes to the GPU Driver/ADL I'm not too helpful.

As I mentioned before I'd guess that turning on debug (-D) and verbose (--verbose) and logging the output might help shed some light

As I sorta suggested before I'll look into making a change to get the actual numbers of bad events recorded against each device and probably also the details of the last event for each device if that is possible.
Then make this available through the API with a new command (e.g. something like 'events')
This may be of help to more easily see problems like this (though it may not help resolve them)

Edit: Though - it will of course have to be on my git until ckolivas is back in action (and give me a couple of days also)

Thanks.  I'm starting it with the debugging going.  I planned to run it at stock... and i didnt throw overclock values in the start command, but somehow it is running with my old overclock settings.... these are the commands I used to start
Any idea how my old overclock values are getting set when I am not specifying them in the CLI?

EDIT:  Nevermind.... I had switched to phoenix while i was working out the issue with cgminer and had set the overclocks there.... rebooted and everything is running stock now.
sr. member
Activity: 337
Merit: 252
For perhaps two weeks I've had a problem with the aggregate cgminer stats when starting up. What happens is basically this: after 50-70 accepted shares, the total accepted shares ("A") counter stops incrementing. The "A" counter for each individual GPU keeps incrementing, but the uploaded/min ("U") figure just increases without limit. If I restart a few times it eventually works beyond 70 shares and after that it keeps working.

When it happens only the stats seem to be wrong, the actual hashing keeps working.

Anybody else seeing this problem?
legendary
Activity: 4634
Merit: 1851
Linux since 1997 RedHat 4
The GPUs are showing Temp/RPM as you can see - so yep that means ADL is working.
That's all I was thinking there: that maybe ADL wasn't able to get the Temp/RPM for some reason (even if the DISPLAY was set) so the cards were failing all the time.

There are no device failure statistics in cgminer other than HW, which is 0 in your case.
Send ckolivas some BTC and ask him to implement it (or if he's not expecting to be able to do that soon - send it to me :)
Then I could add a device status command in the API to return those numbers.

Edit: hmm the thread does have the timestamp it was last sick though ...

So... I still have this problem with threads being restarted and eating up memory.  Even when I run at stock speeds for all gpus.  The thread for GPU5 keeps getting restarted on both my 5970 rigs... i even tried underclocking gpu 5 on both rigs and it still occurred.  Phoenix seems to work fine.... should i try a different cgminer build?  I assume I'm the only one with this issue which makes it likely some configuration/hardware problem on my side, but I'm not sure how to isolate it.
When it comes to the GPU Driver/ADL I'm not too helpful.

As I mentioned before I'd guess that turning on debug (-D) and verbose (--verbose) and logging the output might help shed some light

As I sorta suggested before I'll look into making a change to get the actual numbers of bad events recorded against each device and probably also the details of the last event for each device if that is possible.
Then make this available through the API with a new command (e.g. something like 'events')
This may be of help to more easily see problems like this (though it may not help resolve them)

Edit: Though - it will of course have to be on my git until ckolivas is back in action (and give me a couple of days also)