
Topic: OFFICIAL CGMINER mining software thread for linux/win/osx/mips/arm/r-pi 4.11.0 - page 715. (Read 5805728 times)

hero member
Activity: 518
Merit: 500
When your miners indicate the pool is unavailable can you still remote into them (SSH)?

Yes. I'm pretty much always SSH-ed into them; the SSH connection has never broken.  One of the miners is also used as a fileserver.
AFAICT, there are no problems on my LAN. And the miners can access the internet just fine, even when that HTTP 503 happens in cgminer.
donator
Activity: 1218
Merit: 1079
Gerald Davis
Yay thank goodness.

By the way, error 503 is a server not responsive, too busy etc. error... though it is possible to generate this artificially from the miner's end by having DNS issues, router problems and so on. Disabling cached connections in 2.1.1 after failure seemed to achieve sweet FA unfortunately. So I'm now officially in the NFI position.

Networking and internet seem fine even when that happens, though it's quite possible a brief network quirk triggers it. Restarting routers does not help.  Next time it happens, I'll see if disconnecting and reconnecting ethernet does anything. Anything else worth testing? Like flushing DNS or whatever?

Likely unrelated, but the only networking issue I have seen is when my switch got too hot.  I've got 4 rigs in the garage running on their own switch.  I found that if room ambient temp got too hot, the switch (fanless) would go "crazy" and make the miners drop from the network.  I "solved" it by putting the switch closer to the ground (cooler) and putting a small desktop fan next to it.

When your miners indicate the pool is unavailable can you still remote into them (SSH)?
hero member
Activity: 518
Merit: 500
Yay thank goodness.

By the way, error 503 is a server not responsive, too busy etc. error... though it is possible to generate this artificially from the miner's end by having DNS issues, router problems and so on. Disabling cached connections in 2.1.1 after failure seemed to achieve sweet FA unfortunately. So I'm now officially in the NFI position.

Networking and internet seem fine even when that happens, though it's quite possible a brief network quirk triggers it. Restarting routers does not help.  Next time it happens, I'll see if disconnecting and reconnecting ethernet does anything. Anything else worth testing? Like flushing DNS or whatever?

As for DNS, not sure if it's worth mentioning, but I'm using Google's DNS servers.
legendary
Activity: 1022
Merit: 1000
BitMinter
Sent you one BTC to cheer you up :D
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Yay thank goodness.

By the way, error 503 is a server not responsive, too busy etc. error... though it is possible to generate this artificially from the miner's end by having DNS issues, router problems and so on. Disabling cached connections in 2.1.1 after failure seemed to achieve sweet FA unfortunately. So I'm now officially in the NFI position.
donator
Activity: 1218
Merit: 1079
Gerald Davis
Am I the only one without problems? ;) Win7 32, 2.1.1, 2 rigs, zero problems :P

I haven't had any problems either.
legendary
Activity: 1022
Merit: 1000
BitMinter
Am I the only one without problems? ;) Win7 32, 2.1.1, 2 rigs, zero problems :P
hero member
Activity: 518
Merit: 500
Got another "network bug". This time with 2.1.1 (on linux), while 2.0.8 (on windows) did not get it. I was expecting the opposite really. Usually both machines got it simultaneously and I assumed 2.1.1 fixed it. Apparently not.
 Here is the debug output:

Code:
[2012-01-03 12:31:48] json_rpc_call failed on get work, retry after 155 seconds
[2012-01-03 12:31:48] HTTP request failed: The requested URL returned error: 503
[2012-01-03 12:31:48] Failed json_rpc_call in get_upstream_work
[2012-01-03 12:31:48] json_rpc_call failed on get work, retry after 155 seconds
[2012-01-03 12:31:48] HTTP request failed: The requested URL returned error: 503
[2012-01-03 12:31:48] Failed json_rpc_call in get_upstream_work
[2012-01-03 12:31:48] json_rpc_call failed on get work, retry after 155 seconds
[2012-01-03 12:31:49] Queueing getwork request to work thread
[2012-01-03 12:31:49] Popping work from get queue to get work
[2012-01-03 12:31:49] Popping work to work thread


[2012-01-03 12:31:50] 19.5 C  F: 40%(-1RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:31:50] 28.0 C  F: 40%(1490RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:31:53] (5s):0.0 (avg):799.0 Mh/s | Q:14417  A:11676  R:2  HW:0  E:81%  U:10.85/m
[2012-01-03 12:31:53] 19.5 C  F: 40%(-1RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:31:53] 27.5 C  F: 40%(1493RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:31:54] HTTP request failed: The requested URL returned error: 503
[2012-01-03 12:31:54] Failed json_rpc_call in get_upstream_work
[2012-01-03 12:31:54] json_rpc_call failed on get work, retry after 155 seconds
[2012-01-03 12:31:54] HTTP request failed: The requested URL returned error: 503
[2012-01-03 12:31:54] Failed json_rpc_call in get_upstream_work
[2012-01-03 12:31:54] json_rpc_call failed on get work, retry after 155 seconds
[2012-01-03 12:31:55] HTTP request failed: The requested URL returned error: 503
[2012-01-03 12:31:56] 19.5 C  F: 40%(-1RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:31:56] 27.0 C  F: 40%(1492RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:31:56] HTTP request failed: The requested URL returned error: 503
[2012-01-03 12:31:56] Failed json_rpc_call in get_upstream_work
[2012-01-03 12:31:56] json_rpc_call failed on get work, retry after 155 seconds
[2012-01-03 12:31:59] (5s):0.0 (avg):798.9 Mh/s | Q:14417  A:11676  R:2  HW:0  E:81%  U:10.85/m
[2012-01-03 12:31:59] 19.5 C  F: 40%(-1RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:31:59] 27.0 C  F: 40%(1496RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%




[2012-01-03 12:32:02] 19.5 C  F: 40%(-1RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:32:02] 27.0 C  F: 40%(1496RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:32:03] HTTP request failed: The requested URL returned error: 503


[2012-01-03 12:32:04] HTTP request failed: The requested URL returned error: 503
[2012-01-03 12:32:05] (5s):0.0 (avg):798.8 Mh/s | Q:14417  A:11676  R:2  HW:0  E:81%  U:10.85/m
[2012-01-03 12:32:05] 19.5 C  F: 40%(-1RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:32:05] 27.0 C  F: 40%(1497RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%

Restarting cgminer fixed it. Both primary and backup pools were working properly AFAICT.
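For anyone wanting to see how often the 503s arrive in a debug log like the one above, here is a minimal sketch in Python. It is not part of cgminer; the function name and the per-minute bucketing are assumptions made for illustration, and the regular expression only covers the "HTTP request failed" line format shown in this post.

```python
import re
from collections import Counter

# Matches lines such as:
# [2012-01-03 12:31:48] HTTP request failed: The requested URL returned error: 503
ERROR_LINE = re.compile(
    r"^\[(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\] "
    r"HTTP request failed: .*error: (?P<code>\d{3})$"
)

def count_http_errors(lines):
    """Return a Counter keyed by (minute, HTTP status code)."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.match(line.strip())
        if match:
            counts[(match.group("minute"), int(match.group("code")))] += 1
    return counts

# A few lines copied from the log above.
sample = [
    "[2012-01-03 12:31:48] HTTP request failed: The requested URL returned error: 503",
    "[2012-01-03 12:31:48] Failed json_rpc_call in get_upstream_work",
    "[2012-01-03 12:31:54] HTTP request failed: The requested URL returned error: 503",
    "[2012-01-03 12:32:03] HTTP request failed: The requested URL returned error: 503",
]

for (minute, code), hits in sorted(count_http_errors(sample).items()):
    print(minute, code, hits)
```

Feeding it a whole log file (for example `count_http_errors(open("cgminer.log"))`, where the file name is hypothetical) would show whether the failures come in short bursts or persist until the restart.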
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
I cannot code for a 5970 or 6990 without poking and prodding them with code, and since I don't own one, it's unlikely to happen in a safe manner. If I just guess, I'll likely do something which could be bad...

Would it be beneficial if we get you a 6990?
That would most definitely come under the definition of rhetorical questions. Given 6990s cost more than any other card on the market, I think I know what the likelihood of that happening is, though.

But just to be clear since I haven't answered: of course it would...
sr. member
Activity: 392
Merit: 250
ckolivas: I know everyone is coming at you from a million directions, but I have a very strange problem I would love your opinion on, or anyone else's for that matter.

I have four rigs, I have three of them working fine with CGMINER

this last rig is very problematic, at first I thought it was something wrong with a single card, then a single type of card, but now I realize it's not the cards

all of my rigs are 890fxa-gd70s MB

this thing
if I have five cards in it - I start it mining with cg miner, within like 30-60 mins

on one of the cards the fan will show 0RPM and the temp will show 127.5c (it's ALWAYS 127.5c for some reason)
then the system will freeze up and windows will crash

if I put my hand on the card it does not feel hot at all and I can visibly see the fan spinning at a normal speed

if I move the cards around, or swap the cards out for ones I know work it does not matter

now if I take it down to having four cards on the motherboard instead of five

the system will not crash, and we do not see the 127.5c but eventually one of the cards will display an incorrect fan speed
right now it's a 5970, I see it in GPU2 - it's displaying 1RPM fan speed, but hashing along at a normal speed with the fan spinning at a normal rate and it does not feel overly hot.

what could be causing this? I am dumbfounded here. Any help would be greatly appreciated and will result in a 1 btc tip for the person that gives me the right answer. Thanks!

I would try managing the fans with msi afterburner and see if it still happens.  Has the pc ever had 11.12 catalyst installed on it? Does not really sound like the problem, but my 6 gpu system was not working until I removed some garbage files that 11.12 left behind before reinstalling 11.11.  https://bitcointalksearch.org/topic/m.655079
hero member
Activity: 896
Merit: 1000
Buy this account on March-2019. New Owner here!!
ckolivas: I know everyone is coming at you from a million directions, but I have a very strange problem I would love your opinion on, or anyone else's for that matter.

I have four rigs, I have three of them working fine with CGMINER

this last rig is very problematic, at first I thought it was something wrong with a single card, then a single type of card, but now I realize it's not the cards

all of my rigs are 890fxa-gd70s MB

this thing
if I have five cards in it - I start it mining with cg miner, within like 30-60 mins

on one of the cards the fan will show 0RPM and the temp will show 127.5c (it's ALWAYS 127.5c for some reason)
then the system will freeze up and windows will crash

if I put my hand on the card it does not feel hot at all and I can visibly see the fan spinning at a normal speed

if I move the cards around, or swap the cards out for ones I know work it does not matter

now if I take it down to having four cards on the motherboard instead of five

the system will not crash, and we do not see the 127.5c but eventually one of the cards will display an incorrect fan speed
right now it's a 5970, I see it in GPU2 - it's displaying 1RPM fan speed, but hashing along at a normal speed with the fan spinning at a normal rate and it does not feel overly hot.

what could be causing this? I am dumbfounded here. Any help would be greatly appreciated and will result in a 1 btc tip for the person that gives me the right answer. Thanks!
sr. member
Activity: 392
Merit: 250
I have a 6990 and 2x 6970s all set at 955 clock speed, but the 6970s each run about 20-30 hashes behind the 6990. I've tried reinstalling the video drivers. They all run within 1 hash of each other with guiminer, so I'm a little lost on what could cause this. Any ideas?
sr. member
Activity: 349
Merit: 250
Feature request...

Code:
 [P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
 GPU 0:  69.5C 4535RPM | 357.0/363.8Mh/s | A:285 R:0 HW:0 U:5.02/m I: 9
 GPU 1:  74.0C         | 366.4/363.9Mh/s | A:299 R:0 HW:0 U:5.26/m I: 9
 GPU 2:  67.5C 4108RPM | 372.9/363.8Mh/s | A:289 R:0 HW:0 U:5.09/m I: 9
 GPU 3:  62.5C         | 366.4/363.7Mh/s | A:262 R:0 HW:0 U:4.61/m I: 9
 GPU 4:  68.0C 3564RPM | 370.8/363.6Mh/s | A:294 R:0 HW:0 U:5.18/m I: 9
 GPU 5:  71.0C         | 340.5/363.6Mh/s | A:318 R:1 HW:0 U:5.60/m I: 9

These are three 5970s.  auto-fan is on with a target of 70C for all, 3C hysteresis.  At this snapshot GPUs 1 and 5 ran 3C-4.5C hotter than their card-mates, and GPU 3 ran 5C cooler than its mate.  I believe that because GPUs 1, 3, and 5 don't return fan values, cgminer is ignoring their temps w/r/t auto-fan.  Assuming that cgminer can't tell via ADL or otherwise that two GPUs share a fan, I would like to be able to tell that to cgminer and thus have my temp targets applied to (in my case) odd-numbered GPUs as well as to even-numbered ones.

I cannot code for a 5970 or 6990 without poking and prodding them with code, and since I don't own one, it's unlikely to happen in a safe manner. If I just guess, I'll likely do something which could be bad...

Would it be beneficial if we get you a 6990?
legendary
Activity: 1500
Merit: 1022
I advocate the Zeitgeist Movement & Venus Project.
Still causing the video driver to fail. I'll try a clean reinstall and see if that helps.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Over in mining hardware I just whined that I had an instance of a "SICK" GPU even after falling back to pretty vanilla settings of gpu-engine 725 (stock) and gpu-memclock 300 for my 5970s.

Is there any chance that SICK like the following is not a GPU hardware issue?

Code:
[2012-01-02 17:56:39] Thread 2 idle for more than 60 seconds, GPU 2 declared SICK!
[2012-01-02 17:56:39] Attempting to restart GPU
[2012-01-02 17:56:39] Thread 2 still exists, killing it off
[2012-01-02 17:56:39] Thread 8 still exists, killing it off
[2012-01-02 17:56:39] Thread 2 restarted
[2012-01-02 17:56:40] Thread 8 restarted
[2012-01-02 17:56:40] Accepted 00000000.30702585.cb8fdf73 GPU 5 thread 11 pool 0
[2012-01-02 17:56:41] Accepted 00000000.676a69c6.4b59b7db GPU 5 thread 5 pool 0
[2012-01-02 17:56:43] Accepted 00000000.1e5767ae.f669070b GPU 2 thread 2 pool 0  # note how healthy it is now!
Anything's possible, but note that the restart code was tested extensively on literally dozens of GPUs to get this sick restart code working -when possible- and the person who helped me test it had 72 GPUs that would often have boxes going down with any other miner. The idea was to make it recover to a fine state after enough rest if possible.

So yes it's possible. Maybe even likely, who knows, but this particular scenario was not unusual even at normal clocks when some GPUs were run flat out, regardless of which miner it was. Interestingly it became FAR more common with the phatk2 kernel (which is what is used in cgminer) since that seemed to run GPUs that little bit more than anything else.
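The idle check behind the "declared SICK" message in the quoted log can be pictured with a small Python sketch. This is an illustration only, not cgminer's actual code: the function name, the dictionary of last-activity timestamps, and the sample numbers are all hypothetical, and only the 60 second threshold comes from the log text.

```python
IDLE_LIMIT_SECONDS = 60  # threshold taken from the log message; everything else is assumed

def find_idle_threads(last_activity, now, limit=IDLE_LIMIT_SECONDS):
    """Return the ids of threads whose last activity is more than `limit` seconds old."""
    return sorted(tid for tid, seen in last_activity.items() if now - seen > limit)

# Hypothetical timestamps (seconds) of the last work each mining thread reported.
last_activity = {2: 100.0, 5: 158.0, 8: 95.0, 11: 159.5}

# Threads 2 and 8 have been silent for 61 s and 66 s respectively.
print(find_idle_threads(last_activity, now=161.0))
```

A watchdog built on a check like this would then try to restart the flagged threads, which is the step where the log shows "Attempting to restart GPU".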
member
Activity: 266
Merit: 36
Over in mining hardware I just whined that I had an instance of a "SICK" GPU even after falling back to pretty vanilla settings of gpu-engine 725 (stock) and gpu-memclock 300 for my 5970s.

Is there any chance that SICK like the following is not a GPU hardware issue?

Code:
[2012-01-02 17:56:39] Thread 2 idle for more than 60 seconds, GPU 2 declared SICK!
[2012-01-02 17:56:39] Attempting to restart GPU
[2012-01-02 17:56:39] Thread 2 still exists, killing it off
[2012-01-02 17:56:39] Thread 8 still exists, killing it off
[2012-01-02 17:56:39] Thread 2 restarted
[2012-01-02 17:56:40] Thread 8 restarted
[2012-01-02 17:56:40] Accepted 00000000.30702585.cb8fdf73 GPU 5 thread 11 pool 0
[2012-01-02 17:56:41] Accepted 00000000.676a69c6.4b59b7db GPU 5 thread 5 pool 0
[2012-01-02 17:56:43] Accepted 00000000.1e5767ae.f669070b GPU 2 thread 2 pool 0  # note how healthy it is now!
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
I'm sick of adding special case command line parameters...
member
Activity: 266
Merit: 36
No, that's actually unnecessary because the ADL does have information about shared thermal devices... interpreting the results would need prodding though.

Sorry, I don't understand.  Interpreting what results?  If you mean additional ADL results, then forgo that and just let the user tell you as I suggested.  Then you already have the temps and the fan speed and can control the latter.  I am suggesting that you use both relevant core temps when calculating a new auto-fan speed for a card instead of just one temp.
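A minimal Python sketch of what "use both relevant core temps" could look like: the fan decision is driven by the hotter of the two cores that share the fan. This is not how cgminer's auto-fan is implemented; the function name, the 5% step, and the 20-100% bounds are assumptions, and only the 70C target and 3C hysteresis come from the feature request.

```python
def next_fan_percent(current, temps, target=70.0, hysteresis=3.0,
                     step=5, lowest=20, highest=100):
    """Pick the next fan percentage from the hottest core sharing the fan.

    Speeds up when the hottest core is above `target`, slows down when it is
    more than `hysteresis` degrees below `target`, and otherwise holds steady.
    """
    hottest = max(temps)
    if hottest > target:
        current += step
    elif hottest < target - hysteresis:
        current -= step
    return max(lowest, min(highest, current))

# Temperatures taken from the GPU 0/1 and GPU 2/3 pairs in the snapshot.
print(next_fan_percent(40, [69.5, 74.0]))  # hotter core is over target
print(next_fan_percent(40, [67.5, 62.5]))  # hotter core is inside the hysteresis band
```

With a single-temperature controller, the first pair would look fine at 69.5C even though its second core sits at 74.0C; taking the maximum is the simplest way to cover both.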
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Feature request...
... I believe that because GPUs 1, 3, and 5 don't return fan values, cgminer is ignoring their temps w/r/t auto-fan.  Assuming that cgminer can't tell via ADL or otherwise that two GPUs share a fan, I would like to be able to tell that to cgminer and thus have my temp targets applied to (in my case) odd-numbered GPUs as well as to even-numbered ones.

I cannot code for a 5970 or 6990 without poking and prodding them with code, and since I don't own one, it's unlikely to happen in a safe manner. If I just guess, I'll likely do something which could be bad...

I might've been unclear.  I was suggesting that the user have the option to specify to the software, presumably via .conf or command line, that certain GPUs comprise a "fan group," i.e., share a fan, and also which of the group has the fan output and control.  I don't know, something like, in my case,
"fan-group" : "0,1/0, 2,3/2, 4,5/5"
...meaning GPUs 0 and 1 share a fan, the speed of which is readable and controllable via GPU 0; etc.

What I'm thinking of would not require any additional hardware coding, but it would require additional fan-control logic within cgminer.
No, that's actually unnecessary because the ADL does have information about shared thermal devices... interpreting the results would need prodding though.
member
Activity: 266
Merit: 36
Feature request...
... I believe that because GPUs 1, 3, and 5 don't return fan values, cgminer is ignoring their temps w/r/t auto-fan.  Assuming that cgminer can't tell via ADL or otherwise that two GPUs share a fan, I would like to be able to tell that to cgminer and thus have my temp targets applied to (in my case) odd-numbered GPUs as well as to even-numbered ones.

I cannot code for a 5970 or 6990 without poking and prodding them with code, and since I don't own one, it's unlikely to happen in a safe manner. If I just guess, I'll likely do something which could be bad...

I might've been unclear.  I was suggesting that the user have the option to specify to the software, presumably via .conf or command line, that certain GPUs comprise a "fan group," i.e., share a fan, and also which of the group has the fan output and control.  I don't know, something like, in my case,
"fan-group" : "0,1/0, 2,3/2, 4,5/5"
...meaning GPUs 0 and 1 share a fan, the speed of which is readable and controllable via GPU 0; etc.

What I'm thinking of would not require any additional hardware coding, but it would require additional fan-control logic within cgminer.
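To make the proposed syntax concrete, here is a minimal Python sketch that parses a "fan-group" string of the form suggested above. The option does not exist in cgminer; the function name and the validation rule (the controlling GPU must be a member of its own group) are assumptions added for illustration.

```python
import re

# One group looks like "0,1/0": comma-separated members, a slash, then the
# GPU through which the shared fan is read and controlled.
GROUP = re.compile(r"((?:\d+\s*,\s*)*\d+)\s*/\s*(\d+)")

def parse_fan_groups(spec):
    """Map each controlling GPU index to the list of GPUs sharing its fan."""
    groups = {}
    for members_text, controller_text in GROUP.findall(spec):
        members = [int(part) for part in members_text.split(",")]
        controller = int(controller_text)
        if controller not in members:
            raise ValueError(f"GPU {controller} controls a group it is not part of")
        groups[controller] = members
    return groups

# The example string from the post above.
print(parse_fan_groups("0,1/0, 2,3/2, 4,5/5"))
```

The resulting mapping gives the fan-control logic everything it needs: for each controllable fan, the set of GPU temperatures that should feed into its speed.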