OFFICIAL CGMINER mining software thread for linux/win/osx/mips/arm/r-pi 4.11.0 - page 715.

P4man

hero member

Activity: 518

Merit: 500

Quote from: DeathAndTaxes on January 03, 2012, 09:07:14 AM

When your miners indicate the pool is unavailable can you still remote into them (SSH)?

Yes. I'm pretty much always SSH-ed into them, and the SSH connection has never broken. One of the miners is also used as a file server.
AFAICT, there are no problems on my LAN. And the miners can access the internet just fine, even when that HTTP 503 happens in cgminer.

DeathAndTaxes

donator

Activity: 1218

Merit: 1079

Gerald Davis

Quote from: P4man on January 03, 2012, 09:01:17 AM

Quote from: -ck on January 03, 2012, 08:30:20 AM

Yay thank goodness.

By the way, error 503 is a server not responsive, too busy etc. error... though it is possible to generate this artificially from the miner's end by having DNS issues, router problems and so on. Disabling cached connections in 2.1.1 after failure seemed to achieve sweet FA unfortunately. So I'm now officially in the NFI position.

Networking and internet seem fine even when that happens, though it's quite possible a brief network quirk triggers it. Restarting routers does not help. Next time it happens, I'll see if disconnecting and reconnecting ethernet does anything. Anything else worth testing? Like flushing DNS or whatever?

Likely unrelated, but the only networking issue I have seen was when my switch got too hot. I have 4 rigs in the garage running on their own switch. I found that if the room's ambient temperature got too high, the (fanless) switch would go "crazy" and make the miners drop from the network. I "solved" it by putting the switch closer to the ground (cooler) and putting a small desktop fan next to it.

When your miners indicate the pool is unavailable can you still remote into them (SSH)?

P4man

hero member

Activity: 518

Merit: 500

Quote from: -ck on January 03, 2012, 08:30:20 AM

Yay thank goodness.

By the way, error 503 is a server not responsive, too busy etc. error... though it is possible to generate this artificially from the miner's end by having DNS issues, router problems and so on. Disabling cached connections in 2.1.1 after failure seemed to achieve sweet FA unfortunately. So I'm now officially in the NFI position.

Networking and internet seem fine even when that happens, though it's quite possible a brief network quirk triggers it. Restarting routers does not help. Next time it happens, I'll see if disconnecting and reconnecting ethernet does anything. Anything else worth testing? Like flushing DNS or whatever?

As for DNS, not sure if it's worth mentioning, but I'm using Google's DNS servers.

Turbor

legendary

Activity: 1022

Merit: 1000

BitMinter

Sent you one BTC to cheer you up :D

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Yay thank goodness.

By the way, error 503 is a server not responsive, too busy etc. error... though it is possible to generate this artificially from the miner's end by having DNS issues, router problems and so on. Disabling cached connections in 2.1.1 after failure seemed to achieve sweet FA unfortunately. So I'm now officially in the NFI position.
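
One way to narrow this down, as a rough sketch rather than anything cgminer itself does: check name resolution and the HTTP response separately, so a local DNS failure can be told apart from a genuine 503 coming back from the pool. The function name `probe`, its return labels, and the pool URL in the usage note are made up for illustration.

```python
import socket
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def probe(url, resolve=socket.getaddrinfo, fetch=urllib.request.urlopen):
    """Report which stage of reaching `url` fails (hypothetical helper).

    `resolve` and `fetch` are injectable so the logic can be exercised
    without touching the network.
    """
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or (443 if parts.scheme == "https" else 80)

    # Stage 1: DNS. A failure here is local or resolver-side, not the pool.
    try:
        resolve(host, port)
    except socket.gaierror as exc:
        return f"dns-failure: {exc}"

    # Stage 2: HTTP. An HTTPError means a server (or proxy) did answer.
    try:
        with fetch(url, timeout=10) as response:
            return f"http-{response.status}"
    except urllib.error.HTTPError as exc:
        return f"http-{exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        return f"connect-failure: {exc}"
```

Running something like `print(probe("http://pool.example.com:8332/"))` (placeholder URL) on the miner while cgminer is reporting 503 would show whether the box itself can still resolve and reach the pool. Any `http-...` result, even an authentication error, means a server answered.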

DeathAndTaxes

donator

Activity: 1218

Merit: 1079

Gerald Davis

Quote from: Turbor on January 03, 2012, 08:23:52 AM

Am I the only one without problems? ;)

Win7 32, 2.1.1, 2 rigs, zero problems :P

I haven't had any problems either.

Turbor

legendary

Activity: 1022

Merit: 1000

BitMinter

Am I the only one without problems? ;)

Win7 32, 2.1.1, 2 rigs, zero problems :P

P4man

hero member

Activity: 518

Merit: 500

Got another "network bug". This time with 2.1.1 (on linux), while 2.0.8 (on windows) did not get it. I was expecting the opposite really. Usually both machines got it simultaneously and I assumed 2.1.1 fixed it. Apparently not.
Here is the debug output:

Code:

[2012-01-03 12:31:48] json_rpc_call failed on get work, retry after 155 seconds
[2012-01-03 12:31:48] HTTP request failed: The requested URL returned error: 503
[2012-01-03 12:31:48] Failed json_rpc_call in get_upstream_work
[2012-01-03 12:31:48] json_rpc_call failed on get work, retry after 155 seconds
[2012-01-03 12:31:48] HTTP request failed: The requested URL returned error: 503
[2012-01-03 12:31:48] Failed json_rpc_call in get_upstream_work
[2012-01-03 12:31:48] json_rpc_call failed on get work, retry after 155 seconds
[2012-01-03 12:31:49] Queueing getwork request to work thread
[2012-01-03 12:31:49] Popping work from get queue to get work
[2012-01-03 12:31:49] Popping work to work thread


[2012-01-03 12:31:50] 19.5 C  F: 40%(-1RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:31:50] 28.0 C  F: 40%(1490RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:31:53] (5s):0.0 (avg):799.0 Mh/s | Q:14417  A:11676  R:2  HW:0  E:81%  U:10.85/m
[2012-01-03 12:31:53] 19.5 C  F: 40%(-1RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:31:53] 27.5 C  F: 40%(1493RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:31:54] HTTP request failed: The requested URL returned error: 503
[2012-01-03 12:31:54] Failed json_rpc_call in get_upstream_work
[2012-01-03 12:31:54] json_rpc_call failed on get work, retry after 155 seconds
[2012-01-03 12:31:54] HTTP request failed: The requested URL returned error: 503
[2012-01-03 12:31:54] Failed json_rpc_call in get_upstream_work
[2012-01-03 12:31:54] json_rpc_call failed on get work, retry after 155 seconds
[2012-01-03 12:31:55] HTTP request failed: The requested URL returned error: 503
[2012-01-03 12:31:56] 19.5 C  F: 40%(-1RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:31:56] 27.0 C  F: 40%(1492RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:31:56] HTTP request failed: The requested URL returned error: 503
[2012-01-03 12:31:56] Failed json_rpc_call in get_upstream_work
[2012-01-03 12:31:56] json_rpc_call failed on get work, retry after 155 seconds
[2012-01-03 12:31:59] (5s):0.0 (avg):798.9 Mh/s | Q:14417  A:11676  R:2  HW:0  E:81%  U:10.85/m
[2012-01-03 12:31:59] 19.5 C  F: 40%(-1RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:31:59] 27.0 C  F: 40%(1496RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%




[2012-01-03 12:32:02] 19.5 C  F: 40%(-1RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:32:02] 27.0 C  F: 40%(1496RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:32:03] HTTP request failed: The requested URL returned error: 503


[2012-01-03 12:32:04] HTTP request failed: The requested URL returned error: 503
[2012-01-03 12:32:05] (5s):0.0 (avg):798.8 Mh/s | Q:14417  A:11676  R:2  HW:0  E:81%  U:10.85/m
[2012-01-03 12:32:05] 19.5 C  F: 40%(-1RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%
[2012-01-03 12:32:05] 27.0 C  F: 40%(1497RPM)  E: 157MHz  M: 300Mhz  V: 0.950V  A: 0%  P: 0%

Restarting cgminer fixed it. Both primary and backup pools were working properly AFAICT.
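
Debug output like the above is easier to compare between machines when condensed. The sketch below assumes the `[YYYY-MM-DD HH:MM:SS] message` layout seen in that output; the function name `summarize` is made up. It counts the HTTP 503 failures and reports the first and last time they were logged.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the debug output; the opening
# bracket is optional so a clipped first line still parses.
LINE_RE = re.compile(r"^\[?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.*)$")
ERROR_RE = re.compile(r"HTTP request failed: .* error: (\d{3})")


def summarize(log_text, status="503"):
    """Return (count, first_time, last_time) for one HTTP status code."""
    times = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # blank lines or anything without a timestamp
        error = ERROR_RE.search(match.group(2))
        if error and error.group(1) == status:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    if not times:
        return 0, None, None
    return len(times), min(times), max(times)
```

Feeding it the saved log from each miner would show whether both machines really hit the 503s at the same moments.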

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Quote from: tnkflx on January 03, 2012, 04:15:42 AM

Quote from: -ck on January 02, 2012, 09:32:58 PM

I cannot code for a 5970 or 6990 without poking and prodding them with code, and since I don't own one, it's unlikely to happen in a safe manner. If I just guess, I'll likely do something which could be bad...

Would it be beneficial if we get you a 6990?

That would most definitely come under the definition of rhetorical questions. Given 6990s cost more than any other card on the market, I think I know what the likelihood of that happening is, though.

But just to be clear since I haven't answered: of course it would...

cuz0882

sr. member

Activity: 392

Merit: 250

Quote from: cablepair on January 03, 2012, 07:10:01 AM

ckolivas: I know everyone is coming at you from a million directions, but I have a very strange problem I would love your opinion on, or anyone else's for that matter.

I have four rigs, and three of them work fine with cgminer.

The last rig is very problematic. At first I thought something was wrong with a single card, then with a single type of card, but now I realize it's not the cards.

All of my rigs use 890FXA-GD70 motherboards.

If I have five cards in this rig and start it mining with cgminer, then within 30-60 minutes one of the cards will show its fan at 0RPM and its temp at 127.5C (it's ALWAYS 127.5C for some reason). Then the system freezes up and Windows crashes.

If I put my hand on the card, it does not feel hot at all, and I can see the fan spinning at a normal speed.

Moving the cards around, or swapping them out for ones I know work, makes no difference.

If I take it down to four cards on the motherboard instead of five, the system does not crash and we do not see the 127.5C, but eventually one of the cards will display an incorrect fan speed. Right now it's a 5970, which I see as GPU2: it's displaying a 1RPM fan speed but hashing along at a normal rate, with the fan spinning at a normal speed, and it does not feel overly hot.

What could be causing this? I am dumbfounded here. Any help would be greatly appreciated and will result in a 1 BTC tip for the person who gives me the right answer. Thanks!

I would try managing the fans with MSI Afterburner and see if it still happens. Has the PC ever had Catalyst 11.12 installed on it? That does not really sound like the problem, but my 6 GPU system was not working until I removed some garbage files that 11.12 left behind before reinstalling 11.11. https://bitcointalksearch.org/topic/m.655079

cablepair

hero member

Activity: 896

Merit: 1000

Buy this account on March-2019. New Owner here!!

ckolivas: I know everyone is coming at you from a million directions, but I have a very strange problem I would love your opinion on, or anyone else's for that matter.

I have four rigs, and three of them work fine with cgminer.

The last rig is very problematic. At first I thought something was wrong with a single card, then with a single type of card, but now I realize it's not the cards.

All of my rigs use 890FXA-GD70 motherboards.

If I have five cards in this rig and start it mining with cgminer, then within 30-60 minutes one of the cards will show its fan at 0RPM and its temp at 127.5C (it's ALWAYS 127.5C for some reason). Then the system freezes up and Windows crashes.

If I put my hand on the card, it does not feel hot at all, and I can see the fan spinning at a normal speed.

Moving the cards around, or swapping them out for ones I know work, makes no difference.

If I take it down to four cards on the motherboard instead of five, the system does not crash and we do not see the 127.5C, but eventually one of the cards will display an incorrect fan speed. Right now it's a 5970, which I see as GPU2: it's displaying a 1RPM fan speed but hashing along at a normal rate, with the fan spinning at a normal speed, and it does not feel overly hot.

What could be causing this? I am dumbfounded here. Any help would be greatly appreciated and will result in a 1 BTC tip for the person who gives me the right answer. Thanks!

cuz0882

sr. member

Activity: 392

Merit: 250

I have a 6990 and 2x 6970s all set at 955 clock speed, but the 6970s each run about 20-30 hashes behind the 6990. I've tried reinstalling the video drivers. They all run within 1 hash of each other with guiminer, so I'm a little lost on what could cause this. Any ideas?

tnkflx

sr. member

Activity: 349

Merit: 250

Quote from: -ck on January 02, 2012, 09:32:58 PM

Quote from: Proofer on January 01, 2012, 01:23:30 PM

Feature request...

Code:

 [P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
 GPU 0:  69.5C 4535RPM | 357.0/363.8Mh/s | A:285 R:0 HW:0 U:5.02/m I: 9
 GPU 1:  74.0C         | 366.4/363.9Mh/s | A:299 R:0 HW:0 U:5.26/m I: 9
 GPU 2:  67.5C 4108RPM | 372.9/363.8Mh/s | A:289 R:0 HW:0 U:5.09/m I: 9
 GPU 3:  62.5C         | 366.4/363.7Mh/s | A:262 R:0 HW:0 U:4.61/m I: 9
 GPU 4:  68.0C 3564RPM | 370.8/363.6Mh/s | A:294 R:0 HW:0 U:5.18/m I: 9
 GPU 5:  71.0C         | 340.5/363.6Mh/s | A:318 R:1 HW:0 U:5.60/m I: 9

These are three 5970s. Auto-fan is on with a target of 70C for all, 3C hysteresis. At this snapshot GPUs 1 and 5 ran 3C-4.5C hotter than their card-mates, and GPU 3 ran 5C cooler than its mate. I believe that because GPUs 1, 3, and 5 don't return fan values, cgminer is ignoring their temps w.r.t. auto-fan. Assuming that cgminer can't tell via ADL or otherwise that two GPUs share a fan, I would like to be able to tell that to cgminer and thus have my temp targets applied to (in my case) odd-numbered GPUs as well as to even-numbered ones.

I cannot code for a 5970 or 6990 without poking and prodding them with code, and since I don't own one, it's unlikely to happen in a safe manner. If I just guess, I'll likely do something which could be bad...

Would it be beneficial if we get you a 6990?

LightRider

legendary

Activity: 1500

Merit: 1022

I advocate the Zeitgeist Movement & Venus Project.

Still causing the video driver to fail. I'll try a clean reinstall and see if that helps.

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Quote from: Proofer on January 02, 2012, 11:08:47 PM

Over in mining hardware I just whined that I had an instance of a "SICK" GPU even after falling back to pretty vanilla settings of gpu-engine 725 (stock) and gpu-memclock 300 for my 5970s.

Is there any chance that SICK like the following is not a GPU hardware issue?

Code:

[2012-01-02 17:56:39] Thread 2 idle for more than 60 seconds, GPU 2 declared SICK!
[2012-01-02 17:56:39] Attempting to restart GPU
[2012-01-02 17:56:39] Thread 2 still exists, killing it off
[2012-01-02 17:56:39] Thread 8 still exists, killing it off
[2012-01-02 17:56:39] Thread 2 restarted
[2012-01-02 17:56:40] Thread 8 restarted
[2012-01-02 17:56:40] Accepted 00000000.30702585.cb8fdf73 GPU 5 thread 11 pool 0
[2012-01-02 17:56:41] Accepted 00000000.676a69c6.4b59b7db GPU 5 thread 5 pool 0
[2012-01-02 17:56:43] Accepted 00000000.1e5767ae.f669070b GPU 2 thread 2 pool 0  # note how healthy it is now!

Anything's possible, but note that the restart code was tested extensively on literally dozens of GPUs to get this sick restart code working -when possible- and the person who helped me test it had 72 GPUs that would often have boxes going down with any other miner. The idea was to make it recover to a fine state after enough rest if possible.

So yes it's possible. Maybe even likely, who knows, but this particular scenario was not unusual even at normal clocks when some GPUs were run flat out, regardless of which miner it was. Interestingly it became FAR more common with the phatk2 kernel (which is what is used in cgminer) since that seemed to run GPUs that little bit more than anything else.
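
For readers wondering what a rule like "idle for more than 60 seconds" amounts to, here is a minimal sketch of that kind of watchdog check. It is not cgminer's actual code (cgminer is written in C); the names `find_sick_threads` and `SICK_AFTER_SECONDS` are made up, and only the 60-second threshold comes from the log message.

```python
import time

SICK_AFTER_SECONDS = 60  # threshold taken from the log message


def find_sick_threads(last_activity, now=None):
    """Return sorted ids of threads idle longer than the threshold.

    `last_activity` maps a thread id to the time (in seconds, e.g. from
    time.monotonic()) at which that thread last made progress.
    """
    if now is None:
        now = time.monotonic()
    return sorted(
        thread_id
        for thread_id, seen in last_activity.items()
        if now - seen > SICK_AFTER_SECONDS
    )
```

For example, `find_sick_threads({2: 100.0, 5: 155.0, 8: 99.0}, now=161.0)` returns `[2, 8]`; a real watchdog would then try to kill and restart those threads, as the log shows.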

Proofer

member

Activity: 266

Merit: 36

Over in mining hardware I just whined that I had an instance of a "SICK" GPU even after falling back to pretty vanilla settings of gpu-engine 725 (stock) and gpu-memclock 300 for my 5970s.

Is there any chance that SICK like the following is not a GPU hardware issue?

Code:

[2012-01-02 17:56:39] Thread 2 idle for more than 60 seconds, GPU 2 declared SICK!
[2012-01-02 17:56:39] Attempting to restart GPU
[2012-01-02 17:56:39] Thread 2 still exists, killing it off
[2012-01-02 17:56:39] Thread 8 still exists, killing it off
[2012-01-02 17:56:39] Thread 2 restarted
[2012-01-02 17:56:40] Thread 8 restarted
[2012-01-02 17:56:40] Accepted 00000000.30702585.cb8fdf73 GPU 5 thread 11 pool 0
[2012-01-02 17:56:41] Accepted 00000000.676a69c6.4b59b7db GPU 5 thread 5 pool 0
[2012-01-02 17:56:43] Accepted 00000000.1e5767ae.f669070b GPU 2 thread 2 pool 0  # note how healthy it is now!

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

I'm sick of adding special case command line parameters...

Proofer

member

Activity: 266

Merit: 36

Quote from: -ck on January 02, 2012, 10:53:04 PM

No, that's actually unnecessary because the ADL does have information about shared thermal devices... interpreting the results would need prodding though.

Sorry, I don't understand. Interpreting what results? If you mean additional ADL results, then forgo that and just let the user tell you as I suggested. Then you already have the temps and the fan speed and can control the latter. I am suggesting that you use both relevant core temps when calculating a new auto-fan speed for a card instead of just one temp.

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Quote from: Proofer on January 02, 2012, 10:50:46 PM

Quote from: -ck on January 02, 2012, 09:32:58 PM

Quote from: Proofer on January 01, 2012, 01:23:30 PM

Feature request...
... I believe that because GPUs 1, 3, and 5 don't return fan values, cgminer is ignoring their temps w.r.t. auto-fan. Assuming that cgminer can't tell via ADL or otherwise that two GPUs share a fan, I would like to be able to tell that to cgminer and thus have my temp targets applied to (in my case) odd-numbered GPUs as well as to even-numbered ones.

I cannot code for a 5970 or 6990 without poking and prodding them with code, and since I don't own one, it's unlikely to happen in a safe manner. If I just guess, I'll likely do something which could be bad...

I might've been unclear. I was suggesting that the user have the option to specify to the software, presumably via .conf or command line, that certain GPUs comprise a "fan group," i.e., share a fan, and also which of the group has the fan output and control. I don't know, something like, in my case,
"fan-group" : "0,1/0, 2,3/2, 4,5/5"
...meaning GPUs 0 and 1 share a fan, the speed of which is readable and controllable via GPU 0; etc.

What I'm thinking of would not require any additional hardware coding, but it would require additional fan-control logic within cgminer.

No, that's actually unnecessary because the ADL does have information about shared thermal devices... interpreting the results would need prodding though.

Proofer

member

Activity: 266

Merit: 36

Quote from: -ck on January 02, 2012, 09:32:58 PM

Quote from: Proofer on January 01, 2012, 01:23:30 PM

Feature request...
... I believe that because GPUs 1, 3, and 5 don't return fan values, cgminer is ignoring their temps w.r.t. auto-fan. Assuming that cgminer can't tell via ADL or otherwise that two GPUs share a fan, I would like to be able to tell that to cgminer and thus have my temp targets applied to (in my case) odd-numbered GPUs as well as to even-numbered ones.

I cannot code for a 5970 or 6990 without poking and prodding them with code, and since I don't own one, it's unlikely to happen in a safe manner. If I just guess, I'll likely do something which could be bad...

I might've been unclear. I was suggesting that the user have the option to specify to the software, presumably via .conf or command line, that certain GPUs comprise a "fan group," i.e., share a fan, and also which of the group has the fan output and control. I don't know, something like, in my case,
"fan-group" : "0,1/0, 2,3/2, 4,5/5"
...meaning GPUs 0 and 1 share a fan, the speed of which is readable and controllable via GPU 0; etc.

What I'm thinking of would not require any additional hardware coding, but it would require additional fan-control logic within cgminer.
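
As a rough sketch of what that extra logic could look like: the `fan-group` option does not exist in cgminer, and the function names, the choice of the hottest member's temperature as the group temperature, and the raise/hold/lower rule are all assumptions made for illustration.

```python
import re

# One group looks like "0,1/0": member GPUs, a slash, then the GPU
# through which the shared fan is read and controlled.
GROUP_RE = re.compile(r"(\d+(?:,\d+)*)/(\d+)")


def parse_fan_groups(spec):
    """Parse '0,1/0, 2,3/2' into [((0, 1), 0), ((2, 3), 2)]."""
    groups = []
    for members_text, controller_text in GROUP_RE.findall(spec):
        members = tuple(int(m) for m in members_text.split(","))
        controller = int(controller_text)
        if controller not in members:
            raise ValueError(f"GPU {controller} is not in group {members}")
        groups.append((members, controller))
    return groups


def fan_actions(spec, temps, target=70.0, hysteresis=3.0):
    """Map each controlling GPU to 'raise', 'hold' or 'lower'.

    Assumption: the hottest GPU in a group drives the shared fan.
    """
    actions = {}
    for members, controller in parse_fan_groups(spec):
        hottest = max(temps[gpu] for gpu in members)
        if hottest > target:
            actions[controller] = "raise"
        elif hottest < target - hysteresis:
            actions[controller] = "lower"
        else:
            actions[controller] = "hold"
    return actions
```

With the temperatures from the snapshot in the original feature request, `fan_actions("0,1/0, 2,3/2, 4,5/5", [69.5, 74.0, 67.5, 62.5, 68.0, 71.0])` gives `{0: 'raise', 2: 'hold', 5: 'raise'}`, so the hotter GPUs that report no fan value would no longer be ignored.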
