OFFICIAL CGMINER mining software thread for linux/win/osx/mips/arm/r-pi 4.11.0 - page 571.

af_newbie

legendary

Activity: 2702

Merit: 1468

Quote from: MrTeal on May 14, 2012, 01:38:31 PM

Has anyone else noticed that the order of temperatures and fan speed is incorrect sometimes? For instance, this is a rig with 3 cards; a 6870, a 5970 and a 5970 with one bad GPU.

GPU 0 is the 6870, the one with the exceedingly low temperature and no fan speed is the one with the bad GPU.
The GPU order corresponds with what I see in clocktweak.
Reading data:
Adapter#:0 Temp:75 Load:99 Fan:67 Level:2 CoreL0:250 CoreL1:399 CoreL2:900 MemL0:198 MemL1:799 MemL2:800 mVoltL0:950 mVoltL1:999 mVoltL2:1150
Adapter#:3 Temp:62 Load:99 Fan:NA Level:2 CoreL0:250 CoreL1:500 CoreL2:750 MemL0:198 MemL1:199 MemL2:200 mVoltL0:950 mVoltL1:999 mVoltL2:1000
Adapter#:4 Temp:43 Load:99 Fan:NA Level:2 CoreL0:157 CoreL1:399 CoreL2:750 MemL0:198 MemL1:199 MemL2:200 mVoltL0:950 mVoltL1:999 mVoltL2:1000
Adapter#:5 Temp:60 Load:99 Fan:86 Level:2 CoreL0:157 CoreL1:400 CoreL2:750 MemL0:198 MemL1:199 MemL2:200 mVoltL0:950 mVoltL1:999 mVoltL2:1000

In this case, I changed the clock speed on the 5970 with only one GPU (Adapter #4) to 400MHz, and as you can see the hashrate of GPU1 went down instead of GPU2.

It's not just this configuration of cards either, I've noticed this before with 4 GPUs in the system but the cards in different orders.

Check the debug log file. Sometimes ADL does return -1 from the fan speed/temp APIs. I've had people report it with 6000 series cards (not in cgminer, but in my akbash watchdog). BTW, -1 means "Most likely one or more of the Escape calls to the driver failed".

Not sure why; maybe it is "not always supported" (as their ADL docs say)? I raised this issue with AMD support and am waiting for their response.

Not sure how re-ordering would help: the ADL APIs use the adapter index, not the OpenCL GPU #.

ddd1

full member

Activity: 154

Merit: 100

What should temp-cutoff and overheat be to avoid card throttling down in speed?

I have watercooling and want the app to shut down if the core reaches 60c.

I found this in my cgminer.cfg
"temp-cutoff" : "95", This I put to 60c
"temp-overheat" : "85", This I can keep at 85?
"temp-target" : "75", This I can keep at 75?
^^^^^^^ There are no fans controlled by the GPU; it's a radiator with fans.

So MAX temps on my 7950 is: Core MAX 52c, VRM MAX 65c.

I'm assuming that if I just change "temp-cutoff" to 60, set the hysteresis to 3, and leave everything else alone, then in case of a water pump failure mining will be turned off once the GPU reaches 63c?
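
A quick sanity check on the values above, as a sketch rather than anything taken from cgminer itself. It assumes the three thresholds are meant to escalate (target, then overheat, then cutoff); under that assumption a cutoff of 60 sits below the overheat and target values, so those two would presumably never come into play before the cutoff does. The function name `check_thresholds` is hypothetical.

```python
# Assumption: the thresholds are expected to escalate in this order.
ORDER = ["temp-target", "temp-overheat", "temp-cutoff"]


def check_thresholds(cfg):
    """Return a list of warnings for thresholds that are out of order."""
    warnings = []
    # Compare each threshold with the next one up the escalation ladder.
    for lower, upper in zip(ORDER, ORDER[1:]):
        low, high = int(cfg[lower]), int(cfg[upper])
        if low >= high:
            warnings.append(f"{lower} ({low}) is not below {upper} ({high})")
    return warnings


if __name__ == "__main__":
    # Values from the post, with only temp-cutoff changed to 60.
    proposed = {"temp-cutoff": "60", "temp-overheat": "85", "temp-target": "75"}
    for line in check_thresholds(proposed):
        print(line)
```

If the check complains, lowering temp-overheat and temp-target so that they sit under the new cutoff keeps the ladder consistent.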

P_Shep

legendary

Activity: 1795

Merit: 1208

This is not OK.

Quote from: kano on May 15, 2012, 09:21:34 PM

Quote from: P_Shep on May 15, 2012, 07:30:15 PM

Another thing I've noticed:
When you change the priority of a pool to the top (via the API) and you save the config file, the order of the pools in the config file remains unchanged from its original. Maybe this should change to match the current priority order?

However, if your main pool has a temporary outage, then it may be moved from the top position.
So it would have to take that into account also ...

Unlucky to hit save at that moment, but yeah, possible. Not so sure that's something to worry about.

Just an idea/suggestion anyway.

kano

legendary

Activity: 4634

Merit: 1851

Linux since 1997 RedHat 4

Quote from: P_Shep on May 15, 2012, 07:30:15 PM

Another thing I've noticed:
When you change the priority of a pool to the top (via the API) and you save the config file, the order of the pools in the config file remains unchanged from its original. Maybe this should change to match the current priority order?

However, if your main pool has a temporary outage, then it may be moved from the top position.
So it would have to take that into account also ...

P_Shep

legendary

Activity: 1795

Merit: 1208

This is not OK.

Another thing I've noticed:
When you change the priority of a pool to the top (via the API) and you save the config file, the order of the pools in the config file remains unchanged from its original. Maybe this should change to match the current priority order?
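
A minimal sketch of the suggestion, not cgminer's actual save routine. The pool records and the field names `url` and `priority` are hypothetical; the idea is just to write pools out sorted by the priority the user assigned, rather than in their original load order.

```python
def pools_in_priority_order(pools):
    """Return the pools sorted by user-assigned priority (0 = highest)."""
    # sorted() is stable, so pools with equal priority keep their load order.
    return sorted(pools, key=lambda pool: pool["priority"])


if __name__ == "__main__":
    # Hypothetical pool list: the second pool was moved to the top via the API.
    pools = [
        {"url": "http://pool-a.example:8332", "priority": 1},
        {"url": "http://pool-b.example:8332", "priority": 0},
    ]
    for pool in pools_in_priority_order(pools):
        print(pool["priority"], pool["url"])
```

Sorting on the stored priority, rather than on whichever pool happens to be active, would sidestep the case where the main pool has a temporary outage at the moment of saving.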

PolymorphicAssasin

newbie

Activity: 46

Merit: 0

Quote from: PolymorphicAssasin on May 10, 2012, 08:54:39 AM

Quote from: -ck on July 13, 2011, 12:26:34 AM

...
Updated git tree:
I've added dynamic adjustment of intensity for usage on a normal desktop. This flag is enabled by default and tests to ensure the GPU is constantly available for desktop use and will scale intensity down when you are watching a movie, gaming or just about any other intense GPU usage, and scale it up when otherwise idle. This is best disabled on dedicated miners:
...
The difference can be quite dramatic in terms of how much smoother the desktop is, and how much higher the throughput is when it's idle.

I just switched over from Diablo on my work PC, and I have to say this is one of the coolest things about cgminer. I don't have to kill my miner whenever I'm doing graphics-intensive work (2x or 3x a day). Kudos.

I'll send you some bitcents when I next access my wallet (from cold storage) ;)

Thanks for some great software!

Finally got around to sending those cents. You should see them in the next block. Thanks again for the great contribution to the community. I'd send more if I was richer. ;)

mdude77

legendary

Activity: 1540

Merit: 1001

Quote from: Inaba on May 14, 2012, 07:18:54 PM

Quote from: mdude77 on May 14, 2012, 07:14:34 PM

Quote from: Inaba on May 13, 2012, 11:09:09 PM

It's a 480 GB SSD, I don't think Spinrite will help

The box is a fairly recent reload, I'm not keen on doing it again. I was hoping someone might have a magic bullet.

Actually, I read recently that spinrite level 1 is read only, and can work wonders on SSDs.

I'm not even sure how that would make sense?

You can read from SSDs almost indefinitely without causing wear. It's the writing that causes problems.

So by doing a level 1 read across the drive, you're forcing the wear logic to realize failing sectors are having problems, which it then swaps out to a spare one.

Steve's words:

Quote

And all of our listeners just got a new tip for running SpinRite, if you have a drive which, like this - the problem is that all of the other levels are writing something. Level 1 is a read-only pass. And that's why it's safe to run on thumb drives, because it doesn't write anything, absolutely nothing. It only reads.

But the beauty of that is that, as we were saying before, the act of reading shows the drive it has a problem. And clearly this, whatever was going wacky with this and a couple other drives that Mike found, writing gave the drive fits, but reading was okay. So reading was sort of eased into it more gently and allowed the drive to fix the problems so that then writing to them was writing to different areas because the bad spots had been relocated to good areas on the drive. So that's a great tip. It'll definitely make it into our notes for the future.

from: https://www.grc.com/sn/sn-343.pdf

Inaba

legendary

Activity: 1260

Merit: 1000

Quote from: mdude77 on May 14, 2012, 07:14:34 PM

Quote from: Inaba on May 13, 2012, 11:09:09 PM

It's a 480 GB SSD, I don't think Spinrite will help

The box is a fairly recent reload, I'm not keen on doing it again. I was hoping someone might have a magic bullet.

Actually, I read recently that spinrite level 1 is read only, and can work wonders on SSDs.

I'm not even sure how that would make sense?

mdude77

legendary

Activity: 1540

Merit: 1001

Quote from: Inaba on May 13, 2012, 11:09:09 PM

It's a 480 GB SSD, I don't think Spinrite will help

The box is a fairly recent reload, I'm not keen on doing it again. I was hoping someone might have a magic bullet.

Actually, I read recently that spinrite level 1 is read only, and can work wonders on SSDs.

mdude77

legendary

Activity: 1540

Merit: 1001

Quote from: SgtSpike on May 14, 2012, 01:14:58 PM

Quote from: Krak on May 13, 2012, 05:28:34 PM

Quote from: SgtSpike on May 13, 2012, 05:24:58 PM

Question: Is there a way to mine on multiple pools at the same time with CGMiner? In other words, if I have 2 BFL miners, and want to point one to one pool, and another to another pool, how would I go about setting that up?

The --load-balance flag will basically do that for you.

How accurately would it load balance between the pools?

I've heard (but haven't tried it myself) that the balancing doesn't quite work as expected. I use it for failover only.

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Please take a good look through the readme, and read carefully the extensive documentation on the advanced option --gpu-map.
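
For reference, the cgminer README describes --gpu-map as taking a paired CSV that maps OpenCL device numbers to ADL device numbers, with 1:0,2:1 mapping OpenCL 1 to ADL 0 and OpenCL 2 to ADL 1. Check the README for your version before relying on that. The snippet below is only an illustration of how such a string reads; `parse_gpu_map` is a hypothetical helper, not part of cgminer.

```python
from typing import Dict


def parse_gpu_map(spec: str) -> Dict[int, int]:
    """Parse a paired CSV such as "1:0,2:1" into {opencl_index: adl_index}."""
    mapping = {}
    for pair in spec.split(","):
        # Each pair is "<OpenCL device>:<ADL device>".
        opencl, adl = pair.split(":")
        mapping[int(opencl)] = int(adl)
    return mapping


if __name__ == "__main__":
    for opencl, adl in parse_gpu_map("1:0,2:1").items():
        print(f"OpenCL GPU {opencl} -> ADL device {adl}")
```

Working out the right pairs for a given rig still means comparing which hashrate and which temperature react when one card's clocks are changed.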

MrTeal

legendary

Activity: 1274

Merit: 1004

Quote from: af_newbie on May 14, 2012, 01:55:11 PM

Check the debug log file. Sometimes ADL does return -1 from the fan speed/temp APIs. I've had people report it with 6000 series cards (not in cgminer, but in my akbash watchdog). BTW, -1 means "Most likely one or more of the Escape calls to the driver failed".

Not sure why; maybe it is "not always supported" (as their ADL docs say)? I raised this issue with AMD support and am waiting for their response.

Not sure how re-ordering would help: the ADL APIs use the adapter index, not the OpenCL GPU #.

The lack of RPM for the one 5970 has to do with how it died. The GPU that failed was the one closest to the output, so this card isn't capable of outputting an image, or controlling/reporting on fan speed. The fan is just always pegged at 100.

MrTeal

legendary

Activity: 1274

Merit: 1004

I tried --gpu-reorder, and it did organize them in a more logical layout but it doesn't fix the problem.

Code:

 [P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
 GPU 0:  64.0C 3192RPM | 273.0/274.5Mh/s | A:15 R:0 HW:0 U: 4.82/m I: 8
 GPU 1:  59.0C 4408RPM | 331.0/330.2Mh/s | A:25 R:0 HW:0 U: 8.04/m I: 8
 GPU 2:  60.5C 4403RPM |  88.6/177.9Mh/s | A: 5 R:0 HW:0 U: 1.61/m I: 8
 GPU 3:  36.0C         | 331.3/330.2Mh/s | A:14 R:0 HW:0 U: 4.50/m I: 8

Here, I changed the core of the single GPU 5970 to 200MHz and the voltage to 0.950V. The hashrate dropped on GPU2, but the corresponding drop in temperature happened on GPU3.

Krak

hero member

Activity: 591

Merit: 500

Quote from: MrTeal on May 14, 2012, 01:48:57 PM

No, I haven't. What does the flag do?

Quote from: -ck on July 12, 2011, 10:02:53 PM

--gpu-reorder Attempt to reorder GPU devices according to PCI Bus ID

Although in my experience, it was only necessary when I used Windows. So far it's been accurate without that flag on Ubuntu.

MrTeal

legendary

Activity: 1274

Merit: 1004

Quote from: Krak on May 14, 2012, 01:47:33 PM

Quote from: MrTeal on May 14, 2012, 01:38:31 PM

Has anyone else noticed that the order of temperatures and fan speed is incorrect sometimes? For instance, this is a rig with 3 cards; a 6870, a 5970 and a 5970 with one bad GPU.

...

It's not just this configuration of cards either, I've noticed this before with 4 GPUs in the system but the cards in different orders.

Have you tried using the --gpu-reorder flag?

No, I haven't. What does the flag do?

Krak

hero member

Activity: 591

Merit: 500

Quote from: MrTeal on May 14, 2012, 01:38:31 PM

Has anyone else noticed that the order of temperatures and fan speed is incorrect sometimes? For instance, this is a rig with 3 cards; a 6870, a 5970 and a 5970 with one bad GPU.

...

It's not just this configuration of cards either, I've noticed this before with 4 GPUs in the system but the cards in different orders.

Have you tried using the --gpu-reorder flag?

MrTeal

legendary

Activity: 1274

Merit: 1004

Has anyone else noticed that the order of temperatures and fan speed is incorrect sometimes? For instance, this is a rig with 3 cards; a 6870, a 5970 and a 5970 with one bad GPU.

GPU 0 is the 6870, the one with the exceedingly low temperature and no fan speed is the one with the bad GPU.
The GPU order corresponds with what I see in clocktweak.
Reading data:
Adapter#:0 Temp:75 Load:99 Fan:67 Level:2 CoreL0:250 CoreL1:399 CoreL2:900 MemL0:198 MemL1:799 MemL2:800 mVoltL0:950 mVoltL1:999 mVoltL2:1150
Adapter#:3 Temp:62 Load:99 Fan:NA Level:2 CoreL0:250 CoreL1:500 CoreL2:750 MemL0:198 MemL1:199 MemL2:200 mVoltL0:950 mVoltL1:999 mVoltL2:1000
Adapter#:4 Temp:43 Load:99 Fan:NA Level:2 CoreL0:157 CoreL1:399 CoreL2:750 MemL0:198 MemL1:199 MemL2:200 mVoltL0:950 mVoltL1:999 mVoltL2:1000
Adapter#:5 Temp:60 Load:99 Fan:86 Level:2 CoreL0:157 CoreL1:400 CoreL2:750 MemL0:198 MemL1:199 MemL2:200 mVoltL0:950 mVoltL1:999 mVoltL2:1000

In this case, I changed the clock speed on the 5970 with only one GPU (Adapter #4) to 400MHz, and as you can see the hashrate of GPU1 went down instead of GPU2.

It's not just this configuration of cards either, I've noticed this before with 4 GPUs in the system but the cards in different orders.

Krak

hero member

Activity: 591

Merit: 500

Quote from: SgtSpike on May 14, 2012, 01:14:58 PM

How accurately would it load balance between the pools?

It sounds like it tries to balance the work equally, but I haven't tested it out myself to see how it works.

SgtSpike

legendary

Activity: 1400

Merit: 1005

Quote from: Krak on May 13, 2012, 05:28:34 PM

Quote from: SgtSpike on May 13, 2012, 05:24:58 PM

Question: Is there a way to mine on multiple pools at the same time with CGMiner? In other words, if I have 2 BFL miners, and want to point one to one pool, and another to another pool, how would I go about setting that up?

The --load-balance flag will basically do that for you.

How accurately would it load balance between the pools?

P_Shep

legendary

Activity: 1795

Merit: 1208

This is not OK.

OK, I have a few issues I would like to raise. Now I would look at these myself, but I just don't have the time to analyse/learn the code and do the work myself. I am still looking at it, it's just going to take me some time...

1. If a config file is specified in the cmd line, don't attempt loading from the default location, even if the file specified is invalid. Only attempt to read from the default location if no file is specified.

2. To save a file via the API, a blank parameter (Null text), indicates that the loaded file should be updated... It should save to the file which it uses, whether that's the file specified in the cmd line, or the default location. (Kano is looking into this, I think?).

3. For the BFL (and possibly the other FPGAs), when the unit is disabled, either by the user or by a fault, the worker thread should not be terminated. Comms with the device should be maintained for (at least) 2 reasons:
a) To update the state of the device, Alive or dead etc: no comms attempts means we don't know.
b) To update the temps of the device: no comms mean we can't ask for the temperature.
Disabling the device should just stop new work being sent to it.

4. For the BFL (and possibly the other FPGAs), if a device dies and a re-enable request is sent, the device should be re-initialised.
I found that when a BFL throttles, CGminer would report it having zero hashes and disable it. Hitting re-enable didn't do anything, but restarting CGminer would bring it back to life. This suggests the BFL just needs re-initialising.
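
Point 1 in the list above can be sketched as follows. This is an illustration of the requested behaviour under assumed names, not cgminer's loader: `load_config`, `ConfigError` and the empty-dict fallback are all hypothetical.

```python
import json


class ConfigError(Exception):
    """Raised when a config file named on the command line cannot be used."""


def load_config(cli_path, default_path):
    """Load the explicit file if one was given, else the default location."""
    if cli_path is not None:
        # An explicit file was named: use it or fail. Never fall back.
        try:
            with open(cli_path) as handle:
                return json.load(handle)
        except (OSError, ValueError) as exc:
            raise ConfigError(f"cannot load {cli_path}: {exc}") from exc
    # No file was specified, so the default location is the only candidate.
    try:
        with open(default_path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}  # Assumption: a missing default file just means no settings.
```

Returning the path that was actually loaded alongside the settings would also give point 2 what it needs, since a save with a blank parameter could then write back to that same file.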

Also,
I see that when work is submitted to the BFL, it waits 4500ms and then starts polling at 10ms intervals for work. I think there should be a timeout of maybe 10s (BFL say that work should take 5.125s, so 10s is ample), after which the BFL is declared sick and re-initialised.
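
The timeout idea could look roughly like this. It is a sketch using the numbers from the post (4500ms initial wait, 10ms polling, 10s limit), not the BFL driver code; `wait_for_result` and the `poll` callback are hypothetical, and the clock and sleep functions are parameters only so the loop can be exercised without real delays.

```python
import time


def wait_for_result(poll, initial_wait=4.5, interval=0.010, timeout=10.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll until poll() returns a result or the timeout expires.

    Returns the result, or None if nothing arrived within `timeout` seconds
    of submitting the work.
    """
    start = clock()
    sleep(initial_wait)  # Work is not expected back before this point.
    while clock() - start < timeout:
        result = poll()
        if result is not None:
            return result
        sleep(interval)
    return None  # Caller would mark the device sick and re-initialise it.
```

A caller that gets None back would then flag the device as sick and run whatever initialisation is done at startup, instead of leaving it disabled until cgminer is restarted.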
