Topic: OFFICIAL CGMINER mining software thread for linux/win/osx/mips/arm/r-pi 4.11.0 - page 629. (Read 5805546 times)

full member
Activity: 174
Merit: 100
The ADL library isn't just randomly dying here. It happens at 200k accepted shares (this was easy for me to recognize since I have a 32 GH/s mining farm). I'm just pointing out my observation so hopefully it gets fixed.

This time.  Has it died @ 200K previously too?  Does it die at 200K each time? If it is a repeatable and verifiable bug that is more useful than just it dying once @ 200K (which may not have any significance).

Hrm, I looked back over my 24 rigs and I'm afraid I was wrong. It doesn't always happen at 200k, but the bug is occurring on all of them within hours of each other after running for about a week straight with 4 GPUs per rig. It will take some time but I'll try to identify the exact trigger.
vip
Activity: 1358
Merit: 1000
AKA: gigavps
Kano & Conman,

I have my BFL single running on BAMT with cgminer but it would seem that the API call "devs" does not report back about the BFL single. Here is the output from the "devs" call.

Code:
     STATUS=S
    ,Code=9
    ,Msg=2 GPU(s)
    ,Description=cgminer 2.2.7
        |GPU=0,Enabled=Y,Status=Alive,Temperature=73.50,Fan Speed=2355,Fan Percent=48,GPU Clock=895,Memory Clock=1000,GPU Voltage=1.100,GPU Activity=99,Powertune=0,MHS av=364.64,MHS 5s=368.09,Accepted=81,Rejected=0,Hardware Errors=0,Utility=4.87,Intensity=9,Last Share Pool=0,Last Share Time=1330436872
        |GPU=1,Enabled=Y,Status=Alive,Temperature=73.50,Fan Speed=1892,Fan Percent=42,GPU Clock=895,Memory Clock=1000,GPU Voltage=1.100,GPU Activity=99,Powertune=0,MHS av=363.33,MHS 5s=367.82,Accepted=73,Rejected=0,Hardware Errors=0,Utility=4.38,Intensity=9,Last Share Pool=0,Last Share Time=1330436857

So the devs call is missing all BFL info.
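
For anyone checking this from a script, here is a minimal Python sketch that splits a reply like the one above into sections and lists which device types it contains. It assumes the reply arrives as one string with `|` between sections and `,` between `key=value` pairs, as in the output quoted here. `parse_api_reply` and `device_kinds` are hypothetical helper names, not part of cgminer, and a value that itself contains a comma would confuse this simple split.

```python
def parse_api_reply(raw):
    """Split a cgminer API text reply into a list of {key: value} dicts."""
    sections = []
    for chunk in raw.replace("\x00", "").split("|"):
        fields = {}
        for pair in chunk.split(","):
            key, sep, value = pair.partition("=")
            if sep:  # skip fragments that carry no '='
                fields[key.strip()] = value.strip()
        if fields:
            sections.append(fields)
    return sections


def device_kinds(sections):
    """Return the first key of every section after the STATUS header."""
    return [next(iter(fields)) for fields in sections[1:]]


# Shortened copy of the reply quoted in this post.
SAMPLE = (
    "STATUS=S,Code=9,Msg=2 GPU(s),Description=cgminer 2.2.7"
    "|GPU=0,Enabled=Y,Status=Alive,Temperature=73.50,MHS av=364.64"
    "|GPU=1,Enabled=Y,Status=Alive,Temperature=73.50,MHS av=363.33|"
)

sections = parse_api_reply(SAMPLE)
print(device_kinds(sections))  # only GPU sections are listed, no BFL device
```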
donator
Activity: 1218
Merit: 1079
Gerald Davis
The ADL library isn't just randomly dying here. It happens at 200k accepted shares (this was easy for me to recognize since I have a 32 GH/s mining farm). I'm just pointing out my observation so hopefully it gets fixed.

This time.  Has it died @ 200K previously too?  Does it die at 200K each time? If it is a repeatable and verifiable bug that is more useful than just it dying once @ 200K (which may not have any significance).
full member
Activity: 174
Merit: 100
Found a small bug, when cgminer passes 200k accepted shares in Windows 7 the GPU temp and fan speed columns go completely blank and are also unable to be accessed via the API (which I noticed via ANUBIS).

Ha, +1, I also noticed this, didn't link it with the 200k shares though... ;-)
It's more likely to be simply that the ADL library gave up and now says it's not working.

Not really helpful, but you can check the API config command:
It will say (ADL=Y,ADL in use=N) if all cards are showing blank for the GPU/Fan/Temp values.
If any cards are still working (unexpected) then it will still say (ADL in use=Y)

Also, if you are using ANUBIS - does it have a log of the stats over time?
You should be able to tell exactly what and where if it logs all the stats (I don't know how much it logs)

ANUBIS doesn't log anything, and all the cards are still working and hashing away. It's just a bug that affects the display of those values and makes those specific values unretrievable via the API.
That's what I said Smiley

The ADL library that AMD wrote and cgminer uses is what controls GPU/Fan/Temp.
The ADL library is known to just give up and die sometimes.
cgminer keeps going but can no longer access that information ... that's why it's blank (like if you forget "export DISPLAY=:0" in linux)
You will find that anything in cgminer that requires ADL now no longer works.

That really means: if cgminer was given --auto-gpu or --auto-fan (or both) it is also time to stop and restart cgminer since it is no longer monitoring your fan and/or gpu.

The ADL library isn't just randomly dying here. It happens at 200k accepted shares (this was easy for me to recognize since I have a 32 GH/s mining farm). I'm just pointing out my observation so hopefully it gets fixed.
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
Found a small bug, when cgminer passes 200k accepted shares in Windows 7 the GPU temp and fan speed columns go completely blank and are also unable to be accessed via the API (which I noticed via ANUBIS).

Ha, +1, I also noticed this, didn't link it with the 200k shares though... ;-)
It's more likely to be simply that the ADL library gave up and now says it's not working.

Not really helpful, but you can check the API config command:
It will say (ADL=Y,ADL in use=N) if all cards are showing blank for the GPU/Fan/Temp values.
If any cards are still working (unexpected) then it will still say (ADL in use=Y)

Also, if you are using ANUBIS - does it have a log of the stats over time?
You should be able to tell exactly what and where if it logs all the stats (I don't know how much it logs)

ANUBIS doesn't log anything, and all the cards are still working and hashing away. It's just a bug that affects the display of those values and makes those specific values unretrievable via the API.
That's what I said Smiley

The ADL library that AMD wrote and cgminer uses is what controls GPU/Fan/Temp.
The ADL library is known to just give up and die sometimes.
cgminer keeps going but can no longer access that information ... that's why it's blank (like if you forget "export DISPLAY=:0" in linux)
You will find that anything in cgminer that requires ADL now no longer works.

That really means: if cgminer was given --auto-gpu or --auto-fan (or both) it is also time to stop and restart cgminer since it is no longer monitoring your fan and/or gpu.
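
A minimal Python sketch of that check, for anyone who wants to automate it. It assumes you already have the text of the API config reply as a string and that it contains the `ADL=Y` and `ADL in use=N` fields exactly as described above. The function name `adl_needs_restart` and the trimmed-down sample replies are hypothetical, made up for illustration only.

```python
def adl_needs_restart(config_reply, uses_auto_tuning):
    """Return True when ADL was available but is no longer in use
    and cgminer was started with --auto-gpu and/or --auto-fan."""
    fields = {}
    # Treat section and field separators alike; only key=value pairs matter.
    for pair in config_reply.replace("|", ",").split(","):
        key, sep, value = pair.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    adl_died = fields.get("ADL") == "Y" and fields.get("ADL in use") == "N"
    return adl_died and uses_auto_tuning


# Hypothetical, trimmed-down replies containing only the two fields named above.
DEAD = "CONFIG,ADL=Y,ADL in use=N"
ALIVE = "CONFIG,ADL=Y,ADL in use=Y"

print(adl_needs_restart(DEAD, uses_auto_tuning=True))
print(adl_needs_restart(ALIVE, uses_auto_tuning=True))
```

If the first call reports True, fan and clock management are no longer being handled, so restarting cgminer is the safe move.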
full member
Activity: 174
Merit: 100
Found a small bug, when cgminer passes 200k accepted shares in Windows 7 the GPU temp and fan speed columns go completely blank and are also unable to be accessed via the API (which I noticed via ANUBIS).

Ha, +1, I also noticed this, didn't link it with the 200k shares though... ;-)
It's more likely to be simply that the ADL library gave up and now says it's not working.

Not really helpful, but you can check the API config command:
It will say (ADL=Y,ADL in use=N) if all cards are showing blank for the GPU/Fan/Temp values.
If any cards are still working (unexpected) then it will still say (ADL in use=Y)

Also, if you are using ANUBIS - does it have a log of the stats over time?
You should be able to tell exactly what and where if it logs all the stats (I don't know how much it logs)

ANUBIS doesn't log anything, and all the cards are still working and hashing away. It's just a bug that affects the display of those values and makes those specific values unretrievable via the API.
sr. member
Activity: 308
Merit: 250
I would love to see an expanded version of this that goes down to 100 (or lower) memclock using latest version of cgminer, latest drivers, and SDK 2.1.
Here are some tests

You sir, are full of awesome. Will we be able to see the other test results soon?
Do you have a 5870 to test with?

@tenzor Were you running this on linux? And are you willing to share the script you used to generate these results?

Here it is
http://pastebin.com/gpFDyXef

requires screen

Copy it into cgminer's directory. Make sure you change the values in the config section. Results will appear in the directory defined in the "results" variable. Make sure it is writable.
cgminer always includes the default config, so rename or remove it. The config used to run cgminer is set in the "cgminer_config_path" variable. Here is my config for device #0 (test0.conf)
Code:
{
"pools" : [
        {
                "url" : "http://pit.deepbit.net:8332",
                "user" : "XXX",
                "pass" : "XXX"
        }
],
"intensity" : "9",
"gpu-engine" : "0-930",
"gpu-fan" : "0-85",
"gpu-powertune" : "0",
"gpu-vddc" : "0.000",
"temp-cutoff" : "95",
"temp-overheat" : "85",
"temp-target" : "75",
"auto-fan" : true,
"expiry" : "120",
"gpu-threads" : "2",
"log" : "5",
"no-restart" : true,
"queue" : "1",
"retry-pause" : "5",
"scan-time" : "60",
"temp-hysteresis" : "3",
"api-listen" : true,
"donation" : "0.00",
"shares" : "0",
"kernel-path" : "/usr/local/bin"
}

Sometimes the X server hangs. In this case the script will create a file "reboot_required" in the current dir, so you could use another script to reboot automatically.

The first parameter is the device number to run tests on. It is 0 by default.

Also start any other instances of cgminer before running this script, because they will overwrite the clock values of other GPUs.

Sometimes identical GPUs with the same clocks produce different MH/s, so be careful comparing results from different GPUs.

Results will appear in subfolders in json files. I wrote a PHP script that draws graphs. http://dl.dropbox.com/u/569082/cgminer-tester.zip Put it somewhere in the webserver webroot, and put the tests in the tests dir inside.


Currently I am testing 5850 all kernels 150-300 memclock. I expect results will be ready after 3 days
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
Found a small bug, when cgminer passes 200k accepted shares in Windows 7 the GPU temp and fan speed columns go completely blank and are also unable to be accessed via the API (which I noticed via ANUBIS).

Ha, +1, I also noticed this, didn't link it with the 200k shares though... ;-)
It's more likely to be simply that the ADL library gave up and now says it's not working.

Not really helpful, but you can check the API config command:
It will say (ADL=Y,ADL in use=N) if all cards are showing blank for the GPU/Fan/Temp values.
If any cards are still working (unexpected) then it will still say (ADL in use=Y)

Also, if you are using ANUBIS - does it have a log of the stats over time?
You should be able to tell exactly what and where if it logs all the stats (I don't know how much it logs)
sr. member
Activity: 349
Merit: 250
Found a small bug, when cgminer passes 200k accepted shares in Windows 7 the GPU temp and fan speed columns go completely blank and are also unable to be accessed via the API (which I noticed via ANUBIS).

Ha, +1, I also noticed this, didn't link it with the 200k shares though... ;-)
full member
Activity: 174
Merit: 100
Found a small bug, when cgminer passes 200k accepted shares in Windows 7 the GPU temp and fan speed columns go completely blank and are also unable to be accessed via the API (which I noticed via ANUBIS).
hero member
Activity: 769
Merit: 500
I'll be taking an extended break from coding on cgminer shortly since most things are stable at the moment for my sanity.
This begins now and I have disabled all notifications from the forum and github so do not be surprised when I don't respond for many days. Email me if it's urgent but try to use the forums please as there are heaps of helpful people here. Thanks everyone for your understanding.

Take your time, enjoy your break and come back, when the fire is burning again Smiley.

Dia
sr. member
Activity: 392
Merit: 250
Some of my 5970s are not accepting any commands from cgminer. I had to use Afterburner on about half my rigs. Is this to be expected?

What do you mean not accepting commands?

You can't change any parameter to any value?  If so that is something new but likely you mean "I am trying to raise voltage or select a clock way outside what bios allows" right? 

cgminer can't FORCE the card to do anything (not even return to stock clocks).  All it can do is ask.
cgminer: "card #1 please raise clock to 800 MHz"
card #1: "request received"
card #1 (internally): fuck off cgminer I am tired.

Some ignore any change, even returning to stock. (excluding fan speed)
If I put in 725 engine in cgminer it displays back
Engine Clock: 157 Mhz
Memory Clock: 150 Mhz
Vddc:0.950 V


Vbs
hero member
Activity: 504
Merit: 500
Edit 3: Well, I took one idea from your code and created this one Cheesy.
Code:
if ((V[7].x == 0x136032edU) + (V[7].y == 0x136032edU))
	output[FOUND] = output[NFLAG & nonce.x] = (V[7].x == 0x136032edU) ? nonce.x : nonce.y;

Nice! Grin

Here is alternative solution that should have the same performance as yours (same min & max exec time, same alu count, etc):
Code:
V[7] -= 0x136032EDu;
uint result = V[7].x ? 0u : nonce.x;
result = V[7].y ? result : nonce.y;
if (result)
	output[FOUND] = output[NFLAG & result] = result;

Would be nice to see if there's any significant difference between both on real GCN hardware! Smiley

---------------------------------
EDIT: Another similar performance solution:
Code:
bool result = V[7].x == 0x136032EDu ? true : false;
result = V[7].y == 0x136032EDu ? true : result;
if (result)
	output[FOUND] = output[NFLAG & nonce.x] = (V[7].x == 0x136032EDu) ? nonce.x : nonce.y;
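
To compare which nonce each of the three snippets in this exchange would report, here is a small Python model of the selection logic only. It is a sketch under assumptions, not kernel code: `V[7]` and `nonce` are modelled as 2-tuples for the `.x` and `.y` lanes, the function names are hypothetical, and the write into `output[]` is reduced to a return value (`None` meaning nothing is written).

```python
TARGET = 0x136032ED
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned wraparound


def variant_flag_sum(v7, nonce):
    """First snippet: add the two comparisons, prefer nonce.x."""
    if (v7[0] == TARGET) + (v7[1] == TARGET):
        return nonce[0] if v7[0] == TARGET else nonce[1]
    return None


def variant_subtract(v7, nonce):
    """Second snippet: subtract the target, then select on zero lanes."""
    dx = (v7[0] - TARGET) & MASK32
    dy = (v7[1] - TARGET) & MASK32
    result = 0 if dx else nonce[0]
    result = result if dy else nonce[1]
    return result if result else None  # 'if (result)' skips a zero nonce


def variant_bool(v7, nonce):
    """Third snippet: boolean flag, prefer nonce.x."""
    found = v7[0] == TARGET
    found = True if v7[1] == TARGET else found
    if found:
        return nonce[0] if v7[0] == TARGET else nonce[1]
    return None


variants = (variant_flag_sum, variant_subtract, variant_bool)
for v7 in ((TARGET, 1), (1, TARGET), (1, 2), (TARGET, TARGET)):
    print([f(v7, (10, 11)) for f in variants])
```

In this model all three agree when one lane or no lane matches. They differ only in corner cases: when both lanes match, the subtract variant reports `nonce.y` while the other two report `nonce.x`, and the subtract variant writes nothing when the matching nonce is 0.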

full member
Activity: 210
Merit: 100
I'll be taking an extended break from coding on cgminer shortly since most things are stable at the moment for my sanity.
This begins now and I have disabled all notifications from the forum and github so do not be surprised when I don't respond for many days. Email me if it's urgent but try to use the forums please as there are heaps of helpful people here. Thanks everyone for your understanding.
Roger that. You do deserve to get some real life, have a great time.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
I'll be taking an extended break from coding on cgminer shortly since most things are stable at the moment for my sanity.
This begins now and I have disabled all notifications from the forum and github so do not be surprised when I don't respond for many days. Email me if it's urgent but try to use the forums please as there are heaps of helpful people here. Thanks everyone for your understanding.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Trying to get familiar with some of the params I never use:

Code:
--scan-time|-s  Upper bound on time spent scanning current work, in seconds (default: 60)
--expiry|-E    Upper bound on how many seconds after getting work we consider a share from it stale (default: 120)

Does scan-time or expiry have any effect if pool is using LP? 
Would there be any advantage to setting it shorter when using p2pool for example (avg LP interval ~10 sec)?

Code:
--retry-pause|-R  Number of seconds to pause, between retries (default: 5)
I am assuming this refers to pool <-> miner communication, not miner <-> API client communication. 
Or is it not used for pool mining and only used for solo/bitcoind mining?


scan time is set high intentionally with longpoll since longpoll tells the miner when to get new work. Setting it less than longpoll time will only make you throw out good work.
Expiry is irrelevant when you have submit stale enabled or the pool asks for submitold (as p2pool does).

Retry pause is between miner and pool after each communication failure. They really shouldn't happen at all when talking to a p2pool node running on the same machine but maybe if talking to a node elsewhere they might.
donator
Activity: 1218
Merit: 1079
Gerald Davis
Trying to get familiar with some of the params I never use:

Code:
--scan-time|-s  Upper bound on time spent scanning current work, in seconds (default: 60)
--expiry|-E    Upper bound on how many seconds after getting work we consider a share from it stale (default: 120)

Does scan-time or expiry have any effect if pool is using LP? 
Would there be any advantage to setting it shorter when using p2pool for example (avg LP interval ~10 sec)?

Code:
--retry-pause|-R  Number of seconds to pause, between retries (default: 5)
I am assuming this refers to pool <-> miner communication, not miner <-> API client communication. 
Or is it not used for pool mining and only used for solo/bitcoind mining?

full member
Activity: 210
Merit: 100
BTW, I've tried to change voltage with Trixx, but GPU-Z did not see the change.
Was accepted by the driver but refused by bios, I guess.
Precisely. The driver doesn't check whether or not the card actually honored the change requests and assumes that it did.
Programs that query the driver might get an incorrect answer. The only way to check the GPU status is to query the device, not the driver.

Instead of
- Hey, driver, what clocks is that hd6970 running at?
- Ummm... I told it to run at 950/300 so it's running at 950/300.    (the card refused to downclock the memory)
 *facepalm*
use
- hd6970, what clocks are you running at?
- 950/1370. The driver wanted 950/300 but I told it to shove off.
full member
Activity: 210
Merit: 100
ckolivas,

I'm running one 7970 card with

 --auto-gpu --auto-fan --gpu-engine 450-1179 --gpu-memdiff -150 --gpu-powertune 20 -q -I 12 -k diakgcn -d 0 -v 2 -w 256

right now, I see Q: counter steadily increasing, is that something to worry about?

Also, what is the efficiency (E:) that you get in your Linux setup?

On win7, with Sapphire card I'm getting AT avg of 680-685Mh/s, efficiency swings between 75-85%.  Is that normal?

Also, I've noticed that sometimes one thread hash rate drops significantly and then the whole thing recovers to above 650 (for two).
Is that just sampling of context switching artifact?  Or my intensity is too high?

On Win7, I cannot run this card to 1200Mhz the way you run it on Linux.

Does anybody get better AT avg on Windows?
It's all explained in the README file:

Q:   The number of requested (Queued) work items from the pools
The sum of all work items cgminer requested from the pools has little choice but to go up.

E:   The Efficiency defined as number of shares returned / work item
In other words, efficiency is the ratio of accepted shares to all requested work items (E=A/Q)
Efficiency >=75% is nothing unusual, it means cgminer is able to process 3/4 of all the work items it requests.
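
As a worked example of E=A/Q, here is a minimal Python sketch. The function name `efficiency_percent` and the sample counts are made up for illustration, and returning 0 for an empty queue is an assumption of this sketch rather than something cgminer is known to do.

```python
def efficiency_percent(accepted, queued):
    """Efficiency as described above: accepted shares per requested work item, in percent."""
    if queued <= 0:
        return 0.0  # nothing requested yet, so there is nothing to measure
    return 100.0 * accepted / queued


# Illustrative counts: 300 accepted shares out of 400 requested work items.
print(efficiency_percent(300, 400))
```

Because A and Q are both running totals, the figure moves a lot early in a session and settles as the counts grow.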

The 10 second average hash rate (marked as 10s in cgminer main window) will oscillate quite a bit, you should only pay heed to the total average (avg).

Is that the latest version of cgminer you're running? 2.2.7 had a minor glitch where the hash rate would fall on LP requests.
This issue is fixed in version 2.3.1.
hero member
Activity: 769
Merit: 500
ckolivas,

I'm running one 7970 card with

 --auto-gpu --auto-fan --gpu-engine 450-1179 --gpu-memdiff -150 --gpu-powertune 20 -q -I 12 -k diakgcn -d 0 -v 2 -w 256

right now, I see Q: counter steadily increasing, is that something to worry about?

Also, what is the efficiency (E:) that you get in your Linux setup?

On win7, with Sapphire card I'm getting AT avg of 680-685Mh/s, efficiency swings between 75-85%.  Is that normal?

Also, I've noticed that sometimes one thread hash rate drops significantly and then the whole thing recovers to above 650 (for two).
Is that just sampling of context switching artifact?  Or my intensity is too high?

On Win7, I cannot run this card to 1200Mhz the way you run it on Linux.

Does anybody get better AT avg on Windows?

Perhaps I can share some observations I made, because you use diakgcn (cool ^^) and I'm on Windows, too Smiley.
It's normal that Q rises, as this is a counter for the total requested work items. The efficiency swing is also normal, because finding a valid share is simply a matter of luck: you can have minutes where no share is found, which lowers efficiency, and conversely, if you are lucky and find many shares in a short period of time, efficiency rises.

As for diakgcn, Con and I can confirm your findings: the displayed hash rate is not very stable with that kernel, so the best thing you can do is compare the final hash rate, which is displayed after you quit cgminer. I have an XFX Core Edition card and I'm not able to hit 1200 MHz without raising the VCore via Afterburner, so that seems to depend on the card, and perhaps Windows in general (because of the GUI, drivers or whatever) allows slightly lower stable clocks than Linux does.

Con always recommends setting -I 9 on Windows, as higher values will raise CPU usage up to 100% for one CPU core, and I can confirm this. If CPU usage or power usage doesn't matter, you can use whatever intensity you like and which works for you (-I 11 is what Con uses to bench kernels).

Dia