Topic: OFFICIAL CGMINER mining software thread for linux/win/osx/mips/arm/r-pi 4.11.0 - page 628. (Read 5805971 times)

-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
I'll be taking an extended break from coding on cgminer shortly, for my sanity, since most things are stable at the moment.
This begins now, and I have disabled all notifications from the forum and github, so do not be surprised when I don't respond for many days. Email me if it's urgent, but please try to use the forums as there are heaps of helpful people here. Thanks, everyone, for your understanding.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Trying to get familiar with some of the params I never use:

Code:
--scan-time|-s  Upper bound on time spent scanning current work, in seconds (default: 60)
--expiry|-E    Upper bound on how many seconds after getting work we consider a share from it stale (default: 120)

Does scan-time or expiry have any effect if the pool is using LP?
Would there be any advantage to setting it shorter when using p2pool for example (avg LP interval ~10 sec)?

Code:
--retry-pause|-R  Number of seconds to pause, between retries (default: 5)
I am assuming this refers to pool <-> miner communication, not miner <-> API client communication.
Or is it not used for pool mining and only used for solo/bitcoind mining?


Scan time is set high intentionally with longpoll, since longpoll tells the miner when to get new work. Setting it to less than the longpoll interval will only make you throw out good work.
Expiry is irrelevant when you have submit stale enabled or the pool asks for submitold (as p2pool does).

Retry pause is the pause between the miner and pool after each communication failure. Failures really shouldn't happen at all when talking to a p2pool node running on the same machine, but they might when talking to a node elsewhere.
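In other words, for p2pool the defaults are fine and you can leave --scan-time, --expiry and --retry-pause alone. A minimal sketch of an invocation against a local p2pool node (the URL, payout address and password below are just placeholders):

Code:
./cgminer -o http://127.0.0.1:9332 -u YourPayoutAddress -p x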
donator
Activity: 1218
Merit: 1079
Gerald Davis
Trying to get familiar with some of the params I never use:

Code:
--scan-time|-s  Upper bound on time spent scanning current work, in seconds (default: 60)
--expiry|-E    Upper bound on how many seconds after getting work we consider a share from it stale (default: 120)

Does scan-time or expiry have any effect if the pool is using LP?
Would there be any advantage to setting it shorter when using p2pool for example (avg LP interval ~10 sec)?

Code:
--retry-pause|-R  Number of seconds to pause, between retries (default: 5)
I am assuming this refers to pool <-> miner communication, not miner <-> API client communication.
Or is it not used for pool mining and only used for solo/bitcoind mining?

full member
Activity: 210
Merit: 100
BTW, I've tried to change the voltage with Trixx, but GPU-Z did not see the change.
It was accepted by the driver but refused by the BIOS, I guess.
Precisely. The driver doesn't check whether or not the card actually honored the change request and simply assumes that it did.
Programs that query the driver might therefore get an incorrect answer. The only way to check the GPU's status is to query the device, not the driver.

Instead of
- Hey, driver, what clocks is that hd6970 running at?
- Ummm... I told it to run at 950/300 so it's running at 950/300.    (the card refused to downclock the memory)
 *facepalm*
use
- hd6970, what clocks are you running at?
- 950/1370. The driver wanted 950/300 but I told it to shove off.
full member
Activity: 210
Merit: 100
ckolivas,

I'm running one 7970 card with

 --auto-gpu --auto-fan --gpu-engine 450-1179 --gpu-memdiff -150 --gpu-powertune 20 -q -I 12 -k diakgcn -d 0 -v 2 -w 256

Right now I see the Q: counter steadily increasing; is that something to worry about?

Also, what is the efficiency (E:) that you get in your Linux setup?

On Win7, with a Sapphire card, I'm getting an avg of 680-685 Mh/s and efficiency swings between 75-85%.  Is that normal?

Also, I've noticed that sometimes one thread's hash rate drops significantly and then the whole thing recovers to above 650 (for two).
Is that just a sampling or context-switching artifact?  Or is my intensity too high?

On Win7, I cannot run this card at 1200 MHz the way you run it on Linux.

Does anybody get a better avg on Windows?
It's all explained in the README file:

Q:   The number of requested (Queued) work items from the pools
The sum of all work items cgminer requested from the pools has little choice but to go up.

E:   The Efficiency defined as number of shares returned / work item
In other words, efficiency is the ratio of accepted shares to all requested work items (E=A/Q)
Efficiency >=75% is nothing unusual; it means cgminer is able to process 3/4 of all the work items it requests.

The 10 second average hash rate (marked as 10s in the cgminer main window) will oscillate quite a bit; you should only pay heed to the total average (avg).

Is that the latest version of cgminer you're running? 2.2.7 had a minor glitch where the hash rate would fall on LP requests.
This issue is fixed in version 2.3.1.
hero member
Activity: 772
Merit: 500
ckolivas,

I'm running one 7970 card with

 --auto-gpu --auto-fan --gpu-engine 450-1179 --gpu-memdiff -150 --gpu-powertune 20 -q -I 12 -k diakgcn -d 0 -v 2 -w 256

Right now I see the Q: counter steadily increasing; is that something to worry about?

Also, what is the efficiency (E:) that you get in your Linux setup?

On Win7, with a Sapphire card, I'm getting an avg of 680-685 Mh/s and efficiency swings between 75-85%.  Is that normal?

Also, I've noticed that sometimes one thread's hash rate drops significantly and then the whole thing recovers to above 650 (for two).
Is that just a sampling or context-switching artifact?  Or is my intensity too high?

On Win7, I cannot run this card at 1200 MHz the way you run it on Linux.

Does anybody get a better avg on Windows?

Perhaps I can share some observations I made, because you use diakgcn (cool ^^) and I'm on Windows, too Smiley.
It's normal that Q rises, as this is a counter for the total requested work items. The efficiency swing is also normal, because it's simply luck when a valid share is found ... so you can have minutes where no share is found, which lowers efficiency, and the opposite is true: if you are lucky and find many shares in a short period of time, efficiency rises.

As for diakgcn, Con and I can confirm your findings: the displayed hash-rate is not very stable with that kernel, so the best thing you can do is compare the final hash-rate, which is displayed after you quit CGMINER. I have an XFX Core Edition card and I'm not able to hit 1200 MHz without raising the VCore via AfterBurner, so that seems to depend on the card, and perhaps Windows in general (because of the GUI / drivers or whatever) allows a bit lower stable clocks than Linux does.

Con always recommends setting -I 9 on Windows, as higher values will raise CPU usage up to 100% for one CPU core, and I can confirm this. If CPU usage or power usage doesn't matter, you can use whatever intensity you like and which works for you (-I 11 is what Con uses to bench kernels).
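For example, a sketch using your command line from above with just -I 9 swapped in (everything else unchanged):

Code:
 --auto-gpu --auto-fan --gpu-engine 450-1179 --gpu-memdiff -150 --gpu-powertune 20 -q -I 9 -k diakgcn -d 0 -v 2 -w 256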

Dia
donator
Activity: 919
Merit: 1000
[...]
kano goes read the code to try get the numbers ...
while you did ^this, you maybe learned one obvious thing I don't get: where in the code is the nonce incremented?

I understood that FPGAs count the nonce range up in their firmware, but I can't see this being done for GPUs. Either I'm double-blind or using grep wrongly, but there is no nonce increment in the source code, or is there? Undecided
sr. member
Activity: 308
Merit: 250
About the graphs, here's what I did recently.
It seems to be wrong, because I ran 4 instances of cgminer a short interval apart with small memclock differences, and each new cgminer process overrides the memclock for all GPUs even if you use -d 0

So, this does not work
Code:
./cgminer -d0 --remove-disabled --gpu-memclock=305
this works
Code:
./cgminer -d0 --remove-disabled --gpu-memclock=305,0,0,0
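The same per-card list also works from a cgminer JSON config file. A rough sketch (the pool details are placeholders; -d 0 and --remove-disabled can stay on the command line as above):

Code:
{
"pools" : [
	{
		"url" : "http://pool.example.com:8332",
		"user" : "username",
		"pass" : "password"
	}
],
"gpu-memclock" : "305,0,0,0"
}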

donator
Activity: 1218
Merit: 1079
Gerald Davis
... with intensity 9 it divides the work into pieces of (2^(15+9) * vectors) hashes
So about 16 million hashes with -v 1 (there are other limitations it does to this number but it looks like that's maybe correct for me with -I 9)

Thanks for that.  Intensity is an often misunderstood value.  Hindsight being 20/20, a less "biased" name like "chunk-size" would have been better.  I mean biased as in "why would I want to be less intense, I want the most hashez?"

Still it works out like this:
Code:
Millions of hashes calculated in each batch.

 I      V1      V2      V3      V4
 1    0.07    0.13    0.20    0.26
 2    0.13    0.26    0.39    0.52
 3    0.26    0.52    0.79    1.05
 4    0.52    1.05    1.57    2.10
 5    1.05    2.10    3.15    4.19
 6    2.10    4.19    6.29    8.39
 7    4.19    8.39   12.58   16.78
 8    8.39   16.78   25.17   33.55
 9   16.78   33.55   50.33   67.11
10   33.55   67.11  100.66  134.22
11   67.11  134.22  201.33  268.44
12  134.22  268.44  402.65  536.87
13  268.44  536.87  805.31 1073.74
14  536.87 1073.74 1610.61 2147.48
15 1073.74 2147.48 3221.23 4294.97
16 2147.48 4294.97 4294.97
17 4294.97

So it will likely be some time before we have a processor capable of handling the full nonce range (4.295 billion hashes) in one pass efficiently.
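For anyone who wants to regenerate the table, here is a small stand-alone sketch (not cgminer source; it simply applies kano's 2^(15+I) * vectors estimate and caps the result at the full 2^32 nonce range of one work item):

Code:
/* Sketch only: print millions of hashes per batch for each intensity and
 * vector width, using the 2^(15+intensity) * vectors estimate, capped at
 * the 2^32 nonces of a single work item. */
#include <stdio.h>

int main(void)
{
	const double full_range = 4294967296.0;	/* 2^32 */
	int intensity, vectors;

	printf(" I      V1      V2      V3      V4\n");
	for (intensity = 1; intensity <= 17; intensity++) {
		printf("%2d", intensity);
		for (vectors = 1; vectors <= 4; vectors++) {
			double hashes = (double)(1ULL << (15 + intensity)) * vectors;

			if (hashes > full_range)
				hashes = full_range;
			printf(" %7.2f", hashes / 1e6);	/* millions of hashes */
		}
		printf("\n");
	}
	return 0;
}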
hero member
Activity: 772
Merit: 500
Dia, have you thought about changing (for example, in VECTORS2),
Code:
	V[7] ^= 0x136032edU;
	bool result = V[7].x & V[7].y;
	if (!result) {
		if (!V[7].x)
			output[FOUND] = output[NFLAG & nonce.x] = nonce.x;
		if (!V[7].y)
			output[FOUND] = output[NFLAG & nonce.y] = nonce.y;
	}
to
Code:
	uint result = V[7].x == 0x136032edU ? nonce.x : 0u;
	result = V[7].y == 0x136032edU ? nonce.y : result;
	if (result)
		output[FOUND] = output[NFLAG & result] = result;

This should give a small boost.

Btw, the probability that both nonce.x and nonce.y are correct at the same time is 1/(2^64). This means that on a 400MH/s card, this will probably happen once in every ~1463 years!  Grin

I came up with this (which is by far the fastest solution I have found to date), and am currently testing how it works (things are looking good so far).

Code:
	if ((V[7].x == 0x136032ed) ^ (V[7].y == 0x136032ed)) {
		output[FOUND] = output[NFLAG & nonce.x] = (V[7].x == 0x136032ed) > (V[7].y == 0x136032ed) ? nonce.x : nonce.y;
	}

This checks V[7].x and V[7].y for a valid nonce and will continue if only one is true (as we discussed before, the chance that both are true is too small to worry about lowering efficiency). The next line checks whether .x or .y contains the valid nonce and then writes it to output[NFLAG & nonce.x], which is a pseudo-random position, so there is no need to ever use .y.

I will open a pull request soon to get it integrated into CGMINER, and I hope Con accepts it for diakgcn.

Edit: Your idea seems even faster, which is pretty cool if it works ... will play around with it.

Edit 2: I now see one problem with your code: the most used code path is outside the if-clause and is more expensive on GCN in comparison to Con's current code or my posted version. My version uses the fewest instructions if the if-clause is false ... your version would be the best if there were more positive nonces than negative ones!

Edit 3: Well, I took one idea from your code and created this one Cheesy.
Code:
	if ((V[7].x == 0x136032edU) + (V[7].y == 0x136032edU))
		output[FOUND] = output[NFLAG & nonce.x] = (V[7].x == 0x136032edU) ? nonce.x : nonce.y;

Edit 4: Created the pull-request as this looks like a great little change Smiley!

Dia
donator
Activity: 1218
Merit: 1079
Gerald Davis
Some of my 5970's are not accepting any commands from cgminer. I had to use afterburner on about half my rigs. This is to be expected?

What do you mean not accepting commands?

You can't change any parameter to any value?  If so, that is something new, but you likely mean "I am trying to raise the voltage or select a clock way outside what the BIOS allows", right?

cgminer can't FORCE the card to do anything (not even return to stock clocks).  All it can do is ask.
cgminer: "card #1 please raise clock to 800 Mhz"
card #1: "request recevied"
card #1 (internally): fuck off cgminer I am tired.
legendary
Activity: 1379
Merit: 1003
nec sine labore
...
kano,

intensity has an even bigger influence than CPU speed, at intensity 'd', same 800 MHz CPU speed, cgminer uses around 50% CPU time

Code:
...

spiccioli

Actually - logically, as the intensity drops, the CPU usage must increase.

When the intensity goes down, it means the work is broken up into smaller pieces.
More pieces means more CPU processing and transferring for the same amount of hashes.

I run 2x6950 on a 3.07GHz i3 CPU running at 1.2GHz and -I 9 and get ~2% CPU (hmm, 2 seems low - I would have guessed 5 ...)
(that % number means percent of a single core in Linux, coz if you e.g. have 2 cores the total is 200% CPU)
So it's not totally surprising to get high CPU use with 5 GPUs, a low-power CPU and lower intensity.

kano,

with lower intensity the occupation of the GPU is lower, so I was thinking that the thread serving a GPU at lower intensity had to be sleeping a lot more, while it is, instead, sleeping a lot less.

Code:
73.0 C  F: 24% (1335 RPM)  E: 800 MHz  M: 150 Mhz  V: 1.000V  A: 80% P: 0%
Last initialised: [2012-02-26 23:06:14]
Intensity: Dynamic (only one thread in use)
Thread 1: 299.8 Mh/s Enabled ALIVE

How many more hashes (or how much longer does it keep counting) are counted by a GPU when intensity is increased by one?

spiccioli.
Yes the CPU is sleeping a lot less with lower intensity as I said here:
Quote
When the intensity goes down, it means the work is broken up into smaller pieces.
More pieces means more CPU processing and transferring for the same amount of hashes.

I'll explain again I guess:
A work request is 2^32 hashes (a full nonce range)
It could maybe be done in one long go or 2^32 tiny goes.
It is actually, of course, done somewhere in the middle of that.

kano goes read the code to try get the numbers ...

OK it looks like (though I could be wrong) that with intensity 9 it divides the work into pieces of (2^(15+9) * vectors) hashes
So about 16 million hashes with -v 1 (there are other limitations it does to this number but it looks like that's maybe correct for me with -I 9)

This figure doesn't really matter if it's wrong, but how I've described how it works is correct.
Anyway assuming it is (2^(15+9)) or 16 million - it takes about 52ms for a single 6950 GPU to do that.
Thus every 52ms (if I had only one 6950) the CPU needs to set up and feed data into the GPU, then wait about 52ms to get the answer back, then set up the next 16 million hashes and feed them into the GPU ... over and over again ... until it has processed the whole 2^32 hashes

If instead the intensity were 8, each setup would contain half the number of hashes (2^(15+8), or 8 million, taking approx 26ms), so the CPU would have to set up work twice as often to complete the whole 2^32 nonce range - thus the CPU usage would increase, since it's doing twice as many setups for the same amount of work to do the complete 2^32 hashes

kano,

interesting numbers, in particular for p2pool, because we can now work out, given our hashing power, what the maximum intensity we can use is.

So a 300 Mh/s card at intensity 6 takes around 6ms to do a batch, and so wastes around 1% of its mining time when there are fast, 1-second-long rounds.

But if we say that median rounds are 5 seconds long, at intensity 6 we are wasting just 0.1% of our time.
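A rough stand-alone sketch of that arithmetic (the 300 MH/s card, intensity 6 and single vectors are assumptions, and it reuses kano's 2^(15+I) * vectors batch-size estimate):

Code:
/* Sketch only: estimate the batch time and the worst-case share of a fast
 * p2pool round lost to the batch in flight when a longpoll arrives. */
#include <stdio.h>

int main(void)
{
	double hashrate = 300e6;	/* assumed card speed, hashes per second */
	int intensity = 6, vectors = 1;	/* assumed settings */
	double round_seconds = 1.0;	/* a fast p2pool round */

	double batch_hashes = (double)(1UL << (15 + intensity)) * vectors;
	double batch_ms = batch_hashes / hashrate * 1000.0;
	double wasted_pct = batch_ms / (round_seconds * 1000.0) * 100.0;

	printf("batch: %.0f hashes, %.1f ms; worst-case waste per %.0f s round: %.1f%%\n",
	       batch_hashes, batch_ms, round_seconds, wasted_pct);
	return 0;
}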

Thanks for your explanation.

spiccioli
Vbs
hero member
Activity: 504
Merit: 500
Dia, have you thought about changing (for example, in VECTORS2),
Code:
	V[7] ^= 0x136032edU;
	bool result = V[7].x & V[7].y;
	if (!result) {
		if (!V[7].x)
			output[FOUND] = output[NFLAG & nonce.x] = nonce.x;
		if (!V[7].y)
			output[FOUND] = output[NFLAG & nonce.y] = nonce.y;
	}
to
Code:
	uint result = V[7].x == 0x136032edU ? nonce.x : 0u;
	result = V[7].y == 0x136032edU ? nonce.y : result;
	if (result)
		output[FOUND] = output[NFLAG & result] = result;

This should give a small boost.

Btw, the probability that both nonce.x and nonce.y are correct at the same time is 1/(2^64). This means that on a 400MH/s card, this will probably happen once in every ~1463 years!  Grin
legendary
Activity: 4634
Merit: 1851
Linux since 1997 RedHat 4
...
kano,

intensity has an even bigger influence than CPU speed, at intensity 'd', same 800 MHz CPU speed, cgminer uses around 50% CPU time

Code:
...

spiccioli

Actually - logically, as the intensity drops, the CPU usage must increase.

When the intensity goes down, it means the work is broken up into smaller pieces.
More pieces means more CPU processing and transferring for the same amount of hashes.

I run 2x6950 on a 3.07GHz i3 CPU running at 1.2GHz and -I 9 and get ~2% CPU (hmm, 2 seems low - I would have guessed 5 ...)
(that % number means percent of a single core in Linux, coz if you e.g. have 2 cores the total is 200% CPU)
So it's not totally surprising to get high CPU use with 5 GPUs, a low-power CPU and lower intensity.

kano,

with lower intensity the occupation of the GPU is lower, so I was thinking that the thread serving a GPU at lower intensity had to be sleeping a lot more, while it is, instead, sleeping a lot less.

Code:
73.0 C  F: 24% (1335 RPM)  E: 800 MHz  M: 150 Mhz  V: 1.000V  A: 80% P: 0%
Last initialised: [2012-02-26 23:06:14]
Intensity: Dynamic (only one thread in use)
Thread 1: 299.8 Mh/s Enabled ALIVE

How many more hashes (or how much longer does it keep counting) are counted by a GPU when intensity is increased by one?

spiccioli.
Yes the CPU is sleeping a lot less with lower intensity as I said here:
Quote
When the intensity goes down, it means the work is broken up into smaller pieces.
More pieces means more CPU processing and transferring for the same amount of hashes.

I'll explain again I guess:
A work request is 2^32 hashes (a full nonce range)
It could maybe be done in one long go or 2^32 tiny goes.
It is actually, of course, done somewhere in the middle of that.

kano goes read the code to try get the numbers ...

OK it looks like (though I could be wrong) that with intensity 9 it divides the work into pieces of (2^(15+9) * vectors) hashes
So about 16 million hashes with -v 1 (there are other limitations it does to this number but it looks like that's maybe correct for me with -I 9)

This figure doesn't really matter if it's wrong, but how I've described how it works is correct.
Anyway assuming it is (2^(15+9)) or 16 million - it takes about 52ms for a single 6950 GPU to do that.
Thus every 52ms (if I had only one 6950) the CPU needs to set up and feed data into the GPU, then wait about 52ms to get the answer back, then set up the next 16 million hashes and feed them into the GPU ... over and over again ... until it has processed the whole 2^32 hashes

If instead the intensity were 8, each setup would contain half the number of hashes (2^(15+8), or 8 million, taking approx 26ms), so the CPU would have to set up work twice as often to complete the whole 2^32 nonce range - thus the CPU usage would increase, since it's doing twice as many setups for the same amount of work to do the complete 2^32 hashes
legendary
Activity: 1379
Merit: 1003
nec sine labore
...
kano,

intensity has an even bigger influence than CPU speed, at intensity 'd', same 800 MHz CPU speed, cgminer uses around 50% CPU time

Code:
...

spiccioli

Actually - logically, as the intensity drops, the CPU usage must increase.

When the intensity goes down, it means the work is broken up into smaller pieces.
More pieces means more CPU processing and transferring for the same amount of hashes.

I run 2x6950 on a 3.07GHz i3 CPU running at 1.2GHz and -I 9 and get ~2% CPU (hmm, 2 seems low - I would have guessed 5 ...)
(that % number means percent of a single core in Linux, coz if you e.g. have 2 cores the total is 200% CPU)
So it's not totally surprising to get high CPU use with 5 GPUs, a low-power CPU and lower intensity.

kano,

with lower intensity the occupation of the GPU is lower, so I was thinking that the thread serving a GPU at lower intensity had to be sleeping a lot more, while it is, instead, sleeping a lot less.

Code:
73.0 C  F: 24% (1335 RPM)  E: 800 MHz  M: 150 Mhz  V: 1.000V  A: 80% P: 0%
Last initialised: [2012-02-26 23:06:14]
Intensity: Dynamic (only one thread in use)
Thread 1: 299.8 Mh/s Enabled ALIVE

How many more hashes (or how much longer does it keep counting) are counted by a GPU when intensity is increased by one?

spiccioli.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Some of my 5970's are not accepting any commands from cgminer. I had to use afterburner on about half my rigs. This is to be expected?
That would be precisely what 3 posts in a row just said.
sr. member
Activity: 392
Merit: 250
Some of my 5970's are not accepting any commands from cgminer. I had to use afterburner on about half my rigs. This is to be expected?
hero member
Activity: 772
Merit: 500
Can cgminer set GPU/MEM frequencies outside the BIOS ranges? For example, the way something like MSI Afterburner in Windows can?

The answer is no, it can just set what the ADL / the driver allows it to, which is more limited than what AfterBurner can offer.

Dia
The answer is yes, it will allow you to send commands outside the BIOS range, but the GPU can happily ignore them. It can do more than the BIOS range, but less than Windows-specific tools that bypass the driver and poke it directly.

He asked 2 questions and I answered the 2nd one :-P ...

Dia
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Can cgminer set GPU/MEM frequencies outside the BIOS ranges? For example, the way something like MSI Afterburner in Windows can?

The answer is no, it can just set what the ADL / the driver allows it to, which is more limited than what AfterBurner can offer.

Dia
The answer is yes, it will allow you to send commands outside the BIOS range, but the GPU can happily ignore them. It can do more than the BIOS range, but less than Windows-specific tools that bypass the driver and poke it directly.
hero member
Activity: 772
Merit: 500
Can cgminer set GPU/MEM frequencies outside the BIOS ranges? For example, the way something like MSI Afterburner in Windows can?

The answer is no, it can just set what the ADL / the driver allows it to, which is more limited than what AfterBurner can offer.

Dia