
Topic: OFFICIAL CGMINER mining software thread for linux/win/osx/mips/arm/r-pi 4.11.0 - page 630. (Read 5805546 times)

donator
Activity: 919
Merit: 1000
[...]
kano goes read the code to try get the numbers ...
while you did ^this, you maybe learned one obvious thing I don't get: where in the code is the nonce incremented?

I understood that FPGAs count up through the nonce range in their firmware, but I can't see this done for GPUs. Either I'm double-blind or using grep wrongly, but is there really no nonce increment in the source code?
sr. member
Activity: 308
Merit: 250
About the graphs, here is what I did recently.
The results seem to be wrong, because I ran 4 instances of cgminer with a small interval and a small memclock difference, and each subsequent cgminer process overrides the memclock for all GPUs even if you use -d 0.

So, this does not work:
Code:
./cgminer -d0 --remove-disabled --gpu-memclock=305
but this works:
Code:
./cgminer -d0 --remove-disabled --gpu-memclock=305,0,0,0
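
The working command above can be generated for a rig with several cards. This is only a sketch: the helper names are made up, and placing the clock at the position of the selected device with 0 for every other card is an assumption generalised from the single device-0 example in this post, not documented cgminer behaviour.

```python
def memclock_arg(device, clock, gpu_count):
    """Build a --gpu-memclock list with `clock` for one device and 0 elsewhere.

    Assumption: a 0 entry keeps cgminer from overriding that card,
    as observed in the post for device 0 on a 4-GPU rig.
    """
    values = [clock if index == device else 0 for index in range(gpu_count)]
    return "--gpu-memclock=" + ",".join(str(value) for value in values)


def cgminer_command(device, clock, gpu_count):
    """Return the argument list for one cgminer instance (hypothetical helper)."""
    return ["./cgminer", f"-d{device}", "--remove-disabled",
            memclock_arg(device, clock, gpu_count)]


if __name__ == "__main__":
    print(" ".join(cgminer_command(0, 305, 4)))
```

For device 0 with four GPUs this reproduces the command line shown above.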

donator
Activity: 1218
Merit: 1079
Gerald Davis
... with intensity 9 it divides the work into pieces of (2^(15+9) * vectors) hashes
So about 16 million hashes with -v 1 (there are other limitations it does to this number but it looks like that's maybe correct for me with -I 9)

Thanks for that.  Intensity is an often misunderstood value.  Hindsight being 20/20 a less "biased" name like "chunk-size" would have been better.  I mean biased as in "why would I want to be less intense I want the most hashez?"

Still it works out like this:
Code:
Millions of hashes calculated in each batch.

 I       V1       V2       V3       V4
 1     0.07     0.13     0.20     0.26
 2     0.13     0.26     0.39     0.52
 3     0.26     0.52     0.79     1.05
 4     0.52     1.05     1.57     2.10
 5     1.05     2.10     3.15     4.19
 6     2.10     4.19     6.29     8.39
 7     4.19     8.39    12.58    16.78
 8     8.39    16.78    25.17    33.55
 9    16.78    33.55    50.33    67.11
10    33.55    67.11   100.66   134.22
11    67.11   134.22   201.33   268.44
12   134.22   268.44   402.65   536.87
13   268.44   536.87   805.31  1073.74
14   536.87  1073.74  1610.61  2147.48
15  1073.74  2147.48  3221.23  4294.97
16  2147.48  4294.97  4294.97
17  4294.97

So it will likely be some time before we have a processor capable of handling the full nonce range (4.295 billion) in one pass efficiently.
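
The table follows from the batch-size formula kano quoted, (2^(15+I) * vectors). A minimal Python sketch that reproduces it, assuming the batch is simply capped at the full 2^32 nonce range (the function name `batch_hashes` is made up for illustration, and cgminer applies other limits that are not modelled here):

```python
NONCE_RANGE = 2 ** 32  # full 32-bit nonce range


def batch_hashes(intensity, vectors=1):
    """Hashes per batch: 2^(15 + intensity) * vectors, capped at 2^32 (assumed cap)."""
    return min(2 ** (15 + intensity) * vectors, NONCE_RANGE)


if __name__ == "__main__":
    print(" I       V1       V2       V3       V4")
    for intensity in range(1, 18):
        # Convert each batch size to millions of hashes, as in the table.
        row = [batch_hashes(intensity, vectors) / 1e6 for vectors in (1, 2, 3, 4)]
        print(f"{intensity:2d}" + "".join(f"{millions:9.2f}" for millions in row))
```

Unlike the posted table, this sketch keeps printing the capped value once a row reaches the full nonce range instead of leaving the cell empty.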
hero member
Activity: 769
Merit: 500
Dia, have you thought about changing (for example, in VECTORS2),
Code:
V[7] ^= 0x136032edU;
bool result = V[7].x & V[7].y;
if (!result) {
    if (!V[7].x)
        output[FOUND] = output[NFLAG & nonce.x] = nonce.x;
    if (!V[7].y)
        output[FOUND] = output[NFLAG & nonce.y] = nonce.y;
}
to
Code:
uint result = V[7].x == 0x136032edU ? nonce.x : 0u;
result = V[7].y == 0x136032edU ? nonce.y : result;
if (result)
    output[FOUND] = output[NFLAG & result] = result;

This should give a small boost.

Btw, the probability that both nonce.x and nonce.y are correct at the same time is 1/(2^64). This means that on a 400MH/s card, this will probably happen once in every ~1463 years!
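
That figure can be checked with a few lines of arithmetic. This sketch assumes each of the two vector lanes matches the 32-bit constant independently with probability 2^-32, and that the card performs 400 million paired checks per second, which is the reading that lands close to the quoted number; the names are illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_between_double_hits(checks_per_second):
    """Expected years between events where both lanes match at once."""
    p_both = (2 ** -32) ** 2  # both lanes match: 1 / 2^64
    expected_seconds = 1 / (p_both * checks_per_second)
    return expected_seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{years_between_double_hits(400e6):.0f} years")
```

If 400 MH/s is instead counted per hash, so that only 200 million pairs are checked per second, the expected interval doubles.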

I came up with this (which is by far the fastest solution I have found to date), and am currently testing how it works (things are looking good so far).

Code:
if ((V[7].x == 0x136032ed) ^ (V[7].y == 0x136032ed)) {
    output[FOUND] = output[NFLAG & nonce.x] = (V[7].x == 0x136032ed) > (V[7].y == 0x136032ed) ? nonce.x : nonce.y;
}

This checks V[7].x and V[7].y for a valid nonce and only proceeds if exactly one of them matches (as we discussed before, the chance that both are true is too small to lower efficiency). The next line checks whether .x or .y holds the valid nonce and then writes it to output[NFLAG & nonce.x], which is a pseudo-random position, so there is no need to ever index with nonce.y.
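
To compare how the variants posted so far behave, the logic can be modelled on plain integers. This is a Python sketch of the control flow only, not OpenCL: it ignores the output-buffer indexing (`NFLAG`, `FOUND`) and just returns the nonces each variant would report. The function names are invented for illustration.

```python
TARGET = 0x136032ED  # constant compared against V[7] in the posts above


def check_original(v7x, v7y, nonce_x, nonce_y):
    """XOR with the target, then test each lane for zero; may report both."""
    x, y = v7x ^ TARGET, v7y ^ TARGET
    found = []
    if not (x & y):  # cheap filter: always passes when either lane is zero
        if not x:
            found.append(nonce_x)
        if not y:
            found.append(nonce_y)
    return found


def check_select(v7x, v7y, nonce_x, nonce_y):
    """Vbs's variant: select a nonce without nesting, report it if non-zero."""
    result = nonce_x if v7x == TARGET else 0
    result = nonce_y if v7y == TARGET else result
    return [result] if result else []


def check_xor(v7x, v7y, nonce_x, nonce_y):
    """Dia's variant: report only when exactly one lane matches."""
    hit_x, hit_y = v7x == TARGET, v7y == TARGET
    if hit_x ^ hit_y:
        return [nonce_x if hit_x > hit_y else nonce_y]
    return []


if __name__ == "__main__":
    cases = {"x only": (TARGET, 1), "y only": (1, TARGET),
             "neither": (1, 2), "both": (TARGET, TARGET)}
    for name, (v7x, v7y) in cases.items():
        print(name, check_original(v7x, v7y, 10, 11),
              check_select(v7x, v7y, 10, 11), check_xor(v7x, v7y, 10, 11))
```

The three agree whenever at most one lane matches. They differ only in the "both" case, which the posts argue is too rare to matter, and the select variant would additionally skip a valid nonce whose value is 0.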

I will open a pull-request soon, to get it integrated into CGMINER and hope Con accepts it for diakgcn.

Edit: Your idea seems even faster, which is pretty cool if it works ... will play around with it.

Edit 2: I now see one problem with your code: the most-used code path lies outside the if-clause and is more expensive on GCN than Con's current code or my posted version. My version uses the fewest instructions when the if-clause is false ... your version would be the best if there were more positive nonces than negative ones!

Edit 3: Well, I took one idea from your code and created this one:
Code:
if ((V[7].x == 0x136032edU) + (V[7].y == 0x136032edU))
    output[FOUND] = output[NFLAG & nonce.x] = (V[7].x == 0x136032edU) ? nonce.x : nonce.y;

Edit 4: Created the pull-request, as this looks like a great little change!

Dia
donator
Activity: 1218
Merit: 1079
Gerald Davis
Some of my 5970's are not accepting any commands from cgminer. I had to use afterburner on about half my rigs. This is to be expected?

What do you mean not accepting commands?

You can't change any parameter to any value?  If so that is something new but likely you mean "I am trying to raise voltage or select a clock way outside what bios allows" right? 

cgminer can't FORCE the card to do anything (not even return to stock clocks).  All it can do is ask.
cgminer: "card #1 please raise clock to 800 MHz"
card #1: "request received"
card #1 (internally): fuck off cgminer I am tired.
legendary
Activity: 1378
Merit: 1003
nec sine labore
...
kano,

intensity has an even bigger influence than CPU speed, at intensity 'd', same 800 MHz CPU speed, cgminer uses around 50% CPU time

Code:
...

spiccioli

Actually - logically, as the intensity drops, the CPU usage must increase.

When the intensity goes down, it means the work is broken up into smaller pieces.
More pieces means more CPU processing and transferring for the same amount of hashes.

I run 2x6950 on a 3.07GHz i3 CPU running at 1.2GHz and -I 9 and get ~2% CPU (hmm 2 seems low - I would have guessed 5 ...)
(that % number means of a single core in linux coz if you e.g. have 2 cores - total is CPU 200%)
So it's not totally surprising to get high CPU use with 5 GPUs, a low-power CPU and lower intensity.

kano,

with lower intensity the GPU's occupancy is lower, so I was thinking that the thread serving a GPU with lower intensity had to be sleeping a lot more, while it is, instead, sleeping a lot less.

Code:
73.0 C  F: 24% (1335 RPM)  E: 800 MHz  M: 150 Mhz  V: 1.000V  A: 80% P: 0%
Last initialised: [2012-02-26 23:06:14]
Intensity: Dynamic (only one thread in use)
Thread 1: 299.8 Mh/s Enabled ALIVE

How many more hashes are counted by a GPU (or how much longer does it keep counting) when intensity is increased by one?

spiccioli.
Yes the CPU is sleeping a lot less with lower intensity as I said here:
Quote
When the intensity goes down, it means the work is broken up into smaller pieces.
More pieces means more CPU processing and transferring for the same amount of hashes.

I'll explain again I guess:
A work request is 2^32 hashes (a full nonce range)
It could maybe be done in one long go or in 2^32 tiny goes.
It is actually, of course, done somewhere in the middle of that.

kano goes read the code to try get the numbers ...

OK it looks like (though I could be wrong) that with intensity 9 it divides the work into pieces of (2^(15+9) * vectors) hashes
So about 16 million hashes with -v 1 (there are other limitations it does to this number but it looks like that's maybe correct for me with -I 9)

This figure doesn't really matter if it's wrong, but how I've described how it works is correct.
Anyway assuming it is (2^(15+9)) or 16 million - it takes about 52ms for a single 6950 GPU to do that.
Thus every 52ms (if I had only one 6950) the CPU needs to set up and feed data into the GPU and then wait about 52ms to get the answer back, then set up the next 16 million hashes and feed them into the GPU ... over and over again ... until it has processed the whole 2^32 hashes

If instead the intensity was 8, each setup would contain half the number of hashes (2^(15+8)) or 8 million, taking approx 26ms, so the CPU would have to set up work twice as often to complete the whole 2^32 nonce range. Thus the CPU usage would increase, since it's doing twice as many setups for the same amount of work.
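
The relationship kano describes can be sketched in a few lines of Python. The batch-size formula is the one he hedges above, and the hashrate is simply back-calculated from his "16 million in about 52ms" figure for one 6950; both are assumptions for illustration, as is the function name `batch_plan`.

```python
NONCE_RANGE = 2 ** 32            # one work item: the full nonce range
HASHRATE_6950 = 2 ** 24 / 0.052  # hashes/s implied by "16 million in about 52ms"


def batch_plan(intensity, hashrate, vectors=1):
    """Return (batches per work item, milliseconds per batch)."""
    batch = 2 ** (15 + intensity) * vectors
    return NONCE_RANGE // batch, batch / hashrate * 1000.0


if __name__ == "__main__":
    for intensity in (9, 8):
        count, ms = batch_plan(intensity, HASHRATE_6950)
        print(f"-I {intensity}: {count} batches of about {ms:.0f} ms each")
```

Dropping the intensity by one doubles the number of setups the CPU performs per work item (256 at -I 9 versus 512 at -I 8 under this formula) while halving the time the GPU spends on each.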

kano,

interesting numbers, in particular for p2pool, because we can now work out, given the hashing power, the maximum intensity we can use.

So a 300Mh/s card, at intensity 6, takes around 6ms to do a batch and so wastes around 1% of its mining time when there are fast, 1 second long, rounds.

But if we say that median rounds are 5 seconds long, at intensity 6 we are wasting just 0.1% of our time.
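
A rough Python sketch of this estimate follows. It assumes the batch-size formula quoted earlier in the post and treats one whole in-flight batch per round as wasted, which is an upper bound chosen for illustration rather than anything measured; the function names are made up.

```python
def batch_ms(intensity, hashrate, vectors=1):
    """Milliseconds to finish one batch of 2^(15 + intensity) * vectors hashes."""
    return 2 ** (15 + intensity) * vectors / hashrate * 1000.0


def wasted_fraction(intensity, hashrate, round_seconds, vectors=1):
    """Share of a round lost if one whole batch is discarded (assumed upper bound)."""
    return batch_ms(intensity, hashrate, vectors) / (round_seconds * 1000.0)


if __name__ == "__main__":
    for round_seconds in (1, 5):
        share = wasted_fraction(6, 300e6, round_seconds)
        print(f"{round_seconds}s rounds: about {share:.2%} wasted")
```

With these assumptions the batch takes about 7ms, so the loss is a little under 1% for 1 second rounds and between 0.1% and 0.2% for 5 second rounds, in line with the rough figures above.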

Thanks for your explanation.

spiccioli
Vbs
hero member
Activity: 504
Merit: 500
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
legendary
Activity: 1378
Merit: 1003
nec sine labore
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Some of my 5970's are not accepting any commands from cgminer. I had to use afterburner on about half my rigs. This is to be expected?
That would be precisely what 3 posts in a row just said.
sr. member
Activity: 392
Merit: 250
Some of my 5970's are not accepting any commands from cgminer. I had to use afterburner on about half my rigs. This is to be expected?
hero member
Activity: 769
Merit: 500
Can cgminer set GPU/MEM frequencies out of BIOS ranges? For example, something like MSI Afterburner in Windows can do?

The answer is no, it can only set what ADL / the driver allows it to, which is more limited than what Afterburner can offer.

Dia
The answer is yes, it will allow you to send commands outside the bios range but the GPU can happily ignore them. It can do more than the bios range, but less than windows specific tools that bypass the driver and poke it directly.

He asked 2 questions and I answered the 2nd one :-P ...

Dia
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
hero member
Activity: 769
Merit: 500
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
Can cgminer set GPU/MEM frequencies out of BIOS ranges? For example, something like MSI Afterburner in Windows can do?
cgminer uses ADL (AMD Display Library, written by ... AMD)
If ADL can't do it, cgminer can't do it.
The expected answer to your question is 'no'
hero member
Activity: 531
Merit: 505
Can cgminer set GPU/MEM frequencies out of BIOS ranges? For example, something like MSI Afterburner in Windows can do?
hero member
Activity: 769
Merit: 500
I would love to see an expanded version of this that goes down to 100 (or lower) memclock using latest version of cgminer, latest drivers, and SDK 2.1.
Here is some tests


Wow, now those are very nice graphs ... it would be so cool if you could add the diakgcn kernel into that (-k diakgcn). Not because I think it is faster, but it would be great to know how it performs in comparison to the other available kernels.

Dia
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
newbie
Activity: 26
Merit: 0
I would love to see an expanded version of this that goes down to 100 (or lower) memclock using latest version of cgminer, latest drivers, and SDK 2.1.
Here is some tests
http://i.imgur.com/TPRiol.png
You sir, are full of awesome. Will we be able to see the other test results soon?
Do you have a 5870 to test with?

@tenzor Were you running this on linux? And are you willing to share the script you used to generate these results?
legendary
Activity: 1378
Merit: 1003
nec sine labore
...
So either my CPU is too weak to handle 5 GPUs or cpu usage can be lower, but it does not depend on catalyst/amd sdk.

BTW I've tested catalyst 11.12 (11.11 gives problems installing) and 12.1 on xubuntu 11.10 64bit this morning with more or less same levels of CPU usage.

spiccioli
While it's running at 60% CPU ...
cat /proc/cpuinfo

Note it says CPU MHz 800.00, so I think I've been chasing the wrong problem for two days: if the CPU is at 2.8GHz, cgminer uses around 10% of CPU.

Thanks for your help!

spiccioli.

kano,

intensity has an even bigger influence than CPU speed, at intensity 'd', same 800 MHz CPU speed, cgminer uses around 50% CPU time

Code:
top - 23:00:10 up  3:39,  2 users,  load average: 0.06, 0.03, 0.05
Tasks: 145 total,   1 running, 144 sleeping,   0 stopped,   0 zombie
Cpu(s): 10.2%us,  8.5%sy,  0.9%ni, 78.2%id,  2.1%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   4055036k total,  1491940k used,  2563096k free,    52424k buffers
Swap:  4187132k total,        0k used,  4187132k free,   418096k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17249 user      20   0  515m 134m  37m S 48.9  3.4  11:41.81 cgminer
 1307 user      20   0  175m 106m 3596 S  3.9  2.7   5:34.95 python
18999 user      20   0 19348 1292  924 R  2.0  0.0   0:00.01 top
    1 root      20   0 24000 2104 1292 S  0.0  0.1   0:01.17 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.85 ksoftirqd/0
    5 root      20   0     0    0    0 S  0.0  0.0   0:00.22 kworker/u:0
    6 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0

spiccioli