
Topic: OFFICIAL CGMINER mining software thread for linux/win/osx/mips/arm/r-pi 4.11.0 - page 633. (Read 5805546 times)

-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Hi,

I don't know if this is the right forum to ask my question, but since it is cgminer-related I'll start here :)

I've just set up a rig with five 5870s. It runs Xubuntu 11.10 with Catalyst 11.8 (the one installed by the Xubuntu proprietary drivers applet), and I've installed AMD SDK 2.4.

If I don't use GPU_USE_SYNC_OBJECTS=1, CPU usage goes to 90%. If I do use it, cgminer uses 20 to 35% of the CPU, and the CPU is a 2.8 GHz Sempron.

Dud catalyst driver. Use 11.6 or 11.11+ on linux.
legendary
Activity: 1378
Merit: 1003
nec sine labore
Hi,

I don't know if this is the right forum to ask my question, but since it is cgminer-related I'll start here :)

I've just set up a rig with five 5870s. It runs Xubuntu 11.10 with Catalyst 11.8 (the one installed by the Xubuntu proprietary drivers applet), and I've installed AMD SDK 2.4.

If I don't use GPU_USE_SYNC_OBJECTS=1, CPU usage goes to 90%. If I do use it, cgminer uses 20 to 35% of the CPU, and the CPU is a 2.8 GHz Sempron.

This is with intensity 5 and standard 2 threads per GPU (a single thread per GPU makes little difference).

My question is: how much CPU should cgminer use? How much does it use on your multi-GPU rigs?

TIA.

spiccioli.

P.S. My other two rigs have 2 GPUs each and, with sync objects, use very little CPU (less than 5%).

This is the cgminer -ndevs output. Note that it reports AMD SDK 2.5, although I installed 2.4 following kanoi's recipe:
https://github.com/kanoi/linux-usb-cgminer/blob/master/linux-usb-cgminer

Code:
$ cgminer/cgminer -ndevs
[2012-02-25 19:38:05] CL Platform 0 vendor: Advanced Micro Devices, Inc.
[2012-02-25 19:38:05] CL Platform 0 name: AMD Accelerated Parallel Processing
[2012-02-25 19:38:05] CL Platform 0 version: OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)
[2012-02-25 19:38:05] Platform 0 devices: 5
[2012-02-25 19:38:05] GPU 0 ATI Radeon HD 5800 Series hardware monitoring enabled
[2012-02-25 19:38:05] Setting GPU 0 engine clock to 830
[2012-02-25 19:38:05] Setting GPU 0 memory clock to 160
[2012-02-25 19:38:05] Setting GPU 0 voltage to 1.050
[2012-02-25 19:38:05] GPU 1 ATI Radeon HD 5800 Series hardware monitoring enabled
[2012-02-25 19:38:05] Setting GPU 1 engine clock to 830
[2012-02-25 19:38:05] Setting GPU 1 memory clock to 160
[2012-02-25 19:38:05] Setting GPU 1 voltage to 1.050
[2012-02-25 19:38:05] GPU 2 ATI Radeon HD 5800 Series hardware monitoring enabled
[2012-02-25 19:38:05] Setting GPU 2 engine clock to 830
[2012-02-25 19:38:05] Setting GPU 2 memory clock to 160
[2012-02-25 19:38:05] Setting GPU 2 voltage to 1.050
[2012-02-25 19:38:05] GPU 3 ATI Radeon HD 5800 Series hardware monitoring enabled
[2012-02-25 19:38:05] Setting GPU 3 engine clock to 830
[2012-02-25 19:38:05] Setting GPU 3 memory clock to 160
[2012-02-25 19:38:05] Setting GPU 3 voltage to 1.050
[2012-02-25 19:38:05] GPU 4 ATI Radeon HD 5800 Series hardware monitoring enabled
[2012-02-25 19:38:05] Setting GPU 4 engine clock to 830
[2012-02-25 19:38:05] Setting GPU 4 memory clock to 160
[2012-02-25 19:38:05] Setting GPU 4 voltage to 1.050
[2012-02-25 19:38:05] 5 GPU devices max detected
hero member
Activity: 769
Merit: 500
I currently use -I 8 too; looks good so far.
donator
Activity: 1218
Merit: 1079
Gerald Davis
New release: Version 2.2.7 - February 20, 2012
......
The reject ratio is higher for me: about 3% instead of the 0.5% I saw with 2.1.2

my conf.
p2pool 462b252 multi merged mining
bitcoind 0.6
atiumdag 8.920.0.0 (Catalyst 11.12) / Win7 64
OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)

Quote
Version 2.3.1 - February 24, 2012
2.3.1-2
reject ratio still about 3% for me...
cgminer supports the SUBMITOLD extension now and p2pool is telling cgminer to submit the stale shares. So yep, it's working.

I'm currently playing around with p2pool too, so there's no need to add --submit-stale, since it is forced via SUBMITOLD when needed, right?
LPs occur quite often with p2pool, so what would you suggest as a good intensity, perhaps in relation to MH/s? Would it be an idea to let cgminer compute the best value for -I with p2pool (and I'm not talking about the normal -I d switch)?

I found that one intensity step lower than normal works well. I run 5970s at Intensity 9, and I use Intensity 8 for p2pool.
Also make sure queue and threads per GPU are both 1. With an LP time of 10 sec, unless your per-thread hashrate is > 430 MH/s, using multiple threads and a deep queue is essentially useless.
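
For reference, a back-of-the-envelope sketch of where a figure like 430 MH/s could come from. It assumes one work unit spans the full 32-bit nonce range and that work is invalidated at every long poll; the function name is made up for illustration and this is not taken from cgminer's code.

```python
# Assumption: one work unit covers the whole 32-bit nonce range.
NONCES_PER_WORK = 2 ** 32


def min_hashrate_to_exhaust_work(lp_interval_s: float) -> float:
    """Hashrate in MH/s a single thread needs to finish one work unit
    before the next long poll arrives (hypothetical helper)."""
    return NONCES_PER_WORK / lp_interval_s / 1e6


threshold = min_hashrate_to_exhaust_work(10.0)
print(f"{threshold:.1f} MH/s")  # 429.5 MH/s
```

Below that rate a thread never runs out of nonces between long polls, so any extra queued work would just be discarded when the next LP arrives.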
legendary
Activity: 916
Merit: 1003
I'm currently playing around with p2pool too, so there's no need to add --submit-stale, since it is forced via SUBMITOLD when needed, right?
LPs occur quite often with p2pool, so what would you suggest as a good intensity, perhaps in relation to MH/s? Would it be an idea to let cgminer compute the best value for -I with p2pool (and I'm not talking about the normal -I d switch)?

Dia

I'm with p2pool and I've simply left the -I switch off entirely to allow dynamic intensity.  With dynamic intensities running from 4-6, I've found almost no performance benefit to manually setting it higher than that (at least in my setup).
hero member
Activity: 769
Merit: 500
I'm currently playing around with p2pool too, so there's no need to add --submit-stale, since it is forced via SUBMITOLD when needed, right?
LPs occur quite often with p2pool, so what would you suggest as a good intensity, perhaps in relation to MH/s? Would it be an idea to let cgminer compute the best value for -I with p2pool (and I'm not talking about the normal -I d switch)?
Yes to the first question, README to the second.

You Con rock :)
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
I'm currently playing around with p2pool too, so there's no need to add --submit-stale, since it is forced via SUBMITOLD when needed, right?
LPs occur quite often with p2pool, so what would you suggest as a good intensity, perhaps in relation to MH/s? Would it be an idea to let cgminer compute the best value for -I with p2pool (and I'm not talking about the normal -I d switch)?
Yes to the first question, README to the second.
hero member
Activity: 769
Merit: 500
New release: Version 2.2.7 - February 20, 2012
......
The reject ratio is higher for me: about 3% instead of the 0.5% I saw with 2.1.2

my conf.
p2pool 462b252 multi merged mining
bitcoind 0.6
atiumdag 8.920.0.0 (Catalyst 11.12) / Win7 64
OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)

Quote
Version 2.3.1 - February 24, 2012
2.3.1-2
reject ratio still about 3% for me...
cgminer supports the SUBMITOLD extension now and p2pool is telling cgminer to submit the stale shares. So yep, it's working.

I'm currently playing around with p2pool too, so there's no need to add --submit-stale, since it is forced via SUBMITOLD when needed, right?
LPs occur quite often with p2pool, so what would you suggest as a good intensity, perhaps in relation to MH/s? Would it be an idea to let cgminer compute the best value for -I with p2pool (and I'm not talking about the normal -I d switch)?

Dia
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
New release: Version 2.2.7 - February 20, 2012
......
The reject ratio is higher for me: about 3% instead of the 0.5% I saw with 2.1.2

my conf.
p2pool 462b252 multi merged mining
bitcoind 0.6
atiumdag 8.920.0.0 (Catalyst 11.12) / Win7 64
OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)

Quote
Version 2.3.1 - February 24, 2012
2.3.1-2
reject ratio still about 3% for me...
cgminer supports the SUBMITOLD extension now and p2pool is telling cgminer to submit the stale shares. So yep, it's working.
hero member
Activity: 558
Merit: 500
   
[SOLVED] https://bitcointalksearch.org/topic/solved-aticonfig-this-program-must-be-run-as-root-when-no-x-server-is-active-22554



Don't see temperature etc...

AMD 2.4 SDK, ubuntu 11.04

Code:
------------------------------------------------------------------------
cgminer 2.3.1
------------------------------------------------------------------------

Configuration Options Summary:

  OpenCL...............: FOUND. GPU mining support enabled
  ADL..................: SDK found, GPU monitoring support enabled

  BitForce.FPGAs.......: Disabled
  Icarus.FPGAs.........: Disabled

  CPU Mining...........: Disabled

Compilation............: make (or gmake)
  CPPFLAGS.............:
  CFLAGS...............: -O2 -Wall -march=native
  LDFLAGS..............:  -lpthread
  LDADD................: -ldl -lcurl   compat/jansson/libjansson.a -lpthread -lOpenCL -lncurses   -lm

Installation...........: make install (as root if needed, with 'su' or 'sudo')
  prefix...............: /usr/local


In the miner I see only:

Code:
GPU 0: 169.2 / 181.7 Mh/s | A:2  R:0  HW:0  U:10.48/m  I:10
Last initialised: [2012-02-25 13:39:33]
Intensity: 10
Thread 0: 170.6 Mh/s Enabled ALIVE

Ed
member
Activity: 69
Merit: 10
New release: Version 2.2.7 - February 20, 2012
......
The reject ratio is higher for me: about 3% instead of the 0.5% I saw with 2.1.2

my conf.
p2pool 462b252 multi merged mining
bitcoind 0.6
atiumdag 8.920.0.0 (Catalyst 11.12) / Win7 64
OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)

Quote
Version 2.3.1 - February 24, 2012
2.3.1-2
reject ratio still about 3% for me...
omo
full member
Activity: 147
Merit: 100
Sometimes cgminer displays scrambled info (e.g. the GPU 1 line below), and also the message "^[^B, sleeping for 30s":

Code:
 cgminer version 2.3.1 - Started: [2012-02-24 23:38:00]
--------------------------------------------------------------------------------
 (5s):336.42(avg):472.4 Mh/s | Q:77450 A:43434 R:39  HW:0  E:56%  U:6.59/m
 TQ: 33 STT 44  S: 30  DW: 1911  NB: 80  LW: 69  GF: 27  RF: 11
 Connected to http://mmrpc.bitparking.com:80/ with LP as user ****
 Block: 0000040d77cbcf9fb906c8b45953feef...  Started: [10:32:52]
--------------------------------------------------------------------------------
 [P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
 GPU 0:  60.05  578    | 105.45105.0Mh/s | A: 9878R: 7 HW:0 U:1.50/m I: 3
 GPU 1:  7765C 29469PM | 372.65367.44h/s | A:33556R:32 HW:0 U:5.09/m I: 9
--------------------------------------------------------------------------------

 [2012-02-25 10:35:46] Accepted 00000000.b7ef9df9.0dcb577c GPU 1 thread 3 pool 0
 ^[^B, sleeping for 30s
 [2012-02-25 10:36:06] Accepted 00000000.0428b7b0.81fe4c1c GPU 1 thread 2 pool 0
 [2012-02-25 10:36:08] Accepted 00000000.ea1491ca.ebef84ea GPU 1 thread 3 pool 0
 [2012-02-25 10:36:13] Accepted 00000000.8f4412eb.39157414 GPU 1 thread 2 pool 0
 [2012-02-25 10:36:18] Accepted 00000000.f0ae9070.23e80435 GPU 1 thread 3 pool 0
 ^[^B, sleeping for 30s
 [2012-02-25 10:36:37] Accepted 00000000.e112fd60.55ec2a41 GPU 0 thread 0 pool 0
 [2012-02-25 10:36:50] Accepted 00000000.cd40f348.5ed1d95e GPU 1 thread 3 pool 0
 ^[^B, sleeping for 30s
 [2012-02-25 10:37:02] Accepted 00000000.0915ea82.c0bdaab4 GPU 1 thread 2 pool 0
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Hi Ckolivas,

Downloaded the new version, let it run for a while while I was watching some videos on my pc (Windows XP), running the new version with Dynamic Intensity (1 thread automatically disabled). Then I changed the Intensity to 8 (running a 5800) and the following weird message came up:

G[P2U0 01:2 2-8002.-92 /5 2 0810:.530 M:h0/4s] | T Ah:r5e7a9d  R 1: b1e  HiWn:g0
 re  U-:e3n.a7b1l/emd  I
8
Yeah the curses interface just scrambles output occasionally. Harmless.
hero member
Activity: 868
Merit: 1000
Hi Ckolivas,

Downloaded the new version, let it run for a while while I was watching some videos on my pc (Windows XP), running the new version with Dynamic Intensity (1 thread automatically disabled). Then I changed the Intensity to 8 (running a 5800) and the following weird message came up:

G[P2U0 01:2 2-8002.-92 /5 2 0810:.530 M:h0/4s] | T Ah:r5e7a9d  R 1: b1e  HiWn:g0
 re  U-:e3n.a7b1l/emd  I
8

The intensity was set to 8, so no problems there. It might just be cosmetic, but I thought I would post it here anyway.

This is a dump of the entire screen:

 cgminer version 2.3.1 - Started: [2012-02-24 23:13:58]
--------------------------------------------------------------------------------
 (5s):282.2 (avg):280.3 Mh/s | Q:1541  A:583  R:1  HW:0  E:38%  U:3.70/m
 TQ: 2  ST: 4  SS: 0  DW: 84  NB: 12  LW: 0  GF: 3  RF: 0
 Connected to http://mine2.btcguild.com:8332 with LP as user
 Block: 00000a7c9a40539601dd382d3a7d13a0...  Started: [01:35:44]
--------------------------------------------------------------------------------
 [P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
 GPU 0:  72.0C 2757RPM | 284.0/280.3Mh/s | A:583 R:1 HW:0 U:  3.70/m I: 8
--------------------------------------------------------------------------------

8
Intensity on gpu 0 set to 8
G[P2U0 01:2 2-8002.-92 /5 2 0810:.530 M:h0/4s] | T Ah:r5e7a9d  R 1: b1e  HiWn:g0
 re  U-:e3n.a7b1l/emd  I
8
72.0 C  F: 65% (2758 RPM)  E: 900 MHz  M: 800 Mhz  V: 1.163V  A: 98%  P: 0%
Last initialised: [2012-02-24 23:14:03]
Intensity: 8
Thread 0: 282.8 Mh/s Enabled ALIVE
Thread 1: 2.0 Mh/s Enabled ALIVE

[E]nable [D]isable [I]ntensity [R]estart GPU [C]hange settings

Brat
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
min(x,y) http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/commonMin.html
gets implemented low-level as
Code:
w: MIN_UINT    R0.w,  R0.x,  PV1350.y
, which *should* (I know, AMD... *rolls eyes*) be rather stable. The big problem with the alternative (&) is the huge number of false positives: since it's bitwise, any two values with no set bits in common give zero, e.g. 01010011 & 10101100 = 00000000, which is bad for the branch predictor. I'm testing now with a conservative approach (just this one change from the default),
Code:
#elif defined VECTORS2
	bool result = min(W[117].x, W[117].y);
	if (!result) {
		if (!W[117].x)
			output[FOUND] = output[NFLAG & W[3].x] = W[3].x;
		if (!W[117].y)
			output[FOUND] = output[NFLAG & W[3].y] = W[3].y;
	}
and got a slight (3~4MH/s) increase (5850, SDK 2.5 from Cat 11.11).
You can do the maths on false positives. You're greatly exaggerating the "HUGE NUMBER". It's about 1 share for 1 false positive. More so on 4 vectors (but no one uses them). That is not remotely common...

Increase eh?

Call me sceptical to the core.

EDIT: I will look into it, but I'm so terrified of unintentionally breaking shit like I did last time. It was in this code specifically where the slowdown was, so you can imagine why I'm so resistant.
Vbs
hero member
Activity: 504
Merit: 500
Been testing some changes on phatk with the KernelAnalyzer and my own personal testing.

Using a VECTORS2 example,
Code:
bool result = W[117].x & W[117].y;

gives a lot of false positives, changing it to
Code:
bool result = min(W[117].x,W[117].y);

is guaranteed to give yummy results! :D

(same number of ALU ops and fetches, no false positives on the next 'if') 8)
See now this is dangerous. Do you REALLY  know how fast the "min" function is on all SDKs? Don't expect AMD to do the right thing and to guarantee it's as fast as &.

min(x,y) http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/commonMin.html
gets implemented low-level as
Code:
w: MIN_UINT    R0.w,  R0.x,  PV1350.y
, which *should* (I know, AMD... *rolls eyes*) be rather stable. The big problem with the alternative (&) is the huge number of false positives: since it's bitwise, any two values with no set bits in common give zero, e.g. 01010011 & 10101100 = 00000000, which is bad for the branch predictor. I'm testing now with a conservative approach (just this one change from the default),
Code:
#elif defined VECTORS2
	bool result = min(W[117].x, W[117].y);
	if (!result) {
		if (!W[117].x)
			output[FOUND] = output[NFLAG & W[3].x] = W[3].x;
		if (!W[117].y)
			output[FOUND] = output[NFLAG & W[3].y] = W[3].y;
	}
and got a slight (3~4MH/s) increase (5850, SDK 2.5 from Cat 11.11).
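
To make the "&" versus min() distinction concrete, here is a small host-side sketch in plain Python (not kernel code). It assumes unsigned values and checks all 8-bit pairs as a stand-in for the 32-bit words; the function names are made up for illustration, and it says nothing about how fast either variant runs on a GPU.

```python
def and_test(x: int, y: int) -> bool:
    """True when the '&' variant would enter the nonce-check branch."""
    return (x & y) == 0


def min_test(x: int, y: int) -> bool:
    """True when the min() variant would enter the nonce-check branch."""
    return min(x, y) == 0


# The example from the post: both values are non-zero but share no set bits.
x, y = 0b01010011, 0b10101100
print(and_test(x, y), min_test(x, y))  # True False

# A "false positive" is a pair that enters the branch although neither value is zero.
pairs = [(a, b) for a in range(256) for b in range(256)]
false_pos_and = sum(1 for a, b in pairs if and_test(a, b) and a != 0 and b != 0)
false_pos_min = sum(1 for a, b in pairs if min_test(a, b) and a != 0 and b != 0)
print(false_pos_and, false_pos_min)
```

For unsigned values, min(x, y) is zero exactly when at least one of them is zero, so the min() test never fires unless a real candidate is present.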
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
SDK 2.4:
GPU 1:  51.5C 1569RPM | 375.7/375.7Mh/s | A: 98 R:0 HW:0 U:  4.86/m I:10
GPU 2:  55.0C 1569RPM | 375.7/375.7Mh/s | A: 97 R:0 HW:0 U:  4.81/m I:10

SDK 2.1:
GPU 0:  82.5C 3840RPM | 375.6/375.4Mh/s | A:457 R:2 HW:0 U: 5.27/m I:10
GPU 1:  82.5C 3840RPM | 375.4/375.4Mh/s | A:477 R:0 HW:0 U: 5.50/m I:10
So it seems that 2.4 is very slightly better at 300 memclocks and I 10, 1 thread.

I would say the difference is below noise levels, so I would say they perform identically on that hardware/software combo.
member
Activity: 121
Merit: 10
Can you try 950 / 300 on SDK 2.1 and 2.4 and see what the difference is ( make sure to delete the bins etc. ) ?

Maybe also try 960 core / 300 memory ?

What OS btw ?

Thanks !

Sorry, can't. The cards are in different computers, and the 5870 on the SDK 2.4 machine is on an extender, which slows down the hashrate somewhat.

What I can do however, is give you the difference between a 5970 @ 810/300 on 2.4 and a 5970 at the same clocks on 2.1.

SDK 2.4:
GPU 1:  51.5C 1569RPM | 375.7/375.7Mh/s | A: 98 R:0 HW:0 U:  4.86/m I:10
GPU 2:  55.0C 1569RPM | 375.7/375.7Mh/s | A: 97 R:0 HW:0 U:  4.81/m I:10

SDK 2.1:
GPU 0:  82.5C 3840RPM | 375.6/375.4Mh/s | A:457 R:2 HW:0 U: 5.27/m I:10
GPU 1:  82.5C 3840RPM | 375.4/375.4Mh/s | A:477 R:0 HW:0 U: 5.50/m I:10

You can multiply that by 960/810 or 950/810 to get a good estimate of a 5870's performance at 960 and 950 clocks, respectively.
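
A quick sketch of that estimate, assuming hashrate scales linearly with core clock as suggested above. The helper name is hypothetical, and the input is the 375.7 Mh/s per-GPU figure at 810 MHz from the SDK 2.4 readout.

```python
def scale_hashrate(mhs: float, from_clock: float, to_clock: float) -> float:
    """Linear estimate of hashrate at a different core clock (assumption:
    hashrate is proportional to core clock)."""
    return mhs * to_clock / from_clock


for clock in (950, 960):
    estimate = scale_hashrate(375.7, 810, clock)
    print(f"{clock} MHz: {estimate:.1f} Mh/s")
# 950 MHz: 440.6 Mh/s
# 960 MHz: 445.3 Mh/s
```

These are estimates only; memory clock, SDK version and thread settings can all shift the real number.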

So it seems that 2.4 is very slightly better at 300 memclocks and I 10, 1 thread. Haven't had time to test other settings, it could vary. Oh and OS is 64-bit Lubuntu.
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4
I've got a nice idea for VECTORS2 and the nonce-check ^^ ... so the chance to get 2 positive nonces within a single uint2 work-item is extremely small, right?
Will play around with it tomorrow and perhaps I'll do another commit for diakgcn.

Dia
The chance of getting a positive nonce is ALWAYS the same for each hash you do, no matter when you do it.

If a single thread is idle it is wasted.

Edit: and aborting all threads when you find a nonce means you on average double the overhead of setting up work.
(i.e. time wasted when the GPU could be mining)
Vbs
hero member
Activity: 504
Merit: 500
Thanks for this mate. This means that the probability of finding 2 hashes in the same vector is 1/(4.3e9*4.3e9), about 5.4e-20, which is effectively 0. This allows for a further optimization of the code. Using a VECTORS2 example,
Code:
#elif defined VECTORS2
	bool result = min(W[117].x, W[117].y);
	if (!result) {
		if (!W[117].x)
			output[FOUND] = output[NFLAG & W[3].x] = W[3].x;
		else //if (!W[117].y)
			output[FOUND] = output[NFLAG & W[3].y] = W[3].y;
	}
Since min() takes care of the false positives, the 'else' branch is only true when W[117].y==0. The result in the KernelAnalyzer for a 5870 is:
Code:
phatk 120223 -> cycles: min:67.65, max:68.15, avg:67.82, alu:1363
phatk "new" -> cycles: min:67.65, max:67.90, avg:67.78, alu:1362
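
A small host-side check of the if/else idea, written in plain Python rather than kernel code. It models the two variants on 8-bit values (an assumption made to keep the search exhaustive) and lists every input pair where they record different nonces; the function names and the "nx"/"ny" placeholders are made up for illustration.

```python
def found_two_ifs(x: int, y: int, nx: str, ny: str) -> list:
    """Model of the variant with two independent 'if' checks."""
    out = []
    if min(x, y) == 0:
        if x == 0:
            out.append(nx)
        if y == 0:
            out.append(ny)
    return out


def found_if_else(x: int, y: int, nx: str, ny: str) -> list:
    """Model of the variant where the second check is replaced by 'else'."""
    out = []
    if min(x, y) == 0:
        if x == 0:
            out.append(nx)
        else:
            out.append(ny)
    return out


# Every 8-bit input pair where the two variants disagree.
diffs = [(x, y) for x in range(256) for y in range(256)
         if found_two_ifs(x, y, "nx", "ny") != found_if_else(x, y, "nx", "ny")]
print(diffs)  # [(0, 0)]

# Chance of that case for two independent 32-bit words (4.3e9 is roughly 2**32).
p_both = 2.0 ** -64
print(p_both)
```

The only disagreement is the case where both words are zero, in which the else variant would drop the second nonce. That is the 1/(4.3e9*4.3e9) event discussed above.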

:D
This looks okay but it's in the output path so not hit very often so unlikely to make a demonstrable performance change :\

True, and the better the branch prediction works with "if (!result)", the less often it will be taken. I'll check how min() gets implemented at a low level.