
Topic: OFFICIAL CGMINER mining software thread for linux/win/osx/mips/arm/r-pi 4.11.0 - page 634. (Read 5805546 times)

hero member
Activity: 518
Merit: 500
Great job on the 2.3.1! I gained some 1% or even a bit more with my 5k series cards on both SDK 2.1 and SDK 2.4 systems, compared to 2.3.0 on phatk kernel.

What kernel is this ? Still phatk one ?

I got 5870s. Can memory still be underclocked to 300 and you get still good performance ?

Thanks !

phatk as I mentioned

And on 2.1 & 2.4 SDK, yes they can. Not sure about 2.6, never used that with 5870s.

Currently hashing away at 444.4MH/s on a 950/300 5870, SDK 2.1 -g 1 -I 10 -w 256 -v 2, although -g 1 is probably not a good idea, just something I've stuck with. SDK 2.4 gives me the same, perhaps even very slightly faster hashrate.


Can you try 950 / 300 on SDK 2.1 and 2.4 and see what the difference is ( make sure to delete the bins etc. ) ?

Maybe also try 960 core / 300 memory ?

What OS btw ?

Thanks !
member
Activity: 121
Merit: 10
Great job on the 2.3.1! I gained some 1% or even a bit more with my 5k series cards on both SDK 2.1 and SDK 2.4 systems, compared to 2.3.0 on phatk kernel.

What kernel is this ? Still phatk one ?

I got 5870s. Can memory still be underclocked to 300 and you get still good performance ?

Thanks !

phatk as I mentioned

And on 2.1 & 2.4 SDK, yes they can. Not sure about 2.6, never used that with 5870s.

Currently hashing away at 444.4MH/s on a 950/300 5870, SDK 2.1 -g 1 -I 10 -w 256 -v 2, although -g 1 is probably not a good idea, just something I've stuck with. SDK 2.4 gives me the same, perhaps even very slightly faster hashrate.
hero member
Activity: 518
Merit: 500
Great job on the 2.3.1! I gained some 1% or even a bit more with my 5k series cards on both SDK 2.1 and SDK 2.4 systems, compared to 2.3.0 on phatk kernel.

What kernel is this ? Still phatk one ?

I got 5870s. Can memory still be underclocked to 300 and you get still good performance ?

Thanks !
hero member
Activity: 769
Merit: 500
I've got a nice idea for VECTORS2 and the nonce-check ^^ ... so the chance to get 2 positive nonces within a single uint2 work-item is extremely small, right?
Will play around with it tomorrow and perhaps I'll do another commit for diakgcn.

Dia
member
Activity: 121
Merit: 10
Great job on the 2.3.1! I gained some 1% or even a bit more with my 5k series cards on both SDK 2.1 and SDK 2.4 systems, compared to 2.3.0 on phatk kernel.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Thanks for this mate. This means that the probability of finding 2 hashes in the same vector is 1/(4.3e9*4.3e9), which is infinitesimally close to 1/inf ~= 0. This allows for a further optimization of the code. Using a VECTORS2 example,
Code:
#elif defined VECTORS2
    bool result = min(W[117].x,W[117].y);
    if (!result) {
        if (!W[117].x)
            output[FOUND] = output[NFLAG & W[3].x] = W[3].x;
        else //if (!W[117].y)
            output[FOUND] = output[NFLAG & W[3].y] = W[3].y;
    }
Since min() takes care of the false positives, the 'else' branch is only true when W[117].y==0. The result in the KernelAnalyzer for a 5870 is:
Code:
phatk 120223 -> cycles: min:67.65, max:68.15, avg:67.82, alu:1363
phatk "new" -> cycles: min:67.65, max:67.90, avg:67.78, alu:1362

This looks okay but it's in the output path so not hit very often so unlikely to make a demonstrable performance change :\
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Been testing some changes on phatk with the KernelAnalyzer and my own personal testing.

Using a VECTORS2 example,
Code:
bool result = W[117].x & W[117].y;

gives a lot of false positives, changing it to
Code:
bool result = min(W[117].x,W[117].y);

is guaranteed to give yummy results!

(same ALU #ops and fetch, no false positives on the next 'if')
See now this is dangerous. Do you REALLY  know how fast the "min" function is on all SDKs? Don't expect AMD to do the right thing and to guarantee it's as fast as &.
Vbs
hero member
Activity: 504
Merit: 500
Ok, one unrolled branch then!

Code:
#elif defined VECTORS2
    if (!W[117].x) {
        output[FOUND] = FOUND;
        output[NFLAG & W[3].x] = W[3].x;
        if (!W[117].y)
            output[NFLAG & W[3].y] = W[3].y;
    }
    else if (!W[117].y) {
        output[FOUND] = FOUND;
        output[NFLAG & W[3].y] = W[3].y;
    }
Heh, you're not a coder are you? That's still two branches unless it's positive on the first branch.

To save ckolivas from more frustration maybe I can help.

vbs:  AMD (and I assume Nvidia) GPUs take a horrible hit on branches.  The number of total checks is irrelevant.  What matters is the number of branches on the main path.

Only one in 4.3 billion hashes will be a share thus 99.999999976716935634613037109375% of the time any subsequent share checks are never executed.  Optimizing the path which occurs one in 4.3 billion executions is silly right?

We want to make the one that occurs 4.29999999999 billion out of 4.3 billion attempts as fast as possible.  Given the massive (and I do mean massive; forget what you think you know about C++ compilers on x86 hardware) hit that AMD GPUs take when it comes to branches, that means making the main path have as few branches as possible.

Neither of your code snippets do that.
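
The percentage quoted above can be reproduced with exact arithmetic. This is only a sketch: it assumes "one in 4.3 billion" means exactly one in 2^32, which is what the long decimal figure implies, and the names `SHARE_PROBABILITY` and `miss_percentage` are made up for the illustration.

```python
from decimal import Decimal, getcontext
from fractions import Fraction

getcontext().prec = 50  # enough digits to print the figure without rounding

# Assumption: a work-item produces a share with probability 1 / 2**32.
SHARE_PROBABILITY = Fraction(1, 2**32)


def miss_percentage() -> Decimal:
    """Percentage of hashes that never reach the share-handling code."""
    miss = 1 - SHARE_PROBABILITY
    return Decimal(miss.numerator) / Decimal(miss.denominator) * 100


print(2**32)              # 4294967296, i.e. roughly 4.3 billion
print(miss_percentage())  # numerically 99.999999976716935634613037109375
```

Anything inside the "share found" branch runs about once per 2^32 hashes, so only the check that guards it sits on the main path.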

Thanks for this mate. This means that the probability of finding 2 hashes in the same vector is 1/(4.3e9*4.3e9), which is infinitesimally close to 1/inf ~= 0. This allows for a further optimization of the code. Using a VECTORS2 example,
Code:
#elif defined VECTORS2
    bool result = min(W[117].x,W[117].y);
    if (!result) {
        if (!W[117].x)
            output[FOUND] = output[NFLAG & W[3].x] = W[3].x;
        else //if (!W[117].y)
            output[FOUND] = output[NFLAG & W[3].y] = W[3].y;
    }
Since min() takes care of the false positives, the 'else' branch is only true when W[117].y==0. The result in the KernelAnalyzer for a 5870 is:
Code:
phatk 120223 -> cycles: min:67.65, max:68.15, avg:67.82, alu:1363
phatk "new" -> cycles: min:67.65, max:67.90, avg:67.78, alu:1362

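
To put a number on the "two hashes in the same vector" case, here is a small host-side model in Python. It is a sketch, not the kernel: it assumes each lane hits independently with probability 1 in 2^32, and `reported_nonce` is a hypothetical helper that only mirrors the control flow of the if/else above (the real kernel writes into `output[]`).

```python
from fractions import Fraction

P_LANE = Fraction(1, 2**32)  # assumed chance that one lane's W[117] is zero


def p_both_lanes(p: Fraction = P_LANE) -> Fraction:
    """Chance that both lanes of one uint2 work-item hit, assuming independence."""
    return p * p


def reported_nonce(w117, w3):
    """Mirror the if/else on (x, y) pairs; return the nonce that would be stored."""
    if min(w117) != 0:   # main path: no lane is zero, nothing is written
        return None
    if w117[0] == 0:     # x lane hit (also taken when both lanes hit)
        return w3[0]
    return w3[1]         # otherwise the y lane must be the zero one


print(float(p_both_lanes()))  # about 5.4e-20
```

The only case the simplified else branch gets wrong is a double hit, where the y nonce is dropped; under the assumption above that happens with probability 1/2^64 per work-item.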
donator
Activity: 1218
Merit: 1079
Gerald Davis
VDDC: 1.084 V, VDDC current: 144 A so the card is using about 150W, got to measure wall power consumption.

Just a heads up.
VDDC isn't all power consumed by card.

There is also VDDCI (which handles things like PCIe interface, memory controller, and ancillary ASICS) and then a separate memory VRM which isn't adjustable/reported.

VDDC of 150W simply means the "cores" are using 150W.
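
For reference, the "about 150W" figure is just voltage times current on the VDDC rail. A minimal sketch using the two readings quoted at the top of this post (the function name is made up for the example):

```python
def core_power_watts(vddc_volts: float, vddc_amps: float) -> float:
    """Power drawn on the VDDC rail only: P = V * I."""
    return vddc_volts * vddc_amps


core_watts = core_power_watts(1.084, 144.0)  # readings quoted in the post
print(f"{core_watts:.1f} W")  # 156.1 W
```

This covers the core rail only; VDDCI and the memory VRM are not included, so wall power will be higher.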
Vbs
hero member
Activity: 504
Merit: 500
Been testing some changes on phatk with the KernelAnalyzer and my own personal testing.

Using a VECTORS2 example,
Code:
bool result = W[117].x & W[117].y;

gives a lot of false positives, changing it to
Code:
bool result = min(W[117].x,W[117].y);

is guaranteed to give yummy results!

(same ALU #ops and fetch, no false positives on the next 'if')
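
The difference between the two checks is easy to see on the host. The following is a Python model of the two expressions, not kernel code; the function names are made up, and each lane is treated as a 32-bit unsigned value.

```python
MASK32 = 0xFFFFFFFF  # each lane of the uint2 is modelled as a 32-bit unsigned value


def hit_with_and(x: int, y: int) -> bool:
    """Model of `bool result = W[117].x & W[117].y;` followed by `if (!result)`."""
    return (x & y & MASK32) == 0


def hit_with_min(x: int, y: int) -> bool:
    """Model of `bool result = min(W[117].x, W[117].y);` followed by `if (!result)`."""
    return min(x & MASK32, y & MASK32) == 0


def real_hit(x: int, y: int) -> bool:
    """A real candidate exists only if at least one lane is exactly zero."""
    return (x & MASK32) == 0 or (y & MASK32) == 0


# Two non-zero lanes with no bits in common: AND is zero, min is not.
x, y = 0x0F0F0F0F, 0xF0F0F0F0
print(hit_with_and(x, y), hit_with_min(x, y), real_hit(x, y))  # True False False
```

The AND test fires whenever the two lanes share no set bits, while min() is zero only when a lane is zero. Whether min() costs the same as & on every SDK is a separate question that this model does not answer.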
newbie
Activity: 28
Merit: 0
Excellent update,  Running FASTer +2-3% AND cooler!  So happy!

Linux 11.04 SDK 2.4 cgminer 2.3.1

single rig open air no risers

GPU 0 5870   850/300 392 Mh/s I:9 82.5C
GPU 1 5770   850/300 193 Mh/s I:9 84.0C
GPU 2 6950   840/900 369 Mh/s I:11 73.0C

Efficiency 90%

Total avg 954.8 Mh/s  Was getting 925 with 2.2.7

Running for 2 hours now  Deepbit  w/Tripplemining fallback pools
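
A quick sanity check of those numbers, as a sketch. GPU 2's figure is taken as 369 here (an assumption, since no decimal is given in the post) and the variable names are made up.

```python
# Per-GPU rates from the post, in Mh/s.
gpu_rates = {"5870": 392, "5770": 193, "6950": 369}
reported_total = 954.8  # "Total avg" with cgminer 2.3.1
previous_total = 925.0  # reported with 2.2.7

summed = sum(gpu_rates.values())
gain_percent = (reported_total - previous_total) / previous_total * 100
print(summed, round(gain_percent, 2))  # 954 3.22
```

The per-GPU figures add up to roughly the reported total, and the gain over 2.2.7 comes out a little above 3%, close to the +2-3% quoted.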



-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
I wonder what settings are people running their 7970s with.  Not for few hours, but days :-)
The weather's hot here at the moment, but..

GPU 0: 718.2 / 713.3 Mh/s | A:5180  R:16  HW:0  U:10.00/m  I:11
74.0 C  F: 79% (4532 RPM)  E: 1200 MHz  M: 1050 Mhz  V: 1.170V  A: 99% P: 5%
Last initialised: [2012-02-24 17:38:34]
Intensity: 11
Thread 0: 357.7 Mh/s Enabled ALIVE
Thread 1: 360.4 Mh/s Enabled ALIVE

Running flat out since the day I installed it a couple of weeks back (note the +5% powertune as well).

Thanks. I will try those. Fan is on auto, right?  
Which kernel, what options?
You need to confirm your GPU will actually run at those speeds. Every card has different top stability levels.
--auto-gpu --auto-fan -I 11 --gpu-engine 450-1200 --gpu-memdiff -150 --gpu-powertune 5

This is driver 8.921 on Linux 64 bit with GL SYNC enabled. This means it ends up being -k poclbm -w 64 -v 1 . On windows you will not be able to run that high an intensity without running into high CPU usage issues (probably -I 9 is max), and there's no way to enable GL SYNC that anyone's aware of.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
I wonder what settings are people running their 7970s with.  Not for few hours, but days :-)
The weather's hot here at the moment, but..

GPU 0: 718.2 / 713.3 Mh/s | A:5180  R:16  HW:0  U:10.00/m  I:11
74.0 C  F: 79% (4532 RPM)  E: 1200 MHz  M: 1050 Mhz  V: 1.170V  A: 99% P: 5%
Last initialised: [2012-02-24 17:38:34]
Intensity: 11
Thread 0: 357.7 Mh/s Enabled ALIVE
Thread 1: 360.4 Mh/s Enabled ALIVE

Running flat out since the day I installed it a couple of weeks back (note the +5% powertune as well).
full member
Activity: 164
Merit: 100
Hi,
I am running a 5870 and with 2.3.1-2 I get ~415Mh/s, approximately the same results as with
2.2.3 (which was a bit better than 2.2.6).

When I started the new version it was OK, but after a while when I connected to the console it looked like this.
Notice the inflated hashrates:
--------------------------------------------------------------------------------
 (5s):5218.6(avg):4154.5Mh/s | Q:171  A:2956 R:5  HW:0  E:173%  U:5.547m
 TQ: 2  ST: 3  SS: 1  DW: 13  NB: 8  LW: 4335 GF: 0  RF: 0
 Connected to http://xxx:xxxx with LP as user xxx.xxx1
 Block: 0000002e559399ea9c7e863264a387ce...  Started: [06:01:52]
--------------------------------------------------------------------------------
 [P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
 GPU 0: 417.20/414.5Mh/s | A:296 R:5 HW:0 U: 5.547m I:10
--------------------------------------------------------------------------------

after a while the averages went back to 414.7 but the 5s average did not change back (yet)

//GoK
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
* stupidly limiting memclock on 6000 & 7000 series (I mean I understand a limit on overclock but underclock?)
Believe it or not I can actually shed some light on this, and since I'm in a much better mood as someone just kindly donated some BTC, I'll even answer in less than my usual appalling AMD-induced tone of late.

One of the things about the 69xx and now the 79xx architectures is the ability internally to underclock memory relative to running speed. There is a power-usage war going on now between manufacturers, and this is one place where AMD is working very hard (unlike fixing drivers, SDKs and so on). Since the GPUs are running the ram double channel, if the RAM bandwidth is not in full use, they can shut down one of the channels. This is why the power usage does not appear to be directly proportional to the RAM speed. So even though you can only decrease your clock speed to say 900, it might be internally running at 450. This is also why when you flash the bios and underclock the ram, it might crash at apparently satisfactory rates and 6970s are virtually never stable below 300 whereas 5xxx can happily run down to 150. Sure there is more power to be saved if you can actually flash their bios and turn them down to 300 since you are guaranteed to never actually jump between 300 and 600, but it is not universally half the ram speed and power consumption.  Bear in mind that most people do not touch the clock speed of their memory (except usually to increase it) but they do care about power consumption. This is also why it's so hard to pin down power usage on these things as they fluctuate wildly depending on the type of load rather than just the overall load. 100% GPU load could really mean anything and might or might not be high ram bandwidth.
hero member
Activity: 769
Merit: 500
cgminer is not doing any adjustment of anything. It sends the request to the driver. The driver says it has accepted the value for the profile. The hardware then gladly ignores you and although the profile now says the memory is 300, the GPU goes back to its default speed. This is why I made cgminer report back the actual values to you after you try to make a change. If it doesnt work it doesnt work. Nothing can make cgminer make it work because it doesnt have access to the special hardware backdoor commands that afterburner and co. can fuck the operating system up the arse with. AMD did not release a public library for anal reaming of GPUs.

Thanks.  It makes sense now.

What is powertune setting for?

http://sites.amd.com/de/Documents/PowerTune_Technology_Whitepaper.pdf

It's nothing you should change as long as everything works ... in border case situations a + to the PowerTune setting would lead to possible higher power consumption and perhaps a tad more performance, but I think AMDs defaults are fine.

Dia
donator
Activity: 1218
Merit: 1079
Gerald Davis
cgminer is not doing any adjustment of anything. It sends the request to the driver. The driver says it has accepted the value for the profile. The hardware then gladly ignores you and although the profile now says the memory is 300, the GPU goes back to its default speed. This is why I made cgminer report back the actual values to you after you try to make a change. If it doesnt work it doesnt work.

This.

There are two ways to control a videocard:
* the right way
* and the hack way

The "right" way:
Using the AMD driver library you can send requests to the card.  You can't control anything.  The card is free to ignore or modify any request as it sees fit.

So it is more like this:
cgminer (via driver): "Video card #1 can you please change clock to 300 Mhz".
video card #1: "command is valid"
internal BIOS check. 300Mhz is invalid, ignoring.

So why does AB "show" 300Mhz.  It doesn't.

AB, GPU-Z (main tab), Trixx, etc show what the card has been SET TO not what it is RUNNING AT.

The only three places I have found to always report correct values on what the card IS ACTUALLY RUNNING AT are:
* cgminer.  If cgminer says a card is running at 300Mhz, 30 Ghz, or 0.1 Mhz it probably is right.  It doesn't matter how weird you may think that is.  I can't remember a single instance where cgminer turned out to be wrong.
* (windows) GPU-Z Sensor tab.  Note the first tab shows what card is set at.  That is useless.  If you look on sensor tab it shows what card is running at.
* (linux) aticonfig
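
The "set to" versus "running at" distinction can be sketched with a toy model. Everything here is hypothetical: `ModelCard` and its fields are invented for illustration and are not the AMD driver library or cgminer code. The 900 MHz floor and 1050 MHz running clock are just the 7970 figures mentioned in this thread.

```python
from dataclasses import dataclass


@dataclass
class ModelCard:
    """Hypothetical stand-in for a GPU; not a real driver interface."""
    bios_min_mem_mhz: int   # lowest memory clock the card will really apply
    actual_mem_mhz: int     # what the card is running at
    profile_mem_mhz: int    # what the profile says it has been set to

    def request_mem_clock(self, mhz: int) -> bool:
        # The request is recorded in the profile and reported as accepted...
        self.profile_mem_mhz = mhz
        # ...but the card only applies it if its own limits allow it.
        if mhz >= self.bios_min_mem_mhz:
            self.actual_mem_mhz = mhz
        return True


card = ModelCard(bios_min_mem_mhz=900, actual_mem_mhz=1050, profile_mem_mhz=1050)
accepted = card.request_mem_clock(300)
print(accepted, card.profile_mem_mhz, card.actual_mem_mhz)  # True 300 1050
```

A tool that reads back the profile value would show 300 here, while one that reads the running value would show 1050.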

The "hack" way:
So how do tools like AB change the clock?  They bypass the drivers and write directly to the GPU BIOS.  This is why it often requires a new version before AB will work with newly released cards.  This also applies to things like how GPU-Z can read VRM temps or how radeonvolt can modify voltage beyond what is allowed by drivers.  The problem with this method is that it isn't universal.  RadeonVolt doesn't work on 5970 so I can't modify my voltage above what is allowed by GPU BIOS.  Maybe someday someone will hack a solution together for Linux, maybe they never will.  

The annoyance:
Since so many hacks exist but aren't universal and bypass the drivers one would think AMD would expand the official drivers to allow full range of clock adjustments, voltage adjustments, and sensor data readings.  Of course they won't.  In the meantime you can flash a card with custom bios to make it do just about anything.  Run at 100Mhz memclock, have "stock" core speed of 1.2Ghz, have a core voltage of 0.7V, etc.  Granted you can also completely destroy the card in a non-warrantied manner but it is possible to change just about anything.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
cgminer is not doing any adjustment of anything. It sends the request to the driver. The driver says it has accepted the value for the profile. The hardware then gladly ignores you and although the profile now says the memory is 300, the GPU goes back to its default speed. This is why I made cgminer report back the actual values to you after you try to make a change. If it doesnt work it doesnt work. Nothing can make cgminer make it work because it doesnt have access to the special hardware backdoor commands that afterburner and co. can fuck the operating system up the arse with. AMD did not release a public library for anal reaming of GPUs.
full member
Activity: 155
Merit: 100
You could try to use MSI Afterburner, see here: http://forums.guru3d.com/showthread.php?t=358990 and enable its unofficial overclocking mode, see here: http://forums.guru3d.com/showthread.php?t=338906
Dia

Afterburner doesn't work anymore if using the latest (12.3) drivers from AMD.
I'm not sure if it's still working with older versions of catalyst, since I just installed my 7970, and the latest drivers are all I've used.
hero member
Activity: 769
Merit: 500
ckolivas,

Is there a way to downclock memory on 7970 lower than 900MHz with my setup

diakgcn kernel, -v 2 -w 256

Engine - 1050
Memory - 900 (cgminer cannot set it lower, ignores what is set by afterburner and others)
Fan Auto
Power Tune - 10%

I'm getting about 615 Mh/s, steady at 72 C, 50% fan.

Also I cannot lower voltage, every time I lower voltage, cgminer drops Engine speed to like 300.

Also, what is the best setting for "power tune" in my setup.

Thanks,
af_newbie

BTW, I've run this card all the way to 750 Mh/s (with max everything) but cgminer drops the engine to 300 MHz after a while.
Good job on the controls.  

 

You could try to use MSI Afterburner, see here: http://forums.guru3d.com/showthread.php?t=358990 and enable its unofficial overclocking mode, see here: http://forums.guru3d.com/showthread.php?t=338906

Dia