Topic: another 3% mining increase with poclbm kernel.cl (Read 10088 times)

legendary
Activity: 1162
Merit: 1000
DiabloMiner author
I am locking this topic.

As per OpenCL specification:

Quote
6.5.2 __local (or local)
The __local or local address space name is used to describe variables that need to be
allocated in local memory and are shared by all work-items of a work-group. This qualifier can
be used with arguments to functions (including __kernel functions) declared as pointers, or
with variables declared inside a __kernel function.

6.5.4 __private (or private)
All variables inside a function (including __kernel functions), or passed into the function as
arguments are in the __private or private address space. Variables declared as pointers
are considered to point to the __private address space if an address space qualifier is not
specified except for arguments declared to be of type image2d_t and image3d_t which
implicitly point to the __global address space.

In other words, the hardware threads are stomping on each other's work and producing nonsense.
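A toy model of what the quoted spec text implies, written in plain Python rather than OpenCL. It assumes work-items that run in lockstep (real GPU scheduling is more complicated), and `run_work_group` is a made-up name for illustration only. With one private slot per work-item everyone reads back their own word; with a single shared slot everyone reads whatever the last writer stored.

```python
def run_work_group(num_items, shared):
    """Toy lockstep model: every work-item writes W, then every work-item reads W."""
    # One slot when W is __local (shared by the work-group),
    # one slot per work-item when W is __private.
    slots = [0] if shared else [0] * num_items

    def slot(item_id):
        return 0 if shared else item_id

    # Phase 1: each work-item stores its own input word.
    for item_id in range(num_items):
        slots[slot(item_id)] = 1000 + item_id

    # Phase 2: each work-item reads the word back and uses it.
    return [slots[slot(item_id)] for item_id in range(num_items)]


private_results = run_work_group(4, shared=False)
local_results = run_work_group(4, shared=True)
print(private_results)  # [1000, 1001, 1002, 1003]
print(local_results)    # [1003, 1003, 1003, 1003]
```

In this model only the last work-item still sees the value it wrote, which matches the point above: the result may come out faster, but it is no longer the result of the intended calculation.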
full member
Activity: 420
Merit: 100
Quote
This method actually caused a decrease of 30%. I went from 26.81 MHash/s down to 18.71 MHash/s.

Quote
I have used this with a Radeon 5830 card; it went from 305 MH/s to 315 MH/s.
Here's some proof.

new kernel
http://imgur.com/JcMmZ

old kernel
http://imgur.com/YrRuL

What clock/memory/voltage values do you have on your card?
legendary
Activity: 1708
Merit: 1020
A couple of Sapphire HD 5850 Xtremes.

Hard to tell if it does anything good.

Lots of kernel errors on one of the cards; it crashed with a VPU recover after a couple of minutes.
member
Activity: 98
Merit: 10
Actually 1 MH/s less per card (poclbm with the Ma() mod).
newbie
Activity: 42
Merit: 0
Quote
On my 5870 it jumped 16 MH/s, from 385 to 401. I'm using the poclbm "old" kernel + the other +3% patch (that gives me only 1% of speedup :-( )

Quote
Weird, what's the rest of your setup? I've listed mine above and this mod decreases my performance on every 5870 I have tried, across a mix of Ubuntu/Windows, ATI versions & SDK.

I have a single card, clocked 300/900, on Windows with Catalyst 11.5 (too lazy to install 11.6, and I've read that it doesn't change anything), and I'm using -w 256. Someone saw a big difference with the other patch; I got only 1%. Perhaps the trick is in the ratio of memory clock to processor clock.
hero member
Activity: 927
Merit: 1000
฿itcoin ฿itcoin ฿itcoin
Quote
On my 5870 it jumped 16 MH/s, from 385 to 401. I'm using the poclbm "old" kernel + the other +3% patch (that gives me only 1% of speedup :-( )

Weird, what's the rest of your setup? I've listed mine above and this mod decreases my performance on every 5870 I have tried, across a mix of Ubuntu/Windows, ATI versions & SDK.
newbie
Activity: 21
Merit: 0
Quote
Basically, add the __local keyword to this line and it should increase your performance.

Quote
Sorry, this doesn't work for me, performance dropped 33% instead.

Quote
Are you using an ATI card? I know that I get a similar drop on my Nvidia, but I only have one and no ATI. This may only work with ATI, all the reports are with ATI cards.

Quote
Only works with ATI card mate. Nvidia are a load of bollocks anyway with Bitcoin, total waste of time.

Ah, thanks for confirming my suspicions. I've heard that ATI is the way to go when mining; however, Nvidia is a better choice for what I mostly use my computer for. :P
newbie
Activity: 42
Merit: 0
On my 5870 it jumped 16 MH/s, from 385 to 401. I'm using the poclbm "old" kernel + the other +3% patch (that gives me only 1% of speedup :-( )
newbie
Activity: 8
Merit: 0
I made this change for my two 6950s (unlocked, 908/310 and 850/310). The one clocked at 908 went up about 1 MH/s while the other went up 3-4 MH/s. I did the initial 3% increase mod and my rate shot up from 380 MH/s to 404 MH/s. To say the least, I was more than pleased. After making this change it went from 404 to 405 (I sometimes see it jump to 405.8). I'm not sure about my other card's numbers before the initial 3% increase mod, but I remember it was on track with 3%.

I'm going to keep my eye on stales and make sure they don't creep up. I'm not too worried.

Every bit helps. Thanks!
newbie
Activity: 27
Merit: 0
AFAIK declaring those variables as local will make them shared by all the threads in the workgroup, so it's possible to see a performance increase because you are using less memory, but speed doesn't matter when the calculations are all fucked up!
member
Activity: 103
Merit: 10
Used the file posted here ...

no changes on my 5830 ...

looks like guiminer was already "optimized" for my card ...

nice mod anyway

sr. member
Activity: 464
Merit: 250
Quote
It seems the new poclbm already includes the 3%+ boost, including the phatk kernel. No need to edit code. :) I have tested it and it matches my hashrate with the edited code.
Download here: http://forum.bitcoin.org/index.php?topic=1334.0

When I first looked at this I was thinking, yeah right... I was using GUI miner but decided I wanted something more command-line when I set up my second box, so I can just make a .bat file and stick it in startup.

Anyway, up from 198 MH/s to 212 on a 5770, and up from 262 to 285 on a 6870.
full member
Activity: 154
Merit: 100
It seems the new poclbm already includes the 3%+ boost, including the phatk kernel. No need to edit code. :) I have tested it and it matches my hashrate with the edited code.
Download here: http://forum.bitcoin.org/index.php?topic=1334.0
newbie
Activity: 40
Merit: 0
Decreases my hashrate on my 5850 from 370 (with the first 3%-fix) to 368.
sr. member
Activity: 418
Merit: 250
Went from 130 MH/s to 70 MH/s on my GTX570
newbie
Activity: 42
Merit: 0
Got about 2.5-3% less on a HD 4650.
newbie
Activity: 28
Merit: 0
I've just tried this, didn't help me at all, but I've sent some BTCs your way anyways; thanks for trying to make it better!
legendary
Activity: 1442
Merit: 1005
Quote
Basically, add the __local keyword to this line and it should increase your performance.

Quote
Sorry, this doesn't work for me, performance dropped 33% instead.

Quote
Are you using an ATI card? I know that I get a similar drop on my Nvidia, but I only have one and no ATI. This may only work with ATI, all the reports are with ATI cards.

Sure, but why?!
sr. member
Activity: 252
Merit: 251
Didn't do anything to be honest. No performance dip but no increase either.

Tried on a 6990 and a 5850.
legendary
Activity: 1946
Merit: 1006
Bitcoin / Crypto mining Hardware.
Quote
I'll wait for everyone to report back before I try out something from a guy named goxed ::)

:)
hero member
Activity: 784
Merit: 502
Quote
Basically, add the __local keyword to this line and it should increase your performance.

Quote
Sorry, this doesn't work for me, performance dropped 33% instead.

Quote
Are you using an ATI card? I know that I get a similar drop on my Nvidia, but I only have one and no ATI. This may only work with ATI, all the reports are with ATI cards.

Only works with ATI card mate. Nvidia are a load of bollocks anyway with Bitcoin, total waste of time.
newbie
Activity: 21
Merit: 0
Quote
Basically, add the __local keyword to this line and it should increase your performance.

Quote
Sorry, this doesn't work for me, performance dropped 33% instead.

Are you using an ATI card? I know that I get a similar drop on my Nvidia, but I only have one and no ATI. This may only work with ATI, all the reports are with ATI cards.
legendary
Activity: 1540
Merit: 1002
Well, marking it __local actually hints to the OpenCL compiler where you want that variable to be stored. Local memory is the fastest, IIRC, but there's a limited amount of it, and if you are already at the limit you may end up with the wrong thing in local memory, making everything slower.

This is obviously an oversimplification. For some more "to the metal" facts, my experiments give me:
- A 1~2% speed increase on a 5870, which can do -w 256 but peaks at -w 128.
- A fair amount of speed DECREASE on a 5970 with -w 256, which is where it performs best.
- A 1~2% speed increase on the same 5970 with -w 128, although even with the increase it's a little slower than with -w 256.

I'm using one 5970 and one 5870 in the same computer, but I don't want to use separate kernels, one for each card, so I'm stuck with the no-__local version :(
hero member
Activity: 927
Merit: 1000
฿itcoin ฿itcoin ฿itcoin
Just tried this on my quad 5870 setup on Ubuntu 11.04 using CC 11.5, SDK 2.1, with phoenix and poclbm + the 3% boost from http://forum.bitcoin.org/index.php?topic=22965.0
My hash rate went from 428.57 down to 426.26.
member
Activity: 103
Merit: 10
I'll wait for everyone to report back before I try out something from a guy named goxed ::)
hero member
Activity: 784
Merit: 502
Yup, I get a speedup on my 5770 too. Respect due.
hero member
Activity: 686
Merit: 501
Stephen Reed
Quote
what can I switch to that won't drop my hashrate but will lower CPU usage?

You could try poclbm, as I do not know much about configuring phoenix.  
sr. member
Activity: 418
Merit: 250
Quote
On linuxcoin I experimented with phoenix+phatk but found that my CPU utilization went up sharply - from say 3% per miner to 20% or more. I am heat-constrained, operating my caseless rigs in the crawl space under my Austin, Texas house without air-conditioning or extra fans, so poclbm appeared more watt-efficient, and also I had some stability issues, e.g. crashes when overclocking my Radeon HD 5770 GPUs to 960 MHz with phoenix. In contrast, poclbm handles my overclocking OK.

That's an excellent reason, I'd say. Living in Texas, I know exactly what you mean. I'm actually considering changing the software my dedicated miners out in the storage room are running; the MH/s are great, but the mining threads use 100% of the CPU, so it probably wastes electricity (three CPUs pegged out). I'm using command-line phoenix 1.5 with phatk; what can I switch to that won't drop my hashrate but will lower CPU usage?
hero member
Activity: 686
Merit: 501
Stephen Reed
Quote
Do you mind if I ask why people choose poclbm over phoenix with PhatK ?

as I understand it PhatK with 2.4 SDK has a decent performance edge over poclbm

On linuxcoin I experimented with phoenix+phatk but found that my CPU utilization went up sharply - from say 3% per miner to 20% or more.  I am heat-constrained, operating my caseless rigs in the crawl space under my Austin, Texas house without air-conditioning or extra fans, so poclbm appeared more watt-efficient, and also I had some stability issues, e.g. crashes when overclocking my Radeon HD 5770 GPUs to 960 MHz with phoenix.  In contrast, poclbm handles my overclocking OK.
full member
Activity: 213
Merit: 100
Quote
Do you mind if I ask why people choose poclbm over phoenix with PhatK?

No, but poclbm now includes PhatK (it is the same as Phoenix+PhatK).
sr. member
Activity: 418
Merit: 250
Do you mind if I ask why people choose poclbm over phoenix with PhatK ?

as I understand it PhatK with 2.4 SDK has a decent performance edge over poclbm
hero member
Activity: 686
Merit: 501
Stephen Reed
I use linuxcoin and the kernel I edited was named BitcoinMiner.cl, located in /opt/miners/poclbm.

This simple change took each of my six 5770 GPUs, overclocked to 960 MHz, from 204 MH/sec to 212 MH/sec, a 4% increase!
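
For anyone comparing the numbers reported in this thread, the percentages can be checked with a few lines of Python. This is only a sketch: `percent_change` is a hypothetical helper name, and the figures are the before/after hashrates quoted by posters here.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (before, after) hashrates in MH/s, as reported by posters in this thread
reports = {
    "5770 @ 960 MHz": (204, 212),
    "5870": (385, 401),
    "GeForce GTS 360M": (26.81, 18.71),
    "GTX570": (130, 70),
}

for card, (before, after) in reports.items():
    print(f"{card}: {percent_change(before, after):+.1f}%")
```

The sign makes it easy to see which reports are gains and which are losses.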
full member
Activity: 213
Merit: 100
Quote
Where do I find that kernel.cl?

I've got a folder called "poclbm_py2exe_20110428" and run poclbm with a batch file. But I do not find any kernel.cl in that folder.

If you have problems, simply update your poclbm to the latest version (20110627), which includes the phatk kernel modification from this thread. Just launch your miner as you usually do.

https://github.com/downloads/m0mchil/poclbm/poclbm_py2exe_20110627.7z

To answer your question: kernel.cl is in the poclbm folder inside the kernels folder of the Phoenix miner. You need to modify BitcoinMiner.cl in your poclbm miner instead, because you mine with poclbm and not with Phoenix.
newbie
Activity: 3
Merit: 0
Where do I find that kernel.cl?

I've got a folder called "poclbm_py2exe_20110428" and run poclbm with a batch file. But I do not find any kernel.cl in that folder.
legendary
Activity: 1442
Merit: 1005
Quote
Basically, add the __local keyword to this line and it should increase your performance.

Sorry, this doesn't work for me, performance dropped 33% instead.
legendary
Activity: 1946
Merit: 1006
Bitcoin / Crypto mining Hardware.

Quote
I see some Ma3() function in your kernel (I don't have it), which seems to be almost the same as the original Ma(), and my optimization could be applied to it as well. Why didn't you change this Ma3()? Any particular reason?

Good catch :) I have been playing a little bit with the kernel. This macro was added by me and is not part of the original kernel.
newbie
Activity: 28
Merit: 0

I see some Ma3() function in your kernel (I don't have it), which seems to be almost the same as the original Ma(), and my optimization could be applied to it as well. Why didn't you change this Ma3()? Any particular reason?
newbie
Activity: 21
Merit: 0
Quote
This method actually caused a decrease of 30%. I went from 26.81 MHash/s down to 18.71 MHash/s.

Quote
I have used this with a Radeon 5830 card; it went from 305 MH/s to 315 MH/s.
Here's some proof.

new kernel
http://imgur.com/JcMmZ

old kernel
http://imgur.com/YrRuL

I was using it with a GeForce GTS 360M. Perhaps it only works with ATI cards?
full member
Activity: 213
Merit: 100
New release of m0mchil's poclbm (2011-06-27) : https://github.com/downloads/m0mchil/poclbm/poclbm_py2exe_20110627.7z

It adds a kernel improvement (phatk with the modified Ma formula).
+3% hashrate on average.
legendary
Activity: 1946
Merit: 1006
Bitcoin / Crypto mining Hardware.
Quote
This method actually caused a decrease of 30%. I went from 26.81 MHash/s down to 18.71 MHash/s.

I have used this with a Radeon 5830 card; it went from 305 MH/s to 315 MH/s.
Here's some proof.

new kernel
http://imgur.com/JcMmZ

old kernel
http://imgur.com/YrRuL

newbie
Activity: 21
Merit: 0
This method actually caused a decrease of 30%. I went from 26.81 MHash/s down to 18.71 MHash/s.
newbie
Activity: 28
Merit: 0
Quote
There's a u W[128] at the beginning of the Search() function, would those be the same as poclbm's w0-w15 ?

TLDR - Probably yes, maybe not. Try it anyway :)

When mining, you're calculating SHA256( SHA256( w0, w1, ... nonce ..., w15 ) ), where the w-s are some constants, most of which depend on the block you're trying to solve. The calculation of each SHA consists of two stages: the expansion (a.k.a. message schedule), which takes your input words w0 through w15 and expands them into 64 words, w0 through w63, and the compression, which iterates over these 64 expanded words 64 times, with each wi used only once, on iteration i.

So there are at least two ways to carry out this calculation:
1 - expand the 16 original input words into 64 words, and then use them.
2 - expand them as necessary, i.e. calculate wi on iteration i, when it is needed.

My guess is that one kernel implements #1 and the other implements #2. So W[128] is probably the same as w0, w1, ... It is also hard to tell which way is better; it depends on the compiler and your hardware (register pressure, scheduling and so on). What I don't know is why they've got 128 words instead of 64; perhaps it is because of calculating the hash twice, or it could be because they're calculating two hashes at once, in parallel. Unfortunately, I don't have access to the source code at the moment, so all I can do is guess :) I hope this helps.
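
To make the two approaches concrete, here is a rough sketch in plain Python (not OpenCL, and not taken from poclbm or phatk). It implements standard SHA-256 with a switch between expanding all 64 words up front (#1) and keeping only a rolling 16-word window that is extended on the round where each word is needed (#2). All function names and the dummy 80-byte `header` are assumptions made for illustration; both variants are checked against Python's hashlib.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def first_primes(count):
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
# Standard SHA-256 constants: fractional bits of cube/square roots of primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def s0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def s1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def compress(state, block_words, expand_up_front):
    a, b, c, d, e, f, g, h = state
    w = list(block_words)  # w0..w15
    if expand_up_front:
        # Approach 1: build all 64 schedule words before the rounds start.
        for i in range(16, 64):
            w.append((s1(w[i - 2]) + w[i - 7] + s0(w[i - 15]) + w[i - 16]) & MASK)
    for i in range(64):
        if expand_up_front:
            wi = w[i]
        else:
            # Approach 2: w is a rolling window; slot i % 16 still holds w[i-16].
            if i >= 16:
                w[i % 16] = (s1(w[(i - 2) % 16]) + w[(i - 7) % 16]
                             + s0(w[(i - 15) % 16]) + w[i % 16]) & MASK
            wi = w[i % 16]
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + wi) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def sha256(message, expand_up_front):
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", len(message) * 8)
    state = list(H0)
    for offset in range(0, len(padded), 64):
        words = struct.unpack(">16I", padded[offset:offset + 64])
        state = compress(state, words, expand_up_front)
    return struct.pack(">8I", *state)

header = bytes(range(80))  # hypothetical stand-in for an 80-byte block header
expected = hashlib.sha256(hashlib.sha256(header).digest()).digest()
up_front = sha256(sha256(header, True), True)
on_the_fly = sha256(sha256(header, False), False)
print(up_front == expected, on_the_fly == expected)
```

Both variants do the same arithmetic; the difference is only how many schedule words are alive at once (64 versus 16), which is the kind of trade-off that shows up as register pressure on a GPU.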

newbie
Activity: 28
Merit: 0
Thank you, you are a scholar and a gentleman (for finding and sharing this) Smiley

I'll try it out tonight and report my results, will definitely tip you if it helps!

sr. member
Activity: 418
Merit: 250
Any idea if it would work with PhatK as well?

Normally I wouldn't try it without a truth table proof like bitless posted on his speedup, but since you're just making variables local to a function, it makes sense.  I'll try and report back!

edit: I noticed the PhatK kernel.cl doesn't contain the exact line to modify, I looked for W0 - W15 variables to make __local but I'm no OpenCL coder, so I can't find them if they even exist at all.

There's a u W[128] at the beginning of the Search() function, would those be the same as poclbm's w0-w15 ?

edit2: changed u W[128] to __local u W[128] and no change in hashrate (except from 217.5 -> 217.7 which could be noise), perhaps it's already compiling as local
legendary
Activity: 1946
Merit: 1006
Bitcoin / Crypto mining Hardware.
You have to change this line in kernel.cl. I tested this with the poclbm kernel only.

    u W0, W1, W2, W3, W4, W5, W6, W7, W8, W9, W10, W11, W12, W13, W14, W15;
to
    __local u W0, W1, W2, W3, W4, W5, W6, W7, W8, W9, W10, W11, W12, W13, W14, W15;

Basically, add the __local keyword to this line and it should increase your performance.