
Topic: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 - page 12.

sr. member
Activity: 742
Merit: 250
Went up from 385 to 398. I didn't use the 3% thing before. Thanks. Now I'll have to look at the stale rates.
newbie
Activity: 42
Merit: 0
Very nice, will try this on my new rig.
newbie
Activity: 36
Merit: 0
2x 6970s: 755 MH/s (old) -> 781 MH/s (new)


thanks!
hero member
Activity: 772
Merit: 500
All those who are happy and gain a few MHash/sec make me proud and happy, too :) Keep posting here!

Dia
sr. member
Activity: 413
Merit: 250
I was using some patch to get a few extra MH/s yesterday, but I just downloaded this new one and I get about 6 more! Badass!

I need to find a way to stop having to implement all these changes across all my machines! :D

Thanks!
newbie
Activity: 36
Merit: 0
Thanks a lot for this.

303 -> 309 MH/s on a Radeon 6870 (1005/350 core/memory clock) using Phoenix with your kernel!
legendary
Activity: 1218
Merit: 1019
360 -> 362 on an HD6950 @ 900 MHz :)
hero member
Activity: 772
Merit: 500
Good, but this

> - added: "u t1W" variable, which is used in sharound2() to avoid double execution of t1W()

may actually hurt the performance, in theory. If you're using more registers, at least some GPUs may not be able to run as many threads concurrently as they used to, thus slowing things down.

That one was removed a few hours after I added it, don't worry :) You can safely remove "u t1W;" and replace "t1W = t1w(n);" with "t1 = t1W(n);" in sharound2.

Dia
newbie
Activity: 56
Merit: 0
> Did anyone repost this in the Mining Software forum? I am not allowed to post there ^^.

Newbie rules are pretty strict here.
You have to spend more than 4 hours on this forum before you are able to post anywhere.
newbie
Activity: 28
Merit: 0
Good, but this

> - added: "u t1W" variable, which is used in sharound2() to avoid double execution of t1W()

may actually hurt the performance, in theory. If you're using more registers, at least some GPUs may not be able to run as many threads concurrently as they used to, thus slowing things down.

hero member
Activity: 772
Merit: 500
Hello,

you might want to change this

Code:
#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1(n); Vals[(135 - n) % 8] = t1(n) + t2(n); }

to

Code:
#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1; Vals[(135 - n) % 8] = t1 + t2(n); }

This got me a 25% performance increase!
Seems good, but it brings no gain for me ... weird. Will look into this and perhaps re-use your idea, if I may :)
Can only think of a good compiler optimization...
It is not obvious that this will gain anything. On the contrary, unless the compiler optimizes it away, it makes the second and third instructions dependent on the first. This is bad on a GPU which issues 4 or 5 instructions in parallel every clock cycle.

I wish I could use the AMD APP KernelAnalyzer, but it currently won't work, so I rely on the pure MHash/sec numbers that Phoenix gives me :-/. Anyway, I will perhaps try to follow your hint. If it's faster to do a calculation twice but keep the copies independent of each other, then okay :D

My work is not over ;)

Thanks to the 3 people who have sent a donation so far! It's a nice motivation. Did anyone repost this in the Mining Software forum? I am not allowed to post there ^^.

Dia
newbie
Activity: 28
Merit: 0
> It is not obvious that this will gain anything. On the contrary, unless the compiler optimizes it away, it makes the second and third instructions dependent on the first. This is bad on a GPU which issues 4 or 5 instructions in parallel every clock cycle.

Well, it worked for me; that might be because I only have a slow 4670. However, the same change in sharound2 decreases performance.

Another thing that seems to run a little bit faster on cards without BFI_INT:
Code:
#define Ma(x, y, z) Ch((z^x), (y), (x))
newbie
Activity: 78
Merit: 0
Downloaded the newest file.

307 -> 310
278 -> 280

Can't complain, it was free :P
newbie
Activity: 12
Merit: 0
http://i55.tinypic.com/2q0kq69.png

Testing my new Sapphire 5850.
I need a new PSU and water cooling for the card to make it stable. I am currently running the cards (overnight) steady at 412 MH/s @ 1030 MHz core, and the other one at ~360 MH/s @ 950 MHz core because it runs with only AGGRESSION=8. The PSU is a Corsair 450W, so I cannot overclock and overvolt both cards to the max. For the first card, this mod gave an increase from 401 -> 412 MH/s while the card is running @ 1030 MHz core.
legendary
Activity: 1437
Merit: 1002
https://bitmynt.no
> This got me a 25% performance increase!

It is not obvious that this will gain anything. On the contrary, unless the compiler optimizes it away, it makes the second and third instructions dependent on the first. This is bad on a GPU which issues 4 or 5 instructions in parallel every clock cycle.
newbie
Activity: 55
Merit: 0
Went from 232 to 233 on an MSI 5770 :D

Sending a donation your way!
full member
Activity: 213
Merit: 100
Try uninstalling SDK 2.1 and using the Cat 11.7 preview from here: http://developer.amd.com/Downloads/110619a-121104E.zip

390 with new Catalyst and old kernel.
388 with new Catalyst and new kernel.

ATI Radeon 6970 on Windows 7.
newbie
Activity: 42
Merit: 0
340 MH/s -> 344 MH/s at peak on a 5850.
newbie
Activity: 8
Merit: 0
Thanks a lot! It brought me from 421 to 423 (x2) for my 5970 (I already had the 3% Ma patch applied).

Everybody getting "FATAL kernel error: Failed to load OpenCL kernel!": run Phoenix once with the sudo prefix. Afterwards you can launch it again without sudo.

Greets
sr. member
Activity: 350
Merit: 250

Use Cat 11.7 preview!

Dia

The ATI 110619a-121104E file doesn't include an "11.7" video driver. It does have a new AMD APP SDK, version 2.5.684.24. The package also installs drivers for ATI TV Wonder 600 USB and Hydravision (WTF?), so if you want to try out the new SDK, be sure to do a custom install and uncheck the other stuff.

That being said, the new SDK makes no difference. I've run the 10.11 and 11.6 drivers on the 2.4 and 2.5 SDKs with identical benchmarks on all of them in a very repeatable setup. However, your patch does make a difference! The improvement on my overclocked 5830 (1070/392, WinXP SP3 stripped, Sempron 2.7 GHz):

Before: 340.21 Mhash/s
After: 341.59 Mhash/s
Improvement: 0.41%

I haven't run it long enough to see if it increases rejects or stales, but I will report back. This is looking good (but it will take a week of mining with the patch to even make up for my 30 minutes of downtime...)


What are your settings? How do you get 340 with a 5830? I max out at 307 with the same cards.
Do you know where to download the "11.7" video driver?

Thanks for your help