further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 - page 12.

nebiki

sr. member

Activity: 742

Merit: 250

Went up from 385 to 398. I didn't use the 3% thing before, thanks. Now I'll have to look at the stale rates.

DullJack

newbie

Activity: 42

Merit: 0

Very nice, will try this on my new rig.

erek

newbie

Activity: 36

Merit: 0

2x 6970s: 755 (old) -> 781 (new)

Thanks!

Diapolo

hero member

Activity: 772

Merit: 500

All those who are happy and gain a few MHash/sec make me proud and happy, too. Keep posting here!

Dia

xurious

sr. member

Activity: 413

Merit: 250

I was using some patch yesterday to get a few extra MH/s, but I just downloaded this new one and get about 6 more! Badass!

I need to find a way to stop having to implement all these changes across all my machines!

Thanks!

r4in

newbie

Activity: 36

Merit: 0

Thanks a lot for this.

303 -> 309 @ Radeon 6870 (1005/350) using phoenix with your kernel!

Alex AXe

legendary

Activity: 1218

Merit: 1019

360 -> 362 HD6950@900MHz

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: bitless on July 03, 2011, 04:55:19 PM

Good, but this

> - added: "u t1W" variable, which is used in sharound2() to avoid double execution of t1W()

may actually hurt the performance, in theory. If you're using more registers, at least some GPUs may not be able to run as many threads concurrently as they used to, thus slowing things down.

That one was removed a few hours after I added it, don't worry. You can safely remove "u t1W;" and replace "t1W = t1w(n);" with "t1 = t1W(n);" in sharound2.

Dia

plantucha

newbie

Activity: 56

Merit: 0

Quote from: Diapolo on July 03, 2011, 03:07:14 PM

Quote from: sturle on July 03, 2011, 01:22:12 PM

Quote from: Diapolo on July 03, 2011, 04:38:10 AM

Quote from: 1MLyg5WVFSMifFjkrZiyGW2nw on July 03, 2011, 04:04:15 AM

Hello,

you might want to change this

Code:

#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1(n); Vals[(135 - n) % 8] = t1(n) + t2(n); }

to

Code:

#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1; Vals[(135 - n) % 8] = t1 + t2(n); }

This got me a 25% performance increase!

Seems good, but it brings no gain for me... weird. I will look into this and perhaps reuse your idea, if I may. I can only think of a good compiler optimization...

It is not obvious that this will gain anything. On the contrary, unless the compiler optimizes it away, it makes the second and third instructions dependent on the first. This is bad on a GPU that issues 4 or 5 instructions in parallel every clock cycle.

I wish I could use the AMD APP KernelAnalyzer, but it currently won't work, so I rely on the pure MHash/sec numbers that Phoenix gives me :-/. Anyway, I will perhaps try to follow your hint. If it's faster to do a calculation twice but keep the copies independent of each other, then okay.

My work is not over.

Thanks to the 3 people who sent a donation so far! It's a nice motivation. Did anyone repost this in the Mining Software forum? I am not allowed to post there ^^.

Dia

Newbie rules are pretty strict here. You have to spend more than 4 hours on this forum before you are able to post anywhere.

bitless

newbie

Activity: 28

Merit: 0

Good, but this

> - added: "u t1W" variable, which is used in sharound2() to avoid double execution of t1W()

may actually hurt the performance, in theory. If you're using more registers, at least some GPUs may not be able to run as many threads concurrently as they used to, thus slowing things down.

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: sturle on July 03, 2011, 01:22:12 PM

Quote from: Diapolo on July 03, 2011, 04:38:10 AM

Quote from: 1MLyg5WVFSMifFjkrZiyGW2nw on July 03, 2011, 04:04:15 AM

Hello,

you might want to change this

Code:

#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1(n); Vals[(135 - n) % 8] = t1(n) + t2(n); }

to

Code:

#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1; Vals[(135 - n) % 8] = t1 + t2(n); }

This got me a 25% performance increase!

Seems good, but it brings no gain for me... weird. I will look into this and perhaps reuse your idea, if I may. I can only think of a good compiler optimization...

It is not obvious that this will gain anything. On the contrary, unless the compiler optimizes it away, it makes the second and third instructions dependent on the first. This is bad on a GPU that issues 4 or 5 instructions in parallel every clock cycle.

I wish I could use the AMD APP KernelAnalyzer, but it currently won't work, so I rely on the pure MHash/sec numbers that Phoenix gives me :-/. Anyway, I will perhaps try to follow your hint. If it's faster to do a calculation twice but keep the copies independent of each other, then okay.

My work is not over.

Thanks to the 3 people who sent a donation so far! It's a nice motivation. Did anyone repost this in the Mining Software forum? I am not allowed to post there ^^.

Dia

1MLyg5WVFSMifFjkrZiyGW2nw

newbie

Activity: 28

Merit: 0

Quote from: sturle on July 03, 2011, 01:22:12 PM

Quote from: Diapolo on July 03, 2011, 04:38:10 AM

Quote from: 1MLyg5WVFSMifFjkrZiyGW2nw on July 03, 2011, 04:04:15 AM

Hello,

you might want to change this

Code:

#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1(n); Vals[(135 - n) % 8] = t1(n) + t2(n); }

to

Code:

#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1; Vals[(135 - n) % 8] = t1 + t2(n); }

This got me a 25% performance increase!

Seems good, but it brings no gain for me... weird. I will look into this and perhaps reuse your idea, if I may. I can only think of a good compiler optimization...

It is not obvious that this will gain anything. On the contrary, unless the compiler optimizes it away, it makes the second and third instructions dependent on the first. This is bad on a GPU that issues 4 or 5 instructions in parallel every clock cycle.

Well, it worked for me, possibly because I only have a slow 4670. However, the same change in sharound2 decreases performance.

Another thing that seems to run a little bit faster on cards without BFI_INT:

Code:

#define Ma(x, y, z) Ch((z^x), (y), (x))

rcocchiararo

newbie

Activity: 78

Merit: 0

Downloaded the newest file.

307 -> 310
278 -> 280

Can't complain, it was free.

Apopfis

newbie

Activity: 12

Merit: 0

http://i55.tinypic.com/2q0kq69.png


Testing my new Sapphire 5850.
I need a new PSU and watercooling for the card to make it stable. Currently running the cards (overnight) steady: one at 412 MH/s @ 1030 MHz core, the other at ~360 MH/s @ 950 MHz core because it's only at AGGRESSION=8. The PSU is a Corsair 450W, so I cannot OC & OV both cards to the max. For the first card, this mod gave an increase from 401 -> 412 MH/s while the card is running @ 1030 MHz core.

sturle

legendary

Activity: 1437

Merit: 1002

https://bitmynt.no

Quote from: Diapolo on July 03, 2011, 04:38:10 AM

Quote from: 1MLyg5WVFSMifFjkrZiyGW2nw on July 03, 2011, 04:04:15 AM

Hello,

you might want to change this

Code:

#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1(n); Vals[(135 - n) % 8] = t1(n) + t2(n); }

to

Code:

#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1; Vals[(135 - n) % 8] = t1 + t2(n); }

This got me a 25% performance increase!

Seems good, but it brings no gain for me... weird. I will look into this and perhaps reuse your idea, if I may. I can only think of a good compiler optimization...

It is not obvious that this will gain anything. On the contrary, unless the compiler optimizes it away, it makes the second and third instructions dependent on the first. This is bad on a GPU that issues 4 or 5 instructions in parallel every clock cycle.

strictlyfocused

newbie

Activity: 55

Merit: 0

Went from 232 to 233 on an MSI 5770!

Sending a donation your way!

Soak

full member

Activity: 213

Merit: 100

Quote from: Diapolo on July 01, 2011, 05:19:02 PM

Try to uninstall SDK 2.1 and use that Cat 11.7 preview here: http://developer.amd.com/Downloads/110619a-121104E.zip

390 with new Catalyst and old kernel.
388 with new Catalyst and new kernel.

ATI Radeon 6970 on Windows 7.

figvam

newbie

Activity: 42

Merit: 0

340 MH/s -> 344 MH/s peak on a 5850.

h2

newbie

Activity: 8

Merit: 0

Thanks a lot! It brought me from 421 to 423 (x2) on my 5970 (I already had the 3% Ma patch applied).

Everybody getting "FATAL kernel error: Failed to load OpenCL kernel!": run phoenix once with the sudo prefix. Afterwards you can launch it again without sudo.

greets
H²

saykor

sr. member

Activity: 350

Merit: 250

Quote from: deepceleron on July 03, 2011, 09:59:59 AM

Quote from: Diapolo on July 03, 2011, 05:46:21 AM

Use Cat 11.7 preview!

Dia

The ATI 110619a-121104E file doesn't include a "11.7" video driver. It does have a new AMD APP SDK, version 2.5.684.24. The package also installs drivers for ATI TV Wonder 600 USB and Hydravision (WTF?), so if you want to try out the new SDK, be sure to do a custom install and uncheck the other stuff.

That being said, the new SDK makes no difference. I've run 10.11 and 11.6 drivers on 2.4 and 2.5 SDK with identical benchmarks on all in a very repeatable setup. However, your patch does make a difference! The improvement on my overclocked 5830 (1070/392, WinXPsp3 stripped, Sempron 2.7GHz):

Before: 340.21 Mhash/s
After: 341.59 Mhash/s
Improvement: 0.41%

I haven't run it long enough to see if it increases rejects or stales, but I will report back. This is looking good (but it will take a week of mining with the patch to even make up for my 30 minutes downtime...)

What are your settings? How do you get 340 with a 5830? I'm at 307 max with the same cards.
Do you know where to download the "11.7" video driver?

Thanks for your help
