Pages:
Author

Topic: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 - page 14. (Read 106947 times)

full member
Activity: 219
Merit: 120
I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....

Perhaps the kernel pushes your card harder and it generates errors. But could be the pool, driver and so on, like Vince said.

Dia

Stales have nothing to do with GPU errors in Phoenix. All results returned by the GPU are verified before being sent to the server, which means if the kernel finds invalid shares you will get "Unexpected behavior from OpenCL. Hardware problem?" instead of the share being submitted. If you are not getting any of these errors there is NO WAY a kernel change can affect the number of stale shares. It might affect the total number of shares submitted in a given time period, but every share sent to the server by Phoenix is verified to be H == 0 on the CPU beforehand.
hero member
Activity: 772
Merit: 500
I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....

Perhaps the kernel pushes your card harder and it generates errors. But could be the pool, driver and so on, like Vince said.

Dia
newbie
Activity: 38
Merit: 0
I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....

So .. what about some more information?  Sad

Pool? Version used? Clock speeds? 5830 @ 305 seems to be somewhat overclocked ..
newbie
Activity: 53
Merit: 0
I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....
hero member
Activity: 772
Merit: 500
With the ideas that Vince gave to us, I was able to lower the ALU OP usage even further. This means next version will speed up things for 69XX and 58XX again Smiley.
Thank you Vince, I didn't need all your changes (some seem to reduce kernel speed, even if they look good), but merged the ones I like and verified everything with KernelAnalyzer.

Edit: Drawback is, that you will need to replace the Phoenix __init__.py file, so it won't be easy usable for non Phoenix users, sorry for that (some init values changed)!

Dia

You say that some of Vince's changes seem to reduce kernel speed but it looks like actual speed gain/loss is very much card dependent.  That being the case, which cards are you using for testing?


I own a 5870, a 5830 and use AMD KernelAnalyzer to get infos for 69XX cards. You see I focused on that cards during my own tests. I could receive infos for more cards via AMD KA, but it seems hard to optimize one kernel for all cards Smiley.

Dia
sr. member
Activity: 434
Merit: 250
My 5830 went from 304mh/s to 307mh/s. Small increase, but why not? Smiley
legendary
Activity: 1246
Merit: 1011
With the ideas that Vince gave to us, I was able to lower the ALU OP usage even further. This means next version will speed up things for 69XX and 58XX again Smiley.
Thank you Vince, I didn't need all your changes (some seem to reduce kernel speed, even if they look good), but merged the ones I like and verified everything with KernelAnalyzer.

Edit: Drawback is, that you will need to replace the Phoenix __init__.py file, so it won't be easy usable for non Phoenix users, sorry for that (some init values changed)!

Dia

You say that some of Vince's changes seem to reduce kernel speed but it looks like actual speed gain/loss is very much card dependent.  That being the case, which cards are you using for testing?
hero member
Activity: 772
Merit: 500
With the ideas that Vince gave to us, I was able to lower the ALU OP usage even further. This means next version will speed up things for 69XX and 58XX again Smiley.
Thank you Vince, I didn't need all your changes (some seem to reduce kernel speed, even if they look good), but merged the ones I like and verified everything with KernelAnalyzer.

Edit: Drawback is, that you will need to replace the Phoenix __init__.py file, so it won't be easy usable for non Phoenix users, sorry for that (some init values changed)!

Dia
newbie
Activity: 20
Merit: 0
I just tried your latest kernel in GUIminer with poclbm miner went from 417 to 419 each GPU great work thanks. was able to tweak a little more up to 420.6 to 421.0 using -f 1
legendary
Activity: 1148
Merit: 1001
Radix-The Decentralized Finance Protocol
Previous patches had solved one of my cards crashing the miner every 30 minutes or so, but this one has reintroduced the problem. One miner mining a 5870 crashed after 20 minutes.

Sorry for that, but I have no idea, what would cause this. Perhaps the card is faulty? Will Furmark "crash" the card or show artifacts?

Dia

Im using the previous patch, that is almost as fast, in that card, so its ok.

Im wondering as well if the card has some kind of problem, but with other kernels it has been running non-stop for days without a problem. Dont know why some kernels trigger the crash.
full member
Activity: 216
Merit: 100
2011-11-11 i can now report a 0.5-1% increase over the improved phatk kernel
How'd you get this future kernel?
newbie
Activity: 38
Merit: 0
So here are some more changes:

I introduced const uint W17_2, containing P1(19) + 0x11002000, thats 3 shifts, 2 xor, 1 add traded against one extra parameter, well worth it,

extended self.f:
self.f = np.zeros(5, np.uint32)
to
self.f = np.zeros(6, np.uint32)

just after W17 calculation in calculateF:
        #W17_2
        self.f[5] = np.uint32(0x11002000+(
                rot(self.f[2], 32-13) ^
                rot(self.f[2], 32-15) ^
                (self.f[2] >> 10)
                ))

added the parameter (right after W17) in call and function

=> Effectively 3 Op's saved.

next change:
You can cut out all W0 to W14! Most of them are zero anyway, just needed to hardcode the first ones.
Also W[73] to W[78] are not used anymore with some small changes, so no need to initialize them.

=> less memory use, but has the same speed for me

Next one:
Round 3

#ifdef VECTORS
        Vals[4] = (W_3 = ((base + get_global_id(0)) << 1) + (uint2)(0, 1)) + PreVal4;
#else
        Vals[4] = (W_3 = base + get_global_id(0)) + PreVal4;
#endif

--
        // Round 3
        Vals[0] = state0 + Vals[4];
        Vals[4] += T1;

--

W[64 - O] = state0 + Vals[0];

you can reorganize and shorten round 3 to:
        Vals[0] = T1 + Vals[4];

needed changes in precalculation:
Preval4 += T1
T1 = state0 - T1

=> another addition almost effortless


here the files with these changes:
http://www.filesonic.com/file/1423103594

still some more to come!



newbie
Activity: 41
Merit: 0
2011-11-11 i can now report a 0.5-1% increase over the improved phatk kernel
newbie
Activity: 36
Merit: 0
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

This one seems to act rather strangely.

On 2011-07-06 I had a pretty consistent 360MHash/sec with little variation. With 2011-07-11 the card is varying dramatically between 355-363, and usually at the lower end of that. Overall it may be slightly slower.

The cards are 5850s clocked at 900/300, using 11.4 and SDK 2.4 on Linux.

I'm going to let it run a while longer.

Sorry, but this is still slower; my cards were running around 355MH/sec and even down to 350. Went back to 2011-07-06.

I totally disagree, each version for me has been getting faster and faster. 7-11-11 is the fastest, yet for me.
newbie
Activity: 38
Merit: 0
Decreasing hashrates .. thats really strange. These 58xx-cards sometimes behave quite strange.

I cant test it cause all my rigs run on 6950's unlocked to 6970's
hero member
Activity: 588
Merit: 500
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

This one seems to act rather strangely.

On 2011-07-06 I had a pretty consistent 360MHash/sec with little variation. With 2011-07-11 the card is varying dramatically between 355-363, and usually at the lower end of that. Overall it may be slightly slower.

The cards are 5850s clocked at 900/300, using 11.4 and SDK 2.4 on Linux.

I'm going to let it run a while longer.

Sorry, but this is still slower; my cards were running around 355MH/sec and even down to 350. Went back to 2011-07-06.
newbie
Activity: 42
Merit: 0
7/4/11   = a 1-2 MH/s increase
7/6/11   = 0 increase (maybe slight decrease)
7/11/11  = 1-2 MH/s further increase over 7/4/11

These are on 2x 5830s, and 3x 5770s using ati-drivers-11.6, phoenix-1.50, pyopencl-0.92, and ati-stream-sdk-bin-2.4.
hero member
Activity: 812
Merit: 502
On 5830 + SDK 2.1, it's slightly slower than the previous version. Guess I'll revert it back.

Same here. With the previous version my average was 1758 and now it is 1756.

This is for 4x 5870 @ 960Mhz Core & 300Mhz Memory
SDK 2.1
Ubuntu 32bit
sr. member
Activity: 254
Merit: 250
On 5830 + SDK 2.1, it's slightly slower than the previous version. Guess I'll revert it back.
hero member
Activity: 588
Merit: 500
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

This one seems to act rather strangely.

On 2011-07-06 I had a pretty consistent 360MHash/sec with little variation. With 2011-07-11 the card is varying dramatically between 355-363, and usually at the lower end of that. Overall it may be slightly slower.

The cards are 5850s clocked at 900/300, using 11.4 and SDK 2.4 on Linux.

I'm going to let it run a while longer.
Pages:
Jump to: