Topic: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 - page 14.

jedi95

full member

Activity: 219

Merit: 120

Quote from: Diapolo on July 12, 2011, 09:12:18 AM

Quote from: kbsbtc on July 12, 2011, 08:05:59 AM

I was getting only 0.01% stales. With your patch I get an increase of 10 MH/s on average with my 5830 (295 -> 305), but my stale count is now around 3-4%...

Perhaps the kernel pushes your card harder and it generates errors. But it could also be the pool, driver and so on, as Vince said.

Dia

Stales have nothing to do with GPU errors in Phoenix. All results returned by the GPU are verified before being sent to the server, which means if the kernel finds invalid shares you will get "Unexpected behavior from OpenCL. Hardware problem?" instead of the share being submitted. If you are not getting any of these errors there is NO WAY a kernel change can affect the number of stale shares. It might affect the total number of shares submitted in a given time period, but every share sent to the server by Phoenix is verified to be H == 0 on the CPU beforehand.
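To illustrate the kind of CPU-side check jedi95 describes, here is a minimal Python sketch, assuming the 80-byte block header is available as bytes. It double-SHA-256 hashes the header and reads "H == 0" as the last four digest bytes (the most significant 32 bits of the byte-reversed hash Bitcoin displays) being zero. The helper name `is_valid_share` is hypothetical; this is not Phoenix's actual code.

```python
import hashlib


def is_valid_share(header: bytes) -> bool:
    """Return True if the double SHA-256 of an 80-byte header has H == 0.

    Hypothetical helper, not Phoenix's code. Bitcoin displays hashes
    byte-reversed, so the last 4 digest bytes are the most significant
    32 bits; a difficulty-1 share needs them to be zero.
    """
    if len(header) != 80:
        raise ValueError("block header must be exactly 80 bytes")
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[28:32] == b"\x00\x00\x00\x00"


# Usage: the Bitcoin genesis block header is a known-good example
# (version, prev hash, merkle root, time, bits, nonce; little-endian fields).
GENESIS = bytes.fromhex(
    "01000000" + "00" * 32
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"
    + "29ab5f49" + "ffff001d" + "1dac2b7c"
)
print(is_valid_share(GENESIS))
```

A result that fails this check is what Phoenix reports as a possible hardware problem instead of submitting it, which is why a kernel change alone should not raise the stale rate.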

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: kbsbtc on July 12, 2011, 08:05:59 AM

I was getting only 0.01% stales. With your patch I get an increase of 10 MH/s on average with my 5830 (295 -> 305), but my stale count is now around 3-4%...

Perhaps the kernel pushes your card harder and it generates errors. But it could also be the pool, driver and so on, as Vince said.

Dia

Vince

newbie

Activity: 38

Merit: 0

Quote from: kbsbtc on July 12, 2011, 08:05:59 AM

I was getting only 0.01% stales. With your patch I get an increase of 10 MH/s on average with my 5830 (295 -> 305), but my stale count is now around 3-4%...

So... what about some more information? :(

Pool? Version used? Clock speeds? A 5830 at 305 MH/s seems to be somewhat overclocked...

kbsbtc

newbie

Activity: 53

Merit: 0

I was getting only 0.01% stales. With your patch I get an increase of 10 MH/s on average with my 5830 (295 -> 305), but my stale count is now around 3-4%...

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: teukon on July 12, 2011, 06:26:22 AM

Quote from: Diapolo on July 12, 2011, 04:38:26 AM

With the ideas that Vince gave us, I was able to lower the ALU OP usage even further. This means the next version will speed things up for 69XX and 58XX again.

Thank you Vince. I didn't need all your changes (some seem to reduce kernel speed, even though they look good), but I merged the ones I liked and verified everything with KernelAnalyzer.

Edit: The drawback is that you will need to replace the Phoenix __init__.py file (some init values changed), so it won't be easily usable for non-Phoenix users. Sorry for that!

Dia

You say that some of Vince's changes seem to reduce kernel speed but it looks like actual speed gain/loss is very much card dependent. That being the case, which cards are you using for testing?

I own a 5870 and a 5830, and I use AMD KernelAnalyzer to get info for 69XX cards, so I focused on those cards during my own tests. I could get info for more cards via AMD KA, but it seems hard to optimize one kernel for all cards.

Dia

pandemic

sr. member

Activity: 434

Merit: 250

My 5830 went from 304 MH/s to 307 MH/s. A small increase, but why not?

teukon

legendary

Activity: 1246

Merit: 1011

Quote from: Diapolo on July 12, 2011, 04:38:26 AM

With the ideas that Vince gave us, I was able to lower the ALU OP usage even further. This means the next version will speed things up for 69XX and 58XX again.

You say that some of Vince's changes seem to reduce kernel speed but it looks like actual speed gain/loss is very much card dependent. That being the case, which cards are you using for testing?

Diapolo

hero member

Activity: 772

Merit: 500

With the ideas that Vince gave us, I was able to lower the ALU OP usage even further. This means the next version will speed things up for 69XX and 58XX again.

Dubs420

newbie

Activity: 20

Merit: 0

I just tried your latest kernel in GUIminer with the poclbm miner: it went from 417 to 419 on each GPU. Great work, thanks. I was able to tweak it a little more, up to 420.6 to 421.0, using -f 1.

hugolp

legendary

Activity: 1148

Merit: 1001


Quote from: Diapolo on July 11, 2011, 02:31:55 PM

Previous patches had solved one of my cards crashing the miner every 30 minutes or so, but this one has reintroduced the problem. One miner mining a 5870 crashed after 20 minutes.

Sorry about that, but I have no idea what would cause this. Perhaps the card is faulty? Does Furmark "crash" the card or show artifacts?

Dia

I'm using the previous patch, which is almost as fast, on that card, so it's OK.

I'm also wondering whether the card has some kind of problem, but with other kernels it has been running non-stop for days without a problem. I don't know why some kernels trigger the crash.

TurdHurdur

full member

Activity: 216

Merit: 100

Quote from: Wildvest on July 11, 2011, 07:25:55 PM

2011-11-11 I can now report a 0.5-1% increase over the improved phatk kernel

How'd you get this future kernel?

Vince

newbie

Activity: 38

Merit: 0

So here are some more changes:

I introduced const uint W17_2, containing P1(19) + 0x11002000. That's 3 shifts, 2 XORs and 1 add traded for one extra kernel parameter, well worth it.

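I extended self.f from
self.f = np.zeros(5, np.uint32)
to
self.f = np.zeros(6, np.uint32)

and added this just after the W17 calculation in calculateF:
   # W17_2 = P1(19) + 0x11002000: sigma1(W17) plus the constant sigma0(W4) term
   # mask to 32 bits so the sum cannot overflow np.uint32()
   self.f[5] = np.uint32((0x11002000 + (
   rot(self.f[2], 32-13) ^
   rot(self.f[2], 32-15) ^
   (self.f[2] >> 10)
   )) & 0xFFFFFFFF)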

Then I added the new parameter (right after W17) to both the kernel call and the kernel function signature.

=> Effectively 3 ops saved.
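The post does not explain where 0x11002000 comes from. One reading that checks out arithmetically: in the second SHA-256 chunk of the header, W[19] = sigma1(W[17]) + W[12] + sigma0(W[4]) + W[3], where W[12] = 0 and W[4] = 0x80000000 is padding, so sigma0(W[4]) is the constant 0x11002000. W[17] does not depend on the nonce, so W17_2 = sigma1(W17) + 0x11002000 can be computed once per work unit, leaving only "+ nonce" in the kernel. The Python sketch below checks this against the standard message schedule. The helper names (`rotr`, `sigma0`, `sigma1`, `w17_2`, `schedule`) are hypothetical, and matching Vince's `rot(x, 32-13)` / `rot(x, 32-15)` to right rotations by 19 and 17 assumes `rot` rotates right, since its definition is not shown.

```python
MASK = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic


def rotr(x: int, n: int) -> int:
    """32-bit rotate right."""
    return ((x >> n) | (x << (32 - n))) & MASK


def sigma0(x: int) -> int:
    """SHA-256 small sigma0 (FIPS 180-4)."""
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def sigma1(x: int) -> int:
    """SHA-256 small sigma1 (FIPS 180-4)."""
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def w17_2(w17: int) -> int:
    """Hypothetical CPU-side precalc: sigma1(W17) + sigma0(0x80000000)."""
    return (sigma1(w17) + 0x11002000) & MASK


def schedule(block):
    """Standard SHA-256 message schedule for a 16-word block."""
    w = list(block)
    for t in range(16, 64):
        w.append((sigma1(w[t - 2]) + w[t - 7] + sigma0(w[t - 15]) + w[t - 16]) & MASK)
    return w


# The constant is sigma0 of the padding word W4 = 0x80000000.
assert sigma0(0x80000000) == 0x11002000

# Second chunk of an 80-byte header: W0..W2 header data, W3 nonce, then padding.
block = [0x12345678, 0x9ABCDEF0, 0x0FEDCBA9, 0x0000002A,
         0x80000000] + [0] * 10 + [0x00000280]
w = schedule(block)
# With W17_2 precomputed, W19 costs a single add of the nonce.
assert w[19] == (w17_2(w[17]) + block[3]) & MASK
```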

Next change:
You can cut out all of W0 to W14! Most of them are zero anyway; the first ones just needed to be hardcoded.
Also, with some small changes, W[73] to W[78] are no longer used, so there is no need to initialize them.

=> Less memory use, but the same speed for me.
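Why most of W0 to W14 can be hardcoded: the miner hashes an 80-byte header, so the second 64-byte chunk carries only 16 header bytes (W0 to W2 from the header, W3 the nonce) and the rest is fixed SHA-256 padding. The minimal Python sketch below builds that padding per FIPS 180-4 and shows which words are constant; `sha256_pad` and `second_chunk_words` are hypothetical helpers, not code from the kernel or Phoenix.

```python
import struct


def sha256_pad(message: bytes) -> bytes:
    """Append SHA-256 padding: 0x80, zeros, then the 64-bit big-endian bit length."""
    bit_len = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack(">Q", bit_len)


def second_chunk_words(header: bytes) -> list:
    """Big-endian 32-bit words W0..W15 of the second chunk of an 80-byte header."""
    if len(header) != 80:
        raise ValueError("block header must be exactly 80 bytes")
    padded = sha256_pad(header)
    assert len(padded) == 128  # exactly two 64-byte chunks
    return list(struct.unpack(">16I", padded[64:128]))


# Any 80-byte header works; W4..W15 come out the same every time.
words = second_chunk_words(bytes(range(80)))
print([hex(w) for w in words])
```

Only W0 to W3 change between work units (and W3 per nonce); W4 is always 0x80000000, W5 to W14 are zero and W15 is 640, the header length in bits.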

Next one:
Round 3

#ifdef VECTORS
   Vals[4] = (W_3 = ((base + get_global_id(0)) << 1) + (uint2)(0, 1)) + PreVal4;
#else
   Vals[4] = (W_3 = base + get_global_id(0)) + PreVal4;
#endif

--
   // Round 3
   Vals[0] = state0 + Vals[4];
   Vals[4] += T1;

--

W[64 - O] = state0 + Vals[0];

You can reorganize and shorten round 3 to:
   Vals[0] = T1 + Vals[4];

Needed changes in the precalculation (in this order, so that PreVal4 uses the old T1):
PreVal4 += T1
T1 = state0 - T1

=> Another addition saved, almost effortlessly.
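A quick way to convince yourself that the reorganized round 3 is equivalent: with all arithmetic modulo 2^32 (as with OpenCL uint), moving the "+ T1" into the precalculation and replacing T1 with state0 - T1 gives the same Vals[0] and Vals[4]. This Python sketch checks that with random inputs. The function names are hypothetical, and it models only the scalar (non-VECTORS) path.

```python
import random

MASK = 0xFFFFFFFF  # emulate 32-bit unsigned wraparound


def round3_old(w3, pre_val4, t1, state0):
    """Original: Vals[4] = W_3 + PreVal4; Vals[0] = state0 + Vals[4]; Vals[4] += T1."""
    vals4 = (w3 + pre_val4) & MASK
    vals0 = (state0 + vals4) & MASK
    vals4 = (vals4 + t1) & MASK
    return vals0, vals4


def round3_new(w3, pre_val4, t1, state0):
    """Reorganized version with the precalc changes applied first."""
    # Once per work unit: PreVal4 += T1, then T1 = state0 - T1 (in that order).
    pre_val4 = (pre_val4 + t1) & MASK
    t1 = (state0 - t1) & MASK
    # Per nonce inside the kernel: two adds instead of three.
    vals4 = (w3 + pre_val4) & MASK
    vals0 = (t1 + vals4) & MASK
    return vals0, vals4


rng = random.Random(0)
for _ in range(10_000):
    args = [rng.getrandbits(32) for _ in range(4)]
    assert round3_old(*args) == round3_new(*args)
print("round 3 rewrite matches")
```

The saving comes from the fact that the precalc additions run once per work unit on the CPU, while the removed "Vals[4] += T1" ran once per nonce on the GPU.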

Here are the files with these changes:
http://www.filesonic.com/file/1423103594

Still more to come!

Wildvest

newbie

Activity: 41

Merit: 0

2011-11-11 I can now report a 0.5-1% increase over the improved phatk kernel

erek

newbie

Activity: 36

Merit: 0

Quote from: error on July 11, 2011, 05:21:38 PM

Quote from: error on July 11, 2011, 03:25:28 PM

Quote from: Diapolo on July 11, 2011, 09:10:13 AM

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems to be no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I do :D. This version is faster than all previous kernels (it uses the fewest ALU OPs for 69XX and 58XX). It should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

This one seems to act rather strangely.

On 2011-07-06 I had a pretty consistent 360 MH/s with little variation. With 2011-07-11 the card varies dramatically between 355 and 363 MH/s, usually at the lower end of that range. Overall it may be slightly slower.

The cards are 5850s clocked at 900/300, using 11.4 and SDK 2.4 on Linux.

I'm going to let it run a while longer.

Sorry, but this is still slower; my cards were running at around 355 MH/s and even down to 350. Went back to 2011-07-06.

I totally disagree; for me each version has been faster than the last. 7-11-11 is the fastest yet for me.

Vince

newbie

Activity: 38

Merit: 0

Decreasing hashrates... that's really strange. These 58xx cards sometimes behave quite strangely.

I can't test it because all my rigs run on 6950s unlocked to 6970s.

error

hero member

Activity: 588

Merit: 500

Quote from: error on July 11, 2011, 03:25:28 PM

Quote from: Diapolo on July 11, 2011, 09:10:13 AM

Sorry, but this is still slower; my cards were running around 355MH/sec and even down to 350. Went back to 2011-07-06.

wazoo42

newbie

Activity: 42

Merit: 0

7/4/11 = a 1-2 MH/s increase
7/6/11 = 0 increase (maybe slight decrease)
7/11/11 = 1-2 MH/s further increase over 7/4/11

These are on 2x 5830s, and 3x 5770s using ati-drivers-11.6, phoenix-1.50, pyopencl-0.92, and ati-stream-sdk-bin-2.4.

CYPER

hero member

Activity: 812

Merit: 502

Quote from: pennytrader on July 11, 2011, 04:22:54 PM

On 5830 + SDK 2.1, it's slightly slower than the previous version. Guess I'll revert it back.

Same here. With the previous version my average was 1758 MH/s and now it is 1756 MH/s.

This is for 4x 5870 @ 960 MHz core & 300 MHz memory
SDK 2.1
Ubuntu 32bit

pennytrader

sr. member

Activity: 254

Merit: 250

On 5830 + SDK 2.1, it's slightly slower than the previous version. Guess I'll revert it back.

error

hero member

Activity: 588

Merit: 500

Quote from: Diapolo on July 11, 2011, 09:10:13 AM