further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 - page 15.

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: hugolp on July 11, 2011, 01:19:54 PM

Quote from: Diapolo on July 11, 2011, 09:10:13 AM

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems to be no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I do. This version is faster than all previous kernels (it uses the fewest ALU OPs for 69XX and 58XX). It should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

Previous patches had solved one of my cards crashing the miner every 30 minutes or so, but this one has reintroduced the problem. One miner mining a 5870 crashed after 20 minutes.

Sorry about that, but I have no idea what would cause this. Perhaps the card is faulty? Does Furmark crash the card or show artifacts?

Dia

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: Vince on July 11, 2011, 12:39:13 PM

Don't give up! There's still more to optimize; I'm at 1694 ALU OPs (HD6970) at the moment.

I can't read or edit Python, so yes, there is room if one could alter or add some more kernel arguments.
The strange thing is that I saw some additions of known values, which I tried to eliminate via constants, but this led to lower kernel performance. I played around with this today and saw no more improvement ... too bad, it was real fun the last few days!
If you would like to share your work, we will all be happy.

What is your kernel doing for 58XX cards? I thought it made no sense to optimize one over the other, so I tried to reduce the ALU OP count for both platforms.

Dia

Vince

newbie

Activity: 38

Merit: 0

Quote from: BOARBEAR on July 11, 2011, 01:28:47 PM

I wonder why we need const uint D1.
It is only used once.

It's part of the precalculation. It's needed.

BOARBEAR

member

Activity: 77

Merit: 10

I wonder why we need const uint D1.
It is only used once.

hugolp

legendary

Activity: 1148

Merit: 1001


Quote from: Diapolo on July 11, 2011, 09:10:13 AM

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems to be no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I do. This version is faster than all previous kernels (it uses the fewest ALU OPs for 69XX and 58XX). It should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

Previous patches had solved one of my cards crashing the miner every 30 minutes or so, but this one has reintroduced the problem. One miner mining a 5870 crashed after 20 minutes.

Vince

newbie

Activity: 38

Merit: 0

Don't give up! There's still more to optimize; I'm at 1694 ALU OPs (HD6970) at the moment.

dikidera

full member

Activity: 126

Merit: 100

At 2 megahashes, your device produces around 2 to 2.5 million hashes per second. If we halve that to 1 megahash, that's still 1.25 million hashes per second; if we halve that again, around 750 thousand per second.
So an increase of 0.2% or so yields around 100 thousand more hashes per second.
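
To put a relative gain into absolute numbers, here is a minimal sketch in C. The function name and the basis-point unit (1 basis point = 0.01%) are illustrative assumptions, not something from the thread. With the card speeds reported here (roughly 400 MH/s), a 0.2% gain is about 0.8 MH/s; a gain of 100 thousand hashes per second at 0.2% would correspond to a baseline of 50 MH/s.

```c
#include <stdint.h>

/* Extra hashes per second gained from a relative speed-up.
 * rate_hps:          baseline hash rate in hashes per second
 * gain_basis_points: relative gain in basis points (20 = 0.20 %)
 * Both names are hypothetical; integer math keeps the result exact. */
static uint64_t extra_hashes_per_second(uint64_t rate_hps,
                                        uint64_t gain_basis_points)
{
    return rate_hps * gain_basis_points / 10000ULL;
}

/* Example values:
 *   extra_hashes_per_second(400000000ULL, 20) gives 800000 (0.8 MH/s)
 *   extra_hashes_per_second(50000000ULL, 20)  gives 100000 (0.1 MH/s) */
```

The integer form avoids floating-point rounding, which is enough for a back-of-the-envelope estimate like this one.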

dishwara

legendary

Activity: 1855

Merit: 1016

Quote from: Diapolo on July 11, 2011, 09:10:13 AM

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems to be no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I do. This version is faster than all previous kernels (it uses the fewest ALU OPs for 69XX and 58XX). It should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

I hope you will still find ways to get more hashes.
Thanks.

OCedHrt

member

Activity: 111

Merit: 10

Quote from: Diapolo on July 11, 2011, 11:51:53 AM

Quote from: OCedHrt on July 11, 2011, 11:29:14 AM

Quote from: Diapolo on July 11, 2011, 09:10:13 AM

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems to be no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I do. This version is faster than all previous kernels (it uses the fewest ALU OPs for 69XX and 58XX). It should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

Awesome work. Btw, could you explain why Vals[2] = C1; is needed in the beginning when it is reassigned in round 5? I know removing this line breaks the hashing but I'm not seeing why since it isn't used prior to round 5.

Various functions before the reassignment use Vals[2]. Search the kernel for Vals[ and you will see them. I tried to remove that assignment, but as you discovered, it is needed there.

Dia

I only searched for Vals[2] so did not see the % XD

Very small increase for me on this one: 278 -> 278.5. Interestingly, a lower memory clock on my 6870 actually has a detrimental effect. Downclocking from 1050 -> 800 reduces the hash rate by 0.5 MH/s. I can't clock it any lower, so I don't know whether 300 would be better or not.

teukon

legendary

Activity: 1246

Merit: 1011

This does work with SDK 2.1 but it might be a tiny bit slower than your previous version.

HD5850, 1.0875V, 975MHz clock, 360 MHz RAM, aggression=14, worksize=256, Catalyst 11.6 (Linux)

SDK 2.1: 404.6 MH/s -> 404.5 MH/s
SDK 2.4: 401.8 MH/s -> 402.2 MH/s

Note that at aggression=14 my rate can sometimes drop as much as 1 MH/s suddenly before recovering but usually varies by 0.2 MH/s so the apparent decrease with SDK 2.1 could well be statistical noise.

I might also have to play with the RAM frequency again.

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: OCedHrt on July 11, 2011, 11:29:14 AM

Quote from: Diapolo on July 11, 2011, 09:10:13 AM

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems to be no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I do. This version is faster than all previous kernels (it uses the fewest ALU OPs for 69XX and 58XX). It should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

Awesome work. Btw, could you explain why Vals[2] = C1; is needed in the beginning when it is reassigned in round 5? I know removing this line breaks the hashing but I'm not seeing why since it isn't used prior to round 5.

Various functions before the reassignment use Vals[2]. Search the kernel for Vals[ and you will see them. I tried to remove that assignment, but as you discovered, it is needed there.

Dia

OCedHrt

member

Activity: 111

Merit: 10

Quote from: Diapolo on July 11, 2011, 09:10:13 AM

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems to be no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I do. This version is faster than all previous kernels (it uses the fewest ALU OPs for 69XX and 58XX). It should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

Awesome work. Btw, could you explain why Vals[2] = C1; is needed in the beginning when it is reassigned in round 5? I know removing this line breaks the hashing but I'm not seeing why since it isn't used prior to round 5.

Bobnova

full member

Activity: 210

Merit: 100

I also gained about 1 MH/s from today's update compared to the previous update; this is on a 5830 at 875/900 on Linux.
The previous update made a big difference over what ships with Phoenix 1.50.

I sent a small donation, as you've helped me make more money.

Turix

member

Activity: 76

Merit: 10

Gained about 1 MH/s (431 -> 432) from the 7th version to today's new version on my 5870 (950/315).

Diapolo

hero member

Activity: 772

Merit: 500

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems to be no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I do. This version is faster than all previous kernels (it uses the fewest ALU OPs for 69XX and 58XX). It should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

Vince

newbie

Activity: 38

Merit: 0

I'm pretty sure the compiler will catch this.

Note that the speed increase is minimal, ~0.1-0.2% maybe.

BOARBEAR

member

Activity: 77

Merit: 10

So I changed the whole thing to:

   Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

#ifdef VECTORS
   if(Vals[7].x == -H[7]-K[60])
      output[OUTPUT_SIZE] = output[(W[3].x >> 2) & OUTPUT_MASK] = W[3].x;
   if(Vals[7].y == -H[7]-K[60])
      output[OUTPUT_SIZE] = output[(W[3].y >> 2) & OUTPUT_MASK] = W[3].y;
#else
   if(Vals[7] == -H[7]-K[60])
      output[OUTPUT_SIZE] = output[(W[3] >> 2) & OUTPUT_MASK] = W[3];
#endif

I did not notice any speed difference. Hope that helps.

PS: Does the compiler really do the optimization? If not, you have introduced one more step, because K[60] now appears twice.

Vince

newbie

Activity: 38

Merit: 0

Quote from: BOARBEAR on July 10, 2011, 01:50:32 PM

Mind explaining more? I don't get it.

It's a constant.
If I add it to Vals[7] together with the other terms in round 124, it costs one addition, because it is the only constant in that sum.

If it is moved to the comparison at the end, the compiler merges the two constants H[7] and K[60] into one, so the comparison takes the same execution time as before.
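
The constant-merging argument can be sketched in plain C (this is not the kernel's OpenCL source). The function names are hypothetical, v stands in for the round-124 sum without K[60], and the sample values for H[7] and K[60] are the standard SHA-256 constants, which I am assuming match the kernel's tables. Because 32-bit unsigned arithmetic wraps modulo 2^32, comparing v + K[60] against -H[7] gives the same result as comparing v against -H[7] - K[60], and the right-hand side of the second form contains only constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Original form: K60 is added into the running value at run time,
 * then the sum is compared against -H7 (all arithmetic is mod 2^32). */
static int found_with_add(uint32_t v, uint32_t h7, uint32_t k60)
{
    uint32_t sum = v + k60;            /* one extra addition per nonce */
    return sum == (uint32_t)(0u - h7);
}

/* Moved form: K60 sits on the constant side of the comparison, so a
 * compiler can fold (-H7 - K60) into a single literal at build time. */
static int found_with_folded_constant(uint32_t v, uint32_t h7, uint32_t k60)
{
    return v == (uint32_t)(0u - h7 - k60);
}

/* Counts inputs on which the two forms disagree (expected: none). */
static int count_mismatches(void)
{
    const uint32_t h7  = 0x5be0cd19u;  /* SHA-256 initial H[7] */
    const uint32_t k60 = 0x90befffau;  /* SHA-256 round constant K[60] */
    const uint32_t hit = (uint32_t)(0u - h7 - k60);
    const uint32_t samples[] = { 0u, 1u, hit - 1u, hit, hit + 1u, 0xffffffffu };
    int mismatches = 0;

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        if (found_with_add(samples[i], h7, k60) !=
            found_with_folded_constant(samples[i], h7, k60))
            ++mismatches;
    }
    assert(found_with_folded_constant(hit, h7, k60)); /* the hit is detected */
    return mismatches;
}
```

Whether a given OpenCL compiler actually performs this folding depends on the SDK version; the sketch only shows that the two comparisons are equivalent.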

BOARBEAR

member

Activity: 77

Merit: 10

Quote from: Vince on July 10, 2011, 01:30:33 PM

Quote from: BOARBEAR on July 10, 2011, 01:13:45 PM

How would this save an instruction? Did you just move -K[60]?

Yes, I just moved it. Now the compiler optimizes it away; before, it didn't.

Mind explaining more? I don't get it.

Vince

newbie

Activity: 38

Merit: 0

Quote from: BOARBEAR on July 10, 2011, 01:13:45 PM

How would this save an instruction? Did you just move -K[60]?

Yes, I just moved it. Now the compiler optimizes it away; before, it didn't.
