further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 - page 10.

Diapolo

hero member

Activity: 772

Merit: 500

I'm working on a new version! The ideas come from Phateus, the original author of phatk, who released version 2.0 of phatk (THANKS Phateus).
Currently my version IS slower, but I see this as a fair and cool competition, from which all of us will benefit in the end.

Dia

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: iopq on July 29, 2011, 09:07:11 AM

Phateus posted some improvements in his own kernel, check it out:
http://forum.bitcoin.org/index.php?topic=7964.0

Unfortunately it doesn't run for me, so I can't check whether it's faster on my card.

Thanks for pointing me to that thread!

Dia

iopq

hero member

Activity: 658

Merit: 500

Phateus posted some improvements in his own kernel, check it out:
http://forum.bitcoin.org/index.php?topic=7964.0

Unfortunately it doesn't run for me, so I can't check whether it's faster on my card.

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: Diapolo on July 24, 2011, 03:48:26 AM

Quote from: Vince on July 23, 2011, 10:29:33 PM

That's really weird. I can't see why the compiler generates better code with this statement, but it does. I verified it with SDK 2.4 myself.

Anyone noticed that during the last few rounds a lot of variables are calculated, but never used again?

I took the time to optimize all unused statements out, but got no speed improvements at all. Seems like the compiler optimized it anyway.

Here it is - from round 124 to the #ifdef VECTORS preprocessor command

Code:

        sharound(121);

        // Optimized out all unused calculations
        W[122 - O] = P4(122) + P3(122) + P2(122) + P1(122);
        Vals[1] += K[58] + Vals[5] + W[122 - O] + s1(122) + ch(122);
        Vals[0] += K[59] + Vals[4] + s1(123) + ch(123) + P4(123) + P3(123) + P2(123) + P1(123);
        Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

#ifdef VECTORS

Maybe someone can trick the compiler into making better code ;-)

Hey Vince, I tried this myself a few days ago and didn't get better efficiency either ... perhaps I will have to throw it into the mixer again.
And thanks again for your work!

Dia

No chance. I tried different things and combinations, but the OpenCL compiler does it better than I do (again ^^).

Dia

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: Vince on July 23, 2011, 10:29:33 PM

That's really weird. I can't see why the compiler generates better code with this statement, but it does. I verified it with SDK 2.4 myself.

Anyone noticed that during the last few rounds a lot of variables are calculated, but never used again?

I took the time to optimize all unused statements out, but got no speed improvements at all. Seems like the compiler optimized it anyway.

Here it is - from round 124 to the #ifdef VECTORS preprocessor command

Code:

        sharound(121);

        // Optimized out all unused calculations
        W[122 - O] = P4(122) + P3(122) + P2(122) + P1(122);
        Vals[1] += K[58] + Vals[5] + W[122 - O] + s1(122) + ch(122);
        Vals[0] += K[59] + Vals[4] + s1(123) + ch(123) + P4(123) + P3(123) + P2(123) + P1(123);
        Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

#ifdef VECTORS

Maybe someone can trick the compiler into making better code ;-)

Hey Vince, I tried this myself a few days ago and didn't get better efficiency either ... perhaps I will have to throw it into the mixer again.
And thanks again for your work!

Dia

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: indio007 on July 23, 2011, 06:49:00 PM

Quote from: Diapolo on July 23, 2011, 05:11:17 PM

Quote from: xcooling on July 23, 2011, 07:03:01 AM

69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less (down to 1697): just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works. Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower!

Dia

Will it be slower on a 6850?

The 6850 is a VLIW5 design, so it will be slower. This really is only for 69XX cards!

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: BOARBEAR on July 23, 2011, 05:43:17 PM

Quote from: Diapolo on July 23, 2011, 05:11:17 PM

Quote from: xcooling on July 23, 2011, 07:03:01 AM

69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less (down to 1697): just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works. Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower!

Dia

What's the rationale behind this? It seems very weird to me that the compiler interprets the two statements differently.

You are not the only one, but the compiler says it's one ALU OP less!

Dia

Vince

newbie

Activity: 38

Merit: 0

That's really weird. I can't see why the compiler generates better code with this statement, but it does. I verified it with SDK 2.4 myself.

Anyone noticed that during the last few rounds a lot of variables are calculated, but never used again?

I took the time to optimize all unused statements out, but got no speed improvements at all. Seems like the compiler optimized it anyway.

Here it is - from round 124 to the #ifdef VECTORS preprocessor command

Code:

        sharound(121);

        // Optimized out all unused calculations
        W[122 - O] = P4(122) + P3(122) + P2(122) + P1(122);
        Vals[1] += K[58] + Vals[5] + W[122 - O] + s1(122) + ch(122);
        Vals[0] += K[59] + Vals[4] + s1(123) + ch(123) + P4(123) + P3(123) + P2(123) + P1(123);
        Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

#ifdef VECTORS

Maybe someone can trick the compiler into making better code ;-)

MiningBuddy

hero member

Activity: 927

Merit: 1000

฿itcoin ฿itcoin ฿itcoin

Quote from: Diapolo on July 23, 2011, 05:11:17 PM

Quote from: xcooling on July 23, 2011, 07:03:01 AM

69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less (down to 1697): just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works. Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower!

Dia

Thanks, this gave me a 0.31 MH/s increase per core on my 6990s.

indio007

full member

Activity: 224

Merit: 100

Quote from: Diapolo on July 23, 2011, 05:11:17 PM

Quote from: xcooling on July 23, 2011, 07:03:01 AM

69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less (down to 1697): just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works. Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower!

Dia

Will it be slower on a 6850?

BOARBEAR

member

Activity: 77

Merit: 10

Quote from: Diapolo on July 23, 2011, 05:11:17 PM

Quote from: xcooling on July 23, 2011, 07:03:01 AM

69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less (down to 1697): just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works. Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower!

Dia

What's the rationale behind this? It seems very weird to me that the compiler interprets the two statements differently.

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: xcooling on July 23, 2011, 07:03:01 AM

69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less (down to 1697): just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works. Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower!

Dia
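For anyone wondering whether the two statements could ever give different results: they should not, since unsigned 32-bit addition wraps around and is associative, so any difference is purely in how the compiler schedules the adds. Below is a minimal sketch in Python (not OpenCL), assuming the kernel values behave like 32-bit unsigned integers. The helper names and the sample values are hypothetical and only stand in for Vals[3], P4(124), and so on.

```python
MASK32 = 0xFFFFFFFF  # assumption: kernel values behave like 32-bit unsigned ints


def add32(a, b):
    """Add two values with 32-bit unsigned wraparound."""
    return (a + b) & MASK32


def compound_form(v7, terms):
    """Models `Vals[7] += t0 + t1 + ...`: sum the right side, then add once."""
    rhs = 0
    for t in terms:
        rhs = add32(rhs, t)
    return add32(v7, rhs)


def expanded_form(v7, terms):
    """Models `Vals[7] = Vals[7] + t0 + t1 + ...`: add left to right."""
    acc = v7
    for t in terms:
        acc = add32(acc, t)
    return acc


# Hypothetical sample values, chosen so that the sums overflow 32 bits.
sample_terms = [0xFFFFFFF0, 0x80000001, 0x12345678, 0xDEADBEEF]
same = compound_form(0xCAFEBABE, sample_terms) == expanded_form(0xCAFEBABE, sample_terms)
print(same)  # prints True
```

This only shows that the results match. It says nothing about why the compiler reports one ALU OP less for the second form.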

deepceleron

legendary

Activity: 1512

Merit: 1036

Quote from: hugolp on July 18, 2011, 04:23:29 PM

Quote from: phorensic on July 18, 2011, 02:08:31 PM

I think I jumped the gun. I believe I am having a real hardware problem. It's only on one card, the hotter one of the group, and it's overclocked and over-volted to all hell. I think what happened is that these errors were hidden from the console until this new kernel update! If that's the case, kudos for making the errors work! haha.

Not necessarily. I had the same problem with a card in previous versions of the kernel. After 20 minutes it would produce that message in phoenix or would crash poclbm. But with exactly the same configuration and later kernels it was solved, even when it was producing higher hashing rates. I am not sure why it happens exactly.

The code that creates this error is in the phatk init file:

                    if not hash.endswith('\x00\x00\x00\x00'):
                        self.interface.error('Unusual behavior from OpenCL. '
                            'Hardware problem?')

The error is reported if the hash returned by OpenCL does not begin with zeros. The error means that the hash-checking done in OpenCL thought the hash was valid and returned it, but this simple sanity check showed it was invalid. Either the hash-checking math was done wrong in OpenCL (saying that a bad hash was good, and perhaps silently discarding good hashes), or the correct hash is being corrupted when it is returned to the phatk core. It seems like something about the 07-17 kernel causes more errors on highly overclocked cards (perhaps it runs a different shader instruction that, in silicon, is less tolerant of overclocking?). These errors were not produced before at the same clock speed, so the new kernel reduces overclockability.
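To make the check concrete, here is a minimal standalone sketch in Python 3 of the same test. It is not the phatk init code itself: `passes_sanity_check`, `check_result`, and the `report_error` callback are hypothetical names, and the sample digests are made up. My assumption is that the digest bytes are stored in the reverse of the usual display order, which would be why the quoted code uses `endswith` for zeros that appear at the front of a displayed hash.

```python
def passes_sanity_check(digest: bytes) -> bool:
    """True if the 32-byte digest ends with four zero bytes."""
    return digest.endswith(b"\x00\x00\x00\x00")


def check_result(digest: bytes, report_error) -> bool:
    """Mirror of the quoted check: report an error for a digest that fails."""
    if not passes_sanity_check(digest):
        report_error("Unusual behavior from OpenCL. Hardware problem?")
        return False
    return True


# Hypothetical digests: one that passes, one with a single corrupted byte.
good = b"\x11" * 28 + b"\x00" * 4
bad = b"\x11" * 28 + b"\x00\x00\x01\x00"

errors = []
print(check_result(good, errors.append))
print(check_result(bad, errors.append))
print(errors)
```

A single flipped bit in the last four bytes is enough to trigger the message, which fits the idea that a marginal overclock could cause it.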

xcooling

member

Activity: 145

Merit: 10

69xx version would be wonderful ;-P

MiningBuddy

hero member

Activity: 927

Merit: 1000

฿itcoin ฿itcoin ฿itcoin

Quote from: Diapolo on July 23, 2011, 04:37:33 AM

perhaps I can do a special version for 69XX cards (which could be 1 - 2 ALU OPs faster, but slower for 58XX)

Yes, please do!

Diapolo

hero member

Activity: 772

Merit: 500

Reposted the 2011-07-17 version because of a small mistake in variable naming: T1substate0 was wrong, it has to be state0subT1.
There are no further changes that make any difference for those who grabbed the version before this post!

Currently no news for you guys. Perhaps I can do a special version for 69XX cards (which could be 1 - 2 ALU OPs faster, but slower for 58XX) if there is demand. But for 58XX cards I'm out of optimisation ideas.

Dia

Dubs420

newbie

Activity: 20

Merit: 0

Sent you a little donation as thanks for your work.

bmgjet

member

Activity: 98

Merit: 10

Lower heat and less power, for sure.
It's also possible that it does something with the timings.

MegaBux

newbie

Activity: 31

Merit: 0

Quote from: MegaBux on July 20, 2011, 11:38:14 AM

I am wondering though if this is an accurate throughput measurement. My pool is reporting lower-than-expected 24h rewards, and GPU temperatures are also 2 degrees cooler in this configuration. I am also under the impression that 800 MHz is the lowest supported memory clock for this card.

Anyone else experience this phenomenon?

I just realized that the difficulty went up during the time I was testing; this explains the lower 24h rewards. I'm still curious as to how a lower memory clock frequency could improve the hash rate though.
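As a rough illustration of why a difficulty increase lowers rewards at the same hash rate: the expected number of blocks found is hashrate × time / (difficulty × 2^32), so expected rewards scale inversely with difficulty. The sketch below uses the 4 × 220 MH/s figure from this thread; the two difficulty values are hypothetical placeholders, not the real network values from that period, and `expected_blocks` is a name I made up.

```python
def expected_blocks(hashrate_hps, seconds, difficulty):
    """Expected number of blocks found at a given hash rate and difficulty."""
    return hashrate_hps * seconds / (difficulty * 2**32)


rig_hashrate = 4 * 220e6       # four GPUs at about 220 MH/s each (from the thread)
one_day = 24 * 60 * 60         # seconds in 24 hours

# Hypothetical difficulty values, 10% apart, for illustration only.
before = expected_blocks(rig_hashrate, one_day, 1_000_000)
after = expected_blocks(rig_hashrate, one_day, 1_100_000)

print(after / before)  # ratio of expected rewards after vs. before the increase
```

In a pool the payout follows the same proportion, so a lower 24h reward after a difficulty jump does not by itself mean the reported hash rate is wrong.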

MegaBux

newbie

Activity: 31

Merit: 0

I am running a 4x6770 rig, with all cards manufactured by Sapphire. Also using Phoenix 1.5 with SDK 2.4 and 11.6 Catalyst drivers. Each GPU is clocked to 960/800 at the stock voltage.

Prior to the patch, each GPU capped at 217 MH/s. This is with the 3% phatk mod. After the patch, I saw no difference until I reduced the memory clock to 300 MHz. The GPUs now cap at about 220 MH/s.

I am wondering though if this is an accurate throughput measurement. My pool is reporting lower-than-expected 24h rewards, and GPU temperatures are also 2 degrees cooler in this configuration. I am also under the impression that 800 MHz is the lowest supported memory clock for this card.

Anyone else experience this phenomenon?
