
Topic: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 - page 10. (Read 106995 times)

hero member
Activity: 772
Merit: 500
I'm working on a new version! The ideas came from Phateus, the original author of phatk, who released version 2.0 of phatk (THANKS, Phateus).
Currently my version IS slower, but I see this as a fair and cool competition, from which all of us will benefit in the end.

Dia
hero member
Activity: 772
Merit: 500
Phateus posted some improvements in his own kernel, check it out:
http://forum.bitcoin.org/index.php?topic=7964.0

Unfortunately it doesn't run for me, so I can't check whether it's faster on my card.

Thanks for pointing me to that thread!

Dia
hero member
Activity: 658
Merit: 500
Phateus posted some improvements in his own kernel, check it out:
http://forum.bitcoin.org/index.php?topic=7964.0

Unfortunately it doesn't run for me, so I can't check whether it's faster on my card.
hero member
Activity: 772
Merit: 500
That's really weird. I can't see why the compiler generates better code with this statement, but it does. I verified it with SDK 2.4 myself.

Has anyone noticed that during the last few rounds a lot of variables are calculated, but never used again?

I took the time to optimize all unused statements out, but got no speed improvement at all. It seems the compiler optimized them away anyway.

Here it is - from round 124 to the #ifdef VECTORS preprocessor command

Code:
        sharound(121);

        // Optimized out all unused calculations
        W[122 - O] = P4(122) + P3(122) + P2(122) + P1(122);
        Vals[1] += K[58] + Vals[5] + W[122 - O] + s1(122) + ch(122);
        Vals[0] += K[59] + Vals[4] + s1(123) + ch(123) + P4(123) + P3(123) + P2(123) + P1(123);
        Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

#ifdef VECTORS

Maybe someone can trick the compiler into making better code ;-)

Hey Vince, I tried this myself a few days ago and didn't get better efficiency either ... perhaps I will have to throw it into the mixer again.
And thanks again for your work!

Dia

No chance. I tried different things and combinations, but the OpenCL compiler does it better than I do (again ^^).

Dia
hero member
Activity: 772
Merit: 500
That's really weird. I can't see why the compiler generates better code with this statement, but it does. I verified it with SDK 2.4 myself.

Has anyone noticed that during the last few rounds a lot of variables are calculated, but never used again?

I took the time to optimize all unused statements out, but got no speed improvement at all. It seems the compiler optimized them away anyway.

Here it is - from round 124 to the #ifdef VECTORS preprocessor command

Code:
        sharound(121);

        // Optimized out all unused calculations
        W[122 - O] = P4(122) + P3(122) + P2(122) + P1(122);
        Vals[1] += K[58] + Vals[5] + W[122 - O] + s1(122) + ch(122);
        Vals[0] += K[59] + Vals[4] + s1(123) + ch(123) + P4(123) + P3(123) + P2(123) + P1(123);
        Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

#ifdef VECTORS

Maybe someone can trick the compiler into making better code ;-)

Hey Vince, I tried this myself a few days ago and didn't get better efficiency either ... perhaps I will have to throw it into the mixer again.
And thanks again for your work!

Dia
hero member
Activity: 772
Merit: 500
69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less, down to 1697: just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version first):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works. Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower ones!

Dia

Will it be slower on a 6850?

The 6850 is a VLIW5 design and will be slower ... this really is only for 69XX cards, which are VLIW4!
hero member
Activity: 772
Merit: 500
69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less, down to 1697: just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version first):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works. Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower ones!

Dia

What's the rationale behind this? It seems very weird to me that the compiler interprets the two statements differently.

You are not the only one, but the compiler says it's one ALU OP less!

Dia
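For what it's worth, the two statements are mathematically identical for 32-bit unsigned integers, so any difference in the ALU op count has to come from how the compiler groups and schedules the additions, not from the result. Below is a minimal Python sketch of that equivalence. It is only an illustration: `rhs` is a hypothetical stand-in for the sum of the P4/P3/P2/P1/s1/ch terms, and OpenCL uint wrap-around is modelled by masking to 32 bits. It says nothing about why the compiler prefers one form.

```python
import random

MASK32 = 0xFFFFFFFF  # models OpenCL uint wrap-around (assumption of this sketch)


def compound_form(vals7, vals3, rhs):
    # Models: Vals[7] += Vals[3] + rhs;
    # The right-hand side is summed first, then added to Vals[7].
    return (vals7 + ((vals3 + rhs) & MASK32)) & MASK32


def expanded_form(vals7, vals3, rhs):
    # Models: Vals[7] = Vals[7] + Vals[3] + rhs;
    # Evaluated left to right, so Vals[7] + Vals[3] is summed first.
    return (((vals7 + vals3) & MASK32) + rhs) & MASK32


def forms_agree(trials=10000, seed=0):
    # Compare both groupings on random 32-bit inputs.
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rng.getrandbits(32) for _ in range(3))
        if compound_form(a, b, c) != expanded_form(a, b, c):
            return False
    return True


print(forms_agree())  # prints True: addition modulo 2**32 is associative
```

The only thing that differs between the two forms is the grouping of the additions, which is presumably what nudges the compiler into a slightly different schedule.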
newbie
Activity: 38
Merit: 0
That's really weird. I can't see why the compiler generates better code with this statement, but it does. I verified it with SDK 2.4 myself.

Has anyone noticed that during the last few rounds a lot of variables are calculated, but never used again?

I took the time to optimize all unused statements out, but got no speed improvement at all. It seems the compiler optimized them away anyway.

Here it is - from round 124 to the #ifdef VECTORS preprocessor command

Code:
        sharound(121);

        // Optimized out all unused calculations
        W[122 - O] = P4(122) + P3(122) + P2(122) + P1(122);
        Vals[1] += K[58] + Vals[5] + W[122 - O] + s1(122) + ch(122);
        Vals[0] += K[59] + Vals[4] + s1(123) + ch(123) + P4(123) + P3(123) + P2(123) + P1(123);
        Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

#ifdef VECTORS

Maybe someone can trick the compiler into making better code ;-)
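The reason so many values in the last rounds go unused is a general property of SHA-256: the last 32-bit word of the digest depends only on the value of `e` after round 61, which is then just shifted through `f`, `g` and `h`. The sketch below illustrates this in plain Python. It is not the phatk kernel; the function names are made up for this example, and the round constants are derived from the primes rather than typed in.

```python
import hashlib
import struct
from math import isqrt

MASK = 0xFFFFFFFF


def primes(n):
    found, c = [], 2
    while len(found) < n:
        if all(c % p for p in found):
            found.append(c)
        c += 1
    return found


def icbrt(n):
    # Integer cube root by binary search.
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


P = primes(64)
H0 = [isqrt(p << 64) & MASK for p in P[:8]]  # fractional bits of square roots
K = [icbrt(p << 96) & MASK for p in P]       # fractional bits of cube roots


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def pad_one_block(msg):
    # Standard SHA-256 padding, restricted to messages that fit one block.
    assert len(msg) <= 55
    return msg + b"\x80" + b"\x00" * (55 - len(msg)) + struct.pack(">Q", 8 * len(msg))


def schedule(block):
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    return w


def run_rounds(state, w, rounds):
    a, b, c, d, e, f, g, h = state
    for i in range(rounds):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [a, b, c, d, e, f, g, h]


def last_word_full(msg):
    # All 64 rounds, then take h.
    w = schedule(pad_one_block(msg))
    return (H0[7] + run_rounds(H0, w, 64)[7]) & MASK


def last_word_short(msg):
    # Only 61 rounds: e after round 61 becomes h after round 64.
    w = schedule(pad_one_block(msg))
    return (H0[7] + run_rounds(H0, w, 61)[4]) & MASK


reference = int.from_bytes(hashlib.sha256(b"abc").digest()[28:], "big")
print(last_word_full(b"abc") == last_word_short(b"abc") == reference)
```

Since only that one word has to be checked first, everything else computed in rounds 62 to 64 is dead code, which a compiler can remove on its own. That would be consistent with the hand-optimized version giving no speedup.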
hero member
Activity: 927
Merit: 1000
฿itcoin ฿itcoin ฿itcoin
69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less, down to 1697: just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version first):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works. Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower ones!

Dia
Thanks, this gave me a 0.31 MH/s increase per core on my 6990s.
full member
Activity: 224
Merit: 100
69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less, down to 1697: just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version first):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works. Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower ones!

Dia

Will it be slower on a 6850?
member
Activity: 77
Merit: 10
69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less, down to 1697: just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version first):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works. Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower ones!

Dia

What's the rationale behind this? It seems very weird to me that the compiler interprets the two statements differently.
hero member
Activity: 772
Merit: 500
69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less, down to 1697: just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version first):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works. Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower ones!

Dia
legendary
Activity: 1512
Merit: 1036
I think I jumped the gun.  I believe I am having a real hardware problem.  It's only on one card, the hotter one of the group, and it's overclocked and over-volted to all hell.  I think what happened is that these errors were hidden from the console until this new kernel update!  If that's the case, kudos for making the errors work! haha.

Not necessarily. I had the same problem with a card in previous versions of the kernel. After 20 minutes it would produce that message in phoenix or would crash poclbm. But with exactly the same configuration and later kernels it was solved, even though it was producing higher hash rates. I am not sure why exactly it happens.

The code that creates this error is in the phatk init file:

                    if not hash.endswith('\x00\x00\x00\x00'):
                        self.interface.error('Unusual behavior from OpenCL. '
                            'Hardware problem?')


The error is reported if the raw hash returned by OpenCL does not end with four zero bytes (in the usual byte-reversed display, that means the hash does not begin with zeros). The error means that the hash-checking done in OpenCL thought the hash was valid and returned it, but this simple sanity check showed it was invalid. Either the hash-checking math was done wrong in OpenCL (saying that a bad hash was good, and perhaps silently discarding good hashes), or the correct hash is being corrupted when it is returned to the phatk core. It seems like something about the 07-17 kernel causes more errors on highly overclocked cards (perhaps it runs a different shader instruction, in silicon that is less tolerant of overclocking?), errors that were not produced before at the same clock speed, which reduces overclockability.
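To make the check concrete, here is a small standalone sketch of the same test in Python 3 (the quoted snippet is Python 2). The helper names are invented for this example and are not part of phatk or Phoenix, and the digests used are synthetic byte strings, not real block hashes.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    # Bitcoin hashes the 80-byte block header with SHA-256 twice.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def passes_sanity_check(digest: bytes) -> bool:
    # Same test as the quoted snippet: the raw digest must END with four
    # zero bytes, i.e. the top 32 bits of the hash value are zero.
    return digest.endswith(b"\x00\x00\x00\x00")


def displayed_form(digest: bytes) -> str:
    # Block hashes are conventionally shown byte-reversed, which is why a
    # good hash "begins with zeros" when printed.
    return digest[::-1].hex()


good = bytes(range(1, 29)) + b"\x00" * 4  # synthetic digest, not a real hash
bad = bytes(range(1, 33))                 # synthetic digest, not a real hash

print(passes_sanity_check(good), displayed_form(good)[:8])  # True 00000000
print(passes_sanity_check(bad))                             # False
```

In a real miner the digest would come from `double_sha256(header)` for the header with the nonce reported by the GPU. If the GPU claims a nonce is good and this check fails, either the kernel computed something wrong or the result was corrupted on the way back.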
member
Activity: 145
Merit: 10
69xx version would be wonderful ;-P
hero member
Activity: 927
Merit: 1000
฿itcoin ฿itcoin ฿itcoin
perhaps I can do a special version for 69XX cards (which could be 1 - 2 ALU OPs faster, but slower for 58XX)

Yes, please do!
hero member
Activity: 772
Merit: 500
I reposted the 2011-07-17 version because of a small mistake in variable naming: T1substate0 was wrong, it has to be state0subT1.
There are no further changes, so nothing changes for those who grabbed the version before this posting!

Currently no news for you guys. Perhaps I can do a special version for 69XX cards (which could be 1 - 2 ALU OPs faster, but slower for 58XX) if there is demand. But for 58XX cards I'm out of optimisation ideas.

Dia
newbie
Activity: 20
Merit: 0
Sent you a little donation as thanks for your work.
member
Activity: 98
Merit: 10
Lower heat and less power, for sure.
It's also possible that it does something with the timings.
newbie
Activity: 31
Merit: 0
I am wondering though if this is an accurate throughput measurement. My pool is reporting lower-than-expected 24h rewards, and GPU temperatures are also 2 degrees cooler in this configuration. I am also under the impression that 800 MHz is the lowest supported memory clock for this card.

Anyone else experience this phenomenon?

I just realized that the difficulty went up during the time I was testing; this explains the lower 24h rewards.  I'm still curious as to how a lower memory clock frequency could improve the hash rate though.
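The effect of a difficulty increase on rewards can be estimated with the usual expectation formula: on average, finding a block takes difficulty * 2^32 hashes. The sketch below uses made-up difficulty values purely for illustration (they are not the actual network figures from that week) and assumes the 50 BTC block reward that applied at the time.

```python
def expected_btc_per_day(hashrate_hps, difficulty, block_reward=50.0):
    # On average, difficulty * 2**32 hashes are needed to find one block.
    blocks_per_day = hashrate_hps * 86400 / (difficulty * 2**32)
    return blocks_per_day * block_reward


rig = 4 * 220e6  # four GPUs at about 220 MH/s each, as reported in this thread

before = expected_btc_per_day(rig, 1_500_000)  # hypothetical difficulty
after = expected_btc_per_day(rig, 1_700_000)   # hypothetical higher difficulty

print(round(before, 3), round(after, 3), round(after / before, 3))
```

Expected rewards scale inversely with difficulty, so a difficulty jump of that size lowers the daily payout far more than a 3 MH/s per-card gain raises it.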
newbie
Activity: 31
Merit: 0
I am running a 4x6770 rig, with all cards manufactured by Sapphire.  Also using Phoenix 1.5 with SDK 2.4 and 11.6 Catalyst drivers.  Each GPU is clocked to 960/800 at the stock voltage.

Prior to the patch, each GPU capped at 217 MH/s. This is with the 3% phatk mod. After the patch, I saw no difference until I reduced the memory clock to 300 MHz. The GPUs now cap at about 220 MH/s.

I am wondering though if this is an accurate throughput measurement. My pool is reporting lower-than-expected 24h rewards, and GPU temperatures are also 2 degrees cooler in this configuration. I am also under the impression that 800 MHz is the lowest supported memory clock for this card.

Anyone else experience this phenomenon?