Pages:
Author

Topic: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 - page 18. (Read 106900 times)

hero member
Activity: 769
Merit: 500
New version 2011-07-07 is ready: http://www.mediafire.com/?7j70gnmllgi9b73

This is mainly a bugfix release for SDK 2.1 with some code restructuring to save a few writes and additions. I can not guarantee, that this really works for 2.1, because I didn't test it. If you are unsure, wait for users to test it for you and consider applying this patch later!

By the way, I want to thank all of those who donated a few Bitcents to me, feels great!

Thanks,
Dia

PS.: If it works, please post here and consider a small donation @ 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM Smiley.
member
Activity: 111
Merit: 10
7/6 kernel seems to have the following effects for me:

4850 - elf kernel is much larger, but slight speed increase of ~3-4% from 56-57 @ 460 core to 58-59 @ 460 core. also seems to run cooler. since my card is overheating I have to underclock. cooler kernel means higher clock at 480 - getting 60. Using -v -w 128. worksize 128 is still optimal, 64 is slightly slower and 256 is much slower < 50mhash.

6450 - again elf kernel is much larger, but slight speed decrease at worksize 128. previous kernel was optimal at 128 but this one is optimal at 64. worksize 64 with new kernel is equivalent to worksize 128 with old kernel. potentially runs cooler but did not check. this is a fanless card but only gets 32mhash.

My 4850 @ 675 core 250 mem gets 85MH/s. 0.32% stale rate at DeepBit. (bought from eBay, dont know what brand, came with zalman cooler. anything higher than 680 core will cause it to stop hashing even if overvolted). Temps at 71 degrees celsius, closed case. Been running Milkyway@Home for more than a year at the same settings before I switched it to bitcoin mining.

For ATI 4000 series, use SDK 2.1 and poclbm (April 28 version). Using phatk and higher opencl sdk version on these cards will only lower the hash rate.

You misread my post. I am running at 460 core because the card is 105C at that speed. I cannot run it any faster. I can run 480 core with this new kernel. Btw, the days of SDK 2.1 and poclbm are nearly over. I get 84MH/s at 675 core and 494 mem. I can't do 250 mem the card doesn't downclock that far with afterburner. At 250 mem it would be even higher. However I can actually clock 700+ even though only for a few seconds.
full member
Activity: 173
Merit: 100
7/6 kernel seems to have the following effects for me:

4850 - elf kernel is much larger, but slight speed increase of ~3-4% from 56-57 @ 460 core to 58-59 @ 460 core. also seems to run cooler. since my card is overheating I have to underclock. cooler kernel means higher clock at 480 - getting 60. Using -v -w 128. worksize 128 is still optimal, 64 is slightly slower and 256 is much slower < 50mhash.

6450 - again elf kernel is much larger, but slight speed decrease at worksize 128. previous kernel was optimal at 128 but this one is optimal at 64. worksize 64 with new kernel is equivalent to worksize 128 with old kernel. potentially runs cooler but did not check. this is a fanless card but only gets 32mhash.

My 4850 @ 675 core 250 mem gets 85MH/s. 0.32% stale rate at DeepBit. (bought from eBay, dont know what brand, came with zalman cooler. anything higher than 680 core will cause it to stop hashing even if overvolted). Temps at 71 degrees celsius, closed case. Been running Milkyway@Home for more than a year at the same settings before I switched it to bitcoin mining.

For ATI 4000 series, use SDK 2.1 and poclbm (April 28 version). Using phatk and higher opencl sdk version on these cards will only lower the hash rate.
hero member
Activity: 769
Merit: 500
Great, so we have a fix and a version that works with 2.1. Will release a fix later today!

Dia
legendary
Activity: 1246
Merit: 1011
Ah sorry, I was not clear enough. You must not add (u) in front of every hex value in the kernel, but ONLY in front of the hex values, that generated an error.

You were perfectly clear, I was just being dumb.  Incase you missed my second post, this fix works for SDK 2.1.  Thank you very much.
hero member
Activity: 769
Merit: 500
legendary
Activity: 1246
Merit: 1011
You could try to add (u) in front of the raw hex values like (u)0x400022 and report back then.

Dia

Sorry about that last post.  I'm not usually that dumb I assure you.

I've modified your kernel code by adding (u) before each of the 5 raw hex values corresponding to the error messages.  I also added (u) directly before L from the other error message.  After this everything starts working in SDK 2.1.

For my stock voltage 5850:
423.7 (+/- 0.1) MH/s -> 425.9 (+/- 0.05) MH/s

This does of course mean that SDK 2.1 has increased its lead against SDK 2.4 for me.  So many people are convinced that SDK 2.4 is faster so perhaps this is a Windows/Linux thing.

If this runs for 24 hours without freezing then I have a new personal best!  I will want to test what proportion of these hashes are inaccurate but things are looking good.  Another donation is coming your way.
member
Activity: 111
Merit: 10
7/6 kernel seems to have the following effects for me:

4850 - elf kernel is much larger, but slight speed increase of ~3-4% from 56-57 @ 460 core to 58-59 @ 460 core. also seems to run cooler. since my card is overheating I have to underclock. cooler kernel means higher clock at 480 - getting 60. Using -v -w 128. worksize 128 is still optimal, 64 is slightly slower and 256 is much slower < 50mhash.

6450 - again elf kernel is much larger, but slight speed decrease at worksize 128. previous kernel was optimal at 128 but this one is optimal at 64. worksize 64 with new kernel is equivalent to worksize 128 with old kernel. potentially runs cooler but did not check. this is a fanless card but only gets 32mhash.
legendary
Activity: 1246
Merit: 1011
legendary
Activity: 1148
Merit: 1001
Radix-The Decentralized Finance Protocol
5870, Ubuntu 11.04, 11.6, 2.4, poclbm, went up 1MH/s (with last modification from previous modification).

The good news is the card that was randomly crashing the miner every 20 minutes with previous patch has been running for more than an hour without problems, so it seems stable now. Just crashed. I dont know what happens with this card and the modified kernel. Also, consumption has gone down like 5W. Im very puzzled by this changes in consumption by the different kernels.

Very good job. A small donation is going your way.
sr. member
Activity: 279
Merit: 250
All miner are Windows 7 x32 - SDK 2.4 - Catalyst 11.6

Latest changes:
HD5770 - from 219 up to 220
HD6950 (unlockable) - from 367 to 370
HD6970 (6950 with 6950 BIOS) - from 405 up to 408

Small speed increase on all three kind of cards and the rejected rate seems better, too.

Well done again, Sir!

hero member
Activity: 769
Merit: 500
c_k
donator
Activity: 242
Merit: 100
New release gives me 2-3MH/s more

I've given a small donation Smiley

Thanks for the hard work!
newbie
Activity: 42
Merit: 0
2001-06-07: Wow, nice work! Now I'm getting not 5-15% rejections and working like a charm. No speed increase though from last update.
legendary
Activity: 1246
Merit: 1011
Ok, here are the errors for the latest kernel on SDK 2.1.
{
Build on :

/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[19] = P4(19) + 0x11002000 + P1(19);
                         ^

/tmp/OCLthVTDN.cl(138): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[30] = P3(30) + 0xA00055 + P1(30);
                         ^

/tmp/OCLthVTDN.cl(261): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        Vals[3] = L + W[64];
                      ^

/tmp/OCLthVTDN.cl(286): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[81] = P4(81) + P2(81) + 0xA00000;
                                  ^

/tmp/OCLthVTDN.cl(299): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[87] = P4(87) + P3(87) + 0x11002000 + P1(87);
                                  ^

/tmp/OCLthVTDN.cl(316): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[94] = P3(94) + 0x400022 + P1(94);
                         ^

6 errors detected in the compilation of "/tmp/OCLthVTDN.cl".
}

Hopefully this is just some implicit casting of a kind which SDK 2.1 wants to be babied through.  If I were versed in OpenCL I'd have a go at fixing this myself but I'm sure you would be much more efficient.
newbie
Activity: 41
Merit: 0
THANKS for your efforts  ! just reporting back  Cool

6990 version 2011-07-06 with Catalyst 11.4, SDK 2.4 now equal with the latest poclbm (phatk) - maybe 0.5 MH/s slower  Cry
legendary
Activity: 1246
Merit: 1011
There is no point trying to run phatk on pre-2.4 SDK versions. It will just end up being slower than the poclbm kernel.

I read elsewhere that this is the theory but in practice phatk is faster than poclbm on SDK 2.1 for me.  Maybe this has something to do with the fact that I've applied the MA tweak (one less operation) to both kernels.

E.g. Sapphire HD5850 Xtreme 1000MHz core, 350MHz RAM, Catalyst 11.6 (Linux x86_64), VECTORS BFI_INT FASTLOOP=false AGGRESSION=13 WORKSIZE=256:
phatk: 413.3 MH/s (+/- 0.2 MH/s)
poclbm: 411.4 MH/s (+/- 0.2 MH/s)

I've tried lower core speeds and higher RAM speeds but always phatk outperforms poclbm on SDK 2.1 for me.

For mining I see only 2 real options:
SDK 2.1 with poclbm
SDK 2.4 with phatk

2.2 is slower than 2.1 on poclbm and doesn't work well with phatk either.
2.3 is even slower than 2.2 on poclbm, but all I know with phatk is that it's slower than with 2.4

Anyway, getting the output from the compiler is very simple. You just need to comment out the try/except block surrounding self.loadKernel().

I'll try that.
legendary
Activity: 1855
Merit: 1016
I am sure, it increases.
424-447 Mhash/s & 413 -430 Mhash/s
Sapphire 5870 & MSI 5870.
full member
Activity: 219
Merit: 120
I'm sorry to hear that ... what are the error messages you get with 2.1? I only tried with 2.4 and will test only on 2.4 and later SDKs by myself. Buf If you give me a hint I can try to fix it.

Dia

Don't worry about it.  The improvements for the SDK 2.4 users are clear and I'm impressed that you've managed to close the gap between 2.4 and 2.1 as much as you have.

I don't know how to get detailed error messages from phatk.  When I use SDK 2.1 and your latest kernel I run the command
python phoenix.py -u http://:@:/ -a 1 -q 1 -k phatk VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256 DEVICE=1
and get
[

If I try the same with the previous version of your kernel everything works happily.  I wish I had more details for you but I just don't know how to get them.


If Phoenix would allow to output the OpenCL compiler build log we could get an idea what's wrong. Perhaps jedi95 reads here and takes this as a suggestion Cheesy.
Perhaps I can take the lead with 2.4 and newer versions of my kernel, but for now I have no huge optimization ideas ... (but I'm thinking about it right now ^^).

Dia

There is no point trying to run phatk on pre-2.4 SDK versions. It will just end up being slower than the poclbm kernel.

For mining I see only 2 real options:
SDK 2.1 with poclbm
SDK 2.4 with phatk

2.2 is slower than 2.1 on poclbm and doesn't work well with phatk either.
2.3 is even slower than 2.2 on poclbm, but all I know with phatk is that it's slower than with 2.4

Anyway, getting the output from the compiler is very simple. You just need to comment out the try/except block surrounding self.loadKernel().
legendary
Activity: 1246
Merit: 1011
If Phoenix would allow to output the OpenCL compiler build log we could get an idea what's wrong. Perhaps jedi95 reads here and takes this as a suggestion Cheesy.
Perhaps I can take the lead with 2.4 and newer versions of my kernel, but for now I have no huge optimization ideas ... (but I'm thinking about it right now ^^).

Dia

Would I get more detailed feedback from another front-end to phatk?  I haven't really 'shopped around' with the front ends.
Pages:
Jump to: