further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 - page 18.

Diapolo

hero member

Activity: 772

Merit: 500

New version 2011-07-07 is ready: http://www.mediafire.com/?7j70gnmllgi9b73

This is mainly a bugfix release for SDK 2.1 with some code restructuring to save a few writes and additions. I cannot guarantee that this really works for 2.1, because I didn't test it. If you are unsure, wait for other users to test it and consider applying this patch later!

By the way, I want to thank all of those who donated a few Bitcents to me, feels great!

Thanks,
Dia

PS.: If it works, please post here and consider a small donation @ 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM


OCedHrt

member

Activity: 111

Merit: 10

Quote from: n4l3hp on July 07, 2011, 05:44:05 AM

Quote from: OCedHrt on July 07, 2011, 03:32:42 AM

7/6 kernel seems to have the following effects for me:

4850 - elf kernel is much larger, but slight speed increase of ~3-4% from 56-57 @ 460 core to 58-59 @ 460 core. also seems to run cooler. since my card is overheating I have to underclock. cooler kernel means higher clock at 480 - getting 60. Using -v -w 128. worksize 128 is still optimal, 64 is slightly slower and 256 is much slower < 50mhash.

6450 - again elf kernel is much larger, but slight speed decrease at worksize 128. previous kernel was optimal at 128 but this one is optimal at 64. worksize 64 with new kernel is equivalent to worksize 128 with old kernel. potentially runs cooler but did not check. this is a fanless card but only gets 32mhash.

My 4850 @ 675 core 250 mem gets 85MH/s. 0.32% stale rate at DeepBit. (Bought from eBay, don't know what brand, came with a Zalman cooler. Anything higher than 680 core will cause it to stop hashing even if overvolted.) Temps at 71 degrees Celsius, closed case. Been running Milkyway@Home for more than a year at the same settings before I switched it to bitcoin mining.

For ATI 4000 series, use SDK 2.1 and poclbm (April 28 version). Using phatk and a higher OpenCL SDK version on these cards will only lower the hash rate.

You misread my post. I am running at 460 core because the card is at 105C at that speed; I cannot run it any faster. I can run 480 core with this new kernel. Btw, the days of SDK 2.1 and poclbm are nearly over. I get 84MH/s at 675 core and 494 mem. I can't do 250 mem because the card doesn't downclock that far with Afterburner. At 250 mem it would be even higher. However, I can actually clock 700+, though only for a few seconds.

n4l3hp

full member

Activity: 173

Merit: 100

Quote from: OCedHrt on July 07, 2011, 03:32:42 AM

7/6 kernel seems to have the following effects for me:

4850 - elf kernel is much larger, but slight speed increase of ~3-4% from 56-57 @ 460 core to 58-59 @ 460 core. also seems to run cooler. since my card is overheating I have to underclock. cooler kernel means higher clock at 480 - getting 60. Using -v -w 128. worksize 128 is still optimal, 64 is slightly slower and 256 is much slower < 50mhash.

6450 - again elf kernel is much larger, but slight speed decrease at worksize 128. previous kernel was optimal at 128 but this one is optimal at 64. worksize 64 with new kernel is equivalent to worksize 128 with old kernel. potentially runs cooler but did not check. this is a fanless card but only gets 32mhash.

My 4850 @ 675 core 250 mem gets 85MH/s. 0.32% stale rate at DeepBit. (Bought from eBay, don't know what brand, came with a Zalman cooler. Anything higher than 680 core will cause it to stop hashing even if overvolted.) Temps at 71 degrees Celsius, closed case. Been running Milkyway@Home for more than a year at the same settings before I switched it to bitcoin mining.

For ATI 4000 series, use SDK 2.1 and poclbm (April 28 version). Using phatk and a higher OpenCL SDK version on these cards will only lower the hash rate.

Diapolo

hero member

Activity: 772

Merit: 500

Great, so we have a fix and a version that works with 2.1. Will release a fix later today!

Dia

teukon

legendary

Activity: 1246

Merit: 1011

Quote from: Diapolo on July 07, 2011, 04:46:52 AM

Ah sorry, I was not clear enough. You must not add (u) in front of every hex value in the kernel, but ONLY in front of the hex values, that generated an error.

You were perfectly clear, I was just being dumb. In case you missed my second post, this fix works for SDK 2.1. Thank you very much.

Diapolo

hero member

Activity: 772

Merit: 500

teukon

legendary

Activity: 1246

Merit: 1011

Quote from: Diapolo on July 07, 2011, 01:02:05 AM

You could try to add (u) in front of the raw hex values like (u)0x400022 and report back then.

Dia

Sorry about that last post. I'm not usually that dumb I assure you.

I've modified your kernel code by adding (u) before each of the 5 raw hex values corresponding to the error messages. I also added (u) directly before L from the other error message. After this everything starts working in SDK 2.1.
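For anyone who would rather script this edit than do it by hand, here is a rough Python sketch of the same idea. It assumes `u` is the kernel's vector type alias, and the helper name `add_vector_cast` is made up for illustration. As Dia said, it should only be run on the lines the compiler actually complained about, not on the whole kernel.

```python
import re

# Matches a hex literal that is not already preceded by a "(u)" cast.
HEX_LITERAL = re.compile(r"(?<!\(u\))\b0x[0-9A-Fa-f]+\b")

def add_vector_cast(line):
    """Prefix every bare hex literal in one kernel source line with (u).

    Hypothetical helper: apply it only to lines named in the build errors.
    """
    return HEX_LITERAL.sub(lambda m: "(u)" + m.group(0), line)

# One of the statements SDK 2.1 rejected.
before = "W[19] = P4(19) + 0x11002000 + P1(19);"
after = add_vector_cast(before)
print(after)
```

Running it a second time on an already patched line leaves the line unchanged, so it is safe to re-apply. The `L` case from the other error message is not covered here and still needs the cast added by hand.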

For my stock voltage 5850:
423.7 (+/- 0.1) MH/s -> 425.9 (+/- 0.05) MH/s

This does of course mean that SDK 2.1 has increased its lead against SDK 2.4 for me. So many people are convinced that SDK 2.4 is faster so perhaps this is a Windows/Linux thing.

If this runs for 24 hours without freezing then I have a new personal best! I will want to test what proportion of these hashes are inaccurate but things are looking good. Another donation is coming your way.

OCedHrt

member

Activity: 111

Merit: 10

7/6 kernel seems to have the following effects for me:

4850 - elf kernel is much larger, but slight speed increase of ~3-4% from 56-57 @ 460 core to 58-59 @ 460 core. also seems to run cooler. since my card is overheating I have to underclock. cooler kernel means higher clock at 480 - getting 60. Using -v -w 128. worksize 128 is still optimal, 64 is slightly slower and 256 is much slower < 50mhash.

6450 - again elf kernel is much larger, but slight speed decrease at worksize 128. previous kernel was optimal at 128 but this one is optimal at 64. worksize 64 with new kernel is equivalent to worksize 128 with old kernel. potentially runs cooler but did not check. this is a fanless card but only gets 32mhash.

teukon

legendary

Activity: 1246

Merit: 1011

hugolp

legendary

Activity: 1148

Merit: 1001


5870, Ubuntu 11.04, Catalyst 11.6, SDK 2.4, poclbm: went up 1MH/s (latest modification compared with the previous one).

~~The good news is the card that was randomly crashing the miner every 20 minutes with previous patch has been running for more than an hour without problems, so it seems stable now.~~ Just crashed. I don't know what happens with this card and the modified kernel. Also, consumption has gone down by about 5W. I'm very puzzled by these changes in consumption between the different kernels.

Very good job. A small donation is going your way.

dsky

sr. member

Activity: 279

Merit: 250

All miners are Windows 7 x32 - SDK 2.4 - Catalyst 11.6

Latest changes:
HD5770 - from 219 up to 220
HD6950 (unlockable) - from 367 to 370
HD6970 (6950 with 6970 BIOS) - from 405 up to 408

Small speed increase on all three kinds of cards and the rejected rate seems better, too.

Well done again, Sir!

Diapolo

hero member

Activity: 772

Merit: 500

c_k

donator

Activity: 242

Merit: 100

New release gives me 2-3MH/s more

I've given a small donation

Thanks for the hard work!

Alan Lupton

newbie

Activity: 42

Merit: 0

2001-06-07: Wow, nice work! Now I'm no longer getting 5-15% rejections and it's working like a charm. No speed increase from the last update, though.

teukon

legendary

Activity: 1246

Merit: 1011

Ok, here are the errors for the latest kernel on SDK 2.1.
{
Build on :

/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed
unless up-convertable(scalar-type=>vector-element-type)
W[19] = P4(19) + 0x11002000 + P1(19);
^

/tmp/OCLthVTDN.cl(138): error: mixed vector-scalar operation not allowed
unless up-convertable(scalar-type=>vector-element-type)
W[30] = P3(30) + 0xA00055 + P1(30);
^

/tmp/OCLthVTDN.cl(261): error: mixed vector-scalar operation not allowed
unless up-convertable(scalar-type=>vector-element-type)
Vals[3] = L + W[64];
^

/tmp/OCLthVTDN.cl(286): error: mixed vector-scalar operation not allowed
unless up-convertable(scalar-type=>vector-element-type)
W[81] = P4(81) + P2(81) + 0xA00000;
^

/tmp/OCLthVTDN.cl(299): error: mixed vector-scalar operation not allowed
unless up-convertable(scalar-type=>vector-element-type)
W[87] = P4(87) + P3(87) + 0x11002000 + P1(87);
^

/tmp/OCLthVTDN.cl(316): error: mixed vector-scalar operation not allowed
unless up-convertable(scalar-type=>vector-element-type)
W[94] = P3(94) + 0x400022 + P1(94);
^

6 errors detected in the compilation of "/tmp/OCLthVTDN.cl".
}

Hopefully this is just some implicit casting of a kind which SDK 2.1 wants to be babied through. If I were versed in OpenCL I'd have a go at fixing this myself but I'm sure you would be much more efficient.
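To make it easier to see which kernel lines need attention, the log can be boiled down to line numbers and statements. The following is only a sketch: the function name `failing_statements` is made up, and it assumes every error follows the three-line layout seen above (header, continuation line, offending statement). Only the first two errors are included as sample input.

```python
import re

# Sample copied from the build log above (first two errors only).
BUILD_LOG = """\
/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed
unless up-convertable(scalar-type=>vector-element-type)
W[19] = P4(19) + 0x11002000 + P1(19);
^

/tmp/OCLthVTDN.cl(138): error: mixed vector-scalar operation not allowed
unless up-convertable(scalar-type=>vector-element-type)
W[30] = P3(30) + 0xA00055 + P1(30);
^
"""

# Header format: <path>(<line>): error: <message>
ERROR_HEADER = re.compile(r"^(?P<path>\S+)\((?P<line>\d+)\): error: ")

def failing_statements(log):
    """Return (line_number, statement) pairs for each error in the log."""
    results = []
    lines = log.splitlines()
    for i, text in enumerate(lines):
        match = ERROR_HEADER.match(text)
        # Assumed layout: header, continuation line, then the statement.
        if match and i + 2 < len(lines):
            results.append((int(match.group("line")), lines[i + 2].strip()))
    return results

for number, statement in failing_statements(BUILD_LOG):
    print(number, statement)
```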

Wildvest

newbie

Activity: 41

Merit: 0

THANKS for your efforts! Just reporting back.

6990, version 2011-07-06 with Catalyst 11.4, SDK 2.4: now equal with the latest poclbm (phatk), maybe 0.5 MH/s slower.

teukon

legendary

Activity: 1246

Merit: 1011

Quote from: jedi95 on July 06, 2011, 03:53:03 PM

There is no point trying to run phatk on pre-2.4 SDK versions. It will just end up being slower than the poclbm kernel.

I read elsewhere that this is the theory but in practice phatk is faster than poclbm on SDK 2.1 for me. Maybe this has something to do with the fact that I've applied the MA tweak (one less operation) to both kernels.

E.g. Sapphire HD5850 Xtreme 1000MHz core, 350MHz RAM, Catalyst 11.6 (Linux x86_64), VECTORS BFI_INT FASTLOOP=false AGGRESSION=13 WORKSIZE=256:
phatk: 413.3 MH/s (+/- 0.2 MH/s)
poclbm: 411.4 MH/s (+/- 0.2 MH/s)
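As a quick sanity check on whether that gap is more than noise, the two readings can be compared against their stated error margins. This is just back-of-the-envelope arithmetic; the function name `compare` is made up, and treating the sum of the two margins as the noise threshold is a simplifying assumption.

```python
def compare(rate_a, err_a, rate_b, err_b):
    """Return (difference, percent gain over b, whether it exceeds the noise)."""
    diff = rate_a - rate_b
    # Assumption: the gap is meaningful if it exceeds both margins combined.
    combined_error = err_a + err_b
    percent = diff / rate_b * 100.0
    return diff, percent, abs(diff) > combined_error

# phatk vs poclbm readings from the figures above, in MH/s.
diff, percent, outside_noise = compare(413.3, 0.2, 411.4, 0.2)
print(round(diff, 1), round(percent, 2), outside_noise)
```

The gap works out to about 1.9 MH/s, a bit under half a percent, which is larger than the combined +/- 0.4 MH/s margin.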

I've tried lower core speeds and higher RAM speeds but always phatk outperforms poclbm on SDK 2.1 for me.

Quote from: jedi95 on July 06, 2011, 03:53:03 PM

For mining I see only 2 real options:
SDK 2.1 with poclbm
SDK 2.4 with phatk

2.2 is slower than 2.1 on poclbm and doesn't work well with phatk either.
2.3 is even slower than 2.2 on poclbm, but all I know with phatk is that it's slower than with 2.4

Anyway, getting the output from the compiler is very simple. You just need to comment out the try/except block surrounding self.loadKernel().

I'll try that.

dishwara

legendary

Activity: 1855

Merit: 1016

I am sure, it increases.
424-447 Mhash/s & 413 -430 Mhash/s
Sapphire 5870 & MSI 5870.

jedi95

full member

Activity: 219

Merit: 120

Quote from: Diapolo on July 06, 2011, 03:07:43 PM

Quote from: teukon on July 06, 2011, 02:49:09 PM

Quote from: Diapolo on July 06, 2011, 02:34:13 PM

I'm sorry to hear that ... what are the error messages you get with 2.1? I only tried with 2.4 and will only test on 2.4 and later SDKs myself. But if you give me a hint I can try to fix it.

Dia

Don't worry about it. The improvements for the SDK 2.4 users are clear and I'm impressed that you've managed to close the gap between 2.4 and 2.1 as much as you have.

I don't know how to get detailed error messages from phatk. When I use SDK 2.1 and your latest kernel I run the command
python phoenix.py -u http://:@:/ -a 1 -q 1 -k phatk VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256 DEVICE=1
and get
[ ] FATAL kernel error: Failed to load OpenCL kernel!

If I try the same with the previous version of your kernel everything works happily. I wish I had more details for you but I just don't know how to get them.

If Phoenix allowed us to output the OpenCL compiler build log we could get an idea of what's wrong. Perhaps jedi95 reads here and takes this as a suggestion.

Perhaps I can take the lead with 2.4 and newer versions of my kernel, but for now I have no huge optimization ideas ... (but I'm thinking about it right now ^^).

Dia

There is no point trying to run phatk on pre-2.4 SDK versions. It will just end up being slower than the poclbm kernel.

For mining I see only 2 real options:
SDK 2.1 with poclbm
SDK 2.4 with phatk

2.2 is slower than 2.1 on poclbm and doesn't work well with phatk either.
2.3 is even slower than 2.2 on poclbm, but all I know with phatk is that it's slower than with 2.4

Anyway, getting the output from the compiler is very simple. You just need to comment out the try/except block surrounding self.loadKernel().
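To illustrate why removing that block helps, here is a generic Python sketch. It is not Phoenix's actual code: `KernelBuildError`, `load_kernel`, `start_guarded` and `start_unguarded` are all hypothetical names, and the loader simply pretends the compiler rejected the source.

```python
class KernelBuildError(Exception):
    """Hypothetical stand-in for the exception an OpenCL build failure raises."""

def load_kernel():
    # Hypothetical loader: pretend the compiler rejected the kernel source.
    raise KernelBuildError("error: mixed vector-scalar operation not allowed")

def start_guarded():
    """With the try/except in place, only a generic message survives."""
    try:
        load_kernel()
    except Exception:
        return "FATAL kernel error: Failed to load OpenCL kernel!"
    return "kernel loaded"

def start_unguarded():
    """With the try/except commented out, the original error propagates."""
    load_kernel()
    return "kernel loaded"

print(start_guarded())
try:
    start_unguarded()
except KernelBuildError as exc:
    print("compiler said:", exc)
```

The guarded version hides the compiler's message behind the generic fatal error, while the unguarded version lets the full error reach the console.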

teukon

legendary

Activity: 1246

Merit: 1011

Quote from: Diapolo on July 06, 2011, 03:07:43 PM

If Phoenix allowed us to output the OpenCL compiler build log we could get an idea of what's wrong. Perhaps jedi95 reads here and takes this as a suggestion.

Perhaps I can take the lead with 2.4 and newer versions of my kernel, but for now I have no huge optimization ideas ... (but I'm thinking about it right now ^^).

Dia

Would I get more detailed feedback from another front-end to phatk? I haven't really 'shopped around' with the front ends.

Topic: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 - page 18. (Read 107059 times)