Topic: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 (Read 106700 times)

hero member
Activity: 769
Merit: 500
Can someone tell me which aggression setting gives the absolute best performance with my HD6950 (heavily OC'd with unlocked shaders), please? I can't seem to find it in this thread or anywhere else. I'd also like a list of flags for this miner (running in GUIMiner with Phoenix, as poclbm has kept having issues lately).

thanks

AGGRESSION=12. Higher levels will leave the miner idle in Phoenix, because it can't get work fast enough. Perhaps 13 or 14 works for your setup!

Available switches can be found in the init file:
Code:
PLATFORM = KernelOption(
    'PLATFORM', int, default=None,
    help='The ID of the OpenCL platform to use')
DEVICE = KernelOption(
    'DEVICE', int, default=None,
    help='The ID of the OpenCL device to use')
VECTORS2 = KernelOption(
    'VECTORS2', bool, default=False, advanced=True,
    help='Enable vector uint2 support in the kernel.')
VECTORS4 = KernelOption(
    'VECTORS4', bool, default=False, advanced=True,
    help='Enable vector uint4 support in the kernel.')
FASTLOOP = KernelOption(
    'FASTLOOP', bool, default=False, advanced=True,
    help='Run iterative mining thread.')
AGGRESSION = KernelOption(
    'AGGRESSION', int, default=5, advanced=True,
    help='Exponential factor indicating how much work to run per OpenCL execution')
WORKSIZE = KernelOption(
    'WORKSIZE', int, default=None, advanced=True,
    help='The local worksize to use when executing OpenCL kernels.')
BFI_INT = KernelOption(
    'BFI_INT', bool, default=True, advanced=True,
    help='Use the BFI_INT instruction for AMD GPUs.')
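
As a rough illustration of how switches like these end up as settings (e.g. `AGGRESSION=12 VECTORS2 WORKSIZE=128`, as used elsewhere in this thread), here is a minimal sketch that parses such tokens against the defaults listed above. `DEFAULTS` and `parse_kernel_options` are made-up names for this example, and treating a bare flag as "on" is an assumption; this is not Phoenix's actual parser.

```python
# Hypothetical sketch, NOT Phoenix's real option handling.
# Types and defaults are copied from the KernelOption declarations above.
DEFAULTS = {
    'PLATFORM': (int, None),
    'DEVICE': (int, None),
    'VECTORS2': (bool, False),
    'VECTORS4': (bool, False),
    'FASTLOOP': (bool, False),
    'AGGRESSION': (int, 5),
    'WORKSIZE': (int, None),
    'BFI_INT': (bool, True),
}

def parse_kernel_options(tokens):
    """Turn tokens like ['AGGRESSION=12', 'VECTORS2'] into a dict of typed values."""
    options = {name: default for name, (_, default) in DEFAULTS.items()}
    for token in tokens:
        name, sep, raw = token.partition('=')
        name = name.upper()
        if name not in DEFAULTS:
            raise ValueError('unknown kernel option: %s' % token)
        kind, _ = DEFAULTS[name]
        if kind is bool:
            # Assumption: a bare flag (no '=') switches the option on.
            options[name] = True if not sep else raw.lower() in ('1', 'true', 'yes', 'on')
        else:
            options[name] = int(raw)
    return options

opts = parse_kernel_options('AGGRESSION=12 VECTORS2 WORKSIZE=128'.split())
print(opts['AGGRESSION'], opts['VECTORS2'], opts['WORKSIZE'])
```

Anything not given on the command line keeps its declared default, so `BFI_INT` stays enabled and `AGGRESSION` falls back to 5 in this sketch.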

Remember, this one will get no further development time!

Dia
legendary
Activity: 1022
Merit: 1000
Freelance videographer
Can someone tell me which aggression setting gives the absolute best performance with my HD6950 (heavily OC'd with unlocked shaders), please? I can't seem to find it in this thread or anywhere else. I'd also like a list of flags for this miner (running in GUIMiner with Phoenix, as poclbm has kept having issues lately).

thanks
hero member
Activity: 769
Merit: 500
Tried out diakgcn on my 6870. All tests at aggression 12 and done over very short time periods, unless said otherwise.

Results:

2.5:
phatk_dia, WORKSIZE=128, VECTORS: 282 MHps (best one from previous test)
diakgcn, WORKSIZE=128, VECTORS: 278 MHps
diakgcn, WORKSIZE=128, VECTORS2: 279 MHps
diakgcn, WORKSIZE=64, VECTORS: 278 MHps
diakgcn, WORKSIZE=64, VECTORS2: 279 MHps
diakgcn, WORKSIZE=128, VECTORS2, AGGRESSION=10: 278 MHps (spiked up to 282 at one point.....)
diakgcn, WORKSIZE=128, VECTORS2, AGGRESSION=5: 275 MHps

2.6: (Sorry, I documented this poorly and only listed the best results. If asked, I will document this better later.)
phatk_dia: 272 MHps
diakgcn: 260 MHps

tl;dr: diakgcn is currently slower for 6870

Thanks for your results, that behaviour was expected ... now it's confirmed. Well, DiaKGCN is not finished, so perhaps it will get better for older cards over time :).

Dia

Wouldn't it be easier to separate the kernels out for each of the 5xxx, 6xxx, and 7xxx series cards instead of trying to make a one-size-fits-all kernel? Is it possible to test at startup and exclude cards that a certain kernel isn't designed to run on? I hate to see you wasting time supporting the older cards with new kernels.

Easy answer: I just focused on GCN performance with DiaKGCN; that it runs on VLIW4/5 is just nice to have. I won't spend any more time optimising the performance of phatk_dia or DiaKGCN for older cards. I don't even know why I do all this, because this one (phatk_dia) doesn't really seem to be faster for anyone, and no one cares to support development via a small donation. People seem to donate only if they gain 10+ MH/s over another kernel ... the hard work that I put into a specific version doesn't get any attention, it seems :-/.

Dia

PS.: To discuss DiaKGCN further please use https://bitcointalksearch.org/topic/diakgcn-kernel-for-cgminer-phoenix-2-79xx-78xx-77xx-gcn-2012-05-25-61406
member
Activity: 86
Merit: 10
Tried out diakgcn on my 6870. All tests at aggression 12 and done over very short time periods, unless said otherwise.

Results:

2.5:
phatk_dia, WORKSIZE=128, VECTORS: 282 MHps (best one from previous test)
diakgcn, WORKSIZE=128, VECTORS: 278 MHps
diakgcn, WORKSIZE=128, VECTORS2: 279 MHps
diakgcn, WORKSIZE=64, VECTORS: 278 MHps
diakgcn, WORKSIZE=64, VECTORS2: 279 MHps
diakgcn, WORKSIZE=128, VECTORS2, AGGRESSION=10: 278 MHps (spiked up to 282 at one point.....)
diakgcn, WORKSIZE=128, VECTORS2, AGGRESSION=5: 275 MHps

2.6: (Sorry, I documented this poorly and only listed the best results. If asked, I will document this better later.)
phatk_dia: 272 MHps
diakgcn: 260 MHps

tl;dr: diakgcn is currently slower for 6870

Thanks for your results, that behaviour was expected ... now it's confirmed. Well, DiaKGCN is not finished, so perhaps it will get better for older cards over time :).

Dia

Wouldn't it be easier to separate the kernels out for each of the 5xxx, 6xxx, and 7xxx series cards instead of trying to make a one-size-fits-all kernel? Is it possible to test at startup and exclude cards that a certain kernel isn't designed to run on? I hate to see you wasting time supporting the older cards with new kernels.
legendary
Activity: 3472
Merit: 1721
Now tried on 12.1 and SDK 2.6.

HD 6850 - 35 Mhash/s slower
5850s - 100 Mhash/s slower

win7 64 pro
-k phatk AGGRESSION=12 VECTORS2 WORKSIZE=128

The 5850s are 80 Mhash/s slower if I turn the 6850 off.
A solo 5850 is only 5-8 Mhash/s slower when run on its own.
The 6850 is 3 Mhash/s slower when run on its own.

The only solution was to run the miner on all 4 CPU cores instead of 1, but then CPU utilization is 50-75% (= more heat).
Only then do I get just 5-8 Mhash/s less.

any ideas?

hero member
Activity: 769
Merit: 500
DiaKGCN -> Diapolo Kernel Graphics Core Next

As I said, the new one is for the 79XX cards, but I would be really interested in how it performs on older cards with current drivers / OpenCL runtime.
Next time, you should perhaps reply in the other thread, as I won't work on phatk_dia anymore.

Thanks for your tests,
Dia
newbie
Activity: 11
Merit: 0
Since you had an update today, I guess I'll retest. (diaggcn??? Spelling mistake?) Like before, all tests are short tests at aggression 12, unless stated otherwise.

2.5
phatk_dia, WORKSIZE=128, VECTORS: 282 MHps
diaggcn, WORKSIZE=64, VECTORS: 248 MHps
diaggcn, WORKSIZE=64, VECTORS2: 277 MHps
diaggcn, WORKSIZE=64, VECTORS4: 545 MHps  (Guess the vectors 4 bug has not been fixed? That probably means this is 272MHps)
diaggcn, WORKSIZE=128, VECTORS: 248 MHps
diaggcn, WORKSIZE=128, VECTORS2: 277 MHps
diaggcn, WORKSIZE=128, VECTORS4: 551 MHps (Probably 276MHps)
diaggcn, WORKSIZE=256, VECTORS: 248 MHps
diaggcn, WORKSIZE=256, VECTORS2: 271 MHps
diaggcn, WORKSIZE=256, VECTORS4: 540 MHps (Probably 270MHps)
diaggcn, WORKSIZE=128, VECTORS2, AGGRESSION=10: 276 MHps

might test my card on 2.6 later, but on 2.5, I am getting worse results than before, oh well.
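
The "probably" figures above come from halving the reported rate, on the guess that the VECTORS4 bug doubles the displayed hashrate. A tiny sketch of that arithmetic (the helper name and the factor of 2 are assumptions taken from the guesses in this post, not a confirmed property of the kernel):

```python
def corrected_mhps(reported_mhps, overcount_factor=2):
    """Scale a reported hashrate down by an assumed over-count factor."""
    return reported_mhps / overcount_factor

# (WORKSIZE, reported MHps) for the three VECTORS4 runs listed above
vectors4_runs = [(64, 545), (128, 551), (256, 540)]

for worksize, reported in vectors4_runs:
    print('WORKSIZE=%d: reported %d, corrected %.1f MHps'
          % (worksize, reported, corrected_mhps(reported)))
```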
hero member
Activity: 769
Merit: 500
Tried out diakgcn on my 6870. All tests at aggression 12 and done over very short time periods, unless said otherwise.

Results:

2.5:
phatk_dia, WORKSIZE=128, VECTORS: 282 MHps (best one from previous test)
diakgcn, WORKSIZE=128, VECTORS: 278 MHps
diakgcn, WORKSIZE=128, VECTORS2: 279 MHps
diakgcn, WORKSIZE=64, VECTORS: 278 MHps
diakgcn, WORKSIZE=64, VECTORS2: 279 MHps
diakgcn, WORKSIZE=128, VECTORS2, AGGRESSION=10: 278 MHps (spiked up to 282 at one point.....)
diakgcn, WORKSIZE=128, VECTORS2, AGGRESSION=5: 275 MHps

2.6: (Sorry, I documented this poorly and only listed the best results. If asked, I will document this better later.)
phatk_dia: 272 MHps
diakgcn: 260 MHps

tl;dr: diakgcn is currently slower for 6870

Thanks for your results, that behaviour was expected ... now it's confirmed. Well, DiaKGCN is not finished, so perhaps it will get better for older cards over time :).

Dia
newbie
Activity: 11
Merit: 0
Tried out diakgcn on my 6870. All tests at aggression 12 and done over very short time periods, unless said otherwise.

Results:

2.5:
phatk_dia, WORKSIZE=128, VECTORS: 282 MHps (best one from previous test)
diakgcn, WORKSIZE=128, VECTORS: 278 MHps
diakgcn, WORKSIZE=128, VECTORS2: 279 MHps
diakgcn, WORKSIZE=64, VECTORS: 278 MHps
diakgcn, WORKSIZE=64, VECTORS2: 279 MHps
diakgcn, WORKSIZE=128, VECTORS2, AGGRESSION=10: 278 MHps (spiked up to 282 at one point.....)
diakgcn, WORKSIZE=128, VECTORS2, AGGRESSION=5: 275 MHps

2.6: (Sorry, I documented this poorly and only listed the best results. If asked, I will document this better later.)
phatk_dia: 272 MHps
diakgcn: 260 MHps

tl;dr: diakgcn is currently slower for 6870
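
To put the numbers above in relative terms, here is a quick sketch (the helper name is made up; the figures are the best results listed in this post for each section):

```python
def percent_change(baseline_mhps, candidate_mhps):
    """Relative change of candidate vs. baseline, in percent."""
    return (candidate_mhps - baseline_mhps) / baseline_mhps * 100.0

# section label -> (best phatk_dia MHps, best diakgcn MHps), taken from the lists above
best_results = {'2.5': (282, 279), '2.6': (272, 260)}

for label, (phatk_dia, diakgcn) in best_results.items():
    print('%s: diakgcn is %+.1f%% vs phatk_dia'
          % (label, percent_change(phatk_dia, diakgcn)))
```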
hero member
Activity: 769
Merit: 500
The DiaKGCN kernel is ready; if you like, try it with VLIW5 and VLIW4 hardware. It should be interesting to see how well or badly a GCN-optimized kernel performs on older hardware:
https://bitcointalksearch.org/topic/diakgcn-kernel-for-cgminer-phoenix-2-79xx-78xx-77xx-gcn-2012-05-25-61406

Dia
sr. member
Activity: 434
Merit: 250
You should start rolling out pre-compiled compressed files or something. It's getting above my knowledge, lol!
hero member
Activity: 769
Merit: 500
I'm currently working pretty hard on a kernel for 7970 cards and am looking for a few guys, who are willing to test / benchmark it.
Please apply in this thread or via PM, you need to have a 7970 card and be on a current Phoenix version with latest Catalyst.
For now I don't want to release the kernel into the wild, sorry ... it's not polished :D.

Thanks,
Dia
zvs
legendary
Activity: 1680
Merit: 1000
https://web.archive.org/web/*/nogleg.com
Underclock to 300-370mhz has never been best.  395 is faster.  Fastest?  Not sure.  
I'm glad you found the memory peak that worked for you. However your case is not the absolute correct answer (and is not common, most 5xxx/6xxx cards are at 300MHz), it is just your setup and what works for you; many things will affect performance and where the memory "sweet spot" will be:

GPU model/architecture,
GPU card memory bus/memory size,
GPU core overclock,
Operating system / 32- or 64-bit / video card driver,
OpenCL/APP SDK runtime installed on system,
Miner software,
Miner kernel (and its particular optimizations),
Miner kernel parameters (worksize, vector size),
Compiler/SDK used to create miner,
Libraries installed on system (if running interpreted source)...

So there is no one right answer.
Hasn't been my experience, nor any of the other half a dozen people I know that run 5830 setups.  The decision is more along the lines of 'do I want to run the card cooler with a lower memory setting', vs 'do I want to run at 395mhz memory, but gain a few mhash?'.

I speak of 5830's exclusively.
legendary
Activity: 1428
Merit: 1001
Okey Dokey Lokey
(and is not common, most 5xxx/6xxx cards are at 300MHz)

ITYM 1/3rd core clock. 300mhz is only correct if your core is 900mhz.
OH, THAT'S THE TRICK?!?! My 6870's "sweet spot" SEEMS to be 490 with the core at 990! That makes quite a lot of sense! I was planning to look for a sweetER spot, but I felt that 490 "was it" and that I wouldn't find anything better, so I didn't look.
legendary
Activity: 1162
Merit: 1000
DiabloMiner author
(and is not common, most 5xxx/6xxx cards are at 300MHz)

ITYM 1/3rd core clock. 300mhz is only correct if your core is 900mhz.
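
The 1/3 rule of thumb above as a quick calculation. This is only a sketch of the stated ratio; the function name is made up, and as noted elsewhere in the thread the real sweet spot depends on the card, driver, and SDK.

```python
def suggested_mem_clock_mhz(core_clock_mhz, divisor=3):
    """Memory clock starting point as a fraction of the core clock (rule of thumb only)."""
    return core_clock_mhz / divisor

# A 900 MHz core gives the 300 MHz figure mentioned above.
print(suggested_mem_clock_mhz(900))
```

Treat the result as a starting point for testing, not as a guaranteed optimum.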
hero member
Activity: 769
Merit: 500
I recall that I mentioned this kernel is for SDK 2.6+, sorry!
It's totally ok for this kernel to not work well for older SDK versions.
Dia

OK, managed to get it working at full speed with SDK 2.3 :D

Great that you got it working, I only wanted to mention it's intended for 2.6+ :D.

Dia

Btw.: The current kernel doesn't work with the 7970, and GCN seems to dislike vectors for mining.
legendary
Activity: 3472
Merit: 1721
I recall that I mentioned this kernel is for SDK 2.6+, sorry!
It's totally ok for this kernel to not work well for older SDK versions.
Dia

OK, managed to get it working at full speed with SDK 2.3 :D
legendary
Activity: 1512
Merit: 1032
Underclock to 300-370mhz has never been best.  395 is faster.  Fastest?  Not sure.  
I'm glad you found the memory peak that worked for you. However your case is not the absolute correct answer (and is not common, most 5xxx/6xxx cards are at 300MHz), it is just your setup and what works for you; many things will affect performance and where the memory "sweet spot" will be:

GPU model/architecture,
GPU card memory bus/memory size,
GPU core overclock,
Operating system / 32- or 64-bit / video card driver,
OpenCL/APP SDK runtime installed on system,
Miner software,
Miner kernel (and its particular optimizations),
Miner kernel parameters (worksize, vector size),
Compiler/SDK used to create miner,
Libraries installed on system (if running interpreted source)...

So there is no one right answer.
sr. member
Activity: 1428
Merit: 344
I get about a 9-10 MH/s increase. Thank you, Diapolo!
zvs
legendary
Activity: 1680
Merit: 1000
https://web.archive.org/web/*/nogleg.com
Why is it so necessary for phatk kernel variations to have the memclock at 1k.... Some people can't deal with that extra heat...
This is something that has changed in SDK 2.6; The best performance at the best settings after trying all options comes at a GPU RAM speed of 1000MHz (stock speed for most cards) instead of at an underclock of 300MHz-370MHz. Version 2.6, included with driver 11.12 and 12.1, is significantly different in how it responds to worksizes, vector settings, and OpenCL programming than the previous SDKs.

It is a benefit in that one doesn't need to oddly tweak memory speeds away from stock to get the best performance (it's annoying to tell noobs over and over to underclock RAM), but bad in that this old quirk was actually an electricity saver if you did it.
Underclock to 300-370mhz has never been best.  395 is faster.  Fastest?  Not sure. 