
Topic: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 - page 3.

hero member
Activity: 769
Merit: 500
Ok, I'll let you play around with it a bit first, before asking for a performance comparison :D.

I asked what happens with the GPU2 usage "bug" if only one card is mining: does its usage go up to 99% then?
Are the cards connected via a CrossFire bridge? What OS and driver are you on?

Edit: By the way, did you try lowering the memory clock even further via MSI Afterburner's unofficial overclocking mode?

Dia
legendary
Activity: 1428
Merit: 1001
Okey Dokey Lokey


Deleted all .elf files.
VECTORS2 and WORKSIZE=128: 307.8 MH/s
VECTORS4 and WORKSIZE=64: 314.2 MH/s

Please note that I apparently MUST have my memory clock at 1000 MHz or I cannot reach these speeds. The problem is heat: if my memory is at 1000 MHz, then I can't run my core at 1000 MHz, because it gets too hot.
With my memory at 600 MHz:

VECTORS2 and WORKSIZE=128: 283 MH/s
VECTORS4 and WORKSIZE=64: 287 MH/s

I currently have two different problems with running PhatkD:
the 2nd GPU's usage dances around and misbehaves,
and the memory clock MUST be at 1000 MHz <-- Bollocks, that kills my cards.
:o ??? :-\
!!!!!!!!!!!
Just noticed... that after deleting all .elf files I've lost performance... But it's only about 2 MH/s and could simply be due to the fact that I'm using my computer while doing these tests.
WTF... Deleting those .elf files made WORKSIZE=128 run better and WORKSIZE=64 run worse? That's got to be inaccuracy on my part...
legendary
Activity: 1428
Merit: 1001
Okey Dokey Lokey


Deleted all .elf files.
VECTORS2 and WORKSIZE=128: 305 MH/s
VECTORS4 and WORKSIZE=64: 317 MH/s

Please note that I apparently MUST have my memory clock at 1000 MHz or I cannot reach these speeds. The problem is heat: if my memory is at 1000 MHz, then I can't run my core at 1000 MHz, because it gets too hot.
With my memory at 600 MHz:

VECTORS2 and WORKSIZE=128: 283 MH/s
VECTORS4 and WORKSIZE=64: 287 MH/s

I currently have two different problems with running PhatkD:
the 2nd GPU's usage dances around and misbehaves,
and the memory clock MUST be at 1000 MHz <-- Bollocks, that kills my cards.
hero member
Activity: 769
Merit: 500
Boooo :-/. Please try VECTORS2 and WORKSIZE=128, too.
Phoenix should compile a new binary for this kernel version, but you could delete all the .elf files in the phatk directory to be sure.

Dia
legendary
Activity: 1428
Merit: 1001
Okey Dokey Lokey
Well then, sorry for the sad news :|
hero member
Activity: 769
Merit: 500
Quote: "Testing, expect an update in 6 mins."

6870s are VLIW5, so I'm hoping for good news.

Dia
legendary
Activity: 1428
Merit: 1001
Okey Dokey Lokey
316.8 MH/s with the previous kernel, 317.6 MH/s with the new kernel.
No, this version does not appear to be notably faster for me. On the other hand, my computer feels like it's "mining cleaner"; I can't really describe it... Same speeds, less desktop lag...
Yes, my cards are CrossFired. When I set just the 2nd GPU to PhatkD, it does what it should: it goes to 99% and puts out exactly the same as GPU 1 does. But as soon as I enable GPU 1 to mine at the same time as GPU 2 (with GPU 2 starting first and running at 99%), GPU 2 drops off, fluttering between 92% and 98%. I'll add a pic. Both cards are on different CPU cores, just in case.

I occasionally get smacked with a stale share RIGHT AWAY, but after that everything is normal... and it's only occasional; it goes like "Star---OMFG INVALI--Running".

http://imageshack.us/f/718/28674354.png/ <-- The only useful info is the MSI window
hero member
Activity: 769
Merit: 500
Is this version any faster for you? What were your results with the last version, as a comparison?
Are your cards CrossFired? What happens if only the 2nd GPU is active for Phoenix? I'm not sure what makes card 2 behave like that.
Do you get more rejects or more submitted shares with this version?

Dia
legendary
Activity: 1428
Merit: 1001
Okey Dokey Lokey
Phatk_Dia! Whooo! I personally think you've done enough "tweaks" to this kernel for it to carry your name. How about just calling it PhatkD?

Ughhh....
A memory clock near 1000 MHz seems to be mandatory, otherwise I lose speed:
-k phatk AGGRESSION=10 VECTORS4 WORKSIZE=64 (dual 6870s)
1000 core / 600 mem = 285 MH/s @ 81 °C, fans at 90%
1000 core / 1000 mem = 317 MH/s @ 88 °C, fans at 100%

I don't suppose there's a way to not have my memory clock up? 88 °C is MURDER.

Also, I'm still having issues with the 2nd GPU having low usage: GPU 1 = 99%, which is fucking beautiful, but GPU 2 never hits 99% once; it's always dancing between 91% and 98%, 285-315 MH/s.
hero member
Activity: 769
Merit: 500
A new version is ready for your testing pleasure:
Download version 2012-01-13: http://www.mediafire.com/download.php?2sqoj8obvp1q23p

highlights:
- the child has its name: I call it phatk_dia. It would be nice if you guys use this in discussions, to be clear which kernel you mean ;)
- faster on VLIW5 GPUs with VECTORS2 and VECTORS4
- more efficient on VLIW4 GPUs with VECTORS2 and a little faster with VECTORS4
- FASTLOOP defaults to false, so you don't need to supply FASTLOOP=false
- added an extended check for the supplied WORKSIZE parameter
- removed a PyOpenCL finish() call to reduce API overhead (this could cause problems, but works here -> consider this beta until it proves stable)
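To make the FASTLOOP default concrete, here is a minimal sketch of how a Phoenix-style option string (bare flags such as VECTORS4 plus KEY=value pairs) could be parsed so that FASTLOOP falls back to false when it is not supplied. The function name parse_options and its type-conversion rules are hypothetical illustrations, not phatk_dia's or Phoenix's actual code.

```python
def parse_options(option_string):
    """Hypothetical parser: turn 'KEY=value' tokens and bare flags into a dict."""
    options = {"FASTLOOP": False}  # per the release notes, FASTLOOP defaults to false
    for token in option_string.split():
        if "=" in token:
            key, value = token.split("=", 1)
            lowered = value.lower()
            if lowered in ("true", "false"):
                options[key.upper()] = lowered == "true"  # boolean options
            elif value.isdigit():
                options[key.upper()] = int(value)  # numeric options like WORKSIZE
            else:
                options[key.upper()] = value  # anything else is kept as a string
        else:
            options[token.upper()] = True  # bare flags such as VECTORS2 / VECTORS4
    return options


print(parse_options("AGGRESSION=10 VECTORS4 WORKSIZE=64"))
```

Supplying FASTLOOP=true simply overrides the default, so users who never mention FASTLOOP get false.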

Please report and give me all your coins :-D!

Edit: Please don't complain if this doesn't work well with non-2.6 SDK / runtime versions, because this IS for 2.6 or later!

Dia
hero member
Activity: 769
Merit: 500
I took a deep look into Phoenix: the initial number of nonces to run per execution is 1 << AGGRESSION, which is a power of two. So for AGGRESSION of 6 or more it is always evenly divisible by 64, but it can never be evenly divisible by 192 (= 3 * 64), which makes 192 an invalid WORKSIZE. I'm not sure how to change this to allow 192 as a valid value without breaking other things in the code.
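The divisibility argument can be checked with a few lines of Python. This is a sketch assuming, as described above, that the initial global size is exactly 1 << AGGRESSION; the helper name valid_worksizes and the candidate list are illustrative only.

```python
# Candidate WORKSIZE values discussed in the thread (illustrative list).
CANDIDATES = [32, 64, 96, 128, 192, 256]


def valid_worksizes(aggression, candidates=CANDIDATES):
    """Return the candidates that evenly divide a global size of 1 << aggression."""
    global_size = 1 << aggression  # assumed nonces per execution: always a power of two
    return [ws for ws in candidates if global_size % ws == 0]


for aggression in (6, 8, 10, 12):
    print(aggression, valid_worksizes(aggression))
```

Because 1 << AGGRESSION has no factor of 3, neither 96 nor 192 ever shows up in the list, whatever the AGGRESSION value.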

Internal tests with my latest kernel show good results with "VECTORS2 WORKSIZE=128" and even with "VECTORS4 WORKSIZE=64" on VLIW5 GPUs, so perhaps 192 is not needed ... will see.

I'm currently working on release notes, stay tuned.

Dia
newbie
Activity: 46
Merit: 0
Sounds great, I'm looking forward to your next release! Even though the wavefront may get crippled a little, with WORKSIZE=192 on VECTORS4 I didn't see much of a difference in the number of shares output; that's why I'm hoping to try it with VECTORS3. I'll definitely be sending a donation your way tomorrow!
legendary
Activity: 1428
Merit: 1001
Okey Dokey Lokey
Hohohoo! Thank you to the guy who told me to drop AGGRESSION to 10: no more CPU pegging!
I just have one more problem: my 2nd GPU sits at about 91-98% usage, while the 1st GPU is at a lovely 99%... 'Sup? Both GPUs are on different CPU cores.
hero member
Activity: 769
Merit: 500
Hi blandead, thanks for your positive feedback. About the WORKSIZE: I read through AMD's OpenCL guide, and it says it's best to use a multiple of 64 as WORKSIZE, because a wavefront on modern AMD GPUs runs 64 work-items in parallel. But it's not that hard to change the code to accept 32 as a divisor... I'll consider this. As for the vec3 thing, I'm trying hard to figure out what the problem is, but so far I have no clue :-/.

Edit: The WORKSIZE can be set to any value that is no larger than the HW max (256 for current AMD GPUs) and divisible by 2, so 32 works... I tried it, but it cripples wavefronts!
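A simplified model of why a WORKSIZE below 64 "cripples wavefronts": each work-group is split into 64-wide wavefronts, so any partially filled wavefront leaves lanes idle. This sketch assumes a fixed wavefront size of 64 and ignores all other scheduling effects; wavefront_utilization is a hypothetical helper.

```python
import math

WAVEFRONT_SIZE = 64  # work-items per wavefront on the AMD GPUs discussed here


def wavefront_utilization(worksize):
    """Fraction of wavefront lanes that hold a real work-item for one work-group."""
    # A partially filled wavefront still occupies a full 64-lane slot.
    wavefronts = math.ceil(worksize / WAVEFRONT_SIZE)
    return worksize / (wavefronts * WAVEFRONT_SIZE)


for ws in (32, 64, 96, 128, 192, 256):
    print(ws, f"{wavefront_utilization(ws):.0%}")
```

Under this model WORKSIZE=32 uses only half of each wavefront and 96 uses three quarters, while every multiple of 64 fills the wavefronts completely.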

Edit 2: From the OpenCL manual: "If local_work_size is specified, the values specified in global_work_size[0],... global_work_size[work_dim - 1] must be evenly divisible by the corresponding values specified in local_work_size[0],... local_work_size[work_dim - 1]." So the global worksize needs to be evenly divisible by the supplied WORKSIZE, which seems not to be true for WORKSIZE=96 in Phoenix.

Edit 3: I edited __init__.py to allow WORKSIZE to be any value that evenly divides the global worksize. This will be in the next release.
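A minimal sketch of a check that follows the rule quoted in Edit 2, assuming the global worksize is known up front and a hardware maximum of 256. check_worksize is a hypothetical name; the real __init__.py change may look different.

```python
def check_worksize(worksize, global_size, hw_max=256):
    """Raise ValueError unless worksize is a legal local size for global_size."""
    if not 0 < worksize <= hw_max:
        raise ValueError(f"WORKSIZE={worksize} must be between 1 and {hw_max}")
    # OpenCL requires global_work_size to be evenly divisible by local_work_size.
    if global_size % worksize != 0:
        raise ValueError(
            f"WORKSIZE={worksize} does not evenly divide the global size {global_size}"
        )


check_worksize(128, 1 << 10)  # passes silently
try:
    check_worksize(192, 1 << 10)  # 1024 % 192 == 64, so this is rejected
except ValueError as err:
    print(err)
```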

Dia

PS: Guys, if you wish to support my work, keep donating; there have been nearly no donations in the last few weeks!
newbie
Activity: 46
Merit: 0
Hi Diapolo,

I just wanted to say thank you for your most recent kernel, the 2011-12-21 version is by far the best.

I'm getting 455 MH/s on my 6970 using Phoenix 1.7.1 (AGGRESSION=12 WORKSIZE=128 VECTORS2).

It easily averages 7 shares/minute, which is fantastic. I had to lower my memory clock to 200 MHz for optimal performance, as it actually runs slower above this speed. That's fine with me, since I can raise the core clock a bit higher with the lower temperature. (When using VECTORS4 I had to set the memory clock to 375 MHz to get similar performance, but then I had to lower the core clock and it wasn't worth it. A WORKSIZE of 256 becomes really slow after a while regardless of settings.)
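As a rough sanity check on the share rate, assuming the pool hands out difficulty-1 shares (each takes about 2**32 hashes on average), the expected rate follows directly from the hashrate. expected_shares_per_minute is a hypothetical helper; real share counts fluctuate around this mean.

```python
def expected_shares_per_minute(hashrate_hs):
    """Mean difficulty-1 shares per minute for a hashrate given in hashes/second."""
    hashes_per_share = 2 ** 32  # approximate work for one difficulty-1 share
    return hashrate_hs * 60 / hashes_per_share


print(round(expected_shares_per_minute(455e6), 2))  # about 6.36
```

At 455 MH/s this gives roughly 6.4 shares per minute on average, in the same ballpark as the reported average of about 7.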

So far this is the best kernel to use, much more efficient than any version of phatk2 if you find the right speed and settings. I wish VECTORS3 would work, though, but the way you calculate the worksize makes it impossible.

I'm running the 12.1 preview drivers with the new 2.6 SDK. With a custom-forked poclbm or even GUIMiner, I can set the worksize in increments of 32.

I've been trying to run VECTORS3 with WORKSIZE=192, as this should be divisible, and I've used this worksize before with phatk2.2, where it is slightly faster for VECTORS4.
But when I try to use your kernel with a different worksize (other than 64, 128 or 256), it just gives me an error saying "this worksize is not valid" even though it should be possible, and with the worksizes it does accept, I don't think the ratedivisor will accept them. I was hoping there was some way to allow worksizes in increments of 32, as I think WORKSIZE=192 would be a good test for VECTORS3.

Anyway, currently I'm just using VECTORS2, and so far the results are outstanding on my AMD 6970!

legendary
Activity: 1344
Merit: 1004
So I've got Catalyst 12.1,
CrossFire 6870s at 1000 core / 600 mem,
Phoenix 1.7.3 and 1.7.2,
using Dia's most recent custom kernel,
and it pegs my CPU core at 100%.
It also doesn't display hashrates correctly, unless it really is only getting 205 MH/s with:
-v -k phatk AGGRESSION=12 FASTLOOP=false VECTORS2 WORKSIZE=256
-k phatk AGGRESSION=11 FASTLOOP=false VECTORS2 WORKSIZE=128
-v -k phatk AGGRESSION=12 FASTLOOP=false VECTORS2 WORKSIZE=64

It just seems to shit all over my cards... and also pegs my CPU core up to 100%.

What the hell am I doing wrong?
Even the default Phoenix kernel brings back the 100% CPU bug...

And yet, GUIMiner's poclbm with -v -w128 -f1 gives me 298 MH/s and no CPU bug...
I really WANT to use this kernel, but why the hell is it failing?

Poclbm is apparently stupidly easy to use, and as such I would assume that it cannot do as much as other miners can.
So I came to Phoenix; it was either that or cgminer. cgminer ALSO brings back the 100% CPU usage, but at least it gives me 294 MH/s.

Use AGGRESSION=10 or lower to avoid 100% CPU. You'll get about 40% CPU with 11 and 100% with 12+; 10 and lower will be maybe 4-5%.
hero member
Activity: 769
Merit: 500
For my setup (6950 + 6550D), the strange thing is that cgminer (phatk2.x) is slower no matter how it's configured. On the 6950 the difference is quite large; for the 6550D it's only a few MH/s.

Dia
legendary
Activity: 1512
Merit: 1036
In case you missed it, I profiled the performance of Diapolo's kernel on my 5830 using both SDK 2.5 and the performance-robbing 2.6 here, with lots of options and RAM clock speeds. It might give insights into where to take the kernel for current-driver OpenCL performance; the 58xx responds to worksize a bit differently than Diapolo's 5770.
legendary
Activity: 1526
Merit: 1001
Interesting. I have SDK 2.6 and the 11.12 driver. With your kernel I get my better performance back, but the CPU bug is back too. Once I switch off Phoenix and switch poclbm back on, it's gone.

I assume there is no way to get rid of the bug AND have the good performance back?
member
Activity: 111
Merit: 10
NICE,

went from 243 MH/s to 245 MH/s on my 5830 at stock clocks with these parameters:

-k phatk DEVICE=0 VECTORS2 FASTLOOP=false AGGRESSION=6 WORKSIZE=128 -a 1000