
Topic: *Catalyst 12.1 Preview* Decreased performance, anyone else confirm? - page 2. (Read 20763 times)

legendary
Activity: 2352
Merit: 1064
Bitcoin is antisemitic
I confirm a 30+ MH/s drop on both a 5870 and a 5830 with both CataCLYSM 11.12 and 12.1 compared with 11.6.
What is MUCH WORSE is that you cannot effectively revert: uninstalling, driver-sweeping, rebooting and reinstalling 11.6 will NOT bring back your previous performance.
Luckily, I had a fresh system backup made just yesterday that I could restore, no problem.
Now I am back to mining happily, with my 5870 @ 970/300 hashing at 440 MH/s+ and a 5830 @ 940/300 hashing at 300 MH/s+, with Phoenix 1.7 on XP.
legendary
Activity: 1428
Merit: 1001
Okey Dokey Lokey
Confirming a 20% drop in hashing performance across a variety of cards using 11.12.

Fuck.

Time to roll back to older drivers.

I hate ATI so damned much sometimes.

Edit: Using phoenix/phatk

10% drop across all my cards, but no more CPU bug. Can't tell which is worse.

So, can anyone bring their hash rate back to normal with Cat 12.1? I love its gaming performance, but I'm losing 10% of my speed to save 17% of my CPU usage... I'd rather not burn out one of my cores; on top of that, it's a waste of power.

legendary
Activity: 1428
Merit: 1001
Okey Dokey Lokey
Confirming a 20% drop in hashing performance across a variety of cards using 11.12.

Fuck.

Time to roll back to older drivers.

I hate ATI so damned much sometimes.

Edit: Using phoenix/phatk

10% drop across all my cards, but no more CPU bug. Can't tell which is worse.
hero member
Activity: 772
Merit: 500
Confirming a 20% drop in hashing performance across a variety of cards using 11.12.

Fuck.

Time to roll back to older drivers.

I hate ATI so damned much sometimes.

Edit: Using phoenix/phatk

Could you try this one (https://bitcointalksearch.org/topic/further-improved-phatkdia-kernel-for-phoenix-sdk-26-2012-01-13-25860) with Phoenix 1.7.1 and report your results?

Dia
member
Activity: 106
Merit: 10
Confirming a 20% drop in hashing performance across a variety of cards using 11.12.

Fuck.

Time to roll back to older drivers.

I hate ATI so damned much sometimes.

Edit: Using phoenix/phatk
sr. member
Activity: 256
Merit: 250
Not quite, Python is not among my strong suits. I may rewrite my miner, though, just for the experiment. Anyway, I have more important projects right now.
hero member
Activity: 772
Merit: 500
Nope, preferred vector size does not always mean "best performance". Wider vectors mean more GPRs used, and the more GPRs you use, the fewer wavefronts you can schedule on a CU, so occupancy goes down. Also, on VLIW5 hardware, uint4 vectors are not optimal; it can happen that there are no 5 independent instructions to fill the whole VLIW bundle. That depends on your code.

For example, with hash cracking you might end up with uint8 being much better than uint4 for kernels like the MD5 or NTLM ones, as ALUPacking goes up and the number of GPRs used is 10-20 at most. On the other hand, more complex algorithms like SHA512 are much better with uint2 or even a scalar implementation, as the number of GPRs used greatly hampers occupancy. With memory-intensive kernels like DES ones (thank god bitcoin is not one), occupancy becomes even more important, as more concurrency means memory access latencies are more easily "hidden".

Back to uint2: the problem with it is that it's even worse at utilizing all the slots in the VLIW bundle, so your ALUPacking just always sucks. Generally speaking, though, with most bitcoin kernels it's a tradeoff worth having, as bad occupancy in that particular case is worse than bad ALUPacking.

uint3 should provide a better balance between the two, but it was broken in pre-2.6 APP SDK releases. Now they have fixed it and I am curious about the results...


PS: as to why uint4 suddenly started performing better, it could be that they improved scheduling, or that they improved the backend compiler to pack instructions better with uint4 / worse with uint2. It could actually be both.


I implemented uint3 quite a few months ago, but it was bugged (like you said)... since 2.6 I'm able to compile my kernel via KernelAnalyzer and get no errors (the uint3 kernel is much longer, by the way). But I guess I did something wrong in the init.py from Phoenix, which I can't solve by myself... Phoenix crashes when the kernel is started. Are you skilled enough to take a look at it?

Dia
member
Activity: 77
Merit: 10
Oh, btw, there are two different versions of the 12.1 preview.

The one I tried is this:
http://developer.amd.com/Downloads/OpenCL1.2-Static-Cplus-preview-drivers-Windows.exe

It has a newer OpenCL version than the other 12.1 preview.
sr. member
Activity: 256
Merit: 250
Nope, preferred vector size does not always mean "best performance". Wider vectors mean more GPRs used, and the more GPRs you use, the fewer wavefronts you can schedule on a CU, so occupancy goes down. Also, on VLIW5 hardware, uint4 vectors are not optimal; it can happen that there are no 5 independent instructions to fill the whole VLIW bundle. That depends on your code.

For example, with hash cracking you might end up with uint8 being much better than uint4 for kernels like the MD5 or NTLM ones, as ALUPacking goes up and the number of GPRs used is 10-20 at most. On the other hand, more complex algorithms like SHA512 are much better with uint2 or even a scalar implementation, as the number of GPRs used greatly hampers occupancy. With memory-intensive kernels like DES ones (thank god bitcoin is not one), occupancy becomes even more important, as more concurrency means memory access latencies are more easily "hidden".

Back to uint2: the problem with it is that it's even worse at utilizing all the slots in the VLIW bundle, so your ALUPacking just always sucks. Generally speaking, though, with most bitcoin kernels it's a tradeoff worth having, as bad occupancy in that particular case is worse than bad ALUPacking.

uint3 should provide a better balance between the two, but it was broken in pre-2.6 APP SDK releases. Now they have fixed it and I am curious about the results...


PS: as to why uint4 suddenly started performing better, it could be that they improved scheduling, or that they improved the backend compiler to pack instructions better with uint4 / worse with uint2. It could actually be both.
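The GPR/occupancy tradeoff described above can be sketched with a rough back-of-the-envelope model. The register-file size and scheduling cap below are illustrative assumptions for a Cypress-class (HD 5800) SIMD, not exact hardware limits:

```python
def waves_per_simd(gprs_per_thread,
                   gpr_file=16384,  # registers per SIMD -- assumed, Cypress-class
                   wave_size=64,    # threads per wavefront on AMD hardware
                   hw_cap=32):      # scheduler limit -- assumed
    """Rough occupancy model: all resident wavefronts share one register
    file, so doubling per-thread GPR use (e.g. moving from uint2 to uint4
    state) roughly halves how many wavefronts a SIMD can keep in flight
    to hide latency."""
    by_gprs = gpr_file // (wave_size * gprs_per_thread)
    return min(hw_cap, by_gprs)

# An MD5-style kernel at ~15 GPRs keeps many waves resident; a
# SHA512-style kernel at ~60 GPRs can schedule only a handful.
print(waves_per_simd(15), waves_per_simd(60))
```

The model ignores LDS and workgroup-size limits, but it shows the shape of the argument: GPR pressure, not raw ALU throughput, is what punishes wide vectors.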
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
For those who got less hash with 12.1 with cgminer

Try worksize 64 with vectors 4
That's interesting, because each GPU reports its "preferred vector size", and often it comes out as 4; yet despite that, virtually all GPUs had much better performance with 2 vectors (on the older SDK). Perhaps they live up to their promise now?
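The practical takeaway from that observation is to benchmark each vector width rather than trusting the device's preferred value. A minimal, self-contained sketch of that autotune loop follows; `toy_kernel` is a hypothetical stand-in for rebuilding the kernel at each width (what cgminer's --vectors flag selects), so the timings are illustrative only:

```python
import time

def toy_kernel(data, vec):
    """Hypothetical stand-in for a kernel compiled at vector width `vec`:
    processes the input in vec-wide chunks. Every width must produce the
    same result; only the speed may differ."""
    acc = 0
    for i in range(0, len(data), vec):
        for x in data[i:i + vec]:
            acc = (acc + x * x) & 0xFFFFFFFF
    return acc

def autotune(data, widths=(1, 2, 4)):
    """Time each candidate width and keep the fastest, sanity-checking
    that all widths agree on the output."""
    results, timings = {}, {}
    for w in widths:
        t0 = time.perf_counter()
        results[w] = toy_kernel(data, w)
        timings[w] = time.perf_counter() - t0
    assert len(set(results.values())) == 1, "widths disagree -- kernel bug"
    return min(timings, key=timings.get)
```

A real miner would run each candidate kernel for a few seconds of actual hashing before committing, since single-shot timings are noisy.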
member
Activity: 77
Merit: 10
For those who got less hash with 12.1 with cgminer

Try worksize 64 with vectors 4
hero member
Activity: 772
Merit: 500
Well, I think the 100% CPU usage is not a fault in AMD's drivers; it comes from how I process the OpenCL buffer, which holds nonces.
To speed up kernel execution I removed control flow (if-statements) from the kernel, but now I check the whole buffer for valid nonces even if there are none... this leads (my guess) to an endless processing loop in Phoenix, which is the drawback of the kernel changes.
Just as a data point, cgminer does not do this. It does not check the whole buffer.

Hi Con,

That is correct, but cgminer uses phatk2, which currently doesn't seem to work that well with SDK / runtime 2.6.
So it's again trial and error to get the best tradeoff.

Dia

Edit: I found out that a worksize of 64 now works great with Phoenix; this was much slower before 2.6.
hero member
Activity: 774
Merit: 500
Lazy Lurker Reads Alot
Try different settings, because when I use -v -w256 I get a lot less performance.
Setting it to -v2 -w 128 should fit a 5850 better.
donator
Activity: 1731
Merit: 1008
I get 280 instead of 334 MH/s with 12.1 (5850 at 825 MHz core / 300 MHz RAM, poclbm -v -w256).

That is not some small drop... this is 20%. I was expecting ~5%, and that it would be worth it for the CPU bug fix.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Well, I think the 100% CPU usage is not a fault in AMD's drivers; it comes from how I process the OpenCL buffer, which holds nonces.
To speed up kernel execution I removed control flow (if-statements) from the kernel, but now I check the whole buffer for valid nonces even if there are none... this leads (my guess) to an endless processing loop in Phoenix, which is the drawback of the kernel changes.
Just as a data point, cgminer does not do this. It does not check the whole buffer.
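The two host-side strategies being contrasted here can be sketched as follows. This is an illustrative simplification, not actual Phoenix or cgminer code; the buffer layout and function names are assumptions:

```python
def scan_full_buffer(buf):
    """Phoenix-style, per Dia's description: the kernel writes candidate
    nonces at arbitrary slots, so the host must walk every slot on every
    pass, even when nothing was found -- busy work that can peg a core."""
    return [n for n in buf if n != 0]

def scan_counted_buffer(buf):
    """cgminer-style, per the remark above: slot 0 holds a count
    maintained by the kernel, so the host reads only the slots in use."""
    count = buf[0]
    return list(buf[1:1 + count])
```

With a 256-slot buffer that is almost always empty, the first approach touches 256 slots per pass while the second touches one, which is why moving the branch out of the kernel shifts the cost onto the CPU.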
legendary
Activity: 1428
Merit: 1001
Okey Dokey Lokey
AKA your new kernel mixed with these drivers is totally fucking pointless for miners

... so just don't use it.

Dia
+1
hero member
Activity: 772
Merit: 500
AKA your new kernel mixed with these drivers is totally fucking pointless for miners

... so just don't use it.

Dia
hero member
Activity: 772
Merit: 500
You could try my kernel with SDK 2.6! On my 6950 I get a 30 MH/s increase in Phoenix 1.7 over the current cgminer 2.0.8. I guess some of Phateus' optimisations don't work anymore, because AMD updated the OpenCL compiler to generate better-performing code.

Dia

*COUGH* Perhaps if someone posted "my kernel", we could learn what the fuck "my kernel" is.

WELL! Nice work, however... simply switching to your kernel gives me back the MHash I "lost", but at the same time gives me back the CPU load bug...
Not pleasant... Also noticed using GUIMiner w/Phoenix: -k phatk AGGRESSION=12 FASTLOOP=false VECTORS2 WORKSIZE=128

Since the CPU pegs at 100%, when I attempt to run a 2nd GPU on the same pegged core, I lose 20 MHash on both cards. This does not happen with the regular kernel.
So if I want to run two of my GPUs using this nice -k phatk AGGRESSION=12 FASTLOOP=false VECTORS2 WORKSIZE=128, then I need to peg TWO cores at 100%. HOTHOTHOT, NO THANKS.

Just thought, "when was the last time I installed an SDK? Don't they come with the drivers?" *clickclickclick* Oh, that might be my problem... *clickclick* installing APP SDK 2.6

Just installed the 2.6 SDK, no difference.
GUIMiner + Phoenix + phatk = CPU pegged at 100%, but the MHash "loss" is gone, so this is like rolling back.
GUIMiner + poclbm = 10 MHash loss per card, but no 100% CPU bug.

Sorry for not posting a direct link - https://bitcointalksearch.org/topic/further-improved-phatkdia-kernel-for-phoenix-sdk-26-2012-01-13-25860
Well, I think the 100% CPU usage is not a fault in AMD's drivers; it comes from how I process the OpenCL buffer, which holds nonces.
To speed up kernel execution I removed control flow (if-statements) from the kernel, but now I check the whole buffer for valid nonces even if there are none... this leads (my guess) to an endless processing loop in Phoenix, which is the drawback of the kernel changes.

Dia
hero member
Activity: 774
Merit: 500
Lazy Lurker Reads Alot
Well, lol, I see no CPU bug anymore with cgminer, and I see hardly any loss of performance. I start cgminer simply on my 5870 and 5970 machines like:

cgminer -v 2 -w 128 -I d --temp-cutoff 95 --auto-gpu --queue 1 --gpu-powertune 20 --gpu-engine 850-920 --gpu-memclock 300 -o http://192.168.178.5:9332 -u user -p pass

And it runs like a charm. When I set it to -I 7 it really keeps a steady average, but I need the machine for things other than mining as well ;)
Setting intensity to higher levels has a negative effect, and I do not see any difference using them.
With -I d it runs at 420 MH/s; with -I 6, 7 or 8 it's all the same, just when I use other programs I get some loss of performance, as expected with -I d.

Edit: After some testing they go off again... until BTC reaches $30 again.
legendary
Activity: 1428
Merit: 1001
Okey Dokey Lokey
You could try my kernel with SDK 2.6! On my 6950 I get a 30 MH/s increase in Phoenix 1.7 over the current cgminer 2.0.8. I guess some of Phateus' optimisations don't work anymore, because AMD updated the OpenCL compiler to generate better-performing code.

Dia

*COUGH* Perhaps if someone posted "my kernel", we could learn what the fuck "my kernel" is.

WELL! Nice work, however... simply switching to your kernel gives me back the MHash I "lost", but at the same time gives me back the CPU load bug...
Not pleasant... Also noticed using GUIMiner w/Phoenix: -k phatk AGGRESSION=12 FASTLOOP=false VECTORS2 WORKSIZE=128

Since the CPU pegs at 100%, when I attempt to run a 2nd GPU on the same pegged core, I lose 20 MHash on both cards. This does not happen with the regular kernel.
So if I want to run two of my GPUs using this nice -k phatk AGGRESSION=12 FASTLOOP=false VECTORS2 WORKSIZE=128, then I need to peg TWO cores at 100%. HOTHOTHOT, NO THANKS.

Just thought, "when was the last time I installed an SDK? Don't they come with the drivers?" *clickclickclick* Oh, that might be my problem... *clickclick* installing APP SDK 2.6

Just installed the 2.6 SDK, no difference.
GUIMiner + Phoenix + phatk = CPU pegged at 100%, but the MHash "loss" is gone, so this is like rolling back.
GUIMiner + poclbm = 10 MHash loss per card, but no 100% CPU bug.