*Catalyst 12.1 Preview* Decreased performance, anyone else confirm? - page 2.

conspirosphere.tk

legendary

Activity: 2352

Merit: 1064

Bitcoin is antisemitic

I confirm a 30+ MH/s drop on both a 5870 and a 5830 with both CataCLYSM 11.12 and 12.1 compared with 11.6.
What is MUCH WORSE is that you cannot effectively revert: uninstalling and driver sweeping before rebooting and reinstalling 11.6 will NOT return your previous performance.
Luckily, I had a fresh system backup made just yesterday that I could restore, no problem.
Now I am back to mining happily with my 5870@970/300 hashing at 440+ MH/s and a 5830@940/300 hashing at 300+ MH/s with Phoenix 1.7 on XP.

Fiyasko

legendary

Activity: 1428

Merit: 1001

Okey Dokey Lokey

Quote from: Fiyasko on December 29, 2011, 03:17:42 PM

Quote from: HolodeckJizzmopper on December 28, 2011, 11:31:35 PM

Confirming a 20% drop in hashing performance across a variety of cards using 11.12.

Fuck.

Time to roll back to older drivers.

I hate ATI so damned much sometimes.

Edit: Using phoenix/phatk

10% drop across all my cards. But no more CPU bug. Can't tell which is worse.

So can anyone bring their hash rate back to normal with Cat 12.1? I love its gaming performance, but I'm losing 10% of my speed so that I can save 17% of my CPU usage... I'd rather not burn out one of my cores; on top of that, it's a waste of power.

Fiyasko

legendary

Activity: 1428

Merit: 1001

Okey Dokey Lokey

Quote from: HolodeckJizzmopper on December 28, 2011, 11:31:35 PM

Confirming a 20% drop in hashing performance across a variety of cards using 11.12.

Fuck.

Time to roll back to older drivers.

I hate ATI so damned much sometimes.

Edit: Using phoenix/phatk

10% drop across all my cards. But no more CPU bug. Can't tell which is worse.

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: HolodeckJizzmopper on December 28, 2011, 11:31:35 PM

Confirming a 20% drop in hashing performance across a variety of cards using 11.12.

Fuck.

Time to roll back to older drivers.

I hate ATI so damned much sometimes.

Edit: Using phoenix/phatk

Could you try this one (https://bitcointalksearch.org/topic/further-improved-phatkdia-kernel-for-phoenix-sdk-26-2012-01-13-25860) with Phoenix 1.7.1 and report your results?

Dia

HolodeckJizzmopper

member

Activity: 106

Merit: 10

Confirming a 20% drop in hashing performance across a variety of cards using 11.12.

Fuck.

Time to roll back to older drivers.

I hate ATI so damned much sometimes.

Edit: Using phoenix/phatk

gat3way

sr. member

Activity: 256

Merit: 250

Not quite, Python is not one of my strong suits. I may rewrite my miner though, just as an experiment. Anyway, I have more important projects right now.

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: gat3way on December 22, 2011, 05:58:56 PM

Nope, preferred vector size does not always mean "best performance". Wider vectors mean more GPRs used, and the more GPRs you use, the fewer wavefronts you can schedule on a CU, so occupancy goes down. Also, on VLIW5 hardware, uint4 vectors are not optimal: it can happen that there are not 5 independent instructions to fill the whole VLIW bundle. That depends on your code.

For example, with hash cracking you might end up with uint8 being much better than uint4 for kernels like the MD5 or NTLM ones, as ALUPacking goes up and the number of used GPRs is 10-20 at most. On the other hand, more complex algorithms like SHA512 are much better with uint2 or even a scalar implementation, as the number of used GPRs greatly hampers occupancy. With memory-intensive kernels like the DES ones (thank god bitcoin is not one), occupancy becomes even more important, as more concurrency means memory access latencies are more easily "hidden".

Back to uint2: the problem with it is that it's even worse at utilizing all the slots in the VLIW bundle, and your ALUPacking just always sucks. However, generally speaking, with most bitcoin kernels it's a tradeoff worth making, as bad occupancy in that particular case is worse than bad ALUPacking.

uint3 should provide a better balance between those, but it was broken in pre-2.6 APP SDK releases. Now they fixed it and I am curious about results...

PS: as for why uint4 suddenly started performing better, it could be either that they improved scheduling or that they improved the backend compiler to pack instructions better with uint4 / worse with uint2. It could actually be both.

I implemented uint3 quite a few months ago, but it was bugged (like you said) ... since 2.6 I'm able to compile my kernel via KernelAnalyzer and get no errors (the uint3 kernel is much longer, by the way). But I guess I did something wrong in Phoenix's init.py, which I can't solve by myself ... Phoenix crashes when the kernel is started. Are you skilled enough to take a look at it?

Dia

BOARBEAR

member

Activity: 77

Merit: 10

Oh btw there are two different versions of 12.1 preview

The one I tried is this:
http://developer.amd.com/Downloads/OpenCL1.2-Static-Cplus-preview-drivers-Windows.exe

It has a newer openCL than the other 12.1 preview

gat3way

sr. member

Activity: 256

Merit: 250

Nope, preferred vector size does not always mean "best performance". Wider vectors mean more GPRs used, and the more GPRs you use, the fewer wavefronts you can schedule on a CU, so occupancy goes down. Also, on VLIW5 hardware, uint4 vectors are not optimal: it can happen that there are not 5 independent instructions to fill the whole VLIW bundle. That depends on your code.

For example, with hash cracking you might end up with uint8 being much better than uint4 for kernels like the MD5 or NTLM ones, as ALUPacking goes up and the number of used GPRs is 10-20 at most. On the other hand, more complex algorithms like SHA512 are much better with uint2 or even a scalar implementation, as the number of used GPRs greatly hampers occupancy. With memory-intensive kernels like the DES ones (thank god bitcoin is not one), occupancy becomes even more important, as more concurrency means memory access latencies are more easily "hidden".

Back to uint2: the problem with it is that it's even worse at utilizing all the slots in the VLIW bundle, and your ALUPacking just always sucks. However, generally speaking, with most bitcoin kernels it's a tradeoff worth making, as bad occupancy in that particular case is worse than bad ALUPacking.

uint3 should provide a better balance between those, but it was broken in pre-2.6 APP SDK releases. Now they fixed it and I am curious about results...

PS: as for why uint4 suddenly started performing better, it could be either that they improved scheduling or that they improved the backend compiler to pack instructions better with uint4 / worse with uint2. It could actually be both.
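
To put rough numbers on the GPR/occupancy tradeoff described above, here is a minimal sketch. The register budget of 256 GPRs per work-item lane, the cap of 32 resident wavefronts, and the per-kernel GPR counts are all assumptions picked for illustration; none of them come from this thread, and real figures depend on the chip and on what the compiler emits.

```python
# Rough occupancy estimate: how many wavefronts can be resident at once
# as a function of the registers each work-item needs.
# ASSUMPTION: 256 GPRs per lane and at most 32 resident wavefronts.
# Both constants are illustrative, not measured values.

GPRS_PER_LANE = 256    # assumed register budget per work-item lane
MAX_WAVEFRONTS = 32    # assumed hardware cap on resident wavefronts


def wavefronts_per_cu(gprs_per_work_item: int) -> int:
    """Return how many wavefronts fit, given per-work-item GPR use."""
    if gprs_per_work_item <= 0:
        raise ValueError("GPR count must be positive")
    return min(MAX_WAVEFRONTS, GPRS_PER_LANE // gprs_per_work_item)


if __name__ == "__main__":
    # Hypothetical GPR counts for one kernel compiled at different widths.
    for label, gprs in [("scalar", 12), ("uint2", 22), ("uint4", 42), ("uint8", 84)]:
        print(f"{label:>6}: {gprs:3d} GPRs -> {wavefronts_per_cu(gprs)} wavefronts")
```

Under these assumptions, doubling the vector width roughly halves the number of wavefronts available to hide latency, which is the effect that wider vectors have to win back through better ALUPacking.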

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Quote from: BOARBEAR on December 22, 2011, 10:43:09 AM

For those who got less hash with 12.1 with cgminer

Try worksize 64 with vectors 4

That's interesting because each GPU will report what its "preferred vector size" is, and often it comes out to 4, yet despite that, virtually all GPUs had much better performance (with the older SDK) with 2 vectors. Perhaps they live up to their promise now?

BOARBEAR

member

Activity: 77

Merit: 10

For those who got less hash with 12.1 with cgminer

Try worksize 64 with vectors 4
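
Since the best vectors/worksize pair seems to shift between SDK versions, one way to test the suggestion above is to generate every combination and try them one by one. A small sketch, assuming cgminer's `-v` (vectors) and `-w` (worksize) options as used elsewhere in this thread; the helper name and the candidate values are made up for illustration, and pool options (`-o`, `-u`, `-p`) are left out and would need to be added to `base`.

```python
from itertools import product


def cgminer_sweep(vectors=(1, 2, 4), worksizes=(64, 128, 256), base=("cgminer",)):
    """Yield one argument list per (vectors, worksize) combination.

    Hypothetical helper: it only builds the command lines, it does not
    launch cgminer or measure anything.
    """
    for v, w in product(vectors, worksizes):
        yield [*base, "-v", str(v), "-w", str(w)]


if __name__ == "__main__":
    for args in cgminer_sweep():
        print(" ".join(args))
```

Each printed line can be run by hand for a few minutes and the reported hash rate noted down before moving on to the next one.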

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: -ck on December 21, 2011, 11:18:51 PM

Quote from: Diapolo on December 21, 2011, 03:43:09 PM

Well, I think the 100% CPU usage is not a fault in AMD's drivers but comes from how I process the OpenCL buffer that holds the nonces.
To speed up kernel execution I removed control flow (if-statements) from the kernel, but now I check the whole buffer for valid nonces, even if there are none ... this leads (my guess) to an endless processing loop in Phoenix, which is the drawback of the kernel changes.

Just as a data point, cgminer does not do this. It does not check the whole buffer.

Hi Con,

That is correct, but CGMINER uses Phatk2, which currently doesn't seem to work that well with SDK / runtime 2.6.
So it's trial and error again to get the best tradeoff.

Dia

Edit: I found out that a Worksize of 64 with Phoenix works great now, this was much slower before 2.6.

bronan

hero member

Activity: 774

Merit: 500

Lazy Lurker Reads Alot

Try different settings, because when I use -v -w256 I get a lot less performance.
Setting it to -v2 -w 128 should fit a 5850 better.

Transisto

donator

Activity: 1731

Merit: 1008

I get 280 instead of 334 MH/s with 12.1 (5850 at 825 MHz, 300 MHz RAM, poclbm -v -w256)

That is not some small drop ... this is 20%, I was expecting ~5% and that it was worth it for the CPU bug.
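
For reference, a quick sketch of the arithmetic behind a figure like this. The function name is made up; the two hash rates are the ones quoted in the post. Going by those numbers alone, the drop comes out nearer 16% than 20%.

```python
def percent_drop(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, as a percentage of `before`."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before * 100.0


if __name__ == "__main__":
    # 334 MH/s before, 280 MH/s after, as reported above.
    print(round(percent_drop(334, 280), 1))  # prints 16.2
```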

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Quote from: Diapolo on December 21, 2011, 03:43:09 PM

Well, I think the 100% CPU usage is not a fault in AMD's drivers but comes from how I process the OpenCL buffer that holds the nonces.
To speed up kernel execution I removed control flow (if-statements) from the kernel, but now I check the whole buffer for valid nonces, even if there are none ... this leads (my guess) to an endless processing loop in Phoenix, which is the drawback of the kernel changes.

Just as a data point, cgminer does not do this. It does not check the whole buffer.

Fiyasko

legendary

Activity: 1428

Merit: 1001

Okey Dokey Lokey

Quote from: Diapolo on December 21, 2011, 05:32:21 PM

Quote from: ?? on ??

AKA your new kernal mixed with these drivers is totally fucking pointless for miners

... so just don't use it.

Dia

+1

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: ?? on ??

AKA your new kernal mixed with these drivers is totally fucking pointless for miners

... so just don't use it.

Dia

Diapolo

hero member

Activity: 772

Merit: 500

Quote from: Fiyasko on December 21, 2011, 01:20:33 PM

Quote from: Diapolo on December 21, 2011, 09:55:17 AM

You could try my kernel with SDK 2.6! On my 6950 I get a 30 MH/s increase in Phoenix 1.7 over current CGMINER 2.0.8. I guess some of Phateus' optimisations don't work anymore, because AMD updated the OpenCL compiler to generate better performing code.

Dia

*COUGH* Perhaps if someone posted "my kernel", we could learn what the fuck "my kernel" is.

WELL! Nice work. However... simply switching to your kernel gives me back the Mhash I "lost", but at the same time gives me back the CPU load bug....
Not pleasant... Also noticed using GUIminer w/Phoenix -k phatk AGGRESSION=12 FASTLOOP=false VECTORS2 WORKSIZE=128

Since the CPU pegs at 100%, when I attempt to run a 2nd GPU on the same 100% pegged core, I lose 20 Mhash on both cards. This does not happen with the regular kernel.
So if I want to run two of my GPUs using this nice -k phatk AGGRESSION=12 FASTLOOP=false VECTORS2 WORKSIZE=128, then I need to PEG two cores at 100%. HOTHOTHOT, NO THANKS

Just thought "when was the last time I installed an SDK? Don't they come with the drivers?" *clickclickclick* Oh, that might be my problem.... *clickclick* installing APP SDK 2.6

Just installed the 2.6 SDK, no difference.
GUIminer+Phoenix+Phatk = CPU pegged at 100%, but the Mhash "loss" is gone, so this is like rolling back
GUIminer+Poclbm = 10 Mhash loss per card, but no 100% CPU bug

Sorry for not posting a direct link - https://bitcointalksearch.org/topic/further-improved-phatkdia-kernel-for-phoenix-sdk-26-2012-01-13-25860
Well, I think the 100% CPU usage is not a fault in AMD's drivers but comes from how I process the OpenCL buffer that holds the nonces.
To speed up kernel execution I removed control flow (if-statements) from the kernel, but now I check the whole buffer for valid nonces, even if there are none ... this leads (my guess) to an endless processing loop in Phoenix, which is the drawback of the kernel changes.

Dia
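
A rough sketch of the two host-side strategies being contrasted in this part of the thread: walking the whole output buffer after every kernel run versus looking at a single "found" slot first. This is not taken from the Phoenix or cgminer sources. The buffer layout, the function names, and the convention that 0 means "empty" are all assumptions for illustration.

```python
# Two ways the host could read back a kernel's nonce output buffer.
# ASSUMPTIONS (hypothetical, not from any real miner):
#   - the buffer is a flat list of unsigned ints, 0 meaning "empty slot"
#   - in the flagged layout, the last slot is non-zero when anything was written


def scan_whole_buffer(buf):
    """Check every slot on every kernel run, even when nothing was found."""
    return [nonce for nonce in buf if nonce != 0]


def scan_with_flag(buf):
    """Look at the flag slot first and walk the buffer only if it is set."""
    found_flag = buf[-1]
    if not found_flag:
        return []  # common case: one comparison, no loop over the buffer
    return [nonce for nonce in buf[:-1] if nonce != 0]


if __name__ == "__main__":
    print(scan_whole_buffer([0, 5, 0, 7]))
    print(scan_with_flag([0, 5, 0, 7, 1]))
    print(scan_with_flag([0, 0, 0, 0, 0]))
```

Since valid nonces are rare, almost every kernel run takes the empty path. The whole-buffer scan pays its full cost on each of those runs, which would fit with the CPU load described above.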

bronan

hero member

Activity: 774

Merit: 500

Lazy Lurker Reads Alot

Well lol, I see no CPU bug anymore with cgminer and I see hardly any loss of performance, but I simply start cgminer on my 5870 and 5970 computers like this:

cgminer -v 2 -w 128 -I D --temp-cutoff 95 --auto-gpu --queue 1 --gpu-powertune 20 --gpu-engine 850-920 --gpu-memclock 300 -o http://192.168.178.5:9332 -u user -p pass

And it runs like a charm. When I set it to I 7 it really keeps a steady average, but I need the machine for other things than mining as well ;)

Setting intensity to higher levels has a negative effect, and I do not see any difference using them.
With -I D it runs at 420 MH/s; with I 6, I 7 or I 8 it's all the same, only when I use other programs I get some loss of performance, as expected with -I D.

Edit: After some testing they go off again ... until BTC reaches $30 again

Fiyasko

legendary

Activity: 1428

Merit: 1001

Okey Dokey Lokey

Quote from: Diapolo on December 21, 2011, 09:55:17 AM

You could try my kernel with SDK 2.6! On my 6950 I get a 30 MH/s increase in Phoenix 1.7 over current CGMINER 2.0.8. I guess some of Phateus' optimisations don't work anymore, because AMD updated the OpenCL compiler to generate better performing code.

Dia

*COUGH* Perhaps if someone posted "my kernel", we could learn what the fuck "my kernel" is.

WELL! Nice work. However... simply switching to your kernel gives me back the Mhash I "lost", but at the same time gives me back the CPU load bug....
Not pleasant... Also noticed using GUIminer w/Phoenix -k phatk AGGRESSION=12 FASTLOOP=false VECTORS2 WORKSIZE=128

Since the CPU pegs at 100%, when I attempt to run a 2nd GPU on the same 100% pegged core, I lose 20 Mhash on both cards. This does not happen with the regular kernel.
So if I want to run two of my GPUs using this nice -k phatk AGGRESSION=12 FASTLOOP=false VECTORS2 WORKSIZE=128, then I need to PEG two cores at 100%. HOTHOTHOT, NO THANKS

Just thought "when was the last time I installed an SDK? Don't they come with the drivers?" *clickclickclick* Oh, that might be my problem.... *clickclick* installing APP SDK 2.6

Just installed the 2.6 SDK, no difference.
GUIminer+Phoenix+Phatk = CPU pegged at 100%, but the Mhash "loss" is gone, so this is like rolling back
GUIminer+Poclbm = 10 Mhash loss per card, but no 100% CPU bug

Topic: *Catalyst 12.1 Preview* Decreased performance, anyone else confirm? - page 2. (Read 20782 times)
