Topic: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 (Read 51267 times)

hero member
Activity: 772
Merit: 500
Just a hint, this kernel is for SDK 2.6+, it doesn't work well with earlier versions / runtimes!

Dia
legendary
Activity: 1512
Merit: 1036

It's because of the aggression at 16. When I put it on 15 it shows again, but the miner is idle a lot according to the log and the khash/s are at 0; however, it's still mining and getting hashes accepted.
With 14 it's working fine, but with slightly lower Mhash/s than with the older kernel at aggression 17 (417 compared to 418 Mhash/s).
This is because above 12 or 13 is insane aggression - the miner hogs the GPU so much that the Windows GUI can't even draw stuff like the status line on the screen.
It was fine with the older kernel.
Does that mean anything?
I think this kernel is slightly faster than the older one.
It is possible that the previous kernel and miner parameters didn't work the GPU so hard; as kernels are optimized, they use more of the GPU resources available, approaching 100%, leaving less chance that OS draw instructions will make it through in a timely fashion.
legendary
Activity: 1512
Merit: 1036

It's because of the aggression at 16. When I put it on 15 it shows again, but the miner is idle a lot according to the log and the khash/s are at 0; however, it's still mining and getting hashes accepted.
With 14 it's working fine, but with slightly lower Mhash/s than with the older kernel at aggression 17 (417 compared to 418 Mhash/s).
This is because above 12 or 13 is insane aggression - the miner hogs the GPU so much that the Windows GUI can't even draw stuff like the status line on the screen.
hero member
Activity: 772
Merit: 500
The part with the stats you mention is blinking for me too, this seems to happen with high aggression and when it updates to new values.
You should try the latest version with the latest Phoenix 1.7.3 and it would be great to get some speed reports in here.

I have to say I'm a bit disappointed, at least with the feedback to this release, not to mention that simply nothing is coming in ... even if this version is NOT faster for some, it took many hours to do and it's not satisfying that way. For me the current version IS faster than phatk2 with a 6550D and the difference is huge; I don't understand why this seems not to be the case for any other user here. Guys, please use 12.1a with Phoenix 1.7.3 and the settings mentioned on page 1 of this thread. If you report that it's not faster, please include some system info like SDK, OS, driver, card, Phoenix version and the command switches used, thank you!

Edit: You can also post the contents of the Phoenix window here.
Code:
[14/01/2012 14:43:18] using PyOpenCL version 0.92
[14/01/2012 14:43:18] checked nonces per kernel execution: 67108864
[14/01/2012 14:43:18] using VECTORS2, resulting global worksize is: 33554432
[14/01/2012 14:43:18] using local worksize of 128 (HW max. is 256)
[14/01/2012 14:43:18] cl_amd_media_ops ext. found - BFI_INT enabled

[14/01/2012 14:43:19] Finding inner ELF...
[14/01/2012 14:43:19] Patching inner ELF...
[14/01/2012 14:43:19] Patching instructions...
[14/01/2012 14:43:19] BFI-patched 472 instructions...
[14/01/2012 14:43:19] Patch complete, returning to kernel...
[14/01/2012 14:43:19] Applied BFI_INT patch
[14/01/2012 14:43:19] Phoenix v1.7.3 starting...
[14/01/2012 14:43:19] Connected to server
[14/01/2012 14:43:19] Server gave new work; passing to WorkQueue
[14/01/2012 14:43:19] New block (WorkQueue)
[14/01/2012 14:43:21] Server gave new work; passing to WorkQueue
[66.19 Mhash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]

Dia :-/
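
For anyone trying to read the numbers in that log, here is a rough Python sketch of how they seem to fit together. It assumes that each work-item checks as many nonces as the vector width (which matches the two figures in the log) and that the rate on the status line is steady, which it may not be right after startup. The variable names are made up for illustration and are not Phoenix internals.

```python
# Figures copied from the Phoenix log above.
nonces_per_kernel = 67108864   # "checked nonces per kernel execution"
vector_width = 2               # VECTORS2
local_worksize = 128           # "using local worksize of 128"
reported_rate = 66.19e6        # hashes per second, from the status line

# Assumption: one work-item handles `vector_width` nonces at once.
global_worksize = nonces_per_kernel // vector_width

# Number of work-groups the GPU has to schedule per kernel execution.
work_groups = global_worksize // local_worksize

# Rough time the GPU is busy with a single kernel execution.
seconds_per_kernel = nonces_per_kernel / reported_rate

print("global worksize:", global_worksize)
print("work-groups:", work_groups)
print("seconds per kernel execution: %.2f" % seconds_per_kernel)
```

If these assumptions hold, a single kernel execution on this card keeps the GPU busy for about one second, which would fit the desktop lag and the flickering status line people describe at high aggression.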
hero member
Activity: 772
Merit: 500
Uploaded a fixed version, which corrects an error with FASTLOOP=True:
Download version 2012-01-13: http://www.mediafire.com/?xzk6b1yvb24r4dg

There are no other changes in this version!

Dia
I no longer see the Mhash/s. Also, I don't see the shares and other statistics (basically the entire bottom line).

I do see the log which says at which time a share was accepted and that says other things.

That's not a helpful bug report ... sorry. What OS? Can you paste the output of the Phoenix window?
What's your command line? Was this introduced with the FASTLOOP fix or before?

Dia
hero member
Activity: 772
Merit: 500
Uploaded a fixed version, which corrects an error with FASTLOOP=True:
Download version 2012-01-13: http://www.mediafire.com/?xzk6b1yvb24r4dg

There are no other changes in this version!

Dia
hero member
Activity: 772
Merit: 500
A new version is ready for your testing pleasure:
Download version 2012-01-13: http://www.mediafire.com/download.php?2sqoj8obvp1q23p

Do you provide a git repository of this fork?

Sorry, but no ... I use Beyond Compare 3 for editing / comparing the files and manage the rest in my brain ;-). Guess that's kind of old school, but it works.
To be honest, I don't even know how to use a git repository.

Dia
newbie
Activity: 3
Merit: 0
A new version is ready for your testing pleasure:
Download version 2012-01-13: http://www.mediafire.com/download.php?2sqoj8obvp1q23p

Do you provide a git repository of this fork?
hero member
Activity: 772
Merit: 500
A new version is ready for your testing pleasure:
Download version 2012-01-13: http://www.mediafire.com/download.php?2sqoj8obvp1q23p

highlights:
- the child has its name: I call it phatk_dia. It would be nice if you guys used this name in discussions to make clear which kernel you mean ;-)
- faster on VLIW5 GPUs with VECTORS2 and VECTORS4
- more efficient on VLIW4 GPUs with VECTORS2 and a little faster with VECTORS4
- FASTLOOP defaults to false, so you don't need to supply FASTLOOP=false
- added an extended check for supplied WORKSIZE parameter
- removed a pyOpenCL finish() to reduce API overhead (could cause problems, but works here -> consider this beta till it proves stable)

Please report and give me all your coins :-D!

Edit: Please don't complain if this doesn't work well with non-2.6 SDK / runtime versions, because this IS for 2.6 or later!

Dia
hero member
Activity: 772
Merit: 500
Any news about 7970 support? That card is quite good at 666 Mhash/s and $550 right now.

cheers !

Currently I have none, and the AMD KernelAnalyzer doesn't seem to support the GCN architecture yet, so it's hard to do any optimizations for it. But I would be interested in results with a 7970 and my kernel (a new kernel is on its way to release). AMD says that massive vectorisation should not be needed for optimal performance with GCN, so perhaps it would run well without the use of a VECTORSn parameter.

Dia
newbie
Activity: 41
Merit: 0
Any news about 7970 support? That card is quite good at 666 Mhash/s and $550 right now.

cheers !
legendary
Activity: 916
Merit: 1003
Hmm I made some calculations and I saw that the Mhash/s is incorrect.
it indicates I get around 418Mh/s
my pool indicates I get 403 Mh/s

7610 9-1-2012 17:18

7879  9-1-2012 18:01

time=2580 sec
shares=269

269 x 2^32 = 1 155 346 202 624 hashes
1 155 346 202 624 / 2580 = 447 808 606 hashes/s
447 808 606 / 1 000 000 = 447.808606 Mh/s

Or am I doing something wrong here? Why are they all reporting different speeds?

I've noticed this myself.  Guiminer reports I'm running a pretty solid 185 Mhash/sec but my speed as reported by deepbit is all over the place.
I've seen it as high as 260 and as low as 50.
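
For what it's worth, the share-count arithmetic above can be sketched in a few lines of Python. This is only a sketch: it assumes difficulty-1 shares (each needing about 2^32 hashes on average) and that shares arrive as a Poisson process, so the relative one-sigma error of an estimate from N shares is roughly 1/sqrt(N). The function names are made up for illustration.

```python
import math

def hashrate_from_shares(shares, seconds, difficulty=1):
    """Estimate hashes per second from the number of accepted shares."""
    # A difficulty-1 share takes about 2**32 hashes on average.
    return shares * difficulty * 2**32 / seconds

def relative_sigma(shares):
    """Approximate one-sigma relative error of a share-based estimate."""
    # Poisson counting noise: sqrt(N) / N = 1 / sqrt(N).
    return 1.0 / math.sqrt(shares)

# Numbers from the post above: 269 shares in 2580 seconds.
rate = hashrate_from_shares(269, 2580)
sigma = relative_sigma(269)
print("estimate: %.1f Mh/s" % (rate / 1e6))
print("one sigma: about %.1f%% (%.1f Mh/s)" % (sigma * 100, rate * sigma / 1e6))
```

With only 269 shares the one-sigma spread is around 6%, so a miner reading of 418 Mh/s and a share-based estimate of about 448 Mh/s are not obviously in conflict. A pool that estimates over a short window counts even fewer shares, which would explain readings that jump around a lot.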
hero member
Activity: 772
Merit: 500
Thanks for the info.

The higher the AGGRESSION, the more desktop lag you observe, that's normal.

Dia
No problem, so aggression doesn't affect the speed of shares?

Btw, I'm having some connection issues every now and then due to my ISP, and sometimes I get a warning that my work queue is empty for a few seconds, although this only happens once an hour or so.
Could I prevent this from happening by using -q 2 or -q 5, or is this not a good idea?

A higher aggression can lead to higher MH/s, at the cost of more desktop lag.
For your ISP stuff, yes you could try -q 2 or specify a backup pool to Phoenix via -b, which needs the same format as -u.

Dia
hero member
Activity: 772
Merit: 500
Thanks for the info.

The higher the AGGRESSION, the more desktop lag you observe, that's normal.

Dia
hero member
Activity: 772
Merit: 500
Hi, I'm new to bitcoin mining and I'm not sure if I'm posting this in the correct forum (I can't post in the Phoenix thread). I've used guiminer for a few months now and decided to give Phoenix a try.
And it's great, Phoenix is slightly faster and has a stable hashrate.
However, I noticed that phatk2 is much slower (50-60 Mhash/s) than phatk, and I believe this optimized kernel is supposed to replace the original phatk2 kernel, because when I overwrite the original phatk kernel with this one my hash rate is almost the same as with phatk2.

Am I using incorrect settings? I have an unlocked 6950 card with 910 mhz core and 1440 mhz memory.

The settings I use are: -k phatk DEVICE=0 VECTORS BFI_INT AGGRESSION=11 worksize=128  FASTLOOP=false

or if I want to run on phatk2:  -k phatk2 DEVICE=0 VECTORS BFI_INT AGGRESSION=11 worksize=128  FASTLOOP=false

I will only comment on my kernel here, which has to be used with:
Code:
-k phatk DEVICE=0 VECTORS2 AGGRESSION=11 WORKSIZE=128 FASTLOOP=false

The normal Phoenix kernel, which is in the default Phoenix download package, doesn't have VECTORS2 and needs the BFI_INT switch supplied (my kernel enables this by itself if the cl_amd_media_ops extension is available).

Dia
Thanks, I replaced the normal phatk kernel with yours; the performance is now the same in Mh/s, although shares seem to come slightly quicker.
Hmm with a worksize of 64 I get slightly better performance(0.5 Mh/s more).
I'm now at ~408 MH/s drops to 407.91 sometimes, and with aggression on 12 I get 408+ Mh/s
Fine tuned aggression to 16 and I get 409+ Mh/s nearly 410.

What's your setup? Driver, OS, card?
hero member
Activity: 772
Merit: 500
Hi, I'm new to bitcoin mining and I'm not sure if I'm posting this in the correct forum (I can't post in the Phoenix thread). I've used guiminer for a few months now and decided to give Phoenix a try.
And it's great, Phoenix is slightly faster and has a stable hashrate.
However, I noticed that phatk2 is much slower (50-60 Mhash/s) than phatk, and I believe this optimized kernel is supposed to replace the original phatk2 kernel, because when I overwrite the original phatk kernel with this one my hash rate is almost the same as with phatk2.

Am I using incorrect settings? I have an unlocked 6950 card with 910 mhz core and 1440 mhz memory.

The settings I use are: -k phatk DEVICE=0 VECTORS BFI_INT AGGRESSION=11 worksize=128  FASTLOOP=false

or if I want to run on phatk2:  -k phatk2 DEVICE=0 VECTORS BFI_INT AGGRESSION=11 worksize=128  FASTLOOP=false

I will only comment on my kernel here, which has to be used with:
Code:
-k phatk DEVICE=0 VECTORS2 AGGRESSION=11 WORKSIZE=128 FASTLOOP=false

The normal Phoenix kernel, which is in the default Phoenix download package, doesn't have VECTORS2 and needs the BFI_INT switch supplied (my kernel enables this by itself if the cl_amd_media_ops extension is available).

Dia
newbie
Activity: 26
Merit: 0
I also noticed that if the pools used the 7th 32-bit component of the hash ("g") rather than the first 32-bit component ("a") for computing shares, you could stop after the 61st round of computation rather than the 63rd. That would be maybe a 2% efficiency gain for pools that implemented it.
hero member
Activity: 772
Merit: 500
Hi, I noticed a potential improvement

you can replace
Code:
W(121);
sharoundW(121);
W(122);
sharoundW(122);
W(123);
sharoundW(123);

// Round 124
Vals[7] += Vals[3] + P4(124) + P3(124) + P1(124) + P2(124) + s1(124) + ch(124) + H[7];
with
Code:
W(121);
Vals[2] += t1W(121);
W(122);
Vals[1] += t1W(122);
W(123);
Vals[0] += t1W(123);

// Round 124
Vals[7] += Vals[3] + P4(124) + P3(124) + P1(124) + P2(124) + s1(124) + ch(124) + H[7];

Because you don't need Vals[4], Vals[5], and Vals[6] to compute the final Vals[7].

I'll have to look into this, during the first test all performance relevant numbers were identical to my latest kernel. But perhaps reordering of the operations will help.
Thanks for your input :-)!

Dia
newbie
Activity: 26
Merit: 0
Hi, I noticed a potential improvement

you can replace
Code:
W(121);
sharoundW(121);
W(122);
sharoundW(122);
W(123);
sharoundW(123);

// Round 124
Vals[7] += Vals[3] + P4(124) + P3(124) + P1(124) + P2(124) + s1(124) + ch(124) + H[7];
with
Code:
W(121);
Vals[2] += t1W(121);
W(122);
Vals[1] += t1W(122);
W(123);
Vals[0] += t1W(123);

// Round 124
Vals[7] += Vals[3] + P4(124) + P3(124) + P1(124) + P2(124) + s1(124) + ch(124) + H[7];

Because you don't need Vals[4], Vals[5], and Vals[6] to compute the final Vals[7].
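
To check the idea outside the kernel, here is a plain Python sketch of one SHA-256 compression (not the phatk OpenCL code). It assumes that kernel rounds 121 to 124 correspond to rounds 57 to 60 of the second hash, which is my reading of the numbering and not something confirmed in this thread. It shows that the last digest word only needs the "e" side (T1) of rounds 57 to 60 and nothing from rounds 61 to 63. All helper names are made up for illustration.

```python
import hashlib
import struct
from math import isqrt

MASK = 0xFFFFFFFF

def first_primes(n):
    found, c = [], 2
    while len(found) < n:
        if all(c % p for p in found):
            found.append(c)
        c += 1
    return found

def icbrt(n):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Standard constants: fractional bits of cube/square roots of the primes.
PRIMES = first_primes(64)
K = [icbrt(p << 96) & MASK for p in PRIMES]
H0 = [isqrt(p << 64) & MASK for p in PRIMES[:8]]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def schedule(block):
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    return w

def t1(s, k, w):
    e, f, g, h = s[4:]
    big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
    return (h + big_s1 + ((e & f) ^ (~e & g)) + k + w) & MASK

def t2(s):
    a, b, c = s[:3]
    big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
    return (big_s0 + ((a & b) ^ (a & c) ^ (b & c))) & MASK

def sha_round(s, k, w, need_a=True):
    x = t1(s, k, w)
    new_a = (x + t2(s)) & MASK if need_a else 0  # 0 is just a placeholder
    return [new_a, s[0], s[1], s[2], (s[3] + x) & MASK, s[4], s[5], s[6]]

def pad_one_block(msg):
    assert len(msg) <= 55  # must fit into a single 64-byte block
    return msg + b"\x80" + bytes(55 - len(msg)) + struct.pack(">Q", 8 * len(msg))

def digest_words(block):
    """Full 64-round compression, for reference."""
    w, s = schedule(block), list(H0)
    for i in range(64):
        s = sha_round(s, K[i], w[i])
    return [(x + y) & MASK for x, y in zip(H0, s)]

def last_word_shortcut(block):
    """Last digest word: skip T2 in rounds 57-60, skip rounds 61-63."""
    w, s = schedule(block), list(H0)
    for i in range(57):
        s = sha_round(s, K[i], w[i])
    for i in range(57, 61):
        s = sha_round(s, K[i], w[i], need_a=False)
    return (H0[7] + s[4]) & MASK

block = pad_one_block(b"abc")
reference = struct.unpack(">8I", hashlib.sha256(b"abc").digest())
print("full compression matches hashlib:", digest_words(block) == list(reference))
print("shortcut matches last word:", last_word_shortcut(block) == reference[7])
```

The reason it works is that the "e" value produced in round 60 simply shifts through f and g into h over rounds 61 to 63, and the "a" values produced from round 57 onward would only reach the "d" slot after round 60. Whether this saves anything in practice depends on what the OpenCL compiler already eliminates on its own.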
hero member
Activity: 772
Merit: 500
Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9

Should restore some of the speed loss for 58XX owners, who switched to SDK / runtime 2.6 and is the best for 69XX owners, too.

Edit: Guys, try a setting of 64 for the WORKSIZE, it showed good results for me, but still depends on the card!

Dia