
Topic: Modified Kernel for Phoenix 1.5 - page 4. (Read 96713 times)

newbie
Activity: 31
Merit: 0
August 11, 2011, 10:26:33 AM
As of version 2.1, phatk now has command line option "VECTORS4" which can be used instead of "VECTORS".
This option works on 4 nonces per thread instead of 2 and may increase speed mainly if you do not underclock your memory, but feel free to try it out.  Note that if you use this, you will more than likely have to decrease your WORKSIZE to 128 or 64.

I'm using a 6770 @ 1.01 GHz with phatk 2.2.  When I run the memory clock at 300 MHz with the VECTORS option, I get 234.5 Mhps.  However, I can't seem to reap the benefits of VECTORS2 or VECTORS4 at a higher memory clock (i.e. 1.2 GHz).  I've tried reducing the WORKSIZE from 256 to 128 and 64, but with these options I only get between 204 and 213 Mhps.
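
For anyone who wants to try the VECTORS / VECTORS4 and WORKSIZE combinations mentioned above in a systematic way, here is a minimal Python sketch that only generates the kernel option strings to test. The function name `kernel_option_sets` is made up for illustration, and the option spelling is copied from the command lines posted in this thread; it does not launch Phoenix or measure anything.

```python
from itertools import product


def kernel_option_sets(vector_modes=("VECTORS", "VECTORS4"),
                       worksizes=(256, 128, 64),
                       aggression=11):
    """Yield one kernel option string per (vector mode, WORKSIZE) pair.

    The option names are the ones quoted in this thread; the helper
    itself is hypothetical and is not part of Phoenix or phatk.
    """
    for mode, worksize in product(vector_modes, worksizes):
        yield (f"-k phatk BFI_INT {mode} WORKSIZE={worksize} "
               f"AGGRESSION={aggression} FASTLOOP=false")


# Print every combination so each one can be pasted into a test run.
for options in kernel_option_sets():
    print(options)
```

Run each line for a few minutes at the same clocks and note the reported rate, since the results in this thread differ a lot from card to card.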
hero member
Activity: 772
Merit: 500
August 11, 2011, 06:09:00 AM
Just did a test:

Rig setup:
  Linuxcoin v0.2b (Linux version 2.6.38-2-amd64)
  Dual HD5970 (4 GPU cores in the rig)
  Mem clock @ 300Mhz
  Core clock @ 800Mhz
  VCore @ 1.125v
  AMD SDK 2.5
  Phoenix r100
  Phatk v2.2
  -v -k phatk BFI_INT VECTORS WORKSIZE=256 AGGRESSION=11 FASTLOOP=false

Result:
  Overall Rig rate: 1484 MH/s
  Rate per core: 371 MH/s

This is ~4MH/s faster than Diapolo's latest.

On 5970, phatk 2.2 is current king of the hill.

For the world to be perfect, this kernel needs to be integrated into cgminer :)



The latest kernel releases show that finding THE perfect kernel for a specific setup takes a bit of trial and error. Phateus and I use the KernelAnalyzer and our own setups as a first measurement of whether a new kernel got "faster". But many other factors come into play, like OS, driver, SDK, miner software and so on.

I would suggest that we try to base phatk and phatk-Diapolo on the same kernel parameters, so that users are free to choose which kernel to use. For example, the CGMINER kernel uses the switch VECTORS2, where Phoenix used only VECTORS (which I changed to VECTORS2 in my last kernel releases). The variable names inside the kernels don't even have to match (in fact they sometimes differ), as long as the main miner software passes the expected values to the kernel in a defined sequence.

Dia
full member
Activity: 160
Merit: 100
August 11, 2011, 04:49:54 AM
In GUIMiner, I keep getting "invalid buffer, unable to write to file". I wonder why.
legendary
Activity: 1344
Merit: 1004
August 10, 2011, 02:52:14 PM
Why not make two separate kernels then?

VECTORS4 might one day be the better alternative, instead of doing all that work then why not start now and keep pace?



Because I have literally put in over 100 hours on the main kernel and have gotten almost nothing in donations.  I just don't have the time to keep up with two kernels.  If anyone feels like making a VECTORS4 branch, go for it... the source code is in the public domain and you can use it however you'd like.  ;)

Also, from what I've gathered, there may be only 1 or 2 people interested in it... If you can lower your memory speed, I think VECTORS will always be faster than VECTORS4.

Now, I do like hearing feedback from everyone. I am just letting you know that it is not feasible to optimize the kernel for every possible configuration (SDK 2.1, 2.4, slow memory).  Right now, the kernel is optimized for SDK 2.5 and the 68xx and 5xxx cards, and assumes you pick the best memory clock speed for your card (somewhere around 1/3 of your core clock).

-Phateus
the thing is, VECTORS4 worked perfectly for me in version 2.1
in version 2.2 it's broken

As in it doesn't work at all, or that it is much slower?... Just use version 2.1 then

The behavior is as if it's not doing 4 nonces, but only doing 1 (i.e. no VECTORS option specified). My compute speed remained the same regardless of memory speed, which is exactly like your V1 result on the graph on page 1.
full member
Activity: 154
Merit: 100
August 10, 2011, 11:43:21 AM
Hi! Just used v2.2 and it increased my hashrate by 3 Mhash compared to Diapolo's, from 402 to 405. VECTORS4 dropped the hashrate on my 5850 significantly, by 50 Mhash. Great work to you guys and we are very grateful =).

I think the mods should create a Child Board under Mining support and name it "Mods" or Tweaks I guess and put this thread there.
newbie
Activity: 52
Merit: 0
August 10, 2011, 11:21:30 AM
Why not make two separate kernels then?

VECTORS4 might one day be the better alternative, instead of doing all that work then why not start now and keep pace?



Because I have literally put in over 100 hours on the main kernel and have gotten almost nothing in donations.  I just don't have the time to keep up with two kernels.  If anyone feels like making a VECTORS4 branch, go for it... the source code is in the public domain and you can use it however you'd like.  ;)

Also, from what I've gathered, there may be only 1 or 2 people interested in it... If you can lower your memory speed, I think VECTORS will always be faster than VECTORS4.

Now, I do like hearing feedback from everyone. I am just letting you know that it is not feasible to optimize the kernel for every possible configuration (SDK 2.1, 2.4, slow memory).  Right now, the kernel is optimized for SDK 2.5 and the 68xx and 5xxx cards, and assumes you pick the best memory clock speed for your card (somewhere around 1/3 of your core clock).

-Phateus
the thing is, VECTORS4 worked perfectly for me in version 2.1
in version 2.2 it's broken

As in it doesn't work at all, or that it is much slower?... Just use version 2.1 then
member
Activity: 77
Merit: 10
August 10, 2011, 04:40:16 AM
Why not make two separate kernels then?

VECTORS4 might one day be the better alternative, instead of doing all that work then why not start now and keep pace?



Because I have literally put in over 100 hours on the main kernel and have gotten almost nothing in donations.  I just don't have the time to keep up with two kernels.  If anyone feels like making a VECTORS4 branch, go for it... the source code is in the public domain and you can use it however you'd like.  ;)

Also, from what I've gathered, there may be only 1 or 2 people interested in it... If you can lower your memory speed, I think VECTORS will always be faster than VECTORS4.

Now, I do like hearing feedback from everyone. I am just letting you know that it is not feasible to optimize the kernel for every possible configuration (SDK 2.1, 2.4, slow memory).  Right now, the kernel is optimized for SDK 2.5 and the 68xx and 5xxx cards, and assumes you pick the best memory clock speed for your card (somewhere around 1/3 of your core clock).

-Phateus
the thing is, VECTORS4 worked perfectly for me in version 2.1
in version 2.2 it's broken
hero member
Activity: 560
Merit: 517
August 09, 2011, 11:35:50 PM
Updated my poclbm branch to support phatk2.2 through the --phatk2_2 command line option:

https://github.com/progranism/poclbm

full member
Activity: 182
Merit: 100
August 09, 2011, 07:32:41 PM
Catalyst 11.4 / SDK 2.4
Ref 5850 @ 920c/320m

-k phatk VECTORS BFI_INT FASTLOOP=false WORKSIZE=256 AGGRESSION=12

2.1: 399.27 to 399.63 Mh/s
2.2: 399.87 to 400.17 Mh/s



Damn those are some good hashrates for the core.

I think I will set up Catalyst 11.4 as well and test my card's memory at 320. My cores are running between 1050 and 1150 (for the extreme voltmodded version), all HD 5850s as well.

Yea beats me =/  I haven't been able to get my second 5850 (new 230SA Sapphire 5850 Xtreme) to achieve the same results.  In fact, it seems to hate SDK 2.4.
full member
Activity: 226
Merit: 100
August 09, 2011, 06:44:27 PM
Right now, the kernel is optimized for SDK 2.5 and the 68xx and 5xxx cards, and assumes you pick the best memory clock speed for your card (somewhere around 1/3 of your core clock).

I think for the foreseeable future those cards will be doing the lion's share of the work, so I would say you are on the right track.

+1
newbie
Activity: 16
Merit: 0
August 09, 2011, 06:00:04 PM
Right now, the kernel is optimized for SDK 2.5 and the 68xx and 5xxx cards, and assumes you pick the best memory clock speed for your card (somewhere around 1/3 of your core clock).

I think for the foreseeable future those cards will be doing the lion's share of the work, so I would say you are on the right track.
newbie
Activity: 52
Merit: 0
August 09, 2011, 05:43:16 PM
Why not make two separate kernels then?

VECTORS4 might one day be the better alternative, instead of doing all that work then why not start now and keep pace?



Because I have literally put in over 100 hours on the main kernel and have gotten almost nothing in donations.  I just don't have the time to keep up with two kernels.  If anyone feels like making a VECTORS4 branch, go for it... the source code is in the public domain and you can use it however you'd like.  ;)

Also, from what I've gathered, there may be only 1 or 2 people interested in it... If you can lower your memory speed, I think VECTORS will always be faster than VECTORS4.

Now, I do like hearing feedback from everyone. I am just letting you know that it is not feasible to optimize the kernel for every possible configuration (SDK 2.1, 2.4, slow memory).  Right now, the kernel is optimized for SDK 2.5 and the 68xx and 5xxx cards, and assumes you pick the best memory clock speed for your card (somewhere around 1/3 of your core clock).

-Phateus
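
As a rough illustration of the "around 1/3 of your core clock" rule of thumb, the following Python sketch computes a starting memory clock for a few core clocks that appear in this thread. The function name `suggested_memory_clock_mhz` is invented for this example, and the result is only a starting point to tune from, not a guaranteed optimum.

```python
def suggested_memory_clock_mhz(core_clock_mhz, divisor=3):
    """Return a starting memory clock of roughly core clock / 3.

    Hypothetical helper: it only encodes the rule of thumb quoted above.
    """
    return core_clock_mhz / divisor


# Core clocks reported by posters in this thread (MHz).
for core in (800, 900, 920):
    print(f"core {core} MHz -> try memory near "
          f"{round(suggested_memory_clock_mhz(core))} MHz")
```

The memory clocks people report here (300 MHz with an 800 MHz core, 320 MHz with a 920 MHz core) sit slightly above this estimate, so it is worth testing a few values around it.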
full member
Activity: 140
Merit: 100
August 09, 2011, 05:40:58 PM
VECTORS4 might be faster for 69xx users though, when combined with a smaller WORKSIZE.

Ubuntu 10.10
Catalyst 11.3
SDK 2.4
6970 @ 940,1375
Phatk 2.2

Quote
315.5MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=64 VECTORS4 FASTLOOP=false
414.2MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=128 VECTORS4 FASTLOOP=false
321.1MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=256 VECTORS4 FASTLOOP=false

422.8MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=64 VECTORS FASTLOOP=false
423.5MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=128 VECTORS FASTLOOP=false
420.9MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=256 VECTORS FASTLOOP=false
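
To compare runs like the ones quoted above without eyeballing them, here is a small Python sketch that parses the result lines and picks the fastest configuration. The line format is assumed to be exactly the one in this post ("<rate>MH/s" followed by the options), and `parse_result` is a made-up helper name.

```python
import re

# Result lines copied from the post above.
RESULTS = """\
315.5MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=64 VECTORS4 FASTLOOP=false
414.2MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=128 VECTORS4 FASTLOOP=false
321.1MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=256 VECTORS4 FASTLOOP=false
422.8MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=64 VECTORS FASTLOOP=false
423.5MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=128 VECTORS FASTLOOP=false
420.9MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=256 VECTORS FASTLOOP=false
"""


def parse_result(line):
    """Split one result line into (rate in MH/s, vector mode, WORKSIZE)."""
    rate_text, options = line.split("MH/s", 1)
    worksize = int(re.search(r"WORKSIZE=(\d+)", options).group(1))
    # Match whole tokens so "VECTORS" is not confused with "VECTORS4".
    mode = "VECTORS4" if "VECTORS4" in options.split() else "VECTORS"
    return float(rate_text), mode, worksize


rows = [parse_result(line) for line in RESULTS.splitlines() if line.strip()]
best = max(rows)  # tuples compare by rate first
best_v4 = max(row for row in rows if row[1] == "VECTORS4")
print("fastest overall:", best)
print("fastest VECTORS4:", best_v4)
```

On these numbers VECTORS with WORKSIZE=128 comes out ahead, and VECTORS4 is only competitive at WORKSIZE=128.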
newbie
Activity: 16
Merit: 0
August 09, 2011, 04:16:34 PM
Why not make two separate kernels then?

VECTORS4 might one day be the better alternative, instead of doing all that work then why not start now and keep pace?

full member
Activity: 219
Merit: 120
August 09, 2011, 03:57:30 PM
I found that the VECTORS4 option does not work in version 2.2



Same. Using VECTORS4 drops my hash rate from 385 to 310 on my 5870. Using VECTORS WORKSIZE=128 brings it back up to about 380.

This is probably because of the increased GPR usage of the VECTORS4 code. According to KernelAnalyzer VECTORS4 uses 2707 ALU OPS and 33 GPRs. This is compared with VECTORS which is 1355 ALU OPS and only 23 GPRs. Theoretically VECTORS4 would be faster, since it tests twice the number of nonces using 3 fewer ALU OPS than 2 executions of VECTORS. However, if the GPU runs out of GPRs then this limits the number of threads that can be running at once, which is what causes the performance drop.

(Above ALU OPS and GPR numbers are for Cypress, AKA 58xx)

VECTORS4 might be faster for 69xx users though, when combined with a smaller WORKSIZE.

EDIT: Just looked at the 2.1 version and it uses even more GPRs with VECTORS4 than 2.2 does. (35 GPRs, 1358 ALU OPS) I'm not quite sure how it can be faster than 2.2.
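
The arithmetic behind the ALU OPS and GPR comparison can be sketched in a few lines of Python. The ALU and GPR figures are the phatk 2.2 KernelAnalyzer numbers quoted above; the register budget of 256 GPRs per thread is an assumption added purely for illustration (it does not come from this thread), so treat the wavefront counts as a rough picture of the effect rather than measured occupancy.

```python
# KernelAnalyzer figures quoted above for Cypress (58xx), phatk 2.2.
VECTORS = {"alu_ops": 1355, "gprs": 23, "nonces": 2}
VECTORS4 = {"alu_ops": 2707, "gprs": 33, "nonces": 4}

# ASSUMPTION (not from the thread): each thread slot can draw on 256 GPRs,
# shared between all wavefronts resident on a SIMD at the same time.
ASSUMED_GPR_BUDGET = 256


def alu_ops_per_nonce(kernel):
    """ALU operations spent per nonce tested."""
    return kernel["alu_ops"] / kernel["nonces"]


def max_resident_wavefronts(kernel, budget=ASSUMED_GPR_BUDGET):
    """How many wavefronts fit in the assumed register budget."""
    return budget // kernel["gprs"]


for name, kernel in (("VECTORS", VECTORS), ("VECTORS4", VECTORS4)):
    print(name,
          "ALU ops per nonce:", alu_ops_per_nonce(kernel),
          "wavefronts under assumed budget:", max_resident_wavefronts(kernel))
```

Per nonce, VECTORS4 saves less than one ALU op (676.75 versus 677.5), while under the assumed budget it leaves room for noticeably fewer wavefronts, which is consistent with the slowdown described above.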
newbie
Activity: 52
Merit: 0
August 09, 2011, 03:33:43 PM
I found that the VECTORS4 option does not work in version 2.2



I optimize the code for VECTORS, so probably making it faster in 2.2 made VECTORS4 slower.  I can't really optimize the kernel for both, so I would just stick with version 2.1 if that is faster for you.

And everyone, thanks for your support, every little bit helps :)
legendary
Activity: 1344
Merit: 1004
August 09, 2011, 12:40:16 PM
I found that the VECTORS4 option does not work in version 2.2



Same. Using VECTORS4 drops my hash rate from 385 to 310 on my 5870. Using VECTORS WORKSIZE=128 brings it back up to about 380.
member
Activity: 77
Merit: 10
August 09, 2011, 12:06:12 PM
I found that the VECTORS4 option does not work in version 2.2

full member
Activity: 140
Merit: 100
August 09, 2011, 05:30:53 AM
Ubuntu 10.10
Cata 11.3
SDK 2.4
6970x2 OC 940,1375

Phoenix-r112 (Diapolo 7-17 w/ Vals[7] patch) 422.8MH/s
Phatk-2.2 423.3MH/s

So up 0.5MH/s, sent you my profits for the week.
member
Activity: 77
Merit: 10
August 09, 2011, 02:06:47 AM
Something is wrong with kernel 2.2.

I get 330 MH/s using 2.2
410 MH/s using kernel 2.1

Card: AMD 5870 clocked at 900 MHz

Using the 11.8 beta driver with SDK 2.5