
Topic: Modified Kernel for Phoenix 1.5 - page 7. (Read 96713 times)

legendary
Activity: 1855
Merit: 1016
August 04, 2011, 10:33:08 AM
My sweet spot for the 5870 is always a memory clock equal to the core clock divided by three.
mem = core/3 = 975/3 = 325
legendary
Activity: 1344
Merit: 1004
August 04, 2011, 08:45:53 AM
So far tests indicate the first "sweet spot" is ~220 MHz with VECTORS WORKSIZE=128. The next "sweet spot" (and fastest one yet) is ~370-380MHz with VECTORS WORKSIZE=256. Will keep you guys posted as I run through more combos.
hero member
Activity: 772
Merit: 500
August 04, 2011, 07:32:11 AM
@Phat:

What is your experience with SDK 2.5 so far? It seems to behave somewhat odd in terms of expected vs. real performance (KernelAnalyzer vs. Phoenix).
Do you use the CAL 11.7 profile for ALU OP usage information or an earlier version?

Dia
legendary
Activity: 1344
Merit: 1004
August 04, 2011, 04:08:03 AM
Since I really liked the graph on the front page but thought it lacked granularity, I'm going to take a shot at making a graph too. I'll be doing tests on a 5830 instead of a 5870 though (My 5830 seems a LOT more stable when it comes to memory speeds compared to my 5870).
They'll be based on...
GUIMiner v2011-07-01
Built-in Phoenix miner
11.7 Catalyst
2.5 SDK
phatk 2.1 kernel with..
BFI_INT FASTLOOP=false AGGRESSION=14 and varying worksizes, memory speeds, and VECTORS vs VECTORS4.

Stay tuned :)

Edit: Here's a work-in-progress spreadsheet. It's updated as I test more combos (all the testing and spreadsheet updates are manual).
https://spreadsheets.google.com/spreadsheet/ccc?key=0AjXdY6gpvmJ4dEo4OXhwdTlyeS1Vc1hDWV94akJHZFE&hl=en_US

I was planning to put in worksizes of 192, 96, and 48 too, but phatk 2.1 doesn't seem to support them. Less work for me though :P
legendary
Activity: 1512
Merit: 1036
August 04, 2011, 01:23:13 AM
I have bad news to report - phatk 2.1 sends bad shares.

On pool mining hardware that consistently gets <2% rejects (and those are only stales within 5 seconds of a new block), I have changed nothing but the phatk kernel:

2956/190 = 6.0% rejected
1944/290 = 13.0% rejected
2656/116 = 4.2% rejected
2615/184 = 6.6% rejected

Here's a log from this new kernel showing the atypical random rejects:
(old links)

We can see from the result line that the hashes are bad, since they do not start with 00000000:
[03/08/2011 22:17:48] Result c877f46db0d6ab44... rejected

These do not give an "OpenCL error, hardware problem?", or a "didn't meet minimum difficulty, not sending", they are sent and rejected.

For an improvement in hashrate of 1% (333.58->336.53 typical) over Diapolo's 07-17 kernel, I get a 5% increase in rejects. I will have to revert. This is on WinXP/5830/11.6/SDK2.4 running phoenix.py 1.50 unmodified source on Python 2.6.6/numpy-1.6.0/... Two miner instances per GPU.

Command line is:
python phoenix.py -v -u http://xxx/ -k phatk VECTORS AGGRESSION=13 BFI_INT WORKSIZE=256 PLATFORM=0 DEVICE=0
legendary
Activity: 1855
Merit: 1016
August 03, 2011, 11:57:05 PM
My cat got sick on the carpet but I am willing to believe for now that it has nothing to do with your function
LOL
sr. member
Activity: 476
Merit: 250
moOo
August 03, 2011, 11:34:42 PM
so i copied
Code:
#define rotC(x,n) ((x << n) | (x >> (32 - n)))

and pasted it in my kernel file and nothing blew up

didn't really get any speed increases, but of course I am flying blind here, so perhaps that wasn't the right thing to do, but hey, i did it anyway.

My cat got sick on the carpet but I am willing to believe for now that it has nothing to do with your function
newbie
Activity: 52
Merit: 0
August 03, 2011, 11:25:03 PM
the graph is really cool! how did you create it?

it's very interesting to see that the hashrate really is increasing with lower mem clock.

Can you go below 300 to see if the hashrate starts declining again at some point? I am running my 5850s at a mem clock of 269 and wonder if that is the optimum.


Manually testing and inputting data into a google docs spreadsheet :-p
As for going under 300, it might decrease performance, but it would probably also start getting unstable around 250 (from what I've tried).  Best to just try it on your own hardware.
225 gets unstable, but 200 is fine, you just didn't go LOW enough (kind of how 400 hung my GPU)
try 200, it's the best performance on my card with worksize 256, vectors 2

Awesome, thanks for the info, I'll definitely try it out.
newbie
Activity: 52
Merit: 0
August 03, 2011, 11:24:32 PM
Alright, new version 2.2 is coming out in the next couple days.

As the front page says, 1354 ALU Ops for the 5xxx series vs. 1359 for 2.1

Changes I've made in 2.2 are:
  • added a rotC function for constant values since the compiler apparently does not know how to perform rotate() on constants
Code:
#define rotC(x,n) ((x << n) | (x >> (32 - n)))
  • Small tweaks to the order of certain functions and other random things that shouldn't really have done anything :o

I will add anything else I think of the next couple days... Also, keep the bug reports coming, so I know if I need to fix anything.


-Phateus
hero member
Activity: 658
Merit: 500
August 03, 2011, 10:57:33 PM
the graph is really cool! how did you create it?

it's very interesting to see that the hashrate really is increasing with lower mem clock.

Can you go below 300 to see if the hashrate starts declining again at some point? I am running my 5850s at a mem clock of 269 and wonder if that is the optimum.


Manually testing and inputting data into a google docs spreadsheet :-p
As for going under 300, it might decrease performance, but it would probably also start getting unstable around 250 (from what I've tried).  Best to just try it on your own hardware.
225 gets unstable, but 200 is fine, you just didn't go LOW enough (kind of how 400 hung my GPU)
try 200, it's the best performance on my card with worksize 256, vectors 2
newbie
Activity: 52
Merit: 0
August 03, 2011, 01:25:18 PM
the graph is really cool! how did you create it?

it's very interesting to see that the hashrate really is increasing with lower mem clock.

Can you go below 300 to see if the hashrate starts declining again at some point? I am running my 5850s at a mem clock of 269 and wonder if that is the optimum.


Manually testing and inputting data into a google docs spreadsheet :-p
As for going under 300, it might decrease performance, but it would probably also start getting unstable around 250 (from what I've tried).  Best to just try it on your own hardware.
legendary
Activity: 1708
Merit: 1020
August 03, 2011, 10:57:03 AM
the graph is really cool! how did you create it?

it's very interesting to see that the hashrate really is increasing with lower mem clock.

Can you go below 300 to see if the hashrate starts declining again at some point? I am running my 5850s at a mem clock of 269 and wonder if that is the optimum.
member
Activity: 67
Merit: 10
August 03, 2011, 10:51:42 AM
Awesome!
V. 2.1 kernel works flawlessly on my Linuxcoin 2.0 rigs (SDK 2.4 + Catalyst 11.5, HD5850/5830), generally +3-4 MH/s across the board compared to Diapolo's 17-07.
Excellent job! :D
newbie
Activity: 45
Merit: 0
August 02, 2011, 11:44:24 PM
my 6950 took a hit from 390 to 356 MH/s

however, all my 6870s got a 7 MH/s bump!

5830 up by 7 also, to 327.9 MH/s

Thanks!
hero member
Activity: 560
Merit: 517
August 02, 2011, 09:07:11 PM
Hey fpgaminer, I really like this poclbm version of phatk2, but could you update the same version with a --phatk2_1 switch or something so we could test-drive both versions with ease :)
Sure thing. All updated. Added --phatk2_1 option, and --vectors4 (which can only be used in combo with phatk2_1).

https://github.com/progranism/poclbm

Let me know how it works. I tested it on my 5850s. I tested with no vectors, vectors, and vectors4 and they all seemed to work.

For my own sake, I also added a special feature where you can use "-e -1" to force the hashing estimation algorithm to estimate hashing speed over the entire run-time of the miner, and include both accepted and rejected shares. I'm using it to check that the code is actually hashing at the reported rate; no duplicate nonces or other bugs.
member
Activity: 97
Merit: 10
August 02, 2011, 08:49:08 PM
Thanks, getting +1-3% on my cards... better improvement on 5800 series than 6900 series.
newbie
Activity: 52
Merit: 0
August 02, 2011, 04:26:38 PM
@Phat:

I don't understand how you ensure that base is always a uint as a kernel parameter, now that base has (uint2)(0, 1) or (uint4)(0, 1, 2, 3) added into it via the init file. If I try to do this with my mod it just crashes Phoenix, but if I use const u base instead of const uint base, it seems to work (because u reflects the correct variable type: uint, uint2 or uint4). Have you got an idea about this?

Thanks,
Dia

I'm not sure I understand you... Depending on whether the number of nonces per thread (VECTORS) is 1, 2, or 4, the kernel compiles with base being either uint, uint2, or uint4. The init file packs either 1, 2, or 4 uints into each base entry, so the init file always produces the same size variable as the kernel needs. So, in short, both the base[i] variable being passed to the kernel and the "u base" value in the kernel can be 1, 2, or 4 uints. Does that answer your question?

I understand what you say and it makes sense, but not what I see now ... the variable base in your code _IS_ declared as u and not uint2. Did I look at the old 2.0 version!?

Dia

yes, it is declared as u (it was uint2 in 2.0, but I have made it variable for efficiency)

Code:
#ifdef VECTORS4
typedef uint4 u;
#else
#ifdef VECTORS
typedef uint2 u;
#else
typedef uint u;
#endif
#endif

u is uint2 when VECTORS is declared

Bah, I know all of this scattered code is confusing
hero member
Activity: 772
Merit: 500
August 02, 2011, 02:20:40 PM
@Phat:

I don't understand how you ensure that base is always a uint as a kernel parameter, now that base has (uint2)(0, 1) or (uint4)(0, 1, 2, 3) added into it via the init file. If I try to do this with my mod it just crashes Phoenix, but if I use const u base instead of const uint base, it seems to work (because u reflects the correct variable type: uint, uint2 or uint4). Have you got an idea about this?

Thanks,
Dia

I'm not sure I understand you... Depending on whether the number of nonces per thread (VECTORS) is 1, 2, or 4, the kernel compiles with base being either uint, uint2, or uint4. The init file packs either 1, 2, or 4 uints into each base entry, so the init file always produces the same size variable as the kernel needs. So, in short, both the base[i] variable being passed to the kernel and the "u base" value in the kernel can be 1, 2, or 4 uints. Does that answer your question?

I understand what you say and it makes sense, but not what I see now ... the variable base in your code _IS_ declared as u and not uint2. Did I look at the old 2.0 version!?

Dia
legendary
Activity: 1344
Merit: 1004
August 02, 2011, 02:15:09 PM
in case some feedback was wanted for VECTORS4: I got about a 20 MH/s improvement on my 5870 when it is set to stock speeds (850/1200) while using the computer normally (360 -> 380 MH/s).
I will continue to use VECTORS when I am AFK (1015/355) for 470.1 MH/s.
sr. member
Activity: 476
Merit: 250
moOo
August 02, 2011, 12:42:57 PM
Quote
Woooo!, found the bug... it is in my kernel...


you rock sir phateus... all is working here and nice speed up.. especially over the stock phatk 1.0

but yeah, faster than Diapolo's 7-17 for me.. on a 5830, SDK 2.4, 11.6 cat, guiminer..