further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13 - page 11.

Bert

Quote from: conspirosphere.tk on July 07, 2011, 11:19:36 AM

... snip ...
BTW: How do you get accurate measures of your Mhs??? My Phoenix miner oscillates between 170 and 190 Mhs.

I add "-a 50" to average the Mhash/sec over 50 samples. This overrides the default of 10 and smooths out the jumps, but the reading takes longer to converge to the real hash rate. With 5 times as many samples in the average, the jumps get much smaller.

$ ./phoenix.py --help
Usage: phoenix.py -u URL [-k kernel] [kernel params]

Options:
  -h, --help show this help message and exit
  -v, --verbose show debug messages
  -k KERNEL, --kernel=KERNEL
   the name of the kernel to use
  -u URL, --url=URL the URL of the mining server to work for [REQUIRED]
  -q QUEUESIZE, --queuesize=QUEUESIZE
   how many work units to keep queued at all times
  -a AVGSAMPLES, --avgsamples=AVGSAMPLES
   how many samples to use for hashrate average
$
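
For illustration, the effect of -a can be sketched as a moving average over the last N rate samples. This is a minimal sketch in plain C and not Phoenix's actual code (Phoenix itself is written in Python); the names RateAverager, avg_init, avg_push and MAX_SAMPLES are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAMPLES 64   /* hypothetical upper bound on the window size */

/* Ring buffer holding the most recent hashrate samples (in Mhash/s). */
typedef struct {
    double samples[MAX_SAMPLES];
    size_t window;   /* number of samples to average, like the -a value */
    size_t count;    /* samples stored so far, at most window */
    size_t next;     /* slot that the next sample overwrites */
    double sum;      /* running sum of the stored samples */
} RateAverager;

void avg_init(RateAverager *a, size_t window) {
    assert(window > 0 && window <= MAX_SAMPLES);
    a->window = window;
    a->count = 0;
    a->next = 0;
    a->sum = 0.0;
}

/* Add one sample and return the average over the current window. */
double avg_push(RateAverager *a, double sample) {
    if (a->count == a->window)
        a->sum -= a->samples[a->next];   /* drop the oldest sample */
    else
        a->count++;
    a->samples[a->next] = sample;
    a->sum += sample;
    a->next = (a->next + 1) % a->window;
    return a->sum / (double)a->count;
}
```

A larger window damps the jumps between samples, but a real change in hash rate also takes longer to show up in the displayed value.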

gominoa

New version 2011-07-07 works on SDK 2.1 w/ VECTORS.

Thanks

conspirosphere.tk

This causes Phoenix miner 1.50 to crash and close immediately for me, so I'm reverting to your previous patch.
Donation sent.

update: it was my -f flag. Without it, it now works.

BTW: How do you get accurate measures of your Mhs??? My Phoenix miner oscillates between 170 and 190 Mhs.

Diapolo

New version 2011-07-07 is ready: http://www.mediafire.com/?7j70gnmllgi9b73

This is mainly a bugfix release for SDK 2.1, with some code restructuring to save a few writes and additions. I cannot guarantee that this really works on SDK 2.1, because I didn't test it. If you are unsure, wait for other users to test it and consider applying this patch later!

By the way, I want to thank everyone who donated a few Bitcents to me, it feels great!

Thanks,
Dia

PS.: If it works, please post here and consider a small donation @ 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM

Diapolo

Quote from: SeriousWorm on July 07, 2011, 05:32:27 AM

Wow, I got a nice increase when I upped my memory to 350 MHz.
6870 @ 980/350/1.25V:
310 mhash/sec, 10 aggression.
312 mhash/sec, 12 aggression.

The latest kernel seems to be sensitive to the memory clock and benefits from a higher one; thanks for verifying.

Dia

SeriousWorm

Wow, I got a nice increase when I upped my memory to 350 MHz.
6870 @ 980/350/1.25V:
310 mhash/sec, 10 aggression.
312 mhash/sec, 12 aggression.

Diapolo

Quote from: gominoa on July 07, 2011, 02:16:09 AM

This doesn't compile when VECTORS is defined.

Quote

Build on :

/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed
unless up-convertable(scalar-type=>vector-element-type)
W[19] = P4(19) + 0x11002000 + P1(19);
^

I can't post on the mining thread, but this is the same error reported there.
It works fine without VECTORS defined.

I'm looking into this; it seems to happen only with SDK 2.1!
We are trying to nail it down in the other thread. If I find a solution, I will upload a fixed version.
If you don't mind fiddling with the code a bit, you can try changing these lines:

Code:

W[19] = P4(19) + (u)0x11002000 + P1(19);

W[30] = P3(30) + (u)0xA00055 + P1(30);

Vals[3] = (u)L + W[64];

W[81] = P4(81) + P2(81) + (u)0xA00000;

W[87] = P4(87) + P3(87) + (u)0x11002000 + P1(87);

W[94] = P3(94) + (u)0x400022 + P1(94);
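
A note on why the casts should help, assuming u is typedef'd to uint2 when VECTORS is defined and to uint otherwise: in OpenCL C, casting a scalar to a vector type replicates it into every lane, so the addition is no longer a mixed vector-scalar operation. The sketch below emulates that in plain C. The struct vec2 and the helpers splat, vadd and w19 are illustrative stand-ins, not OpenCL API and not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for OpenCL's uint2: two 32-bit lanes. */
typedef struct { uint32_t x, y; } vec2;

/* What the (u) cast does to a scalar in VECTORS mode: copy it to every lane. */
vec2 splat(uint32_t s) {
    vec2 v = { s, s };
    return v;
}

/* Lane-wise addition, wrapping modulo 2^32 like uint arithmetic. */
vec2 vadd(vec2 a, vec2 b) {
    vec2 r = { a.x + b.x, a.y + b.y };
    return r;
}

/* Mirrors W[19] = P4(19) + (u)0x11002000 + P1(19),
   with the P4 and P1 terms passed in already computed. */
vec2 w19(vec2 p4, vec2 p1) {
    return vadd(vadd(p4, splat(0x11002000u)), p1);
}

/* Small self-check: both lanes receive the same constant. */
void check_w19(void) {
    vec2 zero = { 0u, 0u };
    vec2 r = w19(zero, zero);
    assert(r.x == 0x11002000u && r.y == 0x11002000u);
}
```

Without VECTORS, (u) is just a cast to uint and changes nothing, which is why the same source line can serve both builds.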

Dia

Vrekk

Got an increase from 425 to 435 :-) Thanks a bunch!! Sent a little something your way.

gominoa

This doesn't compile when VECTORS is defined.

Quote

Build on :

/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed
unless up-convertable(scalar-type=>vector-element-type)
W[19] = P4(19) + 0x11002000 + P1(19);
^

I can't post on the mining thread, but this is the same error reported there.
It works fine without VECTORS defined.

Diapolo

Quote from: zmcgrew on July 06, 2011, 07:54:13 PM

Just wanted to say thanks for the hard work, but today's (07/06/2011) kernel dropped me by about 2 Mh/s.

07/03/2011 got me ~300.8 Mh/s, but 07/06/2011 won't go above ~298.5 Mh/s.

I'm running on a 6870, Catalyst 11.6, SDK 2.4, and using the following: BFI_INT VECTORS AGGRESSION=13 WORKSIZE=128

Card is clocked at 960 MHz core and 300 MHz RAM.

Could you raise your memory clock to ~350 MHz and report back? What about a WORKSIZE of 256? On 5830 cards this helps a lot.

Dia

swivel

Nice work! Plugged in the 2011-07-06 kernel to phoenix and saw my 5850 jump from 348 Mhash/s to 354 Mhash/s.

Debian sid 64-bit
Catalyst 11.6 and AMD APP 2.4 SDK
phoenix 1.50 with VECTORS BFI_INT WORKSIZE=256 AGGRESSION=12
XFX 5850 BE: 860 core, 300 memory, stock voltage, fan speed at 55%, temp at 61C

zmcgrew

Just wanted to say thanks for the hard work, but today's (07/06/2011) kernel dropped me by about 2 Mh/s.

07/03/2011 got me ~300.8 Mh/s, but 07/06/2011 won't go above ~298.5 Mh/s.

I'm running on a 6870, Catalyst 11.6, SDK 2.4, and using the following: BFI_INT VECTORS AGGRESSION=13 WORKSIZE=128

Card is clocked at 960 MHz core and 300 MHz RAM.

conspirosphere.tk

I don't have a benchmark, but according to Phoenix miner I went from about 160 Mhash/s to 180+ using your patch on my 5770.
Good work!
Many thanks

Diapolo

I really like this "let's do better" game. But for now I say good night!

Dia

1MLyg5WVFSMifFjkrZiyGW2nw

Quote from: Diapolo on July 06, 2011, 02:30:15 PM

Thank YOU

Another nice hint; even if it doesn't boost performance, the code gets cleaner. I'm not sure about replacing the #defines with functions. Could you post an example? My problem is that most variables are declared inside the kernel function, so a function for sharound(), for example, needs Vals[] and others passed as parameters (by copy or pointer).

Dia

Yes, I haven't changed those defines yet; I mostly added my own intermediate functions:

Code:

// Ma can also be implemented in terms of Ch...
u Ma(u x, u y, u z) { return Ch(z^x, y, x); }

// Various intermediate calculations for each SHA round

u xrot2(u n, const uint r1, const uint r2) {
        return rot(n, r1) ^ rot(n, r2);
}

u xrot3(u n, const uint r1, const uint r2, const uint r3) {
        return xrot2(n, r1, r2) ^ rot(n, r3);
}

u xrrs(u n, const uint r1, const uint r2, const uint r3) {
        return xrot2(n, r1, r2) ^ (n >> r3);
}

#define s0(n) xrot3(Vals[(128-n) % 8], 30, 19, 10)
#define s1(n) xrot3(Vals[(132-n) % 8], 26, 21, 7)
#define ch(n) Ch(Vals[(132 - n) % 8], Vals[(133 - n) % 8], Vals[(134 - n) % 8])
#define ma(n) Ma(Vals[(129 - n) % 8], Vals[(130 - n) % 8], Vals[(128 - n) % 8])
#define t1(n) (K[n % 64] + Vals[(135 - n) % 8] + W[n] + s1(n) + ch(n))

// intermediate W calculations
#define P1(x) xrrs(W[x - 2], 15, 13, 10)
#define P2(x) xrrs(W[x - 15], 25, 14, 3)

Since there is no noticeable drop in hashrate, I assume the compiler is inlining these functions.
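
The claim that Ma can be written in terms of Ch is easy to check outside the kernel. Here is a minimal sketch in plain C with uint32_t standing in for u. Ma_ref is the textbook majority function and check_ma_identity is a helper made up for this example. The Ch shown is one common bitwise form and is assumed, not copied from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* SHA-256 choice: take bits of y where x is 1, bits of z where x is 0. */
uint32_t Ch(uint32_t x, uint32_t y, uint32_t z) {
    return z ^ (x & (y ^ z));
}

/* Textbook majority: a result bit is 1 when at least two input bits are 1. */
uint32_t Ma_ref(uint32_t x, uint32_t y, uint32_t z) {
    return (x & y) | (x & z) | (y & z);
}

/* Majority expressed through Ch, as in the code above. */
uint32_t Ma(uint32_t x, uint32_t y, uint32_t z) {
    return Ch(z ^ x, y, x);
}

/* Every bit position is independent, so the 8 combinations of
   all-zero and all-one words cover every possible case. */
void check_ma_identity(void) {
    const uint32_t v[2] = { 0u, 0xFFFFFFFFu };
    for (int i = 0; i < 8; i++) {
        uint32_t x = v[i & 1], y = v[(i >> 1) & 1], z = v[(i >> 2) & 1];
        assert(Ma(x, y, z) == Ma_ref(x, y, z));
    }
}
```

The reasoning behind it: where x and z agree, the majority is that shared bit, and where they differ, y decides. That is exactly a select between x and y controlled by z^x.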

Also, you can eliminate one extra assignment to Vals[4]:

Code:

//Vals[4] = PreVal4;
//...
#ifdef VECTORS
        Vals[4] = (W[3] = ((base + get_global_id(0)) << 1) + (uint2)(0, 1)) + PreVal4;
#else
        Vals[4] = (W[3] = base + get_global_id(0)) + PreVal4;
#endif
//...
//Vals[4] += W[3];
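
To show that the folded form computes the same values, here is a minimal sketch of the scalar (non-VECTORS) path in plain C. The parameter gid stands in for get_global_id(0), and the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Original sequence: preload Vals[4], compute the nonce word, then add it. */
uint32_t vals4_separate(uint32_t base, uint32_t gid, uint32_t PreVal4, uint32_t *W3) {
    uint32_t v = PreVal4;   /* Vals[4] = PreVal4;              */
    *W3 = base + gid;       /* W[3] = base + get_global_id(0); */
    v += *W3;               /* Vals[4] += W[3];                */
    return v;
}

/* Folded version: the value of the inner assignment is reused directly. */
uint32_t vals4_folded(uint32_t base, uint32_t gid, uint32_t PreVal4, uint32_t *W3) {
    return (*W3 = base + gid) + PreVal4;
}

/* Compare both forms, including a base that wraps around 2^32. */
void check_vals4(void) {
    const uint32_t bases[3] = { 0u, 1000u, 0xFFFFFFF0u };
    for (int i = 0; i < 3; i++) {
        uint32_t wa, wb;
        uint32_t a = vals4_separate(bases[i], 0x20u, 0xDEADBEEFu, &wa);
        uint32_t b = vals4_folded(bases[i], 0x20u, 0xDEADBEEFu, &wb);
        assert(a == b && wa == wb);
    }
}
```

Whether the folded form actually saves a write depends on the compiler, which may already merge the two statements on its own.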

Diapolo

Quote from: 1MLyg5WVFSMifFjkrZiyGW2nw on July 06, 2011, 12:23:55 PM

Thanks, best version yet!

I still haven't reached the 40 MHash/sec the wiki says my card could do.

Did you notice that Ma(x, y, z) is defined exactly the same now whether BFI_INT is enabled or not? Seems more elegant to me if moved out of the #ifdef. Also I tried to replace some #define's with functions, guessing that it would make it easier for a somewhat smart compiler to find repeatedly used terms and put them into registers. No performance improvement, but didn't hurt it either.

Also, OpenCL has a builtin Ch function, not faster for me but maybe for someone else:
#define Ch(x, y, z) bitselect(z, y, x)

Thank YOU

Another nice hint; even if it doesn't boost performance, the code gets cleaner. I'm not sure about replacing the #defines with functions. Could you post an example? My problem is that most variables are declared inside the kernel function, so a function for sharound(), for example, needs Vals[] and others passed as parameters (by copy or pointer).

Dia

1MLyg5WVFSMifFjkrZiyGW2nw

Quote from: Diapolo on July 06, 2011, 08:08:32 AM

New version is ready, DL here: http://www.mediafire.com/?f8b8q3w5u5p0ln0

Updated first post with changelog and performance info. This one should be a bit faster on 69XX cards than the original phatk, faster than all other phatk versions I did on 58XX and faster on non BFI_INT cards because of a change user 1MLyg5WVFSMifFjkrZiyGW2nw suggested!

Try it, have fun, comment and donate!

Thanks,
Dia

Thanks, best version yet!

I still haven't reached the 40 MHash/sec the wiki says my card could do.

Did you notice that Ma(x, y, z) is defined exactly the same now whether BFI_INT is enabled or not? Seems more elegant to me if moved out of the #ifdef. Also I tried to replace some #define's with functions, guessing that it would make it easier for a somewhat smart compiler to find repeatedly used terms and put them into registers. No performance improvement, but didn't hurt it either.

Also, OpenCL has a builtin Ch function, not faster for me but maybe for someone else:
#define Ch(x, y, z) bitselect(z, y, x)
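
For anyone who wants to convince themselves of the argument order: bitselect(a, b, c) takes each result bit from b where the bit of c is 1 and from a where it is 0. A minimal scalar model in plain C follows. bitselect_u32 is a stand-in written for this example, not the OpenCL builtin, and Ch_ref is the textbook definition.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of OpenCL's bitselect(a, b, c):
   bit from b where c is 1, bit from a where c is 0. */
uint32_t bitselect_u32(uint32_t a, uint32_t b, uint32_t c) {
    return (a & ~c) | (b & c);
}

/* Textbook SHA-256 choice function. */
uint32_t Ch_ref(uint32_t x, uint32_t y, uint32_t z) {
    return (x & y) ^ (~x & z);
}

/* Same argument order as the #define above: bitselect(z, y, x). */
uint32_t Ch_bitselect(uint32_t x, uint32_t y, uint32_t z) {
    return bitselect_u32(z, y, x);
}

/* Bits are independent, so all-zero and all-one words cover every case. */
void check_ch_bitselect(void) {
    const uint32_t v[2] = { 0u, 0xFFFFFFFFu };
    for (int i = 0; i < 8; i++) {
        uint32_t x = v[i & 1], y = v[(i >> 1) & 1], z = v[(i >> 2) & 1];
        assert(Ch_bitselect(x, y, z) == Ch_ref(x, y, z));
    }
}
```

With x = 0xFFFF0000 the upper half of the result comes from y and the lower half from z, which is a quick way to eyeball it.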

Diapolo

Quote from: dewon on July 06, 2011, 10:20:00 AM

Went from 322 to 329 with hd 5830

Great, it seems like the HD 5830 scales really well with my mod.

Dia

dewon

Went from 322 to 329 with hd 5830

Diapolo

New version is ready, DL here: http://www.mediafire.com/?f8b8q3w5u5p0ln0

Updated first post with changelog and performance info. This one should be a bit faster on 69XX cards than the original phatk, faster than all other phatk versions I did on 58XX and faster on non BFI_INT cards because of a change user 1MLyg5WVFSMifFjkrZiyGW2nw suggested!

Try it, have fun, comment and donate!

Thanks,
Dia
