Thank you!
Another nice hint; even if it doesn't boost the hashrate, the code gets cleaner. I'm not sure about turning the #defines into functions, though. Could you post an example? My problem is that most variables are declared inside the kernel function, so a function for sharound(), for example, would need Vals[] and the others passed in as parameters (by copy or by pointer).
Dia
Yes, I haven't changed those defines yet; I have mostly added my own intermediate functions:
// Ma can also be implemented in terms of Ch...
u Ma(u x, u y, u z) { return Ch(z^x, y, x); }
// Various intermediate calculations for each SHA round
u xrot2(u n, const uint r1, const uint r2) {
    return rot(n, r1) ^ rot(n, r2);
}
u xrot3(u n, const uint r1, const uint r2, const uint r3) {
    return xrot2(n, r1, r2) ^ rot(n, r3);
}
u xrrs(u n, const uint r1, const uint r2, const uint r3) {
    return xrot2(n, r1, r2) ^ (n >> r3);
}
// Macro arguments are parenthesized so that expression arguments expand correctly
#define s0(n) xrot3(Vals[(128 - (n)) % 8], 30, 19, 10)
#define s1(n) xrot3(Vals[(132 - (n)) % 8], 26, 21, 7)
#define ch(n) Ch(Vals[(132 - (n)) % 8], Vals[(133 - (n)) % 8], Vals[(134 - (n)) % 8])
#define ma(n) Ma(Vals[(129 - (n)) % 8], Vals[(130 - (n)) % 8], Vals[(128 - (n)) % 8])
#define t1(n) (K[(n) % 64] + Vals[(135 - (n)) % 8] + W[(n)] + s1(n) + ch(n))
// intermediate W calculations
#define P1(x) xrrs(W[(x) - 2], 15, 13, 10)
#define P2(x) xrrs(W[(x) - 15], 25, 14, 3)
Since there is no noticeable drop in hashrate, I assume the compiler is inlining these functions.
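On the question of passing Vals[] into a function: here is a rough host-side sketch in plain C of what a function form of one round might look like, with the state and message schedule passed by pointer. It is not taken from the kernel. The name sharound_fn, the scalar typedef for u, the definition of Ch, and the helper textbook_round are all assumptions made for illustration, and the round constant is passed in as a plain value rather than read from the real K table. The only thing it tries to show is that the rotating-index scheme used in the defines above matches a textbook SHA-256 round.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u; /* scalar stand-in for the kernel's `u` type (assumption) */

/* Rotate left, as OpenCL's rotate() does; valid for 0 < r < 32. */
static inline u rot(u x, unsigned r) { return (x << r) | (x >> (32u - r)); }
/* Rotate right, used only by the textbook reference below. */
static inline u rotr(u x, unsigned r) { return (x >> r) | (x << (32u - r)); }

/* Assumed definition of Ch: take y where x has a 1 bit, z elsewhere. */
static inline u Ch(u x, u y, u z) { return (x & y) | (~x & z); }

/* Helpers as posted above. */
static inline u Ma(u x, u y, u z) { return Ch(z ^ x, y, x); }
static inline u xrot2(u n, unsigned r1, unsigned r2) { return rot(n, r1) ^ rot(n, r2); }
static inline u xrot3(u n, unsigned r1, unsigned r2, unsigned r3) {
    return xrot2(n, r1, r2) ^ rot(n, r3);
}

/* Hypothetical function form of one round. Vals and W are passed by
   pointer; k stands in for K[n % 64]. */
static inline void sharound_fn(u *Vals, const u *W, u k, unsigned n) {
    assert(n <= 127u); /* keeps (128 - n) from wrapping around */
    u s0 = xrot3(Vals[(128 - n) % 8], 30, 19, 10);
    u s1 = xrot3(Vals[(132 - n) % 8], 26, 21, 7);
    u ch = Ch(Vals[(132 - n) % 8], Vals[(133 - n) % 8], Vals[(134 - n) % 8]);
    u ma = Ma(Vals[(129 - n) % 8], Vals[(130 - n) % 8], Vals[(128 - n) % 8]);
    u t1 = k + Vals[(135 - n) % 8] + W[n] + s1 + ch;
    Vals[(131 - n) % 8] += t1;          /* d += t1            */
    Vals[(135 - n) % 8] = t1 + s0 + ma; /* h = t1 + s0 + maj  */
}

/* Textbook round on s[0..7] = a..h, with right rotations and explicit shifting. */
static void textbook_round(u s[8], u w, u k) {
    u a = s[0], b = s[1], c = s[2], d = s[3];
    u e = s[4], f = s[5], g = s[6], h = s[7];
    u t1 = h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g)) + k + w;
    u t2 = (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c));
    s[7] = g; s[6] = f; s[5] = e; s[4] = d + t1;
    s[3] = c; s[2] = b; s[1] = a; s[0] = t1 + t2;
}

/* Returns 1 if 16 rotating-index rounds match 16 textbook rounds. */
static int rounds_agree(void) {
    u Vals[8], ref[8], W[16];
    for (unsigned i = 0; i < 8; i++) Vals[i] = ref[i] = 0x9e3779b9u * (i + 1);
    for (unsigned i = 0; i < 16; i++) W[i] = 0x85ebca6bu * (i + 3);
    for (unsigned n = 0; n < 16; n++) {
        u k = 0xc2b2ae35u + n; /* arbitrary stand-in, not the real K[n] */
        sharound_fn(Vals, W, k, n);
        textbook_round(ref, W[n], k);
    }
    for (unsigned i = 0; i < 8; i++)
        if (Vals[i] != ref[i]) return 0;
    return 1;
}
```

Because the function only receives pointers and a round number, nothing has to be copied; whether a given OpenCL compiler inlines it as well as the macro version is something to measure rather than assume.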
Also, you can eliminate one extra assignment to Vals[4]:
//Vals[4] = PreVal4;
//...
#ifdef VECTORS
Vals[4] = (W[3] = ((base + get_global_id(0)) << 1) + (uint2)(0, 1)) + PreVal4;
#else
Vals[4] = (W[3] = base + get_global_id(0)) + PreVal4;
#endif
//...
//Vals[4] += W[3];
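For anyone unsure why the fused line is equivalent, here is a small host-side sketch in plain C covering only the non-vector branch. The function names, the scalar typedef for u, and the gid parameter standing in for get_global_id(0) are assumptions for illustration. It relies on the value of an assignment expression being the value that was stored.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u; /* scalar stand-in for the kernel's type (assumption) */

/* Original order: store PreVal4, compute the nonce word, then add it in. */
static u nonce_two_step(u base, u gid, u PreVal4, u *W3) {
    u v4 = PreVal4;   /* Vals[4] = PreVal4;              */
    *W3 = base + gid; /* W[3] = base + get_global_id(0); */
    v4 += *W3;        /* Vals[4] += W[3];                */
    return v4;
}

/* Fused form: one assignment to Vals[4] instead of two. */
static u nonce_fused(u base, u gid, u PreVal4, u *W3) {
    return (*W3 = base + gid) + PreVal4;
}

/* Checks that both forms produce the same Vals[4] and the same W[3]. */
static void check_fused(void) {
    u w_a = 0, w_b = 0;
    u a = nonce_two_step(0xFFFFFFF0u, 0x20u, 0x12345678u, &w_a);
    u b = nonce_fused(0xFFFFFFF0u, 0x20u, 0x12345678u, &w_b);
    assert(a == b && w_a == w_b);
}
```

The inputs in check_fused are chosen so that base + gid wraps around, which both forms handle identically.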