Topic: [ANN][GRS][DMD][DGB] Pallas optimized groestl opencl kernels - page 8. (Read 61242 times)

legendary
Activity: 2716
Merit: 1094
Black Belt Developer

Pallas,

Are you planning on adding myriad-groestl support in the future? If not, could you explain why not? Is it because your groestl kernel is already faster than the myriad-groestl?

Also, are you planning on putting your work on github? Again, if not, could you explain why not?

It seems to me that both are important ways to further your efforts and establish your reputation.

Best regards as always.

HR

Myr-Groestl must do SHA256 as well, IIRC - of course pure Groestl is faster.

myr-groestl should be faster because it has a single round of groestl (14 iterations) + sha; groestlcoin is groestl + groestl again, so slower.
it's just that I do not have enough free time to work on all these algos.....
Now wolf0 just did a fantastic job on whirlpoolx and I want to understand the magic ;-)
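
To put rough numbers on that comparison, here is a minimal C sketch that only counts work; it does not hash anything. It assumes an 80-byte block header as input, Groestl-512 with 1024-bit blocks and padding of a single 1 bit plus a 64-bit block counter, two permutation calls (P and Q) per block plus one more P call in the output transformation, and SHA-256 with 512-bit blocks run over the 64-byte groestl digest. All function names are made up for illustration.

```c
#include <stddef.h>

/* Number of 1024-bit blocks Groestl-512 processes for a message of
 * len bytes: padding adds one 1 bit and a 64-bit block counter. */
static size_t groestl512_blocks(size_t len)
{
    return (8 * len + 65 + 1023) / 1024;
}

/* Permutation calls for one Groestl-512 hash: P and Q once per block,
 * plus one P call in the output transformation. */
static size_t groestl512_perm_calls(size_t len)
{
    return 2 * groestl512_blocks(len) + 1;
}

/* Number of 512-bit compression calls SHA-256 needs for len bytes
 * (padding adds one 1 bit and a 64-bit length field). */
static size_t sha256_blocks(size_t len)
{
    return (8 * len + 65 + 511) / 512;
}

/* groestlcoin (assumed): Groestl-512 over the header, then Groestl-512
 * again over the 64-byte digest. */
static size_t groestlcoin_perm_calls(size_t header_len)
{
    return groestl512_perm_calls(header_len) + groestl512_perm_calls(64);
}

/* myr-groestl (assumed): one Groestl-512 over the header; the SHA-256
 * pass over the 64-byte digest is counted separately by sha256_blocks. */
static size_t myr_groestl_perm_calls(size_t header_len)
{
    return groestl512_perm_calls(header_len);
}
```

Under these assumptions an 80-byte header costs 6 permutation calls (84 rounds) for groestlcoin, against 3 permutation calls (42 rounds) plus 2 SHA-256 compressions for myr-groestl.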

Haha, you ain't seen impressive yet! Check the thread, I'm about to post again!

OMG, this means a lot less reading and TV this week for me LoL!
newbie
Activity: 36
Merit: 0
Hello pallas and thank you for your kernel.
I have groestl code from smelter (the first GPU miner for quark). Maybe it has some tricks that are useful for your work. It was rather fast on the Radeon HD 5xxx series, but it is not adapted for sgminer and I don't have the skills to do that job.
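
For anyone trying to adapt this: the long unrolled groestl_PMIX and groestl_QMIX macros in the listing follow a regular pattern that may be easier to read in loop form. The following is a plain C sketch (not OpenCL, and not smelter's code) that reproduces which src[] word each table lookup reads. It assumes the layout the listing appears to use: src[2*c] holds rows 0..3 of column c, src[2*c+1] holds rows 4..7, and a little-endian device, so that as_uchar4(n).x is the least significant byte. The shift vectors are the Groestl-512 ShiftBytes offsets for P and Q; the helper names ext_byte and mix_src_index are made up here.

```c
#include <stdint.h>

/* ShiftBytes offsets of the wide (1024-bit) permutations, one per row. */
static const int SHIFT_P[8] = {0, 1, 2, 3, 4, 5, 6, 11};
static const int SHIFT_Q[8] = {1, 3, 5, 11, 0, 2, 4, 6};

/* Shift-and-mask equivalent of EXT_BYTE32_k, assuming little-endian. */
static uint32_t ext_byte(uint32_t n, int k)
{
    return (n >> (8 * k)) & 0xffu;
}

/* Index into src[] read by the groestl_T<t> lookup that feeds dst[j].
 * The byte taken from that word is always byte (t & 3). */
static int mix_src_index(const int shift[8], int j, int t)
{
    int col  = j / 2;   /* output column, 0..15 */
    int high = j & 1;   /* 0: word with rows 0..3, 1: word with rows 4..7 */
    /* Odd (high) output words use the tables with their halves swapped,
     * so table t reads state row t ^ 4 there. */
    int row = high ? (t ^ 4) : t;
    int src_col = (col + shift[row]) % 16;
    return 2 * src_col + (row >= 4);
}
```

For example, mix_src_index(SHIFT_P, 0, 7) gives 23, which matches the line `dst[ 0] ^= groestl_T7[groestl_EXT_BYTE_3(src[23])];` in groestl_PMIX, and mix_src_index(SHIFT_Q, 0, 3) gives 22, which matches the groestl_T3 lookup for dst[ 0] in groestl_QMIX.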

Code:
#define CONSTANT __constant
#define LOCAL __local
#define GLOBAL __global
#define RESTRICT restrict
#define GLOBALID (uint)(get_global_id(0))
#define LOCALID get_local_id(0)

#define EXT_BYTE32_0(n) ((uint)(as_uchar4((uint)(n)).x))
#define EXT_BYTE32_1(n) ((uint)(as_uchar4((uint)(n)).y))
#define EXT_BYTE32_2(n) ((uint)(as_uchar4((uint)(n)).z))
#define EXT_BYTE32_3(n) ((uint)(as_uchar4((uint)(n)).w))

#define groestl_EXT_BYTE_0(n) EXT_BYTE32_0(n)
#define groestl_EXT_BYTE_1(n) EXT_BYTE32_1(n)
#define groestl_EXT_BYTE_2(n) EXT_BYTE32_2(n)
#define groestl_EXT_BYTE_3(n) EXT_BYTE32_3(n)


#define groestl_PMIX(src, dst, r)\
src[ 0] ^= (r);\
src[ 2] ^= 0x00000010u^(r);\
src[ 4] ^= 0x00000020u^(r);\
src[ 6] ^= 0x00000030u^(r);\
src[ 8] ^= 0x00000040u^(r);\
src[10] ^= 0x00000050u^(r);\
src[12] ^= 0x00000060u^(r);\
src[14] ^= 0x00000070u^(r);\
src[16] ^= 0x00000080u^(r);\
src[18] ^= 0x00000090u^(r);\
src[20] ^= 0x000000a0u^(r);\
src[22] ^= 0x000000b0u^(r);\
src[24] ^= 0x000000c0u^(r);\
src[26] ^= 0x000000d0u^(r);\
src[28] ^= 0x000000e0u^(r);\
src[30] ^= 0x000000f0u^(r);\
dst[ 0]  = groestl_T0[groestl_EXT_BYTE_0(src[ 0])];\
dst[ 1]  = groestl_T0[groestl_EXT_BYTE_0(src[ 9])];\
dst[ 2]  = groestl_T0[groestl_EXT_BYTE_0(src[ 2])];\
dst[ 3]  = groestl_T0[groestl_EXT_BYTE_0(src[11])];\
dst[ 4]  = groestl_T0[groestl_EXT_BYTE_0(src[ 4])];\
dst[ 5]  = groestl_T0[groestl_EXT_BYTE_0(src[13])];\
dst[ 6]  = groestl_T0[groestl_EXT_BYTE_0(src[ 6])];\
dst[ 7]  = groestl_T0[groestl_EXT_BYTE_0(src[15])];\
dst[ 8]  = groestl_T0[groestl_EXT_BYTE_0(src[ 8])];\
dst[ 9]  = groestl_T0[groestl_EXT_BYTE_0(src[17])];\
dst[10]  = groestl_T0[groestl_EXT_BYTE_0(src[10])];\
dst[11]  = groestl_T0[groestl_EXT_BYTE_0(src[19])];\
dst[12]  = groestl_T0[groestl_EXT_BYTE_0(src[12])];\
dst[13]  = groestl_T0[groestl_EXT_BYTE_0(src[21])];\
dst[14]  = groestl_T0[groestl_EXT_BYTE_0(src[14])];\
dst[15]  = groestl_T0[groestl_EXT_BYTE_0(src[23])];\
dst[16]  = groestl_T0[groestl_EXT_BYTE_0(src[16])];\
dst[17]  = groestl_T0[groestl_EXT_BYTE_0(src[25])];\
dst[18]  = groestl_T0[groestl_EXT_BYTE_0(src[18])];\
dst[19]  = groestl_T0[groestl_EXT_BYTE_0(src[27])];\
dst[20]  = groestl_T0[groestl_EXT_BYTE_0(src[20])];\
dst[21]  = groestl_T0[groestl_EXT_BYTE_0(src[29])];\
dst[22]  = groestl_T0[groestl_EXT_BYTE_0(src[22])];\
dst[23]  = groestl_T0[groestl_EXT_BYTE_0(src[31])];\
dst[24]  = groestl_T0[groestl_EXT_BYTE_0(src[24])];\
dst[25]  = groestl_T0[groestl_EXT_BYTE_0(src[ 1])];\
dst[26]  = groestl_T0[groestl_EXT_BYTE_0(src[26])];\
dst[27]  = groestl_T0[groestl_EXT_BYTE_0(src[ 3])];\
dst[28]  = groestl_T0[groestl_EXT_BYTE_0(src[28])];\
dst[29]  = groestl_T0[groestl_EXT_BYTE_0(src[ 5])];\
dst[30]  = groestl_T0[groestl_EXT_BYTE_0(src[30])];\
dst[31]  = groestl_T0[groestl_EXT_BYTE_0(src[ 7])];\
dst[ 0] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 2])];\
dst[ 1] ^= groestl_T1[groestl_EXT_BYTE_1(src[11])];\
dst[ 2] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 4])];\
dst[ 3] ^= groestl_T1[groestl_EXT_BYTE_1(src[13])];\
dst[ 4] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 6])];\
dst[ 5] ^= groestl_T1[groestl_EXT_BYTE_1(src[15])];\
dst[ 6] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 8])];\
dst[ 7] ^= groestl_T1[groestl_EXT_BYTE_1(src[17])];\
dst[ 8] ^= groestl_T1[groestl_EXT_BYTE_1(src[10])];\
dst[ 9] ^= groestl_T1[groestl_EXT_BYTE_1(src[19])];\
dst[10] ^= groestl_T1[groestl_EXT_BYTE_1(src[12])];\
dst[11] ^= groestl_T1[groestl_EXT_BYTE_1(src[21])];\
dst[12] ^= groestl_T1[groestl_EXT_BYTE_1(src[14])];\
dst[13] ^= groestl_T1[groestl_EXT_BYTE_1(src[23])];\
dst[14] ^= groestl_T1[groestl_EXT_BYTE_1(src[16])];\
dst[15] ^= groestl_T1[groestl_EXT_BYTE_1(src[25])];\
dst[16] ^= groestl_T1[groestl_EXT_BYTE_1(src[18])];\
dst[17] ^= groestl_T1[groestl_EXT_BYTE_1(src[27])];\
dst[18] ^= groestl_T1[groestl_EXT_BYTE_1(src[20])];\
dst[19] ^= groestl_T1[groestl_EXT_BYTE_1(src[29])];\
dst[20] ^= groestl_T1[groestl_EXT_BYTE_1(src[22])];\
dst[21] ^= groestl_T1[groestl_EXT_BYTE_1(src[31])];\
dst[22] ^= groestl_T1[groestl_EXT_BYTE_1(src[24])];\
dst[23] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 1])];\
dst[24] ^= groestl_T1[groestl_EXT_BYTE_1(src[26])];\
dst[25] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 3])];\
dst[26] ^= groestl_T1[groestl_EXT_BYTE_1(src[28])];\
dst[27] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 5])];\
dst[28] ^= groestl_T1[groestl_EXT_BYTE_1(src[30])];\
dst[29] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 7])];\
dst[30] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 0])];\
dst[31] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 9])];\
dst[ 0] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 4])];\
dst[ 1] ^= groestl_T2[groestl_EXT_BYTE_2(src[13])];\
dst[ 2] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 6])];\
dst[ 3] ^= groestl_T2[groestl_EXT_BYTE_2(src[15])];\
dst[ 4] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 8])];\
dst[ 5] ^= groestl_T2[groestl_EXT_BYTE_2(src[17])];\
dst[ 6] ^= groestl_T2[groestl_EXT_BYTE_2(src[10])];\
dst[ 7] ^= groestl_T2[groestl_EXT_BYTE_2(src[19])];\
dst[ 8] ^= groestl_T2[groestl_EXT_BYTE_2(src[12])];\
dst[ 9] ^= groestl_T2[groestl_EXT_BYTE_2(src[21])];\
dst[10] ^= groestl_T2[groestl_EXT_BYTE_2(src[14])];\
dst[11] ^= groestl_T2[groestl_EXT_BYTE_2(src[23])];\
dst[12] ^= groestl_T2[groestl_EXT_BYTE_2(src[16])];\
dst[13] ^= groestl_T2[groestl_EXT_BYTE_2(src[25])];\
dst[14] ^= groestl_T2[groestl_EXT_BYTE_2(src[18])];\
dst[15] ^= groestl_T2[groestl_EXT_BYTE_2(src[27])];\
dst[16] ^= groestl_T2[groestl_EXT_BYTE_2(src[20])];\
dst[17] ^= groestl_T2[groestl_EXT_BYTE_2(src[29])];\
dst[18] ^= groestl_T2[groestl_EXT_BYTE_2(src[22])];\
dst[19] ^= groestl_T2[groestl_EXT_BYTE_2(src[31])];\
dst[20] ^= groestl_T2[groestl_EXT_BYTE_2(src[24])];\
dst[21] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 1])];\
dst[22] ^= groestl_T2[groestl_EXT_BYTE_2(src[26])];\
dst[23] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 3])];\
dst[24] ^= groestl_T2[groestl_EXT_BYTE_2(src[28])];\
dst[25] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 5])];\
dst[26] ^= groestl_T2[groestl_EXT_BYTE_2(src[30])];\
dst[27] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 7])];\
dst[28] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 0])];\
dst[29] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 9])];\
dst[30] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 2])];\
dst[31] ^= groestl_T2[groestl_EXT_BYTE_2(src[11])];\
dst[ 0] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 6])];\
dst[ 1] ^= groestl_T3[groestl_EXT_BYTE_3(src[23])];\
dst[ 2] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 8])];\
dst[ 3] ^= groestl_T3[groestl_EXT_BYTE_3(src[25])];\
dst[ 4] ^= groestl_T3[groestl_EXT_BYTE_3(src[10])];\
dst[ 5] ^= groestl_T3[groestl_EXT_BYTE_3(src[27])];\
dst[ 6] ^= groestl_T3[groestl_EXT_BYTE_3(src[12])];\
dst[ 7] ^= groestl_T3[groestl_EXT_BYTE_3(src[29])];\
dst[ 8] ^= groestl_T3[groestl_EXT_BYTE_3(src[14])];\
dst[ 9] ^= groestl_T3[groestl_EXT_BYTE_3(src[31])];\
dst[10] ^= groestl_T3[groestl_EXT_BYTE_3(src[16])];\
dst[11] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 1])];\
dst[12] ^= groestl_T3[groestl_EXT_BYTE_3(src[18])];\
dst[13] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 3])];\
dst[14] ^= groestl_T3[groestl_EXT_BYTE_3(src[20])];\
dst[15] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 5])];\
dst[16] ^= groestl_T3[groestl_EXT_BYTE_3(src[22])];\
dst[17] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 7])];\
dst[18] ^= groestl_T3[groestl_EXT_BYTE_3(src[24])];\
dst[19] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 9])];\
dst[20] ^= groestl_T3[groestl_EXT_BYTE_3(src[26])];\
dst[21] ^= groestl_T3[groestl_EXT_BYTE_3(src[11])];\
dst[22] ^= groestl_T3[groestl_EXT_BYTE_3(src[28])];\
dst[23] ^= groestl_T3[groestl_EXT_BYTE_3(src[13])];\
dst[24] ^= groestl_T3[groestl_EXT_BYTE_3(src[30])];\
dst[25] ^= groestl_T3[groestl_EXT_BYTE_3(src[15])];\
dst[26] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 0])];\
dst[27] ^= groestl_T3[groestl_EXT_BYTE_3(src[17])];\
dst[28] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 2])];\
dst[29] ^= groestl_T3[groestl_EXT_BYTE_3(src[19])];\
dst[30] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 4])];\
dst[31] ^= groestl_T3[groestl_EXT_BYTE_3(src[21])];\
dst[ 0] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 9])];\
dst[ 1] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 0])];\
dst[ 2] ^= groestl_T4[groestl_EXT_BYTE_0(src[11])];\
dst[ 3] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 2])];\
dst[ 4] ^= groestl_T4[groestl_EXT_BYTE_0(src[13])];\
dst[ 5] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 4])];\
dst[ 6] ^= groestl_T4[groestl_EXT_BYTE_0(src[15])];\
dst[ 7] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 6])];\
dst[ 8] ^= groestl_T4[groestl_EXT_BYTE_0(src[17])];\
dst[ 9] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 8])];\
dst[10] ^= groestl_T4[groestl_EXT_BYTE_0(src[19])];\
dst[11] ^= groestl_T4[groestl_EXT_BYTE_0(src[10])];\
dst[12] ^= groestl_T4[groestl_EXT_BYTE_0(src[21])];\
dst[13] ^= groestl_T4[groestl_EXT_BYTE_0(src[12])];\
dst[14] ^= groestl_T4[groestl_EXT_BYTE_0(src[23])];\
dst[15] ^= groestl_T4[groestl_EXT_BYTE_0(src[14])];\
dst[16] ^= groestl_T4[groestl_EXT_BYTE_0(src[25])];\
dst[17] ^= groestl_T4[groestl_EXT_BYTE_0(src[16])];\
dst[18] ^= groestl_T4[groestl_EXT_BYTE_0(src[27])];\
dst[19] ^= groestl_T4[groestl_EXT_BYTE_0(src[18])];\
dst[20] ^= groestl_T4[groestl_EXT_BYTE_0(src[29])];\
dst[21] ^= groestl_T4[groestl_EXT_BYTE_0(src[20])];\
dst[22] ^= groestl_T4[groestl_EXT_BYTE_0(src[31])];\
dst[23] ^= groestl_T4[groestl_EXT_BYTE_0(src[22])];\
dst[24] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 1])];\
dst[25] ^= groestl_T4[groestl_EXT_BYTE_0(src[24])];\
dst[26] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 3])];\
dst[27] ^= groestl_T4[groestl_EXT_BYTE_0(src[26])];\
dst[28] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 5])];\
dst[29] ^= groestl_T4[groestl_EXT_BYTE_0(src[28])];\
dst[30] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 7])];\
dst[31] ^= groestl_T4[groestl_EXT_BYTE_0(src[30])];\
dst[ 0] ^= groestl_T5[groestl_EXT_BYTE_1(src[11])];\
dst[ 1] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 2])];\
dst[ 2] ^= groestl_T5[groestl_EXT_BYTE_1(src[13])];\
dst[ 3] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 4])];\
dst[ 4] ^= groestl_T5[groestl_EXT_BYTE_1(src[15])];\
dst[ 5] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 6])];\
dst[ 6] ^= groestl_T5[groestl_EXT_BYTE_1(src[17])];\
dst[ 7] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 8])];\
dst[ 8] ^= groestl_T5[groestl_EXT_BYTE_1(src[19])];\
dst[ 9] ^= groestl_T5[groestl_EXT_BYTE_1(src[10])];\
dst[10] ^= groestl_T5[groestl_EXT_BYTE_1(src[21])];\
dst[11] ^= groestl_T5[groestl_EXT_BYTE_1(src[12])];\
dst[12] ^= groestl_T5[groestl_EXT_BYTE_1(src[23])];\
dst[13] ^= groestl_T5[groestl_EXT_BYTE_1(src[14])];\
dst[14] ^= groestl_T5[groestl_EXT_BYTE_1(src[25])];\
dst[15] ^= groestl_T5[groestl_EXT_BYTE_1(src[16])];\
dst[16] ^= groestl_T5[groestl_EXT_BYTE_1(src[27])];\
dst[17] ^= groestl_T5[groestl_EXT_BYTE_1(src[18])];\
dst[18] ^= groestl_T5[groestl_EXT_BYTE_1(src[29])];\
dst[19] ^= groestl_T5[groestl_EXT_BYTE_1(src[20])];\
dst[20] ^= groestl_T5[groestl_EXT_BYTE_1(src[31])];\
dst[21] ^= groestl_T5[groestl_EXT_BYTE_1(src[22])];\
dst[22] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 1])];\
dst[23] ^= groestl_T5[groestl_EXT_BYTE_1(src[24])];\
dst[24] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 3])];\
dst[25] ^= groestl_T5[groestl_EXT_BYTE_1(src[26])];\
dst[26] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 5])];\
dst[27] ^= groestl_T5[groestl_EXT_BYTE_1(src[28])];\
dst[28] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 7])];\
dst[29] ^= groestl_T5[groestl_EXT_BYTE_1(src[30])];\
dst[30] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 9])];\
dst[31] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 0])];\
dst[ 0] ^= groestl_T6[groestl_EXT_BYTE_2(src[13])];\
dst[ 1] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 4])];\
dst[ 2] ^= groestl_T6[groestl_EXT_BYTE_2(src[15])];\
dst[ 3] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 6])];\
dst[ 4] ^= groestl_T6[groestl_EXT_BYTE_2(src[17])];\
dst[ 5] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 8])];\
dst[ 6] ^= groestl_T6[groestl_EXT_BYTE_2(src[19])];\
dst[ 7] ^= groestl_T6[groestl_EXT_BYTE_2(src[10])];\
dst[ 8] ^= groestl_T6[groestl_EXT_BYTE_2(src[21])];\
dst[ 9] ^= groestl_T6[groestl_EXT_BYTE_2(src[12])];\
dst[10] ^= groestl_T6[groestl_EXT_BYTE_2(src[23])];\
dst[11] ^= groestl_T6[groestl_EXT_BYTE_2(src[14])];\
dst[12] ^= groestl_T6[groestl_EXT_BYTE_2(src[25])];\
dst[13] ^= groestl_T6[groestl_EXT_BYTE_2(src[16])];\
dst[14] ^= groestl_T6[groestl_EXT_BYTE_2(src[27])];\
dst[15] ^= groestl_T6[groestl_EXT_BYTE_2(src[18])];\
dst[16] ^= groestl_T6[groestl_EXT_BYTE_2(src[29])];\
dst[17] ^= groestl_T6[groestl_EXT_BYTE_2(src[20])];\
dst[18] ^= groestl_T6[groestl_EXT_BYTE_2(src[31])];\
dst[19] ^= groestl_T6[groestl_EXT_BYTE_2(src[22])];\
dst[20] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 1])];\
dst[21] ^= groestl_T6[groestl_EXT_BYTE_2(src[24])];\
dst[22] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 3])];\
dst[23] ^= groestl_T6[groestl_EXT_BYTE_2(src[26])];\
dst[24] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 5])];\
dst[25] ^= groestl_T6[groestl_EXT_BYTE_2(src[28])];\
dst[26] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 7])];\
dst[27] ^= groestl_T6[groestl_EXT_BYTE_2(src[30])];\
dst[28] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 9])];\
dst[29] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 0])];\
dst[30] ^= groestl_T6[groestl_EXT_BYTE_2(src[11])];\
dst[31] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 2])];\
dst[ 0] ^= groestl_T7[groestl_EXT_BYTE_3(src[23])];\
dst[ 1] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 6])];\
dst[ 2] ^= groestl_T7[groestl_EXT_BYTE_3(src[25])];\
dst[ 3] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 8])];\
dst[ 4] ^= groestl_T7[groestl_EXT_BYTE_3(src[27])];\
dst[ 5] ^= groestl_T7[groestl_EXT_BYTE_3(src[10])];\
dst[ 6] ^= groestl_T7[groestl_EXT_BYTE_3(src[29])];\
dst[ 7] ^= groestl_T7[groestl_EXT_BYTE_3(src[12])];\
dst[ 8] ^= groestl_T7[groestl_EXT_BYTE_3(src[31])];\
dst[ 9] ^= groestl_T7[groestl_EXT_BYTE_3(src[14])];\
dst[10] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 1])];\
dst[11] ^= groestl_T7[groestl_EXT_BYTE_3(src[16])];\
dst[12] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 3])];\
dst[13] ^= groestl_T7[groestl_EXT_BYTE_3(src[18])];\
dst[14] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 5])];\
dst[15] ^= groestl_T7[groestl_EXT_BYTE_3(src[20])];\
dst[16] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 7])];\
dst[17] ^= groestl_T7[groestl_EXT_BYTE_3(src[22])];\
dst[18] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 9])];\
dst[19] ^= groestl_T7[groestl_EXT_BYTE_3(src[24])];\
dst[20] ^= groestl_T7[groestl_EXT_BYTE_3(src[11])];\
dst[21] ^= groestl_T7[groestl_EXT_BYTE_3(src[26])];\
dst[22] ^= groestl_T7[groestl_EXT_BYTE_3(src[13])];\
dst[23] ^= groestl_T7[groestl_EXT_BYTE_3(src[28])];\
dst[24] ^= groestl_T7[groestl_EXT_BYTE_3(src[15])];\
dst[25] ^= groestl_T7[groestl_EXT_BYTE_3(src[30])];\
dst[26] ^= groestl_T7[groestl_EXT_BYTE_3(src[17])];\
dst[27] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 0])];\
dst[28] ^= groestl_T7[groestl_EXT_BYTE_3(src[19])];\
dst[29] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 2])];\
dst[30] ^= groestl_T7[groestl_EXT_BYTE_3(src[21])];\
dst[31] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 4])];

#define groestl_QMIX(src, dst, r)\
src[ 0] = ~src[ 0];\
src[ 1] ^= ~(r);\
src[ 2] = ~src[ 2];\
src[ 3] ^= 0xefffffffu^(r);\
src[ 4] = ~src[ 4];\
src[ 5] ^= 0xdfffffffu^(r);\
src[ 6] = ~src[ 6];\
src[ 7] ^= 0xcfffffffu^(r);\
src[ 8] = ~src[ 8];\
src[ 9] ^= 0xbfffffffu^(r);\
src[10] = ~src[10];\
src[11] ^= 0xafffffffu^(r);\
src[12] = ~src[12];\
src[13] ^= 0x9fffffffu^(r);\
src[14] = ~src[14];\
src[15] ^= 0x8fffffffu^(r);\
src[16] = ~src[16];\
src[17] ^= 0x7fffffffu^(r);\
src[18] = ~src[18];\
src[19] ^= 0x6fffffffu^(r);\
src[20] = ~src[20];\
src[21] ^= 0x5fffffffu^(r);\
src[22] = ~src[22];\
src[23] ^= 0x4fffffffu^(r);\
src[24] = ~src[24];\
src[25] ^= 0x3fffffffu^(r);\
src[26] = ~src[26];\
src[27] ^= 0x2fffffffu^(r);\
src[28] = ~src[28];\
src[29] ^= 0x1fffffffu^(r);\
src[30] = ~src[30];\
src[31] ^= 0x0fffffffu^(r);\
dst[ 0]  = groestl_T0[groestl_EXT_BYTE_0(src[ 2])];\
dst[ 1]  = groestl_T0[groestl_EXT_BYTE_0(src[ 1])];\
dst[ 2]  = groestl_T0[groestl_EXT_BYTE_0(src[ 4])];\
dst[ 3]  = groestl_T0[groestl_EXT_BYTE_0(src[ 3])];\
dst[ 4]  = groestl_T0[groestl_EXT_BYTE_0(src[ 6])];\
dst[ 5]  = groestl_T0[groestl_EXT_BYTE_0(src[ 5])];\
dst[ 6]  = groestl_T0[groestl_EXT_BYTE_0(src[ 8])];\
dst[ 7]  = groestl_T0[groestl_EXT_BYTE_0(src[ 7])];\
dst[ 8]  = groestl_T0[groestl_EXT_BYTE_0(src[10])];\
dst[ 9]  = groestl_T0[groestl_EXT_BYTE_0(src[ 9])];\
dst[10]  = groestl_T0[groestl_EXT_BYTE_0(src[12])];\
dst[11]  = groestl_T0[groestl_EXT_BYTE_0(src[11])];\
dst[12]  = groestl_T0[groestl_EXT_BYTE_0(src[14])];\
dst[13]  = groestl_T0[groestl_EXT_BYTE_0(src[13])];\
dst[14]  = groestl_T0[groestl_EXT_BYTE_0(src[16])];\
dst[15]  = groestl_T0[groestl_EXT_BYTE_0(src[15])];\
dst[16]  = groestl_T0[groestl_EXT_BYTE_0(src[18])];\
dst[17]  = groestl_T0[groestl_EXT_BYTE_0(src[17])];\
dst[18]  = groestl_T0[groestl_EXT_BYTE_0(src[20])];\
dst[19]  = groestl_T0[groestl_EXT_BYTE_0(src[19])];\
dst[20]  = groestl_T0[groestl_EXT_BYTE_0(src[22])];\
dst[21]  = groestl_T0[groestl_EXT_BYTE_0(src[21])];\
dst[22]  = groestl_T0[groestl_EXT_BYTE_0(src[24])];\
dst[23]  = groestl_T0[groestl_EXT_BYTE_0(src[23])];\
dst[24]  = groestl_T0[groestl_EXT_BYTE_0(src[26])];\
dst[25]  = groestl_T0[groestl_EXT_BYTE_0(src[25])];\
dst[26]  = groestl_T0[groestl_EXT_BYTE_0(src[28])];\
dst[27]  = groestl_T0[groestl_EXT_BYTE_0(src[27])];\
dst[28]  = groestl_T0[groestl_EXT_BYTE_0(src[30])];\
dst[29]  = groestl_T0[groestl_EXT_BYTE_0(src[29])];\
dst[30]  = groestl_T0[groestl_EXT_BYTE_0(src[ 0])];\
dst[31]  = groestl_T0[groestl_EXT_BYTE_0(src[31])];\
dst[ 0] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 6])];\
dst[ 1] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 5])];\
dst[ 2] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 8])];\
dst[ 3] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 7])];\
dst[ 4] ^= groestl_T1[groestl_EXT_BYTE_1(src[10])];\
dst[ 5] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 9])];\
dst[ 6] ^= groestl_T1[groestl_EXT_BYTE_1(src[12])];\
dst[ 7] ^= groestl_T1[groestl_EXT_BYTE_1(src[11])];\
dst[ 8] ^= groestl_T1[groestl_EXT_BYTE_1(src[14])];\
dst[ 9] ^= groestl_T1[groestl_EXT_BYTE_1(src[13])];\
dst[10] ^= groestl_T1[groestl_EXT_BYTE_1(src[16])];\
dst[11] ^= groestl_T1[groestl_EXT_BYTE_1(src[15])];\
dst[12] ^= groestl_T1[groestl_EXT_BYTE_1(src[18])];\
dst[13] ^= groestl_T1[groestl_EXT_BYTE_1(src[17])];\
dst[14] ^= groestl_T1[groestl_EXT_BYTE_1(src[20])];\
dst[15] ^= groestl_T1[groestl_EXT_BYTE_1(src[19])];\
dst[16] ^= groestl_T1[groestl_EXT_BYTE_1(src[22])];\
dst[17] ^= groestl_T1[groestl_EXT_BYTE_1(src[21])];\
dst[18] ^= groestl_T1[groestl_EXT_BYTE_1(src[24])];\
dst[19] ^= groestl_T1[groestl_EXT_BYTE_1(src[23])];\
dst[20] ^= groestl_T1[groestl_EXT_BYTE_1(src[26])];\
dst[21] ^= groestl_T1[groestl_EXT_BYTE_1(src[25])];\
dst[22] ^= groestl_T1[groestl_EXT_BYTE_1(src[28])];\
dst[23] ^= groestl_T1[groestl_EXT_BYTE_1(src[27])];\
dst[24] ^= groestl_T1[groestl_EXT_BYTE_1(src[30])];\
dst[25] ^= groestl_T1[groestl_EXT_BYTE_1(src[29])];\
dst[26] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 0])];\
dst[27] ^= groestl_T1[groestl_EXT_BYTE_1(src[31])];\
dst[28] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 2])];\
dst[29] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 1])];\
dst[30] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 4])];\
dst[31] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 3])];\
dst[ 0] ^= groestl_T2[groestl_EXT_BYTE_2(src[10])];\
dst[ 1] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 9])];\
dst[ 2] ^= groestl_T2[groestl_EXT_BYTE_2(src[12])];\
dst[ 3] ^= groestl_T2[groestl_EXT_BYTE_2(src[11])];\
dst[ 4] ^= groestl_T2[groestl_EXT_BYTE_2(src[14])];\
dst[ 5] ^= groestl_T2[groestl_EXT_BYTE_2(src[13])];\
dst[ 6] ^= groestl_T2[groestl_EXT_BYTE_2(src[16])];\
dst[ 7] ^= groestl_T2[groestl_EXT_BYTE_2(src[15])];\
dst[ 8] ^= groestl_T2[groestl_EXT_BYTE_2(src[18])];\
dst[ 9] ^= groestl_T2[groestl_EXT_BYTE_2(src[17])];\
dst[10] ^= groestl_T2[groestl_EXT_BYTE_2(src[20])];\
dst[11] ^= groestl_T2[groestl_EXT_BYTE_2(src[19])];\
dst[12] ^= groestl_T2[groestl_EXT_BYTE_2(src[22])];\
dst[13] ^= groestl_T2[groestl_EXT_BYTE_2(src[21])];\
dst[14] ^= groestl_T2[groestl_EXT_BYTE_2(src[24])];\
dst[15] ^= groestl_T2[groestl_EXT_BYTE_2(src[23])];\
dst[16] ^= groestl_T2[groestl_EXT_BYTE_2(src[26])];\
dst[17] ^= groestl_T2[groestl_EXT_BYTE_2(src[25])];\
dst[18] ^= groestl_T2[groestl_EXT_BYTE_2(src[28])];\
dst[19] ^= groestl_T2[groestl_EXT_BYTE_2(src[27])];\
dst[20] ^= groestl_T2[groestl_EXT_BYTE_2(src[30])];\
dst[21] ^= groestl_T2[groestl_EXT_BYTE_2(src[29])];\
dst[22] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 0])];\
dst[23] ^= groestl_T2[groestl_EXT_BYTE_2(src[31])];\
dst[24] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 2])];\
dst[25] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 1])];\
dst[26] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 4])];\
dst[27] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 3])];\
dst[28] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 6])];\
dst[29] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 5])];\
dst[30] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 8])];\
dst[31] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 7])];\
dst[ 0] ^= groestl_T3[groestl_EXT_BYTE_3(src[22])];\
dst[ 1] ^= groestl_T3[groestl_EXT_BYTE_3(src[13])];\
dst[ 2] ^= groestl_T3[groestl_EXT_BYTE_3(src[24])];\
dst[ 3] ^= groestl_T3[groestl_EXT_BYTE_3(src[15])];\
dst[ 4] ^= groestl_T3[groestl_EXT_BYTE_3(src[26])];\
dst[ 5] ^= groestl_T3[groestl_EXT_BYTE_3(src[17])];\
dst[ 6] ^= groestl_T3[groestl_EXT_BYTE_3(src[28])];\
dst[ 7] ^= groestl_T3[groestl_EXT_BYTE_3(src[19])];\
dst[ 8] ^= groestl_T3[groestl_EXT_BYTE_3(src[30])];\
dst[ 9] ^= groestl_T3[groestl_EXT_BYTE_3(src[21])];\
dst[10] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 0])];\
dst[11] ^= groestl_T3[groestl_EXT_BYTE_3(src[23])];\
dst[12] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 2])];\
dst[13] ^= groestl_T3[groestl_EXT_BYTE_3(src[25])];\
dst[14] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 4])];\
dst[15] ^= groestl_T3[groestl_EXT_BYTE_3(src[27])];\
dst[16] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 6])];\
dst[17] ^= groestl_T3[groestl_EXT_BYTE_3(src[29])];\
dst[18] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 8])];\
dst[19] ^= groestl_T3[groestl_EXT_BYTE_3(src[31])];\
dst[20] ^= groestl_T3[groestl_EXT_BYTE_3(src[10])];\
dst[21] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 1])];\
dst[22] ^= groestl_T3[groestl_EXT_BYTE_3(src[12])];\
dst[23] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 3])];\
dst[24] ^= groestl_T3[groestl_EXT_BYTE_3(src[14])];\
dst[25] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 5])];\
dst[26] ^= groestl_T3[groestl_EXT_BYTE_3(src[16])];\
dst[27] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 7])];\
dst[28] ^= groestl_T3[groestl_EXT_BYTE_3(src[18])];\
dst[29] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 9])];\
dst[30] ^= groestl_T3[groestl_EXT_BYTE_3(src[20])];\
dst[31] ^= groestl_T3[groestl_EXT_BYTE_3(src[11])];\
dst[ 0] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 1])];\
dst[ 1] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 2])];\
dst[ 2] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 3])];\
dst[ 3] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 4])];\
dst[ 4] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 5])];\
dst[ 5] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 6])];\
dst[ 6] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 7])];\
dst[ 7] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 8])];\
dst[ 8] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 9])];\
dst[ 9] ^= groestl_T4[groestl_EXT_BYTE_0(src[10])];\
dst[10] ^= groestl_T4[groestl_EXT_BYTE_0(src[11])];\
dst[11] ^= groestl_T4[groestl_EXT_BYTE_0(src[12])];\
dst[12] ^= groestl_T4[groestl_EXT_BYTE_0(src[13])];\
dst[13] ^= groestl_T4[groestl_EXT_BYTE_0(src[14])];\
dst[14] ^= groestl_T4[groestl_EXT_BYTE_0(src[15])];\
dst[15] ^= groestl_T4[groestl_EXT_BYTE_0(src[16])];\
dst[16] ^= groestl_T4[groestl_EXT_BYTE_0(src[17])];\
dst[17] ^= groestl_T4[groestl_EXT_BYTE_0(src[18])];\
dst[18] ^= groestl_T4[groestl_EXT_BYTE_0(src[19])];\
dst[19] ^= groestl_T4[groestl_EXT_BYTE_0(src[20])];\
dst[20] ^= groestl_T4[groestl_EXT_BYTE_0(src[21])];\
dst[21] ^= groestl_T4[groestl_EXT_BYTE_0(src[22])];\
dst[22] ^= groestl_T4[groestl_EXT_BYTE_0(src[23])];\
dst[23] ^= groestl_T4[groestl_EXT_BYTE_0(src[24])];\
dst[24] ^= groestl_T4[groestl_EXT_BYTE_0(src[25])];\
dst[25] ^= groestl_T4[groestl_EXT_BYTE_0(src[26])];\
dst[26] ^= groestl_T4[groestl_EXT_BYTE_0(src[27])];\
dst[27] ^= groestl_T4[groestl_EXT_BYTE_0(src[28])];\
dst[28] ^= groestl_T4[groestl_EXT_BYTE_0(src[29])];\
dst[29] ^= groestl_T4[groestl_EXT_BYTE_0(src[30])];\
dst[30] ^= groestl_T4[groestl_EXT_BYTE_0(src[31])];\
dst[31] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 0])];\
dst[ 0] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 5])];\
dst[ 1] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 6])];\
dst[ 2] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 7])];\
dst[ 3] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 8])];\
dst[ 4] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 9])];\
dst[ 5] ^= groestl_T5[groestl_EXT_BYTE_1(src[10])];\
dst[ 6] ^= groestl_T5[groestl_EXT_BYTE_1(src[11])];\
dst[ 7] ^= groestl_T5[groestl_EXT_BYTE_1(src[12])];\
dst[ 8] ^= groestl_T5[groestl_EXT_BYTE_1(src[13])];\
dst[ 9] ^= groestl_T5[groestl_EXT_BYTE_1(src[14])];\
dst[10] ^= groestl_T5[groestl_EXT_BYTE_1(src[15])];\
dst[11] ^= groestl_T5[groestl_EXT_BYTE_1(src[16])];\
dst[12] ^= groestl_T5[groestl_EXT_BYTE_1(src[17])];\
dst[13] ^= groestl_T5[groestl_EXT_BYTE_1(src[18])];\
dst[14] ^= groestl_T5[groestl_EXT_BYTE_1(src[19])];\
dst[15] ^= groestl_T5[groestl_EXT_BYTE_1(src[20])];\
dst[16] ^= groestl_T5[groestl_EXT_BYTE_1(src[21])];\
dst[17] ^= groestl_T5[groestl_EXT_BYTE_1(src[22])];\
dst[18] ^= groestl_T5[groestl_EXT_BYTE_1(src[23])];\
dst[19] ^= groestl_T5[groestl_EXT_BYTE_1(src[24])];\
dst[20] ^= groestl_T5[groestl_EXT_BYTE_1(src[25])];\
dst[21] ^= groestl_T5[groestl_EXT_BYTE_1(src[26])];\
dst[22] ^= groestl_T5[groestl_EXT_BYTE_1(src[27])];\
dst[23] ^= groestl_T5[groestl_EXT_BYTE_1(src[28])];\
dst[24] ^= groestl_T5[groestl_EXT_BYTE_1(src[29])];\
dst[25] ^= groestl_T5[groestl_EXT_BYTE_1(src[30])];\
dst[26] ^= groestl_T5[groestl_EXT_BYTE_1(src[31])];\
dst[27] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 0])];\
dst[28] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 1])];\
dst[29] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 2])];\
dst[30] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 3])];\
dst[31] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 4])];\
dst[ 0] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 9])];\
dst[ 1] ^= groestl_T6[groestl_EXT_BYTE_2(src[10])];\
dst[ 2] ^= groestl_T6[groestl_EXT_BYTE_2(src[11])];\
dst[ 3] ^= groestl_T6[groestl_EXT_BYTE_2(src[12])];\
dst[ 4] ^= groestl_T6[groestl_EXT_BYTE_2(src[13])];\
dst[ 5] ^= groestl_T6[groestl_EXT_BYTE_2(src[14])];\
dst[ 6] ^= groestl_T6[groestl_EXT_BYTE_2(src[15])];\
dst[ 7] ^= groestl_T6[groestl_EXT_BYTE_2(src[16])];\
dst[ 8] ^= groestl_T6[groestl_EXT_BYTE_2(src[17])];\
dst[ 9] ^= groestl_T6[groestl_EXT_BYTE_2(src[18])];\
dst[10] ^= groestl_T6[groestl_EXT_BYTE_2(src[19])];\
dst[11] ^= groestl_T6[groestl_EXT_BYTE_2(src[20])];\
dst[12] ^= groestl_T6[groestl_EXT_BYTE_2(src[21])];\
dst[13] ^= groestl_T6[groestl_EXT_BYTE_2(src[22])];\
dst[14] ^= groestl_T6[groestl_EXT_BYTE_2(src[23])];\
dst[15] ^= groestl_T6[groestl_EXT_BYTE_2(src[24])];\
dst[16] ^= groestl_T6[groestl_EXT_BYTE_2(src[25])];\
dst[17] ^= groestl_T6[groestl_EXT_BYTE_2(src[26])];\
dst[18] ^= groestl_T6[groestl_EXT_BYTE_2(src[27])];\
dst[19] ^= groestl_T6[groestl_EXT_BYTE_2(src[28])];\
dst[20] ^= groestl_T6[groestl_EXT_BYTE_2(src[29])];\
dst[21] ^= groestl_T6[groestl_EXT_BYTE_2(src[30])];\
dst[22] ^= groestl_T6[groestl_EXT_BYTE_2(src[31])];\
dst[23] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 0])];\
dst[24] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 1])];\
dst[25] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 2])];\
dst[26] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 3])];\
dst[27] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 4])];\
dst[28] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 5])];\
dst[29] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 6])];\
dst[30] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 7])];\
dst[31] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 8])];\
dst[ 0] ^= groestl_T7[groestl_EXT_BYTE_3(src[13])];\
dst[ 1] ^= groestl_T7[groestl_EXT_BYTE_3(src[22])];\
dst[ 2] ^= groestl_T7[groestl_EXT_BYTE_3(src[15])];\
dst[ 3] ^= groestl_T7[groestl_EXT_BYTE_3(src[24])];\
dst[ 4] ^= groestl_T7[groestl_EXT_BYTE_3(src[17])];\
dst[ 5] ^= groestl_T7[groestl_EXT_BYTE_3(src[26])];\
dst[ 6] ^= groestl_T7[groestl_EXT_BYTE_3(src[19])];\
dst[ 7] ^= groestl_T7[groestl_EXT_BYTE_3(src[28])];\
dst[ 8] ^= groestl_T7[groestl_EXT_BYTE_3(src[21])];\
dst[ 9] ^= groestl_T7[groestl_EXT_BYTE_3(src[30])];\
dst[10] ^= groestl_T7[groestl_EXT_BYTE_3(src[23])];\
dst[11] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 0])];\
dst[12] ^= groestl_T7[groestl_EXT_BYTE_3(src[25])];\
dst[13] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 2])];\
dst[14] ^= groestl_T7[groestl_EXT_BYTE_3(src[27])];\
dst[15] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 4])];\
dst[16] ^= groestl_T7[groestl_EXT_BYTE_3(src[29])];\
dst[17] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 6])];\
dst[18] ^= groestl_T7[groestl_EXT_BYTE_3(src[31])];\
dst[19] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 8])];\
dst[20] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 1])];\
dst[21] ^= groestl_T7[groestl_EXT_BYTE_3(src[10])];\
dst[22] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 3])];\
dst[23] ^= groestl_T7[groestl_EXT_BYTE_3(src[12])];\
dst[24] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 5])];\
dst[25] ^= groestl_T7[groestl_EXT_BYTE_3(src[14])];\
dst[26] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 7])];\
dst[27] ^= groestl_T7[groestl_EXT_BYTE_3(src[16])];\
dst[28] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 9])];\
dst[29] ^= groestl_T7[groestl_EXT_BYTE_3(src[18])];\
dst[30] ^= groestl_T7[groestl_EXT_BYTE_3(src[11])];\
dst[31] ^= groestl_T7[groestl_EXT_BYTE_3(src[20])];

// global
const CONSTANT UINT32 groestl_T_init[256*8] =
{
0xa5f432c6UL,0x84976ff8UL,0x99b05eeeUL,0x8d8c7af6UL,0x0d17e8ffUL,0xbddc0ad6UL,0xb1c816deUL,0x54fc6d91UL,0x50f09060UL,0x03050702UL,0xa9e02eceUL,0x7d87d156UL,0x192bcce7UL,0x62a613b5UL,0xe6317c4dUL,0x9ab559ecUL,0x45cf408fUL,0x9dbca31fUL,0x40c04989UL,0x879268faUL,0x153fd0efUL,0xeb2694b2UL,0xc940ce8eUL,0x0b1de6fbUL,0xec2f6e41UL,0x67a91ab3UL,0xfd1c435fUL,0xea256045UL,0xbfdaf923UL,0xf7025153UL,0x96a145e4UL,0x5bed769bUL,0xc25d2875UL,0x1c24c5e1UL,0xaee9d43dUL,0x6abef24cUL,0x5aee826cUL,0x41c3bd7eUL,0x0206f3f5UL,0x4fd15283UL,0x5ce48c68UL,0xf4075651UL,0x345c8dd1UL,0x0818e1f9UL,0x93ae4ce2UL,0x73953eabUL,0x53f59762UL,0x3f416b2aUL,0x0c141c08UL,0x52f66395UL,0x65afe946UL,0x5ee27f9dUL,0x28784830UL,0xa1f8cf37UL,0x0f111b0aUL,0xb5c4eb2fUL,0x091b150eUL,0x365a7e24UL,0x9bb6ad1bUL,0x3d4798dfUL,0x266aa7cdUL,0x69bbf54eUL,0xcd4c337fUL,0x9fba50eaUL,0x1b2d3f12UL,0x9eb9a41dUL,0x749cc458UL,0x2e724634UL,0x2d774136UL,0xb2cd11dcUL,0xee299db4UL,0xfb164d5bUL,0xf601a5a4UL,0x4dd7a176UL,0x61a314b7UL,0xce49347dUL,0x7b8ddf52UL,0x3e429fddUL,0x7193cd5eUL,0x97a2b113UL,0xf504a2a6UL,0x68b801b9UL,0x00000000UL,0x2c74b5c1UL,0x60a0e040UL,0x1f21c2e3UL,0xc8433a79UL,0xed2c9ab6UL,0xbed90dd4UL,0x46ca478dUL,0xd9701767UL,0x4bddaf72UL,0xde79ed94UL,0xd467ff98UL,0xe82393b0UL,0x4ade5b85UL,0x6bbd06bbUL,0x2a7ebbc5UL,0xe5347b4fUL,0x163ad7edUL,0xc554d286UL,0xd762f89aUL,0x55ff9966UL,0x94a7b611UL,0xcf4ac08aUL,0x1030d9e9UL,0x060a0e04UL,0x819866feUL,0xf00baba0UL,0x44ccb478UL,0xbad5f025UL,0xe33e754bUL,0xf30eaca2UL,0xfe19445dUL,0xc05bdb80UL,0x8a858005UL,0xadecd33fUL,0xbcdffe21UL,0x48d8a870UL,0x040cfdf1UL,0xdf7a1963UL,0xc1582f77UL,0x759f30afUL,0x63a5e742UL,0x30507020UL,0x1a2ecbe5UL,0x0e12effdUL,0x6db708bfUL,0x4cd45581UL,0x143c2418UL,0x355f7926UL,0x2f71b2c3UL,0xe13886beUL,0xa2fdc835UL,0xcc4fc788UL,0x394b652eUL,0x57f96a93UL,0xf20d5855UL,0x829d61fcUL,0x47c9b37aUL,0xacef27c8UL,0xe73288baUL,0x2b7d4f32UL,0x95a442e6UL,0xa0fb3bc0UL,0x98b3aa19UL,0xd168f69eUL,0x7f8122a3UL,0x66aaee44UL,0x7e82d654UL,0xabe6dd3bUL,0x839e950bUL,0xca45c98cUL,0x297bbcc7UL,
0xd36e056bUL,0x3c446c28UL,0x798b2ca7UL,0xe23d81bcUL,0x1d273116UL,0x769a37adUL,0x3b4d96dbUL,0x56fa9e64UL,0x4ed2a674UL,0x1e223614UL,0xdb76e492UL,0x0a1e120cUL,0x6cb4fc48UL,0xe4378fb8UL,0x5de7789fUL,0x6eb20fbdUL,0xef2a6943UL,0xa6f135c4UL,0xa8e3da39UL,0xa4f7c631UL,0x37598ad3UL,0x8b8674f2UL,0x325683d5UL,0x43c54e8bUL,0x59eb856eUL,0xb7c218daUL,0x8c8f8e01UL,0x64ac1db1UL,0xd26df19cUL,0xe03b7249UL,0xb4c71fd8UL,0xfa15b9acUL,0x0709faf3UL,0x256fa0cfUL,0xafea20caUL,0x8e897df4UL,0xe9206747UL,0x18283810UL,0xd5640b6fUL,0x888373f0UL,0x6fb1fb4aUL,0x7296ca5cUL,0x246c5438UL,0xf1085f57UL,0xc7522173UL,0x51f36497UL,0x2365aecbUL,0x7c8425a1UL,0x9cbf57e8UL,0x21635d3eUL,0xdd7cea96UL,0xdc7f1e61UL,0x86919c0dUL,0x85949b0fUL,0x90ab4be0UL,0x42c6ba7cUL,0xc4572671UL,0xaae529ccUL,0xd873e390UL,0x050f0906UL,0x0103f4f7UL,0x12362a1cUL,0xa3fe3cc2UL,0x5fe18b6aUL,0xf910beaeUL,0xd06b0269UL,0x91a8bf17UL,0x58e87199UL,0x2769533aUL,0xb9d0f727UL,0x384891d9UL,0x1335deebUL,0xb3cee52bUL,0x33557722UL,0xbbd604d2UL,0x709039a9UL,0x89808707UL,0xa7f2c133UL,0xb6c1ec2dUL,0x22665a3cUL,0x92adb815UL,0x2060a9c9UL,0x49db5c87UL,0xff1ab0aaUL,0x7888d850UL,0x7a8e2ba5UL,0x8f8a8903UL,0xf8134a59UL,0x809b9209UL,0x1739231aUL,0xda751065UL,0x315384d7UL,0xc651d584UL,0xb8d303d0UL,0xc35edc82UL,0xb0cbe229UL,0x7799c35aUL,0x11332d1eUL,0xcb463d7bUL,0xfc1fb7a8UL,0xd6610c6dUL,0x3a4e622cUL,
0xf432c6c6UL,0x976ff8f8UL,0xb05eeeeeUL,0x8c7af6f6UL,0x17e8ffffUL,0xdc0ad6d6UL,0xc816dedeUL,0xfc6d9191UL,0xf0906060UL,0x05070202UL,0xe02ececeUL,0x87d15656UL,0x2bcce7e7UL,0xa613b5b5UL,0x317c4d4dUL,0xb559ececUL,0xcf408f8fUL,0xbca31f1fUL,0xc0498989UL,0x9268fafaUL,0x3fd0efefUL,0x2694b2b2UL,0x40ce8e8eUL,0x1de6fbfbUL,0x2f6e4141UL,0xa91ab3b3UL,0x1c435f5fUL,0x25604545UL,0xdaf92323UL,0x02515353UL,0xa145e4e4UL,0xed769b9bUL,0x5d287575UL,0x24c5e1e1UL,0xe9d43d3dUL,0xbef24c4cUL,0xee826c6cUL,0xc3bd7e7eUL,0x06f3f5f5UL,0xd1528383UL,0xe48c6868UL,0x07565151UL,0x5c8dd1d1UL,0x18e1f9f9UL,0xae4ce2e2UL,0x953eababUL,0xf5976262UL,0x416b2a2aUL,0x141c0808UL,0xf6639595UL,0xafe94646UL,0xe27f9d9dUL,0x78483030UL,0xf8cf3737UL,0x111b0a0aUL,0xc4eb2f2fUL,0x1b150e0eUL,0x5a7e2424UL,0xb6ad1b1bUL,0x4798dfdfUL,0x6aa7cdcdUL,0xbbf54e4eUL,0x4c337f7fUL,0xba50eaeaUL,0x2d3f1212UL,0xb9a41d1dUL,0x9cc45858UL,0x72463434UL,0x77413636UL,0xcd11dcdcUL,0x299db4b4UL,0x164d5b5bUL,0x01a5a4a4UL,0xd7a17676UL,0xa314b7b7UL,0x49347d7dUL,0x8ddf5252UL,0x429fddddUL,0x93cd5e5eUL,0xa2b11313UL,0x04a2a6a6UL,0xb801b9b9UL,0x00000000UL,0x74b5c1c1UL,0xa0e04040UL,0x21c2e3e3UL,0x433a7979UL,0x2c9ab6b6UL,0xd90dd4d4UL,0xca478d8dUL,0x70176767UL,0xddaf7272UL,0x79ed9494UL,0x67ff9898UL,0x2393b0b0UL,0xde5b8585UL,0xbd06bbbbUL,0x7ebbc5c5UL,0x347b4f4fUL,0x3ad7ededUL,0x54d28686UL,0x62f89a9aUL,0xff996666UL,0xa7b61111UL,0x4ac08a8aUL,0x30d9e9e9UL,0x0a0e0404UL,0x9866fefeUL,0x0baba0a0UL,0xccb47878UL,0xd5f02525UL,0x3e754b4bUL,0x0eaca2a2UL,0x19445d5dUL,0x5bdb8080UL,0x85800505UL,0xecd33f3fUL,0xdffe2121UL,0xd8a87070UL,0x0cfdf1f1UL,0x7a196363UL,0x582f7777UL,0x9f30afafUL,0xa5e74242UL,0x50702020UL,0x2ecbe5e5UL,0x12effdfdUL,0xb708bfbfUL,0xd4558181UL,0x3c241818UL,0x5f792626UL,0x71b2c3c3UL,0x3886bebeUL,0xfdc83535UL,0x4fc78888UL,0x4b652e2eUL,0xf96a9393UL,0x0d585555UL,0x9d61fcfcUL,0xc9b37a7aUL,0xef27c8c8UL,0x3288babaUL,0x7d4f3232UL,0xa442e6e6UL,0xfb3bc0c0UL,0xb3aa1919UL,0x68f69e9eUL,0x8122a3a3UL,0xaaee4444UL,0x82d65454UL,0xe6dd3b3bUL,0x9e950b0bUL,0x45c98c8cUL,0x7bbcc7c7U
L,0x6e056b6bUL,0x446c2828UL,0x8b2ca7a7UL,0x3d81bcbcUL,0x27311616UL,0x9a37adadUL,0x4d96dbdbUL,0xfa9e6464UL,0xd2a67474UL,0x22361414UL,0x76e49292UL,0x1e120c0cUL,0xb4fc4848UL,0x378fb8b8UL,0xe7789f9fUL,0xb20fbdbdUL,0x2a694343UL,0xf135c4c4UL,0xe3da3939UL,0xf7c63131UL,0x598ad3d3UL,0x8674f2f2UL,0x5683d5d5UL,0xc54e8b8bUL,0xeb856e6eUL,0xc218dadaUL,0x8f8e0101UL,0xac1db1b1UL,0x6df19c9cUL,0x3b724949UL,0xc71fd8d8UL,0x15b9acacUL,0x09faf3f3UL,0x6fa0cfcfUL,0xea20cacaUL,0x897df4f4UL,0x20674747UL,0x28381010UL,0x640b6f6fUL,0x8373f0f0UL,0xb1fb4a4aUL,0x96ca5c5cUL,0x6c543838UL,0x085f5757UL,0x52217373UL,0xf3649797UL,0x65aecbcbUL,0x8425a1a1UL,0xbf57e8e8UL,0x635d3e3eUL,0x7cea9696UL,0x7f1e6161UL,0x919c0d0dUL,0x949b0f0fUL,0xab4be0e0UL,0xc6ba7c7cUL,0x57267171UL,0xe529ccccUL,0x73e39090UL,0x0f090606UL,0x03f4f7f7UL,0x362a1c1cUL,0xfe3cc2c2UL,0xe18b6a6aUL,0x10beaeaeUL,0x6b026969UL,0xa8bf1717UL,0xe8719999UL,0x69533a3aUL,0xd0f72727UL,0x4891d9d9UL,0x35deebebUL,0xcee52b2bUL,0x55772222UL,0xd604d2d2UL,0x9039a9a9UL,0x80870707UL,0xf2c13333UL,0xc1ec2d2dUL,0x665a3c3cUL,0xadb81515UL,0x60a9c9c9UL,0xdb5c8787UL,0x1ab0aaaaUL,0x88d85050UL,0x8e2ba5a5UL,0x8a890303UL,0x134a5959UL,0x9b920909UL,0x39231a1aUL,0x75106565UL,0x5384d7d7UL,0x51d58484UL,0xd303d0d0UL,0x5edc8282UL,0xcbe22929UL,0x99c35a5aUL,0x332d1e1eUL,0x463d7b7bUL,0x1fb7a8a8UL,0x610c6d6dUL,0x4e622c2cUL,
0x32c6c6a5UL,0x6ff8f884UL,0x5eeeee99UL,0x7af6f68dUL,0xe8ffff0dUL,0x0ad6d6bdUL,0x16dedeb1UL,0x6d919154UL,0x90606050UL,0x07020203UL,0x2ececea9UL,0xd156567dUL,0xcce7e719UL,0x13b5b562UL,0x7c4d4de6UL,0x59ecec9aUL,0x408f8f45UL,0xa31f1f9dUL,0x49898940UL,0x68fafa87UL,0xd0efef15UL,0x94b2b2ebUL,0xce8e8ec9UL,0xe6fbfb0bUL,0x6e4141ecUL,0x1ab3b367UL,0x435f5ffdUL,0x604545eaUL,0xf92323bfUL,0x515353f7UL,0x45e4e496UL,0x769b9b5bUL,0x287575c2UL,0xc5e1e11cUL,0xd43d3daeUL,0xf24c4c6aUL,0x826c6c5aUL,0xbd7e7e41UL,0xf3f5f502UL,0x5283834fUL,0x8c68685cUL,0x565151f4UL,0x8dd1d134UL,0xe1f9f908UL,0x4ce2e293UL,0x3eabab73UL,0x97626253UL,0x6b2a2a3fUL,0x1c08080cUL,0x63959552UL,0xe9464665UL,0x7f9d9d5eUL,0x48303028UL,0xcf3737a1UL,0x1b0a0a0fUL,0xeb2f2fb5UL,0x150e0e09UL,0x7e242436UL,0xad1b1b9bUL,0x98dfdf3dUL,0xa7cdcd26UL,0xf54e4e69UL,0x337f7fcdUL,0x50eaea9fUL,0x3f12121bUL,0xa41d1d9eUL,0xc4585874UL,0x4634342eUL,0x4136362dUL,0x11dcdcb2UL,0x9db4b4eeUL,0x4d5b5bfbUL,0xa5a4a4f6UL,0xa176764dUL,0x14b7b761UL,0x347d7dceUL,0xdf52527bUL,0x9fdddd3eUL,0xcd5e5e71UL,0xb1131397UL,0xa2a6a6f5UL,0x01b9b968UL,0x00000000UL,0xb5c1c12cUL,0xe0404060UL,0xc2e3e31fUL,0x3a7979c8UL,0x9ab6b6edUL,0x0dd4d4beUL,0x478d8d46UL,0x176767d9UL,0xaf72724bUL,0xed9494deUL,0xff9898d4UL,0x93b0b0e8UL,0x5b85854aUL,0x06bbbb6bUL,0xbbc5c52aUL,0x7b4f4fe5UL,0xd7eded16UL,0xd28686c5UL,0xf89a9ad7UL,0x99666655UL,0xb6111194UL,0xc08a8acfUL,0xd9e9e910UL,0x0e040406UL,0x66fefe81UL,0xaba0a0f0UL,0xb4787844UL,0xf02525baUL,0x754b4be3UL,0xaca2a2f3UL,0x445d5dfeUL,0xdb8080c0UL,0x8005058aUL,0xd33f3fadUL,0xfe2121bcUL,0xa8707048UL,0xfdf1f104UL,0x196363dfUL,0x2f7777c1UL,0x30afaf75UL,0xe7424263UL,0x70202030UL,0xcbe5e51aUL,0xeffdfd0eUL,0x08bfbf6dUL,0x5581814cUL,0x24181814UL,0x79262635UL,0xb2c3c32fUL,0x86bebee1UL,0xc83535a2UL,0xc78888ccUL,0x652e2e39UL,0x6a939357UL,0x585555f2UL,0x61fcfc82UL,0xb37a7a47UL,0x27c8c8acUL,0x88babae7UL,0x4f32322bUL,0x42e6e695UL,0x3bc0c0a0UL,0xaa191998UL,0xf69e9ed1UL,0x22a3a37fUL,0xee444466UL,0xd654547eUL,0xdd3b3babUL,0x950b0b83UL,0xc98c8ccaUL,0xbcc7c729U
L,0x056b6bd3UL,0x6c28283cUL,0x2ca7a779UL,0x81bcbce2UL,0x3116161dUL,0x37adad76UL,0x96dbdb3bUL,0x9e646456UL,0xa674744eUL,0x3614141eUL,0xe49292dbUL,0x120c0c0aUL,0xfc48486cUL,0x8fb8b8e4UL,0x789f9f5dUL,0x0fbdbd6eUL,0x694343efUL,0x35c4c4a6UL,0xda3939a8UL,0xc63131a4UL,0x8ad3d337UL,0x74f2f28bUL,0x83d5d532UL,0x4e8b8b43UL,0x856e6e59UL,0x18dadab7UL,0x8e01018cUL,0x1db1b164UL,0xf19c9cd2UL,0x724949e0UL,0x1fd8d8b4UL,0xb9acacfaUL,0xfaf3f307UL,0xa0cfcf25UL,0x20cacaafUL,0x7df4f48eUL,0x674747e9UL,0x38101018UL,0x0b6f6fd5UL,0x73f0f088UL,0xfb4a4a6fUL,0xca5c5c72UL,0x54383824UL,0x5f5757f1UL,0x217373c7UL,0x64979751UL,0xaecbcb23UL,0x25a1a17cUL,0x57e8e89cUL,0x5d3e3e21UL,0xea9696ddUL,0x1e6161dcUL,0x9c0d0d86UL,0x9b0f0f85UL,0x4be0e090UL,0xba7c7c42UL,0x267171c4UL,0x29ccccaaUL,0xe39090d8UL,0x09060605UL,0xf4f7f701UL,0x2a1c1c12UL,0x3cc2c2a3UL,0x8b6a6a5fUL,0xbeaeaef9UL,0x026969d0UL,0xbf171791UL,0x71999958UL,0x533a3a27UL,0xf72727b9UL,0x91d9d938UL,0xdeebeb13UL,0xe52b2bb3UL,0x77222233UL,0x04d2d2bbUL,0x39a9a970UL,0x87070789UL,0xc13333a7UL,0xec2d2db6UL,0x5a3c3c22UL,0xb8151592UL,0xa9c9c920UL,0x5c878749UL,0xb0aaaaffUL,0xd8505078UL,0x2ba5a57aUL,0x8903038fUL,0x4a5959f8UL,0x92090980UL,0x231a1a17UL,0x106565daUL,0x84d7d731UL,0xd58484c6UL,0x03d0d0b8UL,0xdc8282c3UL,0xe22929b0UL,0xc35a5a77UL,0x2d1e1e11UL,0x3d7b7bcbUL,0xb7a8a8fcUL,0x0c6d6dd6UL,0x622c2c3aUL,
0xc6c6a597UL,0xf8f884ebUL,0xeeee99c7UL,0xf6f68df7UL,0xffff0de5UL,0xd6d6bdb7UL,0xdedeb1a7UL,0x91915439UL,0x606050c0UL,0x02020304UL,0xcecea987UL,0x56567dacUL,0xe7e719d5UL,0xb5b56271UL,0x4d4de69aUL,0xecec9ac3UL,0x8f8f4505UL,0x1f1f9d3eUL,0x89894009UL,0xfafa87efUL,0xefef15c5UL,0xb2b2eb7fUL,0x8e8ec907UL,0xfbfb0bedUL,0x4141ec82UL,0xb3b3677dUL,0x5f5ffdbeUL,0x4545ea8aUL,0x2323bf46UL,0x5353f7a6UL,0xe4e496d3UL,0x9b9b5b2dUL,0x7575c2eaUL,0xe1e11cd9UL,0x3d3dae7aUL,0x4c4c6a98UL,0x6c6c5ad8UL,0x7e7e41fcUL,0xf5f502f1UL,0x83834f1dUL,0x68685cd0UL,0x5151f4a2UL,0xd1d134b9UL,0xf9f908e9UL,0xe2e293dfUL,0xabab734dUL,0x626253c4UL,0x2a2a3f54UL,0x08080c10UL,0x95955231UL,0x4646658cUL,0x9d9d5e21UL,0x30302860UL,0x3737a16eUL,0x0a0a0f14UL,0x2f2fb55eUL,0x0e0e091cUL,0x24243648UL,0x1b1b9b36UL,0xdfdf3da5UL,0xcdcd2681UL,0x4e4e699cUL,0x7f7fcdfeUL,0xeaea9fcfUL,0x12121b24UL,0x1d1d9e3aUL,0x585874b0UL,0x34342e68UL,0x36362d6cUL,0xdcdcb2a3UL,0xb4b4ee73UL,0x5b5bfbb6UL,0xa4a4f653UL,0x76764decUL,0xb7b76175UL,0x7d7dcefaUL,0x52527ba4UL,0xdddd3ea1UL,0x5e5e71bcUL,0x13139726UL,0xa6a6f557UL,0xb9b96869UL,0x00000000UL,0xc1c12c99UL,0x40406080UL,0xe3e31fddUL,0x7979c8f2UL,0xb6b6ed77UL,0xd4d4beb3UL,0x8d8d4601UL,0x6767d9ceUL,0x72724be4UL,0x9494de33UL,0x9898d42bUL,0xb0b0e87bUL,0x85854a11UL,0xbbbb6b6dUL,0xc5c52a91UL,0x4f4fe59eUL,0xeded16c1UL,0x8686c517UL,0x9a9ad72fUL,0x666655ccUL,0x11119422UL,0x8a8acf0fUL,0xe9e910c9UL,0x04040608UL,0xfefe81e7UL,0xa0a0f05bUL,0x787844f0UL,0x2525ba4aUL,0x4b4be396UL,0xa2a2f35fUL,0x5d5dfebaUL,0x8080c01bUL,0x05058a0aUL,0x3f3fad7eUL,0x2121bc42UL,0x707048e0UL,0xf1f104f9UL,0x6363dfc6UL,0x7777c1eeUL,0xafaf7545UL,0x42426384UL,0x20203040UL,0xe5e51ad1UL,0xfdfd0ee1UL,0xbfbf6d65UL,0x81814c19UL,0x18181430UL,0x2626354cUL,0xc3c32f9dUL,0xbebee167UL,0x3535a26aUL,0x8888cc0bUL,0x2e2e395cUL,0x9393573dUL,0x5555f2aaUL,0xfcfc82e3UL,0x7a7a47f4UL,0xc8c8ac8bUL,0xbabae76fUL,0x32322b64UL,0xe6e695d7UL,0xc0c0a09bUL,0x19199832UL,0x9e9ed127UL,0xa3a37f5dUL,0x44446688UL,0x54547ea8UL,0x3b3bab76UL,0x0b0b8316UL,0x8c8cca03UL,0xc7c72995U
L,0x6b6bd3d6UL,0x28283c50UL,0xa7a77955UL,0xbcbce263UL,0x16161d2cUL,0xadad7641UL,0xdbdb3badUL,0x646456c8UL,0x74744ee8UL,0x14141e28UL,0x9292db3fUL,0x0c0c0a18UL,0x48486c90UL,0xb8b8e46bUL,0x9f9f5d25UL,0xbdbd6e61UL,0x4343ef86UL,0xc4c4a693UL,0x3939a872UL,0x3131a462UL,0xd3d337bdUL,0xf2f28bffUL,0xd5d532b1UL,0x8b8b430dUL,0x6e6e59dcUL,0xdadab7afUL,0x01018c02UL,0xb1b16479UL,0x9c9cd223UL,0x4949e092UL,0xd8d8b4abUL,0xacacfa43UL,0xf3f307fdUL,0xcfcf2585UL,0xcacaaf8fUL,0xf4f48ef3UL,0x4747e98eUL,0x10101820UL,0x6f6fd5deUL,0xf0f088fbUL,0x4a4a6f94UL,0x5c5c72b8UL,0x38382470UL,0x5757f1aeUL,0x7373c7e6UL,0x97975135UL,0xcbcb238dUL,0xa1a17c59UL,0xe8e89ccbUL,0x3e3e217cUL,0x9696dd37UL,0x6161dcc2UL,0x0d0d861aUL,0x0f0f851eUL,0xe0e090dbUL,0x7c7c42f8UL,0x7171c4e2UL,0xccccaa83UL,0x9090d83bUL,0x0606050cUL,0xf7f701f5UL,0x1c1c1238UL,0xc2c2a39fUL,0x6a6a5fd4UL,0xaeaef947UL,0x6969d0d2UL,0x1717912eUL,0x99995829UL,0x3a3a2774UL,0x2727b94eUL,0xd9d938a9UL,0xebeb13cdUL,0x2b2bb356UL,0x22223344UL,0xd2d2bbbfUL,0xa9a97049UL,0x0707890eUL,0x3333a766UL,0x2d2db65aUL,0x3c3c2278UL,0x1515922aUL,0xc9c92089UL,0x87874915UL,0xaaaaff4fUL,0x505078a0UL,0xa5a57a51UL,0x03038f06UL,0x5959f8b2UL,0x09098012UL,0x1a1a1734UL,0x6565dacaUL,0xd7d731b5UL,0x8484c613UL,0xd0d0b8bbUL,0x8282c31fUL,0x2929b052UL,0x5a5a77b4UL,0x1e1e113cUL,0x7b7bcbf6UL,0xa8a8fc4bUL,0x6d6dd6daUL,0x2c2c3a58UL,
0xc6a597f4UL,0xf884eb97UL,0xee99c7b0UL,0xf68df78cUL,0xff0de517UL,0xd6bdb7dcUL,0xdeb1a7c8UL,0x915439fcUL,0x6050c0f0UL,0x02030405UL,0xcea987e0UL,0x567dac87UL,0xe719d52bUL,0xb56271a6UL,0x4de69a31UL,0xec9ac3b5UL,0x8f4505cfUL,0x1f9d3ebcUL,0x894009c0UL,0xfa87ef92UL,0xef15c53fUL,0xb2eb7f26UL,0x8ec90740UL,0xfb0bed1dUL,0x41ec822fUL,0xb3677da9UL,0x5ffdbe1cUL,0x45ea8a25UL,0x23bf46daUL,0x53f7a602UL,0xe496d3a1UL,0x9b5b2dedUL,0x75c2ea5dUL,0xe11cd924UL,0x3dae7ae9UL,0x4c6a98beUL,0x6c5ad8eeUL,0x7e41fcc3UL,0xf502f106UL,0x834f1dd1UL,0x685cd0e4UL,0x51f4a207UL,0xd134b95cUL,0xf908e918UL,0xe293dfaeUL,0xab734d95UL,0x6253c4f5UL,0x2a3f5441UL,0x080c1014UL,0x955231f6UL,0x46658cafUL,0x9d5e21e2UL,0x30286078UL,0x37a16ef8UL,0x0a0f1411UL,0x2fb55ec4UL,0x0e091c1bUL,0x2436485aUL,0x1b9b36b6UL,0xdf3da547UL,0xcd26816aUL,0x4e699cbbUL,0x7fcdfe4cUL,0xea9fcfbaUL,0x121b242dUL,0x1d9e3ab9UL,0x5874b09cUL,0x342e6872UL,0x362d6c77UL,0xdcb2a3cdUL,0xb4ee7329UL,0x5bfbb616UL,0xa4f65301UL,0x764decd7UL,0xb76175a3UL,0x7dcefa49UL,0x527ba48dUL,0xdd3ea142UL,0x5e71bc93UL,0x139726a2UL,0xa6f55704UL,0xb96869b8UL,0x00000000UL,0xc12c9974UL,0x406080a0UL,0xe31fdd21UL,0x79c8f243UL,0xb6ed772cUL,0xd4beb3d9UL,0x8d4601caUL,0x67d9ce70UL,0x724be4ddUL,0x94de3379UL,0x98d42b67UL,0xb0e87b23UL,0x854a11deUL,0xbb6b6dbdUL,0xc52a917eUL,0x4fe59e34UL,0xed16c13aUL,0x86c51754UL,0x9ad72f62UL,0x6655ccffUL,0x119422a7UL,0x8acf0f4aUL,0xe910c930UL,0x0406080aUL,0xfe81e798UL,0xa0f05b0bUL,0x7844f0ccUL,0x25ba4ad5UL,0x4be3963eUL,0xa2f35f0eUL,0x5dfeba19UL,0x80c01b5bUL,0x058a0a85UL,0x3fad7eecUL,0x21bc42dfUL,0x7048e0d8UL,0xf104f90cUL,0x63dfc67aUL,0x77c1ee58UL,0xaf75459fUL,0x426384a5UL,0x20304050UL,0xe51ad12eUL,0xfd0ee112UL,0xbf6d65b7UL,0x814c19d4UL,0x1814303cUL,0x26354c5fUL,0xc32f9d71UL,0xbee16738UL,0x35a26afdUL,0x88cc0b4fUL,0x2e395c4bUL,0x93573df9UL,0x55f2aa0dUL,0xfc82e39dUL,0x7a47f4c9UL,0xc8ac8befUL,0xbae76f32UL,0x322b647dUL,0xe695d7a4UL,0xc0a09bfbUL,0x199832b3UL,0x9ed12768UL,0xa37f5d81UL,0x446688aaUL,0x547ea882UL,0x3bab76e6UL,0x0b83169eUL,0x8cca0345UL,0xc729957bU
L,0x6bd3d66eUL,0x283c5044UL,0xa779558bUL,0xbce2633dUL,0x161d2c27UL,0xad76419aUL,0xdb3bad4dUL,0x6456c8faUL,0x744ee8d2UL,0x141e2822UL,0x92db3f76UL,0x0c0a181eUL,0x486c90b4UL,0xb8e46b37UL,0x9f5d25e7UL,0xbd6e61b2UL,0x43ef862aUL,0xc4a693f1UL,0x39a872e3UL,0x31a462f7UL,0xd337bd59UL,0xf28bff86UL,0xd532b156UL,0x8b430dc5UL,0x6e59dcebUL,0xdab7afc2UL,0x018c028fUL,0xb16479acUL,0x9cd2236dUL,0x49e0923bUL,0xd8b4abc7UL,0xacfa4315UL,0xf307fd09UL,0xcf25856fUL,0xcaaf8feaUL,0xf48ef389UL,0x47e98e20UL,0x10182028UL,0x6fd5de64UL,0xf088fb83UL,0x4a6f94b1UL,0x5c72b896UL,0x3824706cUL,0x57f1ae08UL,0x73c7e652UL,0x975135f3UL,0xcb238d65UL,0xa17c5984UL,0xe89ccbbfUL,0x3e217c63UL,0x96dd377cUL,0x61dcc27fUL,0x0d861a91UL,0x0f851e94UL,0xe090dbabUL,0x7c42f8c6UL,0x71c4e257UL,0xccaa83e5UL,0x90d83b73UL,0x06050c0fUL,0xf701f503UL,0x1c123836UL,0xc2a39ffeUL,0x6a5fd4e1UL,0xaef94710UL,0x69d0d26bUL,0x17912ea8UL,0x995829e8UL,0x3a277469UL,0x27b94ed0UL,0xd938a948UL,0xeb13cd35UL,0x2bb356ceUL,0x22334455UL,0xd2bbbfd6UL,0xa9704990UL,0x07890e80UL,0x33a766f2UL,0x2db65ac1UL,0x3c227866UL,0x15922aadUL,0xc9208960UL,0x874915dbUL,0xaaff4f1aUL,0x5078a088UL,0xa57a518eUL,0x038f068aUL,0x59f8b213UL,0x0980129bUL,0x1a173439UL,0x65daca75UL,0xd731b553UL,0x84c61351UL,0xd0b8bbd3UL,0x82c31f5eUL,0x29b052cbUL,0x5a77b499UL,0x1e113c33UL,0x7bcbf646UL,0xa8fc4b1fUL,0x6dd6da61UL,0x2c3a584eUL,
0xa597f4a5UL,0x84eb9784UL,0x99c7b099UL,0x8df78c8dUL,0x0de5170dUL,0xbdb7dcbdUL,0xb1a7c8b1UL,0x5439fc54UL,0x50c0f050UL,0x03040503UL,0xa987e0a9UL,0x7dac877dUL,0x19d52b19UL,0x6271a662UL,0xe69a31e6UL,0x9ac3b59aUL,0x4505cf45UL,0x9d3ebc9dUL,0x4009c040UL,0x87ef9287UL,0x15c53f15UL,0xeb7f26ebUL,0xc90740c9UL,0x0bed1d0bUL,0xec822fecUL,0x677da967UL,0xfdbe1cfdUL,0xea8a25eaUL,0xbf46dabfUL,0xf7a602f7UL,0x96d3a196UL,0x5b2ded5bUL,0xc2ea5dc2UL,0x1cd9241cUL,0xae7ae9aeUL,0x6a98be6aUL,0x5ad8ee5aUL,0x41fcc341UL,0x02f10602UL,0x4f1dd14fUL,0x5cd0e45cUL,0xf4a207f4UL,0x34b95c34UL,0x08e91808UL,0x93dfae93UL,0x734d9573UL,0x53c4f553UL,0x3f54413fUL,0x0c10140cUL,0x5231f652UL,0x658caf65UL,0x5e21e25eUL,0x28607828UL,0xa16ef8a1UL,0x0f14110fUL,0xb55ec4b5UL,0x091c1b09UL,0x36485a36UL,0x9b36b69bUL,0x3da5473dUL,0x26816a26UL,0x699cbb69UL,0xcdfe4ccdUL,0x9fcfba9fUL,0x1b242d1bUL,0x9e3ab99eUL,0x74b09c74UL,0x2e68722eUL,0x2d6c772dUL,0xb2a3cdb2UL,0xee7329eeUL,0xfbb616fbUL,0xf65301f6UL,0x4decd74dUL,0x6175a361UL,0xcefa49ceUL,0x7ba48d7bUL,0x3ea1423eUL,0x71bc9371UL,0x9726a297UL,0xf55704f5UL,0x6869b868UL,0x00000000UL,0x2c99742cUL,0x6080a060UL,0x1fdd211fUL,0xc8f243c8UL,0xed772cedUL,0xbeb3d9beUL,0x4601ca46UL,0xd9ce70d9UL,0x4be4dd4bUL,0xde3379deUL,0xd42b67d4UL,0xe87b23e8UL,0x4a11de4aUL,0x6b6dbd6bUL,0x2a917e2aUL,0xe59e34e5UL,0x16c13a16UL,0xc51754c5UL,0xd72f62d7UL,0x55ccff55UL,0x9422a794UL,0xcf0f4acfUL,0x10c93010UL,0x06080a06UL,0x81e79881UL,0xf05b0bf0UL,0x44f0cc44UL,0xba4ad5baUL,0xe3963ee3UL,0xf35f0ef3UL,0xfeba19feUL,0xc01b5bc0UL,0x8a0a858aUL,0xad7eecadUL,0xbc42dfbcUL,0x48e0d848UL,0x04f90c04UL,0xdfc67adfUL,0xc1ee58c1UL,0x75459f75UL,0x6384a563UL,0x30405030UL,0x1ad12e1aUL,0x0ee1120eUL,0x6d65b76dUL,0x4c19d44cUL,0x14303c14UL,0x354c5f35UL,0x2f9d712fUL,0xe16738e1UL,0xa26afda2UL,0xcc0b4fccUL,0x395c4b39UL,0x573df957UL,0xf2aa0df2UL,0x82e39d82UL,0x47f4c947UL,0xac8befacUL,0xe76f32e7UL,0x2b647d2bUL,0x95d7a495UL,0xa09bfba0UL,0x9832b398UL,0xd12768d1UL,0x7f5d817fUL,0x6688aa66UL,0x7ea8827eUL,0xab76e6abUL,0x83169e83UL,0xca0345caUL,0x29957b29U
L,0xd3d66ed3UL,0x3c50443cUL,0x79558b79UL,0xe2633de2UL,0x1d2c271dUL,0x76419a76UL,0x3bad4d3bUL,0x56c8fa56UL,0x4ee8d24eUL,0x1e28221eUL,0xdb3f76dbUL,0x0a181e0aUL,0x6c90b46cUL,0xe46b37e4UL,0x5d25e75dUL,0x6e61b26eUL,0xef862aefUL,0xa693f1a6UL,0xa872e3a8UL,0xa462f7a4UL,0x37bd5937UL,0x8bff868bUL,0x32b15632UL,0x430dc543UL,0x59dceb59UL,0xb7afc2b7UL,0x8c028f8cUL,0x6479ac64UL,0xd2236dd2UL,0xe0923be0UL,0xb4abc7b4UL,0xfa4315faUL,0x07fd0907UL,0x25856f25UL,0xaf8feaafUL,0x8ef3898eUL,0xe98e20e9UL,0x18202818UL,0xd5de64d5UL,0x88fb8388UL,0x6f94b16fUL,0x72b89672UL,0x24706c24UL,0xf1ae08f1UL,0xc7e652c7UL,0x5135f351UL,0x238d6523UL,0x7c59847cUL,0x9ccbbf9cUL,0x217c6321UL,0xdd377cddUL,0xdcc27fdcUL,0x861a9186UL,0x851e9485UL,0x90dbab90UL,0x42f8c642UL,0xc4e257c4UL,0xaa83e5aaUL,0xd83b73d8UL,0x050c0f05UL,0x01f50301UL,0x12383612UL,0xa39ffea3UL,0x5fd4e15fUL,0xf94710f9UL,0xd0d26bd0UL,0x912ea891UL,0x5829e858UL,0x27746927UL,0xb94ed0b9UL,0x38a94838UL,0x13cd3513UL,0xb356ceb3UL,0x33445533UL,0xbbbfd6bbUL,0x70499070UL,0x890e8089UL,0xa766f2a7UL,0xb65ac1b6UL,0x22786622UL,0x922aad92UL,0x20896020UL,0x4915db49UL,0xff4f1affUL,0x78a08878UL,0x7a518e7aUL,0x8f068a8fUL,0xf8b213f8UL,0x80129b80UL,0x17343917UL,0xdaca75daUL,0x31b55331UL,0xc61351c6UL,0xb8bbd3b8UL,0xc31f5ec3UL,0xb052cbb0UL,0x77b49977UL,0x113c3311UL,0xcbf646cbUL,0xfc4b1ffcUL,0xd6da61d6UL,0x3a584e3aUL,
0x97f4a5f4UL,0xeb978497UL,0xc7b099b0UL,0xf78c8d8cUL,0xe5170d17UL,0xb7dcbddcUL,0xa7c8b1c8UL,0x39fc54fcUL,0xc0f050f0UL,0x04050305UL,0x87e0a9e0UL,0xac877d87UL,0xd52b192bUL,0x71a662a6UL,0x9a31e631UL,0xc3b59ab5UL,0x05cf45cfUL,0x3ebc9dbcUL,0x09c040c0UL,0xef928792UL,0xc53f153fUL,0x7f26eb26UL,0x0740c940UL,0xed1d0b1dUL,0x822fec2fUL,0x7da967a9UL,0xbe1cfd1cUL,0x8a25ea25UL,0x46dabfdaUL,0xa602f702UL,0xd3a196a1UL,0x2ded5bedUL,0xea5dc25dUL,0xd9241c24UL,0x7ae9aee9UL,0x98be6abeUL,0xd8ee5aeeUL,0xfcc341c3UL,0xf1060206UL,0x1dd14fd1UL,0xd0e45ce4UL,0xa207f407UL,0xb95c345cUL,0xe9180818UL,0xdfae93aeUL,0x4d957395UL,0xc4f553f5UL,0x54413f41UL,0x10140c14UL,0x31f652f6UL,0x8caf65afUL,0x21e25ee2UL,0x60782878UL,0x6ef8a1f8UL,0x14110f11UL,0x5ec4b5c4UL,0x1c1b091bUL,0x485a365aUL,0x36b69bb6UL,0xa5473d47UL,0x816a266aUL,0x9cbb69bbUL,0xfe4ccd4cUL,0xcfba9fbaUL,0x242d1b2dUL,0x3ab99eb9UL,0xb09c749cUL,0x68722e72UL,0x6c772d77UL,0xa3cdb2cdUL,0x7329ee29UL,0xb616fb16UL,0x5301f601UL,0xecd74dd7UL,0x75a361a3UL,0xfa49ce49UL,0xa48d7b8dUL,0xa1423e42UL,0xbc937193UL,0x26a297a2UL,0x5704f504UL,0x69b868b8UL,0x00000000UL,0x99742c74UL,0x80a060a0UL,0xdd211f21UL,0xf243c843UL,0x772ced2cUL,0xb3d9bed9UL,0x01ca46caUL,0xce70d970UL,0xe4dd4bddUL,0x3379de79UL,0x2b67d467UL,0x7b23e823UL,0x11de4adeUL,0x6dbd6bbdUL,0x917e2a7eUL,0x9e34e534UL,0xc13a163aUL,0x1754c554UL,0x2f62d762UL,0xccff55ffUL,0x22a794a7UL,0x0f4acf4aUL,0xc9301030UL,0x080a060aUL,0xe7988198UL,0x5b0bf00bUL,0xf0cc44ccUL,0x4ad5bad5UL,0x963ee33eUL,0x5f0ef30eUL,0xba19fe19UL,0x1b5bc05bUL,0x0a858a85UL,0x7eecadecUL,0x42dfbcdfUL,0xe0d848d8UL,0xf90c040cUL,0xc67adf7aUL,0xee58c158UL,0x459f759fUL,0x84a563a5UL,0x40503050UL,0xd12e1a2eUL,0xe1120e12UL,0x65b76db7UL,0x19d44cd4UL,0x303c143cUL,0x4c5f355fUL,0x9d712f71UL,0x6738e138UL,0x6afda2fdUL,0x0b4fcc4fUL,0x5c4b394bUL,0x3df957f9UL,0xaa0df20dUL,0xe39d829dUL,0xf4c947c9UL,0x8befacefUL,0x6f32e732UL,0x647d2b7dUL,0xd7a495a4UL,0x9bfba0fbUL,0x32b398b3UL,0x2768d168UL,0x5d817f81UL,0x88aa66aaUL,0xa8827e82UL,0x76e6abe6UL,0x169e839eUL,0x0345ca45UL,0x957b297bU
L,0xd66ed36eUL,0x50443c44UL,0x558b798bUL,0x633de23dUL,0x2c271d27UL,0x419a769aUL,0xad4d3b4dUL,0xc8fa56faUL,0xe8d24ed2UL,0x28221e22UL,0x3f76db76UL,0x181e0a1eUL,0x90b46cb4UL,0x6b37e437UL,0x25e75de7UL,0x61b26eb2UL,0x862aef2aUL,0x93f1a6f1UL,0x72e3a8e3UL,0x62f7a4f7UL,0xbd593759UL,0xff868b86UL,0xb1563256UL,0x0dc543c5UL,0xdceb59ebUL,0xafc2b7c2UL,0x028f8c8fUL,0x79ac64acUL,0x236dd26dUL,0x923be03bUL,0xabc7b4c7UL,0x4315fa15UL,0xfd090709UL,0x856f256fUL,0x8feaafeaUL,0xf3898e89UL,0x8e20e920UL,0x20281828UL,0xde64d564UL,0xfb838883UL,0x94b16fb1UL,0xb8967296UL,0x706c246cUL,0xae08f108UL,0xe652c752UL,0x35f351f3UL,0x8d652365UL,0x59847c84UL,0xcbbf9cbfUL,0x7c632163UL,0x377cdd7cUL,0xc27fdc7fUL,0x1a918691UL,0x1e948594UL,0xdbab90abUL,0xf8c642c6UL,0xe257c457UL,0x83e5aae5UL,0x3b73d873UL,0x0c0f050fUL,0xf5030103UL,0x38361236UL,0x9ffea3feUL,0xd4e15fe1UL,0x4710f910UL,0xd26bd06bUL,0x2ea891a8UL,0x29e858e8UL,0x74692769UL,0x4ed0b9d0UL,0xa9483848UL,0xcd351335UL,0x56ceb3ceUL,0x44553355UL,0xbfd6bbd6UL,0x49907090UL,0x0e808980UL,0x66f2a7f2UL,0x5ac1b6c1UL,0x78662266UL,0x2aad92adUL,0x89602060UL,0x15db49dbUL,0x4f1aff1aUL,0xa0887888UL,0x518e7a8eUL,0x068a8f8aUL,0xb213f813UL,0x129b809bUL,0x34391739UL,0xca75da75UL,0xb5533153UL,0x1351c651UL,0xbbd3b8d3UL,0x1f5ec35eUL,0x52cbb0cbUL,0xb4997799UL,0x3c331133UL,0xf646cb46UL,0x4b1ffc1fUL,0xda61d661UL,0x584e3a4eUL,
0xf4a5f432UL,0x9784976fUL,0xb099b05eUL,0x8c8d8c7aUL,0x170d17e8UL,0xdcbddc0aUL,0xc8b1c816UL,0xfc54fc6dUL,0xf050f090UL,0x05030507UL,0xe0a9e02eUL,0x877d87d1UL,0x2b192bccUL,0xa662a613UL,0x31e6317cUL,0xb59ab559UL,0xcf45cf40UL,0xbc9dbca3UL,0xc040c049UL,0x92879268UL,0x3f153fd0UL,0x26eb2694UL,0x40c940ceUL,0x1d0b1de6UL,0x2fec2f6eUL,0xa967a91aUL,0x1cfd1c43UL,0x25ea2560UL,0xdabfdaf9UL,0x02f70251UL,0xa196a145UL,0xed5bed76UL,0x5dc25d28UL,0x241c24c5UL,0xe9aee9d4UL,0xbe6abef2UL,0xee5aee82UL,0xc341c3bdUL,0x060206f3UL,0xd14fd152UL,0xe45ce48cUL,0x07f40756UL,0x5c345c8dUL,0x180818e1UL,0xae93ae4cUL,0x9573953eUL,0xf553f597UL,0x413f416bUL,0x140c141cUL,0xf652f663UL,0xaf65afe9UL,0xe25ee27fUL,0x78287848UL,0xf8a1f8cfUL,0x110f111bUL,0xc4b5c4ebUL,0x1b091b15UL,0x5a365a7eUL,0xb69bb6adUL,0x473d4798UL,0x6a266aa7UL,0xbb69bbf5UL,0x4ccd4c33UL,0xba9fba50UL,0x2d1b2d3fUL,0xb99eb9a4UL,0x9c749cc4UL,0x722e7246UL,0x772d7741UL,0xcdb2cd11UL,0x29ee299dUL,0x16fb164dUL,0x01f601a5UL,0xd74dd7a1UL,0xa361a314UL,0x49ce4934UL,0x8d7b8ddfUL,0x423e429fUL,0x937193cdUL,0xa297a2b1UL,0x04f504a2UL,0xb868b801UL,0x00000000UL,0x742c74b5UL,0xa060a0e0UL,0x211f21c2UL,0x43c8433aUL,0x2ced2c9aUL,0xd9bed90dUL,0xca46ca47UL,0x70d97017UL,0xdd4bddafUL,0x79de79edUL,0x67d467ffUL,0x23e82393UL,0xde4ade5bUL,0xbd6bbd06UL,0x7e2a7ebbUL,0x34e5347bUL,0x3a163ad7UL,0x54c554d2UL,0x62d762f8UL,0xff55ff99UL,0xa794a7b6UL,0x4acf4ac0UL,0x301030d9UL,0x0a060a0eUL,0x98819866UL,0x0bf00babUL,0xcc44ccb4UL,0xd5bad5f0UL,0x3ee33e75UL,0x0ef30eacUL,0x19fe1944UL,0x5bc05bdbUL,0x858a8580UL,0xecadecd3UL,0xdfbcdffeUL,0xd848d8a8UL,0x0c040cfdUL,0x7adf7a19UL,0x58c1582fUL,0x9f759f30UL,0xa563a5e7UL,0x50305070UL,0x2e1a2ecbUL,0x120e12efUL,0xb76db708UL,0xd44cd455UL,0x3c143c24UL,0x5f355f79UL,0x712f71b2UL,0x38e13886UL,0xfda2fdc8UL,0x4fcc4fc7UL,0x4b394b65UL,0xf957f96aUL,0x0df20d58UL,0x9d829d61UL,0xc947c9b3UL,0xefacef27UL,0x32e73288UL,0x7d2b7d4fUL,0xa495a442UL,0xfba0fb3bUL,0xb398b3aaUL,0x68d168f6UL,0x817f8122UL,0xaa66aaeeUL,0x827e82d6UL,0xe6abe6ddUL,0x9e839e95UL,0x45ca45c9UL,0x7b297bbcU
L,0x6ed36e05UL,0x443c446cUL,0x8b798b2cUL,0x3de23d81UL,0x271d2731UL,0x9a769a37UL,0x4d3b4d96UL,0xfa56fa9eUL,0xd24ed2a6UL,0x221e2236UL,0x76db76e4UL,0x1e0a1e12UL,0xb46cb4fcUL,0x37e4378fUL,0xe75de778UL,0xb26eb20fUL,0x2aef2a69UL,0xf1a6f135UL,0xe3a8e3daUL,0xf7a4f7c6UL,0x5937598aUL,0x868b8674UL,0x56325683UL,0xc543c54eUL,0xeb59eb85UL,0xc2b7c218UL,0x8f8c8f8eUL,0xac64ac1dUL,0x6dd26df1UL,0x3be03b72UL,0xc7b4c71fUL,0x15fa15b9UL,0x090709faUL,0x6f256fa0UL,0xeaafea20UL,0x898e897dUL,0x20e92067UL,0x28182838UL,0x64d5640bUL,0x83888373UL,0xb16fb1fbUL,0x967296caUL,0x6c246c54UL,0x08f1085fUL,0x52c75221UL,0xf351f364UL,0x652365aeUL,0x847c8425UL,0xbf9cbf57UL,0x6321635dUL,0x7cdd7ceaUL,0x7fdc7f1eUL,0x9186919cUL,0x9485949bUL,0xab90ab4bUL,0xc642c6baUL,0x57c45726UL,0xe5aae529UL,0x73d873e3UL,0x0f050f09UL,0x030103f4UL,0x3612362aUL,0xfea3fe3cUL,0xe15fe18bUL,0x10f910beUL,0x6bd06b02UL,0xa891a8bfUL,0xe858e871UL,0x69276953UL,0xd0b9d0f7UL,0x48384891UL,0x351335deUL,0xceb3cee5UL,0x55335577UL,0xd6bbd604UL,0x90709039UL,0x80898087UL,0xf2a7f2c1UL,0xc1b6c1ecUL,0x6622665aUL,0xad92adb8UL,0x602060a9UL,0xdb49db5cUL,0x1aff1ab0UL,0x887888d8UL,0x8e7a8e2bUL,0x8a8f8a89UL,0x13f8134aUL,0x9b809b92UL,0x39173923UL,0x75da7510UL,0x53315384UL,0x51c651d5UL,0xd3b8d303UL,0x5ec35edcUL,0xcbb0cbe2UL,0x997799c3UL,0x3311332dUL,0x46cb463dUL,0x1ffc1fb7UL,0x61d6610cUL,0x4e3a4e62UL
};

// local table
LOCAL UINT32 groestl_T_local[256*8];
const UINT32 LOCAL *groestl_T0 = &groestl_T_local[0 * 256];
const UINT32 LOCAL *groestl_T1 = &groestl_T_local[1 * 256];
const UINT32 LOCAL *groestl_T2 = &groestl_T_local[2 * 256];
const UINT32 LOCAL *groestl_T3 = &groestl_T_local[3 * 256];
const UINT32 LOCAL *groestl_T4 = &groestl_T_local[4 * 256];
const UINT32 LOCAL *groestl_T5 = &groestl_T_local[5 * 256];
const UINT32 LOCAL *groestl_T6 = &groestl_T_local[6 * 256];
const UINT32 LOCAL *groestl_T7 = &groestl_T_local[7 * 256];

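// init, once per kernel: each work-item copies every WORKSIZE-th entry,
// so 256 * 8 must be a multiple of WORKSIZE (i is assumed to be declared earlier)
UINT32 nLocalId = LOCALID;
{
    for(i = 0; i < 256 * 8; i += WORKSIZE)
        groestl_T_local[i + nLocalId] = groestl_T_init[i + nLocalId];
}
// note: a local-memory barrier is needed before any work-item reads
// groestl_T_local; it is not shown in this excerpt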

// declarations
UINT32 hash[32]; // hash[16..31] - scratch buffer

UINT32 groestl_BuffB[32];
UINT32 groestl_BuffC[32];
unsigned groestl_i;
unsigned index;

// inlined function body
groestl_BuffC[16] = hash[16] = 0x80;
groestl_BuffC[17] = hash[17] = 0;
groestl_BuffC[18] = hash[18] = 0;
groestl_BuffC[19] = hash[19] = 0;
groestl_BuffC[20] = hash[20] = 0;
groestl_BuffC[21] = hash[21] = 0;
groestl_BuffC[22] = hash[22] = 0;
groestl_BuffC[23] = hash[23] = 0;
groestl_BuffC[24] = hash[24] = 0;
groestl_BuffC[25] = hash[25] = 0;
groestl_BuffC[26] = hash[26] = 0;
groestl_BuffC[27] = hash[27] = 0;
groestl_BuffC[28] = hash[28] = 0;
groestl_BuffC[29] = hash[29] = 0;
groestl_BuffC[30] = hash[30] = 0;
hash[31] = 0x01000000;
groestl_BuffC[31] = 0x01020000L;

#pragma unroll 16
for (groestl_i = 0; groestl_i < 16; groestl_i++)
{
    groestl_BuffC[groestl_i] = hash[groestl_i];
}

// 14 QMIX rounds, two per iteration (the round counter lives in the top byte)
for(groestl_i=0; groestl_i < 0x0d000000u; groestl_i+=0x01000000u)
{
    groestl_QMIX(hash, groestl_BuffB, groestl_i)
    groestl_i+=0x01000000u;
    groestl_QMIX(groestl_BuffB, hash, groestl_i)
}

// 14 PMIX rounds, two per iteration
for(groestl_i=0; groestl_i<13; ++groestl_i)
{
    groestl_PMIX(groestl_BuffC, groestl_BuffB, groestl_i)
    ++groestl_i;
    groestl_PMIX(groestl_BuffB, groestl_BuffC, groestl_i)
}

// chaining value = P result ^ Q result ^ IV (the IV is nonzero only in word 31)
#pragma unroll 32
for(groestl_i = 0; groestl_i < 32-1; groestl_i++)
{
    hash[groestl_i] ^= groestl_BuffC[groestl_i];
    groestl_BuffB[groestl_i] = hash[groestl_i];
}
hash[31] ^= 0x00020000UL ^ groestl_BuffC[31];
groestl_BuffB[31] = hash[31];

// output transform: 14 more PMIX rounds, then XOR with the input and keep the upper 16 words
for(groestl_i = 0; groestl_i < 14;)
{
    groestl_PMIX(groestl_BuffB, groestl_BuffC, groestl_i)
    ++groestl_i;
    groestl_PMIX(groestl_BuffC, groestl_BuffB, groestl_i)
    ++groestl_i;
}

#pragma unroll 16
for(groestl_i = 0; groestl_i < 16; ++groestl_i)
{
    hash[groestl_i] = groestl_BuffB[16+groestl_i] ^ hash[16+groestl_i];
}
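
The round loops in the listing above bump the counter twice per iteration so the permutation can bounce between two buffers without copying the state back each round. Here is a minimal C sketch of that pattern. It assumes the macro arguments are (source, destination, round), which the excerpt does not show, and `round_fn` is a hypothetical stand-in for `groestl_PMIX` that only mixes in the round number so the call order can be checked:

```c
#include <assert.h>
#include <stddef.h>

#define ROUNDS 14  /* the loops in the listing run rounds 0..13 */

/* Hypothetical stand-in for groestl_PMIX(src, dst, round): it mixes the
   round number into the data so the order of calls is observable. */
static void round_fn(const unsigned *src, unsigned *dst, size_t n, unsigned round)
{
    for (size_t k = 0; k < n; ++k)
        dst[k] = src[k] * 31u + round;
}

/* Runs ROUNDS rounds, two per loop iteration, ping-ponging between a and b.
   Because ROUNDS is even, the final state ends up back in a.
   Returns the number of rounds executed. */
static unsigned run_rounds_pingpong(unsigned *a, unsigned *b, size_t n)
{
    unsigned executed = 0;
    assert(ROUNDS % 2 == 0);
    for (unsigned r = 0; r < ROUNDS; ) {
        round_fn(a, b, n, r); ++r; ++executed;   /* even round: a -> b */
        round_fn(b, a, n, r); ++r; ++executed;   /* odd round:  b -> a */
    }
    return executed;
}

/* Reference version: the same rounds with an explicit copy after each one. */
static void run_rounds_copy(unsigned *state, unsigned *tmp, size_t n)
{
    for (unsigned r = 0; r < ROUNDS; ++r) {
        round_fn(state, tmp, n, r);
        for (size_t k = 0; k < n; ++k)
            state[k] = tmp[k];
    }
}
```

Both versions produce the same state; the ping-pong one just skips the copies. The `< 13` and `< 0x0d000000u` loops in the listing do the same thing with one increment in the loop header and one in the body, which also works out to 14 rounds.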
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Myr-groestl: If there is interest and I can get some free time I'd love to do it :-)
Github: for a couple of files it's not worth it, IMHO
There is interest. I am leaving DMD groestl as I am sick of the crap there. HR has convinced me to move to Digibyte, which is myriad-groestl? IDK yet, I have to d/l the wallet and blockchain ...

I think it's multi-algo including myr-groestl, skein etc.
hero member
Activity: 630
Merit: 500
Myr-groestl: If there is interest and I can get some free time I'd love to do it :-)
Github: for a couple of files it's not worth it, IMHO
There is interest. I am leaving DMD groestl as I am sick of the crap there. HR has convinced me to move to Digibyte, which is myriad-groestl? IDK yet, I have to d/l the wallet and blockchain ...
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Myr-groestl: If there is interest and I can get some free time I'd love to do it :-)
Github: for a couple of files it's not worth it, IMHO
HR
legendary
Activity: 1176
Merit: 1011
Transparency & Integrity

Pallas,

Are you planning on adding myriad-groestl support in the future? If not, could you explain why not? Is it because your groestl kernel is already faster than the myriad-groestl?

Also, are you planning on putting your work on github? Again, if not, could you explain why not?

It seems to me that both are important ways to further your efforts and establish your reputation.

Best regards as always.

HR
hero member
Activity: 630
Merit: 500
OK, that makes sense :)
So I can still play with WS in your new OCL? (I think WS may be card-specific tuning.)
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I believe multiple threads help with algos that use GPU RAM; groestl does not. Only WS and intensity matter. TC is a buffer in RAM, so it is not relevant either.
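
For what it's worth, here is one way to see why WS matters for this particular kernel. The table-init loop in the listing above strides by WORKSIZE. Assuming WORKSIZE is a compile-time constant baked into the binary (the excerpt does not confirm this), launching with a smaller work-group size would leave part of the local table unwritten. A small host-side C simulation; `simulate_table_copy` and its parameters are made up for illustration:

```c
#include <assert.h>
#include <string.h>

#define TABLE_WORDS (256 * 8)   /* size of groestl_T_local in the listing */

/* Simulates the init loop for one work-group: every work-item with id
   0..local_size-1 runs
       for (i = 0; i < TABLE_WORDS; i += stride) dst[i + id] = src[i + id];
   stride plays the role of the compile-time WORKSIZE, local_size is the
   work-group size actually launched.
   Returns how many table words ended up written. */
static int simulate_table_copy(int stride, int local_size)
{
    static unsigned char written[TABLE_WORDS];
    int count = 0;
    assert(stride > 0 && local_size > 0);
    memset(written, 0, sizeof written);
    for (int id = 0; id < local_size; ++id)
        for (int i = 0; i < TABLE_WORDS; i += stride)
            if (i + id < TABLE_WORDS)      /* the kernel loop has no such guard */
                written[i + id] = 1;
    for (int k = 0; k < TABLE_WORDS; ++k)
        count += written[k];
    return count;
}
```

With stride 128 and 128 work-items all 2048 words get written. With stride 128 and only 64 work-items, half the table (1024 words) is never filled, so lookups would read garbage.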
hero member
Activity: 630
Merit: 500
Very nice results with 290(X), any chance for 280x gain?

In order to do the same optimizations on 280(x), the code would need to be almost completely rewritten to work on 32-bit numbers instead of 64-bit (because of LDS usage), hoping for the VGPR count (which is mostly under the compiler's control and very difficult to reduce by modifying the OpenCL code) to be low enough to permit 2 wavefronts.
Or, maybe, a better compiler in the future could do it by itself.
As of now, I think the best is using the asm version for 280(x) and my last binary for 290(x).
Why do you think this? The ASM version is driver independent and relies on coding directly for the GPU. There is very little difference between 280x and 290; 290 has more shaders, true. But basic code optimizations such as your first/last pass should work in ASM just as well or better, considering that AMD lobotomized the OCL compiler after 14.7.

14.12 is the first version making Hawaii-specific code which, in some cases, may bring noticeable improvements.
On Tahiti, the compiler simply can't make code capable of running 2 wavefronts. Or maybe it can but I'm not able to make it do it; on Hawaii I can instead.
Hawaii is not just Tahiti with more shaders...
What are the differences? 280x (Tahiti) can do multiple gpu-threads on many other coins (up to 4 gpu-threads on x11) with great efficiency, so I do not understand why groestl cannot.
Forgive me for asking such questions, but like my question about neoscrypt (which performs best with only 1 gpu-thread)  WS being totally tuned by amount of shaders ...
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Very nice results with 290(X), any chance for 280x gain?

In order to do the same optimizations on 280(x), the code would need to be almost completely rewritten to work on 32-bit numbers instead of 64-bit (because of LDS usage), hoping for the VGPR count (which is mostly under the compiler's control and very difficult to reduce by modifying the OpenCL code) to be low enough to permit 2 wavefronts.
Or, maybe, a better compiler in the future could do it by itself.
As of now, I think the best is using the asm version for 280(x) and my last binary for 290(x).
Why do you think this? The ASM version is driver independent and relies on coding directly for the GPU. There is very little difference between 280x and 290; 290 has more shaders, true. But basic code optimizations such as your first/last pass should work in ASM just as well or better, considering that AMD lobotomized the OCL compiler after 14.7.

14.12 is the first version making Hawaii-specific code which, in some cases, may bring noticeable improvements.
On Tahiti, the compiler simply can't make code capable of running 2 wavefronts. Or maybe it can but I'm not able to make it do it; on Hawaii I can instead.
Hawaii is not just Tahiti with more shaders...
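
To put numbers on the "2 wavefronts" remark: on GCN, as far as I know, each SIMD has a budget of 256 VGPRs per lane, holds at most 10 wavefronts, and allocates registers in blocks of 4. Assuming "wavefronts" here means wavefronts resident per SIMD, the kernel has to fit in 128 VGPRs to run 2. A rough C sketch of that arithmetic (the function name is hypothetical, and LDS and SGPR limits are ignored):

```c
#include <assert.h>

/* GCN figures: 256 VGPRs per SIMD lane, at most 10 resident wavefronts per
   SIMD, registers allocated in blocks of 4. */
enum { VGPR_BUDGET = 256, MAX_WAVES = 10, VGPR_GRANULE = 4 };

/* Wavefronts per SIMD as limited by VGPR use alone. */
static int waves_per_simd(int vgprs_per_workitem)
{
    assert(vgprs_per_workitem > 0 && vgprs_per_workitem <= VGPR_BUDGET);
    /* round the request up to the allocation granularity */
    int allocated = (vgprs_per_workitem + VGPR_GRANULE - 1) / VGPR_GRANULE * VGPR_GRANULE;
    int waves = VGPR_BUDGET / allocated;
    return waves > MAX_WAVES ? MAX_WAVES : waves;
}
```

So a kernel at 128 VGPRs gets 2 wavefronts, while one at 129 drops to 1. That is why a handful of registers the compiler will not give back can decide the whole thing.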
hero member
Activity: 630
Merit: 500
Very nice results with 290(X), any chance for 280x gain?

In order to do the same optimizations on 280(x), the code would need to be almost completely rewritten to work on 32 bit numbers instead of 64 (because of lds usage), hoping for the vgprs count (which is mostly in compiler control and very difficult to reduce by modifying the opencl code) to be low enough to permit 2 wavefronts.
Or, maybe, a better compiler in the future could do it by itself.
As of now, I think the best is using the asm version for 280(x) and my last binary for 290(x).
Why do you think this? The ASM version is driver independent and relies on coding directly for the GPU, and there is very little difference between the 280x and the 290; the 290 has more shaders, true. But basic code optimizations such as your first/last pass should work in ASM just as well or better, considering that AMD lobotomized the OCL compiler after 14.7.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Very nice results with 290(X), any chance for 280x gain?

In order to do the same optimizations on 280(x), the code would need to be almost completely rewritten to work on 32 bit numbers instead of 64 (because of lds usage), hoping for the vgprs count (which is mostly in compiler control and very difficult to reduce by modifying the opencl code) to be low enough to permit 2 wavefronts.
Or, maybe, a better compiler in the future could do it by itself.
As of now, I think the best is using the asm version for 280(x) and my last binary for 290(x).
legendary
Activity: 1904
Merit: 1003
Very nice results with 290(X), any chance for 280x gain?
hero member
Activity: 630
Merit: 500
experimental new bin for Hawaii (r9 290/290X) only:

https://dl.dropboxusercontent.com/u/40353042/Diamond/diamondHawaiiw128l8.bin

use worksize 128.

this is my opencl kernel, tweaked for speed and compatibility.
please report hashrates and show your support!

Hi Pallas,

The bin file does not work with either sgminer 4.1.0 from the Diamond website or sgminer 5 from Wolf0.
After the "...kernel is experimental..." message, both sgminer versions either hung or displayed a black screen.
Maybe sgminer needs the specific v2 diamond.cl file to function properly.

BTW, unlike v1, changing the name of your .bin file to match the one sgminer generated does not work either.

the binary can work without the sources.
check that you are running a 64 bit miner (the official diamond miner is 32 bit), that you are using worksize 128 and that you are setting the correct bin file name.
and of course that you have a hawaii card! :-)
Good reason to have the OCL source LOL, I am on a 64 bit OS but my miner is 32 bit ...
In my experience a black screen or a hung miner indicates too much O/C or a memclock not set as recommended (the GPU crashes before it can be reported) ...

Without the OCL source I cannot test on Tahiti properly, so I have no definite answer ... you did not specify OS, config etc., so it is a bit hard to troubleshoot ...
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
experimental new bin for Hawaii (r9 290/290X) only:

https://dl.dropboxusercontent.com/u/40353042/Diamond/diamondHawaiiw128l8.bin

use worksize 128.

this is my opencl kernel, tweaked for speed and compatibility.
please report hashrates and show your support!

Hi Pallas,

The bin file does not work with either sgminer 4.1.0 from the Diamond website or sgminer 5 from Wolf0.
After the "...kernel is experimental..." message, both sgminer versions either hung or displayed a black screen.
Maybe sgminer needs the specific v2 diamond.cl file to function properly.

BTW, unlike v1, changing the name of your .bin file to match the one sgminer generated does not work either.

the binary can work without the sources.
check that you are running a 64 bit miner (the official diamond miner is 32 bit), that you are using worksize 128 and that you are setting the correct bin file name.
and of course that you have a hawaii card! :-)
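
The checklist above (worksize, bin file name, 64 bit miner) can be partly automated. A minimal sketch, assuming the file name follows the cgminer-style convention in which the trailing `w128` is the worksize and `l8` is the miner's `sizeof(long)`. That reading of the suffixes is an assumption, not something confirmed in this thread, and `parse_bin_name` is a hypothetical helper, not part of sgminer.

```python
import re

# Assumed layout: <kernel+device>w<worksize>l<sizeof(long)>.bin
# (cgminer-style naming; an assumption, not confirmed in this thread).
_BIN_NAME = re.compile(r"^(?P<prefix>.+?)w(?P<worksize>\d+)l(?P<long_size>\d+)\.bin$")


def parse_bin_name(filename):
    """Split a kernel binary name into prefix, worksize and long size.

    Returns None when the name does not follow the assumed layout.
    """
    match = _BIN_NAME.match(filename)
    if match is None:
        return None
    return {
        "prefix": match.group("prefix"),
        "worksize": int(match.group("worksize")),
        "long_size": int(match.group("long_size")),
    }


if __name__ == "__main__":
    info = parse_bin_name("diamondHawaiiw128l8.bin")
    print(info)
    if info is not None and info["worksize"] != 128:
        print("worksize in the file name does not match the configured 128")
```

If the assumption holds, a miner that expects a different worksize or long size would look for a differently named file and never load this binary.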
member
Activity: 89
Merit: 10
experimental new bin for Hawaii (r9 290/290X) only:

https://dl.dropboxusercontent.com/u/40353042/Diamond/diamondHawaiiw128l8.bin

use worksize 128.

this is my opencl kernel, tweaked for speed and compatibility.
please report hashrates and show your support!

Hi Pallas,

The bin file does not work with either sgminer 4.1.0 from the Diamond website or sgminer 5 from Wolf0.
After the "...kernel is experimental..." message, both sgminer versions either hung or displayed a black screen.
Maybe sgminer needs the specific v2 diamond.cl file to function properly.

BTW, unlike v1, changing the name of your .bin file to match the one sgminer generated does not work either.
hero member
Activity: 630
Merit: 500
Thx pallas! But im out of the game cos i have 280s only. Utahjohn too (i think so).
Indeed, 280x only here. I am in talks with Pallas to work on this further :)
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Thx pallas! But im out of the game cos i have 280s only. Utahjohn too (i think so).

the new kernel should make no difference on tahiti cards, but we will run some tests later anyway; the reason is compatibility with newer drivers.
member
Activity: 109
Merit: 13
Thx pallas! But im out of the game cos i have 280s only. Utahjohn too (i think so).