Hi guys!
The optimization week doesn't end here! :-)
I changed the existing kernel a bit and removed some unnecessary stuff.
I got an 8% speedup on an R9 290.
Try for yourself:
https://mega.co.nz/#!9MAAHQJL!cqvdNfIeqcD39AZgvalgcGRKvS5U44xYTtlAZXqUi1U
Just update the groestlcoin.cl file, remove your bin files and restart sph-sgminer.
If you got a faster running rig with my kernel, please consider donating:
* change your donation percentage from 0% to 2% on grs.cryptohunger.com
* donate GRS to Groestlcoin project: FYEAbFBG3xY5VUtE9GXC56v5UKt4xkoUkb
* send BTC to my private account: 1NpEXJvoLSG99m3d1vpM7tHx8uXkksf7wL
This is 2x R9 290 and one R9 270X on Windows 7 with the 14.6 beta drivers:
JFYI, here is what the 14.6 driver on Linux compiles a single 'as_ulong(as_uchar8(r0).s76543210)' to for GCN:
s_bfe_u32 s5, s0, 0x00080008
s_bfe_u32 s6, s0, 0x00080010
s_bfe_i32 s7, s0, 0x00080000
s_bfe_i32 s5, s5, 0x00080000
s_bfe_u32 s8, s1, 0x00080000
s_bfe_u32 s9, s1, 0x00080008
s_bfe_i32 s6, s6, 0x00080000
s_ashr_i32 s0, s0, 24
s_lshl_b32 s7, s7, 8
v_mov_b32 v0, 0x0000ff00
v_mov_b32 v1, s5
s_bfe_u32 s5, s1, 0x00080010
s_bfe_i32 s8, s8, 0x00080000
s_bfe_i32 s9, s9, 0x00080000
s_load_dwordx4 s[12:15], s[2:3], 0x68
s_lshl_b32 s2, s6, 8
v_mov_b32 v2, s0
v_bfi_b32 v1, v0, s7, v1
s_bfe_i32 s0, s5, 0x00080000
s_ashr_i32 s1, s1, 24
s_lshl_b32 s3, s8, 8
v_mov_b32 v3, s9
v_bfi_b32 v2, v0, s2, v2
v_lshlrev_b32 v1, 16, v1
s_mov_b32 s2, 0xffff0000
s_lshl_b32 s0, s0, 8
v_mov_b32 v4, s1
v_bfi_b32 v3, v0, s3, v3
v_bfi_b32 v1, s2, v1, v2
v_mov_b32 v2, s4
v_bfi_b32 v0, v0, s0, v4
v_lshlrev_b32 v3, 16, v3
v_bfi_b32 v0, s2, v3, v0
Scary!
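For anyone wondering what that one-liner is supposed to do: as far as I can tell, on a little-endian device 'as_ulong(as_uchar8(r0).s76543210)' just reverses the eight bytes of a 64-bit value. Below is a rough host-side C sketch of the same operation, written two ways. The function names are made up for illustration and this is not code from the kernel. Whether a shift-and-mask form would compile to anything shorter on GCN is not something the listing above shows, so that would have to be checked in the ISA dump.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte-by-byte reversal. This mirrors what the
   .s76543210 swizzle asks for: old byte 7 lands in position 0,
   old byte 6 in position 1, and so on. */
static uint64_t bswap64_bytes(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xff); /* append the current lowest byte */
        x >>= 8;                   /* move on to the next byte */
    }
    return r;
}

/* Hypothetical helper: the same reversal with three shift-and-mask steps.
   Swap adjacent bytes, then adjacent 16-bit halves, then the 32-bit halves. */
static uint64_t bswap64_shifts(uint64_t x)
{
    x = ((x & 0x00ff00ff00ff00ffULL) << 8)  | ((x >> 8)  & 0x00ff00ff00ff00ffULL);
    x = ((x & 0x0000ffff0000ffffULL) << 16) | ((x >> 16) & 0x0000ffff0000ffffULL);
    return (x << 32) | (x >> 32);
}

/* Sanity check that both variants agree for a given input. */
static void check_bswap64_variants_agree(uint64_t x)
{
    assert(bswap64_bytes(x) == bswap64_shifts(x));
}
```

Calling check_bswap64_variants_agree() on a few values is a quick way to convince yourself that both forms compute the same thing before trying either idea in a kernel.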