Profile result of this commit:
Time(%) Time Calls Avg Min Max Name
20.82% 2.96411s 74 40.056ms 39.851ms 44.742ms x11_echo512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
18.80% 2.67690s 75 35.692ms 35.574ms 39.812ms quark_groestl512_gpu_hash_64_quad(int, unsigned int, unsigned int*, unsigned int*)
12.79% 1.82197s 75 24.293ms 24.166ms 27.024ms x11_shavite512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
11.06% 1.57506s 75 21.001ms 20.943ms 23.427ms x11_simd512_gpu_expand_64(int, unsigned int, unsigned long*, unsigned int*, uint4*)
7.63% 1.08677s 75 14.490ms 14.377ms 16.185ms x11_cubehash512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
5.26% 749.42ms 75 9.9923ms 9.9294ms 11.130ms quark_jh512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
4.96% 706.48ms 75 9.4197ms 9.2415ms 10.595ms x11_luffa512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
3.05% 434.68ms 75 5.7958ms 5.7581ms 5.8440ms x11_simd512_gpu_compress2_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
3.01% 427.94ms 75 5.7058ms 5.6788ms 6.3479ms quark_bmw512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
2.78% 395.99ms 75 5.2799ms 5.1751ms 5.3694ms x11_simd512_gpu_compress1_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
2.72% 387.33ms 75 5.1644ms 5.1375ms 5.7555ms quark_blake512_gpu_hash_80(int, unsigned int, void*)
2.67% 380.04ms 75 5.0671ms 5.0126ms 5.6239ms quark_skein512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
2.56% 364.92ms 75 4.8655ms 4.8348ms 5.4331ms quark_keccak512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
1.60% 228.42ms 75 3.0456ms 3.0165ms 3.3860ms x11_simd512_gpu_final_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
0.28% 39.825ms 74 538.17us 535.38us 591.64us cuda_check_gpu_hash_64(int, unsigned int, unsigned int*, unsigned int*, unsigned int*)
Time(%) Time Calls Avg Min Max Name
20.69% 2.94306s 75 39.241ms 39.084ms 43.784ms x11_echo512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
18.78% 2.67148s 76 35.151ms 35.039ms 39.165ms quark_groestl512_gpu_hash_64_quad(int, unsigned int, unsigned int*, unsigned int*)
12.98% 1.84595s 76 24.289ms 24.188ms 27.135ms x11_shavite512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
11.07% 1.57478s 75 20.997ms 20.934ms 23.428ms x11_simd512_gpu_expand_64(int, unsigned int, unsigned long*, unsigned int*, uint4*)
7.22% 1.02751s 76 13.520ms 13.483ms 15.065ms x11_cubehash512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
5.34% 759.31ms 76 9.9909ms 9.9291ms 11.134ms quark_jh512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
5.04% 716.24ms 76 9.4243ms 9.2858ms 10.407ms x11_luffa512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
3.08% 438.16ms 76 5.7653ms 5.7369ms 6.4045ms quark_bmw512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
3.06% 434.61ms 75 5.7948ms 5.7602ms 5.8804ms x11_simd512_gpu_compress2_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
2.78% 395.22ms 75 5.2696ms 5.1818ms 5.4436ms x11_simd512_gpu_compress1_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
2.76% 392.47ms 76 5.1640ms 5.1345ms 5.7447ms quark_blake512_gpu_hash_80(int, unsigned int, void*)
2.70% 384.16ms 76 5.0548ms 5.0094ms 5.6282ms quark_skein512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
2.60% 369.59ms 76 4.8630ms 4.8242ms 5.4206ms quark_keccak512_gpu_hash_64(int, unsigned int, unsigned long*, unsigned int*)
1.61% 228.50ms 75 3.0466ms 3.0221ms 3.3762ms x11_simd512_gpu_final_64(int, unsigned int, unsigned long*, unsigned int*, uint4*, int*)
0.28% 40.323ms 75 537.64us 534.49us 589.79us cuda_check_gpu_hash_64(int, unsigned int, unsigned int*, unsigned int*, unsigned int*)
This time I do see real improvements.
For information, my builds were faster because I was not building the binaries with multi-arch support (I guess the current arch check takes some time).
I will pick the echo and cube changes. I still need to analyse the groestl one (it looks weird), and the aes one could break compatibility with other archs while bringing no real improvement.
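
To make the comparison easier to read, here is a small sketch that computes the relative difference in average kernel time between the first and the second table above. The values are copied from the "Avg" column; the shortened kernel names and the helper `relative_change` are only illustrative, and the script makes no assumption about which table is the baseline.

```python
# Average kernel times in ms, copied from the "Avg" column of the two tables.
# Keys are shortened kernel names chosen for this sketch only.
first = {
    "echo512": 40.056,
    "groestl512_quad": 35.692,
    "shavite512": 24.293,
    "cubehash512": 14.490,
}
second = {
    "echo512": 39.241,
    "groestl512_quad": 35.151,
    "shavite512": 24.289,
    "cubehash512": 13.520,
}


def relative_change(a, b):
    """Return the change from a to b as a percentage of a."""
    return (b - a) / a * 100.0


changes = {name: relative_change(first[name], second[name]) for name in first}

# Print the kernels with the largest reduction first.
for name, pct in sorted(changes.items(), key=lambda kv: kv[1]):
    print(f"{name:16s} {first[name]:8.3f} ms -> {second[name]:8.3f} ms ({pct:+.2f}%)")
```

With these numbers, cubehash shows the largest difference between the two runs, followed by echo and groestl, while shavite is essentially unchanged.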