I have a Ryzen 7 1700 at 3.7GHz. The 4way build is around 15% slower than AES-AVX/AVX2 when mining nist5: around 240 kH/s per core (8 threads) with 4way versus 270 kH/s per core with AES-AVX2. It's working stably, just with less performance. I can get 2.1~2.2 MH/s on nist5.
This is very interesting feedback. I get 340 kH/s per thread 4way vs 255 kH/s AVX2 1way on my i7-6700K @4GHz.
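As a quick sanity check, the per-thread figures can be turned into ratios and totals. This is a minimal Python sketch that assumes all 8 Ryzen threads run at the same rate. The dictionary layout and function names are only for illustration.

```python
# Per-thread nist5 rates quoted above, in kH/s.
ryzen = {"4way": 240.0, "avx2": 270.0, "threads": 8}  # Ryzen 7 1700 @ 3.7 GHz
skylake = {"4way": 340.0, "avx2": 255.0}              # i7-6700K @ 4 GHz

def ratio(rates):
    """Speed of the 4way build relative to the 1way AVX2 build (>1 means 4way is faster)."""
    return rates["4way"] / rates["avx2"]

def total_mhs(per_thread_khs, threads):
    """Total hash rate in MH/s, assuming every thread runs at the same rate."""
    return per_thread_khs * threads / 1000.0

print(f"Ryzen 4way/AVX2:    {ratio(ryzen):.3f}")
print(f"i7-6700K 4way/AVX2: {ratio(skylake):.3f}")
print(f"Ryzen AVX2 total:   {total_mhs(ryzen['avx2'], ryzen['threads']):.2f} MH/s")
print(f"Ryzen 4way total:   {total_mhs(ryzen['4way'], ryzen['threads']):.2f} MH/s")
```

On the quoted figures the 4way build gives about 0.89x the per-thread rate of AVX2 on the Ryzen (roughly 11% lower) but about 1.33x on the i7-6700K. Also, 8 x 270 kH/s = 2.16 MH/s, which matches the reported 2.1~2.2 MH/s total.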
Something isn't right; I need lots of details to eliminate the simple stuff. Can you post the startup output for both?
None of the following should cause that much of a difference, but it helps to quantify.
AMD AVX2 performance is known to be slower than AVX. Try running one test with just AVX2 and another with AVX to compare.
4way uses 4 times the memory of plain AVX2, which will expose any cache performance issues. Try running fewer threads to see if performance (total, not just per thread) improves.
Try the tribus algo. It's purely 4way parallel, while nist5 has a serial component that reduces the gain and adds some overhead.
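One way to see why a serial stage hurts is an Amdahl's-law style estimate. Only the part of the hash chain that has a 4way implementation speeds up, and the rest runs at the 1way rate. The sketch below uses that simple model. The fractions and the per-stage gain are hypothetical inputs, not measurements of nist5 or tribus, and the model ignores the interleaving overhead mentioned above.

```python
def lane_speedup(parallel_fraction, parallel_gain):
    """Amdahl-style estimate of the throughput gain from a 4way build.

    parallel_fraction: share of the 1way hashing time spent in stages that
        have a 4way (vectorized) implementation (hypothetical input).
    parallel_gain: throughput gain of those stages when run 4way
        (hypothetical input; not necessarily 4x).
    The remaining stages run serially and gain nothing; any overhead from
    interleaving or deinterleaving lanes is ignored.
    """
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / parallel_gain)

# Illustrative values only: a fully parallel chain vs chains with a serial stage.
for p in (1.0, 0.8, 0.6):
    print(f"parallel fraction {p:.1f}: {lane_speedup(p, 2.0):.2f}x")
```

For example, with a 2x gain on the parallel stages, a 20% serial share already pulls the overall gain down to about 1.67x. This is why a purely parallel algo like tribus is a cleaner test of the 4way code.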
Thanks for the reply.
About Tribus (3.7.7 version):
Tribus AVX 16 threads:
[2017-12-17 15:45:48] tribus block 449382, diff 297.717
[2017-12-17 15:45:48] CPU #3: 73.32 kH, 226.66 kH/s
[2017-12-17 15:45:48] CPU #2: 60.95 kH, 225.42 kH/s
[2017-12-17 15:45:48] CPU #1: 68.89 kH, 228.54 kH/s
[2017-12-17 15:45:48] CPU #0: 59.57 kH, 220.31 kH/s
[2017-12-17 15:45:48] CPU #7: 71.66 kH, 226.42 kH/s
[2017-12-17 15:45:48] CPU #4: 47.67 kH, 206.94 kH/s
[2017-12-17 15:45:48] CPU #14: 69.70 kH, 228.19 kH/s
[2017-12-17 15:45:48] CPU #6: 66.07 kH, 226.71 kH/s
[2017-12-17 15:45:48] CPU #12: 36.67 kH, 223.24 kH/s
[2017-12-17 15:45:48] CPU #15: 69.95 kH, 228.24 kH/s
[2017-12-17 15:45:48] CPU #11: 66.53 kH, 225.95 kH/s
[2017-12-17 15:45:48] CPU #5: 70.96 kH, 227.81 kH/s
[2017-12-17 15:45:48] CPU #10: 312.06 kH, 275.75 kH/s
[2017-12-17 15:45:48] CPU #8: 43.73 kH, 172.57 kH/s
[2017-12-17 15:45:48] CPU #9: 68.83 kH, 238.64 kH/s
[2017-12-17 15:45:48] CPU #13: 72.51 kH, 228.39 kH/s
Tribus AVX2 16 threads:
[2017-12-17 15:49:10] tribus block 449390, diff 254.451
[2017-12-17 15:49:10] CPU #4: 97.38 kH, 211.38 kH/s
[2017-12-17 15:49:10] CPU #6: 110.08 kH, 237.92 kH/s
[2017-12-17 15:49:10] CPU #7: 110.38 kH, 238.04 kH/s
[2017-12-17 15:49:10] CPU #0: 103.07 kH, 221.32 kH/s
[2017-12-17 15:49:10] CPU #1: 109.05 kH, 234.17 kH/s
[2017-12-17 15:49:10] CPU #9: 109.41 kH, 238.00 kH/s
[2017-12-17 15:49:10] CPU #8: 108.26 kH, 234.98 kH/s
[2017-12-17 15:49:10] CPU #13: 109.99 kH, 238.22 kH/s
[2017-12-17 15:49:10] CPU #5: 112.40 kH, 241.36 kH/s
[2017-12-17 15:49:10] CPU #11: 111.49 kH, 239.40 kH/s
[2017-12-17 15:49:10] CPU #3: 111.29 kH, 238.97 kH/s
[2017-12-17 15:49:10] CPU #15: 110.46 kH, 238.21 kH/s
[2017-12-17 15:49:10] CPU #2: 110.69 kH, 237.67 kH/s
[2017-12-17 15:49:10] CPU #10: 111.39 kH, 239.19 kH/s
[2017-12-17 15:49:10] CPU #14: 110.70 kH, 237.20 kH/s
[2017-12-17 15:49:10] CPU #12: 94.46 kH, 199.39 kH/s
[2017-12-17 15:49:15] CPU #12: 836.08 kH, 196.43 kH/s
[2017-12-17 15:49:15] Accepted 1/1 (100%), 2472.11 kH, 3722.47 kH/s
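The "Accepted" line reports the total rate, and it matches the sum of the most recent per-thread rates. CPU #12's later 196.43 kH/s replaces its earlier 199.39 kH/s, and the 16 latest values add up to about 3722.46 kH/s against the reported 3722.47 kH/s (the difference is rounding). The following small Python sketch totals a pasted log excerpt the same way. The regular expression is written against the log format shown here and is not part of cpuminer-opt.

```python
import re

# Matches per-thread lines in the format shown above, e.g.
# "[2017-12-17 15:49:10] CPU #4: 97.38 kH, 211.38 kH/s"
CPU_LINE = re.compile(r"CPU #(\d+): [\d.]+ kH, ([\d.]+) kH/s")

def total_rate(log_text):
    """Return (sum of the latest kH/s per CPU thread, number of threads seen)."""
    latest = {}
    for match in CPU_LINE.finditer(log_text):
        cpu, rate = int(match.group(1)), float(match.group(2))
        # Later lines in the excerpt override earlier ones for the same thread.
        latest[cpu] = rate
    return sum(latest.values()), len(latest)

sample = """\
[2017-12-17 15:49:10] CPU #12: 94.46 kH, 199.39 kH/s
[2017-12-17 15:49:15] CPU #12: 836.08 kH, 196.43 kH/s
[2017-12-17 15:49:10] CPU #10: 111.39 kH, 239.19 kH/s
"""
total, threads = total_rate(sample)
print(f"{threads} threads, {total:.2f} kH/s")  # prints "2 threads, 435.62 kH/s"
```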
Tribus 4way 16 threads:
[2017-12-17 15:50:38] tribus block 449392, diff 221.049
[2017-12-17 15:50:38] CPU #0: 2552.29 kH, 340.11 kH/s
[2017-12-17 15:50:38] CPU #1: 3076.95 kH, 410.02 kH/s
[2017-12-17 15:50:38] CPU #12: 2199.45 kH, 293.25 kH/s
[2017-12-17 15:50:38] CPU #8: 2508.86 kH, 334.41 kH/s
[2017-12-17 15:50:38] CPU #14: 2807.39 kH, 374.11 kH/s
[2017-12-17 15:50:38] CPU #9: 3002.02 kH, 400.25 kH/s
[2017-12-17 15:50:38] CPU #2: 2978.50 kH, 396.85 kH/s
[2017-12-17 15:50:38] CPU #3: 2993.07 kH, 398.79 kH/s
[2017-12-17 15:50:38] CPU #5: 2997.27 kH, 399.67 kH/s
[2017-12-17 15:50:38] CPU #4: 2927.24 kH, 390.44 kH/s
[2017-12-17 15:50:38] CPU #6: 2954.16 kH, 393.72 kH/s
[2017-12-17 15:50:38] CPU #7: 2983.57 kH, 397.69 kH/s
[2017-12-17 15:50:38] CPU #11: 3005.27 kH, 400.79 kH/s
[2017-12-17 15:50:38] CPU #15: 2946.88 kH, 393.06 kH/s
[2017-12-17 15:50:38] CPU #10: 2947.45 kH, 392.77 kH/s
[2017-12-17 15:50:38] CPU #13: 2742.90 kH, 365.66 kH/s
Tribus 4way 8 threads:
[2017-12-17 17:05:32] tribus block 449483, diff 735.578
[2017-12-17 17:05:32] CPU #7: 461.65 kH, 398.07 kH/s
[2017-12-17 17:05:32] CPU #6: 460.63 kH, 398.21 kH/s
[2017-12-17 17:05:32] CPU #5: 460.43 kH, 397.70 kH/s
[2017-12-17 17:05:32] CPU #2: 460.88 kH, 397.74 kH/s
[2017-12-17 17:05:32] CPU #4: 460.51 kH, 397.76 kH/s
[2017-12-17 17:05:32] CPU #3: 460.82 kH, 398.03 kH/s
[2017-12-17 17:05:32] CPU #0: 454.80 kH, 393.86 kH/s
[2017-12-17 17:05:32] CPU #1: 463.35 kH, 399.53 kH/s
Apparently Tribus 4way likes SMT/HT here.
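To quantify the SMT observation, sum the per-thread rates of the two 4way runs. The following is a minimal sketch with the values copied from the logs above. These are single snapshots from each run, not averages over a long session.

```python
# Per-thread tribus 4way rates (kH/s), copied from the two runs above.
four_way_16 = [340.11, 410.02, 293.25, 334.41, 374.11, 400.25, 396.85, 398.79,
               399.67, 390.44, 393.72, 397.69, 400.79, 393.06, 392.77, 365.66]
four_way_8 = [398.07, 398.21, 397.70, 397.74, 397.76, 398.03, 393.86, 399.53]

def smt_scaling(rates_smt, rates_no_smt):
    """Return (total with SMT, total without SMT, ratio of the two), totals in kH/s."""
    with_smt, without_smt = sum(rates_smt), sum(rates_no_smt)
    return with_smt, without_smt, with_smt / without_smt

t16, t8, gain = smt_scaling(four_way_16, four_way_8)
print(f"16 threads: {t16:.0f} kH/s, 8 threads: {t8:.0f} kH/s, SMT gain {gain:.2f}x")
```

On these snapshots the 16-thread run totals about 6082 kH/s against about 3181 kH/s for 8 threads, a gain of roughly 1.91x from SMT. That is also well above the 3722.47 kH/s the AVX2 build reported with 16 threads.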