Looks like there is some performance problem with the yescryptr16 implementation on Coffee Lake CPUs.
Intel i5 4440 (stock 3.3 GHz) @ 4 threads generates ~600 H/s.
Intel i7 5820K (no overclock, 3.6 GHz) @ 12 threads generates ~1200 H/s in a pool and up to 1400 H/s solo mining (6 threads generate a little less), with both cpuminer-opt versions (3.7.6 and 3.7.7v2), under 64-bit Windows and 64-bit Ubuntu.
Intel i7 8700K (stock 4.3 GHz) @ 12 threads generates only 950 H/s (both pool and solo). Overclocking to 5 GHz (50x100) with the cache overclocked to 4.6 GHz (stock is 4.2) gives no gain; if anything, performance usually degrades (power limit disabled, core temperatures ~75C, so no throttling involved). Running 1, 2, 3, 4, 5 or 6 threads gives lower results. Overclocking the bus to 130 MHz (and the RAM with it) gives no gain either; the maximum stays at about 950 H/s.
Even funnier: at stock frequency, switching from AVX to SSE2 gives a performance boost, from 950 to 1000-1050 H/s.
I understand that the 8700K lacks quad-channel RAM and has a bit less L3 cache than the 5820K (12 vs 15 MB), but the bottleneck is obviously something else. RAM overclocking gives no result, so dual-channel memory is not the problem; if it were, we should see a performance boost when overclocking the bus and RAM. Cache is not the problem either: 20% of the cache is gone, but we gain >30% in frequency when overclocked, so the smaller cache runs at a higher speed together with the CPU cores (it holds less, but it is refilled more often and the work is done in less time). Also, compared to the 4440, the 8700K has twice the cache (12 vs 6 MB); given the much higher clock and the optimized pipeline, if cache were all that mattered we should see a 2x gain over the 4440.
I hope for a fix
That's quite a first post, you did your homework.
Your results are concerning but not a software issue. If Coffee Lake has a design quirk that can be worked around in software,
such a workaround would probably have a negative effect on other models. If it's a Coffee Lake issue, it needs a Coffee Lake fix.
On the technical side, it's difficult to compare the 8700K with either of the two other CPUs you tested. I have a 6700K @ 4 GHz
and it gets 780 H/s. With a projected linear increase by core count (780 x 6/4), the 8700K should get around 1170. Clearly the 8700K has a problem.
I haven't done a deep dive into the architecture to see if there is a design change that could have an effect. As a new CPU
it could still have a few issues that need to be ironed out.
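
As a rough check of that projection, here is a minimal sketch in Python. It assumes hash rate scales linearly with both core count and clock speed, which is only an approximation for a memory-heavy algo; the function name is made up for illustration, and the input figures are the ones quoted in this thread.

```python
def project_hashrate(ref_rate, ref_cores, ref_ghz, cores, ghz):
    """Scale a reference hash rate linearly by core count and clock speed.

    Assumption: perfectly linear scaling, which real hardware rarely delivers.
    """
    return ref_rate * (cores / ref_cores) * (ghz / ref_ghz)


# Reference from this thread: 6700K, 4 cores at 4.0 GHz, 780 H/s.
same_clock = project_hashrate(780, 4, 4.0, 6, 4.0)   # 6 cores, clock unchanged
stock_clock = project_hashrate(780, 4, 4.0, 6, 4.3)  # 6 cores at the 8700K stock 4.3 GHz
observed = 950                                        # reported 8700K result

print(f"projected at 4.0 GHz: {same_clock:.0f} H/s")
print(f"projected at 4.3 GHz: {stock_clock:.0f} H/s")
print(f"shortfall vs 4.0 GHz projection: {1 - observed / same_clock:.0%}")
```

Even without crediting the higher clock, the reported 950 H/s sits roughly a fifth below the core-count projection.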
I noticed in my brief test that reducing the thread count by half on my 6700K had no effect on total hash rate. This tells me the bottleneck
is memory access (cache or main). You stated lower performance with fewer threads. That may be a clue. If your CPU is not
I/O bound when mining an I/O bound algo there may be a problem on the compute side.
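
The thread-count test above can be written down as a small check. This is only a sketch: the helper names and the 95% threshold are arbitrary choices for illustration, and the 6700K entry assumes the same 780 H/s at both 4 and 8 threads, as my brief test suggested. Plug in your own measurements for the 8700K.

```python
def plateau_ratio(rates_by_threads):
    """Total hash rate at the fewest threads divided by the rate at the most threads."""
    fewest = min(rates_by_threads)
    most = max(rates_by_threads)
    return rates_by_threads[fewest] / rates_by_threads[most]


def looks_memory_bound(rates_by_threads, threshold=0.95):
    """Heuristic: if fewer threads reach nearly the same total rate,
    extra threads are waiting on memory rather than adding compute.
    The threshold is an arbitrary assumption, not a measured constant."""
    return plateau_ratio(rates_by_threads) >= threshold


# thread count -> total H/s; 6700K figures assumed from the test described above
i7_6700k = {4: 780, 8: 780}
print(looks_memory_bound(i7_6700k))
```

A CPU whose total rate keeps dropping as threads are removed would fail this check, which is what points toward the compute side.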
Your observation that the SSE2 build is faster than AVX is very interesting and deserves more testing. There is no AVX-specific code in
yescrypt, so there should be no difference in hash speed. Yescrypt is also very self-contained, i.e. it doesn't use any libraries. The only effect
the AVX flag would have is on the compiler. It may compile code differently, but there are no big gains between SSE2 and AVX; they are
both mostly limited to 128-bit vectors. It's only with AVX2 that there is a quantum leap to 256-bit vectors. Again I speculate, but maybe the
compiler isn't yet tweaked for Coffee Lake. What version did you compile with?
I suggest you try other algos with 6 and 12 threads to get a more complete profile. If some algos are affected more than others
it may reveal a pattern.
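
One way to organize that profile is to generate the full set of benchmark runs up front. The sketch below only builds and prints the command lines; it does not run anything. It assumes the binary is called `cpuminer` (the Windows builds use other file names) and relies on the `-a`, `-t` and `--benchmark` options. The algo list is just an example selection, so substitute whichever ones you want to compare.

```python
import itertools
import shlex


def benchmark_commands(algos, thread_counts, binary="cpuminer"):
    """Build one benchmark command per (algo, thread count) pair.

    `binary` is an assumed executable name; adjust it for your build.
    """
    return [
        [binary, "-a", algo, "-t", str(threads), "--benchmark"]
        for algo, threads in itertools.product(algos, thread_counts)
    ]


# Example selection only: pick the algos you actually want to profile.
for cmd in benchmark_commands(["yescryptr16", "yescrypt", "x11"], [6, 12]):
    print(shlex.join(cmd))
```

Running each command for the same length of time and noting the reported rate gives a table of algo against thread count that is easy to compare across CPUs.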
It would also be interesting to see if other Coffee Lake owners see the same issues.