That's a Piledriver. I don't have one, but I have an Excavator that I could use as a test base for that architecture. The current optimized assembly is for Ryzen only. I've never tested Intel AES myself yet, but I got tester reports of a modest improvement, very close to xmrig.
On forks, jce is better, but on pure Cryptonight AES 64-bit I beat the compilers by only 1%, so if you use just one thread, I'm not surprised the gain is only 1 h/s. Once I implement double hash, I may do better.
P.S. I don't see any substantial difference in speed from XMRig on a 32-bit non-AES Core i3 CPU.
You were right, I re-read my assembly and found a small optimization mistake. After retesting, I got a big +5% perf increase. My test Core 2 gives 93 instead of 88. You can expect 0.19 to be significantly faster on non-AES 32-bit, and slightly faster on non-AES 64-bit.
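For anyone who wants to compare their own before/after numbers the same way, here is a minimal sketch of the arithmetic behind that gain. It assumes both figures are hashrates in the same unit; the function name `percent_gain` is only illustrative and is not part of jce.

```python
def percent_gain(old_rate, new_rate):
    """Relative speedup of new_rate over old_rate, in percent."""
    return (new_rate - old_rate) / old_rate * 100.0

# Figures quoted above for the Core 2 test machine (unit assumed identical).
old_rate = 88.0
new_rate = 93.0

gain = percent_gain(old_rate, new_rate)
print(f"{old_rate:g} -> {new_rate:g}: +{gain:.1f}%")  # prints "88 -> 93: +5.7%"
```

So 93 versus 88 works out to a little under 6%, in line with the rounded +5% above.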
Don't start several jce instances on the same computer; instead, use the -t parameter or the config file to enable more threads. Just note that there's a limit of 32 threads per jce instance.