* I'm consistently getting about 5% less hashrate for yescrypt than with older v3.8.8 for the exact same configuration. I didn't take more metrics but I think some other algos have slightly less performance as well.
If they are compiled with MinGW, the performance will be lower. Cross-compile with GCC does a better job optimizing
If it's not the case, it might be something in the recent changes
In both cases, I'm using the official Windows binaries.
A few points. and questions:
What's your CPU and OS?
Summary: Changes were made to support Windows CPU groups
There have been no changes to yescrypt code.
There have been no recent changes to Windows buld process
Edit: please use -D to display affinity debug info.
Long version:
CPU limit and affinity:
A change was made initially in 3.9.0, later tweaked to add CPU groups support to Windows.
This may be responsible for that issue. I can have a look at the code in light of the specific symtoms you saw.
Do you use CPU groups? Which version of Windows? You said you affine the process seperately and that
causes problems. That could be related to CPU groups if the process is in a different group from the miner
threads.
General performance degredation:
The binaries are still made the same way using mingw,specifically using the winbuild-cross.sh script.
The compiler was upgraded (evident in the startup messages showing the compiler version) prior
to 3.9.
However, I have been making some architectural changes that may have a small impact on performance,
though 5% seems a bit much. I'm making them due to issues in preparation for AVX512 where
up to 16 lanes can run parallel in a single CPU tread. The overhead for interleaving and deinterleaving the data,
the increase in memory usage, etc, don't scale well.
Some of those changes affect the locally displayed hashrate, both in volume of thread hash reports and
their values. I have reduced the latency between detecting a solution and submitting it to the pool. As
a side effect there are fewer hash meter reports and the reported hashrate is actually from the previous
block. Another side effect is the reduced latency is not reflected in the hash rate reported by the miner.
I considered it an acceptible compromise as it's just optics. The acumulated share difficulty over time is
what the pool uses. In both the miner and the pool the hashrate is an artificial metric.
The changes result in less deinterleaving of final hash (check for solution before interleaving instead of
after), and submitting a share immediately when found while continuing the scan instead of aborting the scan
to submit the share an start a new scan. On their face it is obviously more efficient but I measured no
discernable difference in reprted hash rate.
These changes are being migrated slowly and can be confirmed by a more detailed share submitted message
indicating which thread and lane found the solution.
Sorry for the ramble but there's a lot going on at the same time. I appreciate the testing and reports of any deviations
from previous versions, especially the unintended ones.