That's just you. Don't assume everyone disables HT; most don't, because it helps compute-bound algos.
I'm still not sure what HT has to do with this. Whether it's on or off doesn't really matter, since we're talking about E5 CPUs and xmrig. Every E5 (-EP) CPU from v1 to v4 is the same with regard to cache sizes: 256KB of L2 per core and 2+MB of L3 per core. It's like this for literally every single E5 Xeon: Sandy Bridge-EP (v1), Ivy Bridge-EP (v2), Haswell-EP (v3) and Broadwell-EP (v4). Only Skylake changed this, but that's also when they dropped the "E5" name. So when we're talking about E5 Xeons, they're all the same and they're all limited by L2 cache. L3 cache is basically irrelevant, yet everyone and their mom keeps talking about 2MB of L3 cache like it matters. It doesn't.
Whether by design or by coincidence most CPUs have around 2MB of cache per physical core
But they don't. 99% of E5 CPUs have exactly 2.5MB of L3 cache per core, not 2MB; only a few oddballs like the E5-1650 or E5-4607 have 2MB per core. Some even have 3+MB of L3 per core, like the E5-2667v2, 2673v2, 2687Wv2 etc. It doesn't matter anyway, cause all of them are limited by L2 cache size: it's 256KB per physical core, and therefore the number of threads for the miner is exactly the same as the number of physical cores.
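To make the per-core numbers concrete, here's a quick sketch that just divides total L3 by core count. The core counts and L3 totals are my assumption from Intel's spec sheets (they aren't quoted above), and the function name is made up for illustration.

```python
# Core counts and total L3 (MB) are assumed from Intel's published specs,
# not taken from any miner output.
CPUS = {
    "E5-2680v2": (10, 25.0),
    "E5-2667v2": (8, 25.0),
    "E5-1650": (6, 12.0),
}


def l3_per_core_mb(cores, l3_mb):
    """Return the L3 cache available per physical core, in MB."""
    return l3_mb / cores


for name, (cores, l3_mb) in CPUS.items():
    print(f"{name}: {l3_per_core_mb(cores, l3_mb):.3f} MB of L3 per core")
```

None of the three drops below 2MB per core, which is why L3 is never the limit here.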
The number of physical cores is irrelevant, it's the number of miner threads, whether HT is enabled or not.
The number of physical cores is not irrelevant, it's everything actually, cause each miner thread needs 256KB of L2, and all E5 CPUs have only 256KB of L2 per physical core. That means a 6-core CPU gets 6 miner threads, a 10-core gets 10 miner threads, etc. Even though almost all of them have 2.5MB of L3 per core, and some have even more (like those 8-cores with 3.125MB per core), it doesn't matter, cause they're all still limited by L2. A 10-core E5-2680v2 has 25MB of L3 cache, so if you blindly followed the "2MB of L3 per thread" rule and tried to run 12 threads, the hashrate would not be higher than with 10 threads. Same with something like the E5-2667v2: it also has 25MB of L3 cache, so up to 12 miner threads is OK then? No. Like every other E5, it's limited by L2, and thus the highest hashrate is gonna be with 8 threads.
Number of miner threads = number of physical cores. That's the rule for every single E5 Xeon, cause they're all limited by those 256KB of L2 and not by L3 size (since none of them have less than 2MB of L3 per core). Hyper-Threading is completely irrelevant here: whether it's on or off, the highest hashrate is achieved with number of miner threads = number of physical cores. Mining software (at least xmrig) automatically detects this and sets the proper number of threads based on the CPU model, whether HT is on or off.
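Here's the arithmetic from the two examples above as a small sketch. It only compares the two rules of thumb; it is not how xmrig actually picks its thread count, and the function names are hypothetical.

```python
L3_PER_THREAD_MB = 2.0  # the commonly repeated "2MB of L3 per thread" rule


def threads_by_l3_rule(l3_mb):
    """Thread count suggested by dividing total L3 by 2MB per thread."""
    return int(l3_mb // L3_PER_THREAD_MB)


def threads_by_core_rule(physical_cores):
    """Thread count under the rule argued for here: one per physical core."""
    return physical_cores


# (model, physical cores, total L3 in MB) for the two CPUs discussed above
for name, cores, l3_mb in [("E5-2680v2", 10, 25.0), ("E5-2667v2", 8, 25.0)]:
    print(
        f"{name}: L3 rule says {threads_by_l3_rule(l3_mb)} threads, "
        f"core rule says {threads_by_core_rule(cores)} threads"
    )
```

For both CPUs the L3 rule suggests 12 threads, while the core rule gives 10 and 8, which are the counts I'm saying give the highest hashrate.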
None of that is new, it's been said in this thread before. But my question has not been "answered early on in this thread", as you said. The question is: why does the 10-core E5-2680v2 (3.1GHz all-core turbo) show the same hashrate as the 12-core E5-2696v2 (3.1GHz all-core turbo)? Why is there linear scaling from 4 cores to 10 cores, but not from 10 cores to 12 cores? Both the E5-2696v2 and the E5-2697v2 show about the same hashrate as the E5-2680v2 (or even lower), so their per-thread hashrate is lower for some reason. I just thought maybe someone here has an idea why.