Topic: RANDOM-X on XEON... CACHE, FREQ'S OR CORES? (Read 530 times)

member
Activity: 236
Merit: 16
March 13, 2021, 05:45:03 AM
#31
E5-2667v2 is still pretty good for RandomX (especially on lighter versions like for Wownero) - 5 KH/s per CPU is possible
full member
Activity: 1424
Merit: 225
That's overthinking if mining is not the primary purpose. I certainly wouldn't rely on uncontrolled
mining benchmarks posted by random users.

Also keep in mind the 2MB rule only applies to RandomX and Cryptonight, in case you might want to mine
something else.

Regardless, the decision isn't between one Xeon model vs another, or even Xeon vs Core, but Intel vs AMD.
You could probably get the same level of performance for a much lower price, regardless of the application,
with 12 or 16 core Ryzens.
legendary
Activity: 1106
Merit: 1014
You've got the CPUs, do the testing yourself instead of speculating about what they "would" do.
I don't have the CPUs, that's why I'm here, as I said in my first post - I'm trying to choose processors for a few home servers, and current candidates are the 10-core 2680v2 and 12-core 2696v2. I do have a few E5 systems around, but none of them have 12-core CPUs, and the only 10-core models I've got are slow 2648Lv2 - so I can't really test anything comparable right now. I'll probably end up with 6 servers in total, so it's 12 CPUs, and the price difference adds up. One of the ways to get back at least some part of the money spent is to use them for mining while it's profitable, hence the question about the hashrates. The benchmarks on xmrig site show ~ the same hashrates between these 2, and I just don't understand why. Thought someone here might have an idea.
full member
Activity: 1424
Merit: 225
You've got the CPUs, do the testing yourself instead of speculating about what they "would" do.
legendary
Activity: 1106
Merit: 1014
That's just you. Don't assume everyone disables HT, most don't because it helps compute bound algos.
I'm still not sure what HT has to do with this. Whether it's on or off doesn't really matter since we're talking E5 CPUs and xmrig. Every E5 -EP CPU from v1 to v4 is the same in regards to cache sizes: 256KB L2 per every core, and 2+MB L3 per every core. It's like this for literally every single E5 Xeon: Sandy Bridge-EP (v1), Ivy Bridge-EP (v2), Haswell-EP (v3) and Broadwell-EP (v4) - only Skylake has brought change to this, but that's also when they dropped "E5" name. So when we're talking E5 Xeons, they're all the same and they're all limited by L2 cache. L3 cache is basically irrelevant, yet everyone and their mom keeps talking about 2MB of L3 cache like it matters - it doesn't.

Whether by design or by coincidence most CPUs have around 2MB of cache per physical core
But they don't. 99% of E5 CPUs have exactly 2.5MB of L3 cache per core, not 2MB, only a few oddballs like E5-1650 or E5-4607 have 2MB per core. Some have even 3+ MB of L3 per core, like E5-2667v2, 2673v2, 2687wv2 etc. It doesn't matter anyway, cause all of them are limited by L2 cache size: it's 256KB per physical core, and therefore number of threads for the miner is exactly the same as the number of physical cores.

The number of physical cores is irrelevant, it's the number of miner threads, whether HT is enabled or not.
The number of physical cores is not irrelevant, it's everything actually, cause each miner's thread needs 256KB of L2, and all E5 CPUs have only 256KB of L2 per physical core. Which means that if it's a 6-core CPU, then it's gonna be 6 threads in the miner, if it's a 10-core, then it's 10 miner threads etc. Even though almost all of them have 2.5MB of L3 per core, and some have even more (like those 8-cores with 3.125MB per core) - it doesn't matter cause they're all still limited by L2. A 10-core E5-2680v2 has 25MB of L3 cache, so if one would blindly follow the "2MB of L3 per thread" rule, and tried to run 12 threads - the hashrate would not be higher than with 10 threads. Same with something like E5-2667v2 - it also has 25MB of L3 cache, so up to 12 miner threads is ok then? No - like every other E5, it's limited by L2, and thus the highest hashrate is gonna be with 8 threads.

Number of miner threads = number of physical cores - that's the rule for every single E5 Xeon, cause they're all limited by those 256KB of L2, and not by L3 size (since none of them have less than 2MB of L3 per core). Hyper-Threading is something that is completely irrelevant here, it doesn't matter whether it's on or off, highest hashrate is achieved with number of miner threads = number of physical cores and mining software (at least xmrig) automatically detects it and sets the proper number of threads based on the cpu model, whether HT is on or off.
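
A rough sketch of the rule above, for anyone who wants to plug in their own CPU. The function name and structure are made up for illustration and are not how xmrig does its detection; the 256KB of L2 and 2MB of L3 per thread are the figures used in this thread.

```python
# Hypothetical helper, not taken from xmrig.
L2_PER_THREAD_KB = 256  # RandomX L2 need per thread, as stated in this thread
L3_PER_THREAD_MB = 2    # RandomX L3 need per thread, as stated in this thread


def randomx_threads(cores, l2_per_core_kb, l3_total_mb):
    """Thread count allowed by L2 and L3 together (the smaller limit wins)."""
    by_l2 = cores * (l2_per_core_kb // L2_PER_THREAD_KB)
    by_l3 = int(l3_total_mb // L3_PER_THREAD_MB)
    return min(by_l2, by_l3)


# E5-2680v2: 10 cores, 256KB L2 per core, 25MB L3
# E5-2667v2:  8 cores, 256KB L2 per core, 25MB L3
for name, cores, l3_mb in [("E5-2680v2", 10, 25), ("E5-2667v2", 8, 25)]:
    l3_only = int(l3_mb // L3_PER_THREAD_MB)
    print(name, "L3-only rule:", l3_only, "| L2-aware rule:", randomx_threads(cores, 256, l3_mb))
```

For both CPUs the L3-only rule suggests 12 threads, while the L2 limit caps them at 10 and 8.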

None of that is new, it's been said in this thread before. But my question has not been "answered early on in this thread", as you said. The question is - why does the 10-core E5-2680v2 (3.1GHz all-core turbo) show the same hashrate as the 12-core E5-2696v2 (3.1 GHz all-core turbo)? Why is there linear scaling from 4 cores to 10 cores, but not from 10 cores to 12 cores? Both E5-2696v2 and E5-2697v2 show about the same (or even lower) hashrates than E5-2680v2, so their per-thread performance is lower for some reason. I just thought maybe someone here has an idea why.
full member
Activity: 1424
Merit: 225
It's never used for mining with E5 v1/v2 - either it's disabled in BIOS, or the miner's threads are bound to the physical cores.

That's just you. Don't assume everyone disables HT, most don't because it helps compute bound algos. The number of physical cores
is irrelevant, it's the number of miner threads, whether HT is enabled or not.

Whether by design or by coincidence most CPUs have around 2MB of cache per physical core and RandomX has been
engineered to that spec.

legendary
Activity: 1106
Merit: 1014
You're counting physical cores but the CPUs are hyperthreaded.

Divide the L3 cache size by 2M and that's the optimum number of threads to run.
Any more and total hashrate starts to drop.

The question was answered early on in this thread.
I'm not following. What does HT have to do with this? It's never used for mining with E5 v1/v2 - either it's disabled in BIOS, or the miner's threads are bound to the physical cores. Of course I'm "counting physical cores", cause that's what matters in RandomX mining with these Xeons. Every single E5 V2 CPU in existence has 256KB of L2 and 2M+ of L3 per physical core, so that's how they're used for mining - with the number of miner's threads equal to the number of cores. All the benchmarks out there are like that, number of threads = number of physical cores. I don't understand your post and my question was not answered in this thread, it has nothing to do with HT whatsoever.
full member
Activity: 1424
Merit: 225
You're counting physical cores but the CPUs are hyperthreaded.

Divide the L3 cache size by 2M and that's the optimum number of threads to run.
Any more and total hashrate starts to drop.

The question was answered early on in this thread.
legendary
Activity: 1106
Merit: 1014
It's something to do with L3 cache on Intel CPUs, it doesn't scale well with more cores. I observed similar problems with 6-core vs 4-core Xeons.
What could it have to do with L3 cache? RandomX needs 256KB of L2 and 2MB of L3 for every thread. None of the Ivy Bridge EP CPUs are limited by L3 as far as I can see, every single E5 v2 CPU has more than 2MB of L3 per core, so should scale with increased core count just fine. I'm looking at 4-cores vs 6-cores v2, and the scale is pretty much linear.

Here's a quad 2637v2 (3.5GHz base, 3.6GHz turbo):
https://xmrig.com/benchmark?cpu=Intel%28R%29+Xeon%28R%29+CPU+E5-2637+v2+%40+3.50GHz
And here's a hexa 2643v2 (also 3.5GHz base, 3.6GHz turbo):
https://xmrig.com/benchmark?cpu=Intel%28R%29+Xeon%28R%29+CPU+E5-2643+v2+%40+3.50GHz

Each thread hashes at ~ 550-600 H/s, resulting in ~ 4.5Kh for a pair of quads and 6.8-7Kh for a pair of hexa CPUs. Compare these to the 8-core 2667V2, and it also does ~ 550H per thread, or 8.7-9KH for a pair:
https://xmrig.com/benchmark?cpu=Intel%28R%29+Xeon%28R%29+CPU+E5-2667+v2+%40+3.30GHz

None of the 10-core Ivy EP CPUs are clocked as high as 3.6 GHz, so no direct comparison can be done here, the fastest 2690V2 is 3.0GHz base / 3.3GHz turbo, and it does 460H per thread, or 9.2KH for a pair:
https://xmrig.com/benchmark?cpu=Intel%28R%29+Xeon%28R%29+CPU+E5-2690+v2+%40+3.00GHz
So it also seems to scale, especially if the turbo wasn't working right on that one (and the memory was at 1066MHz). A slower 2680v2 10-core does up to 515H per thread, and it's only 2.8/3.1GHz cpu:
https://xmrig.com/benchmark?cpu=Intel%28R%29+Xeon%28R%29+CPU+E5-2680+v2+%40+2.80GHz

So at least up to 10 cores the scaling looks pretty much linear. It's the 12-core CPUs that aren't any faster for some reason, at least by looking at those benchmarks. 2696V2 is supposed to work at the same 3.1GHz turbo as 2680V2. If the latter does 10KH for a pair, then why wouldn't a pair of 2696V2 do 12KH? Do they throttle and not reach 3.1GHz all-core turbo? It's only 5W difference in TDP on paper between 2696v2 and 2680v2, but most likely a bit more in real power draw. Too bad xmrig benchmarks don't show the actual clocks. :(
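
The arithmetic behind that expectation, as a small sketch. It assumes perfectly linear scaling (threads x per-thread rate x sockets), which is exactly the assumption in question; the function names are made up, and the input numbers are the ones quoted above.

```python
def pair_hashrate(per_thread_hs, cores_per_cpu, sockets=2):
    """Expected total H/s if every thread hashes at the same rate."""
    return per_thread_hs * cores_per_cpu * sockets


def implied_per_thread(total_hs, cores_per_cpu, sockets=2):
    """Per-thread H/s implied by an observed total."""
    return total_hs / (cores_per_cpu * sockets)


# 2680v2 pair at the ~515 H/s per thread seen in the benchmarks
print(pair_hashrate(515, 10))
# A 2696v2 pair, IF it kept the same per-thread rate (hypothetical)
print(pair_hashrate(515, 12))
# Per-thread rate a 12-core pair actually has if it only reaches ~10 KH/s
print(round(implied_per_thread(10_000, 12)))
```

So a 12-core pair stuck at ~10 KH/s is running each thread noticeably slower than the 10-core pair does, which is what needs explaining.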
member
Activity: 116
Merit: 66
It's something to do with L3 cache on Intel CPUs, it doesn't scale well with more cores. I observed similar problems with 6-core vs 4-core Xeons.
legendary
Activity: 1106
Merit: 1014
Could someone explain why 10-core Ivy Bridge CPUs seem to show the same hashrates as the 12-core CPUs? I'm trying to pick processors for a couple of simple home servers (dual 2011), that I'd also use for mining while it's profitable, and looking at the benchmarks on the xmrig website - I'm confused. E5-2680V2 is a 115W TDP 10-core 2.8 GHz base that's supposed to hit 3.1 GHz all-core turbo, and E5-2696V2 is a 120W TDP 12-core 2.5 GHz that is also supposed to hit 3.1 GHz all-core turbo. So they look identical other than 10 threads vs 12 threads, but xmrig benchmarks suggest that they both only hash at ~ 5k each. I figured if the 10-core does 5k, then the 12-core should be about 6k?

Am I missing something here? I thought maybe 2696V2 doesn't reach the 3.1 GHz turbo during mining for some reason, and that's why it shows ~ the same hashrate as 2680V2 - cause the latter has higher base clock. But then I looked at the benchmarks of 2697V2 (which has higher than 2696V2 base of 2.7 GHz, but lower turbo of 3.0 GHz), and it's the same thing - also hovers around 5k per CPU. They're all 256KB L2 cache, 25/30MB L3 cache, yet extra 2 cores don't seem to bring any hashrate improvements? Is this a RandomX thing, or something with XMRig miner or benchmarks?
newbie
Activity: 1
Merit: 0
February 13, 2020, 02:45:56 PM
#20
Is the Xeon X5650 or X5670 better for mining Monero compared to the latest Xeon CPUs on the market right now? Is there anywhere I can see a hashrate comparison?

2*Xeon E5440 = 1200 H/s
2*Xeon X5670 = 3400 H/s
2*Xeon E5-2670 = 6700 H/s
legendary
Activity: 1470
Merit: 1114
February 12, 2020, 01:53:37 PM
#19

Your nick looks familiar. :)

You got me on a technicality, I don't consider Pentium and Celeron true x86_64 specifically
because they are crippled implementations.

In practical terms I was correct but let me rephrase to be technically correct.

Every mainstream x86_64 CPU, excluding very low cost CPUs, like Pentiums and Celerons,
built in the last 10 years has AES.



 
newbie
Activity: 23
Merit: 6
legendary
Activity: 1470
Merit: 1114
February 12, 2020, 06:22:08 AM
#17
Intel Xeon CPUs are easier for me to configure than Ryzen; I still don't understand how Ryzen CPUs are made. Another thing is that Xeon CPUs support AES, and with AES you will definitely get a higher hashrate compared to the default hashrate.

Every x86_64 CPU built in the last 10 years has AES.

I don't know what you mean when you say you don't understand how Ryzens are made.
The only thing that affects mining is that you need to set cpu affinity on Ryzen when using
fewer threads than available. On Intel it isn't necessary.

On Intel, thread 1 is on core 1, while on Ryzen thread 1 is on core 0, sharing it (SMT) with thread 0.
 
To use 8 of 16 threads on Intel, the default affinity 0x00ff ensures only one thread per
physical core.
 
With Ryzen you need to specify affinity 0x5555 or 0xaaaa (alternating 0 & 1) to avoid putting 2
threads on the same physical core.
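
For anyone unsure where those hex values come from: each bit in the mask stands for one logical CPU. A minimal sketch (the helper name is made up) that builds the three masks mentioned above for a 16-thread CPU. Whether your OS numbers sibling threads adjacently or not is something to check on your own system.

```python
def affinity_mask(logical_cpus):
    """Build a bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask


first_eight = affinity_mask(range(8))         # logical CPUs 0..7
even_only = affinity_mask(range(0, 16, 2))    # logical CPUs 0, 2, 4, ... 14
odd_only = affinity_mask(range(1, 16, 2))     # logical CPUs 1, 3, 5, ... 15

print(f"{first_eight:#06x} {even_only:#06x} {odd_only:#06x}")
```

The three values printed are 0x00ff, 0x5555 and 0xaaaa.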
member
Activity: 434
Merit: 19
February 12, 2020, 04:52:35 AM
#16
Intel Xeon CPUs are easier for me to configure than Ryzen; I still don't understand how Ryzen CPUs are made. Another thing is that Xeon CPUs support AES, and with AES you will definitely get a higher hashrate compared to the default hashrate.
legendary
Activity: 1470
Merit: 1114
February 11, 2020, 02:18:02 AM
#15
So I added more RAM, now it's quad channel. No change in hash rate.

That caught my attention. I was thinking Ryzen was pushing 2 channel with too many threads
but apparently not.

The only algos I can think of that are memory bottlenecked are scryptn2 and lyra2z330.
full member
Activity: 1148
Merit: 116
February 09, 2020, 03:30:04 AM
#13
Is the Xeon X5650 or X5670 better for mining Monero compared to the latest Xeon CPUs on the market right now? Is there anywhere I can see a hashrate comparison?
legendary
Activity: 1176
Merit: 1015
February 08, 2020, 05:23:04 PM
#12
Quote
-exploiting 'xeon v3 turbo hack' can boost mining performance easily 20-30%.
- never heard of this, can you point me?

Looking at your signature, hehe...

Out for another year or so.
full member
Activity: 241
Merit: 100
To Hash or not to Hash, that's what the question
February 08, 2020, 04:35:43 PM
#11
Quote
-exploiting 'xeon v3 turbo hack' can boost mining performance easily 20-30%.
- never heard of this, can you point me?

I got my E5 2670 v3s installed. Temps are good. I get 5800 H/s out of both. Memory runs on two channels; I bought more sticks and will install them next week, curious if 4 channels will do any good...
Compared to two E5 2665 V2, the v3s are less powerful. I get 6200-6300 H/s out of the V2s.
legendary
Activity: 1176
Merit: 1015
February 08, 2020, 03:38:51 PM
#10
Excellent info, very informative. I don't have any Xeons but it was still quite interesting.

Thanks for identifying the 256K L2 dependency, I wasn't previously aware of it.

To make matters worse latest xeons have more than enough of L2 but guess what... some L3 is missing.

To be on topic, XEON... CACHE, FREQ'S OR CORES?, perfect answer is FREQ x CORES without HT.

OT: Most cpu-algos are not L2 limited, must Do Your Own Research.


legendary
Activity: 1470
Merit: 1114
February 08, 2020, 12:53:47 PM
#9

Had a chance to test many different v3 xeons and compare randomx performance to specs, some observations:


Excellent info, very informative. I don't have any Xeons but it was still quite interesting.

Thanks for identifying the 256K L2 dependency, I wasn't previously aware of it.

With improvements to turbo boost, boosting more cores, the benefit of mining with
N/2 threads is increased.
legendary
Activity: 1176
Merit: 1015
February 08, 2020, 05:25:02 AM
#8
- this is very reasonable, where did you get numbers from?

i have found locally 2670 v3, will install soon and see what they are capable of

Had a chance to test many different v3 xeons and compare randomx performance to specs, some observations:

-algo needs 256KB of L2 for every thread, so hyperthreading is basically useless; disable it to save power or use cpu affinity to bind threads to 'real' cores.
-turbo bins, models with highest all core turbo are the best.
-tdp, on higher core count models after optimal tweaking you start hitting the tdp wall and throttling starts, negative vcore offset helps to a point where cpu crashes due to too low voltage.
-xeons run really cool, with a decent cooler heat is never a problem.
-exploiting 'xeon v3 turbo hack' can boost mining performance easily 20-30%.

What I forgot to try was some REALLY tight memory timings like with Ryzens. v3 supports only DDR4-2133, and depending on how many memory sticks you have it runs in 1- to 4-channel mode.

For example, looking at 2670 v3 I see it is a 12c/24t cpu with a tdp of 120w, 2.3GHz base clock and turbo 3/3/3/3/3/3/3/4/5/6/8/8, meaning it will run at 2.6-3.1 GHz depending on how many threads you are using. 12 threads @2.6GHz will end up somewhere in the 5k range.
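
A sketch of how to read that turbo string. It assumes each bin is 100 MHz above base and that the bins are listed from all cores active (first) down to one core active (last); that reading matches the 2.6-3.1 GHz range above, but check it against the spec listing for your own model. The function name is made up.

```python
BIN_MHZ = 100  # assumption: one turbo bin = 100 MHz on these CPUs


def turbo_clock_mhz(base_mhz, bins, active_cores):
    """Clock in MHz for a given number of active cores.

    bins[0] applies with all cores active, bins[-1] with one core active.
    """
    if not 1 <= active_cores <= len(bins):
        raise ValueError("active_cores must be between 1 and the number of bins")
    return base_mhz + bins[len(bins) - active_cores] * BIN_MHZ


bins_2670v3 = [3, 3, 3, 3, 3, 3, 3, 4, 5, 6, 8, 8]

for n in (12, 6, 4, 1):
    print(n, "active cores:", turbo_clock_mhz(2300, bins_2670v3, n), "MHz")
```

With 12 active cores this gives 2600 MHz, and with a single active core 3100 MHz.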

Most likely you are also limited by your INTEL S2600CW2SR motherboard: it lacks tuning options and/or the possibility to use cheap ES/QS processors.

Do a search for 'List_of_Intel_Haswell-based_Xeon_microprocessors', really nice listing which clearly shows the most important specs for mining.



full member
Activity: 241
Merit: 100
To Hash or not to Hash, that's what the question
February 07, 2020, 04:45:35 PM
#7
Quote
2x 2696v3 ($800?) 36 threads @2.8GHz gives you 18k consuming 290w
- this is a bit too much of $ in...

Quote
2x 4627v3 ($100?) 20 threads @3GHz gives you 10k consuming 260w
- this is very reasonable, where did you get numbers from?

i have found locally 2670 v3, will install soon and see what they are capable of
legendary
Activity: 1176
Merit: 1015
February 06, 2020, 03:23:50 PM
#6
v3 and v4 Xeons are L2 limited on randomx no matter how much L3 they have.

2x 4627v3 ($100?) 20 threads @3GHz gives you 10k consuming 260w.

2x 2696v3 ($800?) 36 threads @2.8GHz gives you 18k consuming 290w.

To compare, L3 limited ryzen 3600 ($200) 12 threads @4.2GHz gives 7k consuming 100w.
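
Since the original question was about watt/$, here is a quick sketch that turns the three data points above into H/s per watt and H/s per dollar. The prices are the rough guesses from this post (question marks included) and are treated here as the cost of the whole listed configuration, which is an assumption.

```python
# (label, hashrate in H/s, power in W, rough price in USD) as quoted above
systems = [
    ("2x 4627v3", 10_000, 260, 100),
    ("2x 2696v3", 18_000, 290, 800),
    ("Ryzen 3600", 7_000, 100, 200),
]


def efficiency(hashrate_hs, watts, price_usd):
    """Return (H/s per watt, H/s per dollar)."""
    return hashrate_hs / watts, hashrate_hs / price_usd


for label, hs, watts, price in systems:
    per_watt, per_dollar = efficiency(hs, watts, price)
    print(f"{label}: {per_watt:.1f} H/s per W, {per_dollar:.1f} H/s per $")
```

On these numbers the Ryzen leads on H/s per watt and the cheap 4627v3 pair leads on H/s per dollar.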
legendary
Activity: 1470
Merit: 1114
February 05, 2020, 02:45:04 PM
#5
Adding some detail to phillipma's reply...

So for example, E5 2670 V3 (12 cores, 30MB, 2.3GHz) vs E7 8893 (4 cores, 45MB, 3.2GHz):
the 2670 should be better, right?

Divide cache by cores.

So 30/12 = 2.5. This works, since 2.5 is bigger than 2 by just a bit.

joblo> this is a good match.

and

45/4 = 11.25. This works less well, since 11.25 is way bigger than 2.

you can set threads used lower

if  you have say 40mb and 32 cores that is

40/32 = 1.25  this is under 2

so you would set this to

20 of the 32 cores

joblo> you have more cache than the cores can use, need more cores with this much cache


and get

40/20 = 2

get cheap cpu's with big cache


A ratio of 2MB cache / core(thread) is optimum.
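
The three worked examples above, as a small sketch. It only applies the L3 rule from this post (2MB per thread, never more threads than cores); the function name is made up for illustration.

```python
MB_PER_THREAD = 2  # L3 per thread, per the rule in this post


def plan_threads(l3_mb, cores):
    """Return (MB of L3 per core, threads to run, MB of L3 left unused)."""
    ratio = l3_mb / cores
    threads = min(cores, int(l3_mb // MB_PER_THREAD))
    unused_mb = l3_mb - threads * MB_PER_THREAD
    return ratio, threads, unused_mb


for l3_mb, cores in [(30, 12), (45, 4), (40, 32)]:
    ratio, threads, unused_mb = plan_threads(l3_mb, cores)
    print(f"{l3_mb}MB / {cores} cores = {ratio}: run {threads} threads, {unused_mb}MB of L3 unused")
```

The 45MB / 4-core case leaves 37MB of cache idle, which is the "need more cores with this much cache" point, and the 40MB / 32-core case lands on 20 threads.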
legendary
Activity: 4326
Merit: 8899
'The right to privacy matters'
February 05, 2020, 02:14:28 PM
#4
So for example, E5 2670 V3 (12 cores, 30MB, 2.3GHz) vs E7 8893 (4 cores, 45MB, 3.2GHz):
the 2670 should be better, right?

Divide cache by cores.

So 30/12 = 2.5. This works, since 2.5 is bigger than 2 by just a bit.


and

45/4 = 11.25. This works less well, since 11.25 is way bigger than 2.


you can set threads used lower

if  you have say 40mb and 32 cores that is

40/32 = 1.25  this is under 2

so you would set this to

20 of the 32 cores


and get

40/20 = 2

get cheap cpu's with big cache
full member
Activity: 241
Merit: 100
To Hash or not to Hash, that's what the question
February 05, 2020, 01:57:37 PM
#3
So for example, E5 2670 V3 (12 cores, 30MB, 2.3GHz) vs E7 8893 (4 cores, 45MB, 3.2GHz):
the 2670 should be better, right?
legendary
Activity: 1470
Merit: 1114
February 04, 2020, 09:47:11 PM
#2
1. cache / cores

I believe RandomX uses the same amount of cache as the old Cryptonight, 2 MB/thread.
You only need as many cores as you have cache to support them.

2. Frequency

3. Cores, see 1.
full member
Activity: 241
Merit: 100
To Hash or not to Hash, that's what the question
February 04, 2020, 09:37:26 PM
#1
INTEL S2600CW2SR with dual 2011 sockets that support Xeon V3 and V4 CPUs. I'm trying to figure out what the best watt/$ processor is for this board to run XMRig. Would CACHE be the first priority, then NUMBER OF CORES and then FREQUENCY, or how should I prioritize the factors involved?
Board supports up to 140W per socket.
Thanks