Topic: XMR RandomX mining with Ryzen 3900x (Read 429 times)

sr. member
Activity: 2142
Merit: 353
Xtreme Monster
April 05, 2021, 05:12:12 AM
#28
I just bought a 5900X to replace my old 3600. I found it to be stable 24/7 at 4.4 GHz and 1.14 V, with temps around 80°C. I know, insane temps, and that's with a 240 mm water cooling setup with dual 2400 rpm fans. Reviewers even said that with their 360 mm triple-fan water cooling setups their temps were around 86°C, insane. Anyway, up to 80°C is all right, more is a no-go. I used to keep my 3600 around 70°C at full load; this one runs 10 degrees Celsius hotter. I'm mining Monero for a bit now because it is a very good stability test.

So with those settings, 24 threads give 15300 h/s with memory at CL20. As most of you said, latency is important, but I found CL20 at 3333 MHz to be a lot better than CL16 at 2666 MHz (to run CL16 I had to downclock to 2666 MHz). I guess in my system bandwidth was more important than latency. Yeah, buying new binned 3600 CL16 memory could improve this hashrate, but I don't think by a lot. 32 GB of 2133 MHz CL14 1.2 V RAM overclocked all the way to 3333 MHz CL20 at 1.36 V is pretty good. Also, mining Monero is just a stability test; to me it is a waste of time and effort mining Monero 24/7 at the moment. It's not worth it for 1.30 USD per day when I paid 600 USD for it; if profit started at 5 USD per day then I would think about it. Monero is and has always been a botnet cave.
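To put the profitability remark in numbers, here is a rough sketch of the payback arithmetic. It uses only the 600 USD and 1.30 / 5 USD per day figures from the post; the function name is made up for illustration, and electricity cost is deliberately ignored, so real payback would be longer.

```python
def payback_days(hardware_cost_usd: float, revenue_usd_per_day: float) -> float:
    """Days of mining needed for revenue to equal the hardware cost.

    Power cost is ignored (assumption), so this is a best-case figure.
    """
    if revenue_usd_per_day <= 0:
        raise ValueError("revenue must be positive")
    return hardware_cost_usd / revenue_usd_per_day


if __name__ == "__main__":
    for revenue in (1.30, 5.00):
        days = payback_days(600, revenue)
        print(f"{revenue:.2f} USD/day -> about {days:.0f} days to recover 600 USD")
```

At 1.30 USD per day that is well over a year before power costs, which is consistent with treating the mining run as a stability test only.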
newbie
Activity: 5
Merit: 0
April 03, 2021, 03:31:30 PM
#27
I wasn't 100% sure what you meant by next line prefetch. Did you mean disabling hardware prefetch in the BIOS or setting scratchpad prefetch mode in the XMRig config.json file to 0 from 1?

Thanks, RAM timing did better than I expected, however, one has to know what one is doing to get it right.

Next line prefetch is when the CPU speculatively prefetches the next cache line to optimize sequential memory access.
It's no good for random access. It's reported as "MSR MOD" by xmrig, need to run as admin.

Xmrig is smart enough to optimize the number of threads, are you sure you were using all 32?

The best performance is usually 1 thread for every 2MB of cache. This matches up with the number
of physical cores on most mainstream CPUs, meaning hyperthreading isn't necessary or helpful.

You know what, I realise I've made a big mistake  Roll Eyes.... I've confused the MSR Mod and huge page options! I'll edit my existing post and test with huge pages off tomorrow.

I'm definitely using all 32, same with my 3950x. As you suggest though, this isn't optimal for all CPUs, my 9900k performs better using 8 threads(as opposed to 16) preferably with affinity set to 1 per physical core.

It seems to be a Ryzen thing where performance is better with all threads used.

EDIT: Added some results to original results post.

Tuned RAM timings + huge pages OFF - MSR Mod On - 16,250 h/s

Tuned RAM timings + huge pages OFF - MSR Mod OFF - 10,750 h/s

Seems you are right, enabling huge pages is the biggest performance factor here. Then RAM timings, then MSR mod.


FYI, I checked my friend's rig with a 3900X: running with all 32 and huge pages ON (with RAM timings at default) performs better than tuning the RAM timings first. I can verify that your approach of enabling huge pages gives better performance  Wink
member
Activity: 116
Merit: 66
March 29, 2021, 07:17:36 PM
#26
Randomx is dependant on latency which is why cache performance is more important than DRAM speed. There's no way
increasing DRAM performance, even by 25%, will increase the hashrate by the same amount. Huge pages, on the other
hand, can have that much of an effect by reducing effective memory latency with fewer TLB lookups.

I stand by my advice. There's nothing wrong with faster DRAM, it's just low on the priority list in this case.
If it's bottlenecked by RAM like 3900X/3950X then performance increase from lower latency RAM is almost 1:1. Slower CPUs like 6 and 8 core Ryzens are less affected but they also benefit from low latency RAM.
full member
Activity: 1397
Merit: 221
March 29, 2021, 06:16:47 PM
#25
I'm definitely using all 32, same with my 3950x. As you suggest though, this isn't optimal for all CPUs, my 9900k performs better using 8 threads(as opposed to 16) preferably with affinity set to 1 per physical core.

It seems to be a Ryzen thing where performance is better with all threads used.

Yeah, I forgot Ryzen doubled the cache with Zen2, a 3950x can definitely use all threads.
My only Ryzen is a 1700.

Regarding the gain by fine tuning the DRAM timing, I was thinking more of DRAM OC which only provides
a marginal improvement in the best of cases. I think of DRAM timing as more of a penalty when it's wrong.
But the difference between wrong and right can be significant.
hero member
Activity: 682
Merit: 500
March 29, 2021, 04:51:04 PM
#24
I wasn't 100% sure what you meant by next line prefetch. Did you mean disabling hardware prefetch in the BIOS or setting scratchpad prefetch mode in the XMRig config.json file to 0 from 1?

Thanks, RAM timing did better than I expected, however, one has to know what one is doing to get it right.

Next line prefetch is when the CPU speculatively prefetches the next cache line to optimize sequential memory access.
It's no good for random access. It's reported as "MSR MOD" by xmrig, need to run as admin.

Xmrig is smart enough to optimize the number of threads, are you sure you were using all 32?

The best performance is usually 1 thread for every 2MB of cache. This matches up with the number
of physical cores on most mainstream CPUs, meaning hyperthreading isn't necessary or helpful.

You know what, I realise I've made a big mistake  Roll Eyes.... I've confused the MSR Mod and huge page options! I'll edit my existing post and test with huge pages off tomorrow.

I'm definitely using all 32, same with my 3950x. As you suggest though, this isn't optimal for all CPUs, my 9900k performs better using 8 threads(as opposed to 16) preferably with affinity set to 1 per physical core.

It seems to be a Ryzen thing where performance is better with all threads used.
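For the "affinity set to 1 per physical core" case, a minimal sketch of how such a mask could be computed is below. It assumes SMT siblings are numbered next to each other (logical CPUs 0 and 1 on core 0, 2 and 3 on core 1, and so on), which is common on Windows but not guaranteed, so check your own topology first. The helper name is hypothetical; the resulting hex value is the kind of mask XMRig's CPU affinity option accepts.

```python
def one_thread_per_core_mask(physical_cores: int, smt_ways: int = 2) -> int:
    """Bitmask selecting the first logical CPU of each physical core.

    Assumes sibling threads are enumerated adjacently (assumption, not
    verified for any particular OS or board).
    """
    mask = 0
    for core in range(physical_cores):
        mask |= 1 << (core * smt_ways)
    return mask


if __name__ == "__main__":
    # 8 cores / 16 threads, e.g. the 9900K mentioned above
    print(hex(one_thread_per_core_mask(8)))
```

With 8 cores this selects logical CPUs 0, 2, 4, ... 14 and leaves the sibling threads idle.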

EDIT: Added some results to original results post.

Tuned RAM timings + huge pages OFF - MSR Mod On - 16,250 h/s

Tuned RAM timings + huge pages OFF - MSR Mod OFF - 10,750 h/s

Seems you are right, enabling huge pages is the biggest performance factor here. Then RAM timings, then MSR mod.
full member
Activity: 1397
Merit: 221
March 29, 2021, 04:07:58 PM
#23
I wasn't 100% sure what you meant by next line prefetch. Did you mean disabling hardware prefetch in the BIOS or setting scratchpad prefetch mode in the XMRig config.json file to 0 from 1?

Thanks, RAM timing did better than I expected, however, one has to know what one is doing to get it right.

Next line prefetch is when the CPU speculatively prefetches the next cache line to optimize sequential memory access.
It's no good for random access. It's reported as "MSR MOD" by xmrig, need to run as admin.

Xmrig is smart enough to optimize the number of threads, are you sure you were using all 32?

The best performance is usually 1 thread for every 2MB of cache. This matches up with the number
of physical cores on most mainstream CPUs, meaning hyperthreading isn't necessary or helpful.
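The 2MB-per-thread rule of thumb can be sketched like this. The L3 sizes and thread counts are taken from the public spec sheets for these CPUs (not from this thread), and the function name is just for illustration. XMRig's own auto-configuration may pick differently.

```python
SCRATCHPAD_MIB = 2  # L3 budget per mining thread, per the rule of thumb above


def suggested_threads(l3_mib: int, logical_cpus: int) -> int:
    """Threads suggested by 'one thread per 2 MB of L3', capped at the
    number of logical CPUs the chip actually has."""
    return max(1, min(logical_cpus, l3_mib // SCRATCHPAD_MIB))


# (L3 cache in MiB, logical CPUs), from vendor spec sheets
CPUS = {
    "Ryzen 7 1700": (16, 16),
    "Core i9-9900K": (16, 16),
    "Ryzen 9 3900X": (64, 24),
    "Ryzen 9 3950X": (64, 32),
}

if __name__ == "__main__":
    for name, (l3_mib, logical) in CPUS.items():
        print(f"{name}: {suggested_threads(l3_mib, logical)} threads")
```

By this estimate the 16 MB parts top out at one thread per physical core, while the 64 MB Zen 2 parts have enough cache to feed every logical CPU.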
hero member
Activity: 682
Merit: 500
March 29, 2021, 03:21:46 PM
#22
I've found noticeable uplifts in performance from tuning RAM timings on my 9900K, 3950X and 5950X systems.

I'll fish out some exact numbers later from the 5950X system. I'm not sure it's 25%, but it's quite a hefty bump, I'm sure it's over 10%.

A breakdown would be interesting to measure the effects of enabling huge pages, disabling next line prefetch,
and adjusting DRAM timing.

For 24/7 running of XMRig I set a fixed CPU Vcore of 0.975v. This allows fixed clocks of 4.2GHz on the first 8 cores and 3.85GHz on the last 8 cores and a relatively low CPU package power reading of 133w in HWINFO64. RAM frequency is 1866MHz(3733mt/s) and IF is set to 1:1 with RAM frequency. There shouldn't be any random core boost behaviour affecting these results.

I'm using XMRig 6.10.0 with all 32 threads activated as this provides the best performance.

My 24/7 settings, tuned RAM timings + huge pages ON - MSR Mod ON - 19,830 h/s

Tuned RAM Timings + huge pages ON - MSR Mod OFF - 18,350 h/s

Tuned RAM timings + huge pages OFF - MSR Mod On - 16,250 h/s

Tuned RAM timings + huge pages OFF - MSR Mod OFF - 10,750 h/s

Stock RAM timings + MSR Mod On - 17,480 h/s

Stock RAM timings + MSR Mod OFF -  14,275 h/s

Ordering the performance uplift from each change combination
MSR Mod off to on for Tuned RAM timings gives a 1480h/s uplift, 8%

Stock RAM timings to Tuned RAM timings with MSR Mod on gives a 2350h/s uplift, 13.4%

MSR Mod off to on for Stock RAM timings gives a 3205h/s uplift, 22.5%

Stock RAM timings to Tuned RAM timings with MSR Mod off gives a 4075h/s uplift, 28.5%


It seems for me, tuned RAM timings provide a bigger performance uplift than using MSR Mod. Turning MSR Mod on gives a bigger benefit where stock/poor RAM timings are used.
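The uplift figures can be reproduced from the four hashrates with a few lines of Python. This is only the arithmetic behind the numbers quoted above; the dictionary keys and function name are made up for the sketch.

```python
# Hashrates in h/s from the results above (tuned/stock RAM timings, MSR Mod on/off)
RESULTS = {
    ("tuned", "msr_on"): 19830,
    ("tuned", "msr_off"): 18350,
    ("stock", "msr_on"): 17480,
    ("stock", "msr_off"): 14275,
}


def uplift(before: int, after: int) -> tuple[int, float]:
    """Absolute (h/s) and relative (%) gain when going from `before` to `after`."""
    delta = after - before
    return delta, 100.0 * delta / before


COMPARISONS = [
    ("MSR Mod off -> on, tuned timings", ("tuned", "msr_off"), ("tuned", "msr_on")),
    ("Stock -> tuned timings, MSR Mod on", ("stock", "msr_on"), ("tuned", "msr_on")),
    ("MSR Mod off -> on, stock timings", ("stock", "msr_off"), ("stock", "msr_on")),
    ("Stock -> tuned timings, MSR Mod off", ("stock", "msr_off"), ("tuned", "msr_off")),
]

if __name__ == "__main__":
    for label, before_key, after_key in COMPARISONS:
        delta, pct = uplift(RESULTS[before_key], RESULTS[after_key])
        print(f"{label}: +{delta} h/s ({pct:.1f}%)")
```

The percentages are always relative to the slower configuration, which is why the same MSR Mod change looks much bigger on stock timings than on tuned ones.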

You're probably wondering what RAM timings I've tuned. I change nearly all primaries and secondaries, some tertiaries too. Command Rate is 1t and GDM is on in both cases.

Stock primaries are 18,20,20,44,92(RC). Tuned I'm running 14,14,14,30,44.

I can't remember stock for the following, but I tune tRRDS 4, tRRDL 6, tWTRS 4, tWTRL 12, tWR 12, tCWL 14, tFAW 16, tRTP 8, tRFC 308

You've probably guessed I'm running RAM sticks with Samsung B-die Wink I'm running 4 sticks as well so Rank interleaving is enabled also. Not sure how much that is helping me.

I wasn't 100% sure what you meant by next line prefetch. Did you mean disabling hardware prefetch in the BIOS or setting scratchpad prefetch mode in the XMRig config.json file to 0 from 1?

EDIT: something that just came to mind, I used to use XMR-Stak, it was a lot more sensitive to RAM timings than XMRig, this was why I switched.
EDIT2: Swapped "huge pages" to "MSR Mod" and added some real huge page on/off results.
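For anyone wanting to repeat the huge pages / MSR Mod comparison, here is a rough sketch that prints the config.json fragments for the four combinations. The key names ("huge-pages" under "cpu", "wrmsr" under "randomx") are what I believe XMRig's config uses; treat them as an assumption and check them against the config.json shipped with your XMRig version before relying on this.

```python
import itertools
import json


def config_fragment(huge_pages: bool, msr_mod: bool) -> dict:
    """Partial XMRig-style config toggling huge pages and the MSR mod.

    Key names are assumed from XMRig's config.json layout; all other
    settings are left to the existing config.
    """
    return {
        "cpu": {"huge-pages": huge_pages},
        "randomx": {"wrmsr": msr_mod},
    }


if __name__ == "__main__":
    for huge_pages, msr_mod in itertools.product((True, False), repeat=2):
        print(f"# huge pages {'ON' if huge_pages else 'OFF'}, "
              f"MSR Mod {'ON' if msr_mod else 'OFF'}")
        print(json.dumps(config_fragment(huge_pages, msr_mod), indent=2))
```

Changing one setting at a time like this, with fixed clocks as described above, keeps the comparison clean.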
full member
Activity: 1397
Merit: 221
March 29, 2021, 01:00:32 PM
#21
I've found noticeable uplifts in performance from tuning RAM timings on my 9900K, 3950X and 5950X systems.

I'll fish out some exact numbers later from the 5950X system. I'm not sure it's 25%, but it's quite a hefty bump, I'm sure it's over 10%.

A breakdown would be interesting to measure the effects of enabling huge pages, disabling next line prefetch,
and adjusting DRAM timing.
hero member
Activity: 682
Merit: 500
March 29, 2021, 12:26:20 PM
#20
I don't know MSI either but RandomX isn't dependant on DRAM speed so it's a waste of time to tweak it.
Don't spread lies. It is dependant on RAM speed and latency, we're talking about the difference between 12 kh/s and 15 kh/s on 3900X.

Randomx is dependant on latency which is why cache performance is more important than DRAM speed. There's no way
increasing DRAM performance, even by 25%, will increase the hashrate by the same amount. Huge pages, on the other
hand, can have that much of an effect by reducing effective memory latency with fewer TLB lookups.

I stand by my advice. There's nothing wrong with faster DRAM, it's just low on the priority list in this case.

I've found noticeable uplifts in performance from tuning RAM timings on my 9900K, 3950X and 5950X systems.

I'll fish out some exact numbers later from the 5950X system. I'm not sure it's 25%, but it's quite a hefty bump, I'm sure it's over 10%.
full member
Activity: 1397
Merit: 221
March 29, 2021, 11:59:39 AM
#19
I don't know MSI either but RandomX isn't dependant on DRAM speed so it's a waste of time to tweak it.
Don't spread lies. It is dependant on RAM speed and latency, we're talking about the difference between 12 kh/s and 15 kh/s on 3900X.

Randomx is dependant on latency which is why cache performance is more important than DRAM speed. There's no way
increasing DRAM performance, even by 25%, will increase the hashrate by the same amount. Huge pages, on the other
hand, can have that much of an effect by reducing effective memory latency with fewer TLB lookups.

I stand by my advice. There's nothing wrong with faster DRAM, it's just low on the priority list in this case.
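A quick sketch of why huge pages cut down TLB pressure: count how many pages it takes to map the RandomX dataset. The dataset size here is an approximation (about 2080 MiB in fast mode) and the names are illustrative only.

```python
DATASET_BYTES = 2080 * 1024 * 1024  # approximate RandomX dataset size (assumption)

PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024


def pages_needed(size_bytes: int, page_bytes: int) -> int:
    """Number of pages required to map `size_bytes`, rounded up."""
    return -(-size_bytes // page_bytes)


if __name__ == "__main__":
    small = pages_needed(DATASET_BYTES, PAGE_4K)
    huge = pages_needed(DATASET_BYTES, PAGE_2M)
    print(f"4 KiB pages: {small}")
    print(f"2 MiB pages: {huge}")
    print(f"ratio: {small // huge}x fewer translations to cache")
```

With random reads spread over the whole dataset, hundreds of thousands of 4 KiB translations cannot stay in the TLB, so many accesses pay for a page walk on top of the DRAM access. With 2 MiB pages the number of translations drops by a factor of 512.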
member
Activity: 116
Merit: 66
March 29, 2021, 06:29:58 AM
#18
I'm willing to try with cl 14 and 2666mhz to see how it goes. I know for a fact that bandwidth is more important than latency for most use cases, for mining monero I have no idea.
Latency is the most important for RandomX, bandwidth is not the bottleneck there. That said, 3200 CL14 will be better than 2666 CL14 because same timings at higher speed = lower latency.
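The "same timings at higher speed = lower latency" point can be checked with the usual CAS latency formula, latency in ns = CL × 2000 / transfer rate in MT/s. The sketch below assumes the "MHz" figures quoted in this thread are DDR4 transfer rates (DDR4-3200 and so on); it only covers CAS latency, not the full random-access latency that also depends on the other timings.

```python
def cas_latency_ns(cl: int, transfer_rate_mts: float) -> float:
    """CAS latency in nanoseconds for a DDR module.

    The memory clock is half the transfer rate, so one cycle lasts
    2000 / transfer_rate_mts nanoseconds.
    """
    return cl * 2000.0 / transfer_rate_mts


if __name__ == "__main__":
    configs = [
        ("3200 CL14", 14, 3200),
        ("2666 CL14", 14, 2666),
        ("3333 CL20", 20, 3333),
        ("2666 CL16", 16, 2666),
    ]
    for label, cl, rate in configs:
        print(f"{label}: {cas_latency_ns(cl, rate):.2f} ns")
```

Note that 3333 CL20 and 2666 CL16, the two settings compared elsewhere in this thread, come out at practically the same CAS latency, so the faster setting gives up nothing on that timing while adding bandwidth.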
sr. member
Activity: 2142
Merit: 353
Xtreme Monster
March 29, 2021, 05:47:51 AM
#17
I don't know MSI either but RandomX isn't dependant on DRAM speed so it's a waste of time to tweak it.
Don't spread lies. It is dependant on RAM speed and latency, we're talking about the difference between 12 kh/s and 15 kh/s on 3900X.

I'm willing to try with cl 14 and 2666mhz to see how it goes. I know for a fact that bandwidth is more important than latency for most use cases, for mining monero I have no idea.
member
Activity: 116
Merit: 66
March 29, 2021, 05:29:37 AM
#16
I don't know MSI either but RandomX isn't dependant on DRAM speed so it's a waste of time to tweak it.
Don't spread lies. It is dependant on RAM speed and latency, we're talking about the difference between 12 kh/s and 15 kh/s on 3900X.
sr. member
Activity: 2142
Merit: 353
Xtreme Monster
March 29, 2021, 02:42:13 AM
#15
your ram does have very high cl 18

I have 14,15,16 on my miners.

I use 3000 or 3200

if you crash 💥 when you open ryzen

something is set wrong in the bios.



For reference, my Ryzen 5 3600 at 4 GHz and 1.25 V does 7 kh/s on 12 threads with memory at CL20, 3333 MHz. I guess it needs bandwidth more than anything; I would try 3600 MHz CL20 if your memory chips can do it.
full member
Activity: 1397
Merit: 221
March 28, 2021, 11:58:23 PM
#14
your ram does have very high cl 18

I have 14,15,16 on my miners.

I use 3000 or 3200

if you crash 💥 when you open ryzen

something is set wrong in the bios.



Do you think a Ram change to CL16 or CL14 is needed? 

I don’t know msi mobos.

had bad luck with them and don’t use them any more.

so i don’t know how to set the bios.

if you botch bios settings you run low.

you have something wrong besides the ram cause you cant get ryzen to run.

I don't know MSI either but RandomX isn't dependant on DRAM speed so it's a waste of time to tweak it.
CPU OC will help but Ryzens are notorious for little OC potential.

You might have to tweak the number of threads and affinity to get best performance.
You might also get better speed if you run as administrator, but I hesitate to recommend it for security reasons.
legendary
Activity: 4116
Merit: 7849
'The right to privacy matters'
March 28, 2021, 11:31:25 PM
#13
your ram does have very high cl 18

I have 14,15,16 on my miners.

I use 3000 or 3200

if you crash 💥 when you open ryzen

something is set wrong in the bios.



Do you think a Ram change to CL16 or CL14 is needed? 

I don’t know msi mobos.

had bad luck with them and don’t use them any more.

so i don’t know how to set the bios.

if you botch bios settings you run low.

you have something wrong besides the ram cause you cant get ryzen to run.
jr. member
Activity: 155
Merit: 6
March 28, 2021, 11:19:42 PM
#12
your ram does have very high cl 18

I have 14,15,16 on my miners.

I use 3000 or 3200

if you crash 💥 when you open ryzen

something is set wrong in the bios.



Do you think a Ram change to CL16 or CL14 is needed? 
legendary
Activity: 4116
Merit: 7849
'The right to privacy matters'
March 28, 2021, 11:11:44 PM
#11
your ram does have very high cl 18

I have 14,15,16 on my miners.

I use 3000 or 3200

if you crash 💥 when you open ryzen

something is set wrong in the bios.

jr. member
Activity: 155
Merit: 6
March 28, 2021, 11:01:50 PM
#10
I am at my wits' end with this CPU miner. I'm on the latest BIOS for the MSI X470 Gaming Plus Max. I've attempted a BIOS OC via DRAM Calculator, with no success. It just crashes when applying the adjusted DRAM timings and I have to reset to basic. Ryzen Master closes with an error any time I attempt to open it and make OC adjustments. I attempted ClockTuner 2.0. Immediately I receive messages of "operating system does not see CPPC core tags, CPPC, CPB and Cool and Quiet and preferred cores should be enabled". I've gone into the BIOS and enabled CPPC, but can't locate CPB or Cool and Quiet mode. Even a subtle OC of CPU and voltage only lets me achieve 9100 h/s on XMRig. I'm not sure if Windows is not verifying CPPC or what. I just know I am at a loss. Any suggestions would be very appreciated. New Windows 10 version, latest mobo BIOS. This thing should be doing at least 11000 h/s without any OC. Super puzzled.
sr. member
Activity: 433
Merit: 254
March 05, 2021, 06:01:50 AM
#9
OK, latest BIOS.
Also read up on how to use ClockTuner.

And for DRAM Calculator, look up the NiceHash guide and read other guides like this one: https://www.techpowerup.com/review/amd-ryzen-memory-tweaking-overclocking-guide/
It's more than just pressing a button and entering timings.