[XMR] JCE Miner Cryptonight/forks, now with GPU! - page 94.

UnclWish

sr. member

Activity: 1484

Merit: 253

Quote from: 4ward on June 23, 2018, 12:47:08 AM

Not sure which Wikipedia you are looking at, but on https://en.wikipedia.org/wiki/Epyc you can see L2 cache
or even better, on https://en.wikichip.org/wiki/amd/epyc you have even more information

And if you use logic, there is no way a cpu will have L3 cache and not have L2/L1, since in that case L3 becomes L1...

But neither of your links lists the L1 cache... I know there must be L1/L2, I just wanted to know the sizes...

JCE-Miner

member

Activity: 350

Merit: 22

0.29e available

Quote

Fix 32+ cpu support
Fix JSON syntax

If you have fewer than 32 CPUs and don't use the JSON output, there's no need to update.

4ward

member

Activity: 473

Merit: 18

Quote from: UnclWish on June 22, 2018, 06:03:51 PM

I know that. No need to explain what L1/L2/L3 mean and what they do...
I just looked at the wiki and there is no info about L1/L2 on Epyc... It just says "N/A"...

Not sure which Wikipedia you are looking at, but on https://en.wikipedia.org/wiki/Epyc you can see L2 cache
or even better, on https://en.wikichip.org/wiki/amd/epyc you have even more information

And if you use logic, there is no way a cpu will have L3 cache and not have L2/L1, since in that case L3 becomes L1...

UnclWish

sr. member

Activity: 1484

Merit: 253

Quote from: s0ftcorn on June 22, 2018, 05:12:07 PM

L1 -> Level 1
L2 -> Level 2
L3 -> Level 3

As the levels get higher, the cache gets slower but larger. Maybe AMD is using some clever technique so that the L1/L2 caches are more shared than they already are, so mentioning them is pointless. But an absent L1 or L2 cache would have a huge impact on performance.

I know that. No need to explain what L1/L2/L3 mean and what they do...
I just looked at the wiki and there is no info about L1/L2 on Epyc... It just says "N/A"...

s0ftcorn

newbie

Activity: 70

Merit: 0

Quote from: UnclWish on June 22, 2018, 04:50:25 PM

Wikipedia has no info about L1/L2 cache on Epyc, only L3. Does it have L1/L2 cache or not? Just curious...

P.S. 2 MB of L3 cache per thread is already too low... AMD could add more L3 cache to Epyc, especially if L1/L2 cache is absent...

L1 -> Level 1
L2 -> Level 2
L3 -> Level 3

As the levels get higher, the cache gets slower but larger. Maybe AMD is using some clever technique so that the L1/L2 caches are more shared than they already are, so mentioning them is pointless. But an absent L1 or L2 cache would have a huge impact on performance.

UnclWish

sr. member

Activity: 1484

Merit: 253

Quote from: Lonnegan64 on June 22, 2018, 01:33:38 AM

When I try the latest version on an AMD Epyc (basically four Ryzen dice on a package) I get the following error:

Thread 30 successfully bound to CPU 30
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 30 of NUMA node 3 at: 0000021b3c200000
Starting CPU Mining thread 31, affinity: CPU 31
Thread 31 successfully bound to CPU 31
GetNumaProcessorNode failed for cpu 32, error code: 87
Retrying with no NUMA
Allocated 2MB Cached Large Page Scratchpad Buffer at: 0000021b3c400000
Connecting to mining pool support.ipbc.io:17777 ...
Devfee is 1.5%

That's strange because I only defined threads for the CPUs 0 to 31 in the config file.

Why does the miner try to access a CPU (core) 32, which is not present? Apart from that, the miner starts mining and has a better hashrate than XMR-stak with Bittube: 5200 H/s vs 4900 H/s.

Wikipedia has no info about L1/L2 cache on Epyc, only L3. Does it have L1/L2 cache or not? Just curious...

P.S. 2 MB of L3 cache per thread is already too low... AMD could add more L3 cache to Epyc, especially if L1/L2 cache is absent...

JCE-Miner

member

Activity: 350

Merit: 22

Impressive processor!

And yes, I have IPBC-specific assembly for Ryzen/Threadripper, so JCE should be faster. That's why the binary is so big: it contains optimizations for all possible combinations.

I'm looking at the ghost CPU 32 bug. Your log is very helpful since, as you might expect, I don't own an Epyc myself.
I'm fixing the JSON regression too.

Edit: both bugs are fixed. The CPU 32 bug was due to an overflow in my CPU counter. It was somewhat lucky that the rest of the code still worked. The biggest thread flood I had tested so far was my Ryzen (12 logical CPUs) plus the five double-mem GPUs of my rig (2x 5 GPU), 22 threads in total.

Now rebuilding version 0.29e, and 0.29d will be removed.

Lonnegan64

jr. member

Activity: 37

Merit: 5

When I try the latest version on an AMD Epyc (basically four Ryzen dice on a package) I get the following error:

Thread 30 successfully bound to CPU 30
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 30 of NUMA node 3 at: 0000021b3c200000
Starting CPU Mining thread 31, affinity: CPU 31
Thread 31 successfully bound to CPU 31
GetNumaProcessorNode failed for cpu 32, error code: 87
Retrying with no NUMA
Allocated 2MB Cached Large Page Scratchpad Buffer at: 0000021b3c400000
Connecting to mining pool support.ipbc.io:17777 ...
Devfee is 1.5%

That's strange because I only defined threads for the CPUs 0 to 31 in the config file.

Why does the miner try to access a CPU (core) 32, which is not present? Apart from that, the miner starts mining and has a better hashrate than XMR-stak with Bittube: 5200 H/s vs 4900 H/s.

JCE-Miner

member

Activity: 350

Merit: 22

I admit I focused my tests for 0.29d on Haven, to save time and release the GPU version ASAP. OK, I've noted the problem with huge pages not being released.

siroliver

newbie

Activity: 23

Merit: 0

29c not releasing hugepages

JCE-Miner

member

Activity: 350

Merit: 22

There's a very light optimization between 0.29b and 0.29c, but the gain is barely noticeable. There is none between 0.29c and 0.29d; d is a bugfix version.
On the G4560 the best config should be -t 2 if you mine cn-v7, or -t 4 for cryptolight/turtle/ipbc/aeon.

JSON: right, a typo in my code... how lame. I'm good for rebuilding it again, thanks for the report!

KriptoGuruTR

member

Activity: 564

Merit: 19

Intel G4560 - Stock
DDR 2133
Linux JCE 0.29c w/hugepages

61 Kh/s

4ward

member

Activity: 473

Merit: 18

0.29d - api is returning invalid json (missing "," before the max speed)
xmr-stak api mode is fine
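For anyone who wants to check an API response for this kind of problem, here is a minimal Python sketch. The payloads and their field names (`total`, `max`) are made up for illustration and are not the actual JCE API output; the only point is that a missing comma between two members makes the whole document unparseable.

```python
import json

# Hypothetical payloads: field names and values are illustrative only.
broken = '{"total": 46.33 "max": 47.44}'   # no "," before the max value
fixed = '{"total": 46.33, "max": 47.44}'

def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json(broken))  # False
print(is_valid_json(fixed))   # True
```

A strict parser rejects the first payload outright, so monitoring tools that consume the API would likely fail rather than just miss one field.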

JCE-Miner

member

Activity: 350

Merit: 22

0.29d online - Windows and Linux

Code:

Max hashrate when you press r
BLOC added
Bixbite removed
Haven algo selection fixed

This is probably the last CPU-only version

robminer80

newbie

Activity: 20

Merit: 0

Quote from: JCE-Miner on June 19, 2018, 04:47:53 PM

Impressive tip from robminer80, who gives better advice than the dev!
It probably works thanks to the distribution across cores 1 and 3.

I'm postponing 0.29d to run more tests and add the coin BLOC (another CN-Heavy coin).

The APU's L2 cache is 2 MB + 2 MB in two modules: CPUs 0 and 1 have access to the first 2 MB, CPUs 2 and 3 to the other 2 MB.
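A minimal Python sketch of that layout, for illustration only. The module map follows the description above; the even split of a module's L2 between the threads pinned to it is a simplifying assumption, and the names `L2_MODULES` and `l2_share_mb` are hypothetical, not anything from JCE.

```python
# Two modules, each with its own 2 MB L2, as described in the post.
L2_MODULES = {0: (0, 1), 1: (2, 3)}  # module id -> logical CPUs sharing that L2
L2_MB_PER_MODULE = 2

def module_of(cpu):
    """Return the id of the module that owns the given logical CPU."""
    for module, cpus in L2_MODULES.items():
        if cpu in cpus:
            return module
    raise ValueError(f"unknown CPU {cpu}")

def l2_share_mb(affinities):
    """L2 per mining thread, assuming threads on one module split its L2 evenly."""
    threads_per_module = {}
    for cpu in affinities:
        m = module_of(cpu)
        threads_per_module[m] = threads_per_module.get(m, 0) + 1
    return {cpu: L2_MB_PER_MODULE / threads_per_module[module_of(cpu)]
            for cpu in affinities}

print(l2_share_mb([1, 3]))        # {1: 2.0, 3: 2.0}
print(l2_share_mb([0, 1, 2, 3]))  # {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
```

Under these assumptions, pinning two threads to CPUs 1 and 3 gives each thread a full 2 MB module to itself, while four threads leave only 1 MB each.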

JCE-Miner

member

Activity: 350

Merit: 22

Impressive tip from robminer80, who gives better advice than the dev!
It probably works thanks to the distribution across cores 1 and 3.

I'm postponing 0.29d to run more tests and add the coin BLOC (another CN-Heavy coin).

whotheff

member

Activity: 762

Merit: 35

Quote from: robminer80 on June 19, 2018, 08:36:42 AM

Quote from: whotheff on June 17, 2018, 04:53:12 AM

Quote from: JCE-Miner on June 17, 2018, 03:29:52 AM

thanks for the test, it seems my autoconfig is bad on the A10, it allocates too many threads. I'll fix it, thanks.

The multihash (double hash is the 2- case; you can set anything from 1- to 6-) is also called low-power in stak, IIRC. It's about using, on one CPU core, twice the registers and twice the cache to get, sometimes, twice the speed. The trick is that it leaves the other cores free, so it consumes less power and allows Turbo to kick in, on CPUs that have it.

Technically, it's good when you want to save power, of course, but also when you run out of cores but not of cache. If you have a CPU with 2 cores but 8M of cache, a normal config would use only 2x2M = 4M of cache.

You may enable double-hash to use 2x2x2M = 8M of cache and get some extra perf.
It works more or less well depending on the CPU. It's very efficient on Ryzen, and not at all on Core2.
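The cache arithmetic above can be written out as a small Python sketch. It only restates the numbers from the post (2M of scratchpad per hash, 2 cores, 8M of cache); the function name `cache_used_mb` is a hypothetical helper, not a JCE option.

```python
SCRATCHPAD_MB = 2  # per-hash scratchpad size used in the example above

def cache_used_mb(cores, hashes_per_core=1, scratchpad_mb=SCRATCHPAD_MB):
    """Cache touched when every core computes `hashes_per_core` hashes at once."""
    return cores * hashes_per_core * scratchpad_mb

single = cache_used_mb(cores=2)                     # 2 x 2M = 4M
double = cache_used_mb(cores=2, hashes_per_core=2)  # 2 x 2 x 2M = 8M
print(single, double)  # 4 8
```

On the 8M CPU from the example, single-hash leaves half of the cache idle, which is the case where double-hash may help.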

I looked closer at the A10, and yeah, that's a little APU with little cache.
Here is an experimental config that could get you some extra perf, but I'm not sure: I cannot test it, I have no A10.

"cpu_threads_conf" :
[
   { "cpu_architecture" : "auto", "affine_to_cpu" : 0, "use_cache" : true },
   { "cpu_architecture" : "auto", "affine_to_cpu" : 1, "use_cache" : false },
   { "cpu_architecture" : "auto", "affine_to_cpu" : 2, "use_cache" : true },
   { "cpu_architecture" : "auto", "affine_to_cpu" : 3, "use_cache" : false },
]

now finishing 0.29c, the last CPU-only version, with some updates.

Try:

"cpu_threads_conf" :
[
{ "cpu_architecture" : "trinity", "affine_to_cpu" : 1, "use_cache" : true },
{ "cpu_architecture" : "trinity", "affine_to_cpu" : 3, "use_cache" : true },
]

Another +10 hashes gained this way! It might be a temporary thing, but if I don't touch anything on the PC, the hashrate goes up to 137 H/s.

I'm not sure why, since in both cases it uses generic aes avx, large cache and two cores.

JCE-Miner

member

Activity: 350

Merit: 22

Masari works; it's Haven that may be broken. I'm now testing 0.29d. I added the max hashrate, even though it's still a pure CPU version.

Code:

21:00:10 | Hashrate CPU Thread 0: 11.42 h/s
21:00:10 | Hashrate CPU Thread 1: 11.54 h/s
21:00:10 | Hashrate CPU Thread 2: 11.67 h/s
21:00:10 | Hashrate CPU Thread 3: 11.71 h/s
21:00:10 | Total: 46.33 h/s - Max: 47.44 h/s

That's Haven on my Core 2 Quad.

Iamtutut

full member

Activity: 1120

Merit: 131

Quote from: JCE-Miner on June 19, 2018, 02:46:07 PM

Out of context, reading a file is portable. But I cannot legitimately (and don't want to) grab some external app from somewhere and make it write a text file that JCE would read; that's a pretty dirty way of working.

However, that's rather a GUI/monitor job: a GUI tool could use JCE to mine and at the same time monitor the CPU temperatures. Get the hashrate, get the temperature, and display both.

Hi, just mined a bit of Masari with the Ryzen 2400G; the hashrate in "auto" config was around 280 H/s.

JCE-Miner

member

Activity: 350

Merit: 22

Out of context, reading a file is portable. But I cannot legitimately (and don't want to) grab some external app from somewhere and make it write a text file that JCE would read; that's a pretty dirty way of working.

However, that's rather a GUI/monitor job: a GUI tool could use JCE to mine and at the same time monitor the CPU temperatures. Get the hashrate, get the temperature, and display both.
