
Topic: [XMR] JCE Miner Cryptonight/forks, now with GPU! - page 94. (Read 90858 times)

sr. member
Activity: 1484
Merit: 253
But neither of your links lists the L1 cache... I know there must be L1/L2, I just wanted to know the amount...
member
Activity: 350
Merit: 22
0.29e available

Fix 32+ cpu support
Fix JSON syntax

If you have fewer than 32 CPUs and don't use JSON output, there's no need to update.
member
Activity: 473
Merit: 18
Not sure which Wikipedia page you are looking at, but https://en.wikipedia.org/wiki/Epyc lists the L2 cache,
or, even better, https://en.wikichip.org/wiki/amd/epyc has even more information.

And logically, there is no way a CPU would have an L3 cache but no L2/L1, since in that case the L3 would simply become the L1...
sr. member
Activity: 1484
Merit: 253
I know. No need to explain what L1/L2/L3 mean and what they do...
I just looked on the wiki and there is no info about L1/L2 on Epyc... It just says "N/A"...
newbie
Activity: 70
Merit: 0
L1 -> Level 1
L2 -> Level 2
L3 -> Level 3

As the level number goes up, the cache gets slower but larger. Maybe AMD uses some clever technique that makes the L1/L2 caches even more shared than they already are, so listing them is pointless. But an absent L1 or L2 cache would have a huge impact on performance.
sr. member
Activity: 1484
Merit: 253
Wikipedia has no info about the L1/L2 cache on Epyc, only the L3. Does Epyc have L1/L2 cache or not? Just curious...

P.S. 2 MB of L3 cache per thread is already too low... AMD could add more L3 cache to Epyc... Especially if the L1/L2 cache is absent...
member
Activity: 350
Merit: 22
Impressive processor!

And yes, I have IPBC-specific assembly for Ryzen/Threadripper, so JCE is expected to be faster. That's why the binary is so big: it contains optimizations for all possible combinations.

I'm looking at the ghost CPU 32 bug. Your log is very helpful since, as you might expect, I don't own an Epyc myself.
I'm fixing the JSON regression too.

edit: both bugs are fixed. The CPU 32 bug was due to an overflow in my CPU counter; it's somewhat lucky that the rest of the code still worked. The biggest thread count I had tested so far was my Ryzen (12 logical CPUs) plus the five double-mem GPUs of my rig (2x 5 GPU), 22 threads in total.

Now rebuilding version 0.29e, and 0.29d will be removed.
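The dev only says the ghost CPU 32 came from an overflow in a CPU counter, and the code isn't shown here. Error code 87 from GetNumaProcessorNode is Windows' ERROR_INVALID_PARAMETER, i.e. the call was handed a processor index that isn't valid. As a purely hypothetical illustration of why 32 is such a common breaking point, here is a Python sketch of a CPU affinity bitmask: CPU i sets bit i, so a 32-bit value only has room for CPUs 0 to 31. The helper names are made up for this example and are not JCE's.

```python
def affinity_mask(cpus, width=64):
    """Build a CPU affinity bitmask `width` bits wide (hypothetical helper).

    CPU index i sets bit i. An index that has no bit in the mask is rejected
    with ValueError instead of being silently dropped.
    """
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < width:
            raise ValueError(f"CPU {cpu} does not fit in a {width}-bit mask")
        mask |= 1 << cpu
    return mask


def truncate(value, width):
    """Emulate storing `value` in an unsigned integer that is `width` bits wide."""
    return value & ((1 << width) - 1)


full_32 = affinity_mask(range(32), width=32)  # CPUs 0..31: every bit of a 32-bit mask
cpu32 = affinity_mask([32])                   # CPU 32 needs bit 32, fine in 64 bits
lost = truncate(cpu32, 32)                    # squeezed into 32 bits, the bit is gone (0)
```

Checking the index up front turns a silent wrap into an explicit error; a fixed-width value simply loses bit 32.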
jr. member
Activity: 37
Merit: 5
When I try the latest version on an AMD Epyc (basically four Ryzen dice on a package) I get the following error:

Thread 30 successfully bound to CPU 30
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 30 of NUMA node 3 at: 0000021b3c200000
Starting CPU Mining thread 31, affinity: CPU 31
Thread 31 successfully bound to CPU 31
GetNumaProcessorNode failed for cpu 32, error code: 87
Retrying with no NUMA
Allocated 2MB Cached Large Page Scratchpad Buffer at: 0000021b3c400000
Connecting to mining pool support.ipbc.io:17777 ...
Devfee is 1.5%

That's strange because I only defined threads for CPUs 0 to 31 in the config file. Why does the miner try to access a CPU (core) 32, which is not present? Apart from that, the miner starts mining and has a better hashrate than XMR-stak with Bittube: 5200 H/s vs 4900 H/s.
member
Activity: 350
Merit: 22
I admit I focused my testing on Haven for 0.29d, to save time and release the GPU version ASAP. OK, I've noted the problem with huge page release.
newbie
Activity: 23
Merit: 0
29c not releasing hugepages
member
Activity: 350
Merit: 22
There's a very slight optimization between 0.29b and 0.29c, but the gain is barely noticeable. None between 0.29c and 0.29d; d is a bugfix version.
On the G4560 the best config should be -t 2 if you mine cn-v7, or -t 4 for cryptolight/turtle/ipbc/aeon.
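The two recommendations come out to the same amount of scratchpad memory. A small Python sketch of that arithmetic, assuming (as the grouping above suggests) that the light coins all use the 1 MiB CryptoNight-Lite scratchpad they used at the time; the table and function are illustrative, not JCE internals.

```python
# Scratchpad size per hash in MiB, by algorithm family.
# Assumption: cryptolight/turtle/ipbc/aeon all use the 1 MiB CryptoNight-Lite pad.
SCRATCHPAD_MIB = {
    "cn-v7": 2,        # CryptoNight v7 (Monero): 2 MiB
    "cryptolight": 1,
    "turtle": 1,
    "ipbc": 1,
    "aeon": 1,
    "cn-heavy": 4,     # CryptoNight-Heavy (e.g. Haven, BLOC): 4 MiB
}


def scratchpad_footprint_mib(algo, threads):
    """Total scratchpad memory for `threads` single-hash threads of `algo`."""
    return SCRATCHPAD_MIB[algo] * threads
```

Both -t 2 on cn-v7 and -t 4 on a light algorithm give 4 MiB in total; the light variants just spread it over all four hardware threads of the G4560.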


JSON: right, a typo in my code… how lame. I'll have to rebuild it again, thanks for the report!
member
Activity: 564
Merit: 19
Intel G4560 - Stock
DDR 2133
Linux JCE 0.29c w/hugepages

61 Kh/s
member
Activity: 473
Merit: 18
0.29d: the API is returning invalid JSON (missing "," before the max speed).
xmr-stak API mode is fine.
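A monitoring tool can catch this kind of regression by parsing the response strictly. A minimal Python sketch using the standard json module; the payloads and field names below are invented for illustration and are not JCE's actual API schema.

```python
import json


def check_api_json(text):
    """Strictly parse a miner API response. Returns (True, data) or (False, reason)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as err:
        # msg, lineno and colno describe where the parser gave up
        return False, f"{err.msg} at line {err.lineno}, column {err.colno}"


# Invented payloads, only to show the failure mode from the report:
broken = '{"total": 46.33 "max": 47.44}'   # missing "," before the max
fixed = '{"total": 46.33, "max": 47.44}'
```

check_api_json(broken) returns False with the parser's position, while check_api_json(fixed) returns the decoded dict.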
member
Activity: 350
Merit: 22
0.29d online - Windows and Linux

Code:
Max hashrate when you press r
BLOC added
Bixbite removed
Haven algo selection fixed

This is probably the last CPU-only version
newbie
Activity: 20
Merit: 0
The APU's L2 cache is 2 MB + 2 MB in two modules: CPUs 0 and 1 share the first 2 MB, CPUs 2 and 3 the other 2 MB.
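The effect of this module layout can be modelled roughly in Python. It is a simplification that assumes threads on the same module split its 2 MB of L2 evenly; the module size and CPU mapping come from the post, the helper is hypothetical.

```python
from collections import Counter

# Values from the post: two modules with 2 MB of L2 each,
# CPUs 0-1 on module 0 and CPUs 2-3 on module 1.
L2_PER_MODULE_MB = 2
CPUS_PER_MODULE = 2


def l2_per_thread(affinities):
    """L2 share (MB) per pinned thread, assuming threads split their module's L2 evenly."""
    per_module = Counter(cpu // CPUS_PER_MODULE for cpu in affinities)
    return {cpu: L2_PER_MODULE_MB / per_module[cpu // CPUS_PER_MODULE]
            for cpu in affinities}
```

For the affinities 1 and 3 used in the trinity config in this thread, each thread gets a whole module (2 MB, enough for one 2 MB CryptoNight scratchpad); pinning four threads to CPUs 0 to 3 halves that to 1 MB per thread.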
member
Activity: 350
Merit: 22
Impressive tip from robminer80, who gives better advice than the dev!
Probably thanks to spreading the threads over cores 1 and 3.

I'm postponing 0.29d to do more testing and add the coin BLOC (another CN-Heavy coin).
member
Activity: 762
Merit: 35
Thanks for the test. It seems my autoconfig is bad on the A10: it allocates too many threads. I'll fix it, thanks.

Multihash (double hash is the 2-way case; you can set anything from 1-way to 6-way) is also called low-power mode in xmr-stak, IIRC. It means using twice the registers and twice the cache on one CPU core to sometimes get twice the speed. The trick is that it leaves the other cores free, so it consumes less power and lets Turbo kick in on CPUs that have it.

Technically it's useful when you want to save power, of course, but also when you run out of cores before you run out of cache. If you have a CPU with 2 cores but 8 MB of cache, a normal config would use only 2 × 2 MB = 4 MB of cache.

You can enable double hash to use 2 × 2 × 2 MB = 8 MB of cache and get some extra performance.
How well it works depends on the CPU: it's very efficient on Ryzen, and not at all on Core2.
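The cache arithmetic above as a small Python sketch. It only checks whether the scratchpads fit in cache, assuming 2 MB per hash; whether the extra ways actually pay off still depends on the CPU. The function names are hypothetical.

```python
def cache_used_mb(cores, ways, scratchpad_mb=2):
    """Cache needed when each core runs a `ways`-way multihash (1 = single hash)."""
    return cores * ways * scratchpad_mb


def largest_fitting_ways(cores, cache_mb, scratchpad_mb=2, max_ways=6):
    """Largest multihash factor (1..max_ways) whose scratchpads still fit in cache.

    Falls back to 1 even if a single hash per core already overflows the cache.
    """
    best = 1
    for ways in range(1, max_ways + 1):
        if cache_used_mb(cores, ways, scratchpad_mb) <= cache_mb:
            best = ways
    return best
```

For the 2-core, 8 MB example, cache_used_mb(2, 1) is 4 and largest_fitting_ways(2, 8) is 2, i.e. double hash.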

I looked closer at the A10, and yes, it's a small APU with little cache.
Here's an experimental config that might give you some extra performance, but I'm not sure; I can't test it since I don't have an A10.

"cpu_threads_conf" :  
[  
     { "cpu_architecture" : "auto", "affine_to_cpu" : 0, "use_cache" : true },    
     { "cpu_architecture" : "auto", "affine_to_cpu" : 1, "use_cache" : false },
     { "cpu_architecture" : "auto", "affine_to_cpu" : 2, "use_cache" : true },
     { "cpu_architecture" : "auto", "affine_to_cpu" : 3, "use_cache" : false },
]

now finishing 0.29c, the last CPU-only version, with some updates.

12:52:22 | Hashrate Thread 0: 59.71 h/s
12:52:22 | Hashrate Thread 1: 2.57 h/s
12:52:22 | Hashrate Thread 2: 54.01 h/s
12:52:22 | Hashrate Thread 3: 2.44 h/s
12:52:22 | Total: 118.71 h/s
12:52:23 | Pool changes Difficulty to 2634.


limited to two threads with -t 2:

12:47:53 | Pool changes Difficulty to 2376.
12:47:59 | Hashrate Thread 0: 62.49 h/s
12:47:59 | Hashrate Thread 1: 60.26 h/s
12:47:59 | Total: 122.74 h/s




Try:

"cpu_threads_conf" :  
[  
     { "cpu_architecture" : "trinity", "affine_to_cpu" : 1, "use_cache" : true },    
     { "cpu_architecture" : "trinity", "affine_to_cpu" : 3, "use_cache" : true },
]

Another +10 H/s gained this way! It might be a temporary thing, but if I don't touch anything on the PC, the hashrate goes up to 137 H/s.

I'm not sure why it did that, since in both cases it uses generic AES AVX, large cache and two cores.

member
Activity: 350
Merit: 22
Masari works; it's Haven that may be broken. Now testing 0.29d: I added the Max hashrate, even though it's still a pure CPU version.


Code:
21:00:10 | Hashrate CPU Thread 0: 11.42 h/s
21:00:10 | Hashrate CPU Thread 1: 11.54 h/s
21:00:10 | Hashrate CPU Thread 2: 11.67 h/s
21:00:10 | Hashrate CPU Thread 3: 11.71 h/s
21:00:10 | Total: 46.33 h/s - Max: 47.44 h/s


That's Haven on my Core2 Quad.
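A minimal sketch of how a "Max" figure like the one in the log can be maintained: sum the per-thread rates for each report and keep the highest total. This is an assumption about the approach, not JCE's code. The printed per-thread values are rounded (and the miner may average over a window), so summing the four values above (46.34) need not reproduce the printed total exactly.

```python
class HashrateTracker:
    """Track the current total hashrate and the highest total seen so far."""

    def __init__(self):
        self.max_total = 0.0

    def update(self, thread_rates):
        """Sum per-thread rates (h/s), update the running max, return the total."""
        total = sum(thread_rates)
        self.max_total = max(self.max_total, total)
        return total
```

A later, lower report lowers the current total but leaves max_total untouched.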
full member
Activity: 1120
Merit: 131
Hi, I just mined a bit of Masari with the Ryzen 2400G; the hashrate with the "auto" config was around 280 H/s.
member
Activity: 350
Merit: 22
Out of context, reading a file is portable. But I can't legitimately (and don't want to) grab some external app from somewhere and make it write a text file for JCE to read; that's a pretty dirty way of working.

However, that's really a job for a GUI/monitor: a GUI tool could use JCE to mine and at the same time monitor the CPU temperatures. Get the hashrate, get the temperature, and display both.
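Along those lines, a separate monitor (not JCE itself) could combine the two readings. A minimal Linux-only Python sketch: the sysfs path is standard on Linux, but thermal_zone0 is not guaranteed to be the CPU package on every machine, and how the tool obtains the hashrate (for example from the miner's API) is left out as an assumption. Both function names are hypothetical.

```python
from pathlib import Path


def read_temp_c(path="/sys/class/thermal/thermal_zone0/temp"):
    """Read a Linux sysfs thermal zone; the file holds millidegrees Celsius."""
    return int(Path(path).read_text().strip()) / 1000.0


def status_line(hashrate_hs, temp_c):
    """Format hashrate and temperature side by side, as a monitor might show them."""
    return f"{hashrate_hs:.2f} h/s | {temp_c:.1f} C"
```

For example, status_line(280.0, read_temp_c()) would print the 2400G's rate next to the current zone temperature.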