[XMR] JCE Miner Cryptonight/forks, now with GPU! - page 13.

JCE-Miner

member

Activity: 350

Merit: 22

OK, thanks for the report. I'll separate the Bulldozer-like assemblies from the Zen one, since one got a perf increase and the other a perf decrease.
I'll pack this into 0.33b14, along with the GPU port of uPlexa and a more automatic --legacy to avoid some GPU perf regressions.

pp55

newbie

Activity: 41

Merit: 0

Quote from: JCE-Miner on December 20, 2018, 09:50:38 AM

Online is
0.33j CPU Windows

* Increase/restore v8 speed
* uPlexa fork, as --variation 19
* Updated shitcoins: Saronite forked to Haven, XFH to Swap

About the speed: I tested it OK on the CPUs I have, but made only theoretical fixes for the CPUs I don't have.
On Zen I still got +0.2% extra perf.

Unfortunately, the same speed as 0.33i CPU :(

JCE-Miner

member

Activity: 350

Merit: 22

I'm giving priority to the GPU version this time :D

tybiboune

jr. member

Activity: 313

Merit: 8

Hi! No Linux build for the j release?

JCE-Miner

member

Activity: 350

Merit: 22

It was ready before you asked; it's in a test session.
The next release is 0.33b14 GPU, with the backport of the CPU version (including uPlexa) and the fix for --legacy eating too much CPU.

whotheff

member

Activity: 762

Merit: 35

Quote from: JCE-Miner on December 20, 2018, 09:50:38 AM

Online is
0.33j CPU Windows

* Increase/restore v8 speed
* uPlexa fork, as --variation 19
* Updated shitcoins: Saronite forked to Haven, XFH to Swap

About the speed: I tested it OK on the CPUs I have, but made only theoretical fixes for the CPUs I don't have.
On Zen I still got +0.2% extra perf.

Fast as lightning! Thanks JCE!

JCE-Miner

member

Activity: 350

Merit: 22

Online is
0.33j CPU Windows

* Increase/restore v8 speed
* uPlexa fork, as --variation 19
* Updated shitcoins: Saronite forked to Haven, XFH to Swap

About the speed: I tested it OK on the CPUs I have, but made only theoretical fixes for the CPUs I don't have.
On Zen I still got +0.2% extra perf.

JCE-Miner

member

Activity: 350

Merit: 22

Quote

uPlexa

It will be in the next version; the code is done and tested.
Stellite v8 is ready too, but I need a test pool and haven't found one yet; the testnet still says v7.

Vishera performance: I admit that this time it's a real surprise. It's a modern AMD CPU with AES, and while I didn't test it since I have no such CPU (I have an old Athlon64 and a Ryzen, but nothing in between), I expected the perf to be at least on par.

The next version will restore Core2 and older CPU perf (sure), give a small +0.1% on Zen (quite sure, but it's within the margin of benchmark error) and add perf for Intel and those FX CPUs (not sure, theoretical optimization).
It will also add some new shitcoins like Swap and uPlexa.

pp55

newbie

Activity: 41

Merit: 0

I mean that in my case the old versions are faster on CPU on CNv8 :(

And please add the miner version to the log.

HardKano

newbie

Activity: 76

Merit: 0

Nice !

pp55

newbie

Activity: 41

Merit: 0

Hi, JCE!

AMD FX-8320E, Turbo boost OFF, CNv8

0.33i CPU

Code:

0.33g CPU

Code:

Code:

Analyzing Processors topology...
AMD FX-8320E Eight-Core Processor
Assembly codename: generic_aes_avx
SSE2 : Yes
SSE3 : Yes
SSE4 : Yes
AES : Yes
AVX : Yes
AVX2 : No
Auto-configuration, selected CPUs will be highlighted...
Found CPU 0, with:
L1 Cache: 16 KB
L2 Cache: 2048 KB, shared with CPU 1
L3 Cache: 8192 KB, shared with CPU 1, 2, 3, 4, 5, 6, 7
Found CPU 1, with:
L1 Cache: 16 KB
L2 Cache: 2048 KB, shared with CPU 0
L3 Cache: 8192 KB, shared with CPU 0, 2, 3, 4, 5, 6, 7
Found CPU 2, with:
L1 Cache: 16 KB
L2 Cache: 2048 KB, shared with CPU 3
L3 Cache: 8192 KB, shared with CPU 0, 1, 3, 4, 5, 6, 7
Found CPU 3, with:
L1 Cache: 16 KB
L2 Cache: 2048 KB, shared with CPU 2
L3 Cache: 8192 KB, shared with CPU 0, 1, 2, 4, 5, 6, 7
Found CPU 4, with:
L1 Cache: 16 KB
L2 Cache: 2048 KB, shared with CPU 5
L3 Cache: 8192 KB, shared with CPU 0, 1, 2, 3, 5, 6, 7
Found CPU 5, with:
L1 Cache: 16 KB
L2 Cache: 2048 KB, shared with CPU 4
L3 Cache: 8192 KB, shared with CPU 0, 1, 2, 3, 4, 6, 7
Found CPU 6, with:
L1 Cache: 16 KB
L2 Cache: 2048 KB, shared with CPU 7
L3 Cache: 8192 KB, shared with CPU 0, 1, 2, 3, 4, 5, 7
Found CPU 7, with:
L1 Cache: 16 KB
L2 Cache: 2048 KB, shared with CPU 6
L3 Cache: 8192 KB, shared with CPU 0, 1, 2, 3, 4, 5, 6
HTTP Local Server on port 3334

Preparing 7 Mining Threads...

+-- Thread 0 config ------------------------+
| Run on CPU: 0 |
| Use cache: yes |
| Multi-hash: no |
| Assembly module: generic_aes_avx |
+-------------------------------------------+

+-- Thread 1 config ------------------------+
| Run on CPU: 1 |
| Use cache: yes |
| Multi-hash: no |
| Assembly module: generic_aes_avx |
+-------------------------------------------+

+-- Thread 2 config ------------------------+
| Run on CPU: 2 |
| Use cache: yes |
| Multi-hash: no |
| Assembly module: generic_aes_avx |
+-------------------------------------------+

+-- Thread 3 config ------------------------+
| Run on CPU: 3 |
| Use cache: yes |
| Multi-hash: no |
| Assembly module: generic_aes_avx |
+-------------------------------------------+

+-- Thread 4 config ------------------------+
| Run on CPU: 4 |
| Use cache: yes |
| Multi-hash: no |
| Assembly module: generic_aes_avx |
+-------------------------------------------+

+-- Thread 5 config ------------------------+
| Run on CPU: 5 |
| Use cache: yes |
| Multi-hash: no |
| Assembly module: generic_aes_avx |
+-------------------------------------------+

+-- Thread 6 config ------------------------+
| Run on CPU: 6 |
| Use cache: yes |
| Multi-hash: no |
| Assembly module: generic_aes_avx |
+-------------------------------------------+

Cryptonight Variation: Cryptonight V8 fork of Oct-2018

Low intensity.

Starting CPU Thread 0, affinity: CPU 0
Thread 0 successfully bound to CPU 0
Allocated shared Large Page at: 0000014709e00000
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 0 of NUMA node 0 at: 000001470a000000

Starting CPU Thread 1, affinity: CPU 1
Thread 1 successfully bound to CPU 1
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 1 of NUMA node 0 at: 000001470a200000

Starting CPU Thread 2, affinity: CPU 2
Thread 2 successfully bound to CPU 2
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 2 of NUMA node 0 at: 000001470a400000

Starting CPU Thread 3, affinity: CPU 3
Thread 3 successfully bound to CPU 3
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 3 of NUMA node 0 at: 000001470a600000

Starting CPU Thread 4, affinity: CPU 4
Thread 4 successfully bound to CPU 4
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 4 of NUMA node 0 at: 000001470a800000

Starting CPU Thread 5, affinity: CPU 5
Thread 5 successfully bound to CPU 5
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 5 of NUMA node 0 at: 000001470aa00000

Starting CPU Thread 6, affinity: CPU 6
Thread 6 successfully bound to CPU 6
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 6 of NUMA node 0 at: 000001470ac00000
15:59:58 | Monero (XMR/XMV) Mining session starts!

Both with --auto --archi vishera -t 7 --low in config
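
The "shared with" lines in the topology dump above describe which logical CPUs sit behind the same L2 cache. A minimal Python sketch of how those lines could be grouped, assuming the log format stays exactly as pasted; the helper name `l2_groups` is made up for illustration and is not part of the miner:

```python
import re

# Excerpt of the topology log above (CPUs 0-3 only, to keep the sketch short).
LOG = """\
Found CPU 0, with:
L1 Cache: 16 KB
L2 Cache: 2048 KB, shared with CPU 1
L3 Cache: 8192 KB, shared with CPU 1, 2, 3, 4, 5, 6, 7
Found CPU 1, with:
L1 Cache: 16 KB
L2 Cache: 2048 KB, shared with CPU 0
L3 Cache: 8192 KB, shared with CPU 0, 2, 3, 4, 5, 6, 7
Found CPU 2, with:
L1 Cache: 16 KB
L2 Cache: 2048 KB, shared with CPU 3
L3 Cache: 8192 KB, shared with CPU 0, 1, 3, 4, 5, 6, 7
Found CPU 3, with:
L1 Cache: 16 KB
L2 Cache: 2048 KB, shared with CPU 2
L3 Cache: 8192 KB, shared with CPU 0, 1, 2, 4, 5, 6, 7
"""

def l2_groups(log):
    """Return the sorted groups of logical CPUs that share one L2 cache."""
    groups = set()
    cpu = None
    for line in log.splitlines():
        found = re.match(r"Found CPU (\d+)", line)
        if found:
            cpu = int(found.group(1))  # remember which CPU the next lines describe
            continue
        l2 = re.match(r"L2 Cache: \d+ KB, shared with CPU ([\d, ]+)", line)
        if l2 and cpu is not None:
            siblings = [int(n) for n in l2.group(1).split(",")]
            groups.add(tuple(sorted([cpu, *siblings])))
    return sorted(groups)

print(l2_groups(LOG))
```

On the full log the same grouping would also cover CPUs 4 to 7, which is the pairing to keep in mind when reading the seven thread-to-CPU bindings listed above.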

whotheff

member

Activity: 762

Merit: 35

Hi JCE, could you please add uPlexa coin to the miner?
https://bitcointalksearch.org/topic/dai-mainnet-upx-uplexa-ai-anonymity-and-ecommerce-via-iot-5058404

You can speak to Quantumleaper on Discord if you decide to include it:
https://discord.gg/ddRVYCb

laik2

sr. member

Activity: 652

Merit: 266

Quote from: JCE-Miner on December 19, 2018, 06:12:13 AM

Hi all,

Linux GPU: unlikely. I'm a niche miner (CPU and older GPUs), and adding Linux would make it a niche of a niche while costing a lot of dev time. The Windows GPU version is already about 15% of my fees but 90% of the support; the Linux version would be about 1% of my users for 95% of the support. I cannot afford this :(

Sometimes I don't look at the market and do things for fun, like supporting the HD6000, but that stays within an acceptable dev time. Linux GPU wouldn't.

Btw, try TeamRed on Linux for v8 mining, it burns like fire ;)

@PIOUPIOU99: yeah thanks, my new CPU miner also burns like fire 8)

Speed on Intel: I don't even have any big Intel CPU, I'm all AMD, same for the GPUs (I have zero nVidia). But that's OK, I'll do some theoretical optimizations for big Intel CPUs too.
Can you tell me exactly which CPU you have? Maybe a good config can close the gap with xmrstak. I know I must beat it by more than 1.5% to compensate for the devfee. That's true in most cases, but yeah, maybe not on the i7.

I do use it, but a competitive Linux miner is always welcome.

JCE-Miner

member

Activity: 350

Merit: 22

Right, I'll explicitly rephrase the comment as: all CPUs where I lacked extra performance versus xmrig.
Of course my new v8 assembly is for modern AES CPUs like Zen; the one for non-AES Core2 is already ultra-optimized, and 33i gives no extra perf there compared to 33h.

You're right, I also observed a small regression. It was hard to understand how there could be a side effect, but I found it: it's a cache allocation problem. It will be fixed in 33j, which I already planned to release with an optimization for modern Intel CPUs and the uPlexa fork.

sergneo

newbie

Activity: 33

Merit: 0

jce_cn_cpu_miner.windows.033i
-1 h/s compared with the previous version. Where is the optimization on v8? No improvement seen.
CPUs: Xeon E5440, Core2Quad Q9400.

Iamtutut

full member

Activity: 1120

Merit: 131

Mining BitTube with the latest GPU version (4x RX574: 1240/2070; 1240/2070; 1240/2040; 1240/2035):

Code:

Starting GPU Thread 0, on GPU 0
Created OpenCL Context for GPU 0 at 000001cf487080a0
Created OpenCL Thread 0 Command-Queue for GPU 0 at 000001cf48720ca0
Scratchpad Allocation success for OpenCL Thread 0
Allocating big 1856MB scratchpad for OpenCL Thread 0...
Compiling kernels of OpenCL Thread 0...
Kernels of OpenCL Thread 0 compiled.

Starting GPU Thread 1, on GPU 0
Created OpenCL Thread 1 Command-Queue for GPU 0 at 000001cf4d55d740
Scratchpad Allocation success for OpenCL Thread 1
Allocating big 1856MB scratchpad for OpenCL Thread 1...
Compiling kernels of OpenCL Thread 1...
Kernels of OpenCL Thread 1 compiled.

Starting GPU Thread 2, on GPU 1
Created OpenCL Context for GPU 1 at 000001cf487839f0
Created OpenCL Thread 2 Command-Queue for GPU 1 at 000001cf4d55db30
Scratchpad Allocation success for OpenCL Thread 2
Allocating big 1856MB scratchpad for OpenCL Thread 2...
Compiling kernels of OpenCL Thread 2...
Kernels of OpenCL Thread 2 compiled.

Starting GPU Thread 3, on GPU 1
Created OpenCL Thread 3 Command-Queue for GPU 1 at 000001cf4d55d200
Scratchpad Allocation success for OpenCL Thread 3
Allocating big 1856MB scratchpad for OpenCL Thread 3...
Compiling kernels of OpenCL Thread 3...
Kernels of OpenCL Thread 3 compiled.

Starting GPU Thread 4, on GPU 2
Created OpenCL Context for GPU 2 at 000001cf487844f0
Created OpenCL Thread 4 Command-Queue for GPU 2 at 000001cf4d55d4a0
Scratchpad Allocation success for OpenCL Thread 4
Allocating big 1856MB scratchpad for OpenCL Thread 4...
Compiling kernels of OpenCL Thread 4...
Kernels of OpenCL Thread 4 compiled.

Starting GPU Thread 5, on GPU 2
Created OpenCL Thread 5 Command-Queue for GPU 2 at 000001cf588aa3a0
Scratchpad Allocation success for OpenCL Thread 5
Allocating big 1856MB scratchpad for OpenCL Thread 5...
Compiling kernels of OpenCL Thread 5...
Kernels of OpenCL Thread 5 compiled.

Starting GPU Thread 6, on GPU 3
Created OpenCL Context for GPU 3 at 000001cf48783470
Created OpenCL Thread 6 Command-Queue for GPU 3 at 000001cf588ab210
Scratchpad Allocation success for OpenCL Thread 6
Allocating big 1856MB scratchpad for OpenCL Thread 6...
Compiling kernels of OpenCL Thread 6...
Kernels of OpenCL Thread 6 compiled.

Starting GPU Thread 7, on GPU 3
Created OpenCL Thread 7 Command-Queue for GPU 3 at 000001cf588aa640
Scratchpad Allocation success for OpenCL Thread 7
Allocating big 1856MB scratchpad for OpenCL Thread 7...
Compiling kernels of OpenCL Thread 7...
Kernels of OpenCL Thread 7 compiled.
Keep-Alive enabled
Devfee for GPU is 0.9%

12:39:39 | Miner uptime 4:05:07
12:39:39 | Effective net hashrate 3591.50 h/s
12:39:39 | Devices results - Shares Accepted/Ignored/Rejected - Net Hashrate
12:39:39 | * GPU 0 - 98/0/0 - 891.67 h/s
12:39:39 | * GPU 1 - 87/0/0 - 817.14 h/s
12:39:39 | * GPU 2 - 101/0/0 - 976.39 h/s
12:39:39 | * GPU 3 - 91/0/0 - 906.30 h/s
12:40:56 | Hashrate GPU Thread 0: 462.00 h/s
12:40:56 | Hashrate GPU Thread 1: 461.32 h/s - Total GPU 0: 923.31 h/s
12:40:56 | Hashrate GPU Thread 2: 440.52 h/s
12:40:56 | Hashrate GPU Thread 3: 444.89 h/s - Total GPU 1: 885.41 h/s
12:40:56 | Hashrate GPU Thread 4: 449.96 h/s
12:40:56 | Hashrate GPU Thread 5: 449.84 h/s - Total GPU 2: 899.79 h/s
12:40:56 | Hashrate GPU Thread 6: 464.51 h/s
12:40:56 | Hashrate GPU Thread 7: 464.82 h/s - Total GPU 3: 929.33 h/s
12:40:56 | Total: 3637.83 h/s - Max: 3649.71 h/s
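
The per-thread hashrate lines at the end of the log above can be cross-checked against the per-GPU totals the miner prints. A minimal Python sketch, assuming the line format stays exactly as pasted and that threads are assigned two per GPU in order, as the "Starting GPU Thread N, on GPU M" lines suggest; the helper names are made up for illustration:

```python
import re
from collections import defaultdict

# Hashrate lines copied from the log above.
LOG = """\
12:40:56 | Hashrate GPU Thread 0: 462.00 h/s
12:40:56 | Hashrate GPU Thread 1: 461.32 h/s - Total GPU 0: 923.31 h/s
12:40:56 | Hashrate GPU Thread 2: 440.52 h/s
12:40:56 | Hashrate GPU Thread 3: 444.89 h/s - Total GPU 1: 885.41 h/s
12:40:56 | Hashrate GPU Thread 4: 449.96 h/s
12:40:56 | Hashrate GPU Thread 5: 449.84 h/s - Total GPU 2: 899.79 h/s
12:40:56 | Hashrate GPU Thread 6: 464.51 h/s
12:40:56 | Hashrate GPU Thread 7: 464.82 h/s - Total GPU 3: 929.33 h/s
12:40:56 | Total: 3637.83 h/s - Max: 3649.71 h/s
"""

THREAD_RE = re.compile(r"Hashrate GPU Thread (\d+): ([\d.]+) h/s")

def per_thread(log):
    """Map thread index -> reported hashrate in h/s."""
    return {int(t): float(h) for t, h in THREAD_RE.findall(log)}

def per_gpu(rates, threads_per_gpu=2):
    """Sum thread hashrates per GPU (assumes consecutive threads share a GPU)."""
    totals = defaultdict(float)
    for thread, rate in rates.items():
        totals[thread // threads_per_gpu] += rate
    return dict(totals)

rates = per_thread(LOG)
gpus = per_gpu(rates)
for gpu, total in sorted(gpus.items()):
    print(f"GPU {gpu}: {total:.2f} h/s")
print(f"Sum of threads: {sum(rates.values()):.2f} h/s")
```

The recomputed sums should match the logged totals to within a few hundredths of a h/s; small differences are expected because the log rounds each figure separately.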

JCE-Miner

member

Activity: 350

Merit: 22

Hi all,

Linux GPU: unlikely. I'm a niche miner (CPU and older GPUs), and adding Linux would make it a niche of a niche while costing a lot of dev time. The Windows GPU version is already about 15% of my fees but 90% of the support; the Linux version would be about 1% of my users for 95% of the support. I cannot afford this :(

Sometimes I don't look at the market and do things for fun, like supporting the HD6000, but that stays within an acceptable dev time. Linux GPU wouldn't.

Btw, try TeamRed on Linux for v8 mining, it burns like fire ;)

@PIOUPIOU99: yeah thanks, my new CPU miner also burns like fire 8)

Speed on Intel: I don't even have any big Intel CPU, I'm all AMD, same for the GPUs (I have zero nVidia). But that's OK, I'll do some theoretical optimizations for big Intel CPUs too.
Can you tell me exactly which CPU you have? Maybe a good config can close the gap with xmrstak. I know I must beat it by more than 1.5% to compensate for the devfee. That's true in most cases, but yeah, maybe not on the i7.
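
The devfee remark above comes down to simple arithmetic. A minimal Python sketch, assuming a 1.5% fee (read off the sentence above, not taken from the miner's documentation) and a competitor running with no fee; the 286 and 295 figures are the i7 numbers reported elsewhere in this thread, used here only as an illustration:

```python
def net_hashrate(gross_hs, devfee):
    """Hashrate left for the user once the dev fee has taken its share."""
    return gross_hs * (1.0 - devfee)

def break_even_gain(devfee):
    """Relative speed advantage needed to match a fee-free miner after fees."""
    return 1.0 / (1.0 - devfee) - 1.0

DEVFEE = 0.015  # assumption: inferred from the "more than 1.5%" remark above

print(f"Required advantage over a fee-free miner: {break_even_gain(DEVFEE):.2%}")
print(f"286 h/s with the fee nets {net_hashrate(286.0, DEVFEE):.2f} h/s, "
      f"to compare with 295 h/s fee-free")
```

The break-even advantage is slightly above the fee itself, because the fee is taken from the larger gross hashrate.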

PIOUPIOU99

copper member

Activity: 293

Merit: 11

Quote from: JCE-Miner on December 18, 2018, 01:47:06 PM

Quote from: laik2 on December 17, 2018, 06:29:17 PM

Still no linux

Online is the 0.33i CPU for Windows and Linux, 32- and 64-bit.
Major release with a big +2% speed on v8, making my miner the best in all cases on CPU, even with fees deducted.

for my light config v8
0.33e

0.33i

laik2

sr. member

Activity: 652

Merit: 266

Quote from: JCE-Miner on December 18, 2018, 08:02:18 AM

I'd say the last one, 0.33b13.

My autoconfig aims for safety. For max perf, use the manual config; the GitHub page provides some examples.
https://github.com/jceminer/cn_gpu_miner

But each card may be different (overclocking, memory...), so take time to tune the values. Only three are relevant: multi_hash (a multiple of 16), alpha (64 or 128) and beta (8 or 16).
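
The three tuning values named above define a small search space that can be enumerated before benchmarking each candidate by hand. A minimal Python sketch; only the constraints (multi_hash a multiple of 16, alpha 64 or 128, beta 8 or 16) come from the quote, while the search window and the dictionary layout are hypothetical and do not reproduce the miner's config file format:

```python
from itertools import product

def candidate_configs(multi_hash_min, multi_hash_max):
    """Yield every (multi_hash, alpha, beta) combination within the window."""
    first = -(-multi_hash_min // 16) * 16  # round up to the next multiple of 16
    multi_hashes = range(first, multi_hash_max + 1, 16)
    for multi_hash, alpha, beta in product(multi_hashes, (64, 128), (8, 16)):
        yield {"multi_hash": multi_hash, "alpha": alpha, "beta": beta}

# Hypothetical window; a sensible range depends on the card and its memory.
configs = list(candidate_configs(400, 480))
print(len(configs), "candidates, first:", configs[0])
```

Each candidate would still have to be benchmarked on the actual card, since the quote stresses that overclocking and memory make every card different.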

Actually, I was talking about the GPU version not having a Linux port.

impynick

jr. member

Activity: 77

Merit: 6

On my i7 I'm getting 295 on xmr-stak vs 286 on JCE. How can I improve these numbers? I do have hyperthreading on.

I'm unsure how to configure this. I currently use xmr-stak with the following:

   { "low_power_mode" : false, "no_prefetch" : true, "asm" : "auto", "affine_to_cpu" : 0 },
   { "low_power_mode" : false, "no_prefetch" : true, "asm" : "auto", "affine_to_cpu" : 2 },
   { "low_power_mode" : false, "no_prefetch" : true, "asm" : "auto", "affine_to_cpu" : 4 },
   { "low_power_mode" : false, "no_prefetch" : true, "asm" : "auto", "affine_to_cpu" : 6 },

This is on v8, with xmr-stak compiled with 0 dev fee. Can you compete? If it's worth it, I'll happily make the move.
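
The thread list in the post above pins one thread to every second logical CPU (0, 2, 4, 6). A minimal Python sketch that generates such entries, assuming hyperthreading siblings are numbered as adjacent pairs (0/1, 2/3, ...), which is not guaranteed on every operating system; the function name is made up, and the field values are copied from the post:

```python
import json

def thread_entries(physical_cores, logical_per_core=2):
    """Build one thread entry per physical core, pinned to its first logical CPU."""
    return [
        {
            "low_power_mode": False,
            "no_prefetch": True,
            "asm": "auto",
            "affine_to_cpu": core * logical_per_core,
        }
        for core in range(physical_cores)
    ]

# Four physical cores with hyperthreading, as in the post above.
for entry in thread_entries(4):
    print("   " + json.dumps(entry) + ",")
```

The printed lines are equivalent to the four entries in the post apart from whitespace, so the same sketch can be reused to try other core counts.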

Topic: [XMR] JCE Miner Cryptonight/forks, now with GPU! - page 13. (Read 90858 times)