
Topic: [XMR] JCE Miner Cryptonight/forks, now with GPU! - page 109. (Read 90815 times)

member
Activity: 564
Merit: 19
CPU temperature info with report please
member
Activity: 350
Merit: 22
wow, what's that beast? a Knights Landing?

that's a lot of assembly to write, i'll probably release dual 64 bits first, then dual 32, then multiple.
full member
Activity: 1179
Merit: 131
That would be awesome man.  My processor has 128 MB of L4 cache haha
member
Activity: 350
Merit: 22
yeah, i looked and saw that we again had the same idea: he uses sse2 registers to work around the lack of GPRs, which i already do for cryptonight heavy and 32 bits doublehash, for the same reason. but i use one double large page, while he uses two single large pages. here we differ.

right, i'll release doublehash first, then upgrade to triple, quad, penta, maybe hexa hash.
it will still multiply the algos, i expect the binary to grow to 6 MB or something :o

Does cryptonight_lite_v7 work on JCE?

If you mean Cryptolight V7, yes. Read first page.
thanks, that's the --variation 4, and it's automatic with a trtlcoin wallet.

Ini sangat bermanfaat sekali bagi kami. [Indonesian: "This is very useful for us."]
អរគុណ [Khmer: "Thank you"] :D
let's both use our obscure native s-e asian language :)
newbie
Activity: 77
Merit: 0
Ini sangat bermanfaat sekali bagi kami. [Indonesian: "This is very useful for us."]
full member
Activity: 1179
Merit: 131
finished cryptolight and light-v7 double hash.
worth the pain : with four doublehash on its four cores, my xeon jumps from 228 to 241 h/s, a welcome +5%

still need to do the 32 bits, and a lot of tests :'(
You can look at the source code of XMRig or xmr-stak to see how to make low-power mode more effective.

Meanwhile, XMRig has released a version with triple-, quad- and penta-hash thread modes for CPUs with a large amount of cache.

Yes, I have an i7-5775R that will mine cryptonightv7 at 515 H/s in 5x mode.  It's absolutely amazing.
jr. member
Activity: 70
Merit: 3
Does cryptonight_lite_v7 work on JCE?

If you mean Cryptolight V7, yes. Read first page.
newbie
Activity: 92
Merit: 0
Does cryptonight_lite_v7 work on JCE?
sr. member
Activity: 1484
Merit: 253
finished cryptolight and light-v7 double hash.
worth the pain : with four doublehash on its four cores, my xeon jumps from 228 to 241 h/s, a welcome +5%

still need to do the 32 bits, and a lot of tests :'(
You can look at the source code of XMRig or xmr-stak to see how to make low-power mode more effective.

Meanwhile, XMRig has released a version with triple-, quad- and penta-hash thread modes for CPUs with a large amount of cache.
member
Activity: 350
Merit: 22
finished cryptolight and light-v7 double hash.
worth the pain : with four doublehash on its four cores, my xeon jumps from 228 to 241 h/s, a welcome +5%

still need to do the 32 bits, and a lot of tests :'(
member
Activity: 350
Merit: 22
impressive score with a Vishera, close to my Ryzen ! ...

Is it possible to implement XOP instructions for this CPU (Bulldozer, Piledriver, Excavator), is there any advantage?
Thanks

the packed rotate may be useful for the Keccak part, but not for cryptonight. i'll read through the instruction list some more, but i don't expect a real boost.

Again a new test, done right now

Code:
                +--------------------------------------+
                | JC Expert Cryptonote CPU Miner 0.24e |
                +--------------------------------------+


For Windows 64-bits
Analyzing Processors topology...
AMD Ryzen 5 1600 Six-Core Processor
Architecture codename: Ryzen
  SSE2          : Yes
  SSE3          : Yes
  SSE4          : Yes
  AES           : Yes
  AVX           : Yes

Preparing 8 Mining Threads...

+-- Thread 0 config -----------------------------+
| Run on CPU:             0                      |
| Use cache:              yes                    |
| Double-hash:            no                     |
| Assembly module:        ryzen                  |
+------------------------------------------------+

+-- Thread 1 config -----------------------------+
| Run on CPU:             1                      |
| Use cache:              yes                    |
| Double-hash:            no                     |
| Assembly module:        ryzen                  |
+------------------------------------------------+

+-- Thread 2 config -----------------------------+
| Run on CPU:             2                      |
| Use cache:              yes                    |
| Double-hash:            no                     |
| Assembly module:        ryzen                  |
+------------------------------------------------+

+-- Thread 3 config -----------------------------+
| Run on CPU:             4                      |
| Use cache:              yes                    |
| Double-hash:            no                     |
| Assembly module:        ryzen                  |
+------------------------------------------------+

+-- Thread 4 config -----------------------------+
| Run on CPU:             6                      |
| Use cache:              yes                    |
| Double-hash:            no                     |
| Assembly module:        ryzen                  |
+------------------------------------------------+

+-- Thread 5 config -----------------------------+
| Run on CPU:             7                      |
| Use cache:              yes                    |
| Double-hash:            no                     |
| Assembly module:        ryzen                  |
+------------------------------------------------+

+-- Thread 6 config -----------------------------+
| Run on CPU:             8                      |
| Use cache:              yes                    |
| Double-hash:            no                     |
| Assembly module:        ryzen                  |
+------------------------------------------------+

+-- Thread 7 config -----------------------------+
| Run on CPU:             10                     |
| Use cache:              yes                    |
| Double-hash:            no                     |
| Assembly module:        ryzen                  |
+------------------------------------------------+

Cryptonight Variation: Cryptonight V7 fork of April-2018

Low intensity.
Starting Mining thread 0, affinity: CPU 0
Thread 0 successfully bound to CPU 0
Allocated shared Large Page at: 0000000005800000
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 0 of NUMA node 0 at: 0000000005a00000
Starting Mining thread 1, affinity: CPU 1
Thread 1 successfully bound to CPU 1
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 1 of NUMA node 0 at: 0000000005e00000
Starting Mining thread 2, affinity: CPU 2
Thread 2 successfully bound to CPU 2
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 2 of NUMA node 0 at: 0000000006200000
Starting Mining thread 3, affinity: CPU 4
Thread 3 successfully bound to CPU 4
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 4 of NUMA node 0 at: 0000000006600000
Starting Mining thread 4, affinity: CPU 6
Thread 4 successfully bound to CPU 6
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 6 of NUMA node 0 at: 0000000006a00000
Starting Mining thread 5, affinity: CPU 7
Thread 5 successfully bound to CPU 7
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 7 of NUMA node 0 at: 0000000006e00000
Starting Mining thread 6, affinity: CPU 8
Thread 6 successfully bound to CPU 8
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 8 of NUMA node 0 at: 0000000007200000
Starting Mining thread 7, affinity: CPU 10
Thread 7 successfully bound to CPU 10
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 10 of NUMA node 0 at: 0000000007600000
Devfee is 1.5%

20:33:51 | Connecting to mining pool xmrpool.eu:3333 ...
20:33:51 | Monero (XMR) Mining session starts!
20:34:51 | Hashrate Thread 0: 58.65 h/s
20:34:51 | Hashrate Thread 1: 58.91 h/s
20:34:51 | Hashrate Thread 2: 66.21 h/s
20:34:51 | Hashrate Thread 3: 66.06 h/s
20:34:51 | Hashrate Thread 4: 58.90 h/s
20:34:51 | Hashrate Thread 5: 58.91 h/s
20:34:51 | Hashrate Thread 6: 66.51 h/s
20:34:51 | Hashrate Thread 7: 66.38 h/s
20:34:51 | Total: 500.48 h/s

Raw, unstaged log from my rig on CN-v7. The remote control of my rig takes a few h/s but i really reach 500+

Code:
[2018-04-22 20:38:59] : Mining coin: monero7
[2018-04-22 20:38:59] : Starting 1x thread, affinity: 0.
[2018-04-22 20:38:59] : hwloc: memory pinned
[2018-04-22 20:38:59] : Starting 1x thread, affinity: 2.
[2018-04-22 20:38:59] : hwloc: memory pinned
[2018-04-22 20:38:59] : Starting 1x thread, affinity: 4.
[2018-04-22 20:38:59] : hwloc: memory pinned
[2018-04-22 20:38:59] : Starting 1x thread, affinity: 1.
[2018-04-22 20:38:59] : hwloc: memory pinned
[2018-04-22 20:38:59] : Starting 1x thread, affinity: 6.
[2018-04-22 20:38:59] : hwloc: memory pinned
[2018-04-22 20:38:59] : Starting 1x thread, affinity: 8.
[2018-04-22 20:38:59] : hwloc: memory pinned
[2018-04-22 20:38:59] : Starting 1x thread, affinity: 10.
[2018-04-22 20:38:59] : hwloc: memory pinned
[2018-04-22 20:38:59] : Starting 1x thread, affinity: 7.
[2018-04-22 20:38:59] : hwloc: memory pinned
[2018-04-22 20:38:59] : Fast-connecting to monero.hashvault.pro:3333 pool ...
[2018-04-22 20:38:59] : Pool monero.hashvault.pro:3333 connected. Logging in...
[2018-04-22 20:39:00] : Difficulty changed. Now: 10000.
[2018-04-22 20:39:00] : Pool logged in.
HASHRATE REPORT - CPU
| ID |    10s |    60s |    15m | ID |    10s |    60s |    15m |
|  0 |   55.0 |   (na) |   (na) |  1 |   66.7 |   (na) |   (na) |
|  2 |   66.5 |   (na) |   (na) |  3 |   55.2 |   (na) |   (na) |
|  4 |   56.1 |   (na) |   (na) |  5 |   66.9 |   (na) |   (na) |
|  6 |   66.9 |   (na) |   (na) |  7 |   56.1 |   (na) |   (na) |
Totals (CPU):   489.5    0.0    0.0 H/s
-----------------------------------------------------------------
Totals (ALL):    489.5    0.0    0.0 H/s

Stak, same rule: the remote control steals a few h/s. i know it can reach 493, but not 500+
JCE is really faster on ryzen, but admittedly the difference is <3%
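To check the "<3%" figure against the two logs above, here is a minimal sketch that pulls the per-thread rates out of the JCE output and compares the reported total with the xmr-stak 10s total. The regular expressions and function names are assumptions made for illustration; they are not part of either miner.

```python
import re

# Hashrate lines copied from the JCE 0.24e log above.
JCE_LOG = """\
20:34:51 | Hashrate Thread 0: 58.65 h/s
20:34:51 | Hashrate Thread 1: 58.91 h/s
20:34:51 | Hashrate Thread 2: 66.21 h/s
20:34:51 | Hashrate Thread 3: 66.06 h/s
20:34:51 | Hashrate Thread 4: 58.90 h/s
20:34:51 | Hashrate Thread 5: 58.91 h/s
20:34:51 | Hashrate Thread 6: 66.51 h/s
20:34:51 | Hashrate Thread 7: 66.38 h/s
20:34:51 | Total: 500.48 h/s
"""


def jce_thread_rates(log):
    """Return the per-thread hashrates (h/s) found in a JCE-style log."""
    return [float(v) for v in re.findall(r"Hashrate Thread \d+: ([\d.]+) h/s", log)]


def jce_reported_total(log):
    """Return the 'Total:' figure (h/s) printed by the miner."""
    match = re.search(r"Total: ([\d.]+) h/s", log)
    if match is None:
        raise ValueError("no 'Total:' line found in log")
    return float(match.group(1))


def relative_gain(new, old):
    """Relative difference of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


rates = jce_thread_rates(JCE_LOG)
jce_total = jce_reported_total(JCE_LOG)
stak_total = 489.5  # "Totals (CPU)" 10s column from the xmr-stak report above

print(f"threads: {len(rates)}, sum of threads: {sum(rates):.2f} h/s")
print(f"JCE vs Stak: {relative_gain(jce_total, stak_total):+.2f}%")
```

With these two runs the gap comes out a little above 2%. Both logs are short samples (one minute for JCE, ten seconds for Stak), so the number is only indicative.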
member
Activity: 350
Merit: 22
in my case...
xmrig cpu miner is better...
core i3-2100
JCE = 40h/s
XMRIG = 120h/s
there's a configuration problem here, jce is always faster on non-aes. probably autoconfig went bad. can you try with --auto -t 4 ?
if possible provide both xmrig and jce first lines of log to see how many threads are used.

my score of 502 is not fake but it may depend on my memory, motherboard... that's just to compare, i'm at 502 on v7 against 493 for stak on the same machine.

i've done the 64 bits dualshare. now testing.
performance is just slightly above xmrig. my ryzen 1600 on one thread gives 138.8 versus 137.4 on xmrig. less than 1%, so we're probably both at the hardware max, since the two codebases are completely different.

on core2 xeon with all cache used (two simple and two dual) i jump from 117.1 to a whopping 117.6
On one thread, it jumps from 29.6 to 31.1

Impressive how the dualshare almost doubles perf on one thread on ryzen, and gives almost nothing on core2
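For reference, the Core2 numbers quoted above translate into the following relative gains. This is a small sketch and the helper name is made up for the example.

```python
def gain_percent(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (label, h/s without dual hash, h/s with dual hash), figures quoted in this post
core2_results = [
    ("core2 xeon, all cache used", 117.1, 117.6),
    ("core2 xeon, one thread", 29.6, 31.1),
]

for label, before, after in core2_results:
    print(f"{label}: {before} -> {after} h/s ({gain_percent(before, after):+.2f}%)")
```

So the single-thread case gains roughly 5%, while the full-cache case gains well under 1%.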
legendary
Activity: 1510
Merit: 1003
i've done no test with oc, i'm not good at OC and have a cheap psu (literally a 100W pico psu) so i avoid playing with fire.

my peak with jce is 507 on cn and 503 on v7, while stak gives 502 and 493 respectively.
When i gpu mine with claymore GPU 11.3 at the same time, jce drops to 499-500 with v7. Same with jce gpu (my opencl proto not finished yet).
I can't reproduce ~500 h/s on a stock ryzen 5 1600. To get there I need to clock my memory higher (and thus increase the internal bus speed to boost cache performance), and I also need to enable the Performance Bias option on my Asus m/b. Only then, with stock cpu clocks, can I get > 500 h/s.

And yes, my tests show that background tasks slow down V7 performance more than classic cryptonight
newbie
Activity: 66
Merit: 0
in my case...
xmrig cpu miner is better...
core i3-2100
JCE = 40h/s
XMRIG = 120h/s
newbie
Activity: 43
Merit: 0
impressive score with a Vishera, close to my Ryzen ! ...

Is it possible to implement XOP instructions for this CPU (Bulldozer, Piledriver, Excavator), is there any advantage?
Thanks
newbie
Activity: 56
Merit: 0
and about monitoring : i didn't really plan to embed a HTTP server, i would know how to make it, that's pretty standard, but not my priority, i focus on Assembly optimizations for now
sr. member
Activity: 1484
Merit: 253
impressive score with a Vishera, close to my Ryzen !

about the --variation, cryptolight v7 is for TurtleCoin as far as i know. Jce is optimized for it and uses half Large Pages to ensure contiguous memory.

So, on jce the equivalent is --variation 4
but if you mine turtlecoin, it should be automatic.

jce Dualshares on the way, it will be like the lowpower of stak, but not ready yet, i've to implement it in assembly, it takes some time.
so i expect jce not to be faster than stak with current version lacking dualshare.
Good news! Thanks for your work.
Waiting for the low power modes...
Hey bro, sorry again for the lost shares...
Even if at 34 h/s the loss is not that big...

You deserve a preview of JCE 0.24e (e for experimental)

Without doublehash, on my Xeon Core2, one thread
Code:
Preparing 1 Mining Threads...

+-- Thread 0 config -----------------------------+
| Run on CPU:             0                      |
| Use cache:              yes                    |
| Double-hash:            no                     |
| Assembly module:        generic_sse4           |
+------------------------------------------------+

Cryptonight Variation: Original Cryptonight
Allocated 2MB Cached Large Page Scratchpad Buffer for CPU 0 of NUMA node 0 at: 0000000005600000
11:13:40 | Hashrate Thread 0: 29.26 h/s
11:13:40 | Total: 29.26 h/s

With experimental double-hash
Code:
Preparing 1 Mining Threads...

+-- Thread 0 config -----------------------------+
| Run on CPU:             0                      |
| Use cache:              yes                    |
| Double-hash:            yes                    |
| Assembly module:        generic_sse4           |
+------------------------------------------------+

Cryptonight Variation: Original Cryptonight
Allocated 4MB Cached Large Page Scratchpad Buffer for CPU 0 of NUMA node 0 at: 0000000005800000
11:03:53 | Hashrate Thread 0: 31.19 h/s
11:03:53 | Total: 31.19 h/s

So yes there's a light perf increase 8)
That's too small a perf increase ))) It should be more...
member
Activity: 350
Merit: 22
i've done no test with oc, i'm not good at OC and have a cheap psu (literally a 100W pico psu) so i avoid playing with fire.

my peak with jce is 507 on cn and 503 on v7, while stak gives 502 and 493 respectively.
When i gpu mine with claymore GPU 11.3 at the same time, jce drops to 499-500 with v7. Same with jce gpu (my opencl proto not finished yet).
legendary
Activity: 1510
Merit: 1003
I've the exact same ryzen 1600 and the best config is 8 threads, with jce or stak. If you have no gain compared to 6 that may be because of other background tasks, or obscure overclock side effect (turbo...?)

Jce is ~2.5% faster than stak on cryptonight v7, i reach 502 with my ryzen @stock while stak/xmrig max at 492, in rig configuration = all large pages enabled, no background task, all windows services (superfetch, OneDrive...) disabled.
There is ccminer with 4 nvidia cards and srb-miner with 1 vega card mining on this rig ))
What max speed in v7 were you able to get from your ryzen with oc?
member
Activity: 350
Merit: 22
I've the exact same ryzen 1600 and the best config is 8 threads, with jce or stak. If you have no gain compared to 6 that may be because of other background tasks, or obscure overclock side effect (turbo...?)

Jce is ~2.5% faster than stak on cryptonight v7, i reach 502 with my ryzen @stock while stak/xmrig max at 492, in rig configuration = all large pages enabled, no background task, all windows services (superfetch, OneDrive...) disabled.
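A rough rule of thumb behind the "8 threads" figure: each hash needs its own 2 MB scratchpad (the size shown in the JCE allocation log), so the number of hashes that can run fully in cache is about the last-level cache size divided by 2 MB. The sketch below assumes 16 MB of L3 for the Ryzen 5 1600 and ignores how the cache is partitioned or shared. The function names are hypothetical and not miner options.

```python
SCRATCHPAD_MB = 2  # per-hash scratchpad size, as in "Allocated 2MB ... Scratchpad Buffer"


def max_hash_lanes(cache_mb, scratchpad_mb=SCRATCHPAD_MB):
    """How many scratchpads fit in `cache_mb` of cache (integer division)."""
    return cache_mb // scratchpad_mb


def plan_threads(cache_mb, hashes_per_thread=1, scratchpad_mb=SCRATCHPAD_MB):
    """Number of threads whose scratchpads all fit in cache.

    hashes_per_thread=2 models double-hash, which allocates 4 MB per thread.
    """
    return max_hash_lanes(cache_mb, scratchpad_mb) // hashes_per_thread


RYZEN_1600_L3_MB = 16  # assumption: total L3 of a Ryzen 5 1600

print("single hash threads:", plan_threads(RYZEN_1600_L3_MB))
print("double hash threads:", plan_threads(RYZEN_1600_L3_MB, hashes_per_thread=2))
```

Under this assumption, 8 single-hash threads or 4 double-hash threads fill the cache. Actual best settings still depend on memory speed and background load, as discussed in this thread.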