
Topic: [XMR] JCE Miner Cryptonight/forks, now with GPU! - page 106. (Read 90858 times)

newbie
Activity: 71
Merit: 0
Even in assembly, we have very little control over the cache levels. Just a few instructions: movntdqa and the prefetchnta hint (nt == non-temporal == no cache). JCE mostly lets the CPU handle everything, except in "use_cache":false mode.

Progressing on the assembly for multi-hash on AES-64 (the easiest to write).
I reach 79 h/s with hexa-hash on CN-v7, which is very bad, but with IPBC, where the JCE multi-hash does marvels, I get some interesting results. More tests and optimizations to do.

Surprisingly, I didn't run out of registers in x64 with 6 hashes at the same time. But on 32 bits it will be terrible. I'll provide 32-bit hexa-hash for code symmetry, but it will probably be useless.

Edit: I reach 1700 h/s on IPBC on my stock Ryzen 1600 with JCE 0.26 and its multi-hash, with the multi-hash config
3+1+1+1+1+1+3+1+1+1+1+1
so 12 threads, two triple and ten simple. Curiously, double-hash threads perform worse...
That's the fastest combination I found. On CN-v7 I still cannot beat the default 8x simple.

Any recommendations for IPBC settings on a Ryzen 1700? I can't seem to top 1800 h/s.

Trying 12 threads: two double, ten simple.
Also tried 14 threads: two double, 12 simple.
newbie
Activity: 43
Merit: 0
I am on Cryptonight V7.
full member
Activity: 1120
Merit: 131
Hi

Having problems with an FX-8300 on JCE 0.25:

Code:
19:40:05 | Thread 3 finds a Share, value 40000
19:40:05 | Rejected by the pool.
19:40:05 | Message from the pool: Low difficulty share
19:40:24 | Thread 6 finds a Share, value 40000
19:40:24 | Rejected by the pool.
19:40:24 | Message from the pool: Low difficulty share

Exact same issue with GRAFT, switched back to XTL.
newbie
Activity: 43
Merit: 0
Hi

Having problems with an FX-8300 on JCE 0.25:

Code:
19:40:05 | Thread 3 finds a Share, value 40000
19:40:05 | Rejected by the pool.
19:40:05 | Message from the pool: Low difficulty share
19:40:24 | Thread 6 finds a Share, value 40000
19:40:24 | Rejected by the pool.
19:40:24 | Message from the pool: Low difficulty share
member
Activity: 350
Merit: 22
Sorry bro, but that's on 0.26, which is not released yet.
I pushed my assembly further and now reach 1704 h/s on IPBC and 1733 h/s on Turtle (Cryptonight-Lite v7).

64-bit is almost done, 32-bit is still to be written.
jr. member
Activity: 196
Merit: 1
Even in assembly, we have very little control over the cache levels. Just a few instructions: movntdqa and the prefetchnta hint (nt == non-temporal == no cache). JCE mostly lets the CPU handle everything, except in "use_cache":false mode.

Progressing on the assembly for multi-hash on AES-64 (the easiest to write).
I reach 79 h/s with hexa-hash on CN-v7, which is very bad, but with IPBC, where the JCE multi-hash does marvels, I get some interesting results. More tests and optimizations to do.

Surprisingly, I didn't run out of registers in x64 with 6 hashes at the same time. But on 32 bits it will be terrible. I'll provide 32-bit hexa-hash for code symmetry, but it will probably be useless.

Edit: I reach 1700 h/s on IPBC on my stock Ryzen 1600 with JCE 0.26 and its multi-hash, with the multi-hash config
3+1+1+1+1+1+3+1+1+1+1+1
so 12 threads, two triple and ten simple. Curiously, double-hash threads perform worse...
That's the fastest combination I found. On CN-v7 I still cannot beat the default 8x simple.
Damn! Awesome for a stock hexa-core Ryzen!

How do I set this up? I can do some tests on a Ryzen 7 to see the scaling with 8 cores (but the same cache).
jr. member
Activity: 75
Merit: 1
The CN-lite variants are producing surprising results.
The hashrate scales as n²: half the cache memory used, four times the hashrate.

Even odder, on my 2630L I got the best results using 2+1+2+1+2+1+2+1+2+1+2+1
and on my 5640 the best with just one thread per core.

Also, with the 5640 I couldn't run 12 threads (12 MB cache): I ran into cache flooding with more than 10 threads, with no other apps running.
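
A rough way to see where "four times" could come from, as a minimal sketch rather than anything taken from JCE itself. It assumes the standard scratchpad sizes (2 MB for CN-v7, 1 MB for CN-lite) and that CN-lite also runs half as many iterations per hash; it ignores core count, hyper-threading and L2, and the function names are made up for illustration.

```python
# Assumed constants: standard Cryptonight scratchpad sizes in MB.
CN_V7_SCRATCHPAD_MB = 2
CN_LITE_SCRATCHPAD_MB = 1
# Assumed: CN-lite does half the iterations of CN-v7 per hash.
ITERATION_RATIO = 2.0


def hashes_fitting(l3_mb: int, scratchpad_mb: int) -> int:
    """How many scratchpads fit entirely in an L3 cache of l3_mb megabytes."""
    return l3_mb // scratchpad_mb


def lite_speedup_estimate(l3_mb: int) -> float:
    """Idealized CN-lite vs CN-v7 hashrate ratio for a cache-bound CPU."""
    full = hashes_fitting(l3_mb, CN_V7_SCRATCHPAD_MB)
    lite = hashes_fitting(l3_mb, CN_LITE_SCRATCHPAD_MB)
    # Twice as many hashes in flight, each needing half the work.
    return (lite / full) * ITERATION_RATIO


print(hashes_fitting(12, CN_LITE_SCRATCHPAD_MB))  # 12
print(lite_speedup_estimate(12))                  # 4.0
```

With a 12 MB L3, 12 lite hashes would fill the cache completely and leave nothing for the rest of the system, which is at least consistent with the flooding seen above 10 threads.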

Baz
member
Activity: 350
Merit: 22
Even in assembly, we have very little control over the cache levels. Just a few instructions: movntdqa and the prefetchnta hint (nt == non-temporal == no cache). JCE mostly lets the CPU handle everything, except in "use_cache":false mode.

Progressing on the assembly for multi-hash on AES-64 (the easiest to write).
I reach 79 h/s with hexa-hash on CN-v7, which is very bad, but with IPBC, where the JCE multi-hash does marvels, I get some interesting results. More tests and optimizations to do.

Surprisingly, I didn't run out of registers in x64 with 6 hashes at the same time. But on 32 bits it will be terrible. I'll provide 32-bit hexa-hash for code symmetry, but it will probably be useless.

Edit: I reach 1700 h/s on IPBC on my stock Ryzen 1600 with JCE 0.26 and its multi-hash, with the multi-hash config
3+1+1+1+1+1+3+1+1+1+1+1
so 12 threads, two triple and ten simple. Curiously, double-hash threads perform worse...
That's the fastest combination I found. On CN-v7 I still cannot beat the default 8x simple.
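
For readers trying to decode the layout string, here is a small sketch that reads it the way the post describes it: one number per thread, each number being how many hashes that thread computes at once. This is only the notation used in the post, not necessarily JCE's config syntax, and the scratchpad sizes (2 MB for CN-v7, 1 MB for the lite family, which IPBC is assumed to belong to) are assumptions of the sketch.

```python
# Assumed scratchpad size per hash, in MB (not taken from JCE's docs).
SCRATCHPAD_MB = {"cn-v7": 2, "cn-lite": 1}


def describe_layout(layout: str, algo: str = "cn-lite") -> dict:
    """Summarize a '+'-separated multi-hash layout such as '3+1+1'."""
    per_thread = [int(part) for part in layout.split("+")]
    hashes = sum(per_thread)
    return {
        "threads": len(per_thread),
        "hashes_in_flight": hashes,
        "scratchpad_mb": hashes * SCRATCHPAD_MB[algo],
    }


print(describe_layout("3+1+1+1+1+1+3+1+1+1+1+1"))
# {'threads': 12, 'hashes_in_flight': 16, 'scratchpad_mb': 16}
```

Under those assumptions the layout keeps 16 hashes in flight and needs 16 MB of scratchpad, which would exactly fill the 16 MB of L3 on a Ryzen 5 1600.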
sr. member
Activity: 1484
Merit: 253
If you leave "use_cache" at its default of true, all levels of cache are used; otherwise, none. As said in the doc, "not used" doesn't always imply "not impacted".

Ryzen L3 runs at memory clock? Oh, I didn't know, even though I have one. It reminds me of the old Pentiums with motherboard L2 clocked at bus speed. I've edited my post, thanks :)
Hmmm, what is the situation with the L3 cache on the FX-83xx?

I have a Bulldozer (an Excavator, to be precise) in my test rig, but it's broken for now. I'll do complete tests once possible.
I asked because, IMHO, the L2 cache isn't used by the miner... Maybe I'm wrong.
But beyond 4 threads (4 x 2 = 8 MB), the total speed stops growing as threads are added; each thread just gets slower, so the total stays nearly the same...
member
Activity: 350
Merit: 22
If you leave "use_cache" at its default of true, all levels of cache are used; otherwise, none. As said in the doc, "not used" doesn't always imply "not impacted".

Ryzen L3 runs at memory clock? Oh, I didn't know, even though I have one. It reminds me of the old Pentiums with motherboard L2 clocked at bus speed. I've edited my post, thanks :)
Hmmm, what is the situation with the L3 cache on the FX-83xx?

I have a Bulldozer (an Excavator, to be precise) in my test rig, but it's broken for now. I'll do complete tests once possible.
sr. member
Activity: 1484
Merit: 253
Ryzen L3 runs at memory clock? Oh, I didn't know, even though I have one. It reminds me of the old Pentiums with motherboard L2 clocked at bus speed. I've edited my post, thanks :)
Hmmm, what is the situation with the L3 cache on the FX-83xx?

http://www.cpu-world.com/CPUs/Bulldozer/AMD-FX-Series%20FX-8350.html
No info about cache speed.

FX-83xx processors have 8 MB of L3 cache shared between all cores and 2 MB of L2 cache per pair of cores.
Questions:
1. Does the miner use both the L2 and L3 caches?
2. If the miner uses the L2 cache and thread-to-core affinity is not set, does this mean the L2 cache can end up split, with 1 MB going to a mining thread and 1 MB to a non-mining thread?

Or am I misunderstanding something?
newbie
Activity: 70
Merit: 0
Ryzen L3 runs at memory clock? Oh, I didn't know, even though I have one. It reminds me of the old Pentiums with motherboard L2 clocked at bus speed. I've edited my post, thanks :)
Hmmm, what is the situation with the L3 cache on the FX-83xx?

http://www.cpu-world.com/CPUs/Bulldozer/AMD-FX-Series%20FX-8350.html
sr. member
Activity: 1484
Merit: 253
Ryzen L3 runs at memory clock? Oh, I didn't know, even though I have one. It reminds me of the old Pentiums with motherboard L2 clocked at bus speed. I've edited my post, thanks :)
Hmmm, what is the situation with the L3 cache on the FX-83xx?
member
Activity: 350
Merit: 22
Ryzen L3 runs at memory clock? Oh, I didn't know, even though I have one. It reminds me of the old Pentiums with motherboard L2 clocked at bus speed. I've edited my post, thanks :)
jr. member
Activity: 196
Merit: 1
Hi!
Nice, I have users from all over the world ;) Bitcointalk magic :)

I advise using the -q parameter: it quits at the first network problem. With a .bat you can then loop forever:

Code:
:MineXmr
jce_cn_cpu_miner64 -q --low ......
goto :MineXmr

This way the miner restarts at each connection problem; you'll lose just a few seconds each time, probably negligible.
And the default wait between two network attempts is 5 s, so restarting the miner or waiting 5 s is about the same delay.

I'm doing a huge refactoring of my assembly to factor out more shared code, so I can provide multi-hash from triple- to hexa- at the same time. The current version provides only simple and double.
I'm from Brazil's newest province. Yeah, Bitcointalk magic :)

Just for feedback:
A Ryzen 7 1700 at 3.6 GHz with 3200 MHz memory (16-16-16-36 1T) can reach 2400 H/s mining CN-Lite, but I'm having trouble keeping it stable there. With memory at 2933 MHz I reach around 2200 H/s. So the bottleneck is clearly the memory.

Is there a specific timing that improves CN mining? I know the L3 cache runs at memory frequency and benefits from the main latencies. For GPUs, I know it likes a lowered FAW.

Unless you go over your cache limits, memory shouldn't have a big impact. It's better to play with Performance or similar modes in the BIOS, which lower cache timings and give a huge Cryptonight speed boost.

Thanks for the report, Baz. I'm working on multi-hash above 2, which is good for you CN-lite and IPBC miners :)

To the guy who quoted the full doc: it makes the topic harder to read :(
Ryzen's L3 cache runs at memory clock. For Cryptonight V7 there is a huge gain from 2133 to 2800 MHz, and smaller gains above 2800. For Lite, as far as I can see, there is more room for improvement from clocking the memory (and, of course, the L3 cache) higher: about 10% from 2933 to 3200.
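
As a quick check of that "10%" figure, here is the arithmetic on the numbers reported in this thread (2200 H/s at 2933 MHz, 2400 H/s at 3200 MHz). It is only a two-point comparison from one machine, so treat it as an illustration and not a measurement of how CN-Lite scales in general.

```python
def pct_gain(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


# Figures quoted in the thread for a Ryzen 7 1700 on CN-Lite.
mem_gain = pct_gain(2933, 3200)   # memory clock, MHz
hash_gain = pct_gain(2200, 2400)  # hashrate, H/s

print(f"memory clock: +{mem_gain:.1f}%")  # +9.1%
print(f"hashrate:     +{hash_gain:.1f}%")  # +9.1%
```

Both gains come out at about 9.1%, so on these two data points the CN-Lite hashrate tracks the memory clock almost one to one.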
member
Activity: 350
Merit: 22
Hi!
Nice, I have users from all over the world ;) Bitcointalk magic :)

I advise using the -q parameter: it quits at the first network problem. With a .bat you can then loop forever:

Code:
:MineXmr
jce_cn_cpu_miner64 -q --low ......
goto :MineXmr

This way the miner restarts at each connection problem; you'll lose just a few seconds each time, probably negligible.
And the default wait between two network attempts is 5 s, so restarting the miner or waiting 5 s is about the same delay.

I'm doing a huge refactoring of my assembly to factor out more shared code, so I can provide multi-hash from triple- to hexa- at the same time. The current version provides only simple and double.
I'm from Brazil's newest province. Yeah, Bitcointalk magic :)

Just for feedback:
A Ryzen 7 1700 at 3.6 GHz with 3200 MHz memory (16-16-16-36 1T) can reach 2400 H/s mining CN-Lite, but I'm having trouble keeping it stable there. With memory at 2933 MHz I reach around 2200 H/s. So the bottleneck is clearly the memory.

Is there a specific timing that improves CN mining? I know the L3 cache runs at memory frequency and benefits from the main latencies. For GPUs, I know it likes a lowered FAW.

Unless you go over your cache limits, memory shouldn't have a big impact. It's better to play with Performance or similar modes in the BIOS, which lower cache timings and give a huge Cryptonight speed boost.

Thanks for the report, Baz. I'm working on multi-hash above 2, which is good for you CN-lite and IPBC miners :)

To the guy who quoted the full doc: it makes the topic harder to read :(
jr. member
Activity: 75
Merit: 1
Nice results with IPBC on Xeon cpus.

E5 2630L @2000MHz;

1100h/s using 14 of the 16 cores

E5 5640 @3333MHz;

700h/s using 10 of the 12 cores.

These are the processors in my two mining rigs, running 4x RX Vegas and 3x GTX1070 respectively.
Each rig needs two cores to run the gpu miners.

Baz

newbie
Activity: 2
Merit: 0
jr. member
Activity: 196
Merit: 1
Hi!
Nice, I have users from all over the world ;) Bitcointalk magic :)

I advise using the -q parameter: it quits at the first network problem. With a .bat you can then loop forever:

Code:
:MineXmr
jce_cn_cpu_miner64 -q --low ......
goto :MineXmr

This way the miner restarts at each connection problem; you'll lose just a few seconds each time, probably negligible.
And the default wait between two network attempts is 5 s, so restarting the miner or waiting 5 s is about the same delay.

I'm doing a huge refactoring of my assembly to factor out more shared code, so I can provide multi-hash from triple- to hexa- at the same time. The current version provides only simple and double.
I'm from Brazil's newest province. Yeah, Bitcointalk magic :)

Just for feedback:
A Ryzen 7 1700 at 3.6 GHz with 3200 MHz memory (16-16-16-36 1T) can reach 2400 H/s mining CN-Lite, but I'm having trouble keeping it stable there. With memory at 2933 MHz I reach around 2200 H/s. So the bottleneck is clearly the memory.

Is there a specific timing that improves CN mining? I know the L3 cache runs at memory frequency and benefits from the main latencies. For GPUs, I know it likes a lowered FAW.
member
Activity: 350
Merit: 22
Hi!
Nice, I have users from all over the world ;) Bitcointalk magic :)

I advise using the -q parameter: it quits at the first network problem. With a .bat you can then loop forever:

Code:
:MineXmr
jce_cn_cpu_miner64 -q --low ......
goto :MineXmr

This way the miner restarts at each connection problem; you'll lose just a few seconds each time, probably negligible.
And the default wait between two network attempts is 5 s, so restarting the miner or waiting 5 s is about the same delay.
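
For anyone not on Windows, the same restart-on-exit idea can be sketched in Python. This is an illustration of the .bat loop above and not something shipped with JCE: `run_forever`, `runner`, `pause_s` and `max_restarts` are hypothetical names, and the only miner flags assumed are the `-q` and `--low` shown in the post. The command itself is left to the caller, since the remaining arguments are elided in the original.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def run_forever(cmd: Sequence[str],
                runner: Callable[[Sequence[str]], int] = subprocess.call,
                pause_s: float = 0.0,
                max_restarts: Optional[int] = None) -> int:
    """Re-launch cmd every time it exits; return the number of launches.

    With max_restarts=None this loops forever, like the .bat 'goto'.
    The runner parameter exists only so the loop can be tried without
    starting a real miner.
    """
    launches = 0
    while max_restarts is None or launches < max_restarts:
        runner(cmd)          # blocks until the process exits (e.g. via -q)
        launches += 1
        if pause_s:
            time.sleep(pause_s)
    return launches


# Hypothetical usage, with your own pool arguments appended to the list:
# run_forever(["jce_cn_cpu_miner64", "-q", "--low"])
```

No pause is added by default because, as said above, the restart itself costs about as much as the miner's own 5 s retry delay.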

I'm doing a huge refactoring of my assembly to factor out more shared code, so I can provide multi-hash from triple- to hexa- at the same time. The current version provides only simple and double.