[XMR] JCE Miner Cryptonight/forks, now with GPU! - page 106.

cryptoinvestor_x

newbie

Activity: 71

Merit: 0

Quote from: JCE-Miner on May 01, 2018, 08:16:42 AM

Even in assembly, we have very little control over the cache level. Just a few instructions: movntdqa and prefetchnta (nt == non-temporal == no cache). JCE mostly lets the CPU handle everything, except in "use_cache":false mode.

Progressing on assembly for multi-hash on AES-64 (the easiest to write).
I reach 79 h/s with hexa-hash on CN-v7, which is very bad, but with IPBC, where the JCE multi-hash does marvels, I get some interesting results. More tests and optimizations to do.

Surprisingly, I didn't run out of registers in x64 with 6 hashes at the same time. But on 32 bits it will be terrible. I'll provide 32-bit hexa-hash for code symmetry, but it's probably useless.

Edit: I reach 1700 h/s on IPBC on my stock Ryzen 1600 with JCE 0.26 and its multi-hash, with multi config
3+1+1+1+1+1+3+1+1+1+1+1
so 12 threads: two triple and ten simple. Curiously, double is less good...
That's the fastest combination I found. On CN-v7 I still cannot beat the default 8x simple.

Any recommendations for IPBC settings on a Ryzen 1700? I can't seem to top 1800 H/s.

Trying 12 threads: two double, ten simple.
Also tried 14 threads: two double, 12 simple.

aGeoM

newbie

Activity: 43

Merit: 0

I am on CryptonightV7

Iamtutut

full member

Activity: 1120

Merit: 131

Quote from: aGeoM on May 02, 2018, 01:44:48 PM

Hi

Having problems with an FX8300 JCE 0.25:

Code:

exact same issue with GRAFT, switch back to XTL.

aGeoM

newbie

Activity: 43

Merit: 0

Hi

Having problems with an FX8300 JCE 0.25:

Code:

JCE-Miner

member

Activity: 350

Merit: 22

Sorry bro, but that's on 0.26, which is not released yet.
I over-pumped my assembly and now reach 1704 on IPBC and 1733 on Turtle (Cryptonight-Light v7).

64 bits is almost done; 32 bits is still to be written.

Larvitar

jr. member

Activity: 196

Merit: 1

Quote from: JCE-Miner on May 01, 2018, 08:16:42 AM

Even in assembly, we have very little control over the cache level. Just a few instructions: movntdqa and prefetchnta (nt == non-temporal == no cache). JCE mostly lets the CPU handle everything, except in "use_cache":false mode.

Progressing on assembly for multi-hash on AES-64 (the easiest to write).
I reach 79 h/s with hexa-hash on CN-v7, which is very bad, but with IPBC, where the JCE multi-hash does marvels, I get some interesting results. More tests and optimizations to do.

Surprisingly, I didn't run out of registers in x64 with 6 hashes at the same time. But on 32 bits it will be terrible. I'll provide 32-bit hexa-hash for code symmetry, but it's probably useless.

Edit: I reach 1700 h/s on IPBC on my stock Ryzen 1600 with JCE 0.26 and its multi-hash, with multi config
3+1+1+1+1+1+3+1+1+1+1+1
so 12 threads: two triple and ten simple. Curiously, double is less good...
That's the fastest combination I found. On CN-v7 I still cannot beat the default 8x simple.

Damn! Awesome for a stock hexa-core Ryzen!

How do I set this up? I can do some tests on a Ryzen 7 to see the scaling with 8 cores (but the same cache).

Bazzaar

jr. member

Activity: 75

Merit: 1

The CN-lite variants are producing surprising results.
You get an n² hashrate. Half the cache memory used, four times the hashrate.

Even odder, on my 2630L I got the best results using 2+1+2+1+2+1+2+1+2+1+2+1,
and on my 5640 the best with just one thread per core.

Also with the 5640, I couldn't run 12 threads (12 MB cache); I ran into cache flooding with more than 10 threads, with no other apps running.

Baz

JCE-Miner

member

Activity: 350

Merit: 22

Even in assembly, we have very little control over the cache level. Just a few instructions: movntdqa and prefetchnta (nt == non-temporal == no cache). JCE mostly lets the CPU handle everything, except in "use_cache":false mode.

Progressing on assembly for multi-hash on AES-64 (the easiest to write).
I reach 79 h/s with hexa-hash on CN-v7, which is very bad, but with IPBC, where the JCE multi-hash does marvels, I get some interesting results. More tests and optimizations to do.

Surprisingly, I didn't run out of registers in x64 with 6 hashes at the same time. But on 32 bits it will be terrible. I'll provide 32-bit hexa-hash for code symmetry, but it's probably useless.

Edit: I reach 1700 h/s on IPBC on my stock Ryzen 1600 with JCE 0.26 and its multi-hash, with multi config
3+1+1+1+1+1+3+1+1+1+1+1
so 12 threads: two triple and ten simple. Curiously, double is less good...
That's the fastest combination I found. On CN-v7 I still cannot beat the default 8x simple.
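
To make the thread and hash counts in that layout easier to check, here is a minimal Python sketch. It treats the "+" string as the shorthand used in this post, not as actual JCE config syntax, and it assumes the usual scratchpad sizes of 2 MiB per hash for CN-v7 and 1 MiB per hash for the Lite family, with IPBC counted as Lite. The function and dictionary names are made up for the example.

```python
# Scratchpad size per hash in MiB: 2 for CryptoNight v7, 1 for the Lite family.
# The dictionary keys are illustrative labels, not JCE option names.
SCRATCHPAD_MIB = {"cn-v7": 2, "cn-lite": 1}


def summarize_multihash(layout: str, algo: str) -> dict:
    """Summarize a '+'-separated multi-hash layout such as '3+1+1'."""
    hashes_per_thread = [int(part) for part in layout.split("+")]
    if any(n < 1 for n in hashes_per_thread):
        raise ValueError("each thread must compute at least one hash")
    total_hashes = sum(hashes_per_thread)
    return {
        "threads": len(hashes_per_thread),
        "hashes": total_hashes,
        "cache_mib": total_hashes * SCRATCHPAD_MIB[algo],
    }


# The IPBC layout from the post, assumed to use the 1 MiB Lite scratchpad.
ipbc = summarize_multihash("3+1+1+1+1+1+3+1+1+1+1+1", "cn-lite")
# The "default 8x simple" layout on CN-v7: eight threads, one hash each.
cn_v7 = summarize_multihash("+".join(["1"] * 8), "cn-v7")
print(ipbc)
print(cn_v7)
```

Under those assumptions both layouts come to 16 MiB of scratchpad, the same as the 16 MB of L3 on a Ryzen 5 1600, which may be why they are the fastest ones found.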

UnclWish

sr. member

Activity: 1484

Merit: 253

Quote from: JCE-Miner on May 01, 2018, 02:08:21 AM

If you leave "use_cache" at its default of true, all levels of cache are used; otherwise, none. As said in the doc, "not used" doesn't always imply "not impacted".

Quote from: UnclWish on April 30, 2018, 02:04:08 PM

Quote from: JCE-Miner on April 30, 2018, 12:33:24 PM

Ryzen L3 runs at mem clock? Oh, I didn't know, even though I have one. It reminds me of the old Pentiums with motherboard L2 clocked at bus speed. I'll edit my post, thanks.

Hmmm, what is the situation with the L3 cache on FX83xx?

I have a Bulldozer (an Excavator, to be precise) in my rig for testing, but it's broken for now. I'll do complete tests once possible.

I'm asking because, imho, the L2 cache isn't used by the miner... Maybe I'm wrong.
But beyond 4 threads (4x2=8 MB) the total speed stops growing as I add threads; each thread just gets slower, so the total stays nearly the same...
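
The 4x2=8 arithmetic above can be written out as a small sketch. It assumes one 2 MiB scratchpad per CN-v7 thread and counts only the 8 MB of L3 quoted for the FX83xx in this thread; L2 is deliberately left out, since whether the miner benefits from it is exactly the open question. The function name is hypothetical.

```python
def cache_limited_threads(cache_mib: int, scratchpad_mib: int = 2) -> int:
    """Number of single-hash threads whose scratchpads fit in the cache."""
    if cache_mib < 0 or scratchpad_mib <= 0:
        raise ValueError("cache size must be >= 0 and scratchpad size > 0")
    return cache_mib // scratchpad_mib


fx83xx_l3_mib = 8  # L3 size quoted for the FX83xx in this thread
fx_threads = cache_limited_threads(fx83xx_l3_mib)
print(fx_threads)  # 8 // 2 = 4 threads before the L3 is full
```

Past that point, extra threads would have to share the same cache, which would fit the flat total speed described above.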

JCE-Miner

member

Activity: 350

Merit: 22

If you leave "use_cache" at its default of true, all levels of cache are used; otherwise, none. As said in the doc, "not used" doesn't always imply "not impacted".

Quote from: UnclWish on April 30, 2018, 02:04:08 PM

Quote from: JCE-Miner on April 30, 2018, 12:33:24 PM

Ryzen L3 runs at mem clock? Oh, I didn't know, even though I have one. It reminds me of the old Pentiums with motherboard L2 clocked at bus speed. I'll edit my post, thanks.

Hmmm, what is the situation with the L3 cache on FX83xx?

I have a Bulldozer (an Excavator, to be precise) in my rig for testing, but it's broken for now. I'll do complete tests once possible.

UnclWish

sr. member

Activity: 1484

Merit: 253

Quote from: s0ftcorn on April 30, 2018, 04:50:22 PM

Quote from: UnclWish on April 30, 2018, 02:04:08 PM

Quote from: JCE-Miner on April 30, 2018, 12:33:24 PM

Ryzen L3 runs at mem clock? Oh, I didn't know, even though I have one. It reminds me of the old Pentiums with motherboard L2 clocked at bus speed. I'll edit my post, thanks.

Hmmm, what is the situation with the L3 cache on FX83xx?

http://www.cpu-world.com/CPUs/Bulldozer/AMD-FX-Series%20FX-8350.html

No info about cache speed.

FX83xx processors have 8 MB of L3 cache shared between all cores, and 2 MB of L2 cache per 2 cores.
Questions:
1. Does the miner use both L2 and L3 caches?
2. If the miner uses L2 cache and thread-to-core affinity is not set, does this mean the L2 cache can end up split, with 1 MB for a mining thread and 1 MB for a non-mining thread?

Or am I misunderstanding something?

s0ftcorn

newbie

Activity: 70

Merit: 0

Quote from: UnclWish on April 30, 2018, 02:04:08 PM

Quote from: JCE-Miner on April 30, 2018, 12:33:24 PM

Ryzen L3 runs at mem clock? Oh, I didn't know, even though I have one. It reminds me of the old Pentiums with motherboard L2 clocked at bus speed. I'll edit my post, thanks.

Hmmm, what is the situation with the L3 cache on FX83xx?

http://www.cpu-world.com/CPUs/Bulldozer/AMD-FX-Series%20FX-8350.html

UnclWish

sr. member

Activity: 1484

Merit: 253

Quote from: JCE-Miner on April 30, 2018, 12:33:24 PM

Ryzen L3 runs at mem clock? Oh, I didn't know, even though I have one. It reminds me of the old Pentiums with motherboard L2 clocked at bus speed. I'll edit my post, thanks.

Hmmm, what is the situation with the L3 cache on FX83xx?

JCE-Miner

member

Activity: 350

Merit: 22

Ryzen L3 runs at mem clock? Oh, I didn't know, even though I have one. It reminds me of the old Pentiums with motherboard L2 clocked at bus speed. I'll edit my post, thanks.

Larvitar

jr. member

Activity: 196

Merit: 1

Quote from: JCE-Miner on April 30, 2018, 12:01:18 PM

Quote from: Larvitar on April 30, 2018, 09:14:58 AM

Quote from: JCE-Miner on April 29, 2018, 12:02:01 PM

Hi!
Nice, I have users from all over the world ;)

Bitcointalk magic

I advise using the -q parameter: it quits at the first network problem. And with a .bat you can loop forever:

Code:

:MineXmr
jce_cn_cpu_miner64 -q --low ......
goto :MineXmr

This way the miner restarts at each connection problem, and you'll lose just a few seconds each time, probably negligible.
And the default wait between two network attempts is 5s, so restarting the miner or waiting for 5s is just the same delay.

I'm doing a huge refactoring of my assembly to factor out more shared code and to provide multi-hash from triple- to hexa- at the same time. The current version provides only simple and double.

I'm from Brazil's newest province. Yeah, Bitcointalk magic

Just for feedback:
A Ryzen 7 1700 at 3.6GHz with 3200MHz memory (16-16-16-36 1T) can reach 2400H/s mining CN-Lite. I'm having trouble stabilizing it there. With memory at 2933 I can reach around 2200H/s. So the bottleneck is clearly the memory.

Is there a specific timing that improves CN mining? I know the L3 cache runs at memory frequency and takes advantage of the main memory latencies. For GPUs, I know mining likes a lowered FAW.

Unless you go over your cache limits, memory shouldn't have a big impact. It's better to play with Performance or similar modes in the BIOS, which lower cache timings and give a huge Cryptonight speed boost.

Thanks for the report, Baz. I'm working on multi-hash above 2, which is good for you CN-Light and IPBC miners.

To the guy who quoted the full doc: it makes the topic harder to read :(

Ryzen's L3 cache runs at memory clock. For Cryptonight V7 there is a huge gain from 2133 to 2800, and smaller gains above 2800. For Lite, as far as I can see, there is more room for improvement from clocking the memory (and, of course, the L3 cache) higher: 10% from 2933 to 3200.

JCE-Miner

member

Activity: 350

Merit: 22

Quote from: Larvitar on April 30, 2018, 09:14:58 AM

Quote from: JCE-Miner on April 29, 2018, 12:02:01 PM

Hi!
Nice, I have users from all over the world ;)

Bitcointalk magic

I advise using the -q parameter: it quits at the first network problem. And with a .bat you can loop forever:

Code:

:MineXmr
jce_cn_cpu_miner64 -q --low ......
goto :MineXmr

This way the miner restarts at each connection problem, and you'll lose just a few seconds each time, probably negligible.
And the default wait between two network attempts is 5s, so restarting the miner or waiting for 5s is just the same delay.

I'm doing a huge refactoring of my assembly to factor out more shared code and to provide multi-hash from triple- to hexa- at the same time. The current version provides only simple and double.

I'm from Brazil's newest province. Yeah, Bitcointalk magic

Just for feedback:
A Ryzen 7 1700 at 3.6GHz with 3200MHz memory (16-16-16-36 1T) can reach 2400H/s mining CN-Lite. I'm having trouble stabilizing it there. With memory at 2933 I can reach around 2200H/s. So the bottleneck is clearly the memory.

Is there a specific timing that improves CN mining? I know the L3 cache runs at memory frequency and takes advantage of the main memory latencies. For GPUs, I know mining likes a lowered FAW.

Unless you go over your cache limits, ~~memory shouldn't have a big impact~~. It's better to play with Performance or similar modes in the BIOS, which lower cache timings and give a huge Cryptonight speed boost.

Thanks for the report, Baz. I'm working on multi-hash above 2, which is good for you CN-Light and IPBC miners.

To the guy who quoted the full doc: it makes the topic harder to read :(

Bazzaar

jr. member

Activity: 75

Merit: 1

Nice results with IPBC on Xeon cpus.

E5 2630L @2000MHz;

1100h/s using 14 of the 16 cores

E5 5640 @3333MHz;

700h/s using 10 of the 12 cores.

These are the processors in my two mining rigs, running 4x RX Vegas and 3x GTX1070 respectively.
Each rig needs two cores to run the gpu miners.

Baz

Titipong007

newbie

Activity: 2

Merit: 0

Larvitar

jr. member

Activity: 196

Merit: 1

Quote from: JCE-Miner on April 29, 2018, 12:02:01 PM

Hi!
Nice, I have users from all over the world ;)

Bitcointalk magic

I advise using the -q parameter: it quits at the first network problem. And with a .bat you can loop forever:

Code:

:MineXmr
jce_cn_cpu_miner64 -q --low ......
goto :MineXmr

This way the miner restarts at each connection problem, and you'll lose just a few seconds each time, probably negligible.
And the default wait between two network attempts is 5s, so restarting the miner or waiting for 5s is just the same delay.

I'm doing a huge refactoring of my assembly to factor out more shared code and to provide multi-hash from triple- to hexa- at the same time. The current version provides only simple and double.

I'm from Brazil's newest province. Yeah, Bitcointalk magic

Just for feedback:
A Ryzen 7 1700 at 3.6GHz with 3200MHz memory (16-16-16-36 1T) can reach 2400H/s mining CN-Lite. I'm having trouble stabilizing it there. With memory at 2933 I can reach around 2200H/s. So the bottleneck is clearly the memory.

Is there a specific timing that improves CN mining? I know the L3 cache runs at memory frequency and takes advantage of the main memory latencies. For GPUs, I know mining likes a lowered FAW.
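
As a quick check of the feedback numbers above, this sketch compares the relative gain in memory clock with the relative gain in hashrate. The four figures are the ones from the post; the function name is just for illustration.

```python
def relative_gain(before: float, after: float) -> float:
    """Return the fractional increase from `before` to `after`."""
    return after / before - 1.0


mem_gain = relative_gain(2933, 3200)   # memory clock, MHz
hash_gain = relative_gain(2200, 2400)  # CN-Lite hashrate, H/s
print(f"memory clock: +{mem_gain:.1%}")  # +9.1%
print(f"hashrate:     +{hash_gain:.1%}")  # +9.1%
```

Both come out at about 9%, so in this one pair of measurements the hashrate followed the memory clock almost one to one. That fits the memory-bottleneck reading, though two data points are not much to go on.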

JCE-Miner

member

Activity: 350

Merit: 22

Hi!
Nice, I have users from all over the world ;)

Bitcointalk magic

I advise using the -q parameter: it quits at the first network problem. And with a .bat you can loop forever:

Code:

:MineXmr
jce_cn_cpu_miner64 -q --low ......
goto :MineXmr

This way the miner restarts at each connection problem, and you'll lose just a few seconds each time, probably negligible.
And the default wait between two network attempts is 5s, so restarting the miner or waiting for 5s is just the same delay.

I'm doing a huge refactoring of my assembly to factor out more shared code and to provide multi-hash from triple- to hexa- at the same time. The current version provides only simple and double.
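
For anyone not running a .bat, the restart loop above can be sketched in Python. This is only a rough equivalent: `run_with_restarts` and its parameters are names made up for this sketch, and the miner's remaining arguments are elided in the post, so they are left out here too.

```python
import subprocess
import time


def run_with_restarts(cmd, max_launches=None, pause_s=0.0):
    """Run `cmd`, relaunching it each time it exits.

    With max_launches=None this loops forever, like the batch `goto`.
    Returns the number of launches once max_launches is reached.
    """
    launches = 0
    while max_launches is None or launches < max_launches:
        # Blocks until the process exits; with -q the miner exits on the
        # first network problem, which is what triggers the relaunch.
        subprocess.run(cmd, check=False)
        launches += 1
        time.sleep(pause_s)
    return launches


# Example (not executed here); add your own pool and wallet arguments:
# run_with_restarts(["jce_cn_cpu_miner64", "-q", "--low"])
```

The `max_launches` cap is there only so the loop can be tried out safely; leaving it at None gives the same endless loop as the batch file.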
