Topic: [XMR] JCE Miner Cryptonight/forks, now with GPU! - page 96. (Read 90814 times)

full member
Activity: 1120
Merit: 131
What would the 0.29C update be ?

For the CPU / GPU miner, will it be possible to split CPU and GPU mining ? I don't mine CN heavy algos at all with the CPU (4MB cache), but I sometimes mine CN heavy with the GPUs.
 So if possible I'd like to mine CN V7 / lite with the CPU separated from any other CN algo with the GPUs.
member
Activity: 350
Merit: 22
thanks for the test, it seems my autoconfig is bad on the A10, it allocates too many threads. I'll fix it, thanks.

The multihash (double hash is the 2- case, you can set from 1- to 6-) is also called low-power in stak IIRC. It's about using, on one CPU core, twice the registers and twice the cache to sometimes get twice the speed. The trick is that it leaves the other cores free, so it consumes less power and lets Turbo kick in, on CPUs that have Turbo.

Technically, it's good when you want to save power, sure, but also when you run out of cores but not of cache. If you have a CPU with 2 cores but 8M cache, the normal config would use only 2x2M = 4M of cache.

You may enable double-hash to use 2x2x2M = 8M of cache, and get some extra perf.
It works more or less depending on the CPU. It's very efficient on Ryzen, and not at all on Core2.
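
The cache arithmetic above can be sketched in a few lines of Python. This is only an illustration: it assumes a 2 MB scratchpad per hash (the figure used in the post), and the function and parameter names are made up here, not JCE options.

```python
SCRATCHPAD_MB = 2  # assumed scratchpad size per hash, as in the post


def cache_used_mb(threads, hashes_per_thread=1, scratchpad_mb=SCRATCHPAD_MB):
    """Cache needed when each thread works on `hashes_per_thread` hashes at once."""
    return threads * hashes_per_thread * scratchpad_mb


# 2 cores, one hash per thread: only half of an 8 MB cache is used.
print(cache_used_mb(threads=2))                       # 4
# Same 2 cores with double hash: the whole 8 MB cache is used.
print(cache_used_mb(threads=2, hashes_per_thread=2))  # 8
```

The same function shows why only two single-hash threads fit on a CPU with 4 MB of cache.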

I looked closer at the A10, and yeah that's a little APU with little cache.
Here is an experimental config that might get you some extra perf, but I'm not sure: I cannot test it, I have no A10.

"cpu_threads_conf" :
[
     { "cpu_architecture" : "auto", "affine_to_cpu" : 0, "use_cache" : true },
     { "cpu_architecture" : "auto", "affine_to_cpu" : 1, "use_cache" : false },
     { "cpu_architecture" : "auto", "affine_to_cpu" : 2, "use_cache" : true },
     { "cpu_architecture" : "auto", "affine_to_cpu" : 3, "use_cache" : false },
]

now finishing 0.29c, the last CPU-only version, with some updates.
jr. member
Activity: 196
Merit: 1
no problem, I'll ignore the parameter --api-remote. Do you need me to ignore -b too, with a dotted IP after it?
I think I'll release a last CPU version using the GPU version, but with GPU disabled. Then an alpha with GPU+CPU.
Yup! :D
member
Activity: 762
Merit: 35
Nope, JCE is not 50% slower than stak. Maybe not that much faster, but not 50% slower.
And that score is very low for an A10, I can get 115 with a Core2 Quad. I guess you use one thread only, and stak enabled double-hash. It can be enabled on JCE with manual config too.

At least give the exact Stak configuration you used, and i can tell you how to do the same on JCE.

I noticed that XMR-Stak uses only two cores
(so it can supply them with 2MB cache each = 4MB total for that CPU).
Then I disabled two of them in JCE and it jumped to 130 h/s.

How to enable double hash? What does it do?
member
Activity: 350
Merit: 22
Nope, JCE is not 50% slower than stak. Maybe not that much faster, but not 50% slower.
And that score is very low for an A10, I can get 115 with a Core2 Quad. I guess you use one thread only, and stak enabled double-hash. It can be enabled on JCE with manual config too.

At least give the exact Stak configuration you used, and i can tell you how to do the same on JCE.
member
Activity: 762
Merit: 35
XMR-Stak = 130 h/s
JCE = 74 h/s

huge pages on, avx-aes

on AMD A10-5800K

same pool, same time of day, waited at least 15 min
to confirm hashrate.
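
For reference, the gap between the two figures reported here can be worked out with a small Python sketch. The helper name is hypothetical and the numbers are simply the ones quoted in this post.

```python
def percent_slower(slow_hs, fast_hs):
    """How much slower `slow_hs` is than `fast_hs`, in percent."""
    return (1 - slow_hs / fast_hs) * 100


# 74 h/s against 130 h/s on the A10-5800K
print(round(percent_slower(74, 130)))  # 43
```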
member
Activity: 350
Merit: 22
no problem, I'll ignore the parameter --api-remote. Do you need me to ignore -b too, with a dotted IP after it?
I think I'll release a last CPU version using the GPU version, but with GPU disabled. Then an alpha with GPU+CPU.
jr. member
Activity: 196
Merit: 1
You are making a lot of progress with GPU. Wow, man.

Sorry if it's boring, but I have to discuss Awesome Miner again.

Well, I tried today to run JCE "inside" Awesome Miner. As you said some pages ago, JCE Miner has some similarities with the CPUminer-opt API, so I tried to disguise JCE as CPUminer-opt. No success there.

But reading the log, I found the solution (or at least a piece of it): the problem is the option "--api-remote -b 0.0.0.0:4034". I can set --mport easily, but Awesome Miner doesn't allow turning off the --api-remote option.

To solve this, just ignore the --api-remote option or accept it as --mport (with IP:port form). Could you do this in the next release?

Thanks a lot man! Your miner is I M P R E S S I V E!

member
Activity: 350
Merit: 22
I have a Pitcairn card, a 270X with 4GB vmem. Best speed on cnv7 is on Claymore 11.3 - it's about 500-510 h/s. Settings are -h 460 -dmem 1
On SRB cnv7 about 450-475 h/s with intensity 58-59, worksize 8, threads 2.

Heavy algo on SRB - about 400 h/s with intensity 26, worksize 8, threads 2.

I'm done with the optims, now focusing on the exotic variations (far easier to add than in the CPU version because GPUs have tons of registers).
I did the test you advised on my 7870 overpumped with custom JCE timings, which is the closest thing I have to your 270X.
I could reach 504 with SRB, the best, with 2 threads, intensity 30 (that's your advice of 59 divided by two since a 7870 has 2G ram). At 31, SRB hangs.

On JCE with params 480-8-128 (that's the equivalent JCE config, i'll document it later when i release) I get 510.

On my Bonaire, I reach the exact same 268 as Claymore, it looks like the hardware max.
On my dual 7950 I go up to 1072 with JCE, i still need to compare with SRB. Claymore 9.7 climbed up to 1174, i just cannot reach it.

Next steps : bench the Tahiti against SRB, add the Heavy & al., and i can release an alpha.
edit: i can reach 1040 on SRB with intensity 32 on my 7950s (all other values give lower hashrate or hang). JCE goes up to 1072. All on CN-v7
member
Activity: 350
Merit: 22
Woow, my topic upped by zawawa, thanks bro :)

I didn't look at that precise line, as I said I don't use it, I use the 32-bit version of the JCE CPU algo here instead (since GCN and even nVidia are 32-bit scalar) but it's very possible it's optimized away. I meant that line was funny-stupid, not slow, and the funniest part is the comment about multiples of 4 and of 8.

I'm almost done with my GPU version, or at least about to give up on optims. No way to go above 268 on my Bonaire, it sounds like it's the hardware max that Claymore 9.7 hit too. I spent days fixing my stability problem on RX, and it's stable now, I had to rewrite one little optim.
sr. member
Activity: 728
Merit: 304
Miner Developer
@zawawa : first, of the three miners I cited, yours is the least copy-pasty. You provide some original features like the Phymem, and you keep it open-source. I didn't know you gave back some bucks to Wolf0, and that's very fair. However the CN OpenCL part is still an exact copy of the kernels one can find in other miners, including the stupid parts**
The two others are absolute clones, except the UI and the netcode.
Claymore, SRB and XMRig are really original. But closed source. Mine will be too.

@vasilurda
The --low parameter avoids using the CPU beyond its idle time. Otherwise, if JCE mines with, let's say, N threads, add parameter -t K where K=N-1 or N-2 to use less CPU. There's no percentage config, but -t does the same.

@MPNT
Thanks but raw performance is of little help with no comparison and no config provided.

My current status is 268 h/s max and stable on my Bonaire, exact same as Claymore, but with twice the memory usage.
I reach 1070 stable on my dual 7950 where claymore 9.7 gave 1160 and 11.3 gave 1147, but 11.3 makes 10% bad shares. I'll try SRB on it.
On my rx560 I'm at 528, against 510 for Stak and Claymore, but unstable :(


**
Quote
// For 256-bit keys, an sbox permutation is done every other 4th uint generated, AND every 8th
uint t = ((!(c & 7)) || ((c & 7) == 4)) ? SubWord(keybuf[c - 1]) : keybuf[c - 1];
my favorite stupid part of Wolf0 code. Looks like nobody noticed a multiple of 8 is always a multiple of 4.

Quote
// For 256-bit keys, an sbox permutation is done every other 4th uint generated
uint t = ((c & 7) == 4) ? SubWord(keybuf[c - 1]) : keybuf[c - 1];
Now fixed. But I won't use it, i use a completely different Keygen code, inspired from my CPU version.

Well, I am pretty sure that the redundant conditional expression is optimized away by clang/LLVM.
Did you take a look at the resulting GCN assembly code?
full member
Activity: 1120
Merit: 131
Which coin is best to mine over cpu these days?

Try different algos, see your average hashrate and use this kind of website to estimate your profitability: https://www.cryptunit.com/
member
Activity: 350
Merit: 22
I'd say TurtleCoin or IPBC, or any Cryptolight-style coin. CPUs are beasts on them.
And CN-Heavy on GPU.
Monero is intermediate.
member
Activity: 350
Merit: 22
@zawawa : first, of the three miners I cited, yours is the least copy-pasty. You provide some original features like the Phymem, and you keep it open-source. I didn't know you gave back some bucks to Wolf0, and that's very fair. However the CN OpenCL part is still an exact copy of the kernels one can find in other miners, including the stupid parts**
The two others are absolute clones, except the UI and the netcode.
Claymore, SRB and XMRig are really original. But closed source. Mine will be too.

@vasilurda
The --low parameter avoids using the CPU beyond its idle time. Otherwise, if JCE mines with, let's say, N threads, add parameter -t K where K=N-1 or N-2 to use less CPU. There's no percentage config, but -t does the same.

@MPNT
Thanks but raw performance is of little help with no comparison and no config provided.

My current status is 268 h/s max and stable on my Bonaire, exact same as Claymore, but with twice the memory usage.
I reach 1070 stable on my dual 7950 where claymore 9.7 gave 1160 and 11.3 gave 1147, but 11.3 makes 10% bad shares. I'll try SRB on it.
On my rx560 I'm at 528, against 510 for Stak and Claymore, but unstable :(


**
Quote
// For 256-bit keys, an sbox permutation is done every other 4th uint generated, AND every 8th
uint t = ((!(c & 7)) || ((c & 7) == 4)) ? SubWord(keybuf[c - 1]) : keybuf[c - 1];
my favorite stupid part of Wolf0 code. Looks like nobody noticed a multiple of 8 is always a multiple of 4.

Quote
// For 256-bit keys, an sbox permutation is done every other 4th uint generated
uint t = ((c & 7) == 4) ? SubWord(keybuf[c - 1]) : keybuf[c - 1];
Now fixed. But I won't use it, i use a completely different Keygen code, inspired from my CPU version.
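
A quick way to check the two conditions quoted above is to brute-force them over a range of indices. The Python sketch below is only a check of the boolean expressions, not of any miner's kernel; the function names are made up for illustration. It confirms that the original test is the same as "c is a multiple of 4", and also shows that the shortened `(c & 7) == 4` test on its own skips the multiples of 8. Whether that matters depends on how the rest of the key schedule handles multiples of 8, which the post does not show.

```python
def original_cond(c):
    # Condition from the first quote: multiple of 8, or 4 past a multiple of 8
    return (c & 7) == 0 or (c & 7) == 4


def every_fourth(c):
    # The same test written once: c is a multiple of 4
    return (c & 3) == 0


def four_past_eight_only(c):
    # Condition from the second quote
    return (c & 7) == 4


indices = range(64)

# The original condition and the "multiple of 4" test agree everywhere.
print(all(original_cond(c) == every_fourth(c) for c in indices))  # True

# The shortened condition differs exactly at the multiples of 8.
mismatches = [c for c in indices if original_cond(c) != four_past_eight_only(c)]
print(mismatches)  # [0, 8, 16, 24, 32, 40, 48, 56]
```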
newbie
Activity: 1
Merit: 0
rx 460 unlocked Micron 4gb (1.35v shiet gddr5 ) srb miner 1.6
Heavy cc 1150 mem 1850 508 h/s
Heavy cc 1225 mem 1850 542 h/s
v7 cc 1000 mem 1850 440 h/s
rx 460 unlocked Hynix 2gb (normal AJR) srb miner 1.6
v7 cc 1000 mem 1970 487 h/s
custom mem straps
    
newbie
Activity: 70
Merit: 0
Hi

Can you let me know how I can set max-cpu-usage of the miner?

Thanks.

You don't. If you don't want to tweak around, just use --auto instead of -c config.txt, and the miner will in most cases set the appropriate settings. If you have some exotic CPU or get unusually low hashrates, you should tweak the config, but this depends very much on the exact CPU you have.
newbie
Activity: 2
Merit: 0
Hi

Can you let me know how I can set max-cpu-usage of the miner?

Thanks.
sr. member
Activity: 728
Merit: 304
Miner Developer
Stak and XMRig are decent, both are copies of Wolf0, like Zawawa btw, but both the code and the OpenCL contain goofs that made me laugh, so I just rewrote everything and kept it closed source. I won't make the same mistake as Wolf0 of going open source just to see the fees go to copy-paste morons.

I take serious offence to this grossly inaccurate statement.
I was using W0lf's kernel as it was good enough, and I gave him a substantial portion of the DEVFEE.
I've known him for a while now, I respect his work, and he was quite happy with the compensation.
member
Activity: 350
Merit: 22
I'll release the first version as undocumented/untested to skip the last test phase, like I did for the Linux CPU version, but I won't release my current dev version, I'm not satisfied with it yet.

edit: a bit more satisfied now, i can reach the 268 h/s on my Bonaire, exact same speed as Claymore 9.7, but I need all the 1G memory of the card to get it, while Claymore 9.7 reached such a speed with only 512M. Maybe 268 is hardware max, like the 506 h/s of my Ryzen 1600 that i cannot improve whatever i try.

Now benching more on hd7000.
sr. member
Activity: 1484
Merit: 253
Can you provide some alpha/beta version to test it?