Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 112. (Read 444067 times)

legendary
Activity: 1470
Merit: 1114
Something is still not right: as can be seen in your capture, the hashrate keeps falling and falling...

I managed to squeeze out a little more speed by using the aes_ni version of groestl, setting its data length to 512 instead of 1024.
But the hashrate still keeps falling...

That should be thermal throttling.

Maybe, but this never happens on the Limx version.

Does the pool confirm the hash rate?

BTW what's the speed of limx vs multi?
hero member
Activity: 2548
Merit: 626
Something is still not right: as can be seen in your capture, the hashrate keeps falling and falling...

I managed to squeeze out a little more speed by using the aes_ni version of groestl, setting its data length to 512 instead of 1024.
But the hashrate still keeps falling...

That should be thermal throttling.

Maybe, but this never happens on the Limx version.
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
Something is still not right: as can be seen in your capture, the hashrate keeps falling and falling...

I managed to squeeze out a little more speed by using the aes_ni version of groestl, setting its data length to 512 instead of 1024.
But the hashrate still keeps falling...

That should be thermal throttling.
hero member
Activity: 2548
Merit: 626
I don't have this problem; it looks like you have a process using CPU 0... maybe a GPU miner?

See the capture on my thread: https://bitcointalk.org/?topic=841401

Something is still not right: as can be seen in your capture, the hashrate keeps falling and falling...

I managed to squeeze out a little more speed by using the aes_ni version of groestl, setting its data length to 512 instead of 1024.
But the hashrate still keeps falling...
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
I don't have this problem; it looks like you have a process using CPU 0... maybe a GPU miner?

See the capture on my thread: https://bitcointalk.org/?topic=841401
hero member
Activity: 2548
Merit: 626
Yes, I only made one "compatible" binary for now... my Windows build env is a bit messed up for the other ones ;) I tested different compilers recently.

And CPU #0 didn't put results in your log; 4 x 66 should be 264.

I compiled it myself with mingw.
Yes, there should be 4 threads working, but as you can see in the pic only 3 are working.
When I set 5 threads there are really 4 working threads, and so on.
It's always t-1.

Edit:

I edited joblo's cpuminer-opt with your optimizations and here everything works OK:

legendary
Activity: 1470
Merit: 1114
Options for optimizing the permutation calculation further:

1. Move it to scanhash, outside the hash loop. Trivial to implement, eliminates calling it on every hash loop iteration.

2. Move it to miner_thread when new work is detected. Also trivial, eliminates calling it when there is no new work.

3. Move it to the stratum thread. Slightly more complex to implement, eliminates the calculation by every miner thread.

3 should work; if not I'll fall back until one does.


Edit: They all work as long as the endianness of ntime is correct; scanhash flips to BE before calling hash.
Not a big improvement.
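
To illustrate options 2 and 3, here is a minimal sketch of caching the function order and recalculating it only when ntime changes. Everything in it is an assumption for illustration: the names perm_cache, compute_order and get_order are hypothetical, and the rule that derives the order from ntime is a placeholder, since the real timetravel rule is not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

#define FUNC_COUNT 8   /* length of the function chain discussed here */

/* Hypothetical cache: the last computed order and the ntime it belongs to. */
typedef struct {
    uint32_t ntime;
    int      valid;
    uint8_t  order[FUNC_COUNT];
    unsigned recomputes;   /* counts real recalculations, for illustration */
} perm_cache;

/* Reverse a[lo..hi) in place. */
static void reverse_range(uint8_t *a, size_t lo, size_t hi)
{
    while (lo + 1 < hi) {
        hi--;
        uint8_t t = a[lo];
        a[lo] = a[hi];
        a[hi] = t;
        lo++;
    }
}

/* Advance a[0..n) to the next lexicographic permutation.
   Returns 0 and wraps to ascending order after the last one. */
static int next_perm(uint8_t *a, size_t n)
{
    if (n < 2) return 0;
    size_t i = n - 1;
    while (i > 0 && a[i - 1] >= a[i]) i--;
    if (i == 0) { reverse_range(a, 0, n); return 0; }
    size_t j = n - 1;
    while (a[j] <= a[i - 1]) j--;
    uint8_t t = a[i - 1];
    a[i - 1] = a[j];
    a[j] = t;
    reverse_range(a, i, n);
    return 1;
}

/* Placeholder rule (an assumption, not the real algo): step the identity
   order forward ntime mod 8! times. This is the expensive part. */
static void compute_order(uint32_t ntime, uint8_t *order)
{
    for (size_t i = 0; i < FUNC_COUNT; i++) order[i] = (uint8_t)i;
    uint32_t steps = ntime % 40320u;   /* 8! = 40320 */
    while (steps--) next_perm(order, FUNC_COUNT);
}

/* Return the order for ntime, recalculating only when ntime changed. */
static const uint8_t *get_order(perm_cache *c, uint32_t ntime)
{
    if (!c->valid || c->ntime != ntime) {
        compute_order(ntime, c->order);
        c->ntime = ntime;
        c->valid = 1;
        c->recomputes++;
    }
    return c->order;
}
```

With one cache per miner thread this corresponds roughly to option 2. A single shared cache filled by the stratum thread would correspond to option 3, but it would need whatever locking the work data already uses, which this sketch leaves out.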
legendary
Activity: 1470
Merit: 1114

On a related note this is my favorite swap routine. It doesn't need a temp and being a macro it works with
almost any type and can modify both args without pointers.

Code:
#define swap_vars(a,b) a^=b;  b^=a; a^= b;


There is also the xchg asm instruction, but we don't care about that; it's no longer a big issue.

xchg    ax, bx       ; Put AX in BX and BX in AX

__xchg() func should exist

I'm still lost in x86 assembly.
legendary
Activity: 1470
Merit: 1114
644kH/s on i7-6700k without AES Groestl.

Edit: from 445

Edit: AES Groestl now works, 810 kH/s!
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer

On a related note this is my favorite swap routine. It doesn't need a temp and being a macro it works with
almost any type and can modify both args without pointers.

Code:
#define swap_vars(a,b) a^=b;  b^=a; a^= b;


There is also the xchg asm instruction, but we don't care about that; it's no longer a big issue.

xchg    ax, bx       ; Put AX in BX and BX in AX

__xchg() func should exist
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
Yes, I only made one "compatible" binary for now... my Windows build env is a bit messed up for the other ones ;) I tested different compilers recently.

And CPU #0 didn't put results in your log; 4 x 66 should be 264.
hero member
Activity: 2548
Merit: 626
4 threads, no CPU 0 thread.
This is on Windows; you are probably testing it on *nix.

legendary
Activity: 1470
Merit: 1114
I think you have some kind of bug: I start mining with 4 threads, but only 3 ever work.
Yeah, it's always threads-1 :D
If I set 2, then it uses 1.
If I set 3, then it uses 2 :D

Thread count is correct for me whether default or specified.
legendary
Activity: 1470
Merit: 1114
see my git, same permut (starting with bmw 80)

before :

Code:
[2017-01-26 19:16:55] CPU #1: 1.26 kH/s
[2017-01-26 19:16:55] CPU #2: 1.33 kH/s
[2017-01-26 19:16:55] CPU #0: 1.27 kH/s
[2017-01-26 19:17:28] timetravel block 387151, diff 0.419

now:

Code:
[2017-01-26 19:20:33] CPU #2: 78.91 kH/s
[2017-01-26 19:20:33] CPU #3: 76.86 kH/s
[2017-01-26 19:20:33] CPU #1: 74.50 kH/s
[2017-01-26 19:20:36] accepted: 2/2 (diff 0.007), 308.84 kH/s yes!

With a few lines changed, on Linux with an i5 4440 (with bmw512 as first algo): 2x the bitbandi miner speed.

Nice find. Calculating the next permutation on every hash is redundant; it only needs to happen when new work is received.
It could probably be moved up another level or 2. Does it have to be thread specific? It seems each thread will
calculate the same chain. Maybe the stratum thread can do it when new work is received. I will follow up.

The hashrate is now about what I expect from an unoptimized 8 function chain, i.e. faster than x11.
And it's easy to see how all that fixed overhead would overwhelm the rest of the algo. It's starting
to make sense.

I'll now try the drop-in opts to see if they now behave as expected.

On a related note this is my favorite swap routine. It doesn't need a temp and being a macro it works with
almost any type and can modify both args without pointers.

Code:
#define swap_vars(a,b) a^=b;  b^=a; a^= b;
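
A few caveats on that macro, shown as a small sketch. The XOR trick only applies to integer types, it zeroes the value when both arguments are the same object, and because the macro expands to three statements only the first one is guarded by a brace-less if. The swap_vars_safe variant and the helper functions below are my own hypothetical names, not something from cpuminer-opt.

```c
#include <stdint.h>

/* Macro as posted above. */
#define swap_vars(a,b) a^=b;  b^=a; a^= b;

/* Assumed safer variant: expands to a single statement and skips the
   swap when both arguments are the same object (x ^ x is always 0). */
#define swap_vars_safe(a,b) \
    do { if (&(a) != &(b)) { (a) ^= (b); (b) ^= (a); (a) ^= (b); } } while (0)

/* Swapping a variable with itself: the posted macro clears it. */
static uint32_t self_swap_original(uint32_t x)
{
    swap_vars(x, x)
    return x;
}

/* The safe variant leaves the value alone. */
static uint32_t self_swap_safe(uint32_t x)
{
    swap_vars_safe(x, x);
    return x;
}

/* Brace-less if: only "*a ^= *b" is conditional, the other two
   statements always run, so cond == 0 still scrambles both values. */
static void swap_if_original(int cond, uint32_t *a, uint32_t *b)
{
    if (cond) swap_vars(*a, *b)
}

/* The do/while(0) wrapper makes the whole swap conditional. */
static void swap_if_safe(int cond, uint32_t *a, uint32_t *b)
{
    if (cond) swap_vars_safe(*a, *b);
}
```

None of this matters if the macro is only ever used on two distinct integer variables inside braces, which is probably how it is used in practice.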
hero member
Activity: 2548
Merit: 626
I think you have some kind of bug: I start mining with 4 threads, but only 3 ever work.
Yeah, it's always threads-1 :D
If I set 2, then it uses 1.
If I set 3, then it uses 2 :D
hero member
Activity: 2548
Merit: 626
see my git, same permut (starting with bmw 80)

before :

Code:
[2017-01-26 19:16:55] CPU #1: 1.26 kH/s
[2017-01-26 19:16:55] CPU #2: 1.33 kH/s
[2017-01-26 19:16:55] CPU #0: 1.27 kH/s
[2017-01-26 19:17:28] timetravel block 387151, diff 0.419

now:

Code:
[2017-01-26 19:20:33] CPU #2: 78.91 kH/s
[2017-01-26 19:20:33] CPU #3: 76.86 kH/s
[2017-01-26 19:20:33] CPU #1: 74.50 kH/s
[2017-01-26 19:20:36] accepted: 2/2 (diff 0.007), 308.84 kH/s yes!

With a few lines changed, on Linux with an i5 4440 (with bmw512 as first algo): 2x the bitbandi miner speed.

Is this the timetraveler?

Edit: oh yeah baby, it is :D
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
see my git, same permut (starting with bmw 80)

before :

Code:
[2017-01-26 19:16:55] CPU #1: 1.26 kH/s
[2017-01-26 19:16:55] CPU #2: 1.33 kH/s
[2017-01-26 19:16:55] CPU #0: 1.27 kH/s
[2017-01-26 19:17:28] timetravel block 387151, diff 0.419

now:

Code:
[2017-01-26 19:20:33] CPU #2: 78.91 kH/s
[2017-01-26 19:20:33] CPU #3: 76.86 kH/s
[2017-01-26 19:20:33] CPU #1: 74.50 kH/s
[2017-01-26 19:20:36] accepted: 2/2 (diff 0.007), 308.84 kH/s yes!

With a few lines changed, on Linux with an i5 4440 (with bmw512 as first algo): 2x the bitbandi miner speed.
legendary
Activity: 1470
Merit: 1114
What speed for the i7/i5 Haswell family?

Sorry, I didn't write "for cryptonight". I am interested in this algo and I get 85-89 H/s with cpuminer-core-avx2 with -t 4 on an i5-4690K. Why is the speed so slow?

You should be able to get about double that. Try with 3 threads; the i5 only has 6 MB of cache. If you can't do better there
is something wrong with your system. Check CPU usage to be sure no other programs are running. Make sure your memory
is installed correctly for dual-channel operation.
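
The reasoning behind 3 threads can be sketched as simple arithmetic: cryptonight works on a 2 MB scratchpad per hashing thread, so only as many threads as fit in the last-level cache run at full speed. The function name and the clamping to the core count are my own assumptions for illustration, not code from the miner.

```c
#include <stddef.h>

/* CryptoNight scratchpad size per hashing thread: 2 MB. */
#define CN_SCRATCHPAD_BYTES ((size_t)2 * 1024 * 1024)

/* Hypothetical helper: how many threads fit their scratchpads in the
   L3 cache, never fewer than 1 and never more than the core count. */
static unsigned cache_limited_threads(size_t l3_bytes, unsigned cores)
{
    unsigned fit = (unsigned)(l3_bytes / CN_SCRATCHPAD_BYTES);
    if (fit == 0) fit = 1;
    return fit < cores ? fit : cores;
}
```

For a 4-core CPU with 6 MB of cache this gives 3, which is why -t 3 is worth trying before -t 4.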