See my git, same permutation (starting with bmw 80).
Before:
[2017-01-26 19:16:55] CPU #1: 1.26 kH/s
[2017-01-26 19:16:55] CPU #2: 1.33 kH/s
[2017-01-26 19:16:55] CPU #0: 1.27 kH/s
[2017-01-26 19:17:28] timetravel block 387151, diff 0.419
Now:
[2017-01-26 19:20:33] CPU #2: 78.91 kH/s
[2017-01-26 19:20:33] CPU #3: 76.86 kH/s
[2017-01-26 19:20:33] CPU #1: 74.50 kH/s
[2017-01-26 19:20:36] accepted: 2/2 (diff 0.007), 308.84 kH/s yes!
With a few lines changed, on Linux with an i5 4440 (with bmw512 as the first algo), that's 2x the speed of bitbandi's miner.
Nice find. Calculating the next permutation on every hash is redundant; it only needs to be done when new work is received.
It could probably be moved up another level or two. Does it have to be thread specific? It seems each thread will
calculate the same chain. Maybe the stratum thread can do it when new work is received. I will follow up.
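A minimal sketch of the caching idea, not the actual cpuminer code: keep the derived function order alongside the timestamp it was computed for, and only recompute when the timestamp changes. All names here (get_hash_order, s_ntime, s_order) are hypothetical, the cache is per-thread only as one option (the stratum thread could own it instead), and the placeholder shuffle stands in for the real timetravel derivation from ntime.

#include <stdint.h>
#include <stdio.h>

#define TT_FUNC_COUNT 8   /* timetravel chains 8 hash functions */

/* Placeholder for the real derivation of the function order from ntime;
   a stand-in shuffle so the example runs, the real algo differs. */
static void get_hash_order( uint32_t ntime, uint8_t order[TT_FUNC_COUNT] )
{
   uint32_t seed = ntime;
   for ( int i = 0; i < TT_FUNC_COUNT; i++ ) order[i] = (uint8_t)i;
   for ( int i = TT_FUNC_COUNT - 1; i > 0; i-- )
   {
      seed = seed * 1103515245u + 12345u;
      int j = (int)( seed % (uint32_t)( i + 1 ) );
      uint8_t t = order[i]; order[i] = order[j]; order[j] = t;
   }
}

/* Per-thread cache: recompute only when new work (new ntime) arrives,
   instead of once per nonce in the scan loop. */
static __thread uint32_t s_ntime = 0xffffffffu;
static __thread uint8_t  s_order[TT_FUNC_COUNT];

static const uint8_t* hash_order_for_work( uint32_t ntime )
{
   if ( ntime != s_ntime )
   {
      get_hash_order( ntime, s_order );
      s_ntime = ntime;
   }
   return s_order;   /* the scan loop reuses this for every nonce */
}

int main()
{
   /* second call with the same ntime hits the cache, no recompute */
   const uint8_t *o1 = hash_order_for_work( 1485454655u );
   const uint8_t *o2 = hash_order_for_work( 1485454655u );
   for ( int i = 0; i < TT_FUNC_COUNT; i++ ) printf( "%d ", o1[i] );
   printf( "\ncached: %s\n", ( o1 == o2 && s_ntime == 1485454655u ) ? "yes" : "no" );
   return 0;
}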
The hashrate is now about what I expect from an unoptimized 8-function chain, i.e. faster than x11.
And it's easy to see how all that fixed overhead would overwhelm the rest of the algo. It's starting
to make sense.
I'll now try the drop-in opts to see if they now behave as expected.
On a related note, this is my favorite swap routine. It doesn't need a temp, and being a macro it works with
almost any integer type and can modify both args without pointers.
#define swap_vars(a,b) do { a ^= b; b ^= a; a ^= b; } while (0)
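A hypothetical usage example, with the usual XOR-swap caveats: it only works on integer types, and if both args refer to the same object the value gets zeroed, so guard against aliasing where that can happen.

#include <stdint.h>
#include <stdio.h>

/* same macro as above, wrapped so it acts as a single statement */
#define swap_vars(a,b) do { a ^= b; b ^= a; a ^= b; } while (0)

int main()
{
   uint32_t x = 0x1234, y = 0xabcd;
   swap_vars( x, y );
   printf( "x=%x y=%x\n", x, y );   /* prints x=abcd y=1234 */
   return 0;
}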