Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 171. (Read 444067 times)

legendary
Activity: 1470
Merit: 1114
Quote
EVP, AFAIK, is an abstraction layer of OpenSSL which automatically selects the best instruction set based on the current CPU. Not sure it applies to this project, maybe to hodlcoin or others with "standard" algos.

OK, it's associated with OpenSSL. The only thing close I found for the acronym was enhanced virus protection.
Few coins use OpenSSL for their hashing algos so I don't see the point, and hodl already uses it for both the
AES_NI and non-AES_NI implementations.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
EVP, AFAIK, is an abstraction layer of OpenSSL which automatically selects the best instruction set based on the current CPU. Not sure it applies to this project, maybe to hodlcoin or others with "standard" algos.
legendary
Activity: 1470
Merit: 1114
Quote
Hey man, I have a suggestion for you. You don't need to find out whether the hardware has AES-NI at the beginning. If you use EVP alone, it finds out by itself whether the hardware is AES-NI capable, and if that's the case, it uses it automatically. You don't have to do anything additional.

I already tried it and it works perfectly. Just my two cents.

I don't know what EVP is. Where does it come into play: at compile time or at run time?
legendary
Activity: 1470
Merit: 1114
Quote
Is there a GitHub repository of this?

Eventually, but not yet.
hero member
Activity: 583
Merit: 502
Hey man, I have a suggestion for you. You don't need to find out whether the hardware has AES-NI at the beginning. If you use EVP alone, it finds out by itself whether the hardware is AES-NI capable, and if that's the case, it uses it automatically. You don't have to do anything additional.

I already tried it and it works perfectly. Just my two cents.
legendary
Activity: 1015
Merit: 1000
Is there a GitHub repository of this?
legendary
Activity: 1470
Merit: 1114
Promising news for Windows users. I have successfully compiled and run cpuminer-opt using
Cygwin. I won't go into a lot of detail, but install the Cygwin base, the g++/autoconf toolchain and all
the dependencies required by cpuminer-opt, then compile as on Linux from the Cygwin terminal.

Caveats:

v3.1.9W is a repackage of v3.1.9 and is therefore missing some new algos, optimizations and other changes.
It should be considered beta quality.
Algos are hit and miss, x11 and quark work but cryptonight does not.
Only AES_NI is supported in this release.
Source code is modified from original v3.1.9 so don't try to use it on Linux.

I expect some issues to arise especially for those unfamiliar with Linux and Cygwin. Please be
patient and try to work things out for yourselves. Do some research. When you ask for help please
provide complete information and what you have tried to solve the problem so I don't have to
retrace your steps. I also welcome other more experienced users to help out answering the newbie
questions.

Here's the download link for v3.1.9W:

https://drive.google.com/file/d/0B0lVSGQYLJIZLUU5Njd2bVRKMUE/view?usp=sharing

legendary
Activity: 1470
Merit: 1114
Quote
OK, found the problem... The profiler did it.

I run the program through an indirect call: valgrind --tool=callgrind ./cpuminer -a x11 --benchmark

and then exported the profile data to KCachegrind to get the graph.

I don't know how running it indirectly can do that, except if it emulates another cpuid.

Normal run is ok, it detects Q8200.


Thanks for the follow up, had me worried when your cpuid was shown correct.
legendary
Activity: 1470
Merit: 1114

Edit2: I just realized there is a typo, it should be "-march=bdver1". Give it a try, it might be
faster for some algos.


This might be the best I can come up with. Now that you both have it figured out for your
own situation are the tips in README.md clear enough for other users? I've added another phrase
to the existing in italic.

Some users with AMD CPUs without AES_NI have reported problems compiling
with build.sh or "-march=native". Problems have included compile errors
and poor performance. These users are recommended to compile manually
specifying "-march=bdver1" on the configure command line. If all else fails
"-march=core2" will provide the best compatibility but the lowest performance.





As you can see, there is btver1 and bdver1. They are NOT the same and they refer to different CPUs.

bdver1 enables AES, AVX and similar CPU instructions.

Thanks, I also found this tidbit:

AMD Opteron™ and AMD FX series processors with “Bulldozer” processor core (options: -march=bdver1 and -mtune=bdver1) and
AMD processors with “Bobcat” core (options: -march=btver1 and -mtune=btver1).

http://developer.amd.com/community/blog/2012/04/23/gcc-4-7-is-available-with-support-for-amd-opteron-6200-series-and-amd-fx-series-processors/
sr. member
Activity: 312
Merit: 250

Edit2: I just realized there is a typo, it should be "-march=bdver1". Give it a try, it might be
faster for some algos.


This might be the best I can come up with. Now that you both have it figured out for your
own situation are the tips in README.md clear enough for other users? I've added another phrase
to the existing in italic.

Some users with AMD CPUs without AES_NI have reported problems compiling
with build.sh or "-march=native". Problems have included compile errors
and poor performance. These users are recommended to compile manually
specifying "-march=bdver1" on the configure command line. If all else fails
"-march=core2" will provide the best compatibility but the lowest performance.





As you can see, there is btver1 and bdver1. They are NOT the same and they refer to different CPUs.

bdver1 enables AES, AVX and similar CPU instructions.
full member
Activity: 192
Merit: 100
tried "bdver1", no luck

kim@spiel-2:~/cpuminer-opt-3.1.18$ gcc -march=native -Q --help=target
The following options are target specific:
  -m128bit-long-double              [disabled]
  -m32                              [disabled]
  -m3dnow                           [disabled]
  -m3dnowa                          [disabled]
  -m64                              [enabled]
  -m80387                           [enabled]
  -m8bit-idiv                       [disabled]
  -m96bit-long-double               [enabled]
  -mabi=                            sysv
  -mabm                             [enabled]
  -maccumulate-outgoing-args        [disabled]
  -maddress-mode=                   short
  -madx                             [disabled]
  -maes                             [disabled]
  -malign-double                    [disabled]
  -malign-functions=                0
  -malign-jumps=                    0
  -malign-loops=                    0
  -malign-stringops                 [enabled]
  -mandroid                         [disabled]
  -march=                           amdfam10
  -masm=                            att
  -mavx                             [disabled]
  -mavx2                            [disabled]
  -mavx256-split-unaligned-load    [disabled]
  -mavx256-split-unaligned-store    [disabled]
  -mbionic                          [disabled]
  -mbmi                             [disabled]
  -mbmi2                            [disabled]
  -mbranch-cost=                    0
  -mcld                             [disabled]
  -mcmodel=                         32
  -mcpu=                            
  -mcrc32                           [disabled]
  -mcx16                            [enabled]
  -mdispatch-scheduler              [disabled]
  -mf16c                            [disabled]
  -mfancy-math-387                  [enabled]
  -mfentry                          [enabled]
  -mfma                             [disabled]
  -mfma4                            [disabled]
  -mforce-drap                      [disabled]
  -mfp-ret-in-387                   [enabled]
  -mfpmath=                         387
  -mfsgsbase                        [disabled]
  -mfused-madd                      
  -mfxsr                            [enabled]
  -mglibc                           [enabled]
  -mhard-float                      [enabled]
  -mhle                             [disabled]
  -mieee-fp                         [enabled]
  -mincoming-stack-boundary=        0
  -minline-all-stringops            [disabled]
  -minline-stringops-dynamically    [disabled]
  -mintel-syntax                    
  -mlarge-data-threshold=           0x10000
  -mlong-double-64                  [disabled]
  -mlong-double-80                  [enabled]
  -mlwp                             [disabled]
  -mlzcnt                           [enabled]
  -mmmx                             [disabled]
  -mmovbe                           [disabled]
  -mms-bitfields                    [disabled]
  -mno-align-stringops              [disabled]
  -mno-fancy-math-387               [disabled]
  -mno-push-args                    [disabled]
  -mno-red-zone                     [disabled]
  -mno-sse4                         [enabled]
  -momit-leaf-frame-pointer         [disabled]
  -mpc32                            [disabled]
  -mpc64                            [disabled]
  -mpc80                            [disabled]
  -mpclmul                          [disabled]
  -mpopcnt                          [enabled]
  -mprefer-avx128                   [disabled]
  -mpreferred-stack-boundary=       0
  -mprfchw                          [enabled]
  -mpush-args                       [enabled]
  -mrdrnd                           [disabled]
  -mrdseed                          [disabled]
  -mrecip                           [disabled]
  -mrecip=                          
  -mred-zone                        [enabled]
  -mregparm=                        0
  -mrtd                             [disabled]
  -mrtm                             [disabled]
  -msahf                            [enabled]
  -msoft-float                      [disabled]
  -msse                             [disabled]
  -msse2                            [disabled]
  -msse2avx                         [disabled]
  -msse3                            [disabled]
  -msse4                            [disabled]
  -msse4.1                          [disabled]
  -msse4.2                          [disabled]
  -msse4a                           [disabled]
  -msse5                            
  -msseregparm                      [disabled]
  -mssse3                           [disabled]
  -mstack-arg-probe                 [disabled]
  -mstackrealign                    [enabled]
  -mstringop-strategy=              [default]
  -mtbm                             [disabled]
  -mtls-dialect=                    gnu
  -mtls-direct-seg-refs             [enabled]
  -mtune=                           amdfam10
  -muclibc                          [disabled]
  -mveclibabi=                      [default]
  -mvect8-ret-in-mem                [disabled]
  -mvzeroupper                      [disabled]
  -mx32                             [disabled]
  -mxop                             [disabled]
  -mxsave                           [disabled]
  -mxsaveopt                        [disabled]

  Known assembler dialects (for use with the -masm-dialect= option):
    att intel

  Known ABIs (for use with the -mabi= option):
    ms sysv

  Known code models (for use with the -mcmodel= option):
    32 kernel large medium small

  Valid arguments to -mfpmath=:
    387 387+sse 387,sse both sse sse+387 sse,387

  Known vectorization library ABIs (for use with the -mveclibabi= option):
    acml svml

  Known address mode (for use with the -maddress-mode= option):
    long short

  Valid arguments to -mstringop-strategy=:
    byte_loop libcall loop rep_4byte rep_8byte rep_byte unrolled_loop

  Known TLS dialects (for use with the -mtls-dialect= option):
    gnu gnu2
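
For anyone comparing dumps like the one above, here is a minimal Python sketch (not part of cpuminer-opt) that pulls the -march= value and the enabled switches out of "gcc -march=native -Q --help=target" text. The function name parse_gcc_target and the shortened SAMPLE are made up for illustration; the sample lines are copied from the dump above, and other GCC versions may format this output differently.

```python
def parse_gcc_target(text):
    """Return (march, enabled_switches) from `gcc -Q --help=target` text."""
    march = None
    enabled = set()
    for line in text.splitlines():
        parts = line.split()
        # Skip the prompt, headers and the "Known ..." sections.
        if not parts or not parts[0].startswith("-m"):
            continue
        option = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if option == "-march=":
            march = value or None
        elif value == "[enabled]":
            enabled.add(option)
    return march, enabled


# Hypothetical shortened sample; lines taken from the dump above.
SAMPLE = """\
  -maes                             [disabled]
  -march=                           amdfam10
  -mavx                             [disabled]
  -mpopcnt                          [enabled]
  -msse4a                           [disabled]
"""

march, enabled = parse_gcc_target(SAMPLE)
print(march)               # amdfam10
print("-maes" in enabled)  # False
```

To run it against a real compiler, the text could come from subprocess.run(["gcc", "-march=native", "-Q", "--help=target"], capture_output=True, text=True).stdout, assuming gcc is on the PATH.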
legendary
Activity: 1708
Merit: 1049
OK, found the problem... The profiler did it.

I run the program through an indirect call: valgrind --tool=callgrind ./cpuminer -a x11 --benchmark

and then exported the profile data to KCachegrind to get the graph.

I don't know how running it indirectly can do that, except if it emulates another cpuid.

Normal run is ok, it detects Q8200.
legendary
Activity: 1708
Merit: 1049
Quote
Let me get this straight. You compiled with -march=native on a core2 that thinks it's an i5-670.

Yep...

Quote
The compile succeeded and the miner ran ok. That's pretty special.

.16 was broken due to some errors (algogate? can't remember) which I removed manually from all the sources, but .18 runs ok.

Quote
The CPU model and AES support come directly from CPUID and have been reliable until now.
Even the AMD guys haven't reported CPUID problems.

Can you confirm CPUID is correct:
Code:
cat /proc/cpuinfo |grep model

cat /proc/cpuinfo |grep model
model           : 23
model name      : Intel(R) Core(TM)2 Quad  CPU   Q8200  @ 2.33GHz
model           : 23
model name      : Intel(R) Core(TM)2 Quad  CPU   Q8200  @ 2.33GHz
model           : 23
model name      : Intel(R) Core(TM)2 Quad  CPU   Q8200  @ 2.33GHz
model           : 23
model name      : Intel(R) Core(TM)2 Quad  CPU   Q8200  @ 2.33GHz


cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad  CPU   Q8200  @ 2.33GHz
stepping        : 7
microcode       : 0x70a
cpu MHz         : 1754.042
cache size      : 2048 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm dtherm
bugs            :
bogomips        : 3508.08
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
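
A quick way to check a dump like the one above for specific instruction sets is to read the "flags" line. This is a minimal Python sketch, not anything cpuminer-opt does itself (the thread says it reads CPUID directly); the function name cpu_flags and the shortened SAMPLE are assumptions made for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Hypothetical shortened sample based on the Q8200 dump above.
SAMPLE = (
    "model name      : Intel(R) Core(TM)2 Quad  CPU   Q8200  @ 2.33GHz\n"
    "flags           : fpu sse sse2 ssse3 cx16 sse4_1 lahf_lm\n"
)

flags = cpu_flags(SAMPLE)
for wanted in ("sse2", "sse4_1", "aes", "avx"):
    print(f"{wanted:8s}", "yes" if wanted in flags else "no")
```

On Linux the real text can be passed in with open("/proc/cpuinfo").read(). For the Q8200 above, "aes" and "avx" are absent from the flags line, which matches the normal (non-valgrind) run.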


Quote
It would be interesting to see what the compiler thought:
Code:
gcc -march=native -Q --help=target | fgrep march

gcc -march=native -Q --help=target | fgrep march
=
  -march=                               core2

Quote
Regarding echo512, yes you will get the slow version. Unfortunately I'm unaware of a SSE2 optimized
version and the AES version is already used by cpuminer-opt on capable CPUs.

In case you need to compare performance or find sources:

https://bench.cr.yp.to/primitives-sha3.html

Sources of every variant are here: https://github.com/floodyberry/supercop/tree/master/crypto_hash


Quote
I have yet to study your data in any detail but the following may put performance into perspective.

Echo512 and groestl have AES optimizations for most algos.

Cryptonight and hodl have their own unique AES optimizations.

The rest of the x11 chain, including groestl but excluding echo, have SSE2 optimized versions.
The algos in the longer X chains, as well as non-aes echo, are filled with slow SPH versions.

Yeah, lacking AES (and AVX) hurts a lot.
legendary
Activity: 1470
Merit: 1114
Images snipped.

Quote
3.1.18 is kinda buggy in CPU detection. On a Q8200 it says:

Checking CPU capatibility...
        Intel(R) Core(TM) i5 CPU         670  @ 3.47GHz
   CPU arch supports AES_NI...YES.
   SW built for AES_NI........NO.
   Algo supports AES_NI.......YES.
CPU and algo support AES_NI, but SW build does not.
Rebuild with "-march=native" for better performance.
Starting mining without AES_NI optimizations...

...so naturally it thinks I have AES, even though the build is not for AES (-march=native / no AES in Intel quad core q8200).
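
The banner above combines three separate checks: what the CPU reports, what the binary was compiled for, and what the chosen algo can use. A rough Python sketch of how such a three-way decision could be expressed follows. It is an illustration only, not the actual cpuminer-opt code, and the function name aes_decision and its return labels are hypothetical.

```python
def aes_decision(cpu_has_aes, build_has_aes, algo_has_aes):
    """Combine the three AES_NI checks into a single outcome label."""
    if build_has_aes and not cpu_has_aes:
        # The binary may contain AES instructions the CPU cannot execute.
        return "incompatible"
    if cpu_has_aes and algo_has_aes:
        # Hardware and algo are ready; only the build decides.
        return "aes" if build_has_aes else "rebuild"
    return "no-aes"


print(aes_decision(True, False, True))   # rebuild (the case shown in the banner)
print(aes_decision(False, False, True))  # no-aes (what a Q8200 without AES should get)
```

With a wrong "YES" on the first check, as in the report above, the outcome is a misleading rebuild hint rather than a plain non-AES start.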

Anyway I did a profile run in x11 to check the slowdowns:

A couple of them, with echo512 being the biggest culprit, dominate the process in terms of time wasted.

It seems there is a very optimized AES version for it: https://bench.cr.yp.to/impl-hash/echo512.html

https://bench.cr.yp.to/impl-hash/echo512.html
https://github.com/floodyberry/supercop/tree/master/crypto_hash/echo512/aes/aes64

Let me get this straight. You compiled with -march=native on a core2 that thinks it's an i5-670.
The compile succeeded and the miner ran ok. That's pretty special.

The CPU model and AES support come directly from CPUID and have been reliable until now.
Even the AMD guys haven't reported CPUID problems.

Can you confirm CPUID is correct:
Code:
cat /proc/cpuinfo |grep model

It would be interesting to see what the compiler thought:
Code:
gcc -march=native -Q --help=target | fgrep march

Regarding echo512, yes you will get the slow version. Unfortunately I'm unaware of a SSE2 optimized
version and the AES version is already used by cpuminer-opt on capable CPUs.

I have yet to study your data in any detail but the following may put performance into perspective.

Echo512 and groestl have AES optimizations for most algos.

Cryptonight and hodl have their own unique AES optimizations.

The rest of the x11 chain, including groestl but excluding echo, have SSE2 optimized versions.
The algos in the longer X chains, as well as non-aes echo, are filled with slow SPH versions.
legendary
Activity: 1708
Merit: 1049
3.1.18 is kinda buggy in CPU detection. On a Q8200 it says:

Checking CPU capatibility...
        Intel(R) Core(TM) i5 CPU         670  @ 3.47GHz
   CPU arch supports AES_NI...YES.
   SW built for AES_NI........NO.
   Algo supports AES_NI.......YES.
CPU and algo support AES_NI, but SW build does not.
Rebuild with "-march=native" for better performance.
Starting mining without AES_NI optimizations...

...so naturally it thinks I have AES, even though the build is not for AES (-march=native / no AES in Intel quad core q8200).

Anyway I did a profile run in x11 to check the slowdowns:



A couple of them, with echo512 being the biggest culprit, dominate the process in terms of time wasted.

It seems there is a very optimized AES version for it: https://bench.cr.yp.to/impl-hash/echo512.html



https://bench.cr.yp.to/impl-hash/echo512.html
https://github.com/floodyberry/supercop/tree/master/crypto_hash/echo512/aes/aes64
legendary
Activity: 1470
Merit: 1114
Quote
cpuminer-opt v3.1.18 is one of the best in recent version history.
Talking both Intel and AMD wise.

Thank you for your hard work!

Can't wait to see what branch 3.2 will offer.

The algo-gate work won't be user visible but will make it easier to add new algos.
Thanks for your support and invaluable help in getting to the bottom of the compile issue.
sr. member
Activity: 312
Merit: 250
cpuminer-opt v3.1.18 is one of the best in recent version history.
Talking both Intel and AMD wise.

Thank you for your hard work!

Can't wait to see what branch 3.2 will offer.
legendary
Activity: 1470
Merit: 1114
With things hopefully settling down after a period of rapid development
and many algo additions, algo-gate has evolved significantly and developed
some mutations. Some functions developed multiple personalities or had names
that made no sense, which deviated from the highly structured goals of the gate system.
Nevertheless, it proved its worth when merging new algos by isolating all the
algo-specific code. It also showed promise in gating other functions such as jsonrpc2.

I'll start working on v3.2 and do some genetic engineering on algo-gate so it doesn't
turn into a monster. I will also convert the jsonrpc2 functions to use the gate, and possibly
others.

In the meantime, only bug fixes and high-priority features will be added to the 3.1
stream.
legendary
Activity: 1470
Merit: 1114
Quote
with -march=native o.k., with "btver1" negative

Checking CPU capatibility...
        AMD Sempron(tm) 145 Processor
   CPU arch supports AES_NI...NO.
   CPU arch supports SSE2.....YES.
   SW built for SSE2..........YES.
Starting mining without AES_NI optimizations...

[2016-04-23 22:06:15] Starting Stratum on stratum+tcp://hodl.suprnova.cc:4693
[2016-04-23 22:06:15] 1 miner threads started, using 'hodl' algorithm.
[2016-04-23 22:06:18] Stratum difficulty set to 1
[2016-04-23 22:07:07] hodl.suprnova.cc:4693 hodl block 43933
[2016-04-23 22:07:07] CPU #0: 373 H, 7.66 H/s
[2016-04-23 22:08:17] CPU #0: 529 H, 7.63 H/s
[2016-04-23 22:08:17] accepted: 1/1 (100%), 529 H, 7.63 H/s yes!

Edit @Giulini: what do you get from "gcc -march=native -Q --help=target"?

Edit2: I just realized there is a typo, it should be "-march=bdver1". Give it a try, it might be
faster for some algos.


This might be the best I can come up with. Now that you both have it figured out for your
own situation are the tips in README.md clear enough for other users? I've added another phrase
to the existing in italic.

Some users with AMD CPUs without AES_NI have reported problems compiling
with build.sh or "-march=native". Problems have included compile errors
and poor performance. These users are recommended to compile manually
specifying "-march=bdver1" on the configure command line. If all else fails
"-march=core2" will provide the best compatibility but the lowest performance.


full member
Activity: 192
Merit: 100
with -march=native o.k., with "btver1" negative

Checking CPU capatibility...
        AMD Sempron(tm) 145 Processor
   CPU arch supports AES_NI...NO.
   CPU arch supports SSE2.....YES.
   SW built for SSE2..........YES.
Starting mining without AES_NI optimizations...

[2016-04-23 22:06:15] Starting Stratum on stratum+tcp://hodl.suprnova.cc:4693
[2016-04-23 22:06:15] 1 miner threads started, using 'hodl' algorithm.
[2016-04-23 22:06:18] Stratum difficulty set to 1
[2016-04-23 22:07:07] hodl.suprnova.cc:4693 hodl block 43933
[2016-04-23 22:07:07] CPU #0: 373 H, 7.66 H/s
[2016-04-23 22:08:17] CPU #0: 529 H, 7.63 H/s
[2016-04-23 22:08:17] accepted: 1/1 (100%), 529 H, 7.63 H/s yes!