[ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 171.

joblo

Quote from: pallas on April 27, 2016, 04:05:55 PM

EVP, AFAIK, is an abstraction layer of OpenSSL which automatically selects the best instruction set based on the current CPU. Not sure it applies to this project; maybe to hodlcoin or others with "standard" algos.

OK, so it's associated with OpenSSL. The only thing close I found for the acronym was enhanced virus protection.
Few coins use OpenSSL for their hashing algos so I don't see the point, and hodl already uses it for both the
AES_NI and non-AES_NI implementations.

pallas

EVP, AFAIK, is an abstraction layer of OpenSSL which automatically selects the best instruction set based on the current CPU. Not sure it applies to this project; maybe to hodlcoin or others with "standard" algos.
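
To make the "abstraction layer" idea concrete, here is a minimal Python sketch. It assumes a CPython build whose hashlib module is backed by OpenSSL's EVP digest interface (the usual case, but an assumption here), and it says nothing about how cpuminer-opt or hodl actually call OpenSSL. The helper name digest_by_name is made up for illustration. The caller names an algorithm and never picks an instruction set; whichever code path the library has for the current CPU is chosen inside the library.

```python
import hashlib


def digest_by_name(algorithm: str, data: bytes) -> str:
    """Hash data with an algorithm that is looked up by name at run time."""
    # hashlib.new() resolves the algorithm from its name; the caller never
    # states which CPU instructions should be used to compute the hash.
    h = hashlib.new(algorithm)
    h.update(data)
    return h.hexdigest()


if __name__ == "__main__":
    print(digest_by_name("sha256", b"abc"))
```

The point is only that the selection happens at run time inside the library, not at compile time in the calling program.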

joblo

Quote from: GoldTiger69 on April 27, 2016, 08:03:49 AM

Hey man, I have a suggestion for you. You don't need to find out whether the hardware has AES-NI at the beginning. If you use EVP alone, it finds out by itself whether the hardware is AES-NI capable, and if so, it uses it automatically. You don't have to do anything additional.

I already tried it and it works perfectly. Just my two cents.

I don't know what EVP is. Where does it come into play: at compile time or at run time?

joblo

Quote from: michelem on April 27, 2016, 04:09:14 AM

Is there a Github repository of this?

Eventually but not yet.

GoldTiger69

Hey man, I have a suggestion for you. You don't need to find out whether the hardware has AES-NI at the beginning. If you use EVP alone, it finds out by itself whether the hardware is AES-NI capable, and if so, it uses it automatically. You don't have to do anything additional.

I already tried it and it works perfectly. Just my two cents.

michelem

Is there a Github repository of this?

joblo

Promising news for Windows users. I have successfully compiled and run cpuminer-opt using
Cygwin. I won't go into a lot of detail, but install the Cygwin base, the g++/autoconf toolchain and all
the dependencies required by cpuminer-opt, then compile from the Cygwin terminal as you would on Linux.

Caveats:

v3.1.9W is a repackage of v3.1.9 and is therefore missing some new algos, optimizations and other changes.
It should be considered beta quality.
Algos are hit and miss, x11 and quark work but cryptonight does not.
Only AES_NI is supported in this release.
Source code is modified from original v3.1.9 so don't try to use it on Linux.

I expect some issues to arise especially for those unfamiliar with Linux and Cygwin. Please be
patient and try to work things out for yourselves. Do some research. When you ask for help please
provide complete information and what you have tried to solve the problem so I don't have to
retrace your steps. I also welcome other more experienced users to help out answering the newbie
questions.

Here's the download link for v3.1.9W:

https://drive.google.com/file/d/0B0lVSGQYLJIZLUU5Njd2bVRKMUE/view?usp=sharing

joblo

Quote from: AlexGR on April 23, 2016, 11:50:46 PM

Ok, found the problem... the profiler did it.

I ran the program through an indirect call: valgrind --tool=callgrind ./cpuminer -a x11 --benchmark

and then exported the profile data to KCachegrind to get the graph.

I don't know how running it indirectly can do that, except if it emulates another cpuid.

Normal run is ok, it detects Q8200.

Thanks for the follow-up. It had me worried when your CPUID turned out to be correct.

joblo

Quote from: th3.r00t on April 24, 2016, 03:50:03 AM

Quote from: joblo on April 23, 2016, 04:06:49 PM

Edit2: I just realized there is a typo, it should be "-march=bdver1". Give it a try, it might be
faster for some algos.

This might be the best I can come up with. Now that you both have it figured out for your
own situation, are the tips in README.md clear enough for other users? I've added another phrase
to the existing text, in italics.

Some users with AMD CPUs without AES_NI have reported problems compiling
with build.sh or "-march=native". Problems have included compile errors
and poor performance. These users are recommended to compile manually
specifying "-march=bdver1" (was "btver1") on the configure command line. If all else fails
"-march=core2" will provide the best compatibility but the lowest performance.

As you can see, there is btver1 and bdver1. They are NOT the same and they refer to different CPUs.

bdver1 enables AES, AVX and other newer CPU instructions.

Thanks, I also found this tidbit:

AMD Opteron™ and AMD FX series processors with “Bulldozer” processor core (options: -march=bdver1 and -mtune=bdver1) and
AMD processors with “Bobcat” core (options: -march=btver1 and -mtune=btver1).

http://developer.amd.com/community/blog/2012/04/23/gcc-4-7-is-available-with-support-for-amd-opteron-6200-series-and-amd-fx-series-processors/

th3.r00t

Quote from: joblo on April 23, 2016, 04:06:49 PM

Edit2: I just realized there is a typo, it should be "-march=bdver1". Give it a try, it might be
faster for some algos.

This might be the best I can come up with. Now that you both have it figured out for your
own situation, are the tips in README.md clear enough for other users? I've added another phrase
to the existing text, in italics.

Some users with AMD CPUs without AES_NI have reported problems compiling
with build.sh or "-march=native". Problems have included compile errors
and poor performance. These users are recommended to compile manually
specifying "-march=bdver1" (was "btver1") on the configure command line. If all else fails
"-march=core2" will provide the best compatibility but the lowest performance.

As you can see, there is btver1 and bdver1. They are NOT the same and they refer to different CPUs.

bdver1 enables AES, AVX and other newer CPU instructions.

Giulini

tried "bdver1", no luck

kim@spiel-2:~/cpuminer-opt-3.1.18$ gcc -march=native -Q --help=target
The following options are target specific:
-m128bit-long-double       [disabled]
-m32       [disabled]
-m3dnow        [disabled]
-m3dnowa       [disabled]
-m64       [enabled]
-m80387        [enabled]
-m8bit-idiv        [disabled]
-m96bit-long-double        [enabled]
-mabi=       sysv
-mabm        [enabled]
-maccumulate-outgoing-args       [disabled]
-maddress-mode=        short
-madx        [disabled]
-maes        [disabled]
-malign-double       [disabled]
-malign-functions=       0
-malign-jumps=       0
-malign-loops=       0
-malign-stringops        [enabled]
-mandroid        [disabled]
-march=        amdfam10
-masm=       att
-mavx        [disabled]
-mavx2       [disabled]
-mavx256-split-unaligned-load    [disabled]
-mavx256-split-unaligned-store    [disabled]
-mbionic       [disabled]
-mbmi        [disabled]
-mbmi2       [disabled]
-mbranch-cost=       0
-mcld        [disabled]
-mcmodel=        32
-mcpu=
-mcrc32        [disabled]
-mcx16       [enabled]
-mdispatch-scheduler       [disabled]
-mf16c       [disabled]
-mfancy-math-387       [enabled]
-mfentry       [enabled]
-mfma        [disabled]
-mfma4       [disabled]
-mforce-drap       [disabled]
-mfp-ret-in-387        [enabled]
-mfpmath=        387
-mfsgsbase       [disabled]
-mfused-madd
-mfxsr       [enabled]
-mglibc        [enabled]
-mhard-float       [enabled]
-mhle        [disabled]
-mieee-fp        [enabled]
-mincoming-stack-boundary=       0
-minline-all-stringops       [disabled]
-minline-stringops-dynamically    [disabled]
-mintel-syntax
-mlarge-data-threshold=        0x10000
-mlong-double-64       [disabled]
-mlong-double-80       [enabled]
-mlwp        [disabled]
-mlzcnt        [enabled]
-mmmx        [disabled]
-mmovbe        [disabled]
-mms-bitfields       [disabled]
-mno-align-stringops       [disabled]
-mno-fancy-math-387        [disabled]
-mno-push-args       [disabled]
-mno-red-zone        [disabled]
-mno-sse4        [enabled]
-momit-leaf-frame-pointer        [disabled]
-mpc32       [disabled]
-mpc64       [disabled]
-mpc80       [disabled]
-mpclmul       [disabled]
-mpopcnt       [enabled]
-mprefer-avx128        [disabled]
-mpreferred-stack-boundary=       0
-mprfchw       [enabled]
-mpush-args        [enabled]
-mrdrnd        [disabled]
-mrdseed       [disabled]
-mrecip        [disabled]
-mrecip=
-mred-zone       [enabled]
-mregparm=       0
-mrtd        [disabled]
-mrtm        [disabled]
-msahf       [enabled]
-msoft-float       [disabled]
-msse        [disabled]
-msse2       [disabled]
-msse2avx        [disabled]
-msse3       [disabled]
-msse4       [disabled]
-msse4.1       [disabled]
-msse4.2       [disabled]
-msse4a        [disabled]
-msse5
-msseregparm       [disabled]
-mssse3        [disabled]
-mstack-arg-probe        [disabled]
-mstackrealign       [enabled]
-mstringop-strategy=       [default]
-mtbm        [disabled]
-mtls-dialect=       gnu
-mtls-direct-seg-refs        [enabled]
-mtune=        amdfam10
-muclibc       [disabled]
-mveclibabi=       [default]
-mvect8-ret-in-mem       [disabled]
-mvzeroupper       [disabled]
-mx32        [disabled]
-mxop        [disabled]
-mxsave        [disabled]
-mxsaveopt       [disabled]

Known assembler dialects (for use with the -masm-dialect= option):
att intel

Known ABIs (for use with the -mabi= option):
ms sysv

Known code models (for use with the -mcmodel= option):
32 kernel large medium small

Valid arguments to -mfpmath=:
387 387+sse 387,sse both sse sse+387 sse,387

Known vectorization library ABIs (for use with the -mveclibabi= option):
acml svml

Known address mode (for use with the -maddress-mode= option):
long short

Valid arguments to -mstringop-strategy=:
byte_loop libcall loop rep_4byte rep_8byte rep_byte unrolled_loop

Known TLS dialects (for use with the -mtls-dialect= option):
gnu gnu2
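
A dump this long is easier to read once it is filtered. The sketch below is one possible way to do that in Python, not something taken from cpuminer-opt: it parses the text printed by "gcc -march=native -Q --help=target" into a dictionary and lists what is enabled. The function names and the shortened SAMPLE are made up for illustration; the sample lines are copied from the output above.

```python
from typing import Dict, List


def parse_target_options(text: str) -> Dict[str, str]:
    """Map each -m option in `gcc -Q --help=target` output to its value."""
    options: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Skip blank lines, the header, and the "Known ..." footer sections.
        if not parts or not parts[0].startswith("-m"):
            continue
        # Options printed without a value (e.g. -msse5) map to "".
        options[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return options


def enabled_options(options: Dict[str, str]) -> List[str]:
    """Return the names of all options reported as [enabled], sorted."""
    return sorted(name for name, value in options.items() if value == "[enabled]")


# Shortened sample taken from the dump above.
SAMPLE = """\
The following options are target specific:
-m64       [enabled]
-maes        [disabled]
-march=        amdfam10
-mavx        [disabled]
-mpopcnt       [enabled]
-msse2       [disabled]
-msse5
-mtune=        amdfam10

Known ABIs (for use with the -mabi= option):
ms sysv
"""

if __name__ == "__main__":
    opts = parse_target_options(SAMPLE)
    print("march:", opts["-march="])
    print("aes:", opts["-maes"])
    print("enabled:", enabled_options(opts))
```

To run it against a real compiler, the text can be obtained with subprocess.run(["gcc", "-march=native", "-Q", "--help=target"], capture_output=True, text=True).stdout.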

AlexGR

Ok, found the problem... the profiler did it.

I ran the program through an indirect call: valgrind --tool=callgrind ./cpuminer -a x11 --benchmark

and then exported the profile data to KCachegrind to get the graph.

I don't know how running it indirectly can do that, except if it emulates another cpuid.

Normal run is ok, it detects Q8200.

AlexGR

Quote from: joblo on April 23, 2016, 10:13:34 PM

Let me get this straight. You compiled with -march=native on a core2 that thinks it's an i5-670.

Yep...

Quote

The compile succeeded and the miner ran ok. That's pretty special.

.16 was broken due to some errors (algogate? can't remember) which I removed manually from all the sources, but .18 runs ok.

Quote

The CPU model and AES support come directly from CPUID and have been reliable until now.
Even the AMD guys haven't reported CPUID problems.

Can you confirm CPUID is correct:

Code:

cat /proc/cpuinfo |grep model

cat /proc/cpuinfo |grep model
model : 23
model name : Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz
model : 23
model name : Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz
model : 23
model name : Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz
model : 23
model name : Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz

cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz
stepping : 7
microcode : 0x70a
cpu MHz : 1754.042
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm dtherm
bugs :
bogomips : 3508.08
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
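
For anyone who wants to check the same thing without reading the flags line by eye, here is a small Python sketch. It assumes the x86 Linux layout shown above, where features appear on a "flags" line and AES-NI is listed as "aes"; the function name cpu_flags and the shortened SAMPLE are illustrative only.

```python
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags of the first processor entry, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Shortened sample based on the Q8200 output above.
SAMPLE = (
    "model name : Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz\n"
    "flags : fpu vme sse sse2 ssse3 cx16 sse4_1 lahf_lm\n"
)

if __name__ == "__main__":
    flags = cpu_flags(SAMPLE)
    for feature in ("sse2", "sse4_1", "aes", "avx"):
        print(feature, "yes" if feature in flags else "no")
```

On a live system the input would be the contents of /proc/cpuinfo, e.g. open("/proc/cpuinfo").read().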

Quote

It would be interesting to see what the compiler thought:

Code:

gcc -march=native -Q --help=target | fgrep march

gcc -march=native -Q --help=target | fgrep march
=
-march= core2

Quote

Regarding echo512, yes you will get the slow version. Unfortunately I'm unaware of an SSE2 optimized
version and the AES version is already used by cpuminer-opt on capable CPUs.

In case you need to compare performance or find sources:

https://bench.cr.yp.to/primitives-sha3.html

Sources for every possible variant are here: https://github.com/floodyberry/supercop/tree/master/crypto_hash

Quote

I have yet to study your data in any detail but the following may put performance into perspective.

Echo512 and groestl have AES optimizations for most algos.

Cryptonight and hodl have their own unique AES optimizations.

The rest of the x11 chain, including groestl but excluding echo, have SSE2 optimized versions.
The algos in the longer X chains, as well as non-aes echo, are filled with slow SPH versions.

Yeah, lacking AES (and AVX) hurts a lot.

joblo

Images snipped.

Quote from: AlexGR on April 23, 2016, 09:13:41 PM

3.1.18 is kinda buggy in CPU detection. On a Q8200 it says:

Checking CPU capatibility...
   Intel(R) Core(TM) i5 CPU 670 @ 3.47GHz
   CPU arch supports AES_NI...YES.
   SW built for AES_NI........NO.
   Algo supports AES_NI.......YES.
CPU and algo support AES_NI, but SW build does not.
Rebuild with "-march=native" for better performance.
Starting mining without AES_NI optimizations...

...so naturally it thinks I have AES, even though the build is not for AES (-march=native / no AES in Intel quad core q8200).

Anyway I did a profile run in x11 to check the slowdowns:

A couple of them, with echo512 being the biggest culprit, dominate the process in terms of time wasted.

It seems there is a very optimized AES version for it: https://bench.cr.yp.to/impl-hash/echo512.html

https://bench.cr.yp.to/impl-hash/echo512.html
https://github.com/floodyberry/supercop/tree/master/crypto_hash/echo512/aes/aes64

Let me get this straight. You compiled with -march=native on a core2 that thinks it's an i5-670.
The compile succeeded and the miner ran ok. That's pretty special.

The CPU model and AES support come directly from CPUID and have been reliable until now.
Even the AMD guys haven't reported CPUID problems.
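
To show why a wrong CPUID answer produces the misleading hint in the quoted output, here is a rough Python sketch of the three-way check that the startup messages suggest. It is a guess at the logic from the log text alone, not cpuminer-opt's actual code; the function and parameter names are hypothetical.

```python
from typing import Optional, Tuple

REBUILD_HINT = 'Rebuild with "-march=native" for better performance.'


def aes_decision(cpu_has_aes: bool, build_has_aes: bool,
                 algo_has_aes: bool) -> Tuple[bool, Optional[str]]:
    """Return (use AES_NI code?, optional advice for the user)."""
    # AES_NI code can only run when the CPU, the build and the algo all have it.
    use_aes = cpu_has_aes and build_has_aes and algo_has_aes
    advice = None
    # The rebuild hint only makes sense when the build is the sole missing piece.
    if cpu_has_aes and algo_has_aes and not build_has_aes:
        advice = REBUILD_HINT
    return use_aes, advice


if __name__ == "__main__":
    # The reported case: CPU wrongly detected as AES capable, build without AES.
    print(aes_decision(True, False, True))
    # What a correctly detected Q8200 would give.
    print(aes_decision(False, False, True))
```

With the CPU misreported as AES capable, the first call yields the rebuild advice even though rebuilding cannot help; with correct detection no advice is given.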

Can you confirm CPUID is correct:

Code:

cat /proc/cpuinfo |grep model

It would be interesting to see what the compiler thought:

Code:

gcc -march=native -Q --help=target | fgrep march

Regarding echo512, yes you will get the slow version. Unfortunately I'm unaware of an SSE2 optimized
version and the AES version is already used by cpuminer-opt on capable CPUs.

I have yet to study your data in any detail but the following may put performance into perspective.

Echo512 and groestl have AES optimizations for most algos.

Cryptonight and hodl have their own unique AES optimizations.

The rest of the x11 chain, including groestl but excluding echo, have SSE2 optimized versions.
The algos in the longer X chains, as well as non-aes echo, are filled with slow SPH versions.
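
The tiers described above can be summarized in a short Python sketch. Only echo and groestl are listed because those are the two the text is specific about. The table, the names, and the assumption that a plain SPH version exists as a fallback for everything are illustrative, not taken from the cpuminer-opt sources.

```python
from typing import Dict, Set

# Optimized versions per algo, as described in the text above (illustrative).
OPTIMIZED: Dict[str, Set[str]] = {
    "echo": {"aes"},             # AES version only; otherwise slow SPH
    "groestl": {"aes", "sse2"},  # AES version, plus an SSE2 version
}


def pick_impl(algo: str, cpu_features: Set[str]) -> str:
    """Pick the fastest tier that both the algo and the CPU support."""
    for tier in ("aes", "sse2"):  # fastest first
        if tier in OPTIMIZED.get(algo, set()) and tier in cpu_features:
            return tier
    # Assumption: a plain SPH version is always available as a fallback.
    return "sph"


if __name__ == "__main__":
    q8200 = {"sse2"}  # no AES, as on the Core 2 Quad discussed here
    print("echo:", pick_impl("echo", q8200))
    print("groestl:", pick_impl("groestl", q8200))
```

On a CPU without AES this returns the slow SPH tier for echo, which matches echo512 dominating the x11 profile.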

AlexGR

3.1.18 is kinda buggy in CPU detection. On a Q8200 it says:

Checking CPU capatibility...
   Intel(R) Core(TM) i5 CPU 670 @ 3.47GHz
   CPU arch supports AES_NI...YES.
   SW built for AES_NI........NO.
   Algo supports AES_NI.......YES.
CPU and algo support AES_NI, but SW build does not.
Rebuild with "-march=native" for better performance.
Starting mining without AES_NI optimizations...

...so naturally it thinks I have AES, even though the build is not for AES (-march=native / no AES in Intel quad core q8200).

Anyway I did a profile run in x11 to check the slowdowns:

A couple of them, with echo512 being the biggest culprit, dominate the process in terms of time wasted.

It seems there is a very optimized AES version for it: https://bench.cr.yp.to/impl-hash/echo512.html

https://bench.cr.yp.to/impl-hash/echo512.html
https://github.com/floodyberry/supercop/tree/master/crypto_hash/echo512/aes/aes64

joblo

Quote from: th3.r00t on April 23, 2016, 05:12:16 PM

cpuminer-opt v3.1.18 is one of the best in recent version history.
Talking both Intel and AMD wise.

Thank you for your hard work!

Can't wait to see what branch 3.2 will offer. ;)

The algo-gate work won't be user visible but will make it easier to add new algos.
Thanks for your support and invaluable help in getting to the bottom of the compile issue.

th3.r00t

cpuminer-opt v3.1.18 is one of the best in recent version history.
Talking both Intel and AMD wise.

Thank you for your hard work!

Can't wait to see what branch 3.2 will offer. ;)

joblo

With things hopefully settling down after a period of rapid development
and many algo additions, algo-gate has evolved significantly and developed
some mutations. Some functions developed multiple personalities or had names
that made no sense, which deviated from the highly structured goals of the gate system.
Nevertheless it proved its worth when merging new algos by isolating all the algo-specific
code. It also showed promise in gating other functions such as jsonrpc2.

I'll start working on v3.2 and do some genetic engineering on algo-gate so it doesn't
turn into a monster. I will also convert the jsonrpc2 functions to use the gate, and possibly
others.

In the meantime only bug fixes and high priority features will be added to the 3.1
stream.

joblo

Quote from: Giulini on April 23, 2016, 03:17:29 PM

with -march=native o.k., with "btver1" negative

Checking CPU capatibility...
   AMD Sempron(tm) 145 Processor
   CPU arch supports AES_NI...NO.
   CPU arch supports SSE2.....YES.
   SW built for SSE2..........YES.
Starting mining without AES_NI optimizations...

[2016-04-23 22:06:15] Starting Stratum on stratum+tcp://hodl.suprnova.cc:4693
[2016-04-23 22:06:15] 1 miner threads started, using 'hodl' algorithm.
[2016-04-23 22:06:18] Stratum difficulty set to 1
[2016-04-23 22:07:07] hodl.suprnova.cc:4693 hodl block 43933
[2016-04-23 22:07:07] CPU #0: 373 H, 7.66 H/s
[2016-04-23 22:08:17] CPU #0: 529 H, 7.63 H/s
[2016-04-23 22:08:17] accepted: 1/1 (100%), 529 H, 7.63 H/s yes!

Edit @Giulini: what do you get from "gcc -march=native -Q --help=target"?

Edit2: I just realized there is a typo, it should be "-march=bdver1". Give it a try, it might be
faster for some algos.

This might be the best I can come up with. Now that you both have it figured out for your
own situation, are the tips in README.md clear enough for other users? I've added another phrase
to the existing text, in italics.

Some users with AMD CPUs without AES_NI have reported problems compiling
with build.sh or "-march=native". Problems have included compile errors
and poor performance. These users are recommended to compile manually
specifying "-march=bdver1" (was "btver1") on the configure command line. If all else fails
"-march=core2" will provide the best compatibility but the lowest performance.

Giulini

with -march=native o.k., with "btver1" negative

Checking CPU capatibility...
AMD Sempron(tm) 145 Processor
CPU arch supports AES_NI...NO.
CPU arch supports SSE2.....YES.
SW built for SSE2..........YES.
Starting mining without AES_NI optimizations...

[2016-04-23 22:06:15] Starting Stratum on stratum+tcp://hodl.suprnova.cc:4693
[2016-04-23 22:06:15] 1 miner threads started, using 'hodl' algorithm.
[2016-04-23 22:06:18] Stratum difficulty set to 1
[2016-04-23 22:07:07] hodl.suprnova.cc:4693 hodl block 43933
[2016-04-23 22:07:07] CPU #0: 373 H, 7.66 H/s
[2016-04-23 22:08:17] CPU #0: 529 H, 7.63 H/s
[2016-04-23 22:08:17] accepted: 1/1 (100%), 529 H, 7.63 H/s yes!
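
When comparing builds (for example -march=native against another -march value), it helps to pull the hash rates out of the log instead of eyeballing them. The following Python sketch does that for the line format shown above; the regular expression and function names are my own and only assume that per-thread lines look like "CPU #0: 373 H, 7.66 H/s".

```python
import re
from typing import List

# Matches per-thread report lines such as "CPU #0: 373 H, 7.66 H/s".
CPU_LINE = re.compile(r"CPU #(\d+): (\d+) H, ([\d.]+) H/s")


def hashrates(log_text: str) -> List[float]:
    """Return every per-thread hash rate (H/s) found in the log, in order."""
    return [float(m.group(3)) for m in CPU_LINE.finditer(log_text)]


def mean_hashrate(log_text: str) -> float:
    """Average of all reported hash rates, or 0.0 if none were found."""
    rates = hashrates(log_text)
    return sum(rates) / len(rates) if rates else 0.0


# Sample lines copied from the log above.
SAMPLE = """\
[2016-04-23 22:07:07] CPU #0: 373 H, 7.66 H/s
[2016-04-23 22:08:17] CPU #0: 529 H, 7.63 H/s
[2016-04-23 22:08:17] accepted: 1/1 (100%), 529 H, 7.63 H/s yes!
"""

if __name__ == "__main__":
    print(hashrates(SAMPLE))
    print(round(mean_hashrate(SAMPLE), 3))
```

The "accepted" line is deliberately not counted, since it repeats the rate already reported on the preceding CPU line.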
