[ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 62.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: nizzuu on October 12, 2017, 12:18:27 AM

Hi, the aes-avx2 windows binary build crashes on i5-7600 while using hsr algo (-t 4 -a hsr). No issues for aes-avx version.

Looking into it.

4ward

member

Activity: 473

Merit: 18

Quote from: Wh1teKn1ght on October 11, 2017, 11:28:47 PM

Isn't AES, AVX better than SSE2? My CPU shows support for AES and AVX, yet the algo being used is SSE2. I am running the cpuminer-aes-avx.exe so why is it using SSE2 algo?

Thanks

Because algo supports SSE2 only

nizzuu

full member

Activity: 187

Merit: 100

Cryptocurrency enthusiast

Hi, the aes-avx2 windows binary build crashes on i5-7600 while using hsr algo (-t 4 -a hsr). No issues for aes-avx version.

Wh1teKn1ght

sr. member

Activity: 339

Merit: 251

Isn't AES, AVX better than SSE2? My CPU shows support for AES and AVX, yet the algo being used is SSE2. I am running the cpuminer-aes-avx.exe so why is it using SSE2 algo?

Thanks

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: ljglug on October 10, 2017, 11:09:43 AM

Quote from: joblo on January 13, 2016, 02:35:45 PM

New in v3.6.9

Added phi1612 algo for LUX coin
Added x13sm3 algo, alias hsr, for Hshare coin

Hshare coins do not seem to be minable.
What is the updated algorithm for mining the coin?

Thanks for reposting the first post but I've already read it, maybe you should read it too.

ljglug

full member

Activity: 168

Merit: 100

Quote from: joblo on January 13, 2016, 02:35:45 PM

This is the home of cpuminer-opt, The optimized CPU miner.

cpuminer-opt now supports over 50 algorithms with more than 20 optimized to use
AES_NI, AVX, AVX2 and SHA on capable CPUs.

Source code:

git: https://github.com/JayDDee/cpuminer-opt

tarball: https://drive.google.com/file/d/0B0lVSGQYLJIZdy1fQ2o5T1otXzQ/view?usp=sharing

Windows binaries

https://drive.google.com/file/d/0B0lVSGQYLJIZVWtXa1RZcE0wWEE/view?usp=sharing

New in v3.6.9

Added phi1612 algo for LUX coin
Added x13sm3 algo, alias hsr, for Hshare coin

Legacy version 3.5.9.1 may provide better performance on some algos with older CPUs that
don't have AES_NI. Most users should not use it.

git clone https://github.com/JayDDee/cpuminer-opt -b legacy

Tarball: https://drive.google.com/file/d/0B0lVSGQYLJIZcDg0d0QzbzJBUDA/view?usp=sharing

Windows Binaries: https://drive.google.com/file/d/0B0lVSGQYLJIZT0tlY3o4ZjEycXM/view?usp=sharing

Security warning

Miner programs are often flagged as malware by antivirus programs. This is
a false positive; they are flagged simply because they are miners. The source
code is open for anyone to inspect. If you don't trust the software, don't use
it.

The cryptographic code has been taken from trusted sources but has been
modified for speed at the expense of accepted security practices. This
code should not be imported into applications where secure cryptography is
required.

Errata:

AMD CPUs older than Piledriver, including Athlon x2 and Phenom II x4, are not
supported by cpuminer-opt due to an incompatible implementation of SSE2 on
these CPUs. Some algos may crash the miner with an invalid instruction.
Users are recommended to use an unoptimized miner such as cpuminer-multi.

Solo mining of cryptonight does not work.

Enabling bench stats collection (-p stats) while mining timetravel causes the miner to exit after
50 share submissions.

cpuminer-opt does not work mining Decred algo at Nicehash and produces only
"invalid extranonce2 size" rejects.

Benchmark testing does not work for x11evo.

Requirements:

1. An x86_64 architecture CPU with a minimum of SSE2 support. This includes Intel
Core2 and newer and AMD equivalents. In order to take advantage of AES_NI
optimizations a CPU with AES_NI is required. This includes Intel Westmere
and newer and AMD equivalents. Further optimizations are available on some algos
for CPUs with AVX and AVX2, Sandybridge and Haswell respectively.

Older CPUs are supported by cpuminer-multi by TPruvot but at reduced performance.

2. 64 bit Linux OS. Ubuntu and Fedora based distributions, including Mint and CentOS, are known
to work and have all dependencies in their repositories. Others may work but may require
more effort.

64 bit Windows OS is supported using the pre-compiled binaries package or may be compiled
with mingw_w64 and msys.

Hshare coins do not seem to be minable.
What is the updated algorithm for mining the coin?

guytechie

hero member

Activity: 677

Merit: 500

Quote from: joblo on October 10, 2017, 09:09:38 AM

Quote from: guytechie on October 10, 2017, 08:39:04 AM

Hi, I haven't been on here for a while now. Just wondering if the Windows binary is now compiled with SHA support working. I have both a Ryzen 1700 and a TR 1950X I'd like to fully utilize when there's down time, which both have hardware SHA acceleration if I am correct.

Thanks!

Unfortunately I can't yet compile with SHA. One user claims to have successfully done it on Windows but didn't share the procedure
so I'm now skeptical.

Thanks for the update. It's a pity, as I hear m7m does much better with SHA optimizations.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: guytechie on October 10, 2017, 08:39:04 AM

Hi, I haven't been on here for a while now. Just wondering if the Windows binary is now compiled with SHA support working. I have both a Ryzen 1700 and a TR 1950X I'd like to fully utilize when there's down time, which both have hardware SHA acceleration if I am correct.

Thanks!

Unfortunately I can't yet compile with SHA. One user claims to have successfully done it on Windows but didn't share the procedure
so I'm now skeptical.

guytechie

hero member

Activity: 677

Merit: 500

Hi, I haven't been on here for a while now. Just wondering if the Windows binary is now compiled with SHA support working. I have both a Ryzen 1700 and a TR 1950X I'd like to fully utilize when there's down time, which both have hardware SHA acceleration if I am correct.

Thanks!

UspesenRudar

newbie

Activity: 33

Merit: 0

-t sets the number of miner threads
--cpu-priority set process priority (default: 0 idle, 2 normal to 5 highest)
--cpu-affinity set process affinity to cpu core(s), mask 0x3 for cores 0 and 1

With these parameters you can control how hard the CPU works.
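To show where a mask like 0x3 comes from, here is a minimal C sketch. The helper name affinity_mask is made up for this example and is not part of cpuminer-opt; it only assumes that bit N of the mask stands for core N, as the option description above suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from cpuminer-opt: build a --cpu-affinity
 * style bit mask from a list of core indexes (bit N = core N). */
static uint64_t affinity_mask(const unsigned *cores, size_t count)
{
    uint64_t mask = 0;
    for (size_t i = 0; i < count; i++) {
        assert(cores[i] < 64);            /* this sketch handles at most 64 cores */
        mask |= (uint64_t)1 << cores[i];  /* set the bit for this core */
    }
    return mask;
}
```

Cores 0 and 1 give 0x3, matching the example in the option description; cores 0, 2, 4 and 6 would give 0x55.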

sir4o

newbie

Activity: 67

Merit: 0

Hiya,
Is it possible to throttle with this miner, let's say to use cores at 50%?

joblo

legendary

Activity: 1470

Merit: 1114

cpuminer-opt v3.6.9 released.

Added phi1612 algo for LUX coin
Added x13sm3 algo, alias hsr, for Hshare coin

git: https://github.com/JayDDee/cpuminer-opt

source tarball: https://drive.google.com/file/d/0B0lVSGQYLJIZdy1fQ2o5T1otXzQ/view?usp=sharing

Windows binaries: https://drive.google.com/file/d/0B0lVSGQYLJIZVWtXa1RZcE0wWEE/view?usp=sharing

AlexGR

legendary

Activity: 1708

Merit: 1049

Quote from: joblo on October 05, 2017, 05:12:15 PM

Quote from: AlexGR on October 05, 2017, 04:10:54 PM

Quote from: joblo on September 28, 2017, 06:50:45 PM

Quote from: NameTaken on September 28, 2017, 06:18:31 PM

Anyone planning on testing the 7980XE?

The specs don't impress me and it's overpriced. It has a low base clock and a relatively small cache, both
critical for CPU mining. Intel is known for better single threaded performance but that doesn't matter when
mining.

The 24 MB cache limits the number of threads mining cryptonight to 12, not even enough to load all the physical cores.
A Threadripper 1920X (12C/24T, 32MB cache) will likely perform better for less than half the price.
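The cache arithmetic above can be sketched in C. This is only an illustration: the 2 MB per thread figure is inferred from the "24 MB limits threads to 12" statement, and the helper name cryptonight_threads is hypothetical, not something from cpuminer-opt.

```c
#include <assert.h>

/* Assumption: each cryptonight thread needs about 2 MB of cache,
 * inferred from "24 MB cache limits the number of threads to 12". */
#define SCRATCHPAD_MB 2u

/* Hypothetical helper: how many threads fit in the cache, capped by the
 * number of hardware threads the CPU offers. */
static unsigned cryptonight_threads(unsigned cache_mb, unsigned hw_threads)
{
    assert(hw_threads > 0);
    unsigned fit = cache_mb / SCRATCHPAD_MB;   /* threads the cache can hold */
    return fit < hw_threads ? fit : hw_threads;
}
```

With 24 MB of cache this gives 12 threads no matter how many cores are available, while the 1920X figures quoted above (32 MB, 24 hardware threads) give 16.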

Compute intensive algos are irrelevant because GPUs are much more efficient and CPUs can't compete.

The 7980XE doesn't yet have SHA support, unlike Ryzen, but that's less of an issue because there are few algos
that can use it.

It does have AVX512 but I don't see much benefit in that because it only improves compute performance. On those
algos that could potentially use it the gain would be small, less than the gain from AVX to AVX2. There are fewer opportunities
to promote AVX2 to AVX512 because AVX512 works on larger vectors; only algos that use vectors of 512 bits or greater
can use it.

I'm curious for some real results to compare, but I won't be buying one.

Keep in mind that it extends the registers to 32 (xmm16-xmm31 / ymm16-ymm31 / zmm16-zmm31). If register pressure is an issue, it can help. It also offers masking with K registers which might be useful in some cases.

One problem though is that AVX512 gets underclocked... on a Xeon system that normally ran at around 2 - 2.1 GHz, with typical code execution boosted to 2.6 GHz, AVX512 was running at ~1.8 GHz.

Google cloud has some servers with avx512 which you can play on without buying avx512 CPUs, but they kind of suck at benchmarking due to being VMs with unstable performance (resource sharing).

Thanks for sharing your thoughts.

I have not seen any register issues with the existing vectored code.

Having more registers at your disposal is always nice - it allows for new possibilities in how you write the code - especially if there are a lot of variables or tables.

Quote

The x, y & z regs are also overlaid in the 7980XE but only the lower 256 or 128
bits can be accessed by ymm or xmm respectively. This creates a lot of overhead when an app needs to revert to smaller vectors for some operations.

I think the problem is from xmm->ymm due to having 1x128 or 2x128 lanes which create a dependency issue, requiring the zeroing of the upper ymm part to use the xmm without a perf penalty. There's a lot of avx code that sucks without vzeroupper for this reason. IIRC ymm->zmm don't have the same issue, even if they overlap, but I may be wrong on this.

But having +16 more registers is good for such scenarios also... in case you want to reuse a register which was previously overlapped, you just use a new one thus avoiding false dependencies altogether (assuming at least a vzeroupper or a vzeroall at the start of the function).

Quote

AVX & AVX2 are also underclocked, AVX512 is underclocked more.

Something like that...

Quote

The K registers seem interesting. I don't fully understand them but they appear to be able to reduce the number of instructions when shuffling vector elements.

There's a lot of things they can be used for. Essentially they perform partial operations on the full width of a register, but this can be pretty useful. You can avoid doing some stuff twice and blending the two different stuff, (as you can do it in one go), you can read memory up to X bytes by using the appropriate mask, etc.

If you are using, say, a 512bit vector on 64bit elements, and want to perform something on 384 bits (6x64) you just put a 0b00111111 on the k register and then use the k register along with the instruction. Or you can do stuff like 0b01010101, thus working on the first, third, fifth and seventh elements and leaving the rest unchanged (or have them overwritten with zeroes in one go, depending on the z flag setting, which is also new).
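For anyone without AVX512 hardware, the masking behaviour described above can be modelled in plain scalar C. This is a rough sketch of the semantics only, not real vector code; the function name masked_add64 and the choice of an add as the operation are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8  /* a 512-bit vector holds 8 x 64-bit elements */

/* Scalar model (no AVX512 needed) of a masked add: dst = a + b per lane.
 * Bit i of k selects lane i. Unselected lanes keep the old dst value
 * (merge masking) or are cleared when zero_flag is set (zero masking). */
static void masked_add64(uint64_t dst[LANES], const uint64_t a[LANES],
                         const uint64_t b[LANES], uint8_t k, int zero_flag)
{
    assert(zero_flag == 0 || zero_flag == 1);
    for (int i = 0; i < LANES; i++) {
        if (k & (1u << i))
            dst[i] = a[i] + b[i];   /* lane selected by the mask */
        else if (zero_flag)
            dst[i] = 0;             /* z flag: clear unselected lanes */
        /* otherwise the lane is left unchanged */
    }
}
```

A mask of 0x3F (0b00111111) touches the low six lanes, and 0x55 (0b01010101) touches every other lane starting with the first.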

Now that I'm thinking about this, and this is relevant to what you said earlier on xmm/ymm/zmm overlap and performance issues, one can use just one type of register (like zmm or ymm) for all types of operations, whether small or large, assuming they also use the appropriate k register to set the width they want. In this way false dependencies should be nullified even between xmm/ymm. You want to do 128bit op? You use a ymm register with a 0b00001111 (32bit elements) or 0b0011 (64bit elements) k-mask and the ymm is addressed as ymm on the 128bit lower part. Opcode will probably be somewhat larger though.

K-regs are not too hard in their use, but I dislike the fact that they can't get fed with immediate values like general purpose registers and that I have to go immediate=>gpr=>k register or load values from memory.

The only thing that took me a while to find out (I thought I was hitting a gcc bug) is how to properly write the instruction with the proper syntax, for gcc assembly-within-c...

For example if I want to move 320 bits from memory to a zmm register it goes like this:

"mov $0b1111111111, %%eax\n" (10 x 1 bit = 10 x 32 bit elements)
"kmovd %%eax, %%k1\n"
"vmovdqu32 0(%0), %%zmm0 %{%%k1%}%{z%}\n"

The z flag is there to zero out the rest of the bits (if there was anything on zmm0).
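The mask values used in these examples all follow the same rule: one mask bit per element, lowest elements first. A small C sketch of that calculation follows; the helper name kmask_for_bits is hypothetical and only illustrates the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask selecting the lowest total_bits of a vector
 * made of elem_bits-wide elements, one mask bit per element. */
static uint64_t kmask_for_bits(unsigned total_bits, unsigned elem_bits)
{
    assert(elem_bits != 0 && total_bits % elem_bits == 0);
    unsigned n = total_bits / elem_bits;   /* number of elements to select */
    assert(n <= 64);
    /* shifting a 64-bit value by 64 is undefined in C, so handle it apart */
    return n == 64 ? UINT64_MAX : (((uint64_t)1 << n) - 1);
}
```

For the 320 bit load above with 32bit elements this gives 0x3FF (ten 1 bits). A 128bit operation gives 0xF with 32bit elements and 0x3 with 64bit elements, and 384 bits of 64bit elements gives 0x3F.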

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: AlexGR on October 05, 2017, 04:10:54 PM

Quote from: joblo on September 28, 2017, 06:50:45 PM

Quote from: NameTaken on September 28, 2017, 06:18:31 PM

Anyone planning on testing the 7980XE?

The specs don't impress me and it's overpriced. It has a low base clock and a relatively small cache, both
critical for CPU mining. Intel is known for better single threaded performance but that doesn't matter when
mining.

The 24 MB cache limits the number of threads mining cryptonight to 12, not even enough to load all the physical cores.
A Threadripper 1920X (12C/24T, 32MB cache) will likely perform better for less than half the price.

Compute intensive algos are irrelevant because GPUs are much more efficient and CPUs can't compete.

The 7980XE doesn't yet have SHA support, unlike Ryzen, but that's less of an issue because there are few algos
that can use it.

It does have AVX512 but I don't see much benefit in that because it only improves compute performance. On those
algos that could potentially use it the gain would be small, less than the gain from AVX to AVX2. There are fewer opportunities
to promote AVX2 to AVX512 because AVX512 works on larger vectors; only algos that use vectors of 512 bits or greater
can use it.

I'm curious for some real results to compare, but I won't be buying one.

Keep in mind that it extends the registers to 32 (xmm16-xmm31 / ymm16-ymm31 / zmm16-zmm31). If register pressure is an issue, it can help. It also offers masking with K registers which might be useful in some cases.

One problem though is that AVX512 gets underclocked... on a Xeon system that normally ran at around 2 - 2.1 GHz, with typical code execution boosted to 2.6 GHz, AVX512 was running at ~1.8 GHz.

Google cloud has some servers with avx512 which you can play on without buying avx512 CPUs, but they kind of suck at benchmarking due to being VMs with unstable performance (resource sharing).

Thanks for sharing your thoughts.

I have not seen any register issues with the existing vectored code. The x, y & z regs are also overlaid in the 7980XE but only the lower 256 or 128
bits can be accessed by ymm or xmm respectively. This creates a lot of overhead when an app needs to revert to smaller vectors for some operations.
Like I said there are fewer opportunities with larger vector operations.

AVX & AVX2 are also underclocked, AVX512 is underclocked more.

The K registers seem interesting. I don't fully understand them but they appear to be able to reduce the number of instructions when shuffling vector elements.

AlexGR

legendary

Activity: 1708

Merit: 1049

Quote from: joblo on September 28, 2017, 06:50:45 PM

Quote from: NameTaken on September 28, 2017, 06:18:31 PM

Anyone planning on testing the 7980XE?

The specs don't impress me and it's overpriced. It has a low base clock and a relatively small cache, both
critical for CPU mining. Intel is known for better single threaded performance but that doesn't matter when
mining.

The 24 MB cache limits the number of threads mining cryptonight to 12, not even enough to load all the physical cores.
A Threadripper 1920X (12C/24T, 32MB cache) will likely perform better for less than half the price.

Compute intensive algos are irrelevant because GPUs are much more efficient and CPUs can't compete.

The 7980XE doesn't yet have SHA support, unlike Ryzen, but that's less of an issue because there are few algos
that can use it.

It does have AVX512 but I don't see much benefit in that because it only improves compute performance. On those
algos that could potentially use it the gain would be small, less than the gain from AVX to AVX2. There are fewer opportunities
to promote AVX2 to AVX512 because AVX512 works on larger vectors; only algos that use vectors of 512 bits or greater
can use it.

I'm curious for some real results to compare, but I won't be buying one.

Keep in mind that it extends the registers to 32 (xmm16-xmm31 / ymm16-ymm31 / zmm16-zmm31). If register pressure is an issue, it can help. It also offers masking with K registers which might be useful in some cases.

One problem though is that AVX512 gets underclocked... on a Xeon system that normally ran at around 2 - 2.1 GHz, with typical code execution boosted to 2.6 GHz, AVX512 was running at ~1.8 GHz.

Google cloud has some servers with avx512 which you can play on without buying avx512 CPUs, but they kind of suck at benchmarking due to being VMs with unstable performance (resource sharing).

Digital Mutant

full member

Activity: 420

Merit: 105

Quote from: joblo on October 03, 2017, 09:40:27 PM

Quote from: Digital Mutant on October 03, 2017, 05:12:58 PM

Quote from: joblo on January 13, 2016, 02:35:45 PM

This is the home of cpuminer-opt, The optimized CPU miner.

v3.6.8

Legacy version 3.5.9.1

Can someone tell me which of the 2 versions is best for my CPU,
an AMD A10-7300 quad-core processor (turbo core 3.20GHz)?
Thank you

You conveniently left out the part that explains the legacy version.

Anyway it's virtually useless because the affected algorithms mostly have ASIC miners now or very
efficient GPU miners.

The only people who should consider using the legacy version are those who know what they're doing.
If you have to ask, don't use it.

Ok thank you for the information!

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: Digital Mutant on October 03, 2017, 05:12:58 PM

Quote from: joblo on January 13, 2016, 02:35:45 PM

This is the home of cpuminer-opt, The optimized CPU miner.

v3.6.8

Legacy version 3.5.9.1

Can someone tell me which of the 2 versions is best for my CPU,
an AMD A10-7300 quad-core processor (turbo core 3.20GHz)?
Thank you

You conveniently left out the part that explains the legacy version.

Anyway it's virtually useless because the affected algorithms mostly have ASIC miners now or very
efficient GPU miners.

The only people who should consider using the legacy version are those who know what they're doing.
If you have to ask, don't use it.

Digital Mutant

full member

Activity: 420

Merit: 105

Quote from: joblo on January 13, 2016, 02:35:45 PM

This is the home of cpuminer-opt, The optimized CPU miner.

v3.6.8

Legacy version 3.5.9.1

Can someone tell me which of the 2 versions is best for my CPU,
an AMD A10-7300 quad-core processor (turbo core 3.20GHz)?
Thank you

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: guytechie on September 28, 2017, 10:29:55 PM

Is it just me or is Windows Defender detecting cpuminer-opt as a Trojan:Win32/Vagger!rfn malware?

You can't read 3 posts back?

guytechie

hero member

Activity: 677

Merit: 500

Is it just me or is Windows Defender detecting cpuminer-opt as a Trojan:Win32/Vagger!rfn malware?
