Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner (Read 444067 times)

legendary
Activity: 1470
Merit: 1114
All cpuminer-opt discussion should now take place in the thread

https://bitcointalksearch.org/topic/m.53865575

This thread will be locked.
full member
Activity: 1424
Merit: 225
What an asshole. This multiple personality thing could be fun. Smiley

I'm going to begin posting under this user to increase my post count and remove the newbie restrictions.
One of those restrictions seems to be I can't have a sig.

I will also add a note to README.md at my github repository to confirm my identity. It will be in the next release.
legendary
Activity: 1470
Merit: 1114
Look who just joined BCT.

Yes, it's really me.
full member
Activity: 1424
Merit: 225
Guess who?

With some scammers trying to peddle malware disguised as cpuminer-opt I decided to take some action
to protect my identity and created a new BCT user more closely associated with my real identity.

This user is still a newbie and should be treated as one. Don't trust me, yet.

Over time I will move the cpuminer-opt discussion to a new thread started by me.
legendary
Activity: 1470
Merit: 1114
FYI:

scryptn2 == scrypt:1048576
legendary
Activity: 1470
Merit: 1114
@joblo

skein, skein2, and skunk fail on 3.11.8+ for avx2 and avx sha, but work fine for avx512.

I just released cpuminer-opt-3.12.2 which fixes xevan skein & skein2 but I overlooked skunk. It will be
in the next release. It also uses skein as the first function in the chain so it's the same issue.
legendary
Activity: 1470
Merit: 1114
@joblo

skein, skein2, and skunk fail on 3.11.8+ for avx2 and avx sha, but work fine for avx512. I've tested up to the most recent release, version 3.12.1.
Only tested avx512 on my i7-7820X.
Tested avx2 on 7820X, 9900K, and 8700K.
Tested avx sha on 3600 and 3900X.

Looks very similar to https://github.com/JayDDee/cpuminer-opt/issues/238, which I'm working on as I write.
I'll make sure to test skein & skein2 as well. I've been doing a lot of testing lately. There was more damage from
the restructuring in the past couple of months than I realized.

I don't test avx-sha because you miss out on 4 way parallel hashing on many algos, including skein.
Going from 4 way AVX2 to 1 way even slows down Ryzen.

Most Lyra2 based algos are ok with AVX because they can still do 4 way (and 16 way AVX512),
but most other algos require AVX2.

You should try both AVX & AVX2 to find the best for each algo.

As far as SHA goes there is a crossover between HW sha256 and parallel SW sha256. With the current situation
HW SHA is used up to and including AVX2, and parallel SW SHA with AVX512. When AMD improves AVX2 that may change.

Edit: Yup. Same problem, same fix.
sr. member
Activity: 703
Merit: 272
@joblo

skein, skein2, and skunk fail on 3.11.8+ for avx2 and avx sha, but work fine for avx512. I've tested up to the most recent release, version 3.12.1.
Only tested avx512 on my i7-7820X.
Tested avx2 on 7820X, 9900K, and 8700K.
Tested avx sha on 3600 and 3900X.
legendary
Activity: 1470
Merit: 1114
Notice to API users

This is a heads up of my plan to disable the API by default in an upcoming release.

If you currently use the API with default configuration you should add
-b 127.0.0.1:4048
to the command line or configuration script.

If you already manually enable the API with -b or --api-listen no changes are required.

Another notice will be posted prior to the release that implements the changed default.

Update Feb2

Change will be in the next release. More details: https://github.com/JayDDee/cpuminer-opt/issues/234
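
For anyone scripting against the API, here is a minimal Python sketch of a client for the address above. It assumes the API accepts a plain-text command such as "summary" over TCP and answers with KEY=VALUE pairs separated by ";" and ended by "|", as other cpuminer forks do. Neither the command name nor the reply format is confirmed in this thread, and the function names are made up for illustration, so check the API notes for your release.

```python
import socket


def parse_reply(reply):
    """Split a 'KEY=VALUE;KEY=VALUE|' style reply into a dict (assumed format)."""
    fields = {}
    for item in reply.strip().rstrip("|").split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def query_api(command="summary", host="127.0.0.1", port=4048, timeout=5.0):
    """Send one command to the miner API and return the parsed reply.

    The host and port match the -b 127.0.0.1:4048 option; the command
    string is an assumption.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # peer closed the connection
                break
            chunks.append(data)
    return parse_reply(b"".join(chunks).decode("ascii", errors="replace"))
```

If the API is not enabled with -b or --api-listen, query_api() will simply fail with a connection error.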
legendary
Activity: 1470
Merit: 1114
Another note, this time about share counting.

The share counts are calculated mostly independent of one another,
particularly the submit count and the acknowledged results.

Under most circumstances the numbers should add up
(don't add blocks solved, they are also counted as accepted).
The miner can even handle multiple shares submitted before
receiving replies. It can stack up to 8 pending submits.

There is no direct linkage between the submitted shares and the
replies. They are simply matched up in the same order as the
shares were submitted and the replies received. A lost reply will
result in an unresolved discrepancy that can only be corrected by
restarting the miner.

If there is a brief pileup of pending shares the miner will try to
resync when the replies are received. Some share statistics may
be lost. In such cases the miner may recover but long term statistics
may be affected. It may be desirable to restart the miner.
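
To illustrate the matching described above, here is a rough Python sketch, not the miner's actual code. Replies are paired with submits purely by order, with at most 8 pending. The class name, the "lost" counter and the choice to drop the oldest entry when the queue is full are assumptions made for the example.

```python
from collections import deque

MAX_PENDING = 8  # the post says up to 8 submits can be pending


class ShareCounter:
    """Pairs replies with submitted shares in FIFO order (illustrative only)."""

    def __init__(self):
        self.pending = deque()
        self.submitted = 0
        self.accepted = 0
        self.rejected = 0
        self.lost = 0  # hypothetical counter for submits that never got matched

    def submit(self, share_id):
        if len(self.pending) == MAX_PENDING:
            # Assumed policy: forget the oldest pending submit; its stats are lost.
            self.pending.popleft()
            self.lost += 1
        self.pending.append(share_id)
        self.submitted += 1

    def reply(self, accepted):
        """Match a reply to the oldest pending submit; no id links the two."""
        if not self.pending:
            return None  # a reply arrived with nothing pending
        share_id = self.pending.popleft()
        if accepted:
            self.accepted += 1
        else:
            self.rejected += 1
        return share_id
```

With this scheme a reply that never arrives leaves one entry stuck at the head of the queue, so every later reply is credited to the wrong share. That is the kind of discrepancy only a restart clears.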
legendary
Activity: 1470
Merit: 1114

I added a new line to the block log reporting another TTF and hash rate. The hash rate is
a representation of the network difficulty converted to hashes and the TTF is just
counting the new blocks over time.

The usefulness of this new information is yet to be determined. It is
not displayed for a multipool where block numbers differ among different coins.

The other changes should be intuitive. They are intended to make the logs more compact
to reduce the chance of line wrap and to highlight the important fields in the logs.

I found an error in the network hashrate calculation. The actual hashrate is the displayed
rate divided by the block time. Will be fixed in next release.
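
The corrected calculation can be sketched in Python as below. The factor of 2^32 hashes per unit of difficulty is the usual Bitcoin-style convention and is an assumption here; some algos scale difficulty differently. The function name is made up for the example.

```python
HASHES_PER_DIFF1 = 2 ** 32  # assumed Bitcoin-style scaling


def network_hashrate(net_diff, block_time_s):
    """Estimate network hash rate in hashes per second.

    net_diff * 2^32 is the expected number of hashes to find one block;
    dividing by the block time turns it into a rate.
    """
    hashes_per_block = net_diff * HASHES_PER_DIFF1
    return hashes_per_block / block_time_s
```

For example, network_hashrate(1000.0, 60.0) estimates the rate for a coin at difficulty 1000 with a 60 second block time.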
legendary
Activity: 1470
Merit: 1114
A note about low difficulty shares.

I have recently noticed some instances of low difficulty shares being rejected.
It is intermittent and has been seen on 2 algos, each with a specific stratum difficulty.
If the stratum difficulty is changed, low difficulty shares are no longer submitted.

Higher share difficulty is good: if it is high enough in solo mode, a block is solved.
In pool mode only the number of shares with a difficulty higher than the target
matters; all valid shares are of equal value regardless of difficulty.

Low difficulty rejects can occur when the target is set incorrectly. This can be a pool or
miner configuration error. They are usually intermittent as some shares will naturally
be higher than the actual target.

Low difficulty shares do not represent lost performance. These shares would normally be
discarded instead of submitted.

The opposite problem can also occur if the targeting is not exact. Valid shares could be
discarded. This is lost performance. This is also a silent error; the only indication would be
a low hash rate reported at the pool.

I have been tweaking the hash test to ensure targeting is precise. This may be why some
low difficulty shares are being seen. It could be a problem calculating the target for certain
stratum difficulties.

Given the pros and cons I'm willing to tolerate some low difficulty shares if it guarantees
no good shares are being tossed.
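
To show what an exact target test looks like, here is a small Python sketch. It is not the miner's code. It assumes a Bitcoin-style difficulty-1 target (0xFFFF shifted left 208 bits); many algos use a different base target, and the function names are made up. Exact integer and fraction arithmetic is used so no rounding creeps into the target.

```python
from fractions import Fraction

DIFF1_TARGET = 0xFFFF << 208  # assumed Bitcoin-style difficulty-1 target


def target_from_difficulty(stratum_diff):
    """Convert a stratum difficulty to a 256-bit integer target, rounding down."""
    return int(DIFF1_TARGET / Fraction(stratum_diff))


def share_difficulty(hash_value):
    """Difficulty of a hash given as an integer (lower hash = higher difficulty)."""
    return DIFF1_TARGET / hash_value


def share_is_valid(hash_value, stratum_diff):
    """A share is worth submitting only if its hash does not exceed the target."""
    return hash_value <= target_from_difficulty(stratum_diff)
```

If the miner's target ends up higher than the pool's, some submitted shares come back as low difficulty rejects. If it ends up lower, good shares are silently thrown away.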

Stale shares are also explainable and unavoidable. Shares are stale when they are submitted
for a job that has expired. They are more likely to occur with higher network latency.
Finding a low latency connection is good.

Stale shares can be confirmed with the log messages. The stale share is always preceded by a new
job issued after the share was submitted. The job id of the stale share does not match the new job id.
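
The same job id check can be written as a few lines of Python, for example when scanning your own logs. This is only a sketch; the class and method names are hypothetical and the job ids are whatever strings your pool issues.

```python
class JobTracker:
    """Remembers the most recent job id and flags shares from older jobs."""

    def __init__(self):
        self.current_job_id = None

    def new_job(self, job_id):
        # Called whenever the pool issues a new job.
        self.current_job_id = job_id

    def is_stale(self, share_job_id):
        # A share is stale if it was found for a job that has been replaced.
        return share_job_id != self.current_job_id
```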

Invalid shares are always bad. If they are not user error, i.e. wrong algo or port, it's a software bug.

 
legendary
Activity: 1470
Merit: 1114
A couple of notes about v3.11.6.

CPU temperature on Linux is still a work in progress. I have 4 CPUs and each
has a different path to the CPU temperature. I don't know if the difference is
because of the CPU or the OS version.

I finally have all my CPUs' temperatures being reported correctly but I have no
idea of any others.

It would be appreciated if any Linux users could post whether they see the correct
temperature. Please include CPU and OS version.

Those ambitious enough may explore what works for their system by browsing the /sys
file system. Some sample file paths are listed in sysinfos.c. Some paths don't exist on some
systems or may report a bogus value. Note the values are multiplied by 1000.
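
For anyone who wants to probe their own system, here is a small Python sketch that tries a couple of candidate files and converts the reading. The two paths are common Linux locations used as examples only; they are not necessarily the ones listed in sysinfos.c and may not exist or may report a different sensor on your machine.

```python
from pathlib import Path

# Example paths only; the correct file differs between systems.
CANDIDATE_PATHS = [
    "/sys/class/hwmon/hwmon0/temp1_input",
    "/sys/class/thermal/thermal_zone0/temp",
]


def millidegrees_to_celsius(raw):
    """The files hold the temperature multiplied by 1000, as text."""
    return int(raw.strip()) / 1000.0


def read_cpu_temp(paths=CANDIDATE_PATHS):
    """Return the first readable temperature in degrees C, or None."""
    for path in paths:
        try:
            return millidegrees_to_celsius(Path(path).read_text())
        except (OSError, ValueError):
            continue  # missing file or unparsable content, try the next one
    return None
```

A result that is obviously wrong, such as 0 or a value that never changes under load, probably means the path points at a different sensor.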

I added a new line to the block log reporting another TTF and hash rate. The hash rate is
a representation of the network difficulty converted to hashes and the TTF is just
counting the new blocks over time.

The usefulness of this new information is yet to be determined. It is
not displayed for a multipool where block numbers differ among different coins.

The other changes should be intuitive. They are intended to make the logs more compact
to reduce the chance of line wrap and to highlight the important fields in the logs.
legendary
Activity: 1470
Merit: 1114
cpuminer-opt-3.11.1 is released.

https://github.com/JayDDee/cpuminer-opt/releases/tag/v3.11.1

Faster panama for x25x AVX2 & AVX512.

Fixed echo VAES for Xevan.

Removed support for scryptjane algo.

Reverted macro implementations of hash functions to SPH reference code
for SSE2 versions of algos.

Use an older release for scryptjane or SSE2 macro support.
legendary
Activity: 1470
Merit: 1114
Yet another asshole is posing as me and posting malware as cpuminer-opt. As always be careful.
legendary
Activity: 1470
Merit: 1114
Is m7m supported with AVX512 now?
I didn't see it and I haven't noticed any speed increase over the prior versions. I haven't tested v3.11.0 yet, working on it.

Unfortunately AVX512 only improves algos that have already been taken over by GPUs and ASICs,
and they are improving faster than CPUs can. That's because GPUs are real vector processors
while CPU SIMD just emulates vector processing with strict restrictions on data organization.
A GPU can run thousands of threads while the biggest CPUs with AVX512 can barely crack 100.

The secret is in the algorithm: those that can be vectorized can be vectorized better on a GPU.
The only way to speed up M7M is more CPU cores and faster clocks.

VAES has some potential as a few CPU algos can use it. But VAES will only help with linear
vectorizing (loop unrolling) rather than enabling parallel operation.
sr. member
Activity: 703
Merit: 272
cpuminer-opt-3.11.0 introduces full support for Intel's Icelake CPUs.

Icelake architecture includes AVX512, SHA, and VAES. AVX512 and SHA are already supported on
Intel Skylake-X and AMD Ryzen, respectively. VAES is new with Icelake and is an extension of
AES_NI and AVX512 that provides 4 way parallel AES encryption and decryption in a 512 bit vector.

Icelake is only available for mobile at this time, desktop availability is unknown.

VAES support is only available as source code and requires GCC 8.

See the OP for more details about v3.11.0

This release marks the end of the rapid development of the past several weeks. Things
will slow down considerably with mostly bug fixes and minor tweaks.

I am also planning a cleanup to remove some troublesome and useless code, namely the macros
for blake, bmw, etc used by algos like x11, as well as the scrypt-jane algo. The macros don't provide
any noticeable performance difference from the reference code and scrypt-jane hasn't been used
for several years. There are other dead algos but they don't cause problems so there is no need to
remove them. This will also reduce the bloat. If anyone has concerns with this plan, please speak up.

Is m7m supported with AVX512 now?
I didn't see it and I haven't noticed any speed increase over the prior versions. I haven't tested v3.11.0 yet, working on it.
legendary
Activity: 1470
Merit: 1114
cpuminer-opt-3.11.0 introduces full support for Intel's Icelake CPUs.

Icelake architecture includes AVX512, SHA, and VAES. AVX512 and SHA are already supported on
Intel Skylake-X and AMD Ryzen, respectively. VAES is new with Icelake and is an extension of
AES_NI and AVX512 that provides 4 way parallel AES encryption and decryption in a 512 bit vector.

Icelake is only available for mobile at this time, desktop availability is unknown.

VAES support is only available as source code and requires GCC 8.

See the OP for more details about v3.11.0

This release marks the end of the rapid development of the past several weeks. Things
will slow down considerably with mostly bug fixes and minor tweaks.

I am also planning a cleanup to remove some troublesome and useless code, namely the macros
for blake, bmw, etc used by algos like x11, as well as the scrypt-jane algo. The macros don't provide
any noticeable performance difference from the reference code and scrypt-jane hasn't been used
for several years. There are other dead algos but they don't cause problems so there is no need to
remove them. This will also reduce the bloat. If anyone has concerns with this plan, please speak up.
legendary
Activity: 1470
Merit: 1114
@joblo
Here are my results for AVX512 vs AVX2 on version 3.10.5. I'm running Windows 10 Pro x64 with 8 GB RAM.

Thanks. Those results are in line with mine. The 100% AVX512 algos are pretty close to double the
hash rate so that indicates no significant scaling issues with AVX512 unless memory accesses are
bottlenecked.

The long X chains are showing the effects of diminishing returns. Further optimization of previously
optimized code has less effect as it represents a diminishing proportion of the complete algo.
sr. member
Activity: 703
Merit: 272
Which version of AVX2 would you like to see? I think I have twenty of your previous versions benched, up to version 3.10.2, for avx2 on this cpu.

Just use the latest release compiled for avx2. That will provide the most direct comparison. If you have Windows
it's already compiled for you. With Linux just compile with "-march=skylake" instead of "-march=native".

You can confirm that the SW features only list AVX2 but the CPU still lists AVX512.

@joblo
Here are my results for AVX512 vs AVX2 on version 3.10.5. I'm running Windows 10 Pro x64 with 8 GB RAM.
