
Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 7. (Read 444096 times)

sr. member
Activity: 490
Merit: 256

A few points and questions:

What's your CPU and OS?

[...]

Do you use CPU groups? Which version of Windows? You said you affine the process separately and that
causes problems. That could be related to CPU groups if the process is in a different group from the miner
threads.


Tested on Intel(R) Xeon CPU E5-2660 v4 @ 2.00GHz CPUs, on Windows Server 2016.
I do not use CPU groups (not sure what those are: will look into that) and I tested with 20 CPUs. [EDIT: I checked and CPU groups are applicable to machines with more than 64 CPUs so I only have one group here]
I open a command prompt and prior to launching cpuminer-opt I set the command prompt's affinity to the one desired (set to 20 CPUs). The miner then runs with the desired processors already affined. v3.9.1.1 now complains that it can't affine to all CPUs (obviously, because I removed some from the parent process). I'm supposing that's just a benign warning and nothing will really change. You probably made the miner explicitly affine to all CPUs by default on startup, hence the warnings.


Yescrypt performance:
There were no changes to the yescrypt code. I added yespower and tinkered with using that code for
yescrypt without success so I left yescrypt as is. If you are aware of a better performing miner
please point me to it and I'll have a look.

Argh, I'm sorry. I meant yespowerR16 instead of yescryptR16 in my last item! And I was referring to bellflower2015's fork of your miner (you can easily find it on github).

Thanks!
legendary
Activity: 1470
Merit: 1114

* I'm consistently getting about 5% less hashrate for yescrypt than with older v3.8.8 for the exact same configuration. I didn't take more metrics but I think some other algos have slightly less performance as well.

If they are compiled with MinGW, the performance will be lower. Cross-compiling with GCC does a better job of optimizing.
If that's not the case, it might be something in the recent changes.

In both cases, I'm using the official Windows binaries.


A few points and questions:

What's your CPU and OS?

Summary: Changes were made to support Windows CPU groups.
                 There have been no changes to the yescrypt code.
                 There have been no recent changes to the Windows build process.

Edit: please use -D to display affinity debug info.

Long version:

CPU limit and affinity:
A change was made initially in 3.9.0, later tweaked to add CPU groups support to Windows.
This may be responsible for that issue. I can have a look at the code in light of the specific symptoms you saw.
Do you use CPU groups? Which version of Windows? You said you affine the process separately and that
causes problems. That could be related to CPU groups if the process is in a different group from the miner
threads.
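
For anyone unfamiliar with CPU groups: Windows places logical processors into groups of at most 64, and a plain affinity mask only addresses processors inside one group. A rough sketch of the arithmetic follows, assuming groups are filled contiguously in index order (real systems may split groups differently, e.g. along NUMA nodes). `GROUP_SIZE`, `locate_cpu` and `group_mask` are illustrative names, not cpuminer-opt code.

```python
GROUP_SIZE = 64  # maximum logical processors in one Windows processor group


def locate_cpu(cpu_index):
    """Map a flat logical CPU index to (group, bit position inside that group).

    Assumes groups are filled contiguously, which is a simplification.
    """
    if cpu_index < 0:
        raise ValueError("cpu_index must be non-negative")
    return divmod(cpu_index, GROUP_SIZE)


def group_mask(cpu_indices, group):
    """Build the per-group affinity mask covering the requested CPUs."""
    mask = 0
    for cpu in cpu_indices:
        cpu_group, bit = locate_cpu(cpu)
        if cpu_group == group:
            mask |= 1 << bit
    return mask


print(locate_cpu(19))                      # (0, 19): a 20-CPU box stays in group 0
print(locate_cpu(70))                      # (1, 6)
print(bin(group_mask([0, 1, 64, 65], 1)))  # 0b11
```

With 20 CPUs everything lands in group 0, which matches the single-group case reported in the reply.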

General performance degradation:
The binaries are still made the same way using mingw, specifically using the winbuild-cross.sh script.
The compiler was upgraded (evident in the startup messages showing the compiler version) prior
to 3.9.
However, I have been making some architectural changes that may have a small impact on performance,
though 5% seems a bit much. I'm making them due to issues in preparation for AVX512 where
up to 16 lanes can run in parallel in a single CPU thread. The overhead for interleaving and deinterleaving the data,
the increase in memory usage, etc, don't scale well.

Some of those changes affect the locally displayed hashrate, both in volume of thread hash reports and
their values. I have reduced the latency between detecting a solution and submitting it to the pool. As
a side effect there are fewer hash meter reports and the reported hashrate is actually from the previous
block. Another side effect is the reduced latency is not reflected in the hash rate reported by the miner.
I considered it an acceptable compromise as it's just optics. The accumulated share difficulty over time is
what the pool uses. In both the miner and the pool the hashrate is an artificial metric.
The changes result in less deinterleaving of final hash (check for solution before interleaving instead of
after), and submitting a share immediately when found while continuing the scan instead of aborting the scan
to submit the share and start a new scan. On the face of it this is obviously more efficient but I measured no
discernible difference in reported hash rate.
These changes are being migrated slowly and can be confirmed by a more detailed share submitted message
indicating which thread and lane found the solution.
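
The two scan strategies described above can be shown with a toy sketch. This is not the miner's code: `scan_abort_on_share`, `scan_continue`, `is_solution` and `submit` are made-up names, and the "hash check" is a stand-in predicate.

```python
def scan_abort_on_share(first_nonce, last_nonce, is_solution, submit):
    """Old behaviour: stop at the first solution so the caller can restart."""
    for nonce in range(first_nonce, last_nonce + 1):
        if is_solution(nonce):
            submit(nonce)
            return nonce + 1  # caller has to start a new scan from here
    return last_nonce + 1


def scan_continue(first_nonce, last_nonce, is_solution, submit):
    """New behaviour: submit immediately and keep scanning the same range."""
    for nonce in range(first_nonce, last_nonce + 1):
        if is_solution(nonce):
            submit(nonce)
    return last_nonce + 1


def toy_check(nonce):
    """Stand-in for 'hash meets target'; true for a few nonces only."""
    return nonce % 40 == 7


shares = []
next_nonce = scan_continue(0, 99, toy_check, shares.append)
print(shares, next_nonce)  # [7, 47, 87] 100
```

One call to `scan_continue` produces several submissions, which is why scan cycles run longer and hash meter reports become less frequent.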

Sorry for the ramble but there's a lot going on at the same time. I appreciate the testing and reports of any deviations
from previous versions, especially the unintended ones.

sr. member
Activity: 490
Merit: 256

* I'm consistently getting about 5% less hashrate for yescrypt than with older v3.8.8 for the exact same configuration. I didn't take more metrics but I think some other algos have slightly less performance as well.

If they are compiled with MinGW, the performance will be lower. Cross-compiling with GCC does a better job of optimizing.
If that's not the case, it might be something in the recent changes.

In both cases, I'm using the official Windows binaries.
member
Activity: 473
Merit: 18

* I'm consistently getting about 5% less hashrate for yescrypt than with older v3.8.8 for the exact same configuration. I didn't take more metrics but I think some other algos have slightly less performance as well.

If they are compiled with MinGW, the performance will be lower. Cross-compiling with GCC does a better job of optimizing.
If that's not the case, it might be something in the recent changes.
sr. member
Activity: 490
Merit: 256
Hi, joblo!
Nice to see you back!

I noticed a few things in this v3.9.1.1 release:

* --cpu-affinity truncates to a 32-bit value which means one can't use CPUs at or above 32 unless I don't specify affinity at all (which for most algos is worse). I think this has been addressed in the past (regression?)
* Miner now reports failure to affine to CPU x, y, z on startup if the startup process is not affined to them: I usually affine the command prompt process before launching the miner (used to be a fix to the previous problem and it's also more flexible to me). I hope this does not affect anything performance related.
* I'm consistently getting about 5% less hashrate for yescrypt than with older v3.8.8 for the exact same configuration. I didn't take more metrics but I think some other algos have slightly less performance as well.
* For yespowerR16, I get 2200H/s, quite a bit below the 3000H/s I get with bellflower2015's variant. I suppose this is because you just introduced this algo and still didn't have the chance to tweak it.
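
The symptom in the first point can be modelled in a few lines. This is only a sketch of what truncating an affinity mask to 32 bits does; it does not reproduce the miner's option parsing, and the helper names are illustrative.

```python
def truncate_to_32_bits(mask):
    """Keep only the low 32 bits, as a 32-bit integer type would."""
    return mask & 0xFFFFFFFF


def cpus_in_mask(mask):
    """List the CPU indices whose bit is set in the mask."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


requested = (1 << 3) | (1 << 35)  # ask for CPU 3 and CPU 35
effective = truncate_to_32_bits(requested)

print(cpus_in_mask(requested))  # [3, 35]
print(cpus_in_mask(effective))  # [3]
```

Any CPU at index 32 or above silently drops out of the mask, which is the behaviour described.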

Cheers!
legendary
Activity: 1470
Merit: 1114
cpuminer-opt-3.9.1.1 is released

Fixed lyra2 regression affecting non-AVX2.

Compiling on Windows using Cygwin now works.

Simply use "./build.sh" from a cygwin shell.

I have no list of likely packages that need installing on top of the base Cygwin
installation. You'll have to wing it for now.

It isn't portable therefore the Windows binaries package continues to use
the existing procedure.

As always please report any problems.
legendary
Activity: 1470
Merit: 1114
cpuminer-opt-3.9.1 is released

https://github.com/JayDDee/cpuminer-opt/releases

Fixed AVX2 version of anime algo.

Added sonoa algo.

Added "-DRYZEN_" compile option for Ryzen to override 4-way hashing when algo
contains sha256 and use SHA instead. This is due to a combination of
the introduction of HW SHA support combined with the poor performance
of AVX2 on Ryzen. The Windows binaries package replaces cpuminer-avx2-sha
with cpuminer-zen compiled with the override. Refer to the build instructions
for more information.

Ongoing restructuring to streamline the process, reduce latency,
reduce memory usage and unnecessary copying of data. Most of these
will not result in a noticeably higher reported hashrate as the
change simply reduces the time wasted that wasn't factored into the
hash rate reported by the miner. In short, less dead time resulting in
a higher net hashrate.

One of these measures to reduce latency also results in an enhanced
share submission message including the share number*, the CPU thread,
and the vector lane that found the solution. The time difference between
the share submission and acceptance (or rejection) response indicates
network latency. One other effect of this change is a reduction in hash
meter messages because the scan function no longer exits when a share is
found. Scan cycles will go longer and submit multiple shares per cycle.
*the share number is anticipated and includes both accepted and rejected
shares. Because the share number is anticipated and not synchronized it may be
incorrect in times of very rapid share submission. Under most conditions
it should be easy to match the submission with the corresponding response.
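
As a rough illustration of matching a submission with its response, the sketch below parses log lines in the format posted elsewhere in this thread and reports the delay for each share. It assumes the total in "Accepted a/t" equals the share number from the submission line, and it only handles "Accepted" responses because no rejection line has been shown here. The function and pattern names are made up for the example.

```python
import re
from datetime import datetime

SUBMIT_RE = re.compile(
    r"\[(.+?)\] Share (\d+) submitted by thread (\d+), lane (\d+)\.")
ACCEPT_RE = re.compile(r"\[(.+?)\] Accepted (\d+)/(\d+) ")


def parse_time(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def pair_shares(lines):
    """Return (share, thread, lane, delay_seconds) for each matched pair."""
    pending = {}
    results = []
    for line in lines:
        m = SUBMIT_RE.match(line)
        if m:
            pending[int(m.group(2))] = (
                parse_time(m.group(1)), int(m.group(3)), int(m.group(4)))
            continue
        m = ACCEPT_RE.match(line)
        if m:
            total = int(m.group(3))  # assumed equal to the share number
            if total in pending:
                sent, thread, lane = pending.pop(total)
                delay = (parse_time(m.group(1)) - sent).total_seconds()
                results.append((total, thread, lane, delay))
    return results


sample = [
    "[2019-05-29 00:17:17] Share 8 submitted by thread 12, lane 1.",
    "[2019-05-29 00:17:17] Accepted 8/8 (100%), diff 0.0113, 2659.60 kH/s, 70C",
    "[2019-05-29 00:17:29] Share 9 submitted by thread 2, lane 1.",
    "[2019-05-29 00:17:29] Accepted 9/9 (100%), diff 0.0187, 2659.60 kH/s, 70C",
]
print(pair_shares(sample))
```

The timestamps only have one second of resolution, so this gives a coarse indication of latency at best.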

Removed "-DUSE_SPH_SHA" option, all users should have a recent version of
openssl installed: v1.0.2 (Ubuntu 16.04) or better. Ryzen SHA requires
v1.1.0 or better. Ryzen SHA is not used when hashing multi-way parallel.
Ryzen SHA is available in the Windows binaries release package.

Improved compile instructions, now in separate files: INSTALL_LINUX and
INSTALL_WINDOWS. The Windows instructions are used to build the binaries
release package. It's built on a Linux system either running as a virtual
machine or a separate computer. At this time there is no known way to
build natively on a Windows system.
legendary
Activity: 1470
Merit: 1114
Attention Ryzen users.

It is well known that Ryzen has a HW implementation of SHA and also well known that
Ryzen also added AVX2 capabilities. Unfortunately Ryzen's AVX2 performance is poor.

The combination of these 2 points makes for some unusual effects on some algorithms
depending on how much sha256 they use and how much AVX2 they use.

An extreme example is the sha256t algo, which is pure sha256 and also supports 8-way AVX2
and 4-way SSE.

The hw SHA implementation can't do parallel so the 8-way and 4-way code uses sw sha.

On Intel CPUs the performance is very predictable, 8-way AVX2 is fastest, 4-way SSE2 is next
and 1 way is slowest.

On Ryzen it's the reverse. The single stream using HW SHA is fastest. A 16 thread Ryzen 1700
using HW SHA outperforms an 8 thread i7-6700K 8 way AVX2 by 50%. The 4 way SSE2 code is
just as fast as, and maybe a little faster than, 8-way AVX2 on Ryzen. And the AVX2 performance
is downright pitiful in most cases. The only case where AVX2 may perform better is in
4-way AVX2 where there is no SSE2 equivalent.

As previously mentioned the impact depends on the mix of SHA and AVX2 in the algo
as well as whether SSE2 parallel hashing is available.

I will investigate further and provide recommendations for Ryzen users.

The solution may extend beyond compiling and may require some code changes to ensure
Ryzen prefers SHA over n-way when the algo contains a significant amount of sha256.

It likely won't be the upcoming release.

Edit: Here's a list of algos that use sha256

sha256t: as described above.
lbry: significantly affected but less than sha256t
skein: similar to lbry.
m7m: no 4-way, not a problem.
yescrypt and yespower: no 4 way, not a problem.
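
The preference order described in this post, for a pure sha256 algo such as sha256t, could be sketched roughly as below. This is only an illustration of the ordering reported above, not how the miner selects code paths; `pick_sha256_path` and its flags are hypothetical.

```python
def pick_sha256_path(is_ryzen, has_sha_ext, has_avx2, has_sse2=True):
    """Return the sha256 path expected to be fastest, per the post above.

    Only Ryzen is steered to HW SHA here; other CPUs with SHA extensions
    are not covered by the observations in this thread.
    """
    if is_ryzen and has_sha_ext:
        return "1-way HW SHA"   # single stream, hardware SHA
    if has_avx2:
        return "8-way AVX2"     # fastest on Intel
    if has_sse2:
        return "4-way SSE2"
    return "1-way scalar"


print(pick_sha256_path(is_ryzen=True, has_sha_ext=True, has_avx2=True))
print(pick_sha256_path(is_ryzen=False, has_sha_ext=False, has_avx2=True))
```

For algos that only partly use sha256 (lbry, skein) the right choice depends on the mix, so a simple rule like this would not be enough.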

legendary
Activity: 1470
Merit: 1114
Here's a tease. It's the only visible part, much more is going on behind the scene.
I'm trying to streamline the process, reduce overhead (especially interleaving for
4 way) and new innovative (imo) ideas for increasing performance. For now it's still
in the napkin stage but it's starting to take shape. It means increasing the parallelization
beyond the size of the largest vector. I have no idea if it will increase performance or
by how much. It may actually be a flop but I think the idea has merit. It's a bit of a twist
on another idea pioneered by a long time miner developer with an explosive name.
That's all for now. I have a bug fix someone is waiting for. I'm almost ready to think
about a new release, still a few days away.

Code:
[2019-05-29 00:17:17] Share 8 submitted by thread 12, lane 1.
[2019-05-29 00:17:17] Accepted 8/8 (100%), diff 0.0113, 2659.60 kH/s, 70C
[2019-05-29 00:17:29] Share 9 submitted by thread 2, lane 1.
[2019-05-29 00:17:29] Accepted 9/9 (100%), diff 0.0187, 2659.60 kH/s, 70C
[2019-05-29 00:17:35] Share 10 submitted by thread 11, lane 2.
[2019-05-29 00:17:35] Accepted 10/10 (100%), diff 0.00811, 2659.60 kH/s, 70C
[2019-05-29 00:17:52] Share 11 submitted by thread 8, lane 1.
[2019-05-29 00:17:52] Accepted 11/11 (100%), diff 0.0127, 2659.02 kH/s, 71C
legendary
Activity: 1470
Merit: 1114
Hi Dev... is it possible to add the Lyra2CZ algo, the new algo for mining BitcoinCZ?

At the moment there is no miner, only mining by wallet... listed on sistemkoin exchange

https://bitcointalksearch.org/topic/annbczbitcoin-cz-bcz-5140548

https://github.com/BitcoinCZ

Thanks for your good work and support


It looks like lyra2Z. Have you tried it?
member
Activity: 129
Merit: 10
Hi Dev... is it possible to add the Lyra2CZ algo, the new algo for mining BitcoinCZ?

At the moment there is no miner, only mining by wallet... listed on sistemkoin exchange

https://bitcointalksearch.org/topic/annbczbitcoin-cz-bcz-5140548

https://github.com/BitcoinCZ

Thanks for your good work and support
legendary
Activity: 1470
Merit: 1114
Tpruvot has the first version of the algo, but they released a tweaked one (RFv2).
There is also a pull request with RFv2, but it has the same issue.

Anyway, I get your point about not being interested ))

It was a new algo and it's changed already, yet another reason why I don't like it.

This seems to be a trend: vertcoin, zcoin, cryptonight, ...

It appears to be an anti ASIC strategy, with SW miners able to adapt quickly without
requiring new HW.

It's not that big of a deal for a single coin but daunting for a multialgo miner to keep up.
That's the race I've withdrawn from.
member
Activity: 473
Merit: 18
Can you add Rainforest2?

https://github.com/MicroBitcoinOrg/Cpuminer

From my experience, the reference miner reports significantly higher speed than actual on pool side (Seems like x256)

TPruvot has it, does it work better? I've already looked at the code.

My first glance shows it's a completely new algo and can't benefit from any of the canned
optimizations. To optimize it requires a detailed analysis of the code to look for opportunities to
vectorize either serially, parallelly, or not at all. I expect the scalar code to be near optimum already.
It's a huge task to do the whole algo at once. Not really interested at this time.

Hashrate displayed by the miner, both thread and share, is artificially
calculated based on the number of iterations over time. The pool calculates based on the number and
difficulty of submitted valid shares. Perhaps there's a math error in the miner's calculations.



Tpruvot has the first version of the algo, but they released a tweaked one (RFv2).
There is also a pull request with RFv2, but it has the same issue.

Anyway, I get your point about not being interested ))
legendary
Activity: 1470
Merit: 1114
Can you add Rainforest2?

https://github.com/MicroBitcoinOrg/Cpuminer

From my experience, the reference miner reports significantly higher speed than actual on pool side (Seems like x256)

TPruvot has it, does it work better? I've already looked at the code.

My first glance shows it's a completely new algo and can't benefit from any of the canned
optimizations. To optimize it requires a detailed analysis of the code to look for opportunities to
vectorize either serially, parallelly, or not at all. I expect the scalar code to be near optimum already.
It's a huge task to do the whole algo at once. Not really interested at this time.

Hashrate displayed by the miner, both thread and share, is artificially
calculated based on the number of iterations over time. The pool calculates based on the number and
difficulty of submitted valid shares. Perhaps there's a math error in the miner's calculations.
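
The two ways of arriving at a hashrate can be put side by side in a small sketch. The constant `hashes_per_diff1` is an assumption: 2**32 is the Bitcoin convention for the expected number of hashes per difficulty-1 share, and other algos or pools may scale difficulty differently. The function names are made up for the example.

```python
def miner_hashrate(hashes_done, seconds):
    """Miner side: iterations actually performed divided by elapsed time."""
    return hashes_done / seconds


def pool_hashrate(share_diffs, seconds, hashes_per_diff1=2**32):
    """Pool side: estimate from the difficulty of the accepted shares.

    hashes_per_diff1 is an assumed scaling constant (Bitcoin convention).
    """
    return sum(share_diffs) * hashes_per_diff1 / seconds


diffs = [0.0113, 0.0187, 0.00811, 0.0127]  # share difficulties from a log
elapsed = 35.0                             # seconds covered by those shares

estimate = pool_hashrate(diffs, elapsed)
mis_scaled = pool_hashrate(diffs, elapsed, hashes_per_diff1=2**24)
print(estimate / mis_scaled)  # 256.0
```

A scaling constant that differs between the two sides shows up as a constant factor between the reported figures. The factor of 256 here only illustrates how such a gap can arise from arithmetic alone; it is not a diagnosis of the reference miner.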

member
Activity: 473
Merit: 18
Can you add Rainforest2?

https://github.com/MicroBitcoinOrg/Cpuminer

From my experience, the reference miner reports significantly higher speed than actual on pool side (Seems like x256)
member
Activity: 302
Merit: 26
cpuminer-opt-3.9.0.1 is released.

Fixed a problem where cpuminer could hang at startup on Windows 7 and server 2008.

https://github.com/JayDDee/cpuminer-opt/releases/tag/v3.9.0.1

checked - works well.
p.s. checked Yespowerr16 - mining Yenten Coin - OK ;)
legendary
Activity: 1470
Merit: 1114
cpuminer-opt-3.9.0.1 is released.

Fixed a problem where cpuminer could hang at startup on Windows 7 and server 2008.

https://github.com/JayDDee/cpuminer-opt/releases/tag/v3.9.0.1
legendary
Activity: 1470
Merit: 1114

hello, nice to have you back. We are very happy.

with the old version of the program I was very happy.
In the new version 3.9.0 with PHI2 algo.


I noticed that on one virtual Windows 2008 server it takes about a minute to start working.
on the same hardware, and the virtual Windows 2012 server starts immediately.
It waits on:

         **********  cpuminer-opt 3.9.0  ***********
     A CPU miner with multi algo support and optimized for CPUs
     with AES_NI and AVX2 and SHA extensions.
     BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT


Just for info.
Have a nice day


Thanks for posting. It appears to be a problem with Win7 & Server2008.
Win8+ & Server2012+ should be ok. I'm working on a fix.
newbie
Activity: 33
Merit: 0

hello, nice to have you back. We are very happy.

with the old version of the program I was very happy.
In the new version 3.9.0 with PHI2 algo.


I noticed that on one virtual Windows 2008 server it takes about a minute to start working.
on the same hardware, and the virtual Windows 2012 server starts immediately.
It waits on:

         **********  cpuminer-opt 3.9.0  ***********
     A CPU miner with multi algo support and optimized for CPUs
     with AES_NI and AVX2 and SHA extensions.
     BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT


Just for info.
Have a nice day


member
Activity: 302
Merit: 26
cpuminer-opt v3.9.0 is released.

https://github.com/JayDDee/cpuminer-opt/releases/tag/v3.9.0

Added support for Windows CPU groups. (not available in precompiled binaries)
Fixed BIP34 coinbase height.
Prep work for AVX512.
Added lyra2rev3 for the vertcoin algo change.
Added yespower, yespowerr16 (Yenten)
Added phi2 algo for LUX


Thanks for adding the algorithm for the updated Yenten Coin. [Fork from yescryptr16 to yespowerr16 - for mining only CPU]