[ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 30.

joblo

legendary

Activity: 1470

Merit: 1114

cpuminer-opt-3.8.1

Fixes x16r on CPUs with only SSE2.
More optimizations for X algos, qubit & deep.
Corrected algo optimizations for scrypt and yescrypt, no new optimizations.

https://github.com/JayDDee/cpuminer-opt/releases/tag/v3.8.1

I have reviewed the scrypt algos for potential optimizations and updated the
specs to show the actual optimizations currently available in each algo.
No new optimizations were added, so performance has not changed.

Scrypt was taken from pooler and is highly optimized with AVX2.
No further optimizations seem possible at this time.

Yescrypt uses SSE4.1 (128-bit vectors) and also SHA, which is also 128-bit. It suffers from
the same problem as the AES algos because hardware acceleration of SHA is limited to 128 bits.

Neoscrypt may have some potential. It currently uses some SSE2 but may have the potential
for full parallelization. It will be a lot of work, and it has some instances of data-dependent memory accesses,
which could kill performance. No results are expected any time soon.

Scrypt-Jane seems dead: NiceHash has zero activity and I can't find it on any of the major pools. No plans.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: ol92 on February 06, 2018, 03:05:47 PM

For Linux, physical and logical cores are divided as you said, but on Windows even-numbered CPUs are physical cores and odd-numbered ones are logical, even for Intel.

It appears you are right. I just did a test with cryptonight on Windows. The total hash rate was the same, but with default affinity
the per-thread hash rates were 2 high and 2 low, while with affinity 0x55 all 4 threads showed the same rate.

ol92

sr. member

Activity: 445

Merit: 255

Quote from: joblo on February 05, 2018, 05:59:11 PM

Quote from: guytechie on February 05, 2018, 05:11:07 PM

I'm baffled.

If you ignore the TR with affinity 1 it makes sense. I don't understand that one. Maybe the TR
has some kind of override that won't let you set stupid affinity. ;)

The CI only accepts decimal or hex at this time. I don't think it's worth the effort to support it
as converting from binary to hex is trivial.

Regarding the technicalities of the CCX and modular cache, it has nothing to do with the logical
CPU mapping. AMD could just as easily have mapped them so that cores 0 to n/2-1 would all be on different
physical cores and n/2 to n-1 would be SMT (hyperthreaded). That's how it works on my Haswell.

You seem to confirm that AMD has messed things up, not only compared to Intel but within their own
products.

The need to use alternating cores with Ryzen seems consistent, TR should be the same. I don't know
if it carries to their older CPUs, or their servers.

For Linux, physical and logical cores are divided as you said, but on Windows even-numbered CPUs are physical cores and odd-numbered ones are logical, even for Intel.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: nizzuu on February 06, 2018, 03:50:01 AM

v3.8.0.1, -a x16r, sse2 build at rvn.suprnova.cc: only rejects.

The pool reports low-difficulty shares.

No issues with avx2 and aes-sse42 builds.

Too bad you didn't find it yesterday. It'll be fixed in 3.8.1.

nizzuu

full member

Activity: 187

Merit: 100

Cryptocurrency enthusiast

v3.8.0.1, -a x16r, sse2 build at rvn.suprnova.cc: only rejects.

The pool reports low-difficulty shares.

No issues with avx2 and aes-sse42 builds.

4ward

member

Activity: 473

Merit: 18

Quote from: joblo on February 05, 2018, 10:21:35 PM

cpuminer-opt-3.8.0.1 fixes X16R AVX2 low hashrate.

https://github.com/JayDDee/cpuminer-opt/releases/tag/v3.8.0.1

There were actually 2 bugs. The first one prevented any shares from being submitted from
lanes 1 to 3 which hid the second bug that caused only rejects from those lanes. As a result
only the valid shares were submitted and the only symptom was the low hashrate reported
by the pool. A strange coincidence.

Thanks to 4ward for noticing and reporting the problem.

Significant improvement on the pool side now.
Thank you for the swift fix :)

nizzuu

full member

Activity: 187

Merit: 100

Cryptocurrency enthusiast

Quote from: oldschoolPT on February 02, 2018, 03:00:01 PM

hi,
possible to use in solo mining for ZOI?

Yes, getwork in cpuminer-opt does work for lyra2z330

Quote from: Enth on February 02, 2018, 09:47:42 PM

Hi. I have only ~650H/s with my 4690k (lyra2z330), what's wrong?

Hi. Make sure you do not use the -t 4 setting for your CPU. Use the avx2 build with "-t 2 --cpu-affinity 3" to get the maximum hashrate.

stevascha

member

Activity: 312

Merit: 10

Quote from: joblo on February 05, 2018, 10:21:35 PM

cpuminer-opt-3.8.0.1 fixes X16R AVX2 low hashrate.

https://github.com/JayDDee/cpuminer-opt/releases/tag/v3.8.0.1

There were actually 2 bugs. The first one prevented any shares from being submitted from
lanes 1 to 3 which hid the second bug that caused only rejects from those lanes. As a result
only the valid shares were submitted and the only symptom was the low hashrate reported
by the pool. A strange coincidence.

Thanks to 4ward for noticing and reporting the problem.

Thanks!! Currently I use v3.8.0 and it's doing fine:
Ryzen 7 1700X, yescryptR16, 1300khs with cpu affinity 5555.

joblo

legendary

Activity: 1470

Merit: 1114

cpuminer-opt-3.8.0.1 fixes X16R AVX2 low hashrate.

https://github.com/JayDDee/cpuminer-opt/releases/tag/v3.8.0.1

There were actually 2 bugs. The first one prevented any shares from being submitted from
lanes 1 to 3 which hid the second bug that caused only rejects from those lanes. As a result
only the valid shares were submitted and the only symptom was the low hashrate reported
by the pool. A strange coincidence.

Thanks to 4ward for noticing and reporting the problem.
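For readers unfamiliar with the "lanes" terminology: a 4way build hashes 4 consecutive nonces per loop iteration, one per SIMD lane, and each lane has to be checked against the target and reported with its own nonce. The Python sketch below only illustrates that bookkeeping under stated assumptions: SHA-256 stands in for x16r, `scan_4way` and `toy_hash` are hypothetical names, and the real miner does this in vectorized C.

```python
import hashlib

LANES = 4  # a 4way build works on 4 nonces per iteration


def toy_hash(header: bytes, nonce: int) -> int:
    """Stand-in for the real x16r hash; SHA-256 is used only for illustration."""
    digest = hashlib.sha256(header + nonce.to_bytes(4, "little")).digest()
    return int.from_bytes(digest, "little")


def scan_4way(header: bytes, start_nonce: int, max_nonce: int, target: int) -> list:
    """Return every nonce in [start_nonce, max_nonce) whose hash meets the target.

    Lane i of each iteration hashes nonce base + i. Every lane is checked, and a
    share from lane i is reported with nonce base + i, not with lane 0's nonce.
    """
    shares = []
    for base in range(start_nonce, max_nonce, LANES):
        nonces = [base + lane for lane in range(LANES)]
        # a real 4way build would compute these 4 hashes together with SIMD
        hashes = [toy_hash(header, n) for n in nonces]
        for nonce, h in zip(nonces, hashes):
            if nonce < max_nonce and h <= target:
                shares.append(nonce)
    return shares


target = 2**256 // 8  # easy target: roughly 1 in 8 hashes qualifies
found = scan_4way(b"example header", 0, 64, target)
print(found)
print(sorted({n % LANES for n in found}))  # which lanes produced shares
```

In this sketch, checking only lane 0, or reporting every share with `base` instead of `base + lane`, would give symptoms similar to the two bugs described: missing shares from lanes 1 to 3, or rejects from those lanes.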

guytechie

hero member

Activity: 677

Merit: 500

Quote from: joblo on February 05, 2018, 05:59:11 PM

Quote from: guytechie on February 05, 2018, 05:11:07 PM

I'm baffled.

If you ignore the TR with affinity 1 it makes sense. I don't understand that one. Maybe the TR
has some kind of override that won't let you set stupid affinity. ;)

The CI only accepts decimal or hex at this time. I don't think it's worth the effort to support it
as converting from binary to hex is trivial.

Regarding the technicalities of the CCX and modular cache, it has nothing to do with the logical
CPU mapping. AMD could just as easily have mapped them so that cores 0 to n/2-1 would all be on different
physical cores and n/2 to n-1 would be SMT (hyperthreaded). That's how it works on my Haswell.

You seem to confirm that AMD has messed things up, not only compared to Intel but within their own
products.

The need to use alternating cores with Ryzen seems consistent, TR should be the same. I don't know
if it carries to their older CPUs, or their servers.

"More research is needed." (c)

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: guytechie on February 05, 2018, 05:11:07 PM

I'm baffled.

If you ignore the TR with affinity 1 it makes sense. I don't understand that one. Maybe the TR
has some kind of override that won't let you set stupid affinity. ;)

The CI only accepts decimal or hex at this time. I don't think it's worth the effort to support it
as converting from binary to hex is trivial.
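For anyone who prefers to think in binary, here is a minimal Python sketch of that conversion (illustration only, not part of cpuminer-opt; `binary_mask_to_hex` is a hypothetical helper). The rightmost digit is bit 0, i.e. CPU 0, so the string reads from the highest-numbered CPU down to CPU 0.

```python
def binary_mask_to_hex(bits: str) -> str:
    """Convert an affinity bitmap written in binary to hex.

    The rightmost digit is bit 0 (CPU 0); the leftmost is the highest CPU.
    """
    value = int(bits, 2)  # parse as base 2; raises ValueError on non-binary digits
    return hex(value)


# One bit per logical CPU on a 16-thread CPU
print(binary_mask_to_hex("1010101010101010"))  # 0xaaaa (odd-numbered CPUs)
print(binary_mask_to_hex("0101010101010101"))  # 0x5555 (even-numbered CPUs)
```

The resulting hex string is what goes on the command line, e.g. `--cpu-affinity 0xaaaa`.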

Regarding the technicalities of the CCX and modular cache, it has nothing to do with the logical
CPU mapping. AMD could just as easily have mapped them so that cores 0 to n/2-1 would all be on different
physical cores and n/2 to n-1 would be SMT (hyperthreaded). That's how it works on my Haswell.
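To show why the numbering matters, here is a Python sketch comparing two idealized layouts (hypothetical helper names; it assumes exactly 2 logical CPUs per physical core and that the machine follows one of these two numberings, which should be verified on the actual system). It counts how many physical cores a set of logical CPUs actually reaches.

```python
def physical_core(cpu: int, n_logical: int, scheme: str) -> int:
    """Physical core that a logical CPU belongs to, assuming 2 logical CPUs per core.

    "contiguous":  CPUs 0..n/2-1 are distinct cores and CPU c + n/2 is the
                   SMT sibling of CPU c (the layout described for Haswell).
    "interleaved": CPUs 2k and 2k+1 are SMT siblings on core k.
    """
    half = n_logical // 2
    if scheme == "contiguous":
        return cpu % half
    if scheme == "interleaved":
        return cpu // 2
    raise ValueError(f"unknown scheme: {scheme!r}")


def cores_used(cpus, n_logical: int, scheme: str) -> int:
    """Number of distinct physical cores reached by the given logical CPUs."""
    return len({physical_core(c, n_logical, scheme) for c in cpus})


# 8 threads on logical CPUs 0-7 of a 16-thread CPU
print(cores_used(range(8), 16, "contiguous"))   # 8
print(cores_used(range(8), 16, "interleaved"))  # 4
# Every other CPU (mask 0x5555) under the interleaved layout
print(cores_used(range(0, 16, 2), 16, "interleaved"))  # 8
```

Under the interleaved numbering, half the threads on the lowest-numbered CPUs only reach half the physical cores, which is why an every-other-CPU mask is needed there but not with the contiguous layout.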

You seem to confirm that AMD has messed things up, not only compared to Intel but within their own
products.

The need to use alternating cores with Ryzen seems consistent, TR should be the same. I don't know
if it carries to their older CPUs, or their servers.

guytechie

hero member

Activity: 677

Merit: 500

Quote from: joblo on February 05, 2018, 04:13:51 PM

Your tr 1950x with cpu-affinity 1 makes no sense, the Ryzen is behaving as expected.
Affinity of 1 means that only CPU 0 will be used for all threads.
There should be as many bits set in the affinity mask as there are threads to run.
If there are fewer bits than threads you start doubling up threads on logical cores, very bad.

I don't know what to tell you. For some reason, this is happening on the TR CPU.

So for some reason, that's expected behavior on my Ryzen, but not the TR.

As for WHY I did "1", it's because I read somewhere on this thread that value means to alternate every other core starting with CPU 0. Misinformation, perhaps, but it was the only info I had to go on until now.

Quote

The only affinity that might make sense with 8 threads is either 0x5555 or 0xaaaa.
The only difference is even vs odd numbered cores. Some think 0xaaaa is better
because it leaves core 0 free. I don't know if it matters, but I digress.

Thanks. With that info, I can reverse-conclude that the hex value is just binary 0101010101010101 (one bit per logical CPU, 16 in total).

I was wondering how that worked.

I will try this and get back to you (not in a position to test right now).

Quote

How do you know CPUs 0 to 8 are not on separate physical cores? Check the CPU core
temperatures with 1/2 threads and default affinity and confirm all cores are the same relative temperature.
If 4 cores are hot and 4 cool your assumption was correct and you need to use affinity to
spread the threads over all the cores.

I know because with Cryptonight there is something about making sure the threads are on the right CCX so they don't share cache, or something to that effect. Without doing anything, the hashrate is terrible. After getting it to alternate cores, the hashrate was 5 to 6x faster.

I used xmr-stak to verify. Their miner made it easier to mine Cryptonight. It alternated the cores, which you can verify with monitoring tools such as HWInfo, Task Manager, and CoreTemp.

I want to use cpuminer-opt because of xmr-stak's dev fee. Using cpuminer-opt without any affinity settings (just the thread setting), I noticed the threads were not alternating, AND the hashrate was much lower.

Quote

With your TR1950x you would use a mask of 0x55555555 for 16 threads, IF YOU NEED IT.

Thanks for this. Will try and report back.

Quote

And remember NEVER USE AFFINITY WITH DEFAULT THREADS.

Of course - default threads usually mean 100% of CPU, so no reason to set affinity.

Quote

Please provide a full report to clear up all this confusion.

Will do.

Quote

I don't have any idea if TR maps logical cores differently than Ryzen, or differently than Intel.
If it does, blame AMD. I don't have any problem with Intel CPUs, never use affinity.

I'm not sure, but I think with my Haswell, it might just automatically alternate (use physical cores) without the need of setting affinity (just set the threads). I haven't had the time to play with Cryptonight on my Haswell yet. I'll check it out when I have time and let you know.

Quote

Some people tend to use decimal for affinity. Those that do probably don't know what they
are doing because affinity is a bitmap. If you don't understand a bitmap represented in hexadecimal
it's even more difficult to understand it in decimal.

Does it work with binary values?

Code:

--cpu-affinity 1010101010101010

UPDATE:
With 0x5555 and 8 threads for Ryzen (or 0x55555555 and 16 threads for TR), I get an overall 60-65% CPU utilization, and the load is spread evenly across all cores (virtual and physical).

What's weirder is on the TR, I revert back to "1", and it behaves the exact same! WTF is going on?

Hashrate using 16 threads (out of 32) - no affinity settings:
Around 400 H/s

Hashrate using 16 threads - affinity set to 1 (previously):
Around 850 H/s

Hashrate using 16 threads - affinity 0x55555555:
Around 600 H/s

Hashrate after going back to affinity 1, still 16 threads:
Around 600 H/s

I tried also on the Ryzen. So affinity 1 = only mines on core 0. With Affinity set to 0x5555 (8 threads out of 16), around 60% cpu utilization spread across all cores (same behavior as TR).

I'm baffled.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: guytechie on February 05, 2018, 12:39:19 PM

Hi guys, I have a question about the --cpu-affinity switch. Yeah, I know, yet another one.

I tried searching for the answers here, but the thread is so long that the search is coming up with stuff that didn't help.

I have a TR 1950x and a Ryzen 7 1700.

Without messing with the affinity switch, just halving the # of threads doesn't actually distribute the load to all physical cores. It just goes from cpu 0 to cpu 8 (in the case of Ryzen 1700). This is apparent with Cryptonight, which hashes much faster when halving the # of threads to just the physical cores (not the virtual ones).

--cpu-affinity 1 works perfectly with the TR 1950x. It alternates the load from cpu 0, 2, 4, etc. However, the behavior is not the same on the Ryzen 7 1700. Instead, it would just run all threads on cpu 0 only.

Is there a different affinity mask for Ryzen 7 1700s specifically?

Your tr 1950x with cpu-affinity 1 makes no sense, the Ryzen is behaving as expected.
Affinity of 1 means that only CPU 0 will be used for all threads.
There should be as many bits set in the affinity mask as there are threads to run.
If there are fewer bits than threads you start doubling up threads on logical cores, very bad.
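A minimal Python sketch of that rule (hypothetical helpers, not the miner's own code): list the CPUs whose bits are set, with bit n meaning CPU n, and compare the count with the thread count.

```python
def cpus_in_mask(mask: int) -> list:
    """Logical CPU numbers whose bits are set in an affinity mask (bit n = CPU n)."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def check_mask(mask: int, threads: int) -> str:
    """Warn when the mask has fewer CPUs than threads, so threads would share CPUs."""
    cpus = cpus_in_mask(mask)
    if len(cpus) < threads:
        return f"{threads} threads on {len(cpus)} CPU(s) {cpus}: threads will double up"
    return f"{threads} threads on CPUs {cpus}"


print(check_mask(0x1, 8))     # 8 threads on 1 CPU(s) [0]: threads will double up
print(check_mask(0x5555, 8))  # 8 threads on CPUs [0, 2, 4, 6, 8, 10, 12, 14]
```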

The only affinity that might make sense with 8 threads is either 0x5555 or 0xaaaa.
The only difference is even vs odd numbered cores. Some think 0xaaaa is better
because it leaves core 0 free. I don't know if it matters, but I digress.
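The even/odd masks follow a simple pattern: one set bit for every other logical CPU. A Python sketch (the `alternating_mask` name is hypothetical) that reproduces the values used in this discussion for a 16-thread Ryzen 7 1700 and a 32-thread TR 1950X:

```python
def alternating_mask(n_logical: int, odd: bool = False) -> int:
    """Mask selecting every other logical CPU: even-numbered CPUs by default,
    odd-numbered CPUs when odd=True."""
    mask = 0
    for cpu in range(1 if odd else 0, n_logical, 2):
        mask |= 1 << cpu  # set the bit for this CPU
    return mask


print(hex(alternating_mask(16)))            # 0x5555
print(hex(alternating_mask(16, odd=True)))  # 0xaaaa
print(hex(alternating_mask(32)))            # 0x55555555
```

Each mask sets exactly half the bits, so it matches a thread count of half the logical CPUs.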

How do you know CPUs 0 to 8 are not on separate physical cores? Check the CPU core
temperatures with 1/2 threads and default affinity and confirm all cores are the same relative temperature.
If 4 cores are hot and 4 cool your assumption was correct and you need to use affinity to
spread the threads over all the cores.

With your TR1950x you would use a mask of 0x55555555 for 16 threads, IF YOU NEED IT.

And remember NEVER USE AFFINITY WITH DEFAULT THREADS.

Please provide a full report to clear up all this confusion.

I don't have any idea if TR maps logical cores differently than Ryzen, or differently than Intel.
If it does, blame AMD. I don't have any problem with Intel CPUs, never use affinity.

Some people tend to use decimal for affinity. Those that do probably don't know what they
are doing because affinity is a bitmap. If you don't understand a bitmap represented in hexadecimal
it's even more difficult to understand it in decimal.
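To illustrate the point, a short Python sketch (illustration only; `mask_notations` is a hypothetical helper) that prints one mask in all three notations. The decimal form gives no visual hint of which CPUs are selected.

```python
def mask_notations(mask: int) -> dict:
    """Show the same affinity bitmap in decimal, hex and 16-digit binary."""
    return {
        "decimal": str(mask),
        "hex": hex(mask),
        "binary": format(mask, "016b"),  # bit 0 (CPU 0) is the rightmost digit
    }


for name, text in mask_notations(21845).items():
    print(f"{name}: {text}")
# decimal: 21845
# hex: 0x5555
# binary: 0101010101010101
```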

guytechie

hero member

Activity: 677

Merit: 500

Hi guys, I have a question about the --cpu-affinity switch. Yeah, I know, yet another one.

I tried searching for the answers here, but the thread is so long that the search is coming up with stuff that didn't help.

I have a TR 1950x and a Ryzen 7 1700.

Without messing with the affinity switch, just halving the # of threads doesn't actually distribute the load to all physical cores. It just goes from cpu 0 to cpu 8 (in the case of Ryzen 1700). This is apparent with Cryptonight, which hashes much faster when halving the # of threads to just the physical cores (not the virtual ones).

--cpu-affinity 1 works perfectly with the TR 1950x. It alternates the load from cpu 0, 2, 4, etc. However, the behavior is not the same on the Ryzen 7 1700. Instead, it would just run all threads on cpu 0 only.

Is there a different affinity mask for Ryzen 7 1700s specifically?

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: 4ward on February 05, 2018, 10:24:24 AM

So I finally got some time to test this and confirm the results I saw.
I have mined to 2 wallets on http://pool.threeeyed.info/
Running 2 miners in parallel, with 2 threads each, on a 4-core CPU, with affinity set manually in Task Manager to make sure they use separate cores.

Running for 2 hours with no interruptions, here are the results:
multi gave double the profit and reported more or less the correct speed.
opt's reported speed was about 4 times higher than the speed received at the pool.

cpuminer-opt 3.8.0:
Screenshot:
http://prntscr.com/iafztr
Miner output:
https://text-share.com/view/e95290d4
Pool link:
http://pool.threeeyed.info/?address=RMuoJFg2qDSxEaDaCZG24Yhn3k99gW6SkF

cpuminer-multi-1.3.3
Screenshot:
http://prntscr.com/iafzkk
Miner output:
https://text-share.com/view/59a010b3
Pool link:
http://pool.threeeyed.info/?address=RJmz1bAtpa4hXrX7LC82cVikoB4gpd5L52

That's not good. I have confirmed the hash rate calculation for other 4way algos,
so this seems specific to x16r. The fact that the pool-side hash rate is lower than multi's
suggests that only one of the 4 lanes is finding shares and the other three are finding
nothing, not even rejects.
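The reasoning can be put in numbers with a rough Python sketch (hypothetical helpers; it assumes the pool estimates speed purely from submitted shares, so lanes that never submit are invisible to it). The miner counts 4 hashes per 4way iteration, but the pool only sees the lanes that actually submit.

```python
def miner_hashrate(iterations: int, lanes: int, seconds: float) -> float:
    """Hashes per second as counted by the miner: every iteration hashes `lanes` nonces."""
    return iterations * lanes / seconds


def pool_side_hashrate(true_rate: float, lanes: int, lanes_submitting: int) -> float:
    """Rate a share-counting pool would infer if only some lanes ever submit shares."""
    return true_rate * lanes_submitting / lanes


rate = miner_hashrate(iterations=1000, lanes=4, seconds=10.0)
print(rate)                                                   # 400.0
print(pool_side_hashrate(rate, lanes=4, lanes_submitting=1))  # 100.0
```

With only lane 0 submitting, the pool sees a quarter of the miner-reported rate, consistent with 4ward's measurement of the reported speed being about 4 times the pool-side speed.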

There is a simple test I can do to confirm the lane issue but I'm pretty deep in debugging
more optimizations right now so I can't follow up immediately. I'll try to find a window to
do this test (I'll use your addresses at 3eyed for testing) then decide how to move forward.

Thanks for the hard work.

Edit: confirmed, only lane 0 submitting shares.

4ward

member

Activity: 473

Merit: 18

Quote from: joblo on January 24, 2018, 09:14:26 AM

Quote from: 4ward on January 24, 2018, 06:24:10 AM

Quote from: joblo on January 23, 2018, 09:15:41 PM

cpuminer-opt-3.8.0 released.

https://github.com/JayDDee/cpuminer-opt/releases/tag/v3.8.0

4way no longer a separate feature, included in AVX2.
Added x16r algo for Ravencoin, anime algo for Animecoin.
More 4way optimizations for X13 and up.
Tweaked CPU affinity to better support more than 64 CPUs.
Fixed compile problem on some old AMD CPUs.

This release is a major milestone for cpuminer-opt. It essentially marks the end of
4way phase 1. 15 of the 17 functions in X17 have some form of optimization. Only
fugue and whirlpool still use unoptimized code and 4 way seems impossible on
these functions using SIMD.

For some reason X16r shows about 4x the hashrate of the official cpuminer (by tpruvot), while actually delivering a lower hashrate and share rate on the pool side.

p.s. dropped a penny in the ETH jar

Thanks for the tip. Your report about hash rate is scary because it's hard to verify. x16r by its nature has a very volatile
hash rate and Suprnova has been notorious for displaying incorrect hash rates.

There is no difference in the miner-side hashrate calculation for x16r. I made a change for 4way to account for 4 nonces
per iteration but it applies to all 4way algos and seems to be accurate.

It'll need more data from other users to determine if there is a problem with hashrate calculation.

Edit: hash rate for x16r is more volatile with opt vs multi due to the mix of optimized and unoptimized hash functions.
X16r can theoretically run the same function 16 times. If it's an unoptimized function, the hash rate with opt will be the
same as multi. But if it's a highly optimized function the hash rate gain will be higher than average.

So I finally got some time to test this and confirm the results I saw.
I have mined to 2 wallets on http://pool.threeeyed.info/
Running 2 miners in parallel, with 2 threads each, on a 4-core CPU, with affinity set manually in Task Manager to make sure they use separate cores.

Running for 2 hours with no interruptions, here are the results:
multi gave double the profit and reported more or less the correct speed.
opt's reported speed was about 4 times higher than the speed received at the pool.

cpuminer-opt 3.8.0:
Screenshot:
http://prntscr.com/iafztr
Miner output:
https://text-share.com/view/e95290d4
Pool link:
http://pool.threeeyed.info/?address=RMuoJFg2qDSxEaDaCZG24Yhn3k99gW6SkF

cpuminer-multi-1.3.3
Screenshot:
http://prntscr.com/iafzkk
Miner output:
https://text-share.com/view/59a010b3
Pool link:
http://pool.threeeyed.info/?address=RJmz1bAtpa4hXrX7LC82cVikoB4gpd5L52

Andre100

newbie

Activity: 128

Merit: 0

Quote from: 4ward on February 04, 2018, 10:35:56 AM

Quote from: Andre100 on February 04, 2018, 10:27:50 AM

Can I ask: now that Monero's difficulty has shot up, is LYRA2 more profitable?

Lyra2z, Yescrypt, sometimes HODL on NiceHash, Yenten, HPP coin

This is in general the algos that can still get some profit on a CPU

ohh, on NiceHash i see. Thx.

erixxx

newbie

Activity: 3

Merit: 0

Quote from: mangoo on February 04, 2018, 09:33:01 AM

Quote from: erixxx on February 04, 2018, 08:32:43 AM

Hi! Can someone please make a full step-by-step guide on how to compile this miner statically on Linux? Thx

Did you try these instructions:

https://lxadm.com/Static_compilation_of_cpuminer

Hi mangoo! I managed to compile another cpuminer using the steps from the link you provided. If you want, send me a PM with an XMR address; I want to send you 1 XMR! Have a nice day, all.

Etherion

sr. member

Activity: 512

Merit: 260

I'm using this to mine zcoin/lyra2v on an i7 7700k. I get about 770 kH/s, which is great. But when mining yescrypt I get 2.8 kH/s, which seems very low. Is that about what one can expect from an i7 7700k?

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: erixxx on February 04, 2018, 11:41:32 AM

Quote from: mangoo on February 04, 2018, 09:33:01 AM

Quote from: erixxx on February 04, 2018, 08:32:43 AM

Hi! Can someone please make a full step-by-step guide on how to compile this miner statically on Linux? Thx

Did you try these instructions:

https://lxadm.com/Static_compilation_of_cpuminer

Hi! Thanks for your quick answer. Yes, I did try the steps from the link above; I've tried almost everything possible. I'm sure I'm missing something, but I don't know what, which is why I asked for a guide. I get a lot of errors when I compile: error '__int128' is not supported on this target

You followed the guide but now you're asking for a guide??? Your account is brand new, so unless you can demonstrate
you know what you're doing I have to assume you are a real noob. So, as a noob, DO EVERYTHING DEFAULT! Make sure you
can do that before messing with stuff.

And if you want help you'll have to provide proper data including exactly what you're doing and what you're seeing.
I will ignore any posts without supporting data.
