[ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 46.

Larvitar

jr. member

Activity: 196

Merit: 1

Quote from: joblo on December 17, 2017, 03:26:01 PM

Quote from: Larvitar on December 17, 2017, 01:53:04 PM

Tribus 4way 8 threads:

Code:

[2017-12-17 15:45:48][2017-12-17 15:49:10] [2017-12-17 17:05:32] tribus block 449483, diff 735.578
[2017-12-17 17:05:32] CPU #7: 461.65 kH, 398.07 kH/s
[2017-12-17 17:05:32] CPU #6: 460.63 kH, 398.21 kH/s
[2017-12-17 17:05:32] CPU #5: 460.43 kH, 397.70 kH/s
[2017-12-17 17:05:32] CPU #2: 460.88 kH, 397.74 kH/s
[2017-12-17 17:05:32] CPU #4: 460.51 kH, 397.76 kH/s
[2017-12-17 17:05:32] CPU #3: 460.82 kH, 398.03 kH/s
[2017-12-17 17:05:32] CPU #0: 454.80 kH, 393.86 kH/s
[2017-12-17 17:05:32] CPU #1: 463.35 kH, 399.53 kH/s

Apparently Tribus 4way likes SMT/HT here.

It's interesting that the thread rate didn't increase with fewer threads. Were the threads spread over
all 8 cores? You can try "-t 8 --cpu-affinity 0x5555" to select alternate vcores.

Code:

[2017-12-17 17:34:59] [2017-12-17 17:36:25] tribus block 449526, diff 130.915
[2017-12-17 17:36:25] CPU #6: 5670.24 kH, 753.19 kH/s
[2017-12-17 17:36:25] CPU #5: 5840.23 kH, 775.66 kH/s
[2017-12-17 17:36:25] CPU #0: 69.55 kH, 763.09 kH/s
[2017-12-17 17:36:25] CPU #7: 5672.16 kH, 753.14 kH/s
[2017-12-17 17:36:25] CPU #4: 5766.59 kH, 765.78 kH/s
[2017-12-17 17:36:25] CPU #2: 5597.96 kH, 743.19 kH/s
[2017-12-17 17:36:25] CPU #3: 5665.52 kH, 752.36 kH/s
[2017-12-17 17:36:25] CPU #1: 5690.77 kH, 755.51 kH/s
[2017-12-17 17:36:26] Accepted 2/2 (100%), 39.97 MH, 6061.92 kH/s

Ya, the default affinity was choosing virtual threads instead of physical ones. Damn! 6 MH/s!

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: Larvitar on December 17, 2017, 01:53:04 PM

Tribus 4way 8 threads:

Code:

[2017-12-17 15:45:48][2017-12-17 15:49:10] [2017-12-17 17:05:32] tribus block 449483, diff 735.578
[2017-12-17 17:05:32] CPU #7: 461.65 kH, 398.07 kH/s
[2017-12-17 17:05:32] CPU #6: 460.63 kH, 398.21 kH/s
[2017-12-17 17:05:32] CPU #5: 460.43 kH, 397.70 kH/s
[2017-12-17 17:05:32] CPU #2: 460.88 kH, 397.74 kH/s
[2017-12-17 17:05:32] CPU #4: 460.51 kH, 397.76 kH/s
[2017-12-17 17:05:32] CPU #3: 460.82 kH, 398.03 kH/s
[2017-12-17 17:05:32] CPU #0: 454.80 kH, 393.86 kH/s
[2017-12-17 17:05:32] CPU #1: 463.35 kH, 399.53 kH/s

Apparently Tribus 4way likes SMT/HT here.

It's interesting that the thread rate didn't increase with fewer threads. Were the threads spread over
all 8 cores? You can try "-t 8 --cpu-affinity 0x5555" to select alternate vcores.

Larvitar

jr. member

Activity: 196

Merit: 1

Quote from: joblo on December 17, 2017, 01:09:52 PM

Quote from: Larvitar on December 17, 2017, 12:16:09 PM

I have a Ryzen 7 1700 at 3.7 GHz. On nist5, 4way is around 15% slower than AES-AVX/AVX2: around 240 kH/s per core (8 threads) with 4way versus 270 kH/s per core with AES-AVX2. It runs stably, but with lower performance. I can get 2.1~2.2 MH/s on NIST5.

This is very interesting feedback. I get 340 kH/s per thread 4way vs 255 kH/s AVX2 1way on my i7-6700K @4GHz.

Something isn't right, need lots of details to eliminate simple stuff. Can you post the startup for both?
None of the following should cause that much of a difference, but it helps to quantify.

AMD AVX2 performance is known to be slower than AVX. Try running a test with just AVX2 and again
with AVX to compare.

4way uses 4 times the memory of plain AVX2. This will expose any cache performance issues. Try running fewer
threads to see if performance (total, not just per thread) improves.

Try tribus algo, it's pure 4way parallel while nist5 has a serial component which reduces gain and adds some overhead.

Thanks for the reply.

About Tribus (3.7.7 version):

Tribus AVX 16 threads:

Code:

[2017-12-17 15:45:48] tribus block 449382, diff 297.717
[2017-12-17 15:45:48] CPU #3: 73.32 kH, 226.66 kH/s
[2017-12-17 15:45:48] CPU #2: 60.95 kH, 225.42 kH/s
[2017-12-17 15:45:48] CPU #1: 68.89 kH, 228.54 kH/s
[2017-12-17 15:45:48] CPU #0: 59.57 kH, 220.31 kH/s
[2017-12-17 15:45:48] CPU #7: 71.66 kH, 226.42 kH/s
[2017-12-17 15:45:48] CPU #4: 47.67 kH, 206.94 kH/s
[2017-12-17 15:45:48] CPU #14: 69.70 kH, 228.19 kH/s
[2017-12-17 15:45:48] CPU #6: 66.07 kH, 226.71 kH/s
[2017-12-17 15:45:48] CPU #12: 36.67 kH, 223.24 kH/s
[2017-12-17 15:45:48] CPU #15: 69.95 kH, 228.24 kH/s
[2017-12-17 15:45:48] CPU #11: 66.53 kH, 225.95 kH/s
[2017-12-17 15:45:48] CPU #5: 70.96 kH, 227.81 kH/s
[2017-12-17 15:45:48] CPU #10: 312.06 kH, 275.75 kH/s
[2017-12-17 15:45:48] CPU #8: 43.73 kH, 172.57 kH/s
[2017-12-17 15:45:48] CPU #9: 68.83 kH, 238.64 kH/s
[2017-12-17 15:45:48] CPU #13: 72.51 kH, 228.39 kH/s

Tribus AVX2 16 threads:

Code:

[2017-12-17 15:45:48][2017-12-17 15:49:10] tribus block 449390, diff 254.451
[2017-12-17 15:49:10] CPU #4: 97.38 kH, 211.38 kH/s
[2017-12-17 15:49:10] CPU #6: 110.08 kH, 237.92 kH/s
[2017-12-17 15:49:10] CPU #7: 110.38 kH, 238.04 kH/s
[2017-12-17 15:49:10] CPU #0: 103.07 kH, 221.32 kH/s
[2017-12-17 15:49:10] CPU #1: 109.05 kH, 234.17 kH/s
[2017-12-17 15:49:10] CPU #9: 109.41 kH, 238.00 kH/s
[2017-12-17 15:49:10] CPU #8: 108.26 kH, 234.98 kH/s
[2017-12-17 15:49:10] CPU #13: 109.99 kH, 238.22 kH/s
[2017-12-17 15:49:10] CPU #5: 112.40 kH, 241.36 kH/s
[2017-12-17 15:49:10] CPU #11: 111.49 kH, 239.40 kH/s
[2017-12-17 15:49:10] CPU #3: 111.29 kH, 238.97 kH/s
[2017-12-17 15:49:10] CPU #15: 110.46 kH, 238.21 kH/s
[2017-12-17 15:49:10] CPU #2: 110.69 kH, 237.67 kH/s
[2017-12-17 15:49:10] CPU #10: 111.39 kH, 239.19 kH/s
[2017-12-17 15:49:10] CPU #14: 110.70 kH, 237.20 kH/s
[2017-12-17 15:49:10] CPU #12: 94.46 kH, 199.39 kH/s
[2017-12-17 15:49:15] CPU #12: 836.08 kH, 196.43 kH/s
[2017-12-17 15:49:15] Accepted 1/1 (100%), 2472.11 kH, 3722.47 kH/s

Tribus 4way 16 threads:

Code:

[2017-12-17 15:45:48][2017-12-17 15:49:10] [2017-12-17 15:50:38] tribus block 449392, diff 221.049
[2017-12-17 15:50:38] CPU #0: 2552.29 kH, 340.11 kH/s
[2017-12-17 15:50:38] CPU #1: 3076.95 kH, 410.02 kH/s
[2017-12-17 15:50:38] CPU #12: 2199.45 kH, 293.25 kH/s
[2017-12-17 15:50:38] CPU #8: 2508.86 kH, 334.41 kH/s
[2017-12-17 15:50:38] CPU #14: 2807.39 kH, 374.11 kH/s
[2017-12-17 15:50:38] CPU #9: 3002.02 kH, 400.25 kH/s
[2017-12-17 15:50:38] CPU #2: 2978.50 kH, 396.85 kH/s
[2017-12-17 15:50:38] CPU #3: 2993.07 kH, 398.79 kH/s
[2017-12-17 15:50:38] CPU #5: 2997.27 kH, 399.67 kH/s
[2017-12-17 15:50:38] CPU #4: 2927.24 kH, 390.44 kH/s
[2017-12-17 15:50:38] CPU #6: 2954.16 kH, 393.72 kH/s
[2017-12-17 15:50:38] CPU #7: 2983.57 kH, 397.69 kH/s
[2017-12-17 15:50:38] CPU #11: 3005.27 kH, 400.79 kH/s
[2017-12-17 15:50:38] CPU #15: 2946.88 kH, 393.06 kH/s
[2017-12-17 15:50:38] CPU #10: 2947.45 kH, 392.77 kH/s
[2017-12-17 15:50:38] CPU #13: 2742.90 kH, 365.66 kH/s

Tribus 4way 8 threads:

Code:

[2017-12-17 15:45:48][2017-12-17 15:49:10] [2017-12-17 17:05:32] tribus block 449483, diff 735.578
[2017-12-17 17:05:32] CPU #7: 461.65 kH, 398.07 kH/s
[2017-12-17 17:05:32] CPU #6: 460.63 kH, 398.21 kH/s
[2017-12-17 17:05:32] CPU #5: 460.43 kH, 397.70 kH/s
[2017-12-17 17:05:32] CPU #2: 460.88 kH, 397.74 kH/s
[2017-12-17 17:05:32] CPU #4: 460.51 kH, 397.76 kH/s
[2017-12-17 17:05:32] CPU #3: 460.82 kH, 398.03 kH/s
[2017-12-17 17:05:32] CPU #0: 454.80 kH, 393.86 kH/s
[2017-12-17 17:05:32] CPU #1: 463.35 kH, 399.53 kH/s

Apparently Tribus 4way likes SMT/HT here.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: My9bot on December 17, 2017, 12:20:53 PM

cpuminer-opt-3.7.7-sha win

https://ufile.io/mkuq4

Thanks for that. Do you have a howto guide? I need to file it for when I finally upgrade my build environment.

With your permission I will add your link to the OP.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: Larvitar on December 17, 2017, 12:16:09 PM

I have a Ryzen 7 1700 at 3.7 GHz. On nist5, 4way is around 15% slower than AES-AVX/AVX2: around 240 kH/s per core (8 threads) with 4way versus 270 kH/s per core with AES-AVX2. It runs stably, but with lower performance. I can get 2.1~2.2 MH/s on NIST5.

This is very interesting feedback. I get 340 kH/s per thread 4way vs 255 kH/s AVX2 1way on my i7-6700K @4GHz.

Something isn't right, need lots of details to eliminate simple stuff. Can you post the startup for both?
None of the following should cause that much of a difference, but it helps to quantify.

AMD AVX2 performance is known to be slower than AVX. Try running a test with just AVX2 and again
with AVX to compare. Another, better, way to compare AVX2 vs AVX performance is lyra2rev2. It has the most
AVX2 code.

4way uses 4 times the memory of plain AVX2. This will expose any cache performance issues. Try running fewer
threads to see if performance (total, not just per thread) improves.

Try tribus algo, it's pure 4way parallel while nist5 has a serial component which reduces gain and adds some overhead.

Larvitar

jr. member

Activity: 196

Merit: 1

Quote from: My9bot on December 17, 2017, 12:20:53 PM

cpuminer-opt-3.7.7-sha win

https://ufile.io/mkuq4

Thank you!

EDIT:
On startup the miner asks for libcrypto-1_1-x64.dll. Do I need it, or can I just rename libcrypto1.0.0.dll?

EDIT2:
Solved by installing OpenSSL 1.1 x64.

My9bot

full member

Activity: 239

Merit: 100

Quote from: Larvitar on December 17, 2017, 12:16:09 PM

Quote from: joblo on December 14, 2017, 06:54:51 PM

cpuminer-opt-3.7.6 is released.

Added lyra2h algo for Hppcoin.
Added support for more than 64 CPUs.
Optimized shavite with AES, improves x11 etc.

Get it on git: https://github.com/JayDDee/cpuminer-opt/releases

More detailed release notes:

Lyra2h has not been tested. It is virtually a clone of lyra2z so it should work.
Please report any problems.

Support for over 64 CPUs is limited in that specifying --cpu-affinity has no effect.
The arg will be ignored and the default affinity will be used. This has not been
tested either so if anyone has the ability to test it please do so and report.

There are no new 4way algos this release but optimizing shavite came as a surprise
and helps all CPUs with AES.

The past two releases have also seen some reworking of some existing SIMD code as
I learn new techniques. It should be more efficient but not likely to produce a significant
speed up.

There are currently 2 4way blockers. BMW is blocking full optimization of x11 and blake256
is blocking m7m. I'd like to get those resolved but I'm stuck at the moment. Since m7m is
CPU only I'd like to prioritize that algo.

A few algos have 4way enabled but are either untested or have known problems that affect
performance.

Tested working: skein, keccak, keccakc, nist5, tribus.

Enabled untested: skein2, jha, whirlpool, pentablake.

Enabled with known problems: blake256 lane corruption: lyra2z, decred, blake.
These algos operate in 2way mode due to invalid hash in 2 lanes.

Kudos to you! Awesome miner.

Now to the feedback:
I have a Ryzen 7 1700 at 3.7 GHz. On nist5, 4way is around 15% slower than AES-AVX/AVX2: around 240 kH/s per core (8 threads) with 4way versus 270 kH/s per core with AES-AVX2. It runs stably, but with lower performance. I can get 2.1~2.2 MH/s on NIST5.

I would like to see SHA enabled and working on Windows, but I saw how difficult it is. If it would help, I can let you connect to my machine to try something. I don't have any coding knowledge, but I want to help get a SHA miner compiled.

cpuminer-opt-3.7.7-sha win

https://ufile.io/mkuq4

Larvitar

jr. member

Activity: 196

Merit: 1

Quote from: joblo on December 14, 2017, 06:54:51 PM

cpuminer-opt-3.7.6 is released.

Added lyra2h algo for Hppcoin.
Added support for more than 64 CPUs.
Optimized shavite with AES, improves x11 etc.

Get it on git: https://github.com/JayDDee/cpuminer-opt/releases

More detailed release notes:

Lyra2h has not been tested. It is virtually a clone of lyra2z so it should work.
Please report any problems.

Support for over 64 CPUs is limited in that specifying --cpu-affinity has no effect.
The arg will be ignored and the default affinity will be used. This has not been
tested either so if anyone has the ability to test it please do so and report.

There are no new 4way algos this release but optimizing shavite came as a surprise
and helps all CPUs with AES.

The past two releases have also seen some reworking of some existing SIMD code as
I learn new techniques. It should be more efficient but not likely to produce a significant
speed up.

There are currently 2 4way blockers. BMW is blocking full optimization of x11 and blake256
is blocking m7m. I'd like to get those resolved but I'm stuck at the moment. Since m7m is
CPU only I'd like to prioritize that algo.

A few algos have 4way enabled but are either untested or have known problems that affect
performance.

Tested working: skein, keccak, keccakc, nist5, tribus.

Enabled untested: skein2, jha, whirlpool, pentablake.

Enabled with known problems: blake256 lane corruption: lyra2z, decred, blake.
These algos operate in 2way mode due to invalid hash in 2 lanes.

Kudos to you! Awesome miner.

Now to the feedback:
I have a Ryzen 7 1700 at 3.7 GHz. On nist5, 4way is around 15% slower than AES-AVX/AVX2: around 240 kH/s per core (8 threads) with 4way versus 270 kH/s per core with AES-AVX2. It runs stably, but with lower performance. I can get 2.1~2.2 MH/s on NIST5.

I would like to see SHA enabled and working on Windows, but I saw how difficult it is. If it would help, I can let you connect to my machine to try something. I don't have any coding knowledge, but I want to help get a SHA miner compiled.

joblo

legendary

Activity: 1470

Merit: 1114

New release cpuminer-opt-3.7.7

Fixed regression caused by 64 CPU support.
Fixed lyra2h.

https://github.com/JayDDee/cpuminer-opt/releases/tag/v3.7.7

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: lncm on December 17, 2017, 06:28:35 AM

Sorry to annoy you with so many questions.

You ask snap questions without thinking then you challenge my answers based on your misconceptions.

Running out of memory is a simple problem that you should be able to solve yourself.

You don't need to apologize, just try harder before asking questions. And if you do need to ask a
question about a problem you should show how you tried to solve it. You learn more that way.

lncm

member

Activity: 388

Merit: 13

Quote from: joblo on December 16, 2017, 07:53:36 PM

Quote from: lncm on December 16, 2017, 06:14:54 PM

Quote from: joblo on December 16, 2017, 05:49:02 PM

Quote from: lncm on December 16, 2017, 05:33:05 PM

Quote from: joblo on December 14, 2017, 02:37:56 PM

Quote from: lncm on December 14, 2017, 01:22:56 PM

Quote from: joblo on December 14, 2017, 10:09:33 AM

Yes it's normal and dependent on the algo. It means cpuminer-opt has no optimizations for scrypt algo.

Oh, OK, it's just it previously stated SSE2.

On another subject, I tried the 3.7.5 Windows binary on my desktop (Ryzen 1700) and all executables fail to start. It states:
"thread xx (random): Scrypt buffer allocation failed Fail: thread xx failed to initiate."

I noted the change in feature reporting in the release announcement.

You're out of memory. You only have enough memory for xx -1 threads.

Thanks, fiddling around with virtual memory settings allowed it to run.

Performance with Scrypt is still very bad on the Ryzen CPU, at the same level as a Xeon Westmere-EP with 6 cores @ 2.4 GHz. Is this really the CPU's fault, or could cpuminer-opt be better optimized for the Zen architecture?

Thanks and keep up the good work!

Virtual memory is slow, you need the real thing.

I have 16 GB of RAM, so it shouldn't be a problem.
I had a fixed page file size, I set it to auto, and it worked. Maybe a bug?

You don't have enough RAM to run that many threads without using VM. Using VM is slow.
Stop arguing and do the math: N*threads.

How much RAM per thread? So if I run fewer threads, could it actually be faster?

Sorry to annoy you with so many questions.

PS: in Task Manager, cpuminer has 11.5 GB of RAM allocated.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: lncm on December 16, 2017, 06:14:54 PM

Quote from: joblo on December 16, 2017, 05:49:02 PM

Quote from: lncm on December 16, 2017, 05:33:05 PM

Quote from: joblo on December 14, 2017, 02:37:56 PM

Quote from: lncm on December 14, 2017, 01:22:56 PM

Quote from: joblo on December 14, 2017, 10:09:33 AM

Yes it's normal and dependent on the algo. It means cpuminer-opt has no optimizations for scrypt algo.

Oh, OK, it's just it previously stated SSE2.

On another subject, I tried the 3.7.5 Windows binary on my desktop (Ryzen 1700) and all executables fail to start. It states:
"thread xx (random): Scrypt buffer allocation failed Fail: thread xx failed to initiate."

I noted the change in feature reporting in the release announcement.

You're out of memory. You only have enough memory for xx -1 threads.

Thanks, fiddling around with virtual memory settings allowed it to run.

Performance with Scrypt is still very bad on the Ryzen CPU, at the same level as a Xeon Westmere-EP with 6 cores @ 2.4 GHz. Is this really the CPU's fault, or could cpuminer-opt be better optimized for the Zen architecture?

Thanks and keep up the good work!

Virtual memory is slow, you need the real thing.

I have 16 GB of RAM, so it shouldn't be a problem.
I had a fixed page file size, I set it to auto, and it worked. Maybe a bug?

You don't have enough RAM to run that many threads without using VM. Using VM is slow.
Stop arguing and do the math: N*threads.

lncm

member

Activity: 388

Merit: 13

Quote from: joblo on December 16, 2017, 05:49:02 PM

Quote from: lncm on December 16, 2017, 05:33:05 PM

Quote from: joblo on December 14, 2017, 02:37:56 PM

Quote from: lncm on December 14, 2017, 01:22:56 PM

Quote from: joblo on December 14, 2017, 10:09:33 AM

Yes it's normal and dependent on the algo. It means cpuminer-opt has no optimizations for scrypt algo.

Oh, OK, it's just it previously stated SSE2.

On another subject, I tried the 3.7.5 Windows binary on my desktop (Ryzen 1700) and all executables fail to start. It states:
"thread xx (random): Scrypt buffer allocation failed Fail: thread xx failed to initiate."

I noted the change in feature reporting in the release announcement.

You're out of memory. You only have enough memory for xx -1 threads.

Thanks, fiddling around with virtual memory settings allowed it to run.

Performance with Scrypt is still very bad on the Ryzen CPU, at the same level as a Xeon Westmere-EP with 6 cores @ 2.4 GHz. Is this really the CPU's fault, or could cpuminer-opt be better optimized for the Zen architecture?

Thanks and keep up the good work!

Virtual memory is slow, you need the real thing.

I have 16 GB of RAM, so it shouldn't be a problem.
I had a fixed page file size, I set it to auto, and it worked. Maybe a bug?

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: lncm on December 16, 2017, 05:33:05 PM

Quote from: joblo on December 14, 2017, 02:37:56 PM

Quote from: lncm on December 14, 2017, 01:22:56 PM

Quote from: joblo on December 14, 2017, 10:09:33 AM

Yes it's normal and dependent on the algo. It means cpuminer-opt has no optimizations for scrypt algo.

Oh, OK, it's just it previously stated SSE2.

On another subject, I tried the 3.7.5 Windows binary on my desktop (Ryzen 1700) and all executables fail to start. It states:
"thread xx (random): Scrypt buffer allocation failed Fail: thread xx failed to initiate."

I noted the change in feature reporting in the release announcement.

You're out of memory. You only have enough memory for xx -1 threads.

Thanks, fiddling around with virtual memory settings allowed it to run.

Performance with Scrypt is still very bad on the Ryzen CPU, at the same level as a Xeon Westmere-EP with 6 cores @ 2.4 GHz. Is this really the CPU's fault, or could cpuminer-opt be better optimized for the Zen architecture?

Thanks and keep up the good work!

Virtual memory is slow, you need the real thing.

lncm

member

Activity: 388

Merit: 13

Quote from: joblo on December 14, 2017, 02:37:56 PM

Quote from: lncm on December 14, 2017, 01:22:56 PM

Quote from: joblo on December 14, 2017, 10:09:33 AM

Yes it's normal and dependent on the algo. It means cpuminer-opt has no optimizations for scrypt algo.

Oh, OK, it's just it previously stated SSE2.

On another subject, I tried the 3.7.5 Windows binary on my desktop (Ryzen 1700) and all executables fail to start. It states:
"thread xx (random): Scrypt buffer allocation failed Fail: thread xx failed to initiate."

I noted the change in feature reporting in the release announcement.

You're out of memory. You only have enough memory for xx -1 threads.

Thanks, fiddling around with virtual memory settings allowed it to run.

Performance with Scrypt is still very bad on the Ryzen CPU, at the same level as a Xeon Westmere-EP with 6 cores @ 2.4 GHz. Is this really the CPU's fault, or could cpuminer-opt be better optimized for the Zen architecture?

Thanks and keep up the good work!

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: mangoo on December 16, 2017, 11:32:47 AM

Quote from: joblo on December 16, 2017, 09:26:20 AM

This is the proposed fix for the 32 cpu limit:

Code:

@@ -204,7 +204,7 @@
   for ( uint8_t i = 0; i < ncpus; i++ )
   {
   // cpu mask
- if( (ncpus > 64) || ( mask & (1UL << i) ) ) CPU_SET( i, &set );
+ if( (ncpus > 64) || ( mask & (1ULL << i) ) ) CPU_SET( i, &set );
   }
   if ( id == -1 )
   {
@@ -1690,9 +1690,9 @@
   {
   if (opt_debug)
   applog( LOG_DEBUG, "Binding thread %d to cpu %d (mask %x)",
- thr_id, thr_id % num_cpus, ( 1 << (thr_id % num_cpus) ) );
+ thr_id, thr_id % num_cpus, ( 1ULL << (thr_id % num_cpus) ) );

- affine_to_cpu_mask( thr_id, 1 << (thr_id % num_cpus) );
+ affine_to_cpu_mask( thr_id, 1ULL << (thr_id % num_cpus) );
   }
   else if (opt_affinity != -1)
   {

All good now - all CPUs running with this patch, thanks!

Thanks for testing. I still don't understand why it worked before with -1UL (32 bit) but it's moot now.

If I get a response (or after a suitable timeout with no response) for the lyra2h fix I will release both

mangoo

newbie

Activity: 23

Merit: 0

Quote from: joblo on December 16, 2017, 09:26:20 AM

This is the proposed fix for the 32 cpu limit:

Code:

@@ -204,7 +204,7 @@
   for ( uint8_t i = 0; i < ncpus; i++ )
   {
   // cpu mask
- if( (ncpus > 64) || ( mask & (1UL << i) ) ) CPU_SET( i, &set );
+ if( (ncpus > 64) || ( mask & (1ULL << i) ) ) CPU_SET( i, &set );
   }
   if ( id == -1 )
   {
@@ -1690,9 +1690,9 @@
   {
   if (opt_debug)
   applog( LOG_DEBUG, "Binding thread %d to cpu %d (mask %x)",
- thr_id, thr_id % num_cpus, ( 1 << (thr_id % num_cpus) ) );
+ thr_id, thr_id % num_cpus, ( 1ULL << (thr_id % num_cpus) ) );

- affine_to_cpu_mask( thr_id, 1 << (thr_id % num_cpus) );
+ affine_to_cpu_mask( thr_id, 1ULL << (thr_id % num_cpus) );
   }
   else if (opt_affinity != -1)
   {

All good now - all CPUs running with this patch, thanks!

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: mangoo on December 16, 2017, 12:31:52 AM

Quote from: joblo on December 15, 2017, 10:32:39 PM

Stupid mistake, try this change in algo/lyra2/lyra2h.c line 34:

Code:

34c34
< LYRA2Z( lyra2h_matrix, hash, 32, hash, 32, hash, 32, 16, 16, 16 );
---
> LYRA2Z( lyra2h_matrix, hash, 32, hash, 32, hash, 32, 8, 8, 8);

I presume no news means it now works? I'd like confirmation.

With the following change it still only uses 32 CPUs:

Code:

--- algo/lyra2/lyra2h.c.orig 2017-12-14 23:28:51.000000000 +0000
+++ algo/lyra2/lyra2h.c 2017-12-16 05:29:48.295167452 +0000
@@ -31,7 +31,7 @@
   sph_blake256( &ctx_blake, input + 64, 16 );
   sph_blake256_close( &ctx_blake, hash );

- LYRA2Z( lyra2h_matrix, hash, 32, hash, 32, hash, 32, 8, 8, 8);
+ LYRA2Z( lyra2h_matrix, hash, 32, hash, 32, hash, 32, 16, 16, 16);

   memcpy(state, hash, 32);
}

Not sure if I should try your earlier changes as well? If so - could you send a patch in diff -u format?

I'm a bit confused by this post.

Your comment about still using 32 CPUs is for my previous post about using 1ULL to force it to 64 bits.
You're saying that didn't work?

The quote above is for a different problem with rejects mining the new lyra2h algo. Is that what you
are now offering to test?

Edit: I re-read your post a few more times and it appears you're saying that the Lyra2 change didn't fix
the 32 cpu limit problem you initially reported. It only (hopefully) fixes the rejects from lyra2h reported
by someone else.

This is the proposed fix for the 32 cpu limit:

Code:

@@ -204,7 +204,7 @@
   for ( uint8_t i = 0; i < ncpus; i++ )
   {
   // cpu mask
- if( (ncpus > 64) || ( mask & (1UL << i) ) ) CPU_SET( i, &set );
+ if( (ncpus > 64) || ( mask & (1ULL << i) ) ) CPU_SET( i, &set );
   }
   if ( id == -1 )
   {
@@ -1690,9 +1690,9 @@
   {
   if (opt_debug)
   applog( LOG_DEBUG, "Binding thread %d to cpu %d (mask %x)",
- thr_id, thr_id % num_cpus, ( 1 << (thr_id % num_cpus) ) );
+ thr_id, thr_id % num_cpus, ( 1ULL << (thr_id % num_cpus) ) );

- affine_to_cpu_mask( thr_id, 1 << (thr_id % num_cpus) );
+ affine_to_cpu_mask( thr_id, 1ULL << (thr_id % num_cpus) );
   }
   else if (opt_affinity != -1)
   {

nizzuu

full member

Activity: 187

Merit: 100

Cryptocurrency enthusiast

Quote from: Drag0g0 on December 16, 2017, 03:00:28 AM

I'm getting "stratum_recv_line failed" on a stable connection. I tried different pools and it didn't help.

It happens every ~15 min.

I'm trying to mine Yenten.

Pool issues, or your hardware is too slow to send at least one share in the required period of time (e.g. 15 min for your pool), so the pool thinks you're not there. Try decreasing the diff (use a fixed diff if the pool supports it, or a port with a lower diff).

Drag0g0

newbie

Activity: 64

Merit: 0

I'm getting "stratum_recv_line failed" on a stable connection. I tried different pools and it didn't help.

It happens every ~15 min.

I'm trying to mine Yenten.
