
Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 94. (Read 444129 times)

full member
Activity: 154
Merit: 100
So I ask again: where is the problem?
Any plans to fix solo?
legendary
Activity: 1470
Merit: 1114
All that wasted time and power could have been avoided by reading the miner's requirements.
full member
Activity: 154
Merit: 100
Hi!

I have noticed an issue with getwork when trying to solo mine HexxCoin with cpuminer.
(HexxCoin is a zcoin fork, lyra2rev2 algo from zoin as 330).
The miner works well with the ocminers pool.

I actually pointed this out some time ago, when I tried it with the zcoin wallet before it forked to a GPU algo.
A lot of people have wasted power and time, and may still be doing so, trying a useless solo setup with cpuminer and a wallet.
You will only see a list of new blocks going past.

So, where is the problem?
Any plans to fix solo?

hero member
Activity: 700
Merit: 500
Will test this evening (about 10 hours from now).

thanks
legendary
Activity: 1470
Merit: 1114
Edit: All links updated to 3.5.9.

cpuminer-opt v3.5.9 is available on git. Google Drive is currently down, so the tarball and Windows binaries
links still point to v3.5.8. This post and the OP will be updated when the download links are updated.


https://github.com/JayDDee/cpuminer-opt

Reduced stack usage for hmq1725 and small speedup.
Added Deep algo optimized for AES and AVX2
Rewrite of SSE2 Luffa, midstate now supported in deep, qubit & timetravel
Small changes to algo-gate.

It is hoped the reduced stack usage by hmq1725 will solve the stack smashing crash seen on
Ubuntu 16.04 using gcc 5.4. Users who had problems are encouraged to retest once 3.5.9
is available in all formats.

Much of the hmq1725 code is shared among many algos. The non-shared code was reviewed for
potential pointer errors but none were found. The optimizations described below result
from changes to Luffa; specifically, it is now 100% vectorized. There is no more flipping back and forth
between scalar and vector operations. The code is cleaner and less prone to pointer errors.
If I'm lucky the rewrite may also have magically solved the hmq1725 problem.

The fact is the problem is real and only seems to occur with gcc 5.4. It is not related to the stack protector
compile option, as the miner crashes even with it disabled. Yet gcc 4.8 has no problems. The problem is
not yet understood.

Deep is significantly optimized but there's nothing to compare it with. Qubit also benefits from the
same optimizations but, for comparative purposes, less than Deep. Some Timetravel permutations,
about one out of 8, also see an increase. All mostly due to implementing midstate precalc for Luffa.
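
A minimal sketch of the midstate idea, not the cpuminer-opt code: the first part of the 80-byte block header is identical for every nonce, so its hash state can be computed once per job and copied for each attempt. Assumptions here: a toy FNV-1a hash stands in for Luffa, the nonce occupies the last 4 bytes of the header, the split point is 64 bytes, and every name below is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy incremental hash (64-bit FNV-1a). A stand-in for Luffa, chosen only
   because its running state is a single word that is easy to save. */
typedef struct { uint64_t h; } toy_ctx;

static void toy_init(toy_ctx *c) { c->h = 0xcbf29ce484222325ULL; }

static void toy_update(toy_ctx *c, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        c->h ^= data[i];
        c->h *= 0x100000001b3ULL;
    }
}

/* Assumed layout: 32-bit little-endian nonce at byte offset 76. */
static void set_nonce(uint8_t header[80], uint32_t nonce)
{
    for (int i = 0; i < 4; i++)
        header[76 + i] = (uint8_t)(nonce >> (8 * i));
}

/* Baseline: absorb all 80 header bytes for every nonce. */
static uint64_t hash_full(const uint8_t header[80])
{
    toy_ctx c;
    toy_init(&c);
    toy_update(&c, header, 80);
    return c.h;
}

/* Midstate: absorb the constant first 64 bytes once per job ... */
static void midstate_prepare(toy_ctx *mid, const uint8_t header[80])
{
    toy_init(mid);
    toy_update(mid, header, 64);
}

/* ... then, per nonce, copy the saved state and absorb only the 16-byte tail. */
static uint64_t hash_from_midstate(const toy_ctx *mid, const uint8_t tail[16])
{
    toy_ctx c = *mid;          /* restore the precomputed state */
    toy_update(&c, tail, 16);
    return c.h;
}
```

A scan loop would call midstate_prepare() once, then set_nonce() and hash_from_midstate(&mid, header + 64) per attempt. Only the first function in a chain hashes the raw header, which would fit the observation that the saving matters most when Luffa comes first in a short chain.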

legendary
Activity: 1470
Merit: 1114
While working on deep I worked out why midstate precalc didn't work for Luffa. This will help deep a lot, because of the
short chain, qubit less, and timetravel even less.

Edit: I've just rewritten SSE2 Luffa to avoid flipping back and forth from vector operations to scalar operations.
Now that the code is clean it looks like it could be promoted to AVX2, although there are some bits that will be
challenging. But that will have to wait for a future release.

For now Luffa stays at SSE2 but adds support for midstate precalc in Timetravel, Qubit, and Deep. Just testing now,
don't know how long that will take, have to wait for the right Timetravel permutation.

If the stack smashing in hmq1725 is not solved in the next release by reducing stack usage, I will have higher
confidence the problem is not in Luffa code.

legendary
Activity: 1470
Merit: 1114
Thanks both of you. The problem is confirmed to be real and not a false positive.
Stack overflow is still my only real lead, we'll see if reducing the stack usage for
hmq1725 helps.

Finishing deep algo first.
hero member
Activity: 700
Merit: 500
haha ;)

On my Xeon it crashes as well (3.5.7 is also affected; I don't have any older versions here):

Code:
./cpuminer -a hmq1725 -o stratum+tcp://yiimp.ccminer.org:3535 -u benchmark -p stats

         **********  cpuminer-opt 3.5.7  ***********
     A CPU miner with multi algo support and optimized for CPUs
     with AES_NI and AVX extensions.
     BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
     Forked from TPruvot's cpuminer-multi with credits
     to Lucas Jones, elmad, palmd, djm34, pooler, ig0tik3d,
     Wolf0, Jeff Garzik and Optiminer.

CPU:      Intel(R) Xeon(R) CPU E3-1265L V2 @ 2.50GHz
CPU features: SSE2 AES AVX
SW built on Feb 14 2017 with GCC 5.4.0
SW features: SSE2 AES AVX
Algo features: SSE2 AES AVX AVX2
Start mining with SSE2 AES AVX

[2017-02-22 19:39:14] Starting Stratum on stratum+tcp://yiimp.ccminer.org:3535
[2017-02-22 19:39:14] 8 miner threads started, using 'hmq1725' algorithm.
[2017-02-22 19:39:21] Stratum difficulty set to 0.024 (0.00000)
[2017-02-22 19:39:21] hmq1725 block 627861, diff 2.101
*** stack smashing detected ***: ./cpuminer terminated
Aborted

I then compiled the latest version with the -fno-stack-protector option you mentioned and it didn't crash, though it printed quite a lot of text with -D supplied.

I have uploaded the output files here:

http://pastebin.com/UevnP84w (with debug enabled, not compiled with -fno-stack-protector)
http://pastebin.com/end94mZb (with debug enabled, compiled with -fno-stack-protector, cut; the original file from an 8 sec run was 90 MB)
http://pastebin.com/YtQwAaNT (without debug, compiled with -fno-stack-protector)
full member
Activity: 144
Merit: 100
Eager to learn
sklave@miner3-HP-500B:~/joblo350$ ./cpuminer -a yescrypt -t 2 -o stratum+tcp://mine.zpool.ca:6233 -u 1Cjq5f4ASXL5CpURWThYeNbtyB4ph98ex8 -p x,c=BTC,stats

         **********  cpuminer-opt 3.5.0  ***********
     A CPU miner with multi algo support and optimized for CPUs
     with AES_NI and AVX extensions.
     BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
     Forked from TPruvot's cpuminer-multi with credits
     to Lucas Jones, elmad, palmd, djm34, pooler, ig0tik3d,
     Wolf0, Jeff Garzik and Optiminer.

CPU: AMD Athlon(tm) II X2 240 Processor
CPU features: SSE2
SW built on Feb 22 2017 with GCC 5.4.0
SW features: SSE2
Algo features: SSE2
Start mining with SSE2

[2017-02-22 18:53:52] Starting Stratum on stratum+tcp://mine.zpool.ca:6233
[2017-02-22 18:53:52] 2 miner threads started, using 'yescrypt' algorithm.
[2017-02-22 18:53:58] Stratum difficulty set to 1 (0.00002)
[2017-02-22 18:53:58] yescrypt block 122725, diff 0.036
Illegal instruction (core dumped)


Compiled with -fno-stack-protector (Linux 16.04 LTS, GCC 5.4.0, cpuminer-opt 3.5.0, AMD Athlon II X2 240).

Tested with the M7M and Cryptonight algos: both work.
legendary
Activity: 1470
Merit: 1114
I meant Felix, if he's testing your problem. I assume there is already enough incentive to test
your own problems.
full member
Activity: 144
Merit: 100
Eager to learn
That's a good joke, paying me. By the way, my poor knowledge of mining software is under the GPL license, ergo FREE, lol. I'm happy that you deliver us, as always, good working and powerful miner software.


No problem, if I can spend the time. Other machines are running on a newer OS and GCC.
full member
Activity: 144
Merit: 100
Eager to learn
I'm sorry for my late response. I haven't given up, but I played around with changing gcc and broke it completely, with no chance to bring it back to life or to install any other version, not even a newer one. Grrr. I had to install the whole system from the ground up, which made me consider stepping back to 14.04 LTS, and now, on gcc 4.8.4 and miner version 3.5.8, it works. Again, I'm sorry that I haven't been able to help out with more info about the stack smashing case.
legendary
Activity: 1470
Merit: 1114
It would still be a valid test as it would provide another datapoint. Are these Intel or AMD, another datapoint maybe?

You can use the following if no wallet/address.
Code:
./cpuminer -a hmq1725 -o stratum+tcp://yiimp.ccminer.org:3535 -u benchmark -p stats
As soon as you see a hash report you can kill it because it got past the stack smashing detector.

If you get a "stack smashing detected" crash, then try compiling with -fno-stack-protector to see if that
prevents the crash or alters the crash.
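
A rough illustration of what the stack smashing detector checks, simulated by hand rather than taken from GCC: a guard word sits directly behind a local buffer and is compared with its expected value before the function returns. The struct, the guard value and the function name are all hypothetical, and the writes are kept inside one struct so the demo itself stays well defined.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define GUARD 0x5AFEC0DE5AFEC0DEULL   /* hypothetical guard value */

/* Stand-in for a stack frame: a local buffer with a guard word placed
   directly behind it, roughly where a compiler would put its canary. */
typedef struct {
    uint32_t buffer[8];   /* 32 bytes of "local" data */
    uint64_t guard;       /* must still equal GUARD on return */
} fake_frame;

/* Writes n_words 32-bit words starting at the buffer with no bounds check
   against the buffer itself. Returns 1 if the guard survived, 0 if it was
   overwritten ("smashed"). */
static int fill_and_check(size_t n_words)
{
    fake_frame f;
    memset(&f, 0, sizeof f);
    f.guard = GUARD;

    /* Byte-wise writes through a pointer to the whole struct; capped at
       sizeof f so the demo never leaves the object. */
    unsigned char *p = (unsigned char *)&f;
    for (size_t i = 0; i < n_words * sizeof(uint32_t) && i < sizeof f; i++)
        p[i] = 0xAA;

    return f.guard == GUARD;
}
```

Here fill_and_check(8) stays inside the buffer, while fill_and_check(9) runs one word past it and clobbers the guard. Note that -fno-stack-protector removes only the check, not the out-of-bounds write, so a clean run with that flag does not by itself show the write is gone.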

I was hoping integrale would do it, but I think he's given up.

If you keep testing for me I might have to start paying you. ;)

Edit: meanwhile I found a suspicious line of code with apparently ambiguous operator precedence:

Code:
  msg[0] = _mm_load_si128 ( (__m128i*)&state->buffer[4] );

State is a struct, buffer is uint32_t[].
It looks to me like the index is 4 * sizeof(__m128i)
when it should be 4 * sizeof(uint32_t) or 1 * sizeof(__m128i).

As this is accessing data on the stack it could cause a stack violation if it's wrong.
I'm just not sure where the cast applies: (spacing exaggerated for emphasis)

Code:
( (__m128i*) &state )              ->buffer[4]    // should fail to compile
( (__m128i*) &state->buffer )              [4]    // Wrong index, should be 1
( (__m128i*) &state->buffer[4] )                  // result is data not a ptr

The only one that seems to fit the logic is the second one, but if so the index is wrong.

Edit: I misread the code. The following works, which confirms the precedence was correct:

Code:
(__m128i*)( &state->buffer[4] )

I will probably change it, as this is the only place in the function where a uint32 offset is used.
Everything else is __m128i.
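
The precedence question can be checked with a small sketch. In C the postfix operators -> and [] bind first, then unary &, then the cast, so the expression takes the address of the fifth uint32_t and only then reinterprets it. The types below are hypothetical stand-ins (a 16-byte struct instead of __m128i, an 8-word buffer) so the example needs no SSE headers; it compares byte offsets and never dereferences the pointers.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical 16-byte stand-in for __m128i. */
typedef struct { uint32_t w[4]; } vec128;

/* Hypothetical state layout; only the uint32_t buffer matters here. */
typedef struct {
    uint32_t buffer[8];
} hash_state;

/* Byte offset from the start of buffer reached by the expression in question. */
static size_t offset_of_cast_expr(hash_state *state)
{
    /* Parsed as (vec128 *)( &( (state->buffer)[4] ) ):
       index in uint32_t units first, cast last. */
    vec128 *p = (vec128 *)&state->buffer[4];
    return (size_t)((unsigned char *)p - (unsigned char *)state->buffer);
}

/* The same location written with a vector-sized index instead. */
static size_t offset_of_vector_index(hash_state *state)
{
    vec128 *p = (vec128 *)state->buffer + 1;
    return (size_t)((unsigned char *)p - (unsigned char *)state->buffer);
}
```

Both functions return 16, that is 4 * sizeof(uint32_t) or 1 * sizeof(vec128), so the original line addresses the second 128-bit word and not the fifth.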
hero member
Activity: 700
Merit: 500
I'm not sure if it helps, but I have two laptops running Ubuntu 16.10 (not sure about the gcc version; 5.x or newer from what I remember) and it works fine with the default build.sh.

Edit: whoops, we are talking about hmq1725. Never mind, I never used that algo :D
legendary
Activity: 1470
Merit: 1114
So what happened with -fno-stack-protector?
full member
Activity: 144
Merit: 100
Eager to learn
I have no more ideas, except switching GCC or stepping back and installing 14.04. Grrrrr.
legendary
Activity: 1470
Merit: 1114

CPU:         Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz <<----------- Isn't it strange? It misdetects my CPU; the one built in is an i7 2600.

I get the same on my i7-6700K. Valgrind emulates a software CPU, no AVX2 yet.
legendary
Activity: 1470
Merit: 1114
No difference from mine. It looks like whatever is tripping the stack protection isn't
being seen by valgrind.
full member
Activity: 144
Merit: 100
Eager to learn
sklave@miner-HP-Compaq-8200:~/joblo8$ valgrind --leak-check=full -v --log-file=report.log ./cpuminer -a hmq1725 -t 4 -o stratum+tcp://yiimp.ccminer.org:3747 -u B6DE2BVjsbWp6u4EukeHKTU2auhBn5YUNU -p x,c=BOAT,stats

         **********  cpuminer-opt 3.5.8  ***********
     A CPU miner with multi algo support and optimized for CPUs
     with AES_NI and AVX extensions.
     BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
     Forked from TPruvot's cpuminer-multi with credits
     to Lucas Jones, elmad, palmd, djm34, pooler, ig0tik3d,
     Wolf0, Jeff Garzik and Optiminer.

CPU:         Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz <<----------- Isn't it strange? It misdetects my CPU; the one built in is an i7 2600.
CPU features: SSE2 AES AVX
SW built on Feb 16 2017 with GCC 5.4.0
SW features: SSE2 AES AVX
Algo features: SSE2 AES AVX AVX2
Start mining with SSE2 AES AVX

[2017-02-21 20:21:57] 4 miner threads started, using 'hmq1725' algorithm.
[2017-02-21 20:21:58] Starting Stratum on stratum+tcp://yiimp.ccminer.org:3747
[2017-02-21 20:21:58] Stratum difficulty set to 8 (0.00012)
[2017-02-21 20:21:58] hmq1725 block 32712, diff 1.793
*** stack smashing detected ***: ./cpuminer terminated
*** stack smashing detected ***: ./cpuminer terminated
killed
sklave@miner-HP-Compaq-8200:~/joblo8$


Like this it takes about 30 sec to crash.

Without valgrind --leak-check=full -v --log-file=report.log it detects the correct type of CPU.
full member
Activity: 144
Merit: 100
Eager to learn
Not yet tried, but I will do so.


Here are the last lines from report.log:

==28886== LEAK SUMMARY:
==28886==    definitely lost: 0 bytes in 0 blocks
==28886==    indirectly lost: 0 bytes in 0 blocks
==28886==      possibly lost: 2,240 bytes in 7 blocks
==28886==    still reachable: 305,618 bytes in 785 blocks
==28886==         suppressed: 0 bytes in 0 blocks
==28886==
==28886== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
==28886== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)