
Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 144. (Read 444131 times)

legendary
Activity: 1470
Merit: 1114
@joblo

i got a strange buffer overflow, you might know if this is miner related:

system is a Ubuntu server 16.04 LTS LXC container on proxmox (kernel 4.4.13-1-pve) able to use 2GB ram

miner got terminated, my log (stdout/err from cpuminer) displayed the following:

https://paste.felixbrucker.com/paste/avy2w


I've never seen anything like this before. If it happens with all algos and only on proxmox I'd assume
it's proxmox related.
hero member
Activity: 700
Merit: 500
@joblo

i got a strange buffer overflow, you might know if this is miner related:

system is a Ubuntu server 16.04 LTS LXC container on proxmox (kernel 4.4.13-1-pve) able to use 2GB ram

miner got terminated, my log (stdout/err from cpuminer) displayed the following:

https://paste.felixbrucker.com/paste/avy2w
legendary
Activity: 1470
Merit: 1114
cpuminer 3.4.4 is released.

Source: https://drive.google.com/file/d/0B0lVSGQYLJIZcWN3ZE5ma0FWRnM/view?usp=sharing

Windows: https://drive.google.com/file/d/0B0lVSGQYLJIZdG50THdjZEo5c1U/view?usp=sharing

v3.4.4 adds support for mining the cryptonight algo at Nicehash with AES optimizations. Some stale share rejects have
been observed when mining cryptonight at Nicehash that don't occur at other pools. These rejects are believed to
be a pool issue.

Also fixed is a compile error when using gcc 6.1.

An interim fix for a compile error in Hodl code on Westmere CPUs was submitted. This interim fix should allow hodl
to compile, however, it will not be an optimum build. Further investigation into this issue is underway with a goal
of enabling AES on Westmere CPUs.
legendary
Activity: 1470
Merit: 1114
Joblo --

Some further testing / updates for you:  Looks like there is an issue with compiling for AMD non-AES_NI capable processors and some older Intel processors --- but it seems to exist in the pristine 3.4.3 chain as well under GCC 6.1.0 so it does not appear to be due to the diffs I've made.

I don't have all the platforms to test binaries, but I have at least been able to successfully compile for all Intel architectures back as far as core2 --- the compile errors pop back up when I try to build with -march=nocona or earlier.  AMD builds work for anything newer than barcelona/amdfam10.


Thanks, that helps. I'm still a little concerned about being unable to both compile and test on the native HW. I'm pretty confident
your changes will not negatively impact other Intel architectures while helping Westmere but I'm not so sure about AMD.

AMD and Intel diverged between SSE4 and AVX. AMD was developing their own SSE5 which was not fully compatible with Intel's AVX.
They eventually converged but there may have been a period where AMD support was not aligned with Intel. This could mean the AVX
check does not work properly on some early AES AMD CPUs. This is somewhat speculative but plausible.

What it comes down to is whether I play it safe at the expense of Westmere performance or improve Westmere for a known and contributing
user at the risk of breaking some unknown AMD users. I'm leaning toward the latter.

I have some systems lying around with AMD CPUs.  I'll see what I've got that is running and run some tests if I can.

That would be nice.

I'm a little confused about your compile problem related to AES256CBC. The min/max issue is resolved.

Looking at the code more closely (it took a while to remember what I was thinking when I made those changes),
I realized the AVX checks were intended to separate the original Wolf AES optimizations from the recent Optiminer
AVX enhancements. I assumed all the Optiminer code required AVX, so if AVX was not available the compiler would revert to
the original Wolf code, which was AES enhanced.

The way it is coded only one instance of AES256CBC should be compiled, either the new Optiminer version or the Wolf version.
I really would like to see your compile errors to understand this better. The code from
3.4.3 should compile the Wolf code on your CPU.

The AVX checks in hodl-wolf make the assumption that if AVX is present AES is also present. They are there to separate the
original Wolf code from the Optiminer code. The AES checks are only to prevent compile errors on non-AES CPUs. None of the
Wolf code is actually run on a non-AES CPU. Perhaps I should block it all out if AES isn't available.

The intended result is:

AES+AVX: run Optiminer modded code in hodl-wolf.c and aes.c.

AES only: run all Wolf code in hodl-wolf.c and aes.c.

no AES: run the unoptimized c++ code.

That was based on assumptions. You now have some actual data from a CPU with AES but not AVX.
Your data shows that only the Optiminer code in GenerateGarbageCore contains AVX code. The remainder
of the Optiminer code will run on your AES-only Westmere.
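
For readers following along, the three intended paths listed above can be sketched as a compile-time selection. This is only an illustration, not the actual cpuminer-opt code: the enum and function names are hypothetical. The only real identifiers are GCC's predefined `__AES__` and `__AVX__` macros, which are set when the build target (for example via -march) enables those instruction sets.

```c
#include <stdio.h>

/* Hypothetical names for illustration only; they do not appear in
 * the cpuminer-opt source. */
typedef enum {
    HODL_PATH_GENERIC,        /* no AES: unoptimized code            */
    HODL_PATH_WOLF_AES,       /* AES only: original Wolf code        */
    HODL_PATH_OPTIMINER_AVX   /* AES+AVX: Optiminer modified code    */
} hodl_path_t;

/* Pick the code path from the compiler's predefined feature macros.
 * The choice is fixed at build time, so it reflects the -march (or
 * -maes / -mavx) flags, not the CPU the binary later runs on. */
static hodl_path_t hodl_select_path(void)
{
#if defined(__AES__) && defined(__AVX__)
    return HODL_PATH_OPTIMINER_AVX;
#elif defined(__AES__)
    return HODL_PATH_WOLF_AES;
#else
    return HODL_PATH_GENERIC;
#endif
}

/* Human-readable label for logging which path was compiled in. */
static const char *hodl_path_name(hodl_path_t p)
{
    switch (p) {
    case HODL_PATH_OPTIMINER_AVX: return "Optiminer AES+AVX code";
    case HODL_PATH_WOLF_AES:      return "Wolf AES code";
    default:                      return "unoptimized code";
    }
}
```

With GCC, -march=westmere enables AES but not AVX, so a build with that flag would land on the Wolf AES path in this sketch.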

This raises another question. Is the Optiminer AES code in aes.c and scanhash_hodl_wolf faster than the corresponding
pure Wolf code? Since you weren't able to compile the code as released, it comes back to understanding why it didn't
compile. Once it does, you can test both and I can implement whichever is faster.

I know I'm pushy and I know it's a lot of work but it's rare to find a Westmere owner willing and able to do some dirty work.
I really appreciate your help.
newbie
Activity: 14
Merit: 0
Joblo --

Some further testing / updates for you:  Looks like there is an issue with compiling for AMD non-AES_NI capable processors and some older Intel processors --- but it seems to exist in the pristine 3.4.3 chain as well under GCC 6.1.0 so it does not appear to be due to the diffs I've made.

I don't have all the platforms to test binaries, but I have at least been able to successfully compile for all Intel architectures back as far as core2 --- the compile errors pop back up when I try to build with -march=nocona or earlier.  AMD builds work for anything newer than barcelona/amdfam10.


Thanks, that helps. I'm still a little concerned about being unable to both compile and test on the native HW. I'm pretty confident
your changes will not negatively impact other Intel architectures while helping Westmere but I'm not so sure about AMD.

AMD and Intel diverged between SSE4 and AVX. AMD was developing their own SSE5 which was not fully compatible with Intel's AVX.
They eventually converged but there may have been a period where AMD support was not aligned with Intel. This could mean the AVX
check does not work properly on some early AES AMD CPUs. This is somewhat speculative but plausible.

What it comes down to is whether I play it safe at the expense of Westmere performance or improve Westmere for a known and contributing
user at the risk of breaking some unknown AMD users. I'm leaning toward the latter.

I have some systems lying around with AMD CPUs.  I'll see what I've got that is running and run some tests if I can.
legendary
Activity: 1470
Merit: 1114
Thx a lot, so i have to wait new release?

Yes. I thought that was clear from the recent discussions in this thread.
legendary
Activity: 1470
Merit: 1114
Joblo --

Some further testing / updates for you:  Looks like there is an issue with compiling for AMD non-AES_NI capable processors and some older Intel processors --- but it seems to exist in the pristine 3.4.3 chain as well under GCC 6.1.0 so it does not appear to be due to the diffs I've made.

I don't have all the platforms to test binaries, but I have at least been able to successfully compile for all Intel architectures back as far as core2 --- the compile errors pop back up when I try to build with -march=nocona or earlier.  AMD builds work for anything newer than barcelona/amdfam10.


Thanks, that helps. I'm still a little concerned about being unable to both compile and test on the native HW. I'm pretty confident
your changes will not negatively impact other Intel architectures while helping Westmere but I'm not so sure about AMD.

AMD and Intel diverged between SSE4 and AVX. AMD was developing their own SSE5 which was not fully compatible with Intel's AVX.
They eventually converged but there may have been a period where AMD support was not aligned with Intel. This could mean the AVX
check does not work properly on some early AES AMD CPUs. This is somewhat speculative but plausible.
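
Since the worry is whether an AVX check lines up with what a given CPU really supports, one way to take out the guesswork is to ask the CPU directly at run time. Below is a minimal sketch, assuming GCC or Clang on x86 and using `__get_cpuid` from `<cpuid.h>`. The struct and function names are hypothetical. It reads only the CPUID feature bits and deliberately skips the OS-support check (OSXSAVE/XGETBV) that a complete AVX test would also need.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper type: which of the two features the CPU reports. */
typedef struct {
    bool aes;   /* AES-NI instructions */
    bool avx;   /* AVX instructions    */
} cpu_features_t;

/* Query CPUID leaf 1 and extract the AES-NI and AVX bits from ECX.
 * On non-x86 builds both fields stay false. */
static cpu_features_t query_cpu_features(void)
{
    cpu_features_t f = { false, false };
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.aes = (ecx & (1u << 25)) != 0;   /* CPUID.1:ECX bit 25 */
        f.avx = (ecx & (1u << 28)) != 0;   /* CPUID.1:ECX bit 28 */
    }
#endif
    return f;
}

/* Map the two flags to a short description of the combination. */
static const char *describe_features(cpu_features_t f)
{
    if (f.aes && f.avx) return "AES and AVX";
    if (f.aes)          return "AES without AVX";
    if (f.avx)          return "AVX without AES";
    return "neither AES nor AVX";
}
```

A Westmere would be expected to report "AES without AVX" here. Running something like this on the early AMD parts in question would show whether the assumption that AVX implies AES holds for them.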

What it comes down to is whether I play it safe at the expense of Westmere performance or improve Westmere for a known and contributing
user at the risk of breaking some unknown AMD users. I'm leaning toward the latter.
newbie
Activity: 14
Merit: 0
Joblo --

Some further testing / updates for you:  Looks like there is an issue with compiling for AMD non-AES_NI capable processors and some older Intel processors --- but it seems to exist in the pristine 3.4.3 chain as well under GCC 6.1.0 so it does not appear to be due to the diffs I've made.

I don't have all the platforms to test binaries, but I have at least been able to successfully compile for all Intel architectures back as far as core2 --- the compile errors pop back up when I try to build with -march=nocona or earlier.  AMD builds work for anything newer than barcelona/amdfam10.
newbie
Activity: 8
Merit: 0
Thx a lot, so i have to wait new release?
sr. member
Activity: 292
Merit: 250
Hello Joblo. Sry for poor English.

1. Is current 3.4.3 version (1st post) for windows support nicehash CryptoNight? I have lot of "stratum_recv_line failed..."
2. Can you make small instruction howto compile\install\run your miner in Ubuntu please? or link to it.

Big thx for you work.

@hardkod

v3.4.3 does not yet support cryptonight mining at nicehash.

There are instructions inside README.md in the source code for building on linux.

From README.md
Code:
Building on linux prerequisites:

It is assumed users know how to install packages on their system and
be able to compile standard source packages. This is basic Linux and
beyond the scope of cpuminer-opt.

Make sure you have the basic development packages installed.
Here is a good start:

http://askubuntu.com/questions/457526/how-to-install-cpuminer-in-ubuntu

Install any additional dependencies needed by cpuminer-opt. The list below
are some of the ones that may not be in the default install and need to
be installed manually. There may be others, read the error messages they
will give a clue as to the missing package.

The following command should install everything you need on Debian based
packages:

sudo apt-get install build-essential libssl-dev libcurl4-openssl-dev libjansson-dev libgmp-dev automake

Building on Linux, see below for Windows.

Dependencies

build-essential  (for Ubuntu, Development Tools package group on Fedora)
automake
libjansson-dev
libgmp-dev
libcurl4-openssl-dev
libssl-dev
pthreads
zlib

tar xvzf [file.tar.gz]
cd [file]

Run build.sh to build on Linux or execute the following commands.

./autogen.sh
CFLAGS="-O3 -march=native -Wall" CXXFLAGS="$CFLAGS -std=gnu++11" ./configure --with-curl
make

Start mining.

./cpuminer -a algo ...
newbie
Activity: 8
Merit: 0
Hello Joblo. Sry for poor English.

1. Is current 3.4.3 version (1st post) for windows support nicehash CryptoNight? I have lot of "stratum_recv_line failed..."
2. Can you make small instruction howto compile\install\run your miner in Ubuntu please? or link to it.

Big thx for you work.
legendary
Activity: 1470
Merit: 1114
I have an update on supporting cryptonight at nicehash.

I implemented the changes and they seem to work and they don't break other pools so there was no need to
implement pool-specific code.

My test results on Nicehash are erratic, possibly a pool issue. I was initially submitting 20-25% rejects but that seems
to have stopped. The latest session is up to 36 accepts @ 100%, and counting.

I also experienced periods of extremely frequent thread hashrate output from one or two threads, around 100 per second, showing a hash count
of 1 with a normal hashrate. This occurred twice at startup and I killed it. It also happened mid session and cleared itself.
This is not associated with the rejects, I still submit valid shares but they show a lower than normal hashrate.

This is what it looks like:

Code:
[2016-08-25 12:23:28] CPU #0: 1 H, 72.57 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 56.63 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 55.92 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 64.27 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 67.63 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 54.73 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 55.19 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 71.66 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 69.21 H/s

More testing to do.  

thanks for this!

I think I found the bug causing the messy output. The bug has existed for a long time but didn't seem to have an effect before.
It also wasn't specific to cryptonight or the Nicehash mod. The fix requires a small design change affecting all algos so extensive
testing will be required. If it goes smoothly I should release it in a day or so.

Edit:

The output flood is fixed but I'm still concerned about stale shares. These rejects are intermittent. Last night was not good
with reject rates over 20% at times. Today is better at less than 5%. Sometimes it changes from session to session. A session
could be running clean but if I stop and restart it I may start producing rejects. These rejects are only produced when mining cryptonight
at Nicehash. Moneropool is always clean.

I'll poke around some more but if I don't find anything and the reject rate is manageable I'll release it as is.

Edit2:

I noticed something interesting while testing. I was mining on three CPUs and had been running clean. Then they all reported
a cluster of 3 or 4 rejects at the same time. This is too much of a coincidence, so the stale share rejects appear to be
a pool issue at Nicehash. I consider the issue closed and cryptonight support for Nicehash is ready for release.

There is one more pending issue involving Westmere CPUs. If it isn't resolved quickly I'll release cryptonight anyway.
legendary
Activity: 1470
Merit: 1114
joblo ....

Redid my set of changes on a clean copy of your 3.4.3 codebase.  With these changes it compiles on my westmere CPU with -march=westmere  Here are the diffs:

[snipped]


Thanks.

I'm getting flashbacks to an AMD problem. It might be that some of that code won't compile on some AMD CPUs
which would explain the presence of the AVX hooks in aes.c. I recently read that AMD was working on SSE5 when Intel was
developing AVX. This may have created a mess with different implementations. Eventually AMD's SSE5 and Intel's AVX were merged.
This might also be related to the compile error I encountered trying to build for amdfam10, it was AVX related.

I'm going to have to dig deeper to understand all the ramifications. It could take a while. You seem to have a workaround and I know
of no other Westmere users, well, not any that complained, so I won't rush it.

For the time being I'll tighten up the check so it compiles on Westmere out of the box, but without AES performance.
The min/max issue will be fixed in the next release.
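
As a rough idea of what a guarded min/max definition could look like, here is a minimal sketch. It assumes the problem is simply the same macro being defined in more than one place. The fully parenthesized arguments and the `clamp_int` helper are additions for illustration and are not taken from the cpuminer-opt source.

```c
/* Define min/max only if some other header has not already done so,
 * so a second definition elsewhere cannot cause a redefinition clash. */
#ifndef min
#define min(a, b) ((a) > (b) ? (b) : (a))
#endif
#ifndef max
#define max(a, b) ((a) < (b) ? (b) : (a))
#endif

/* Hypothetical helper showing the macros in use: limit v to [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    return max(lo, min(v, hi));
}
```

Wrapping each argument in parentheses keeps expressions such as min(x & 3, 5) from being regrouped by operator precedence. Like any function-style macro, these still evaluate an argument twice, so arguments with side effects should be avoided.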

I hope you'll be available to test my fixes. It must be tested on appropriate HW. AMD testers would also help.
newbie
Activity: 14
Merit: 0
joblo ....

Redid my set of changes on a clean copy of your 3.4.3 codebase.  With these changes it compiles on my westmere CPU with -march=westmere  Here are the diffs:

$ diff miner.h miner.h.orig
49a50,56
> #ifndef min
> #define min(a,b) (a>b ? b : a)
> #endif
> #ifndef max
> #define max(a,b) (a<b ? b : a)
> #endif
>

$ diff algo/blake/decred.c algo/blake/decred.c.orig
9,10d8
< #define min(a,b) (a>b ? b : a)
<
$ diff algo/hodl/aes.c algo/hodl/aes.c.orig
85a86,87
> #ifdef __AVX__
>
149a152,178
>
> #else    // NO AVX
>
> static inline __m128i AES256Core(__m128i State, const __m128i *ExpandedKey)
> {
>         State = _mm_xor_si128(State, ExpandedKey[0]);
>
>         for(int i = 1; i < 14; ++i) State = _mm_aesenc_si128(State, ExpandedKey[i]);
>
>         return(_mm_aesenclast_si128(State, ExpandedKey[14]));
> }
>
> void AES256CBC(__m128i *Ciphertext, const __m128i *Plaintext, const __m128i *ExpandedKey, __m128i IV, uint32_t BlockCount)
> {
>         __m128i State = _mm_xor_si128(Plaintext[0], IV);
>         State = AES256Core(State, ExpandedKey);
>         Ciphertext[0] = State;
>
>         for(int i = 1; i < BlockCount; ++i)
>         {
>                 State = _mm_xor_si128(Plaintext[i], Ciphertext[i - 1]);
>                 State = AES256Core(State, ExpandedKey);
>                 Ciphertext[i] = State;
>         }
> }
>
> #endif
$ diff algo/hodl/hodl-wolf.c algo/hodl/hodl-wolf.c.orig
58a59
> #ifdef __AVX__
129a131,196
>
> #else  // no AVX
>
>     uint32_t *pdata = work->data;
>     uint32_t *ptarget = work->target;
>     uint32_t BlockHdr[22], FinalPoW[8];
>     CacheEntry *Garbage = (CacheEntry*)hodl_scratchbuf;
>     CacheEntry Cache;
>     uint32_t CollisionCount = 0;
>
>     swab32_array( BlockHdr, pdata, 20 );
>         // Search for pattern in pseudorandom data
>         int searchNumber = COMPARE_SIZE / opt_n_threads;
>         int startLoc = threadNumber * searchNumber;
>
>         for(int32_t k = startLoc; k < startLoc + searchNumber && !work_restart[threadNumber].restart; k++)
>         {
>            // copy data to first l2 cache
>            memcpy(Cache.dwords, Garbage + k, GARBAGE_SLICE_SIZE);
> #ifndef NO_AES_NI
>            for(int j = 0; j < AES_ITERATIONS; j++)
>            {
>                 CacheEntry TmpXOR;
>                 __m128i ExpKey[16];
>
>                 // use last 4 bytes of first cache as next location
>                 uint32_t nextLocation = Cache.dwords[(GARBAGE_SLICE_SIZE >> 2)
>                                    - 1] & (COMPARE_SIZE - 1); //% COMPARE_SIZE;
>
>                 // Copy data from indicated location to second l2 cache -
>                 memcpy(&TmpXOR, Garbage + nextLocation, GARBAGE_SLICE_SIZE);
>                 //XOR location data into second cache
>                 for( int i = 0; i < (GARBAGE_SLICE_SIZE >> 4); ++i )
>                    TmpXOR.dqwords[i] = _mm_xor_si128( Cache.dqwords[i],
>                                                       TmpXOR.dqwords[i] );
>                 // Key is last 32b of TmpXOR
>                 // IV is last 16b of TmpXOR
>
>                 ExpandAESKey256( ExpKey, TmpXOR.dqwords +
>                                  (GARBAGE_SLICE_SIZE / sizeof(__m128i)) - 2 );
>                 AES256CBC( Cache.dqwords, TmpXOR.dqwords, ExpKey,
>                         TmpXOR.dqwords[ (GARBAGE_SLICE_SIZE / sizeof(__m128i))
>                                                              - 1 ], 256 );                 }
> #endif
>            // use last X bits as solution
>            if( ( Cache.dwords[ (GARBAGE_SLICE_SIZE >> 2) - 1 ]
>                                          & (COMPARE_SIZE - 1) ) < 1000 )
>            {
>               BlockHdr[20] = k;
>               BlockHdr[21] = Cache.dwords[ (GARBAGE_SLICE_SIZE >> 2) - 2 ];
>               sha256d( (uint8_t *)FinalPoW, (uint8_t *)BlockHdr, 88 );
>               CollisionCount++;
>               if( FinalPoW[7] <= ptarget[7] )
>               {
>                   pdata[20] = swab32( BlockHdr[20] );
>                   pdata[21] = swab32( BlockHdr[21] );
>                   *hashes_done = CollisionCount;
>                   return(1);
>               }
>            }
>         }
>
>     *hashes_done = CollisionCount;
>     return(0);
>
> #endif
legendary
Activity: 1470
Merit: 1114
Hi, could you tell, whether assembled under windows x86 32-bit? Sorry for my English....

32 bit is not supported.
hero member
Activity: 700
Merit: 500
I have an update on supporting cryptonight at nicehash.

I implemented the changes and they seem to work and they don't break other pools so there was no need to
implement pool-specific code.

My test results on Nicehash are erratic, possibly a pool issue. I was initially submitting 20-25% rejects but that seems
to have stopped. The latest session is up to 36 accepts @ 100%, and counting.

I also experienced periods of extremely frequent thread hashrate output from one or two threads, around 100 per second, showing a hash count
of 1 with a normal hashrate. This occurred twice at startup and I killed it. It also happened mid session and cleared itself.
This is not associated with the rejects, I still submit valid shares but they show a lower than normal hashrate.

This is what it looks like:

Code:
[2016-08-25 12:23:28] CPU #0: 1 H, 72.57 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 56.63 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 55.92 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 64.27 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 67.63 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 54.73 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 55.19 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 71.66 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 69.21 H/s

More testing to do. 

thanks for this!
newbie
Activity: 5
Merit: 0
Hi, could you tell, whether assembled under windows x86 32-bit? Sorry for my English....
legendary
Activity: 1470
Merit: 1114
Yes --- corei7 is definitely running without AES_NI. 

For HODL, you are excluding a whole bunch of AES_NI code that doesn't require AVX to execute.  The only part of Wolf's implementation that requires AVX is the SHA512 function in the initial scratchpad generation routine.  If you take out the AVX checks and "non-AVX" code from the rest of the implementation in algo/hodl/aes.c and algo/hodl/hodl-wolf.c it compiles for westmere and runs just fine with AES-NI enabled.  Running 24 threads with no affinity on my server I'm seeing about 215H/s without AES and close to 375H/s average with the modified version to allow the AES_NI code to run.

I would need to see your code changes and I also want to see the compile errors. I need to make sure your changes don't break
other CPUs.
newbie
Activity: 14
Merit: 0
Yes --- corei7 is definitely running without AES_NI. 

For HODL, you are excluding a whole bunch of AES_NI code that doesn't require AVX to execute.  The only part of Wolf's implementation that requires AVX is the SHA512 function in the initial scratchpad generation routine.  If you take out the AVX checks and "non-AVX" code from the rest of the implementation in algo/hodl/aes.c and algo/hodl/hodl-wolf.c it compiles for westmere and runs just fine with AES-NI enabled.  Running 24 threads with no affinity on my server I'm seeing about 215H/s without AES and close to 375H/s average with the modified version to allow the AES_NI code to run.
legendary
Activity: 1470
Merit: 1114
Joblo ---

I flattened the code in algo/hodl/aes.c and algo/hodl/hodl-wolf.c to remove the "non-AVX" code versions for everything but the SHA512 Function at the top of hodl-wolf.c and the code now compiles and runs for -march=westmere.

For cpuminer-corei7.exe from your download mining HODL to nicehash with 12 threads, isolated to the six cores on one CPU I am getting in the 120-130 H/s range performance

For cpuminer-westmere.exe that I compiled using the above modifications using the same configuration on the other CPU in my server I am seeing 240-250 H/s and it indicates AES optimizations ARE enabled.



This confirms that the corei7 build has AES disabled.
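
To put a number on the difference reported above, the speedup is just the ratio of the two hashrates. A tiny sketch (the function name is hypothetical):

```c
/* Ratio of optimized to baseline hashrate, both in H/s. */
static double speedup(double baseline_hs, double optimized_hs)
{
    return optimized_hs / baseline_hs;
}
```

Using the quoted ranges, the most pessimistic pairing is speedup(130.0, 240.0), about 1.85x, and the most optimistic is speedup(120.0, 250.0), about 2.08x. In other words, roughly double the hashrate for the westmere build on the same hardware.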