Joblo --
Some further testing and updates for you. There appears to be an issue compiling for AMD processors without AES-NI and for some older Intel processors, but it also exists in the pristine 3.4.3 source under GCC 6.1.0, so it does not appear to be due to the diffs I've made.
I don't have all the platforms to test binaries, but I have at least been able to compile successfully for all Intel architectures back as far as core2. The compile errors come back when I try to build with -march=nocona or earlier. AMD builds work for anything newer than barcelona/amdfam10.
Thanks, that helps. I'm still a little concerned about being unable to both compile and test on the native HW. I'm pretty confident
your changes will not negatively impact other Intel architectures while helping Westmere, but I'm not so sure about AMD.
AMD and Intel diverged between SSE4 and AVX. AMD was developing its own SSE5, which was not fully compatible with Intel's AVX.
They eventually converged, but there may have been a period where AMD support was not aligned with Intel. This could mean the AVX
check does not work properly on some early AES-capable AMD CPUs. This is somewhat speculative but plausible.
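One way to take the guesswork out of the AMD question is to query AES and AVX separately at run time instead of inferring one from the other. This is only a minimal sketch, assuming GCC or Clang on x86 (it relies on the compiler's `<cpuid.h>` helper). The struct and function names are hypothetical and are not taken from the miner's source.

```c
#include <stdio.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Hypothetical helper type, not part of the miner source. */
struct cpu_features {
    int has_aes;   /* CPUID leaf 1, ECX bit 25: AES-NI */
    int has_avx;   /* CPUID leaf 1, ECX bit 28: AVX    */
};

/* Decode the two feature bits from the ECX value returned by CPUID leaf 1. */
struct cpu_features features_from_ecx(unsigned int ecx)
{
    struct cpu_features f;
    f.has_aes = (int)((ecx >> 25) & 1u);
    f.has_avx = (int)((ecx >> 28) & 1u);
    return f;
}

/* Query the running CPU; reports no features on non-x86 builds. */
struct cpu_features detect_cpu_features(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
#if defined(__x86_64__) || defined(__i386__)
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        ecx = 0;   /* leaf 1 unavailable: assume no features */
#endif
    (void)eax; (void)ebx; (void)edx;
    return features_from_ecx(ecx);
}

/* Print the two flags independently so AES-only CPUs are easy to spot. */
void print_cpu_features(void)
{
    struct cpu_features f = detect_cpu_features();
    printf("AES: %s, AVX: %s\n",
           f.has_aes ? "yes" : "no",
           f.has_avx ? "yes" : "no");
}
```

Calling print_cpu_features() on each AMD box would show whether any of them report AES without AVX. Note that the AVX bit only says the CPU implements AVX; a complete check would also confirm OS support through the OSXSAVE bit and XGETBV.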
What it comes down to is whether I play it safe at the expense of Westmere performance or improve Westmere for a known and contributing
user at the risk of breaking some unknown AMD users. I'm leaning toward the latter.
I have some systems lying around with AMD CPUs. I'll see what I've got that is running and run some tests if I can.
That would be nice.
I'm a little confused about your compile problem related to AES256CBC. The min/max issue is resolved.
Looking at the code more closely, it took a while to remember what I was thinking when I made those changes.
I realized the AVX checks were intended to separate the original Wolf AES optimizations from the recent Optiminer
AVX enhancements. I assumed all the Optiminer code required AVX, so if AVX was not available the compiler would fall back to
the original Wolf code, which is AES enhanced.
The way it is coded, only one instance of AES256CBC should be compiled: either the new Optiminer version or the Wolf version.
I would really like to see your compile errors to understand this better. The code from
3.4.3 should compile the Wolf code on your CPU.
The AVX checks in hodl-wolf assume that if AVX is present, AES is also present. They are there to separate the
original Wolf code from the Optiminer code. The AES checks are only there to prevent compile errors on non-AES CPUs. None of the
Wolf code is actually run on a non-AES CPU. Perhaps I should block it all out if AES isn't available.
The intended result is:
AES+AVX: run Optiminer modded code in hodl-wolf.c and aes.c.
AES only: run all Wolf code in hodl-wolf.c and aes.c.
no AES: run the unoptimized C++ code.
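The three cases above can be sketched with the macros GCC and Clang predefine when AES and AVX code generation is enabled (`__AES__` and `__AVX__`, set by -maes/-mavx or by a -march value that implies them). This is only an illustration of the intended selection, not the actual code in hodl-wolf.c; the enum and function names are hypothetical.

```c
/* Hypothetical labels for the three code paths described above. */
enum hodl_path {
    HODL_PATH_GENERIC   = 0,  /* no AES: unoptimized code          */
    HODL_PATH_WOLF      = 1,  /* AES only: original Wolf code      */
    HODL_PATH_OPTIMINER = 2   /* AES + AVX: Optiminer modded code  */
};

/* Report which path this translation unit was compiled for.
 * AES is tested explicitly in every branch instead of being
 * inferred from AVX. */
enum hodl_path hodl_compiled_path(void)
{
#if defined(__AES__) && defined(__AVX__)
    return HODL_PATH_OPTIMINER;
#elif defined(__AES__)
    return HODL_PATH_WOLF;
#else
    return HODL_PATH_GENERIC;
#endif
}

/* Human-readable name for a path, e.g. for a startup log line. */
const char *hodl_path_name(enum hodl_path p)
{
    switch (p) {
    case HODL_PATH_OPTIMINER: return "Optiminer AES+AVX";
    case HODL_PATH_WOLF:      return "Wolf AES";
    default:                  return "generic";
    }
}
```

Built with -march=westmere this should select the Wolf path, since that target enables AES but not AVX. Testing AES in every branch also covers the case where a CPU or build target has AVX without AES.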
That was based on assumptions. You now have some actual data from a CPU with AES but not AVX.
Your data shows that only the Optiminer code in GenerateGarbageCore contains AVX code. The remainder
of the Optiminer code will run on your AES-only Westmere.
This raises another question: is the Optiminer AES code in aes.c and scanhash_hodl_wolf faster than the corresponding
pure Wolf code? Since you weren't able to compile the code as released, we first need to understand why it didn't
compile. Once it does, you can test both and I can implement whichever is faster.
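For the speed comparison, a rough timing harness along these lines could be enough if comparing the miner's reported hash rates is inconvenient. It is a sketch only: the workload here is a stand-in, and the two real candidates (the Optiminer and Wolf AES routines) would need small wrappers with this signature. All names are hypothetical.

```c
#include <time.h>

/* Signature each candidate must be wrapped in (hypothetical). */
typedef void (*bench_fn)(void);

/* Stand-in workload so the sketch compiles on its own;
 * replace with a wrapper around the routine under test. */
void dummy_workload(void)
{
    volatile unsigned int x = 0;
    for (unsigned int i = 0; i < 1000; i++)
        x += i;
}

/* Average CPU seconds per call over the given number of iterations. */
double seconds_per_call(bench_fn fn, long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        fn();
    clock_t end = clock();
    return ((double)(end - start) / CLOCKS_PER_SEC) / (double)iterations;
}

/* Returns 0 if the first candidate is at least as fast, 1 otherwise. */
int pick_faster(double secs_a, double secs_b)
{
    return (secs_a <= secs_b) ? 0 : 1;
}
```

Both candidates should be timed with the same input, the same iteration count, and the same compiler flags, and run several times, since a single short run is noisy.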
I know I'm pushy, and I know it's a lot of work, but it's rare to find a Westmere owner willing and able to do some dirty work.
I really appreciate your help.