I took a quick look at the patch but I want to pursue the cryptonight issue a bit more first, got an idea.
Before I implement the patch I need to understand the changes. I am particularly concerned about
the change to configure.ac. Given this file isn't being used currently by cpuminer-opt what will this do?
It works currently so trying to fix something that isn't broken can often break it.
Is this truly a diff vs 3.3.5? If so, it won't include some changes I've since made to the cpuid functions, which didn't work correctly.
I moved the boolean algebra out of the bit definitions to the functions to keep the bit definitions pure.
I also defined symbolic names for the register array indexes. Some of the has_feature functions don't work in
3.3.5. As long as your changes don't rely on these bugs they should be ok with my changes.
Will touch base when I dig deeper into implementing the changes.
I've merged most of the changes but have some concerns.
I would still like an explanation regarding configure.ac. I don't want to break it, I've done enough of that already
when I don't fully understand what I'm doing.
I have changed the implementation of the flags. The flags are defined as per the HW definition, not a mask that defines the feature
they represent. This applies primarily to AVX1, which requires 3 bits to define the feature.
// http://en.wikipedia.org/wiki/CPUID
#define EAX_Reg (0)
#define EBX_Reg (1)
#define ECX_Reg (2)
#define EDX_Reg (3)
#define XSAVE_Flag (1 << 26)
#define OSXSAVE_Flag (1 << 27)
//#define AVX1_Flag ((1 << 28)|OSXSAVE_Flag)
#define AVX1_Flag (1 << 28)
#define XOP_Flag (1 << 11)
#define FMA3_Flag ((1 << 12)|AVX1_Flag|OSXSAVE_Flag)
#define AES_Flag (1 << 25)
#define SSE42_Flag (1 << 20)
#define SSE_Flag (1 << 25) // EDX
#define SSE2_Flag (1 << 26) // EDX
#define AVX2_Flag (1 << 5) // ADV EBX
(stuff snipped)
// sandy bridge and above
bool has_avx1()
{
#ifdef __arm__
return false;
#else
int cpu_info[4] = { 0 };
cpuid( 1, cpu_info );
return ( ( cpu_info[ ECX_Reg ] & AVX1_Flag ) != 0 )
&& ( ( cpu_info[ ECX_Reg ] & XSAVE_Flag ) != 0 )
&& ( ( cpu_info[ ECX_Reg ] & OSXSAVE_Flag ) != 0 );
#endif
}
The current implementation of ORing the bits to create a mask doesn't work as intended.
It will return true if any of the mask bits are set. It should only return true if
all the mask bits are set. AVX1 is only available if all three bits are set.
To do this with your mask you would need to do:
has_avx = ( ( cpu_info[ ECX_Reg ] & AVX1_Flag ) == AVX1_Flag );
The same issue exists with FMA3 but I don't use it so didn't change it.
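To make the any-vs-all difference concrete, here is a minimal sketch of the two tests side by side. The helper names any_bits_set and all_bits_set and the combined mask AVX1_All are made up for illustration and are not in cpuminer-opt; the bit positions are the ones listed above, written with 1u so the shifts are unsigned.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in ECX of cpuid leaf 1, as listed in the post. */
#define XSAVE_Flag   (1u << 26)
#define OSXSAVE_Flag (1u << 27)
#define AVX1_Flag    (1u << 28)

/* Hypothetical combined mask, for illustration only. */
#define AVX1_All     (AVX1_Flag | XSAVE_Flag | OSXSAVE_Flag)

/* True if ANY bit of mask is set in reg: too permissive for AVX1. */
static bool any_bits_set(uint32_t reg, uint32_t mask)
{
    return (reg & mask) != 0;
}

/* True only if ALL bits of mask are set in reg. */
static bool all_bits_set(uint32_t reg, uint32_t mask)
{
    return (reg & mask) == mask;
}
```

With a made-up ECX value of AVX1_Flag | XSAVE_Flag (the CPU reports AVX but the OS has not set OSXSAVE), any_bits_set() against AVX1_All returns true while all_bits_set() returns false, which is the behaviour wanted here.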
Edit:
I defined masks for AVX1 and FMA3:
#define AVX1_mask (AVX1_Flag|XSAVE_Flag|OSXSAVE_Flag)
#define FMA3_mask (FMA3_Flag|AVX1_mask)
bool has_avx1()
{
#ifdef __arm__
return false;
#else
int cpu_info[4] = { 0 };
cpuid( 1, cpu_info );
return ( ( cpu_info[ ECX_Reg ] & AVX1_mask ) == AVX1_mask );
#endif
}
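The same all-bits comparison should carry over to FMA3 using FMA3_mask. This is only a sketch, untested against the real code: ecx_has_fma3 is a hypothetical helper that takes the leaf 1 ECX value as an argument instead of calling cpuid(), so it can be checked without the hardware. The defines repeat the ones above, with 1u so the shifts are unsigned.

```c
#include <stdbool.h>
#include <stdint.h>

#define XSAVE_Flag   (1u << 26)
#define OSXSAVE_Flag (1u << 27)
#define AVX1_Flag    (1u << 28)
/* FMA3_Flag kept as defined in the post (bit 12 plus the AVX1 and OSXSAVE bits). */
#define FMA3_Flag    ((1u << 12) | AVX1_Flag | OSXSAVE_Flag)

#define AVX1_mask    (AVX1_Flag | XSAVE_Flag | OSXSAVE_Flag)
#define FMA3_mask    (FMA3_Flag | AVX1_mask)

/* Hypothetical helper: ecx is the ECX register returned by cpuid leaf 1. */
static bool ecx_has_fma3(uint32_t ecx)
{
    /* Require every bit in the mask, not just one of them. */
    return (ecx & FMA3_mask) == FMA3_mask;
}
```

An ECX value with only bit 12 set, or with the AVX1 bits set but bit 12 clear, makes this return false; it returns true only when bits 12, 26, 27 and 28 are all set.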