Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 155. (Read 444067 times)

member
Activity: 81
Merit: 1002
It was only the wind.

I can show you a fun example of corrupting the stack deliberately with C in order to skip instructions

Wouldn't you need to use ASM to alter the stack pointer? Corrupting the contents is trivial and wouldn't
cause the stack check to crash unless it verifies the frames which would be too time consuming. Corrupt
stack contents would only crash on accessing a local variable, a subroutine call or on return.

Nope! I don't need to modify the stack pointer directly, I need to CONTROL it.

Now, if we know what the function epilogue is going to be (say a pop esp/rsp) - clobber the address on the stack using an invalid array index, and you've gained control of esp/rsp.
newbie
Activity: 35
Merit: 0
When I try scrypt:n with n=17, I get unknown algo.

I am trying to mine hempcoin/hmp outside of the wallet/purse. Still trying to find the right setup. Scryptjane loads but I get stratum_recv_line failed error when pulling from server setup purse. So I thought maybe scrypt:17 but that errors out as unknown also.
legendary
Activity: 1470
Merit: 1114
cpuminer-opt v3.3.6 is released. Windows binaries now available.

Cryptonight on Windows is fixed.

Fixed reporting of AVX support on startup.

Merged bench test from TPruvot fork.

https://drive.google.com/file/d/0B0lVSGQYLJIZZWctdjQtUmR2NW8/view?usp=sharing

Windows binaries
http://cryptomining-blog.com/wp-content/download/cpuminer-opt-3-3-6-windows.zip
member
Activity: 81
Merit: 1002
It was only the wind.
But the problem is only on Windows, oh well.

msys comes with gdb, it should be able to catch the segfault, then you can inspect the registers, stack and variables.

Compiling with "-O0 -g3" instead of "-O3" should help gdb give you more info.

Tried that. With -O0 the compile fails in another algo: asm has impossible constraints.

Use -ggdb3 and step through it.

Progress. If ___chkstk_ms is checking for stack overflow then I know what the problem is.

Looking for solutions, looks like a Makefile.am edit.

Edit: doubling the stacksize spec in Makefile.am didn't work.

if HAVE_WINDOWS
#cpuminer_CFLAGS += -Wl,--stack,10485760
cpuminer_CFLAGS += -Wl,--stack,20971520
endif

Edit2: I couldn't find any documentation for --stack so I don't know what makefile is doing.

I found -fno-stack-limit but it still crashes at the same place.

I know it's crashing in ___chkstk_ms and I assume that means a stack overflow. A corrupt stack
pointer should not occur in compiled code.

From my experience the stack limit is fixed when the process is created, I'm not aware of any way to
increase that after process creation so there must be some overriding limit beyond the scope of the compiler.

Most of the info I found deals with limiting the stack, not growing it. It seems infinite recursion is a bigger
issue than legitimate large stack use.



I can show you a fun example of corrupting the stack deliberately with C in order to skip instructions
legendary
Activity: 1470
Merit: 1114

Tried -std=gnu11.

he is using Arch linux, so i guess gcc 6.1.1

Downgraded to compile this.

As long as you're compiling for AES you don't need GRS, just rip it out.

If he's compiling for AES he shouldn't even need to have non-AES version compiled at all.

Correct, but I haven't hooked it out because it was tedious work and wasn't causing any problems.
It might make the compile a little faster but that's trivial.

The entire groestl/sse2 dir can be deleted with only one linked source file to be removed from Makefile.am.
That will work for an AES compile*. To make it compile for SSE2 replace all the references to the GRS macros
in the algo files with the SPH version. In some cases the code is still there commented out.

*I should qualify that. I don't recall if I have all the GRS refs hooked out of an AES compile.

I'm considering doing that permanently as the benefit of GRS over SPH seems minimal.
member
Activity: 83
Merit: 10

Tried -std=gnu11.

he is using Arch linux, so i guess gcc 6.1.1

Downgraded to compile this.

As long as you're compiling for AES you don't need GRS, just rip it out.

If he's compiling for AES he shouldn't even need to have non-AES version compiled at all.
legendary
Activity: 1470
Merit: 1114

Tried -std=gnu11.

he is using Arch linux, so i guess gcc 6.1.1

Downgraded to compile this.

As long as you're compiling for AES you don't need GRS, just rip it out.
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
he is using Arch linux, so i guess gcc 6.1.1
member
Activity: 81
Merit: 1002
It was only the wind.
But the problem is only on Windows, oh well.

msys comes with gdb, it should be able to catch the segfault, then you can inspect the registers, stack and variables.

Compiling with "-O0 -g3" instead of "-O3" should help gdb give you more info.

Tried that. With -O0 the compile fails in another algo: asm has impossible constraints.

Use -ggdb3 and step through it.
legendary
Activity: 1470
Merit: 1114
tx, but yep the compiled/cpu flags was made fast

also the sysinfo could be cleaned up, to get the cpu model from cpuid all the time... I just didn't have amd cpus to test it so I left the /proc/cpuinfo for now (also better for arm or other weird SoCs)

configure.ac is what generates cpuminer-config.h on linux and mingw, the first line changes the package defines

I moved the existing code to get the cpu model from cpu-miner.c to sysinfos.c, it's called cpu_brand_string. It only uses cpuid.
I know it works on AMD, I just have to add the ARM hook.

I'm also converting the has_* functions to inline for local usage with a wrapper for external use. It will avoid all function
calls/returns in cpu_bestfeature and any other code in sysinfos.c that wants to quickly check a specific feature.
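
A minimal sketch of that inline-plus-wrapper layout, not the actual sysinfos.c code. The names cpuid_regs, read_leaf1, has_sse2_, has_aes_ni_ and best_feature_from are made up for illustration, and it assumes GCC or Clang, whose <cpuid.h> provides __get_cpuid. The inline checks work on registers that were already read, so a local caller does one CPUID read and no further calls; the plain has_* functions are thin wrappers for other files.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>              /* GCC/Clang only: __get_cpuid */
#endif

#define SSE2_Flag (1u << 26)    /* CPUID leaf 1, EDX */
#define AES_Flag  (1u << 25)    /* CPUID leaf 1, ECX */

/* Hypothetical holder for the four leaf 1 registers. */
typedef struct { uint32_t eax, ebx, ecx, edx; } cpuid_regs;

enum feature { FEAT_NONE, FEAT_SSE2, FEAT_AES };

/* Read CPUID leaf 1; yields all zeros where CPUID is unavailable (e.g. ARM). */
static inline cpuid_regs read_leaf1(void)
{
    cpuid_regs r = { 0, 0, 0, 0 };
#if defined(__x86_64__) || defined(__i386__)
    unsigned int a, b, c, d;
    if (__get_cpuid(1, &a, &b, &c, &d)) {
        r.eax = a; r.ebx = b; r.ecx = c; r.edx = d;
    }
#endif
    return r;
}

/* Inline checks for local use: no CPUID read, no call/return. */
static inline bool has_sse2_(const cpuid_regs *r)
{
    return (r->edx & SSE2_Flag) != 0;
}

static inline bool has_aes_ni_(const cpuid_regs *r)
{
    return (r->ecx & AES_Flag) != 0;
}

/* Wrappers with external linkage for callers in other files. */
bool has_sse2(void)   { cpuid_regs r = read_leaf1(); return has_sse2_(&r); }
bool has_aes_ni(void) { cpuid_regs r = read_leaf1(); return has_aes_ni_(&r); }

/* Local caller: picks the best feature from registers already read. */
enum feature best_feature_from(const cpuid_regs *r)
{
    assert(r != NULL);
    if (has_aes_ni_(r)) return FEAT_AES;
    if (has_sse2_(r))   return FEAT_SSE2;
    return FEAT_NONE;
}
```

A caller in the same file would do cpuid_regs r = read_leaf1(); once and then pass &r to best_feature_from or to any of the inline checks.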

I'm going to play with configure.ac a bit to understand it better. If it works like make using timestamps I may be preventing
configure from being regenerated by manually editing it. Considering the only change I've made to configure is the package version
the untouched configure.ac should still be good and I should be able to go back to doing it the right way.



legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
tx, but yep the compiled/cpu flags was made fast

also the sysinfo could be cleaned up, to get the cpu model from cpuid all the time... I just didn't have amd cpus to test it so I left the /proc/cpuinfo for now (also better for arm or other weird SoCs)

configure.ac is what generates cpuminer-config.h on linux and mingw, the first line changes the package defines
legendary
Activity: 1470
Merit: 1114
cpuminer-opt v3.3.6 is released.

Cryptonight on Windows is fixed.

Fixed reporting of AVX support on startup.

Merged bench test from TPruvot fork.

https://drive.google.com/file/d/0B0lVSGQYLJIZZWctdjQtUmR2NW8/view?usp=sharing

Watch Cryptomining Blog for updated binaries.

legendary
Activity: 1470
Merit: 1114
@TPruvot

I've finished merging the bench code with the following changes.

I haven't implemented the configure.ac change yet. I need to understand better. Currently this file is not
used so the change would have no effect, unless of course there is some magic using timestamps.
And if there is I want to understand it before I mess with it.

I've defined more symbolics in sysinfos.c for register access to improve readability.

I've implemented the AVX1 check using a mask. I've defined a mask for FMA3 but not implemented
a function to use it.

Considering the cryptonight fix is in the upcoming release I don't want to wait too long.

Edit: Wrote functions to detect all features.
Rewrote cpu_bestfeature to use functions instead of reading flags directly.
Only tested functions I use.

Here's a link to a tool I used as a guide. Very complete and detailed.

https://bitbucket.org/ariya/cpu-detect/src
legendary
Activity: 1470
Merit: 1114
I took a quick look at the patch but I want to pursue the cryptonight issue a bit more first, got an idea.

Before I implement the patch I need to understand the changes. I am particularly concerned about
the change to configure.ac. Given this file isn't being used currently by cpuminer-opt what will this do?
It works currently so trying to fix something that isn't broken can often break it.

Is this truly a diff vs 3.3.5? If so it won't have some changes to the cpuid functions, which didn't work correctly.
I moved the boolean algebra out of the bit definitions to the functions to keep the bit definitions pure.
I also defined symbolic names for the register array indexes. Some of the has_feature functions don't work in
3.3.5. As long as your changes don't rely on these bugs they should be ok with my changes.

Will touch base when I dig deeper into implementing the changes.

I've merged most of the changes but have some concerns.

I would still like an explanation regarding configure.ac. I don't want to break it, I've done enough of that already
when I don't fully understand what I'm doing.

I have changed the implementation of the flags. The flags are defined as per the HW definition, not a mask that defines the feature
they represent. This applies primarily to AVX1 which requires 3 bits to define the feature.

Code:
// http://en.wikipedia.org/wiki/CPUID
#define EAX_Reg  (0)
#define EBX_Reg  (1)
#define ECX_Reg  (2)
#define EDX_Reg  (3)

#define XSAVE_Flag    (1 << 26)
#define OSXSAVE_Flag  (1 << 27)
//#define AVX1_Flag    ((1 << 28)|OSXSAVE_Flag)
#define AVX1_Flag     (1 << 28)
#define XOP_Flag      (1 << 11)
#define FMA3_Flag    ((1 << 12)|AVX1_Flag|OSXSAVE_Flag)
#define AES_Flag      (1 << 25)
#define SSE42_Flag    (1 << 20)

#define SSE_Flag      (1 << 25) // EDX
#define SSE2_Flag     (1 << 26) // EDX

#define AVX2_Flag     (1 << 5) // ADV EBX

(stuff snipped)

// Sandy Bridge and above (Westmere has AES-NI but no AVX)
bool has_avx1()
{
#ifdef __arm__
        return false;
#else
        int cpu_info[4] = { 0 };
        cpuid( 1, cpu_info );
        return ( ( cpu_info[ ECX_Reg ] & AVX1_Flag    ) != 0 )
            && ( ( cpu_info[ ECX_Reg ] & XSAVE_Flag   ) != 0 )
            && ( ( cpu_info[ ECX_Reg ] & OSXSAVE_Flag ) != 0 );
#endif
}

The current implementation of ORing the bits to create a mask and testing the result against zero doesn't work as intended.
It will return true if any of the mask bits are set. It should only return true if
all the mask bits are set. AVX1 is only available if all three bits are set.

To do this with your mask you would need to  do:

Code:
has_avx = ( ( cpu_info[ ECX_Reg ] & AVX1_Flag ) == AVX1_Flag ); 

The same issue exists with FMA3 but I don't use it so didn't change it.

Edit:

I defined masks for AVX1 and FMA3:

Code:
#define AVX1_mask      (AVX1_Flag|XSAVE_Flag|OSXSAVE_Flag)
#define FMA3_mask     (FMA3_Flag|AVX1_mask)

bool has_avx1()
{
#ifdef __arm__
        return false;
#else
        int cpu_info[4] = { 0 };
        cpuid( 1, cpu_info );
        return ( ( cpu_info[ ECX_Reg ] & AVX1_mask ) == AVX1_mask );
#endif
}
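
To make the any-bit vs all-bits difference concrete, here is a small hardware-independent sketch. The helpers any_bits_set and all_bits_set are hypothetical names, not part of cpuminer-opt, and the register value is passed in rather than read with cpuid so the behaviour can be checked with made-up ECX values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX bits, same positions as the defines in the post. */
#define XSAVE_Flag   (1u << 26)
#define OSXSAVE_Flag (1u << 27)
#define AVX1_Flag    (1u << 28)
#define AVX1_mask    (AVX1_Flag | XSAVE_Flag | OSXSAVE_Flag)

/* True if at least one bit of mask is set in reg: the test that misfires. */
static bool any_bits_set(uint32_t reg, uint32_t mask)
{
    return (reg & mask) != 0;
}

/* True only if every bit of mask is set in reg: the intended test. */
static bool all_bits_set(uint32_t reg, uint32_t mask)
{
    assert(mask != 0);   /* an empty mask would match any register value */
    return (reg & mask) == mask;
}
```

With a made-up ECX value that has only the XSAVE bit set, any_bits_set( ecx, AVX1_mask ) is true while all_bits_set( ecx, AVX1_mask ) is false, which is the false positive described above.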

legendary
Activity: 1470
Merit: 1114
I have a solution for cryptonight on Windows.

Moving the ctx from local to global was the right fix but I neglected to make it thread safe. As a result
each thread was clobbering the same ctx. I need to do some regression testing but my optimism has
returned.

Once that issue is closed I will merge the TPruvot bench test and release v3.3.6.
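
One common way to keep a context off the stack and still give each thread its own copy is thread-local storage. The sketch below is only an illustration of that idea, not the actual cpuminer-opt fix: hash_ctx, toy_hash and run_workers are invented names, the "hash" is a trivial fill-and-sum, and it assumes a C11 compiler with pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define MAX_WORKERS 8

/* Hypothetical stand-in for the large cryptonight context. */
typedef struct {
    uint8_t scratchpad[4096];
} hash_ctx;

/* Static lifetime (not on the stack), but one instance per thread. */
static _Thread_local hash_ctx ctx;

typedef struct {
    uint8_t  seed;
    uint32_t result;
} worker_arg;

/* Toy "hash": fill the calling thread's context with seed, then sum it. */
static uint32_t toy_hash(uint8_t seed)
{
    uint32_t sum = 0;
    memset(ctx.scratchpad, seed, sizeof ctx.scratchpad);
    for (size_t i = 0; i < sizeof ctx.scratchpad; i++)
        sum += ctx.scratchpad[i];
    return sum;
}

static void *worker(void *p)
{
    worker_arg *arg = p;
    for (int i = 0; i < 1000; i++)
        arg->result = toy_hash(arg->seed);
    return NULL;
}

/* Run n workers with seeds 1..n and store each final sum in results.
   Returns 0 on success, -1 if a thread could not be created. */
int run_workers(int n, uint32_t *results)
{
    pthread_t  tid[MAX_WORKERS];
    worker_arg args[MAX_WORKERS];
    int started = 0, status = 0;

    assert(n > 0 && n <= MAX_WORKERS);
    for (; started < n; started++) {
        args[started].seed = (uint8_t)(started + 1);
        args[started].result = 0;
        if (pthread_create(&tid[started], NULL, worker, &args[started]) != 0) {
            status = -1;
            break;
        }
    }
    /* Join only the threads that were actually started. */
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        results[i] = args[i].result;
    }
    return status;
}
```

Each worker should end with 4096 times its seed. If ctx were a plain global without _Thread_local, another thread could refill the scratchpad between the memset and the sum, and the results could come out mixed.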
legendary
Activity: 1470
Merit: 1114
I took a quick look at the patch but I want to pursue the cryptonight issue a bit more first, got an idea.

Before I implement the patch I need to understand the changes. I am particularly concerned about
the change to configure.ac. Given this file isn't being used currently by cpuminer-opt what will this do?
It works currently so trying to fix something that isn't broken can often break it.

Is this truly a diff vs 3.3.5? If so it won't have some changes to the cpuid functions, which didn't work correctly.
I moved the boolean algebra out of the bit definitions to the functions to keep the bit definitions pure.
I also defined symbolic names for the register array indexes. Some of the has_feature functions don't work in
3.3.5. As long as your changes don't rely on these bugs they should be ok with my changes.

Will touch base when I dig deeper into implementing the changes.
member
Activity: 81
Merit: 1002
It was only the wind.
Cryptonight update.

As previously mentioned cryptonight is broken on Windows from v3.3 to present. Prior to that Windows was
not supported. Linux is not affected.

I have localized the problem to a simple function call that goes bad and crashes the miner. The line before
the call is executed but the first line of the called function is never reached.

This is a simple call to a constant function, nothing fancy, no algo-gate function pointers, just a basic call to the hash
function. It is the exact same code that runs fine on Linux. There are no OS hooks anywhere to be found.

It was first discovered in the CMB prebuilt binaries but I can easily reproduce it with a different compiler version.

It's difficult to wrap my head around this kind of problem, especially when it's OS specific. The crash suggests the
function address was invalid. It seems either the compiler messed up linking the function address or it got overwritten
after compilation. C/C++ always scares me with the lack of built-in buffer overflow protection. I'm not used to working
without a net. But even if there is a buffer overflow corrupting a function why only on Windows?

I'll review all the data defined in that file for anything suspicious but after that I'll be pretty stuck.



I LOVE that about C/C++.