
Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 156. (Read 444067 times)

legendary
Activity: 1470
Merit: 1114

I can show you a fun example of corrupting the stack deliberately with C in order to skip instructions

Wouldn't you need to use ASM to alter the stack pointer? Corrupting the contents is trivial and wouldn't
cause the stack check to crash unless it verifies the frames which would be too time consuming. Corrupt
stack contents would only crash on accessing a local variable, a subroutine call or on return.

Nope! I don't need to modify the stack pointer directly, I need to CONTROL it.

Now, if we know what the function epilogue is going to be (say a pop esp/rsp) - clobber the address on the stack using an invalid array index, and you've gained control of esp/rsp.

As I said modifying the stack contents is trivial and would cause a different crash than the one I'm seeing.
But it's irrelevant now as it no longer crashes after moving ctx to global.

This tangent is bringing back some old memories of writing opcode patches for a stack based processor. It was like using
an HP RPN calculator but was a bitch to keep the stack coherent.
legendary
Activity: 1470
Merit: 1114

I can show you a fun example of corrupting the stack deliberately with C in order to skip instructions

Wouldn't you need to use ASM to alter the stack pointer? Corrupting the contents is trivial and wouldn't
cause the stack check to crash unless it verifies the frames which would be too time consuming. Corrupt
stack contents would only crash on accessing a local variable, a subroutine call or on return.
legendary
Activity: 1470
Merit: 1114
For the future, gcc has a flag: -fstack-usage

When compiling, it generates *.su files that show how many bytes of stack each function needs.

After adding that, recompiling cpuminer-opt on core2 and then printing all *.su files, sorted by size:

Code:
$ find . -iname '*.su' -print0 | xargs -0 cat |sort -k2n|tail|column -t
x17.c:212:6:x17hash_alt                    3904     static
cpu-miner.c:2689:5:main                    4144     static
x17.c:87:13:x17hash                        4256     static
hodl-wolf.c:28:5:scanhash_hodl_wolf        4304     static
scrypt.c:696:12:scanhash_scrypt            7680     dynamic,bounded
hmq1725.c:143:13:hmq1725hash               7744     static
scrypt.c:648:13:scrypt_1024_1_1_256_24way  9088     dynamic,bounded
m7mhash.c:195:5:scanhash_m7m_hash          12464    dynamic,bounded
api.c:511:13:api                           17136    dynamic,bounded
cryptonight.c:172:6:cryptonight_hash_ctx   2097648  static

You might want to increase the max stack size in the makefile to 3 MB or more. Setting it to 2 MB isn't enough because you need 2097648 bytes just for that function.

Thanks for the info. I increased the stack size in Makefile.am to 3 MB but it made no difference.
AES still produces rejects and non-aes still crashes.

There is apparently another problem with the AES version other than the stack overflow (that is a huge stack
compared with the other algos) because I solved that by reducing the local variables.

So the situation for AES now seems to be that the same code, with no superficial Windows hooks, works on Linux but produces
rejects on Windows. By superficial I mean checks for Windows in the cryptonight code. There may be some low-level hooks
in common code that is also used by other algos.

The core2 build still crashes after moving ctx to global and increasing the stacksize to 3 MB in Makefile.am. It's either
the same crash or a different one. I haven't followed up because my focus is on AES first.
member
Activity: 83
Merit: 10
For the future, gcc has a flag: -fstack-usage

When compiling, it generates *.su files that show how many bytes of stack each function needs.

After adding that, recompiling cpuminer-opt on core2 and then printing all *.su files, sorted by size:

Code:
$ find . -iname '*.su' -print0 | xargs -0 cat |sort -k2n|tail|column -t
x17.c:212:6:x17hash_alt                    3904     static
cpu-miner.c:2689:5:main                    4144     static
x17.c:87:13:x17hash                        4256     static
hodl-wolf.c:28:5:scanhash_hodl_wolf        4304     static
scrypt.c:696:12:scanhash_scrypt            7680     dynamic,bounded
hmq1725.c:143:13:hmq1725hash               7744     static
scrypt.c:648:13:scrypt_1024_1_1_256_24way  9088     dynamic,bounded
m7mhash.c:195:5:scanhash_m7m_hash          12464    dynamic,bounded
api.c:511:13:api                           17136    dynamic,bounded
cryptonight.c:172:6:cryptonight_hash_ctx   2097648  static

You might want to increase the max stack size in the makefile to 3 MB or more. Setting it to 2 MB isn't enough because you need 2097648 bytes just for that function.
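To put numbers on that, here is a minimal sketch of the arithmetic. The helper name stack_headroom is made up for illustration, and it only accounts for a single frame, so the real headroom is smaller once callers and callees are counted.

```c
/* Frame size that -fstack-usage reported for cryptonight_hash_ctx
 * in the listing above. */
#define CRYPTONIGHT_FRAME_BYTES 2097648LL

/* Hypothetical helper: bytes left over when one frame of frame_bytes
 * sits on a stack of stack_mib MiB. A negative result means the frame
 * alone does not fit. All other frames on the stack are ignored. */
long long stack_headroom(long long stack_mib, long long frame_bytes)
{
    return stack_mib * 1024LL * 1024LL - frame_bytes;
}
```

With a 2 MiB stack the result is -496, so the frame overshoots by 496 bytes. With 3 MiB the result is 1048080, so roughly 1 MiB is left for everything else.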

Edit2: I couldn't find any documentation for --stack so I don't know what makefile is doing.

"-Wl," means "pass this to linker", which means ld, so you need to check the documentation of ld -- http://linux.die.net/man/1/ld

On Windows, the stack size limit is specified in the binary. On Linux, the limit is set by the system administrator. On current Debian, the default is 8 MB.
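On the Linux side the limit can also be read from inside the program. This is a minimal sketch assuming a POSIX system. The helper name soft_stack_limit is made up for illustration and is not part of cpuminer-opt.

```c
#include <sys/resource.h>

/* Hypothetical helper: stores the soft stack limit (in bytes) of the
 * current process in *out_bytes. Returns 0 on success, -1 on failure.
 * A value equal to RLIM_INFINITY means the stack is unlimited. */
int soft_stack_limit(unsigned long long *out_bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *out_bytes = (unsigned long long)rl.rlim_cur;
    return 0;
}
```

From a shell, "ulimit -s" reports the same soft limit, but in kilobytes.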

I found -fno-stack-limit but it still crashes at the same place.

This is a different feature from the above. -fno-stack-limit is used to negate -fstack-limit-register/-fstack-limit-symbol, and by default this feature is not set.
legendary
Activity: 1470
Merit: 1114
More cryptonight progress.

If I can't make the stack bigger, use less.

I made the definition of ctx global instead of local and it doesn't crash. Just testing on a live pool
to confirm.
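For anyone following along, the general idea of getting a big buffer off the stack can be sketched like this. It is not the actual cryptonight code: big_ctx, big_ctx_new and toy_hash are made-up names, and the 2 MiB scratchpad size is an assumption based on the reported frame size. The sketch uses a heap allocation rather than a global, so each caller can own its own context.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed size: about 2 MiB, close to the reported frame size. */
#define SCRATCHPAD_BYTES (2u * 1024u * 1024u)

/* Hypothetical context. The real ctx layout is not shown in this thread. */
typedef struct {
    unsigned char scratchpad[SCRATCHPAD_BYTES];
} big_ctx;

/* Allocate the context on the heap so that the caller's stack frame
 * stays small. Returns NULL if the allocation fails. */
big_ctx *big_ctx_new(void)
{
    return calloc(1, sizeof(big_ctx));
}

/* Toy stand-in for a hash that needs the scratchpad: fills it from the
 * input and folds it down to one byte. Only the pointer lives on the
 * stack here, not the 2 MiB buffer. */
unsigned char toy_hash(big_ctx *ctx, const unsigned char *in, size_t len)
{
    unsigned char acc = 0;
    size_t i;

    if (len == 0)
        return 0;
    for (i = 0; i < sizeof ctx->scratchpad; i++) {
        ctx->scratchpad[i] = (unsigned char)(in[i % len] ^ (unsigned char)i);
        acc ^= ctx->scratchpad[i];
    }
    return acc;
}
```

One thing to watch with a plain global is that every thread sees the same buffer. If more than one thread hashes at once, a per-thread allocation like the one above avoids them overwriting each other's scratchpad.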

If all goes well I should have it fixed by the end of the day.

Edit: My optimism was premature. Although the AES version doesn't crash it produces only rejects.
And a non-AES compile still crashes.
legendary
Activity: 1470
Merit: 1114
But the problem is only on Windows, oh well.

msys comes with gdb, it should be able to catch the segfault, then you can inspect the registers, stack and variables.

Compiling with "-O0 -g3" instead of "-O3" should help gdb give you more info.

Tried that. The compile fails in another algo with -O0: asm has impossible constraints.

Use -ggdb3 and step through it.

Progress. If ___chkstk_ms is checking for a stack overflow then I know what the problem is.

Looking for solutions, looks like a Makefile.am edit.

Edit: doubling the stacksize spec in Makefile.am didn't work.

if HAVE_WINDOWS
#cpuminer_CFLAGS += -Wl,--stack,10485760
cpuminer_CFLAGS += -Wl,--stack,20971520
endif

Edit2: I couldn't find any documentation for --stack so I don't know what makefile is doing.

I found -fno-stack-limit but it still crashes at the same place.

I know it's crashing in ___chkstk_ms and I assume that means a stack overflow. A corrupt stack
pointer should not occur in compiled code.

From my experience the stack limit is fixed when the process is created, I'm not aware of any way to
increase that after process creation so there must be some overriding limit beyond the scope of the compiler.

Most of the info I found deals with limiting the stack, not growing it. It seems infinite recursion is a bigger
issue than legitimate large stack use.
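One case where the stack size can be chosen at run time is thread creation, since each new thread gets its own stack. This is a minimal sketch assuming POSIX threads. The names spawn_with_stack and big_frame_worker are made up for illustration, and how cpuminer-opt actually creates its miner threads is not shown in this thread.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: start fn(arg) on a new thread whose stack is
 * stack_bytes large. Returns 0 on success or a pthread error code. */
int spawn_with_stack(pthread_t *tid, size_t stack_bytes,
                     void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Example worker with a 2 MiB local buffer, so it needs a stack that is
 * comfortably larger than 2 MiB. Writes 3 to the int that arg points at. */
void *big_frame_worker(void *arg)
{
    volatile unsigned char buf[2u * 1024u * 1024u];

    buf[0] = 1;
    buf[sizeof buf - 1] = 2;
    *(int *)arg = buf[0] + buf[sizeof buf - 1];
    return NULL;
}
```

Calling spawn_with_stack(&tid, 4u * 1024u * 1024u, big_frame_worker, &result) and then joining the thread should leave 3 in result. This only sizes the new thread's stack and does not change the main thread's stack.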

legendary
Activity: 1470
Merit: 1114
But the problem is only on Windows, oh well.

msys comes with gdb, it should be able to catch the segfault, then you can inspect the registers, stack and variables.

Compiling with "-O0 -g3" instead of "-O3" should help gdb give you more info.

Tried that. The compile fails in another algo with -O0: asm has impossible constraints.
member
Activity: 83
Merit: 10
But the problem is only on Windows, oh well.

msys comes with gdb, it should be able to catch the segfault, then you can inspect the registers, stack and variables.

Compiling with "-O0 -g3" instead of "-O3" should help gdb give you more info.
legendary
Activity: 1470
Merit: 1114
Code:
cpuminer-cpu-miner.o:cpu-miner.c:(.text+0x695a): undefined reference to `__stack_chk_fail'
cpuminer-cpu-miner.o:cpu-miner.c:(.text+0x6a8c): undefined reference to `__stack_chk_guard'
cpuminer-cpu-miner.o:cpu-miner.c:(.text+0x6c78): undefined reference to `__stack_chk_guard'
cpuminer-cpu-miner.o:cpu-miner.c:(.text+0x6d61): undefined reference to `__stack_chk_fail'
c:/msys/opt/windows_64/bin/../lib64/gcc/x86_64-w64-mingw32/4.8.3/../../../../x86_64-w64-mingw32/bin/ld.exe: cpuminer-cpu-miner.o: bad reloc address 0x0 in section `.pdata'
collect2.exe: error: ld returned 1 exit status
make[2]: *** [cpuminer.exe] Error 1

I don't know what this means but it does mention pdata, an argument to the function that is failing. Cryptonight is
coded the same as every other algo and every algo hash function references pdata the same way.

It means that your build of gcc doesn't have the stack protection support library (the function __stack_chk_fail() and the guard variable __stack_chk_guard). It builds fine on Debian Linux, though.

But the problem is only on Windows, oh well.

member
Activity: 83
Merit: 10
Code:
cpuminer-cpu-miner.o:cpu-miner.c:(.text+0x695a): undefined reference to `__stack_chk_fail'
cpuminer-cpu-miner.o:cpu-miner.c:(.text+0x6a8c): undefined reference to `__stack_chk_guard'
cpuminer-cpu-miner.o:cpu-miner.c:(.text+0x6c78): undefined reference to `__stack_chk_guard'
cpuminer-cpu-miner.o:cpu-miner.c:(.text+0x6d61): undefined reference to `__stack_chk_fail'
c:/msys/opt/windows_64/bin/../lib64/gcc/x86_64-w64-mingw32/4.8.3/../../../../x86_64-w64-mingw32/bin/ld.exe: cpuminer-cpu-miner.o: bad reloc address 0x0 in section `.pdata'
collect2.exe: error: ld returned 1 exit status
make[2]: *** [cpuminer.exe] Error 1

I don't know what this means but it does mention pdata, an argument to the function that is failing. Cryptonight is
coded the same as every other algo and every algo hash function references pdata the same way.

It means that your build of gcc doesn't have the stack protection support library (the function __stack_chk_fail() and the guard variable __stack_chk_guard). It builds fine on Debian Linux, though.
legendary
Activity: 1470
Merit: 1114
You can enable gcc's stack protection.

-fstack-protector

Interesting, compile failed.

Code:
cpuminer-cpu-miner.o:cpu-miner.c:(.text+0x695a): undefined reference to `__stack_chk_fail'
cpuminer-cpu-miner.o:cpu-miner.c:(.text+0x6a8c): undefined reference to `__stack_chk_guard'
cpuminer-cpu-miner.o:cpu-miner.c:(.text+0x6c78): undefined reference to `__stack_chk_guard'
cpuminer-cpu-miner.o:cpu-miner.c:(.text+0x6d61): undefined reference to `__stack_chk_fail'
c:/msys/opt/windows_64/bin/../lib64/gcc/x86_64-w64-mingw32/4.8.3/../../../../x86_64-w64-mingw32/bin/ld.exe: cpuminer-cpu-miner.o: bad reloc address 0x0 in section `.pdata'
collect2.exe: error: ld returned 1 exit status
make[2]: *** [cpuminer.exe] Error 1

I don't know what this means but it does mention pdata, an argument to the function that is failing. Cryptonight is
coded the same as every other algo and every algo hash function references pdata the same way.
member
Activity: 83
Merit: 10
You can enable gcc's stack protection.

-fstack-protector
legendary
Activity: 1470
Merit: 1114
Cryptonight update.

As previously mentioned cryptonight is broken on Windows from v3.3 to present. Prior to that Windows was
not supported. Linux is not affected.

I have localized the problem to a simple function call that goes bad and crashes the miner. The line before
the call is executed but the first line of the called function is never reached.

This is a simple call to a constant function, nothing fancy, no algo-gate function pointers, just a basic call to the hash
function. It is the exact same code that runs fine on Linux. There are no OS hooks anywhere to be found.

It was first discovered in the CMB prebuilt binaries but I can easily reproduce it with a different compiler version.

It's difficult to wrap my head around this kind of problem, especially when it's OS specific. The crash suggests the
function address was invalid. It seems either the compiler messed up linking the function address or it got overwritten
after compilation. C/C++ always scares me with the lack of built-in buffer overflow protection. I'm not used to working
without a net. But even if there is a buffer overflow corrupting a function why only on Windows?

I'll review all the data defined in that file for anything suspicious but after that I'll be pretty stuck.



I LOVE that about C/C++.

I've had religious debates with C/C++ proponents. My key argument is just to look at all the buffer overflow
exploits that have existed over the years and continue to exist. Never would have happened with array bounds
checking and no mixing of a[] and a*.

legendary
Activity: 1470
Merit: 1114

I'm volunteering for testing.
I can test on AMD (SSE2, no AVX) on both Windows and Linux, as well as on a Core i7-4790K (AVX) on Linux and an AMD FX-7600P (AVX) on Windows.

Thanks, check your PM.

Reports from four machines sent via PM.

Awesome report. It had everything I needed.

Everything looks good from the CPU capabilities check. I saw no inconsistencies.

The only issue I found was the mapping of -march=native on non-AES AMD CPUs. As a long known issue it is already
documented in README.md. I'll review your data in more detail and may update the documented workaround based
on the info you provided.

I think this gives me enough confidence about the CPU capabilities check to release it, but I'll wait a while to
give me a chance to look deeper into the cryptonight problem.

sr. member
Activity: 312
Merit: 250

I'm volunteering for testing.
I can test on AMD (SSE2, no AVX) on both Windows and Linux, as well as on a Core i7-4790K (AVX) on Linux and an AMD FX-7600P (AVX) on Windows.

Thanks, check your PM.

Reports from four machines sent via PM.
legendary
Activity: 3416
Merit: 1912
The Concierge of Crypto
E5640 here. Windows 10 64 bit in a VM. Still trying to figure out Debian, but hmage's simple instructions should help. If you have a compiled binary exe, I'll run it. If I need to compile something, I'll try.
legendary
Activity: 1470
Merit: 1114
Great job Joblo,

v3.3.5 is working like a charm on Gainestown Xeons with the cpuminer-sse2.exe build.

Checking CPU capatibility...
Intel(R) Xeon(R) CPU           E5540  @ 2.53GHz
CPU features: SSE2 AVX AVX2
SW built on Jun  5 2016 with GCC 5.3.0
Build features: SSE2
Algo features: SSE2 AES
AES not available, starting mining with SSE2 optimizations...

Thanks for testing. The erroneous display of AVX and AVX2 support for the CPU should be fixed in the next
release. It's being tested now. Everything else looks good.