Disassembly starting at the instruction that raised the misaligned access error (x/10i $pc lists the instruction at $pc and the ones after it, not the ones before):
(cuda-gdb) x/10i $pc
=> 0x5555564d08e0 <_Z25keyFinderKernelWithDoubleii+58848>:    LD.E.SYS R0, [R22]
0x5555564d08f0 <_Z25keyFinderKernelWithDoubleii+58864>:
SHF.L.W.U32.HI R2, R3.reuse, 0x1a, R3.reuse
0x5555564d0900 <_Z25keyFinderKernelWithDoubleii+58880>:
ULOP3.LUT UR5, UR4, UR7, URZ, 0x3c, !UPT
0x5555564d0910 <_Z25keyFinderKernelWithDoubleii+58896>:
SHF.L.W.U32.HI R5, R3.reuse, 0x15, R3.reuse
0x5555564d0920 <_Z25keyFinderKernelWithDoubleii+58912>:
IMAD.MOV.U32 R28, RZ, RZ, c[0x3][0x35c]
0x5555564d0930 <_Z25keyFinderKernelWithDoubleii+58928>:
SHF.L.W.U32.HI R4, R3.reuse, 0x7, R3
0x5555564d0940 <_Z25keyFinderKernelWithDoubleii+58944>:
IMAD.MOV.U32 R14, RZ, RZ, c[0x3][0x3a4]
0x5555564d0950 <_Z25keyFinderKernelWithDoubleii+58960>:
LOP3.LUT R6, R3, c[0x3][0x374], RZ, 0xc, !PT
0x5555564d0960 <_Z25keyFinderKernelWithDoubleii+58976>:
ULOP3.LUT UR5, UR5, UR6, URZ, 0xc0, !UPT
0x5555564d0970 <_Z25keyFinderKernelWithDoubleii+58992>:
LOP3.LUT R5, R4, R2, R5, 0x96, !PT
I have no idea what to make of this CUDA disassembly. The error cannot be reproduced in debug mode, which forces me to rely on the instruction pointer $pc (the program counter).
OK, here's a better disassembly, this time starting 256 bytes before $pc:
(cuda-gdb) x/20i $pc-256
0x5555567b75e0 <_Z25keyFinderKernelWithDoubleii+58592>:    STL [R1+0x5c], R10
0x5555567b75f0 <_Z25keyFinderKernelWithDoubleii+58608>:    STL [R1+0x60], R12
0x5555567b7600 <_Z25keyFinderKernelWithDoubleii+58624>:
IMAD.MOV.U32 R4, RZ, RZ, R55
0x5555567b7610 <_Z25keyFinderKernelWithDoubleii+58640>:
IMAD.MOV.U32 R5, RZ, RZ, R54
0x5555567b7620 <_Z25keyFinderKernelWithDoubleii+58656>:    CALL.ABS.NOINC 0x0
0x5555567b7630 <_Z25keyFinderKernelWithDoubleii+58672>:    BSYNC B6
0x5555567b7640 <_Z25keyFinderKernelWithDoubleii+58688>:
IMAD.MOV.U32 R0, RZ, RZ, 0x2
0x5555567b7650 <_Z25keyFinderKernelWithDoubleii+58704>:
LOP3.LUT R0, R0, c[0x0][0x164], RZ, 0xfc, !PT
0x5555567b7660 <_Z25keyFinderKernelWithDoubleii+58720>:
ISETP.NE.AND P0, PT, R0, 0x2, PT
0x5555567b7670 <_Z25keyFinderKernelWithDoubleii+58736>:    @P0 BRA 0x17250
0x5555567b7680 <_Z25keyFinderKernelWithDoubleii+58752>:
IMAD R23, R60, 0x7, R53
0x5555567b7690 <_Z25keyFinderKernelWithDoubleii+58768>:
IMAD.MOV.U32 R3, RZ, RZ, c[0x3][0x36c]
0x5555567b76a0 <_Z25keyFinderKernelWithDoubleii+58784>:
IMAD.MOV.U32 R9, RZ, RZ, c[0x3][0x3a0]
0x5555567b76b0 <_Z25keyFinderKernelWithDoubleii+58800>:
ULDC UR4, c[0x3][0x364]
0x5555567b76c0 <_Z25keyFinderKernelWithDoubleii+58816>:
IMAD.WIDE R22, R23, 0x4, R78
0x5555567b76d0 <_Z25keyFinderKernelWithDoubleii+58832>:
ULDC.64 UR6, c[0x3][0x35c]
=> 0x5555567b76e0 <_Z25keyFinderKernelWithDoubleii+58848>:    LD.E.SYS R0, [R22]
0x5555567b76f0 <_Z25keyFinderKernelWithDoubleii+58864>:
SHF.L.W.U32.HI R2, R3.reuse, 0x1a, R3.reuse
0x5555567b7700 <_Z25keyFinderKernelWithDoubleii+58880>:
ULOP3.LUT UR5, UR4, UR7, URZ, 0x3c, !UPT
0x5555567b7710 <_Z25keyFinderKernelWithDoubleii+58896>:
SHF.L.W.U32.HI R5, R3.reuse, 0x15, R3.reuse
Now the faulty instruction is the one with the arrow on the left, and the 20 or so instructions leading up to it are where the bad memory access must be getting set up.
We already know that the problem function is "doIterationWithDouble", so all searching should be done there.
We just need to see how this CUDA pseudo-C code translates into assembly (strictly speaking, what cuda-gdb shows here is SASS, the native GPU machine code, rather than PTX).
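For anyone trying to follow the listing, here is my reading of the three instructions that feed the arrowed load, written out as plain C++. This is only a sketch: the register roles come from the disassembly above, but the names (base, row, col) are made up for illustration, and it assumes the arrow really does mark the faulting instruction.

```cpp
#include <cstdint>

// Hypothetical names; only the arithmetic is taken from the listing.
//   IMAD      R23, R60, 0x7, R53   -> index = row * 7 + col
//   IMAD.WIDE R22, R23, 0x4, R78   -> addr  = base + 4 * index (64-bit, in R22:R23)
//   LD.E.SYS  R0, [R22]            -> 32-bit load from addr
constexpr std::uint64_t loadAddress(std::uint64_t base, std::int32_t row, std::int32_t col) {
    return base + static_cast<std::uint64_t>(static_cast<std::int64_t>(row * 7 + col) * 4);
}

constexpr bool isAligned(std::uint64_t addr, std::uint64_t bytes) {
    return addr % bytes == 0;
}

// 4 * index is always a multiple of 4, so the load address is 4-byte
// aligned exactly when the base pointer (R78:R79) is.
static_assert(isAligned(loadAddress(0x7000, 3, 5), 4), "aligned base, aligned load");
static_assert(!isAligned(loadAddress(0x7002, 3, 5), 4), "misaligned base, misaligned load");
```

If that reading is right, the index arithmetic cannot misalign the load on its own, so the base pointer held in R78:R79 would be worth inspecting too.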
Take three
Full disassembly of the CUDA function "_Z25keyFinderKernelWithDoubleii" (whatever that means!) is available at
https://files.notatether.com/public/temp/bitcrack/gdblog.txt for anyone who's interested. It is about 685 KB in size.
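As for what the name means: it is just the C++-mangled form of the kernel's signature. A small sketch for decoding such names on the host, assuming a GCC or Clang toolchain (abi::__cxa_demangle comes from <cxxabi.h>; the demangle wrapper itself is my own helper, not part of any library):

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Demangle an Itanium-ABI symbol name.
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);  // the buffer is malloc-allocated by __cxa_demangle
    return result;
}
```

Calling demangle("_Z25keyFinderKernelWithDoubleii") gives keyFinderKernelWithDouble(int, int): "25" is the length of the function name and the trailing "ii" stands for two int parameters.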
Matching the statements to their disassembled instructions may be our only chance of identifying the problematic regions of memory (the faulty statement has already been found, but we don't yet know where the misalignment is coming from).
This disassembly probably also includes code from functions other than doIterationWithDouble, because the entire kernel was dumped (so anything the compiler inlined into the kernel is potentially in there).
Most interesting is this line:
0x00005555567b76d0 <+58832>: ULDC.64 UR6, c[0x3][0x35c] <-- Crash location
This is the instruction immediately before the one the arrow points at. It seems to be doing a 64-bit load from constant bank 3 at offset 0x35c (bingo: 0x35c = 860 is a multiple of 4 bytes but not of 8 bytes = 64 bits, so there is our misaligned access error).
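The offset arithmetic is easy to double-check at compile time. A minimal sketch (isAlignedTo is a made-up helper, and this only checks the offset within the constant bank, under the assumption that the bank itself starts on an 8-byte boundary):

```cpp
#include <cstdint>

// True when `offset` is a multiple of `bytes`.
constexpr bool isAlignedTo(std::uint32_t offset, std::uint32_t bytes) {
    return offset % bytes == 0;
}

// 0x35c = 860 = 107 * 8 + 4: fine for a 32-bit access, not for a 64-bit one.
static_assert(0x35c % 8 == 4, "0x35c sits 4 bytes past an 8-byte boundary");
static_assert(isAlignedTo(0x35c, 4), "0x35c is 4-byte aligned");
static_assert(!isAlignedTo(0x35c, 8), "0x35c is not 8-byte aligned");

// The neighbouring offsets would both be fine for a 64-bit access.
static_assert(isAlignedTo(0x358, 8), "0x358 is 8-byte aligned");
static_assert(isAlignedTo(0x360, 8), "0x360 is 8-byte aligned");
```

So whatever lives at 0x35c would have to move by 4 bytes in either direction for a 64-bit load from it to be aligned.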
Now the question is which array does this correspond to.
small edit: fix broken link