OK, I had to swap the #undef and #include statements at the top of cuda_code.cu (the #undef directives must come before the header is included) and add /opt/cuda-toolkit/open64/bin to my PATH. That got me past that issue, but I've hit another one:
I'll correct that, thanks.
/usr/bin/ld: i386 architecture of input file `build/optimization/sse2_code.o' is incompatible with i386:x86-64 output
collect2: error: ld returned 1 exit status
make: *** [bin/testMemoryLosses.exe] Error 1
Oh, I hadn't thought about that. The 32-bit assembly code is incompatible with a 64-bit operating system.
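As a side note, this kind of mismatch can be detected at compile time instead of at link time. The following is only a minimal sketch, assuming a GCC or Clang compiler (which predefine `__x86_64__` and `__i386__`); the helper names `targetArchitecture` and `pointerBits` are hypothetical and not part of the project:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: reports which x86 target this translation unit
// is being compiled for, using GCC/Clang predefined macros.
inline std::string targetArchitecture() {
#if defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "i386";
#else
    return "other";
#endif
}

// Pointer width in bits: typically 64 on x86-64 and 32 on i386.
inline std::size_t pointerBits() {
    return sizeof(void*) * 8;
}
```

A guard like this could be used to skip the 32-bit SSE2 assembly when the build targets x86-64, rather than letting the linker reject the object file.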
Feel like we're getting somewhere.
I'll try not compiling the SSE2 code, since I assume you only need me to test the CUDA stuff, so I changed the Makefile to:
FULL_OBJ = [s]$(SSE2_OBJ)[/s] $(CUDA_OBJ)
FACT_FLAGS += -DCPP_IMPL [s]-DSSE2_IMPL[/s] -DCUDA_IMPL
Maybe I edited the Makefile wrong, because I got a bunch of outputs similar to:
while executing GenericPlotFillAction ChronoFillAction Connection_reset at state Size_450000 :
Implementation SSE2 is not allowed.
Actually, I wanted to compare the SSE2 version against CUDA on the same machine.
If you can't use a 32-bit OS, I guess I can repeat some of the charts and compare the results independently.
After removing the SSE2 version, you need to change the main files (chronoBuffers.cpp, chronoConnections.cpp, chronoFunctions.cpp) to remove the SSE2 option if you want to get rid of those errors, but they aren't really important.
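To illustrate why the main files still ask for SSE2 after the Makefile change, here is a rough sketch of how -D flags typically gate implementations at compile time. The flag names (CPP_IMPL, SSE2_IMPL, CUDA_IMPL) come from the Makefile above; the functions `compiledImplementations` and `isAllowed` are hypothetical and only assume that the project does something along these lines:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simulates building with: -DCPP_IMPL -DCUDA_IMPL (no -DSSE2_IMPL).
#define CPP_IMPL
#define CUDA_IMPL

// Hypothetical: lists the implementations that were compiled in.
inline std::vector<std::string> compiledImplementations() {
    std::vector<std::string> impls;
#ifdef CPP_IMPL
    impls.push_back("C");
#endif
#ifdef SSE2_IMPL
    impls.push_back("SSE2");
#endif
#ifdef CUDA_IMPL
    impls.push_back("CUDA");
#endif
    return impls;
}

// Hypothetical: true only if the named implementation was compiled in.
inline bool isAllowed(const std::string& name) {
    for (const std::string& impl : compiledImplementations()) {
        if (impl == name) return true;
    }
    return false;
}
```

With SSE2_IMPL left undefined, a benchmark loop that still requests SSE2 would fail a check like `isAllowed("SSE2")`, which would explain the "Implementation SSE2 is not allowed." messages.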
Anyway, the changes would be something like this:
In chronoBuffers.cpp, replace
EnumLoop linesLoop(ET_IMPLEMENTATION, 3, IT_C, IT_SSE2, IT_CUDA);
with
EnumLoop linesLoop(ET_IMPLEMENTATION, 2, IT_C, IT_CUDA);
In chronoConnections.cpp, replace
EnumLoop linesLoop(ET_IMPLEMENTATION);
with
EnumLoop linesLoop(ET_IMPLEMENTATION, 4, IT_C, IT_CUDA, IT_CUDA_REDUC, IT_CUDA_INV);
Keep
linesLoop.addInnerLoop(new EnumLoop(ET_IMPLEMENTATION, 2, IT_C, IT_CUDA));
for chronoFunctions.cpp
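In the calls above, the number after ET_IMPLEMENTATION appears to be a count of the values that follow. The sketch below is not the real EnumLoop; it only assumes a C-style variadic constructor, and the enum definition and the function `collectValues` are hypothetical stand-ins. It shows why the count has to match the list:

```cpp
#include <cassert>
#include <cstdarg>
#include <vector>

// Hypothetical stand-in for the project's enum; the real definition may differ.
enum Implementation { IT_C, IT_SSE2, IT_CUDA, IT_CUDA_REDUC, IT_CUDA_INV };

// Collects exactly `count` values from a C-style variadic argument list.
// Arguments beyond `count` are silently ignored; passing fewer than
// `count` arguments is undefined behaviour.
inline std::vector<Implementation> collectValues(unsigned count, ...) {
    std::vector<Implementation> values;
    va_list args;
    va_start(args, count);
    for (unsigned i = 0; i < count; ++i) {
        // Enum values are promoted to int when passed through "...".
        values.push_back(static_cast<Implementation>(va_arg(args, int)));
    }
    va_end(args);
    return values;
}
```

Under that assumption, removing IT_SSE2 from a list means lowering the count by one as well, otherwise the loop reads past the end of the arguments.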
However, it looks like the software compiled. I'm now getting outputs (separated by reasonably long pauses) like:
while executing GenericPlotFillAction ChronoFillAction Connection_calculateAndAddTo at state Size_4500 :
The maximum float input size is 4032.
Complete output from this run has been pasted here.
Don't worry, that's expected. One CUDA implementation doesn't allow certain sizes.
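For illustration, the behaviour could look roughly like the sketch below: the limited implementation rejects oversized inputs with an exception, and the benchmark catches it, prints the message, and moves on to the next state. The limit 4032 and the message text come from the output above; the names `MAX_FLOAT_INPUT_SIZE`, `checkFloatInputSize` and `tryState` are hypothetical, as is the choice of exception type:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical constant; the value is taken from the error message in the thread.
const unsigned MAX_FLOAT_INPUT_SIZE = 4032;

// Hypothetical check: rejects input sizes the limited implementation cannot handle.
inline void checkFloatInputSize(unsigned size) {
    if (size > MAX_FLOAT_INPUT_SIZE) {
        throw std::length_error("The maximum float input size is " +
                                std::to_string(MAX_FLOAT_INPUT_SIZE) + ".");
    }
}

// Tries one benchmark state. Returns false and stores the error message
// if the size is rejected, so the caller can skip this state and continue.
inline bool tryState(unsigned size, std::string& message) {
    try {
        checkFloatInputSize(size);
        return true;
    } catch (const std::length_error& e) {
        message = e.what();
        return false;
    }
}
```

If the benchmark works this way, the same message would appear once per rejected state, which would match the repeated output at Size_4500.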
I don't understand why it says
Cmake: *** [cuda_emu] Interrupt
By default it should be [all] and not [cuda_emu] but it doesn't matter.
The cuda part is compiled without the "--device-emulation" so everything's fine:
(3:624)# make
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11 src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
Can you confirm if this is expected behaviour as I'm assuming it's not (the 4500 step is being repeated over and over, I assume due to trying to use a float too high)? If so, I'll leave this running overnight (next time I nap, probably in about 5-6 hours).
Sorry for not answering earlier. I would prefer to run this on a 32-bit OS with the SSE2 version, but again, if that's not possible, I'll work it out somehow.
But the other errors are fine; it's just a more limited implementation trying to run larger layer sizes than it can handle. That's expected.
Thank you again for all your effort and patience.