Author

Topic: Run my program in your Nvidia GPU (for bitcoins) (Read 4648 times)

legendary
Activity: 2730
Merit: 1034
Needs more jiggawatts
Wow, Skynet in the making Cheesy
legendary
Activity: 1372
Merit: 1002
Ok, I found out why the initial makefile didn't work for you. A stupid thing.  Embarrassed
You have to call "make all" explicitly, not just "make".
I was used to compiling from Eclipse, which puts "all" there by default, but from the console I never actually tried plain "make"; usually "make sse2" or "make all_emu". I should have noticed that before, and you wouldn't have had to take CXX_BASE out of the conditions. I thought "all" was in $(MAKECMDGOALS) by default when you call just "make", and I don't know why I thought I had tested it. My fault, sorry.
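The pitfall can be reproduced in miniature (Makefile.demo below is a throwaway illustration, not the project's real makefile):

```shell
# With a bare `make`, $(MAKECMDGOALS) is empty, so a condition keyed on
# "all" never fires and the variable stays unset; `make all` sets it.
printf 'ifneq (,$(filter all,$(MAKECMDGOALS)))\nCXX_BASE = g++\nendif\nall:\n\t@echo "CXX_BASE=$(CXX_BASE)"\n' > Makefile.demo
make -f Makefile.demo        # prints: CXX_BASE=
make -f Makefile.demo all    # prints: CXX_BASE=g++
```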
I've prepared this other makefile for you:

http://content.wuala.com/contents/jtimon/temp/MakefileForLuceo?dl=1

Use "make cuda" to avoid the SSE2 stuff.

For trying the multilib thing, use this one:

http://content.wuala.com/contents/jtimon/temp/MakefileForLuceoSse2?dl=1

For this one, use "make all".

I think the -fpermissive flag may not be necessary after all, but maybe that's unrelated.
Can you try removing that too (from the first one, if it works)?
legendary
Activity: 1372
Merit: 1002
I'm not sure how difficult it will be to get this working, and how that compares to the difficulty of porting the assembly code.

Uff, after reading some web pages and manuals and trying for less than half an hour, I give up. Porting the assembly really does seem to be a pain. I thought 64 bits wasn't going to be that different. Besides, I haven't touched that code in years, and although it's documented, reading assembly...

Please try putting the -m32 flag on nvcc too. If that doesn't work, I'll search for the new error (if it's different), or I'll use separate charts to compare the performance of each implementation.
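For reference, the requested change might look something like this in the makefile (a sketch; the variable names are guesses, and only the paths and flags are taken from the build logs in this thread):

```makefile
# Hypothetical fragment; NVCC and NVCC_FLAGS are illustrative names.
# Passing -m32 to nvcc keeps its host-side output 32-bit, so it can be
# linked together with the objects g++ produces under -m32.
NVCC       = /usr/local/cuda/bin/nvcc
NVCC_FLAGS = -m32 -g -G -arch sm_11

build/optimization/cuda_code.o: src/optimization/cuda_code.cu
	$(NVCC) $(NVCC_FLAGS) -c $< -o $@
```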
sr. member
Activity: 350
Merit: 250
Per aspera ad astra!
I tried the Makefile you PM'd me and got this output.

I tried modifying my 'working' Makefile (the one I sent you output from for CUDA), but no luck:

Code:
(3:648)# make
mkdir -p build/common
mkdir -p build/optimization
mkdir -p build/neural
mkdir -p build/genetic
mkdir -p build/game
mkdir -p build/tasks
mkdir -p build/loop
mkdir -p build/loopTest
mkdir -p build/test/
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/test/testMemoryLosses.cpp -o build/test/testMemoryLosses.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/common/chronometer.cpp -o build/common/chronometer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/common/dummy.cpp -o build/common/dummy.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/common/enumerations.cpp -o build/common/enumerations.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/common/util.cpp -o build/common/util.o
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -g -G -c -arch sm_11   src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/optimization/factory.cpp -o build/optimization/factory.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/neural/interface.cpp -o build/neural/interface.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/neural/connection.cpp -o build/neural/connection.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/neural/neuralNet.cpp -o build/neural/neuralNet.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/neural/layer.cpp -o build/neural/layer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/neural/inputLayer.cpp -o build/neural/inputLayer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/neural/buffer.cpp -o build/neural/buffer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/genetic/individual.cpp -o build/genetic/individual.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/genetic/task.cpp -o build/genetic/task.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/genetic/population.cpp -o build/genetic/population.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/game/reversiBoard.cpp -o build/game/reversiBoard.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/game/board.cpp -o build/game/board.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/tasks/reversiTask.cpp -o build/tasks/reversiTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/tasks/binaryTask.cpp -o build/tasks/binaryTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/tasks/classificationTask.cpp -o build/tasks/classificationTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loop/rangeLoop.cpp -o build/loop/rangeLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loop/enumLoop.cpp -o build/loop/enumLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loop/genericPlotter.cpp -o build/loop/genericPlotter.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loop/plot.cpp -o build/loop/plot.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loop/loop.cpp -o build/loop/loop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loop/joinEnumLoop.cpp -o build/loop/joinEnumLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loop/test.cpp -o build/loop/test.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loop/parametersMap.cpp -o build/loop/parametersMap.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loopTest/taskPlotter.cpp -o build/loopTest/taskPlotter.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loopTest/chronoPlotter.cpp -o build/loopTest/chronoPlotter.o
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -L/opt/cuda-toolkit/lib64 -lcudart  build/test/testMemoryLosses.o build/common/chronometer.o build/common/dummy.o build/common/enumerations.o build/common/util.o build/optimization/factory.o build/neural/interface.o build/neural/connection.o build/neural/neuralNet.o build/neural/layer.o build/neural/inputLayer.o build/neural/buffer.o build/genetic/individual.o build/genetic/task.o build/genetic/population.o build/game/reversiBoard.o build/game/board.o build/tasks/reversiTask.o build/tasks/binaryTask.o build/tasks/classificationTask.o build/loop/rangeLoop.o build/loop/enumLoop.o build/loop/genericPlotter.o build/loop/plot.o build/loop/loop.o build/loop/joinEnumLoop.o build/loop/test.o build/loop/parametersMap.o build/loopTest/taskPlotter.o build/loopTest/chronoPlotter.o build/optimization/cuda_code.o -o bin/testMemoryLosses.exe
/usr/bin/ld: i386 architecture of input file `build/test/testMemoryLosses.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/common/chronometer.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/common/dummy.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/common/enumerations.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/common/util.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/optimization/factory.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/neural/interface.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/neural/connection.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/neural/neuralNet.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/neural/layer.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/neural/inputLayer.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/neural/buffer.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/genetic/individual.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/genetic/task.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/genetic/population.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/game/reversiBoard.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/game/board.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/tasks/reversiTask.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/tasks/binaryTask.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/tasks/classificationTask.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loop/rangeLoop.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loop/enumLoop.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loop/genericPlotter.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loop/plot.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loop/loop.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loop/joinEnumLoop.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loop/test.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loop/parametersMap.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loopTest/taskPlotter.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loopTest/chronoPlotter.o' is incompatible with i386:x86-64 output
build/test/testMemoryLosses.o: In function `testPopulation(ParametersMap*)':
/home/luceo/Downloads/preann/src/test/testMemoryLosses.cpp:65: undefined reference to `operator new(unsigned int)'
/home/luceo/Downloads/preann/src/test/testMemoryLosses.cpp:66: undefined reference to `operator new(unsigned int)'
/home/luceo/Downloads/preann/src/test/testMemoryLosses.cpp:67: undefined reference to `operator new(unsigned int)'
/home/luceo/Downloads/preann/src/test/testMemoryLosses.cpp:81: undefined reference to `operator new(unsigned int)'
build/test/testMemoryLosses.o: In function `main':
/home/luceo/Downloads/preann/src/test/testMemoryLosses.cpp:103: undefined reference to `operator new(unsigned int)'
build/test/testMemoryLosses.o:/home/luceo/Downloads/preann/src/test/testMemoryLosses.cpp:106: more undefined references to `operator new(unsigned int)' follow
build/common/dummy.o: In function `std::basic_string, std::allocator > std::operator+, std::allocator >(char const*, std::basic_string, std::allocator > const&)':
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/bits/basic_string.tcc:702: undefined reference to `std::string::reserve(unsigned int)'
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/bits/basic_string.tcc:703: undefined reference to `std::string::append(char const*, unsigned int)'
build/common/util.o: In function `__gnu_cxx::new_allocator::allocate(unsigned int, void const*)':
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/new_allocator.h:94: undefined reference to `operator new(unsigned int)'
build/common/util.o: In function `__gnu_cxx::new_allocator::allocate(unsigned int, void const*)':
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/new_allocator.h:94: undefined reference to `operator new(unsigned int)'
build/common/util.o: In function `__gnu_cxx::new_allocator >::allocate(unsigned int, void const*)':
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/new_allocator.h:94: undefined reference to `operator new(unsigned int)'
build/optimization/factory.o: In function `Buffer* func_newBuffer<(BufferType)0, float>(unsigned int, ImplementationType)':
/home/luceo/Downloads/preann/src/optimization/configFactory.h:28: undefined reference to `operator new(unsigned int)'
/home/luceo/Downloads/preann/src/optimization/configFactory.h:37: undefined reference to `operator new(unsigned int)'
build/optimization/factory.o:/home/luceo/Downloads/preann/src/optimization/configFactory.h:48: more undefined references to `operator new(unsigned int)' follow
build/optimization/factory.o: In function `XmmConnection<(BufferType)2, unsigned int>::_calculateAndAddTo(Buffer*)':
/home/luceo/Downloads/preann/src/template/xmmConnection.h:102: undefined reference to `XMMreal'
/home/luceo/Downloads/preann/src/template/xmmConnection.h:110: undefined reference to `XMMbinario'
/home/luceo/Downloads/preann/src/template/xmmConnection.h:118: undefined reference to `XMMbipolar'
build/optimization/factory.o: In function `XmmConnection<(BufferType)1, unsigned int>::_calculateAndAddTo(Buffer*)':
/home/luceo/Downloads/preann/src/template/xmmConnection.h:102: undefined reference to `XMMreal'
/home/luceo/Downloads/preann/src/template/xmmConnection.h:110: undefined reference to `XMMbinario'
/home/luceo/Downloads/preann/src/template/xmmConnection.h:118: undefined reference to `XMMbipolar'
build/optimization/factory.o: In function `XmmConnection<(BufferType)3, unsigned char>::_calculateAndAddTo(Buffer*)':
/home/luceo/Downloads/preann/src/template/xmmConnection.h:102: undefined reference to `XMMreal'
/home/luceo/Downloads/preann/src/template/xmmConnection.h:110: undefined reference to `XMMbinario'
/home/luceo/Downloads/preann/src/template/xmmConnection.h:118: undefined reference to `XMMbipolar'
build/optimization/factory.o: In function `XmmConnection<(BufferType)0, float>::_calculateAndAddTo(Buffer*)':
/home/luceo/Downloads/preann/src/template/xmmConnection.h:102: undefined reference to `XMMreal'
/home/luceo/Downloads/preann/src/template/xmmConnection.h:110: undefined reference to `XMMbinario'
/home/luceo/Downloads/preann/src/template/xmmConnection.h:118: undefined reference to `XMMbipolar'
build/neural/neuralNet.o: In function `NeuralNet::addLayer(unsigned int, BufferType, FunctionType)':
/home/luceo/Downloads/preann/src/neural/neuralNet.cpp:33: undefined reference to `operator new(unsigned int)'
build/neural/neuralNet.o: In function `NeuralNet::addInputLayer(Interface*)':
/home/luceo/Downloads/preann/src/neural/neuralNet.cpp:49: undefined reference to `operator new(unsigned int)'
build/neural/neuralNet.o: In function `NeuralNet::load(_IO_FILE*)':
/home/luceo/Downloads/preann/src/neural/neuralNet.cpp:196: undefined reference to `operator new(unsigned int)'
/home/luceo/Downloads/preann/src/neural/neuralNet.cpp:199: undefined reference to `operator new(unsigned int)'
build/neural/neuralNet.o: In function `__gnu_cxx::new_allocator::allocate(unsigned int, void const*)':
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/new_allocator.h:94: undefined reference to `operator new(unsigned int)'
build/neural/neuralNet.o:/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/new_allocator.h:94: more undefined references to `operator new(unsigned int)' follow
collect2: error: ld returned 1 exit status
make: *** [bin/testMemoryLosses.exe] Error 1

I do have a lot of the multilib stuff installed already (I'm a gamer, so I have to have a lot of it installed).

I'm not sure how difficult it will be to get this working, and how that compares to the difficulty of porting the assembly code.
legendary
Activity: 2128
Merit: 1073
I'm using Arch Linux x86_64.
I just checked that Arch Linux supports multilib. This means the 64-bit OS can run 32-bit programs, provided that:

1) the multilib support packages are installed;
2) gcc/g++ are invoked with -m32 flag.

So there's no need to laboriously rewrite the assembly code. All you need to do is modify the makefiles.

Have fun.
legendary
Activity: 1372
Merit: 1002
I actually only have 64-bit operating systems, but since SSE2 is CPU-based, it's not really comparable to the CUDA (GPU) results and should be able to be tested on just about anything.

If you want to update the assembler code for a 64-bit CPU, I'd be willing to test it, but I expect that's a lot of work and not as simple as just using 64-bit registers (my asm knowledge is very rusty).

I'll run the program next time I nap and post results here if it finishes during that nap.

On my old computer, with poor communication between the CPU and GPU memories, the SSE2 implementation was actually superior. That's what I didn't want to show my teachers. But yes, I can compare SSE2 and CUDA against C separately.
And if I find it easy to port the assembly code to 64 bits, I'll do it just to make it nicer.
sr. member
Activity: 350
Merit: 250
Per aspera ad astra!
I actually only have 64-bit operating systems, but since SSE2 is CPU-based, it's not really comparable to the CUDA (GPU) results and should be able to be tested on just about anything.

If you want to update the assembler code for a 64-bit CPU, I'd be willing to test it, but I expect that's a lot of work and not as simple as just using 64-bit registers (my asm knowledge is very rusty).

I'll run the program next time I nap and post results here if it finishes during that nap.
legendary
Activity: 1372
Merit: 1002
OK, I had to swap around the #undef and #include statements at the top of the cuda_code.cu (undef directives must come before importing the header), and add /opt/cuda-toolkit/open64/bin to path, got past that issue and found another one:

I'll correct that, thanks.

Code:
/usr/bin/ld: i386 architecture of input file `build/optimization/sse2_code.o' is incompatible with i386:x86-64 output
collect2: error: ld returned 1 exit status
make: *** [bin/testMemoryLosses.exe] Error 1

Oh, I didn't think about that. The assembly code is incompatible with the 64-bit operating system.

Feel like we're getting somewhere. Tongue I'll try not compiling SSE2 code since I assume you only need me to test the CUDA stuff, so makefile:

Code:
FULL_OBJ = $(CUDA_OBJ)                  # removed: $(SSE2_OBJ)
FACT_FLAGS += -DCPP_IMPL -DCUDA_IMPL    # removed: -DSSE2_IMPL

Maybe I edited the Makefile wrong, because I got a bunch of outputs similar to:

Code:
while executing GenericPlotFillAction ChronoFillAction Connection_reset at state Size_450000 :
Implementation SSE2 is not allowed.

Actually, I wanted to compare the SSE2 version on the same CPU.
If you can't use a 32-bit OS, I guess I can repeat some of the charts and compare the results independently.
After removing the SSE2 version, to avoid those errors you also need to change the main files being run (chronoBuffers.cpp, chronoConnections.cpp, chronoFunctions.cpp) to remove the SSE2 option, but those errors aren't really important.
Anyway, the changes would be something like this:

In chronoBuffers.cpp, change

Code:
EnumLoop linesLoop(ET_IMPLEMENTATION, 3, IT_C, IT_SSE2, IT_CUDA);

to

Code:
EnumLoop linesLoop(ET_IMPLEMENTATION, 2, IT_C, IT_CUDA);

In chronoConnections.cpp, change

Code:
EnumLoop linesLoop(ET_IMPLEMENTATION);

to

Code:
EnumLoop linesLoop(ET_IMPLEMENTATION, 4, IT_C, IT_CUDA, IT_CUDA_REDUC, IT_CUDA_INV);

In chronoFunctions.cpp, keep

Code:
linesLoop.addInnerLoop(new EnumLoop(ET_IMPLEMENTATION, 2, IT_C, IT_CUDA));

However, it looks like the software compiled. I'm now getting outputs (separated by reasonably long pauses) like:

Code:
while executing GenericPlotFillAction ChronoFillAction Connection_calculateAndAddTo at state Size_4500 :
The maximum float input size is 4032.

Complete output from this run has been pasted here.

Don't worry, that's expected. One CUDA implementation doesn't allow certain sizes.
I don't understand why it says
Cmake: *** [cuda_emu] Interrupt

By default it should be [all] and not [cuda_emu], but it doesn't matter.
The cuda part is compiled without the "--device-emulation" so everything's fine:

Code:
(3:624)# make
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11   src/optimization/cuda_code.cu -o build/optimization/cuda_code.o


Can you confirm whether this is expected behaviour? I'm assuming it's not (the 4500 step is being repeated over and over, I assume due to trying to use a float size that's too high). If so, I'll leave this running overnight (next time I nap, probably in about 5-6 hours).

Sorry for not answering earlier. I would prefer to run this with a 32-bit OS and the SSE2 version, but again, if that's not possible, I'll work it out somehow.
The other errors are fine: it's just a more limited implementation trying to run greater layer sizes than it can handle. That's expected.

Thank you again for all your effort and patience.
sr. member
Activity: 350
Merit: 250
Per aspera ad astra!
OK, I had to swap around the #undef and #include statements at the top of the cuda_code.cu (undef directives must come before importing the header), and add /opt/cuda-toolkit/open64/bin to path, got past that issue and found another one:

Code:
(3:624)# make
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11   src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/optimization/factory.cpp -o build/optimization/factory.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/interface.cpp -o build/neural/interface.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/connection.cpp -o build/neural/connection.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/neuralNet.cpp -o build/neural/neuralNet.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/layer.cpp -o build/neural/layer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/inputLayer.cpp -o build/neural/inputLayer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/buffer.cpp -o build/neural/buffer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/genetic/individual.cpp -o build/genetic/individual.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/genetic/task.cpp -o build/genetic/task.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/genetic/population.cpp -o build/genetic/population.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/game/reversiBoard.cpp -o build/game/reversiBoard.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/game/board.cpp -o build/game/board.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/tasks/reversiTask.cpp -o build/tasks/reversiTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/tasks/binaryTask.cpp -o build/tasks/binaryTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/tasks/classificationTask.cpp -o build/tasks/classificationTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/rangeLoop.cpp -o build/loop/rangeLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/enumLoop.cpp -o build/loop/enumLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/genericPlotter.cpp -o build/loop/genericPlotter.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/plot.cpp -o build/loop/plot.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/loop.cpp -o build/loop/loop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/joinEnumLoop.cpp -o build/loop/joinEnumLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/test.cpp -o build/loop/test.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/parametersMap.cpp -o build/loop/parametersMap.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loopTest/taskPlotter.cpp -o build/loopTest/taskPlotter.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loopTest/chronoPlotter.cpp -o build/loopTest/chronoPlotter.o
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -L/opt/cuda-toolkit/lib64 -lcudart  build/test/testMemoryLosses.o build/common/chronometer.o build/common/dummy.o build/common/enumerations.o build/common/util.o build/optimization/factory.o build/neural/interface.o build/neural/connection.o build/neural/neuralNet.o build/neural/layer.o build/neural/inputLayer.o build/neural/buffer.o build/genetic/individual.o build/genetic/task.o build/genetic/population.o build/game/reversiBoard.o build/game/board.o build/tasks/reversiTask.o build/tasks/binaryTask.o build/tasks/classificationTask.o build/loop/rangeLoop.o build/loop/enumLoop.o build/loop/genericPlotter.o build/loop/plot.o build/loop/loop.o build/loop/joinEnumLoop.o build/loop/test.o build/loop/parametersMap.o build/loopTest/taskPlotter.o build/loopTest/chronoPlotter.o build/optimization/sse2_code.o build/optimization/cuda_code.o -o bin/testMemoryLosses.exe
/usr/bin/ld: i386 architecture of input file `build/optimization/sse2_code.o' is incompatible with i386:x86-64 output
collect2: error: ld returned 1 exit status
make: *** [bin/testMemoryLosses.exe] Error 1

Feel like we're getting somewhere. Tongue I'll try not compiling SSE2 code since I assume you only need me to test the CUDA stuff, so makefile:

Code:
FULL_OBJ = $(CUDA_OBJ)                  # removed: $(SSE2_OBJ)
FACT_FLAGS += -DCPP_IMPL -DCUDA_IMPL    # removed: -DSSE2_IMPL

Maybe I edited the Makefile wrong, because I got a bunch of outputs similar to:

Code:
while executing GenericPlotFillAction ChronoFillAction Connection_reset at state Size_450000 :
Implementation SSE2 is not allowed.

However, it looks like the software compiled. I'm now getting outputs (separated by reasonably long pauses) like:

Code:
while executing GenericPlotFillAction ChronoFillAction Connection_calculateAndAddTo at state Size_4500 :
The maximum float input size is 4032.

Complete output from this run has been pasted here.

Can you confirm whether this is expected behaviour? I'm assuming it's not (the 4500 step is being repeated over and over, I assume due to trying to use a float size that's too high). If so, I'll leave this running overnight (next time I nap, probably in about 5-6 hours).
legendary
Activity: 1372
Merit: 1002
No problem; it clearly needs working through, so I'm glad I can be of assistance. Current output still fails, though:

...

This issue was present in some of the CUDA demos, and was fixed by adding this to the top of cuda_code.cu (although I'm totally unsure what this does, it's just a forum suggestion):

Code:
#undef _GLIBCXX_ATOMIC_BUILTINS
#undef _GLIBCXX_USE_INT128

Thank you. I'll find out what this means. Probably it disables some new stuff that's enabled by default, or something.

Then a 'make' returned the following (though this could be related to my edit as described in the last code block):

...

That happens when you change things and only test your "improvements" with the emulator. Sorry again.
I've uploaded the file to the repository again with a change that tries to solve that (plus your change).
Can you try it again?

sr. member
Activity: 350
Merit: 250
Per aspera ad astra!
No problem; it clearly needs working through, so I'm glad I can be of assistance. Current output still fails, though:

Code:
(3:600)# make
mkdir -p build/common
mkdir -p build/optimization
mkdir -p build/neural
mkdir -p build/genetic
mkdir -p build/game
mkdir -p build/tasks
mkdir -p build/loop
mkdir -p build/loopTest
mkdir -p build/test/
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/test/testMemoryLosses.cpp -o build/test/testMemoryLosses.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/chronometer.cpp -o build/common/chronometer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/dummy.cpp -o build/common/dummy.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/enumerations.cpp -o build/common/enumerations.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/util.cpp -o build/common/util.o
src/common/util.cpp: In static member function ‘static void MemoryManagement::free(void*)’:
src/common/util.cpp:61:48: warning: cast from ‘void*’ to ‘unsigned int’ loses precision [-fpermissive]
nasm -f elf src/optimization/sse2_code.asm -o build/optimization/sse2_code.o
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11   src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(48): error: identifier "__atomic_fetch_add" is undefined

/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(52): error: identifier "__atomic_fetch_add" is undefined

2 errors detected in the compilation of "/tmp/tmpxft_00006ce2_00000000-4_cuda_code.cpp1.ii".
make: *** [build/optimization/cuda_code.o] Error 2

This issue was present in some of the CUDA demos, and was fixed by adding this to the top of cuda_code.cu (although I'm totally unsure what this does, it's just a forum suggestion):

Code:
#undef _GLIBCXX_ATOMIC_BUILTINS
#undef _GLIBCXX_USE_INT128

Then, 'make' gave the following output (this could be related to my edit as described in the last codeblock, though):

Code:
(3:602)# make
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11   src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
src/optimization/cuda_code.cu(382) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumBitsConnectionsKernel<(BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(382) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumBitsConnectionsKernel<(BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(457) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumBitsInvertedConnectionsKernel<(BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(457) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumBitsInvertedConnectionsKernel<(BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)512u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)256u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)128u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)64u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)32u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)16u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)8u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)4u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)2u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)1u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)512u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)256u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)128u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)64u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)32u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)16u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)8u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)4u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)2u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)1u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)512u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)256u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)128u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)64u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)32u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)16u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)8u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)4u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)2u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)1u, (BufferType)2> ") is not allowed

34 errors detected in the compilation of "/tmp/tmpxft_00006d0d_00000000-4_cuda_code.cpp1.ii".
make: *** [build/optimization/cuda_code.o] Error 2
legendary
Activity: 1372
Merit: 1002
Sorry about all the problems you're having. I've compiled the program with older software and this is the result.
CUDA being proprietary software, when they decided to remove the emulator I got stuck with version 2.3, which means g++ 4.3, and I'm using make 3.81.
The permissive flag seems to solve the problems with the newer g++, and you've hacked the makefile to make it work with the newer make.
I forgot that the path for nvcc is different in newer versions.

It doesn't run well because it hasn't compiled correctly. I don't even know why the executables are created.

I've changed cuda_code.cu to try to solve those problems. You can get it here:
http://preann.svn.sourceforge.net/viewvc/preann/preann/src/optimization/cuda_code.cu?view=log
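For reference, the usual workaround for this class of error is to avoid `std::min` (a host-only function) inside kernels and use a small helper instead. This is just a sketch with hypothetical names, not necessarily the committed change; the `__CUDACC__` guard lets the same code build with plain g++:

```cpp
// Hypothetical device-safe replacement for std::min in kernel code.
// When compiled by nvcc, the helper is marked __host__ __device__ so
// it is callable from both kernels and host code; plain g++ just sees
// an ordinary inline function.
#ifdef __CUDACC__
#define PREANN_HOST_DEVICE __host__ __device__
#else
#define PREANN_HOST_DEVICE
#endif

template <typename T>
PREANN_HOST_DEVICE inline T device_min(T a, T b)
{
    return (a < b) ? a : b;
}
```

Inside a `__global__` or `__device__` function, calls like `std::min(a, b)` would then become `device_min(a, b)`.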

If you have more problems, please, post them. Maybe the easiest solution (if you can get a live CD with g++-4.3) is to install the legacy version of CUDA: http://developer.nvidia.com/cuda-toolkit-23-downloads

Thank you for taking the time to try this.
sr. member
Activity: 350
Merit: 250
Per aspera ad astra!
I've had a look at the code and I'm happy running this on my machine 'naked'.

A LiveCD wouldn't affect performance as I'd reboot into it and everything'd be running in RAM, but getting all the software on one would be a pain.

I'm using Arch Linux x86_64.

Edit: CXX_BASE isn't set by default, set that. Next issue:

(3:527)~ make
g++ -ggdb -I src/   -c src/test/testMemoryLosses.cpp -o build/test/testMemoryLosses.o
g++ -ggdb -I src/   -c src/common/chronometer.cpp -o build/common/chronometer.o
g++ -ggdb -I src/   -c src/common/dummy.cpp -o build/common/dummy.o
g++ -ggdb -I src/   -c src/common/enumerations.cpp -o build/common/enumerations.o
g++ -ggdb -I src/   -c src/common/util.cpp -o build/common/util.o
src/common/util.cpp: In static member function ‘static void MemoryManagement::free(void*)’:
src/common/util.cpp:61:48: error: cast from ‘void*’ to ‘unsigned int’ loses precision [-fpermissive]
make: *** [build/common/util.o] Error 1

Set CXX_BASE to -fpermissive, not sure if this'll affect the test.
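For what it's worth, that warning points at a real 64-bit issue rather than mere pedantry: on x86_64 a `void*` is 8 bytes while `unsigned int` is 4, so the cast truncates the pointer. A portable alternative (a sketch, not the project's actual code) is `uintptr_t`:

```cpp
#include <cstdint>

// Sketch: turning a pointer into an integer without losing bits.
// std::uintptr_t is guaranteed wide enough to hold any data pointer,
// so this compiles cleanly on both 32-bit and 64-bit targets.
std::uintptr_t pointer_key(void* p)
{
    return reinterpret_cast<std::uintptr_t>(p);
}
```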

Also have to change the cuda directory. Makefile looks pretty different.

Hmm, didn't run right, any ideas?

...

EDIT 2: OK, got a little further. Your 'if' conditional in the Makefile was being ignored, so I manually set these in the Makefile:

FACT_OBJ = $(FULL_OBJ)
FACT_FLAGS += -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL

Then I added my CUDA 'bin' path to PATH variable.

Now, I get the following when I try to compile:

Code:
(3:587)# make
mkdir -p build/common
mkdir -p build/optimization
mkdir -p build/neural
mkdir -p build/genetic
mkdir -p build/game
mkdir -p build/tasks
mkdir -p build/loop
mkdir -p build/loopTest
mkdir -p build/test/
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/test/testMemoryLosses.cpp -o build/test/testMemoryLosses.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/chronometer.cpp -o build/common/chronometer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/dummy.cpp -o build/common/dummy.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/enumerations.cpp -o build/common/enumerations.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/util.cpp -o build/common/util.o
src/common/util.cpp: In static member function ‘static void MemoryManagement::free(void*)’:
src/common/util.cpp:61:48: warning: cast from ‘void*’ to ‘unsigned int’ loses precision [-fpermissive]
nasm -f elf src/optimization/sse2_code.asm -o build/optimization/sse2_code.o
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11   src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(48): error: identifier "__atomic_fetch_add" is undefined

/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(52): error: identifier "__atomic_fetch_add" is undefined

src/optimization/cuda_code.cu(60): error: more than one instance of overloaded function "min" matches the argument list:
            function "min(int, unsigned int)"
            function "min(unsigned int, unsigned int)"
            argument types are: (unsigned long, unsigned int)

src/optimization/cuda_code.cu(175): error: more than one instance of overloaded function "min" matches the argument list:
            function "min(int, unsigned int)"
            function "min(unsigned int, unsigned int)"
            argument types are: (unsigned long, unsigned int)

4 errors detected in the compilation of "/tmp/tmpxft_000046f1_00000000-4_cuda_code.cpp1.ii".
make: *** [build/optimization/cuda_code.o] Error 2

Any suggestions? Please PM me, I'd like to get this running as I've already spent a fair amount of time hacking at the Makefile to get it to.
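For context on the two `min` overload errors: on 64-bit Linux `size_t` is `unsigned long`, which matches neither of nvcc's `min(int, unsigned int)` / `min(unsigned int, unsigned int)` overloads exactly. An explicit cast to a common type resolves the ambiguity. A sketch with hypothetical names:

```cpp
#include <algorithm>
#include <cstddef>

// Sketch: disambiguating min() when one argument is size_t (unsigned
// long on x86_64) and the other is unsigned int. Casting both sides
// to the same type leaves exactly one viable overload.
unsigned int clamp_threads(std::size_t requested, unsigned int max_threads)
{
    return std::min(static_cast<unsigned int>(requested), max_threads);
}
```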
legendary
Activity: 1372
Merit: 1002
I have a card with 96 CUDA cores on an 2nd gen Sandy Bridge on CentOS. Not sure if that's enough for your needs.

I guess CentOS will have g++ and gnuplot in its repositories, but I'm not so sure about nasm (the assembler). I've developed everything under Ubuntu, but it shouldn't be a problem. I also guess Sandy Bridge is compatible with SSE2 and the XMM co-processor; after all, it's Intel.
Do you know what GPU model you have? 96 cores will be fine anyway, but I'm not sure if your machine is better or worse than speedmann's.

What does your program do? Mining? Or are you cracking government passwords or something sketchy?

It has different names: evolutionary artificial neural networks, neuro-evolution... It's basically a machine learning technique combining neural networks and genetic algorithms.
What does my program learn?
To correctly calculate AND or OR from 2-bit vectors (very basic classification tasks), XOR (harder, you need a NN with more than 1 layer), and to play reversi/othello, a board game.
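As an illustration of why XOR needs more than one layer: no single threshold unit can separate XOR's classes, but a two-layer network can. A minimal hand-wired sketch (weights picked by hand here, not evolved):

```cpp
// Hand-wired 2-2-1 threshold network computing XOR:
// hidden unit 1 computes OR, hidden unit 2 computes NAND,
// and the output unit ANDs them together.
static int step(double x) { return x > 0.0 ? 1 : 0; }

int xor_net(int x1, int x2)
{
    int h1 = step(x1 + x2 - 0.5);   // OR(x1, x2)
    int h2 = step(1.5 - x1 - x2);   // NAND(x1, x2)
    return step(h1 + h2 - 1.5);     // AND(h1, h2)
}
```

A neuro-evolutionary run searches for weights like these instead of writing them down by hand.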
sr. member
Activity: 471
Merit: 256
What does your program do? Mining? Or are you cracking government passwords or something sketchy?
newbie
Activity: 10
Merit: 0
I have a card with 96 CUDA cores on an 2nd gen Sandy Bridge on CentOS. Not sure if that's enough for your needs.
legendary
Activity: 1372
Merit: 1002
Hi there,

are you interested in testing an Geforce GT 420m (yea mobile version Sad) with Intel Core i3 370M Processor?
If yes, please contact me Wink

Thanks for your offer.
Luceo's GTX 570 was more attractive, but I'm not sure if he's still willing to do it.
If you don't mind, I'll wait until next week for his answer (or others) and tell you then, ok?
Even though your version is mobile, it still has 96 cores instead of my 32, and the CPU has 2 cores instead of one (so system processes shouldn't get in the way).
If no one says anything else, you're the winner for now Smiley
newbie
Activity: 63
Merit: 0
Hi there,

are you interested in testing an Geforce GT 420m (yea mobile version Sad) with Intel Core i3 370M Processor?
If yes, please contact me Wink
legendary
Activity: 1372
Merit: 1002
I have a GTX 570, will check over the source code and make sure I'd be comfortable running it.

Edit: Hmm, it's bigger than I expected. Does it need net access? If not, I can run it off an isolated LiveCD.

Edit: CPU is a Phenom II X6 1100T.

Great!! That should be more than enough.
It doesn't need internet access but I'm afraid of the impact on performance that a live CD could have, since the purpose is precisely to measure performance.
The project may be large, but you won't be actually running all of it.
The tasks, game, genetic, etc, parts aren't necessary.
If you use the version here, the makefile is prepared to just compile and run chronoBuffers, chronoConnections and chronoFunctions (I forgot to include this one the last time I put the thing prepared there), and these main files don't use the whole code. I think it's just the neural, optimization, template, common and loop parts, plus ChronoPlotter and these main .cpp files. But you can be sure by following the includes.

If the reason you want to use the live CD is that you're concerned about what the program can do, we could first run a test with the live CD and small values for the loops so you can be sure that it won't break anything.
Wait... that won't work, because you need to install nasm, nvcc and gnuplot first, and you can't install things using the liveCD, right?

Probably the safest solution for you is then to run it in a separate partition, but that will make you spend more time.
I'm open to other suggestions but I fear the liveCD won't be a feasible solution.
legendary
Activity: 1493
Merit: 1003
I have an onboard GT 8600, are you interested?

I don't know. It seems to have the same core config (32 cores) as my 9500 GT, and with that I obtained "disappointing" results last time I tried. Maybe the old motherboard and CPU were responsible.
Now it seems that my old desktop PC has broken down just from not being turned on for so long (or maybe it's just the power supply, no beep),
 so maybe I'm interested in your offer if no one else shows up.
My hope was getting to use an NVIDIA from the 500 or 600 series, with at least 192 cores or so (ideally a GeForce GTX 680, which has 1536 of them), but thank you for your interest anyway.

Another clarification: I don't need towers with various GPUs because my program is not adapted to work with multiple GPUs anyway.


Anytime you need, you're welcome!

I have a GTX 570, will check over the source code and make sure I'd be comfortable running it.

Edit: Hmm, it's bigger than I expected. Does it need net access? If not, I can run it off an isolated LiveCD.

Edit: CPU is a Phenom II X6 1100T.

That's a great idea!
sr. member
Activity: 350
Merit: 250
Per aspera ad astra!
I have a GTX 570, will check over the source code and make sure I'd be comfortable running it.

Edit: Hmm, it's bigger than I expected. Does it need net access? If not, I can run it off an isolated LiveCD.

Edit: CPU is a Phenom II X6 1100T.
legendary
Activity: 1372
Merit: 1002
I have an onboard GT 8600, are you interested?

I don't know. It seems to have the same core config (32 cores) as my 9500 GT, and with that I obtained "disappointing" results last time I tried. Maybe the old motherboard and CPU were responsible.
Now it seems that my old desktop PC has broken down just from not being turned on for so long (or maybe it's just the power supply, no beep),
 so maybe I'm interested in your offer if no one else shows up.
My hope was getting to use an NVIDIA from the 500 or 600 series, with at least 192 cores or so (ideally a GeForce GTX 680, which has 1536 of them), but thank you for your interest anyway.

Another clarification: I don't need towers with various GPUs because my program is not adapted to work with multiple GPUs anyway.
legendary
Activity: 1493
Merit: 1003
I have an onboard GT 8600, are you interested?
legendary
Activity: 1372
Merit: 1002
Do you have a relatively new Nvidia graphic card (must support CUDA)? Do you have linux installed?
You could run my project so that I can compare performance times with a better graphic than my GT 9500 and a newer CPU than my old single core AMD.
You will need to install a couple of programs: nvcc (the CUDA compiler), nasm (an assembler), gnuplot (to draw some charts) and maybe some basic C++ dev tools if you don't have them yet. I'll help you set up the program, and then you only need to wait for it to run. I'm not sure how long it will take, but it could be a few hours, during which you shouldn't be using your computer much.

Here's the software prepared to be installed and run with just "make":
http://content.wuala.com/contents/jtimon/temp/preann.tar.gz

The project is free software and anyone can also download it from this repository:
http://sourceforge.net/projects/preann/develop

Please, contact me if you're interested in helping me with your computing power.
Although I have no idea about what would be fair, I'm willing to pay you some bitcoins for this.
I can also pay you with village's hours.