Run my program in your Nvidia GPU (for bitcoins)

DrHaribo

legendary

Activity: 2730

Merit: 1034

Needs more jiggawatts

Wow, Skynet in the making :D

jtimon

legendary

Activity: 1372

Merit: 1002

OK, I found out why the initial makefile didn't work for you. A stupid thing.

You have to call "make all" explicitly and not just "make".
I was used to compiling from Eclipse, which put "all" there by default, but from the console I never actually tried just "make"; I usually ran "make sse2" or "make all_emu". I should have noticed that before, and then you wouldn't have had to take CXX_BASE out of the conditions. I thought "all" was in $(MAKECMDGOALS) by default when you call just "make", and I don't know why I thought I had tested it. My fault, sorry.
I've prepared this other makefile for you:

http://content.wuala.com/contents/jtimon/temp/MakefileForLuceo?dl=1

Use "make cuda" to avoid the SSE2 stuff.

For trying the multilib thing, use this one:

http://content.wuala.com/contents/jtimon/temp/MakefileForLuceoSse2?dl=1

For this one, use "make all".

I think that the permissive flag may not be necessary after all, but maybe that's unrelated.
Can you try removing that too (from the first one if it works)?

jtimon

Quote from: Luceo on June 21, 2012, 06:07:07 AM

I'm not sure how difficult it will be to get this working, and how that compares to the difficulty of porting the assembly code.

Uff, after reading some websites and manuals and trying for less than half an hour, I give up. Porting the assembly really seems to be a pain. I thought 64 bits wasn't going to be that different. Besides, I haven't touched that code for years and, although it's documented, reading assembly...

Please try adding the -m32 flag to nvcc too. If that doesn't work, I'll look into the new error (if it's different), or I'll use separate charts to compare the performance of each implementation.

Luceo

sr. member

Activity: 350

Merit: 250

Per aspera ad astra!

I tried the Makefile you PM'd me, got this output.

I tried modifying my 'working' Makefile (the one I sent you output from for CUDA), but no luck:

Code:

(3:648)# make
mkdir -p build/common
mkdir -p build/optimization
mkdir -p build/neural
mkdir -p build/genetic
mkdir -p build/game
mkdir -p build/tasks
mkdir -p build/loop
mkdir -p build/loopTest
mkdir -p build/test/
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/test/testMemoryLosses.cpp -o build/test/testMemoryLosses.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/common/chronometer.cpp -o build/common/chronometer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/common/dummy.cpp -o build/common/dummy.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/common/enumerations.cpp -o build/common/enumerations.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/common/util.cpp -o build/common/util.o
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -g -G -c -arch sm_11   src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/optimization/factory.cpp -o build/optimization/factory.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/neural/interface.cpp -o build/neural/interface.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/neural/connection.cpp -o build/neural/connection.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/neural/neuralNet.cpp -o build/neural/neuralNet.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/neural/layer.cpp -o build/neural/layer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/neural/inputLayer.cpp -o build/neural/inputLayer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/neural/buffer.cpp -o build/neural/buffer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/genetic/individual.cpp -o build/genetic/individual.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/genetic/task.cpp -o build/genetic/task.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/genetic/population.cpp -o build/genetic/population.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/game/reversiBoard.cpp -o build/game/reversiBoard.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/game/board.cpp -o build/game/board.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/tasks/reversiTask.cpp -o build/tasks/reversiTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/tasks/binaryTask.cpp -o build/tasks/binaryTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/tasks/classificationTask.cpp -o build/tasks/classificationTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loop/rangeLoop.cpp -o build/loop/rangeLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loop/enumLoop.cpp -o build/loop/enumLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loop/genericPlotter.cpp -o build/loop/genericPlotter.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loop/plot.cpp -o build/loop/plot.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loop/loop.cpp -o build/loop/loop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loop/joinEnumLoop.cpp -o build/loop/joinEnumLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loop/test.cpp -o build/loop/test.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loop/parametersMap.cpp -o build/loop/parametersMap.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loopTest/taskPlotter.cpp -o build/loopTest/taskPlotter.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -m32 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -c src/loopTest/chronoPlotter.cpp -o build/loopTest/chronoPlotter.o
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DCUDA_IMPL -DSSE2_IMPL -L/opt/cuda-toolkit/lib64 -lcudart  build/test/testMemoryLosses.o build/common/chronometer.o build/common/dummy.o build/common/enumerations.o build/common/util.o build/optimization/factory.o build/neural/interface.o build/neural/connection.o build/neural/neuralNet.o build/neural/layer.o build/neural/inputLayer.o build/neural/buffer.o build/genetic/individual.o build/genetic/task.o build/genetic/population.o build/game/reversiBoard.o build/game/board.o build/tasks/reversiTask.o build/tasks/binaryTask.o build/tasks/classificationTask.o build/loop/rangeLoop.o build/loop/enumLoop.o build/loop/genericPlotter.o build/loop/plot.o build/loop/loop.o build/loop/joinEnumLoop.o build/loop/test.o build/loop/parametersMap.o build/loopTest/taskPlotter.o build/loopTest/chronoPlotter.o build/optimization/cuda_code.o -o bin/testMemoryLosses.exe
/usr/bin/ld: i386 architecture of input file `build/test/testMemoryLosses.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/common/chronometer.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/common/dummy.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/common/enumerations.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/common/util.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/optimization/factory.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/neural/interface.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/neural/connection.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/neural/neuralNet.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/neural/layer.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/neural/inputLayer.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/neural/buffer.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/genetic/individual.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/genetic/task.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/genetic/population.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/game/reversiBoard.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/game/board.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/tasks/reversiTask.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/tasks/binaryTask.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/tasks/classificationTask.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loop/rangeLoop.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loop/enumLoop.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loop/genericPlotter.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loop/plot.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loop/loop.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loop/joinEnumLoop.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loop/test.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loop/parametersMap.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loopTest/taskPlotter.o' is incompatible with i386:x86-64 output
/usr/bin/ld: i386 architecture of input file `build/loopTest/chronoPlotter.o' is incompatible with i386:x86-64 output
build/test/testMemoryLosses.o: In function `testPopulation(ParametersMap*)':
/home/luceo/Downloads/preann/src/test/testMemoryLosses.cpp:65: undefined reference to `operator new(unsigned int)'
/home/luceo/Downloads/preann/src/test/testMemoryLosses.cpp:66: undefined reference to `operator new(unsigned int)'
/home/luceo/Downloads/preann/src/test/testMemoryLosses.cpp:67: undefined reference to `operator new(unsigned int)'
/home/luceo/Downloads/preann/src/test/testMemoryLosses.cpp:81: undefined reference to `operator new(unsigned int)'
build/test/testMemoryLosses.o: In function `main':
/home/luceo/Downloads/preann/src/test/testMemoryLosses.cpp:103: undefined reference to `operator new(unsigned int)'
build/test/testMemoryLosses.o:/home/luceo/Downloads/preann/src/test/testMemoryLosses.cpp:106: more undefined references to `operator new(unsigned int)' follow
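build/common/dummy.o: In function `std::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(char const*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)':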
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/bits/basic_string.tcc:702: undefined reference to `std::string::reserve(unsigned int)'
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/bits/basic_string.tcc:703: undefined reference to `std::string::append(char const*, unsigned int)'
build/common/util.o: In function `__gnu_cxx::new_allocator::allocate(unsigned int, void const*)':
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/new_allocator.h:94: undefined reference to `operator new(unsigned int)'
build/common/util.o: In function `__gnu_cxx::new_allocator::allocate(unsigned int, void const*)':
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/new_allocator.h:94: undefined reference to `operator new(unsigned int)'
build/common/util.o: In function `__gnu_cxx::new_allocator >::allocate(unsigned int, void const*)':
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/new_allocator.h:94: undefined reference to `operator new(unsigned int)'
build/optimization/factory.o: In function `Buffer* func_newBuffer<(BufferType)0, float>(unsigned int, ImplementationType)':
/home/luceo/Downloads/preann/src/optimization/configFactory.h:28: undefined reference to `operator new(unsigned int)'
/home/luceo/Downloads/preann/src/optimization/configFactory.h:37: undefined reference to `operator new(unsigned int)'
build/optimization/factory.o:/home/luceo/Downloads/preann/src/optimization/configFactory.h:48: more undefined references to `operator new(unsigned int)' follow
build/optimization/factory.o: In function `XmmConnection<(BufferType)2, unsigned int>::_calculateAndAddTo(Buffer*)':
/home/luceo/Downloads/preann/src/template/xmmConnection.h:102: undefined reference to `XMMreal'
/home/luceo/Downloads/preann/src/template/xmmConnection.h:110: undefined reference to `XMMbinario'
/home/luceo/Downloads/preann/src/template/xmmConnection.h:118: undefined reference to `XMMbipolar'
build/optimization/factory.o: In function `XmmConnection<(BufferType)1, unsigned int>::_calculateAndAddTo(Buffer*)':
/home/luceo/Downloads/preann/src/template/xmmConnection.h:102: undefined reference to `XMMreal'
/home/luceo/Downloads/preann/src/template/xmmConnection.h:110: undefined reference to `XMMbinario'
/home/luceo/Downloads/preann/src/template/xmmConnection.h:118: undefined reference to `XMMbipolar'
build/optimization/factory.o: In function `XmmConnection<(BufferType)3, unsigned char>::_calculateAndAddTo(Buffer*)':
/home/luceo/Downloads/preann/src/template/xmmConnection.h:102: undefined reference to `XMMreal'
/home/luceo/Downloads/preann/src/template/xmmConnection.h:110: undefined reference to `XMMbinario'
/home/luceo/Downloads/preann/src/template/xmmConnection.h:118: undefined reference to `XMMbipolar'
build/optimization/factory.o: In function `XmmConnection<(BufferType)0, float>::_calculateAndAddTo(Buffer*)':
/home/luceo/Downloads/preann/src/template/xmmConnection.h:102: undefined reference to `XMMreal'
/home/luceo/Downloads/preann/src/template/xmmConnection.h:110: undefined reference to `XMMbinario'
/home/luceo/Downloads/preann/src/template/xmmConnection.h:118: undefined reference to `XMMbipolar'
build/neural/neuralNet.o: In function `NeuralNet::addLayer(unsigned int, BufferType, FunctionType)':
/home/luceo/Downloads/preann/src/neural/neuralNet.cpp:33: undefined reference to `operator new(unsigned int)'
build/neural/neuralNet.o: In function `NeuralNet::addInputLayer(Interface*)':
/home/luceo/Downloads/preann/src/neural/neuralNet.cpp:49: undefined reference to `operator new(unsigned int)'
build/neural/neuralNet.o: In function `NeuralNet::load(_IO_FILE*)':
/home/luceo/Downloads/preann/src/neural/neuralNet.cpp:196: undefined reference to `operator new(unsigned int)'
/home/luceo/Downloads/preann/src/neural/neuralNet.cpp:199: undefined reference to `operator new(unsigned int)'
build/neural/neuralNet.o: In function `__gnu_cxx::new_allocator::allocate(unsigned int, void const*)':
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/new_allocator.h:94: undefined reference to `operator new(unsigned int)'
build/neural/neuralNet.o:/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/new_allocator.h:94: more undefined references to `operator new(unsigned int)' follow
collect2: error: ld returned 1 exit status
make: *** [bin/testMemoryLosses.exe] Error 1

I do have a lot of the stuff from multilib installed (I'm a gamer, so have to have a lot of it installed).

I'm not sure how difficult it will be to get this working, and how that compares to the difficulty of porting the assembly code.

2112

legendary

Activity: 2128

Merit: 1074

Quote from: Luceo on June 18, 2012, 01:32:14 PM

I'm using Arch Linux x86_64.

I just checked that Arch Linux has support for multilib. This means that 64-bit OS can run 32-bit programs, provided that:

1) the multilib support packages are installed;
2) gcc/g++ are invoked with -m32 flag.

So there's no need to laboriously rewrite the assembly code. All you need to do is modify the makefiles.
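
A quick way to check which mode a build actually ended up in is to look at the pointer width and at the type behind std::size_t. The sketch below is only an illustration; pointer_bits and size_t_alias are hypothetical helper names, not part of preann. On x86 Linux, a -m32 build has 32-bit pointers and std::size_t is unsigned int, while a native x86_64 build has 64-bit pointers and std::size_t is unsigned long. That is why 32-bit objects ask for operator new(unsigned int), which a 64-bit link cannot supply.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>
#include <type_traits>

// Width of a pointer in bits: 32 for an -m32 build, 64 for a native x86_64 build.
constexpr int pointer_bits() {
    return static_cast<int>(sizeof(void*) * CHAR_BIT);
}

// Name of the built-in type that std::size_t aliases in this build.
// This is the type that shows up in symbols such as operator new(...).
std::string size_t_alias() {
    if (std::is_same<std::size_t, unsigned int>::value) return "unsigned int";
    if (std::is_same<std::size_t, unsigned long>::value) return "unsigned long";
    if (std::is_same<std::size_t, unsigned long long>::value) return "unsigned long long";
    return "other";
}
```

Compiling this once with -m32 and once without, and printing both helpers from main, should show the difference, provided the multilib packages are installed. Every object file and the final link step have to agree on the same mode.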

Have fun.

jtimon

Quote from: Luceo on June 20, 2012, 07:03:50 AM

I actually only have 64-bit operating systems, but since SSE2 is CPU-based, it's not really comparable to the CUDA (GPU) results and should be able to be tested on just about anything.

If you want to update the assembler code for a 64-bit CPU, I'd be willing to test it, but I expect that's a lot of work and not as simple as just using 64-bit registers (my asm knowledge is very rusty).

I'll run the program next time I nap and post results here if it finishes during that nap.

On my old computer, with poor communication between CPU and GPU memory, the SSE2 implementation was actually superior. That's what I didn't want to show to my teachers. But yes, I can compare SSE2 and CUDA against C separately.
But if I find it easy to port the assembly code to 64 bits, I'll do it just to make it nicer.

Luceo

I actually only have 64-bit operating systems, but since SSE2 is CPU-based, it's not really comparable to the CUDA (GPU) results and should be able to be tested on just about anything.

If you want to update the assembler code for a 64-bit CPU, I'd be willing to test it, but I expect that's a lot of work and not as simple as just using 64-bit registers (my asm knowledge is very rusty).

I'll run the program next time I nap and post results here if it finishes during that nap.

jtimon

Quote from: Luceo on June 19, 2012, 05:56:46 PM

OK, I had to swap around the #undef and #include statements at the top of the cuda_code.cu (undef directives must come before importing the header), and add /opt/cuda-toolkit/open64/bin to path, got past that issue and found another one:

I'll correct that, thanks.

Quote from: Luceo on June 19, 2012, 05:56:46 PM

Code:

/usr/bin/ld: i386 architecture of input file `build/optimization/sse2_code.o' is incompatible with i386:x86-64 output
collect2: error: ld returned 1 exit status
make: *** [bin/testMemoryLosses.exe] Error 1

Oh, I didn't think about that. The assembly code is incompatible with the 64-bit operating system.

Quote from: Luceo on June 19, 2012, 05:56:46 PM

Feel like we're getting somewhere. :P

I'll try not compiling SSE2 code since I assume you only need me to test the CUDA stuff, so makefile:

Code:

FULL_OBJ = [s]$(SSE2_OBJ)[/s] $(CUDA_OBJ)
FACT_FLAGS += -DCPP_IMPL [s]-DSSE2_IMPL[/s] -DCUDA_IMPL

Maybe I edited the Makefile wrong, because I got a bunch of outputs similar to:

Code:

while executing GenericPlotFillAction ChronoFillAction Connection_reset at state Size_450000 :
Implementation SSE2 is not allowed.

Actually, I wanted to compare the SSE2 version on the same CPU.
If you can't use a 32-bit OS, I guess I can repeat some of the charts and compare the results independently.
After removing the SSE2 version, getting rid of those errors requires changing the main files being run (chronoBuffers.cpp, chronoConnections.cpp, chronoFunctions.cpp) to remove the SSE2 option, but those errors aren't really important.
Anyway, the changes would be something like this:

In chronoBuffers.cpp, replace
EnumLoop linesLoop(ET_IMPLEMENTATION, 3, IT_C, IT_SSE2, IT_CUDA);
with
EnumLoop linesLoop(ET_IMPLEMENTATION, 2, IT_C, IT_CUDA);

In chronoConnections.cpp, replace
EnumLoop linesLoop(ET_IMPLEMENTATION);
with
EnumLoop linesLoop(ET_IMPLEMENTATION, 4, IT_C, IT_CUDA, IT_CUDA_REDUC, IT_CUDA_INV);

In chronoFunctions.cpp, keep
linesLoop.addInnerLoop(new EnumLoop(ET_IMPLEMENTATION, 2, IT_C, IT_CUDA));
as it is.
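
In calls like these, the number after the enumeration type has to match how many values follow it. Here is a minimal sketch of why, assuming the constructor reads its values from a C-style variadic tail. EnumLoopSketch and the enum definitions below are hypothetical stand-ins for illustration, not preann's actual EnumLoop.

```cpp
#include <cassert>
#include <cstdarg>
#include <vector>

// Hypothetical stand-ins for the project's enumerations.
enum ImplementationType { IT_C, IT_SSE2, IT_CUDA, IT_CUDA_REDUC, IT_CUDA_INV };
enum EnumType { ET_IMPLEMENTATION };

// Hypothetical loop over a chosen subset of enum values.
class EnumLoopSketch {
public:
    // Reads exactly `count` values from the variadic tail. Extra arguments
    // are silently ignored; passing fewer than `count` is undefined behaviour.
    EnumLoopSketch(EnumType type, unsigned count, ...) : type_(type) {
        va_list args;
        va_start(args, count);
        for (unsigned i = 0; i < count; ++i) {
            // Unscoped enumerators are promoted to int when passed through "...".
            values_.push_back(va_arg(args, int));
        }
        va_end(args);
    }

    EnumType type() const { return type_; }
    const std::vector<int>& values() const { return values_; }

    // True if `value` is one of the selected enum values.
    bool contains(int value) const {
        for (int v : values_) {
            if (v == value) return true;
        }
        return false;
    }

private:
    EnumType type_;
    std::vector<int> values_;
};
```

With this sketch, EnumLoopSketch loop(ET_IMPLEMENTATION, 2, IT_C, IT_CUDA) selects only the C and CUDA implementations, so a run built without SSE2 never asks for it.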

Quote from: Luceo on June 19, 2012, 05:56:46 PM

However, it looks like the software compiled. I'm now getting outputs (separated by reasonably long pauses) like:

Code:

while executing GenericPlotFillAction ChronoFillAction Connection_calculateAndAddTo at state Size_4500 :
The maximum float input size is 4032.

Complete output from this run has been pasted here.

Don't worry, that's expected. One CUDA implementation doesn't allow certain sizes.
I don't understand why it says
Cmake: *** [cuda_emu] Interrupt

By default it should be [all] and not [cuda_emu], but it doesn't matter.
The CUDA part is compiled without "--device-emulation", so everything's fine:

Code:

(3:624)# make
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11 src/optimization/cuda_code.cu -o build/optimization/cuda_code.o

Quote from: Luceo on June 19, 2012, 05:56:46 PM

Can you confirm if this is expected behaviour as I'm assuming it's not (the 4500 step is being repeated over and over, I assume due to trying to use a float too high)? If so, I'll leave this running overnight (next time I nap, probably in about 5-6 hours).

Sorry for not answering earlier. I would prefer to run this with a 32-bit OS and the SSE2 version but, again, if that's not possible, I'll work it out somehow.
The other errors are fine, though: it's just a more limited implementation being asked to run layer sizes larger than it can handle. That's expected.

Thank you again for all your effort and patience.

Luceo

OK, I had to swap around the #undef and #include statements at the top of the cuda_code.cu (undef directives must come before importing the header), and add /opt/cuda-toolkit/open64/bin to path, got past that issue and found another one:

Code:

(3:624)# make
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11 src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/optimization/factory.cpp -o build/optimization/factory.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/interface.cpp -o build/neural/interface.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/connection.cpp -o build/neural/connection.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/neuralNet.cpp -o build/neural/neuralNet.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/layer.cpp -o build/neural/layer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/inputLayer.cpp -o build/neural/inputLayer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/neural/buffer.cpp -o build/neural/buffer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/genetic/individual.cpp -o build/genetic/individual.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/genetic/task.cpp -o build/genetic/task.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/genetic/population.cpp -o build/genetic/population.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/game/reversiBoard.cpp -o build/game/reversiBoard.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/game/board.cpp -o build/game/board.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/tasks/reversiTask.cpp -o build/tasks/reversiTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/tasks/binaryTask.cpp -o build/tasks/binaryTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/tasks/classificationTask.cpp -o build/tasks/classificationTask.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/rangeLoop.cpp -o build/loop/rangeLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/enumLoop.cpp -o build/loop/enumLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/genericPlotter.cpp -o build/loop/genericPlotter.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/plot.cpp -o build/loop/plot.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/loop.cpp -o build/loop/loop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/joinEnumLoop.cpp -o build/loop/joinEnumLoop.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/test.cpp -o build/loop/test.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loop/parametersMap.cpp -o build/loop/parametersMap.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loopTest/taskPlotter.cpp -o build/loopTest/taskPlotter.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/loopTest/chronoPlotter.cpp -o build/loopTest/chronoPlotter.o
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -L/opt/cuda-toolkit/lib64 -lcudart build/test/testMemoryLosses.o build/common/chronometer.o build/common/dummy.o build/common/enumerations.o build/common/util.o build/optimization/factory.o build/neural/interface.o build/neural/connection.o build/neural/neuralNet.o build/neural/layer.o build/neural/inputLayer.o build/neural/buffer.o build/genetic/individual.o build/genetic/task.o build/genetic/population.o build/game/reversiBoard.o build/game/board.o build/tasks/reversiTask.o build/tasks/binaryTask.o build/tasks/classificationTask.o build/loop/rangeLoop.o build/loop/enumLoop.o build/loop/genericPlotter.o build/loop/plot.o build/loop/loop.o build/loop/joinEnumLoop.o build/loop/test.o build/loop/parametersMap.o build/loopTest/taskPlotter.o build/loopTest/chronoPlotter.o build/optimization/sse2_code.o build/optimization/cuda_code.o -o bin/testMemoryLosses.exe
/usr/bin/ld: i386 architecture of input file `build/optimization/sse2_code.o' is incompatible with i386:x86-64 output
collect2: error: ld returned 1 exit status
make: *** [bin/testMemoryLosses.exe] Error 1

Feel like we're getting somewhere. :P

I'll try not compiling SSE2 code since I assume you only need me to test the CUDA stuff, so makefile:

Code:

FULL_OBJ = [s]$(SSE2_OBJ)[/s] $(CUDA_OBJ)
FACT_FLAGS += -DCPP_IMPL [s]-DSSE2_IMPL[/s] -DCUDA_IMPL

Maybe I edited the Makefile wrong, because I got a bunch of outputs similar to:

Code:

while executing GenericPlotFillAction ChronoFillAction Connection_reset at state Size_450000 :
Implementation SSE2 is not allowed.

However, it looks like the software compiled. I'm now getting outputs (separated by reasonably long pauses) like:

Code:

while executing GenericPlotFillAction ChronoFillAction Connection_calculateAndAddTo at state Size_4500 :
The maximum float input size is 4032.

Complete output from this run has been pasted here.

Can you confirm if this is expected behaviour as I'm assuming it's not (the 4500 step is being repeated over and over, I assume due to trying to use a float too high)? If so, I'll leave this running overnight (next time I nap, probably in about 5-6 hours).

jtimon

Quote from: Luceo on June 19, 2012, 12:55:16 PM

No problem, it clearly needs working through so glad I can be of assistance. Current output still fails, though:

...

This issue was present in some of the CUDA demos, and was fixed by adding this to the top of cuda_code.cu (although I'm totally unsure what this does, it's just a forum suggestion):

Code:

#undef _GLIBCXX_ATOMIC_BUILTINS
#undef _GLIBCXX_USE_INT128

Thank you. I'll find out what this means. Probably disabling new stuff by default or something.

Quote from: Luceo on June 19, 2012, 12:55:16 PM

Then, the following return was given to a 'make' (this could be related to my edit as described in the last codeblock, though):

...

That happens when you change things and you only test your "improvements" with the emulator. Sorry again.
I've uploaded the file again to the repository with a change to try to solve that (and your change).
Can you try it again?

Luceo

No problem, it clearly needs working through so glad I can be of assistance. Current output still fails, though:

Code:

(3:600)# make
mkdir -p build/common
mkdir -p build/optimization
mkdir -p build/neural
mkdir -p build/genetic
mkdir -p build/game
mkdir -p build/tasks
mkdir -p build/loop
mkdir -p build/loopTest
mkdir -p build/test/
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/test/testMemoryLosses.cpp -o build/test/testMemoryLosses.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/chronometer.cpp -o build/common/chronometer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/dummy.cpp -o build/common/dummy.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/enumerations.cpp -o build/common/enumerations.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/util.cpp -o build/common/util.o
src/common/util.cpp: In static member function ‘static void MemoryManagement::free(void*)’:
src/common/util.cpp:61:48: warning: cast from ‘void*’ to ‘unsigned int’ loses precision [-fpermissive]
nasm -f elf src/optimization/sse2_code.asm -o build/optimization/sse2_code.o
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11 src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(48): error: identifier "__atomic_fetch_add" is undefined

/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(52): error: identifier "__atomic_fetch_add" is undefined

2 errors detected in the compilation of "/tmp/tmpxft_00006ce2_00000000-4_cuda_code.cpp1.ii".
make: *** [build/optimization/cuda_code.o] Error 2

This issue was present in some of the CUDA demos, and was fixed by adding this to the top of cuda_code.cu (although I'm totally unsure what this does, it's just a forum suggestion):

Code:

#undef _GLIBCXX_ATOMIC_BUILTINS
#undef _GLIBCXX_USE_INT128

Then, the following return was given to a 'make' (this could be related to my edit as described in the last codeblock, though):

Code:

(3:602)# make
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11 src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
src/optimization/cuda_code.cu(382) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumBitsConnectionsKernel<(BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(382) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumBitsConnectionsKernel<(BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(457) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumBitsInvertedConnectionsKernel<(BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(457) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumBitsInvertedConnectionsKernel<(BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)512u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)256u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)128u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)64u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)32u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)16u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)8u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)4u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)2u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)1u, (BufferType)0> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)512u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)256u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)128u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)64u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)32u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)16u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)8u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)4u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)2u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)1u, (BufferType)1> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)512u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)256u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)128u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)64u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)32u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)16u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)8u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)4u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)2u, (BufferType)2> ") is not allowed

src/optimization/cuda_code.cu(583) (col. 22): error: calling a host function("std::min ") from a __device__/__global__ function("SumConnectionsKernel<(unsigned int)1u, (BufferType)2> ") is not allowed

34 errors detected in the compilation of "/tmp/tmpxft_00006d0d_00000000-4_cuda_code.cpp1.ii".
make: *** [build/optimization/cuda_code.o] Error 2
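
For reference, these errors mean that std::min is compiled for the host only, so nvcc refuses to call it from a kernel. One possible way around it, sketched below under the assumption that the calls at lines 382, 457 and 583 only need a plain minimum of two values of the same type, is a small helper that is marked __host__ __device__ when nvcc is compiling. The names PREANN_HD, device_min and remaining_elements are made up for this example and are not taken from cuda_code.cu.

```cpp
#include <cassert>

// nvcc defines __CUDACC__. With a plain host compiler the qualifiers expand
// to nothing, so the same code also builds for the CPU implementations.
#ifdef __CUDACC__
#define PREANN_HD __host__ __device__
#else
#define PREANN_HD
#endif

// Hypothetical helper: a minimum that can be called from kernels because it
// carries __host__ __device__ when compiled by nvcc.
template <typename T>
PREANN_HD inline T device_min(T a, T b)
{
    return (b < a) ? b : a;
}

// Hypothetical use: how many elements are left for a chunk that starts at
// `offset`, capped at the chunk size. Assumes offset <= total.
PREANN_HD inline unsigned remaining_elements(unsigned total, unsigned offset, unsigned chunk)
{
    return device_min(total - offset, chunk);
}

// Host-side sanity check of the helpers.
inline void check_device_min()
{
    assert(device_min(3u, 7u) == 3u);
    assert(remaining_elements(10u, 8u, 4u) == 2u);
}
```

CUDA's own min overloads can also be called from device code; the helper just avoids having to pick between them when the argument types differ.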

jtimon

legendary

Activity: 1372

Merit: 1002

Sorry about all the problems you're having. I compiled the program with older software and this is the result.
Since CUDA is proprietary software, when they decided to remove the emulator I got stuck with version 2.3, which means g++ 4.3, and I'm using make 3.81.
The permissive flag seems to solve the problems with the newer g++, and you've hacked the makefile to make it work with the newer make.
I forgot that the path for nvcc is different in newer versions.

It doesn't run well because it hasn't compiled correctly. I don't even know why the executables are created.

I've changed cuda_code.cu to try to solve those problems. You can get it here:
http://preann.svn.sourceforge.net/viewvc/preann/preann/src/optimization/cuda_code.cu?view=log

If you have more problems, please post them. Maybe the easiest solution (if you can get a live CD with g++-4.3) is to install the legacy version of CUDA: http://developer.nvidia.com/cuda-toolkit-23-downloads

Thank you for taking the time to try this.

Luceo

sr. member

Activity: 350

Merit: 250

Per aspera ad astra!

I've had a look at the code and I'm happy running this on my machine 'naked'.

A LiveCD wouldn't affect performance as I'd reboot into it and everything'd be running in RAM, but getting all the software on one would be a pain.

I'm using Arch Linux x86_64.

Edit: CXX_BASE isn't set by default, set that. Next issue:

Code:

(3:527)~ make
g++ -ggdb -I src/ -c src/test/testMemoryLosses.cpp -o build/test/testMemoryLosses.o
g++ -ggdb -I src/ -c src/common/chronometer.cpp -o build/common/chronometer.o
g++ -ggdb -I src/ -c src/common/dummy.cpp -o build/common/dummy.o
g++ -ggdb -I src/ -c src/common/enumerations.cpp -o build/common/enumerations.o
g++ -ggdb -I src/ -c src/common/util.cpp -o build/common/util.o
src/common/util.cpp: In static member function ‘static void MemoryManagement::free(void*)’:
src/common/util.cpp:61:48: error: cast from ‘void*’ to ‘unsigned int’ loses precision [-fpermissive]
make: *** [build/common/util.o] Error 1

Set CXX_BASE to -fpermissive, not sure if this'll affect the test.
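
A note on that error: on x86_64 a pointer is 64 bits wide and unsigned int is 32, so the cast at util.cpp:61 throws away the upper half of the address, and -fpermissive only downgrades that to a warning. If the integer is meant to hold the whole address, a type such as uintptr_t is wide enough. A minimal sketch, assuming that is what the code wants (the real line isn't shown in this thread, and pointer_key / key_to_pointer are hypothetical names):

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical stand-in for the cast in MemoryManagement::free: convert a
// pointer to an integer that is wide enough on both 32-bit and 64-bit builds.
inline uintptr_t pointer_key(const void* ptr)
{
    return reinterpret_cast<uintptr_t>(ptr);
}

// Inverse conversion, to show that no bits of the address were lost.
inline void* key_to_pointer(uintptr_t key)
{
    return reinterpret_cast<void*>(key);
}

// Round trip: the pointer that comes back must equal the original one.
inline void check_round_trip()
{
    int value = 42;
    void* original = &value;
    assert(key_to_pointer(pointer_key(original)) == original);
}
```

With a cast like this the file should no longer need -fpermissive for that particular line, although other lines might still need it.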

Also have to change the cuda directory. Makefile looks pretty different.

Hmm, didn't run right, any ideas?

...

EDIT 2: OK, got a little further. Your 'if' conditional in the Makefile was being ignored, so I manually set these in the Makefile:

Code:

FACT_OBJ = $(FULL_OBJ)
FACT_FLAGS += -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL

Then I added my CUDA 'bin' path to my PATH variable.

Now, I get the following when I try to compile:

Code:

(3:587)# make
mkdir -p build/common
mkdir -p build/optimization
mkdir -p build/neural
mkdir -p build/genetic
mkdir -p build/game
mkdir -p build/tasks
mkdir -p build/loop
mkdir -p build/loopTest
mkdir -p build/test/
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/test/testMemoryLosses.cpp -o build/test/testMemoryLosses.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/chronometer.cpp -o build/common/chronometer.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/dummy.cpp -o build/common/dummy.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/enumerations.cpp -o build/common/enumerations.o
g++ -fpermissive -L/opt/cuda-toolkit/lib64 -ggdb -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -c src/common/util.cpp -o build/common/util.o
src/common/util.cpp: In static member function ‘static void MemoryManagement::free(void*)’:
src/common/util.cpp:61:48: warning: cast from ‘void*’ to ‘unsigned int’ loses precision [-fpermissive]
nasm -f elf src/optimization/sse2_code.asm -o build/optimization/sse2_code.o
/usr/local/cuda/bin/nvcc -I src/ -I /opt/cuda-toolkit/include/ -DCPP_IMPL -DSSE2_IMPL -DCUDA_IMPL -g -G -c -arch sm_11 src/optimization/cuda_code.cu -o build/optimization/cuda_code.o
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(48): error: identifier "__atomic_fetch_add" is undefined

/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/../../../../include/c++/4.7.0/ext/atomicity.h(52): error: identifier "__atomic_fetch_add" is undefined

src/optimization/cuda_code.cu(60): error: more than one instance of overloaded function "min" matches the argument list:
function "min(int, unsigned int)"
function "min(unsigned int, unsigned int)"
argument types are: (unsigned long, unsigned int)

src/optimization/cuda_code.cu(175): error: more than one instance of overloaded function "min" matches the argument list:
function "min(int, unsigned int)"
function "min(unsigned int, unsigned int)"
argument types are: (unsigned long, unsigned int)

4 errors detected in the compilation of "/tmp/tmpxft_000046f1_00000000-4_cuda_code.cpp1.ii".
make: *** [build/optimization/cuda_code.o] Error 2

Any suggestions? Please PM me, I'd like to get this running as I've already spent a fair amount of time hacking at the Makefile to get it to.
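
On the two min errors: the message says the arguments are (unsigned long, unsigned int), and neither of the listed overloads takes an unsigned long, so the compiler cannot choose between converting to int or to unsigned int. On a 64-bit build size_t is unsigned long, which may be where the first argument comes from (that is a guess; lines 60 and 175 aren't shown here). A sketch of the ambiguity and one way out, using stand-in overloads in a made-up demo namespace instead of the real CUDA ones:

```cpp
#include <cassert>
#include <cstddef>

namespace demo {
// Stand-ins for the two overloads named in the nvcc error message.
inline unsigned int min(int a, unsigned int b)
{
    unsigned int ua = static_cast<unsigned int>(a);
    return ua < b ? ua : b;
}

inline unsigned int min(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}
}  // namespace demo

// Hypothetical caller: cap an element count at the block size.
inline unsigned int clamp_to_block(std::size_t elements, unsigned int block_size)
{
    // demo::min(elements, block_size);  // ambiguous where size_t is unsigned long
    // Casting picks the (unsigned int, unsigned int) overload explicitly.
    // Note that the cast truncates counts that do not fit in 32 bits.
    return demo::min(static_cast<unsigned int>(elements), block_size);
}

// Host-side sanity check.
inline void check_clamp()
{
    assert(clamp_to_block(1000, 256u) == 256u);
    assert(clamp_to_block(100, 256u) == 100u);
}
```

On a 32-bit build size_t is already unsigned int, which would explain why the original code compiled there without the cast.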

jtimon

legendary

Activity: 1372

Merit: 1002

Quote from: Jello on June 17, 2012, 04:34:14 PM

I have a card with 96 CUDA cores on a 2nd gen Sandy Bridge on CentOS. Not sure if that's enough for your needs.

I guess CentOS will have g++ and gnuplot in its repositories, but I'm not so sure about nasm (the assembler). I've developed everything under Ubuntu, but it shouldn't be a problem. I also guess Sandy Bridge supports SSE2 and the XMM registers; after all, it's Intel.
Do you know which GPU model you have? 96 cores will be fine anyway, but I'm not sure if your machine is better or worse than speedmann's.

Quote from: mokahless on June 17, 2012, 07:07:04 PM

What does your program do? Mining? Or are you cracking government passwords or something sketchy?

It goes by different names: evolutionary artificial neural networks, neuro-evolution... It is basically a machine learning technique combining neural networks and genetic algorithms.
What does my program learn?
To correctly calculate the AND or OR of two bit vectors (very basic classification tasks), XOR (harder; you need a neural network with more than one layer) and to play Reversi/Othello, a board game.
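
To illustrate the point about layers: a single threshold unit can compute AND or OR, but XOR needs a hidden layer. The sketch below uses hand-picked weights on single bits just to show the structure. It isn't code from the project, where the weights would be found by the genetic algorithm, and the function names are invented for the example.

```cpp
#include <cassert>

// Threshold unit: outputs 1 when the weighted sum of its two inputs exceeds
// the threshold, otherwise 0.
inline int neuron(int x1, int x2, float w1, float w2, float threshold)
{
    return (w1 * x1 + w2 * x2 > threshold) ? 1 : 0;
}

// One unit is enough for AND and OR; only the threshold differs.
inline int and_gate(int a, int b) { return neuron(a, b, 1.0f, 1.0f, 1.5f); }
inline int or_gate(int a, int b)  { return neuron(a, b, 1.0f, 1.0f, 0.5f); }

// XOR is "OR but not AND": the hidden layer computes OR and AND, and the
// output unit subtracts the AND signal from the OR signal.
inline int xor_gate(int a, int b)
{
    int hidden_or = or_gate(a, b);
    int hidden_and = and_gate(a, b);
    return neuron(hidden_or, hidden_and, 1.0f, -1.0f, 0.5f);
}

// Check the full XOR truth table.
inline void check_gates()
{
    assert(xor_gate(0, 0) == 0);
    assert(xor_gate(0, 1) == 1);
    assert(xor_gate(1, 0) == 1);
    assert(xor_gate(1, 1) == 0);
}
```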

mokahless

sr. member

Activity: 471

Merit: 256

What does your program do? Mining? Or are you cracking government passwords or something sketchy?

Jello

newbie

Activity: 10

Merit: 0

I have a card with 96 CUDA cores on a 2nd gen Sandy Bridge on CentOS. Not sure if that's enough for your needs.

jtimon

legendary

Activity: 1372

Merit: 1002

Quote from: speedmann on June 14, 2012, 12:05:01 PM

Hi there,

are you interested in testing a GeForce GT 420M (yeah, mobile version :( ) with an Intel Core i3 370M processor?
If yes, please contact me ;)

Thanks for your offer.
Luceo's GTX 570 was more attractive, but I'm not sure if he's still willing to do it.
If you don't mind, I'll wait until next week for his answer (or others') and tell you then, OK?
Even if your version is mobile, it still has 96 cores instead of my 32, and the CPU has 2 cores instead of one (so system processes shouldn't get in the way).
If no one says anything else, you're the winner for now.

speedmann

newbie

Activity: 63

Merit: 0

Hi there,

are you interested in testing a GeForce GT 420M (yeah, mobile version :( ) with an Intel Core i3 370M processor?
If yes, please contact me ;)

jtimon

legendary

Activity: 1372

Merit: 1002

Quote from: Luceo on June 12, 2012, 06:30:11 AM

I have a GTX 570, will check over the source code and make sure I'd be comfortable running it.

Edit: Hmm, it's bigger than I expected. Does it need net access? If not, I can run it off an isolated LiveCD.

Edit: CPU is a Phenom II X6 1100T.

Great!! That should be more than enough.
It doesn't need internet access, but I'm afraid of the impact a live CD could have on performance, since the purpose is precisely to measure performance.
The project may be large, but you won't actually be running all of it.
The tasks, game, genetic, etc. parts aren't necessary.
If you use the version here, the makefile is prepared to compile and run just chronoBuffers, chronoConnections and chronoFunctions (I forgot to include this one the last time I put the thing there), and these main files don't use the whole code. I think it's just the neural, optimization, template, common and loop parts plus ChronoPlotter and these main .cpp files. But you can check by following the includes.

If the reason you want to use the live CD is that you're concerned about what the program can do, we could first run a test with the live CD and small values for the loops, so you can be sure that it won't break anything.
Wait... that won't work because you need to install nasm, nvcc and gnuplot first, and you can't install things using the live CD, right?

Probably the safest solution for you is then to run it in a separate partition, but that will take you more time.
I'm open to other suggestions, but I fear the live CD won't be a feasible solution.

Jay_Pal

legendary

Activity: 1493

Merit: 1003

Quote from: jtimon on June 12, 2012, 04:15:36 AM

Quote from: Jay_Pal on June 11, 2012, 01:37:29 PM

I have an onboard GT 8600, are you interested?

I don't know. It seems to have the same core config (32 cores) as my 9500 GT, and with that I obtained "disappointing" results last time I tried. Maybe the old motherboard and CPU were responsible.
Now it seems that my old desktop PC has broken down just from not being turned on for so long (or maybe it's just the power supply, no beep),
so maybe I'm interested in your offer if no one else shows up.
My hope was to get to use an NVIDIA card from the 500 or 600 series, with at least 192 cores or so (ideally a GeForce GTX 680, which has 1536 of them), but thank you for your interest anyway.

Another clarification: I don't need towers with various GPUs because my program is not adapted to work with multiple GPUs anyway.

Anytime you need, you're welcome!

Quote from: Luceo on June 12, 2012, 06:30:11 AM

I have a GTX 570, will check over the source code and make sure I'd be comfortable running it.

Edit: Hmm, it's bigger than I expected. Does it need net access? If not, I can run it off an isolated LiveCD.

Edit: CPU is a Phenom II X6 1100T.

That's a great idea!
