
Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 1113. (Read 3426918 times)

newbie
Activity: 41
Merit: 0
My mistake, it shouldn't be there - at the moment -O1 for ld only turns on some optimizations for shared libraries, not the program binary.

-O3 takes a whopping 5 kH/s out of my super fast GT460 :)
newbie
Activity: 41
Merit: 0

That doesn't qualify as almost! ;)


Ha-ha, close enough. I think I observed the same errors before the patch, so it's (hopefully) something simple.
hero member
Activity: 756
Merit: 502

That doesn't qualify as almost! ;)

newbie
Activity: 41
Merit: 0
Posted an April 22nd release.

Please let me know how it compiles on Linux 64 bit, and how it performs on Titan now.

The patch posted earlier wasn't really doing things right. CUDA textures should have stayed ulong2 and ulong4 type, but the uint32_t type needed to be moved over to unsigned long (from unsigned int previously) because otherwise there would be a mismatch with the texture types.




Christian,
configure works fine if i run:
./configure -with-cuda=/usr/local/cuda

instead of ./configure.sh

And it almost compiles - http://pastebin.com/raw.php?i=JZb62Jtd
newbie
Activity: 14
Merit: 0
My mistake, it shouldn't be there - at the moment -O1 for ld only turns on some optimizations for shared libraries, not the program binary.
hero member
Activity: 756
Merit: 502

hmm, the patch posted earlier suggests the following configure line for 64 bits

./configure "CFLAGS=-O3" "CXXFLAGS=-O3" "LDFLAGS=-Wl,-O1" --with-cuda=/usr/local/cuda

not sure what the -Wl,-O1 linker flag is supposed to do.
hero member
Activity: 756
Merit: 502
Posted an April 22nd release.

Please let me know how it compiles on Linux 64 bit, and how it performs on Titan now.

The patch posted earlier wasn't really doing things right. CUDA textures should have stayed ulong2 and ulong4 type, but the uint32_t type needed to be moved over to unsigned long (from unsigned int previously) because otherwise there would be a mismatch with the texture types.
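
For anyone following the type discussion: the width of `unsigned long` depends on the platform's data model, which may be part of why the 64-bit Linux build behaves differently from the Windows one (that link is my assumption, the post doesn't say so). A minimal sketch in plain host C++, not cudaMiner code; `data_model()` is a hypothetical helper name:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <string>

// Hypothetical helper: names the data model this translation unit was built for.
// LP64  (64-bit Linux / Mac OS X): unsigned long is 64 bits.
// LLP64 (64-bit Windows):          unsigned long is 32 bits.
// Anything with 32-bit pointers is reported as ILP32 here for simplicity.
inline std::string data_model() {
    const bool long64 = sizeof(unsigned long) * CHAR_BIT == 64;
    const bool ptr64  = sizeof(void*) * CHAR_BIT == 64;
    if (ptr64 && long64) return "LP64";
    if (ptr64)           return "LLP64";
    return "ILP32";
}

// The standard fixed-width type is 32 bits wherever it exists.
static_assert(sizeof(std::uint32_t) * CHAR_BIT == 32, "uint32_t must be 32 bits");
```

Printing `data_model()` next to `sizeof(unsigned long)` on each build machine is a quick way to see which element width a `ulong2`/`ulong4` style vector would end up with there.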


hero member
Activity: 756
Merit: 502
autoadjust does not find the best values for my titan, had to find the best values :(
it works with 70x4 280khash/s

it's autotune (TM) (R).

how's 35x8 ?

Christian
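
A side note on comparing 70x4 with 35x8: assuming the two numbers are blocks and warps per block (the thread doesn't spell this out), both settings launch the same total number of warps and only differ in how they are grouped. A small host-side C++ sketch of that arithmetic; `LaunchConfig`, `parse_launch_config` and `total_warps` are hypothetical names, not taken from cudaMiner:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical representation of a "<blocks>x<warps>" launch setting.
struct LaunchConfig {
    int  blocks;
    int  warps;   // warps per block (assumed meaning)
    bool valid;
};

// Parses strings such as "70x4". Anything else (missing part, trailing
// characters, non-positive numbers) is marked invalid.
inline LaunchConfig parse_launch_config(const std::string& s) {
    LaunchConfig cfg{0, 0, false};
    char trailing = 0;
    const int n = std::sscanf(s.c_str(), "%dx%d%c", &cfg.blocks, &cfg.warps, &trailing);
    cfg.valid = (n == 2) && cfg.blocks > 0 && cfg.warps > 0;
    return cfg;
}

// Total warps launched across the whole grid.
inline int total_warps(const LaunchConfig& c) {
    return c.blocks * c.warps;
}
```

With this, `total_warps(parse_launch_config("70x4"))` and `total_warps(parse_launch_config("35x8"))` both come out as 280, so any speed difference between the two would come from the grouping rather than the amount of work in flight.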
legendary
Activity: 1078
Merit: 1001
autoadjust does not find the best values for my titan, had to find the best values :(


Edit: now I checked the -D option;
    it works with 70x4 at 280 khash/s
hero member
Activity: 756
Merit: 502
__shared__ uint32_t X[WARPS_PER_BLOCK][WU_PER_WARP][16+4];

Thanks! This helped. I did not know about newly added alignment restrictions in shared memory targeting SM 2.0 and higher. I guess that's because they're now having a unified pointer and addressing scheme. So if there's an alignment requirement, it applies to everything.

Finally the Titan kernel will get my large memory transaction fixes, which should boost performance notably.

Christian
newbie
Activity: 14
Merit: 0
I've just run into the same compiler issue that borked the Titan kernels when I tried to compile salsa_kernel.cu for sm_30. The kernel will just crash.

Maybe using the NSight debugger I can figure out why this occurs.

Does the crash produce: CUDA_EXCEPTION_6, Warp Misaligned Address ?

I've been able to compile & run salsa_kernel for sm_21, without tex-cache, when accesses to the X variable are 128-bit aligned,

i.e. when it's declared like this:
Code:
__shared__ uint32_t X[WARPS_PER_BLOCK][WU_PER_WARP][16+4];
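
To see why the 16+4 padding satisfies the 128-bit requirement: each innermost row is 20 words of 4 bytes, i.e. 80 bytes, which is a multiple of 16. A minimal host-side C++ sketch of that check, assuming the array base itself is 16-byte aligned; `rows_are_128bit_aligned` is a hypothetical helper, and the 16+1 case is only a made-up contrast, not something from the kernel source:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if every row X[w][u][0] of an array declared as
// uint32_t X[..][..][row_words] starts on a 16-byte (128-bit) boundary,
// assuming the array base is itself 16-byte aligned.
constexpr bool rows_are_128bit_aligned(std::size_t row_words) {
    return (row_words * sizeof(std::uint32_t)) % 16 == 0;
}

// 16+4 words = 80 bytes: every row start stays 128-bit aligned.
static_assert(rows_are_128bit_aligned(16 + 4), "16+4 padding keeps rows aligned");

// 16+1 words = 68 bytes (illustrative only): row starts drift off the boundary.
static_assert(!rows_are_128bit_aligned(16 + 1), "16+1 padding breaks alignment");
```

Any padding that is a multiple of 4 words passes the same check, so the row length can still be chosen with bank conflicts in mind as long as it stays a multiple of 4.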
legendary
Activity: 1792
Merit: 1008
/dev/null
how much are you guys getting with a 580?

240KH/s give or take 10KH/s
sweet, I got ~257 :) (slightly OC)
as soon as I've mined some coins I'll send a donation for sure ;)
newbie
Activity: 47
Merit: 0
how much are you guys getting with a 580?

240KH/s give or take 10KH/s
legendary
Activity: 1792
Merit: 1008
/dev/null
how much are you guys getting with a 580?
hero member
Activity: 756
Merit: 502
I assume you've seen this Kepler thread?

https://bitcointalk.org/index.php?topic=163750.0;topicseen

Seen this.

The challenges with the scrypt hashing are a bit greater than just using the funnel shifter for rotation. One issue is the speed and efficiency of memory access, the other issue is getting enough occupancy on Kepler's SMX (multiprocessor) units - shared memory and register limits are an issue. This mainly affects the GTX 660Ti, GTX 670, 680 and Titan devices, which currently perform rather poorly in comparison to the 5xx series.
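
The occupancy point can be put into rough numbers. The sketch below is a simplified host-side C++ estimate of how many blocks fit on one multiprocessor, using the published per-SM limits for compute capability 3.0 (64 warps, 16 blocks, 65536 registers, 48 KB shared memory in the largest configuration). It ignores allocation granularity, and the per-block register and shared memory figures in the usage note are made-up inputs, not measurements of the cudaMiner kernels; `SmLimits`, `kKeplerSmx` and `resident_blocks` are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>

// Per-multiprocessor resource limits.
struct SmLimits {
    int max_warps;     // resident warps
    int max_blocks;    // resident blocks
    int registers;     // 32-bit registers
    int shared_bytes;  // shared memory in bytes
};

// Limits for compute capability 3.0 with the 48 KB shared memory setting.
constexpr SmLimits kKeplerSmx{64, 16, 65536, 48 * 1024};

// Simplified estimate: the tightest of the four limits decides how many
// blocks can be resident at once. Allocation granularity is ignored.
inline int resident_blocks(const SmLimits& sm, int warps_per_block,
                           int regs_per_thread, int shared_bytes_per_block) {
    const int threads   = warps_per_block * 32;
    const int by_warps  = sm.max_warps / warps_per_block;
    const int by_regs   = sm.registers / (regs_per_thread * threads);
    const int by_shared = shared_bytes_per_block > 0
                              ? sm.shared_bytes / shared_bytes_per_block
                              : sm.max_blocks;
    return std::min({sm.max_blocks, by_warps, by_regs, by_shared});
}
```

For example, a hypothetical kernel with 4 warps per block, 63 registers per thread and 10240 bytes of shared memory per block gets `resident_blocks(kKeplerSmx, 4, 63, 10240) == 4`, i.e. 16 of 64 possible warps, and in that case shared memory is the limiting resource.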
hero member
Activity: 756
Merit: 502

I've seen reports of a single overclocked Titan doing 290 kHash/s, using a somewhat earlier code version.


hero member
Activity: 756
Merit: 502
I've just run into the same compiler issue that borked the Titan kernels when I tried to compile salsa_kernel.cu for sm_30. The kernel will just crash.

Maybe using the NSight debugger I can figure out why this occurs.



full member
Activity: 196
Merit: 100
Christian,

Success,

copied from settings above but seems to be only 260kH/s per TITAN.
full member
Activity: 126
Merit: 100
this is my 670gtx (GIGABYTE GV-N670OC-2GD) doing over 200khash/s




Code:
cudaminer.exe --url http://notroll.in:6332/ --userpass jasonharty24.4:12345 -i 0 -m 1 -C 2 -l 70x4
JESUS CHRIST that is a great OC!