
Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 994. (Read 3426989 times)

newbie
Activity: 1
Merit: 0
Hi Guys

I have a GTX 460 and with this setting (cudaminer.exe -H 1 -d 0 -i 1 -l F28x4 -C 1 -m 0 -o stratum+tcp://coinotron.com:3334 -O USER:PWD)
I have reached 122 khash/s

With the new v111 cudaminer and the same settings I reach just half of that.
What could be the reason? Sorry, I'm a newbie :)

Thanks!
full member
Activity: 168
Merit: 100
I'm not with my miners right now, but my 660 Ti breaks 300 kh/s all the time on the 18/12/13 code.

I did not apply extra overclocking (apart from the factory OC), as I am on Linux here.



Ohhhh, yeah, I'm applying default 1 for the overclock because it works the best.

So this is the new code, I take it?
hero member
Activity: 756
Merit: 502
I'm not with my miners right now, but my 660 Ti breaks 300 kh/s all the time on the 18/12/13 code.

I did not apply extra overclocking (apart from the factory OC), as I am on Linux here.
member
Activity: 106
Merit: 10
So sorry, I forgot to say thanks for the new v111 cudaminer. I have tried it and it runs no differently, but thanks for all of your very hard work.

Very, very grateful to everybody for your help.


For normal scrypt use the official release 2013-12-18. It's on the first page of this thread. At least for me that is the fastest for normal scrypt.
Also there is not much to be done for tuning in that release.
For middlecoin.com and the 2013-12-18 release I use this start line:
cudaminer.exe -i 1 -H 2 -C 1 -l F8x16 -o stratum+tcp://middlecoin.com:3333 -O bitcoinaddress:password
Please note that this line is for a Fermi-based card (Quadro 4000) with 2 GB of VRAM in a desktop that I also work on (hence -i 1 and not 0).
full member
Activity: 168
Merit: 100

However, I have a problem: each time I give it a config (the one it just found in the previous run), it says it does not validate and starts a new autotune...

fixed already ;)

Code:
[2014-01-22 12:32:31] GPU #0: GeForce GTX 660 Ti with compute capability 3.0
[2014-01-22 12:32:31] GPU #0: interactive: 0, tex-cache: 0 , single-alloc: 1
[2014-01-22 12:32:31] GPU #0: 32 hashes / 4.0 MB per warp.
[2014-01-22 12:32:31] GPU #0: using launch configuration Y21x28
[2014-01-22 12:32:31] GPU #0: GeForce GTX 660 Ti, 155.33 khash/s
[2014-01-22 12:32:42] GPU #0: GeForce GTX 660 Ti, 299.98 khash/s
[2014-01-22 12:32:42] accepted: 1/1 (100.00%), 299.98 khash/s (yay!!!)

[2014-01-22 12:45:40] GPU #0: GeForce GTX 660 Ti with compute capability 3.0
[2014-01-22 12:45:40] GPU #0: interactive: 0, tex-cache: 1D, single-alloc: 1
[2014-01-22 12:45:40] GPU #0: 32 hashes / 4.0 MB per warp.
[2014-01-22 12:45:40] GPU #0: using launch configuration Y14x32
[2014-01-22 12:45:40] GPU #0: GeForce GTX 660 Ti, 153.06 khash/s
[2014-01-22 12:46:00] GPU #0: GeForce GTX 660 Ti, 304.43 khash/s
[2014-01-22 12:46:00] accepted: 1/1 (100.00%), 304.43 khash/s (yay!!!)

A GTX 660Ti breaking 300 kHash/s. Nice.


I'm not with my miners right now, but my 660 Ti breaks 300 kh/s all the time on the 18/12/13 code.
hero member
Activity: 756
Merit: 502

However, I have a problem: each time I give it a config (the one it just found in the previous run), it says it does not validate and starts a new autotune...

fixed already ;)

Code:
[2014-01-22 12:32:31] GPU #0: GeForce GTX 660 Ti with compute capability 3.0
[2014-01-22 12:32:31] GPU #0: interactive: 0, tex-cache: 0 , single-alloc: 1
[2014-01-22 12:32:31] GPU #0: 32 hashes / 4.0 MB per warp.
[2014-01-22 12:32:31] GPU #0: using launch configuration Y21x28
[2014-01-22 12:32:31] GPU #0: GeForce GTX 660 Ti, 155.33 khash/s
[2014-01-22 12:32:42] GPU #0: GeForce GTX 660 Ti, 299.98 khash/s
[2014-01-22 12:32:42] accepted: 1/1 (100.00%), 299.98 khash/s (yay!!!)

[2014-01-22 12:45:40] GPU #0: GeForce GTX 660 Ti with compute capability 3.0
[2014-01-22 12:45:40] GPU #0: interactive: 0, tex-cache: 1D, single-alloc: 1
[2014-01-22 12:45:40] GPU #0: 32 hashes / 4.0 MB per warp.
[2014-01-22 12:45:40] GPU #0: using launch configuration Y14x32
[2014-01-22 12:45:40] GPU #0: GeForce GTX 660 Ti, 153.06 khash/s
[2014-01-22 12:46:00] GPU #0: GeForce GTX 660 Ti, 304.43 khash/s
[2014-01-22 12:46:00] accepted: 1/1 (100.00%), 304.43 khash/s (yay!!!)

A GTX 660Ti (Asus Direct CU II OC) breaking 300 kHash/s on Linux. Nice.
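For anyone who wants to compare runs without reading the log by eye, here is a minimal Python sketch that pulls the launch configuration and the reported hash rates out of log lines like the ones above. The function name and the regular expressions are my own assumptions, modeled only on the lines quoted in this post; other builds may print things differently.

```python
import re

# Patterns modeled on the log lines quoted in this thread (assumption:
# other cudaminer builds may format these lines differently).
CONFIG_RE = re.compile(r"GPU #(\d+): using launch configuration (\S+)")
RATE_RE = re.compile(r"GPU #(\d+): .*?, ([\d.]+) khash/s")


def summarize_log(lines):
    """Return {gpu_id: {"config": str or None, "rates": [float, ...]}}."""
    summary = {}
    for line in lines:
        match = CONFIG_RE.search(line)
        if match:
            entry = summary.setdefault(
                int(match.group(1)), {"config": None, "rates": []}
            )
            entry["config"] = match.group(2)
            continue
        match = RATE_RE.search(line)
        if match:
            entry = summary.setdefault(
                int(match.group(1)), {"config": None, "rates": []}
            )
            entry["rates"].append(float(match.group(2)))
    return summary


sample = [
    "[2014-01-22 12:32:31] GPU #0: using launch configuration Y21x28",
    "[2014-01-22 12:32:31] GPU #0: GeForce GTX 660 Ti, 155.33 khash/s",
    "[2014-01-22 12:32:42] GPU #0: GeForce GTX 660 Ti, 299.98 khash/s",
    "[2014-01-22 12:32:42] accepted: 1/1 (100.00%), 299.98 khash/s (yay!!!)",
]
result = summarize_log(sample)
print(result[0]["config"], result[0]["rates"][-1])
```

The "accepted" line is skipped on purpose because it carries no GPU number; only the per-GPU rate lines are collected.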

In comparison, here is the 29ae4821fc31e8e55060f8aed7f8ae13e33b1827 revision from github (the one before I started committing anything scrypt-jane related). This one already supports the texture cache in David Andersen's kernels.

Code:
[2014-01-22 13:02:08] GPU #0: GeForce GTX 660 Ti with compute capability 3.0
[2014-01-22 13:02:08] GPU #0: interactive: 0, tex-cache: 1D, single-alloc: 1
[2014-01-22 13:02:08] GPU #0: using launch configuration K14x32
[2014-01-22 13:02:08] GPU #0: GeForce GTX 660 Ti, 103.61 khash/s
[2014-01-22 13:02:32] GPU #0: GeForce GTX 660 Ti, 269.33 khash/s

So that's a 13% improvement then?
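The 13% figure can be checked with a quick calculation. This sketch just divides the settled rate of the Y14x32 run (304.43 khash/s) by the settled rate of the older K14x32 run (269.33 khash/s); the helper name is made up for illustration.

```python
def percent_gain(new_rate, old_rate):
    """Relative speedup of new_rate over old_rate, in percent."""
    return (new_rate / old_rate - 1.0) * 100.0


# Settled rates taken from the two logs above (both with tex-cache: 1D).
gain = percent_gain(304.43, 269.33)
print(f"{gain:.1f}% faster")
```

Note that both numbers come from a single short run each, so treat the result as a rough indication only.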

NOTE: The nVidia submitted kernels are now also in the Windows project files.

We're now in the strange situation that scrypt-jane and scrypt require completely different kernel implementations to run at best efficiency.  I need to think about how I can come up with a good auto-selection of kernels based on whether scrypt or scrypt-jane is used.
newbie
Activity: 12
Merit: 0
I built the latest commit (111) for you.
Please note that this comes without any warranties or anything. Donations please go to cbuchner!
Thanks @cbuchner for your continued work!
64-bit: https://www.dropbox.com/s/7qp3cwgufivu5jt/cudaminer_commit_111_x64.rar
32-bit: https://www.dropbox.com/s/z6aenjphoew7xs1/cudaminer_commit_111_x86.rar

Thank you for the commit!

However, I'm experiencing some problems with this commit.
First, the hashrate is slower than with the other unofficial commit.
Also, my NVIDIA driver sometimes crashes.
legendary
Activity: 1400
Merit: 1050
Two new experimental kernels added to github - currently for Linux only. The Visual C++
project has not yet been updated. You will want to run ./autogen.sh and configure after
doing a git pull.

"Z" code submission by nVidia for Compute 3.5 devices (GTX 780 etc...). Good for scrypt.
"Y" code submission by nVidia, modified to run on Compute 3.0 devices also. Good for scrypt.

I find that scrypt-jane still runs faster with the "X" (Fermi) and "K/T" (Kepler/Titan) kernels
from the current github code.

Test away... Especially the Z kernel is expected to rule. I haven't tested it yet in detail.
Best config for "Z" is No. of SMX x 24, according to the engineer who wrote it.
Best config for "Y" is (guessing) No. of SMX x 32   - or just autotune.
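As a rough illustration of the two rules of thumb above, here is a small Python sketch that turns an SMX count into a launch configuration string. It assumes that "No. of SMX x 24" means "blocks = number of SMX, warps = 24" in the usual BxW notation; the function name is hypothetical, and the SMX count is something you have to look up for your own card.

```python
# Warps per block suggested in this post for the two nVidia-submitted kernels.
SUGGESTED_WARPS = {"Z": 24, "Y": 32}


def suggested_launch_config(kernel, smx_count):
    """Build a string for the -l flag, e.g. kernel "Z" with 15 SMX -> "Z15x24".

    Assumption: the block count equals the SMX count. Autotune may well
    pick a multiple of it or something else entirely.
    """
    if kernel not in SUGGESTED_WARPS:
        raise ValueError(f"no suggestion for kernel {kernel!r}")
    if smx_count < 1:
        raise ValueError("smx_count must be at least 1")
    return f"{kernel}{smx_count}x{SUGGESTED_WARPS[kernel]}"


# 15 and 7 are example SMX counts only; check your own GPU.
print(suggested_launch_config("Z", 15))
print(suggested_launch_config("Y", 7))
```

Treat the result as a starting point for manual testing, not as a guaranteed optimum.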

The Z kernel is best run with -C 0 (it supports -C 1 and -C 2, but that is mostly pointless).

When you run kHash/s benchmarks, compare against the best scrypt values achieved with the
2013-12-18 release.

I got 86 kHash/s on GTX 750M with the -C 2 flag and -l Y4x32 in some quick tests, which
might be slightly faster than what the 2013-12-18 release delivered.

Christian


I did a quick test of the Z kernel on the GTX 780 Ti on Windows (I just added the line to vcxproj and vcxproj.user in the same way it was done for the other kernels, using the compiler option given in the nv_kernel).
It is slightly faster: I was able to get to 724 khash/s (against the 700~705 khash/s I usually got with the post-lookup_gap files).
However, the core clock runs a bit higher, which may be the reason why I get that extra 20 khash/s.

However, I have a problem: each time I give it a config (the one it just found in the previous run), it says it does not validate and starts a new autotune...
hero member
Activity: 756
Merit: 502

I haven't tested this, but I suspect it's caused by too much overclock.

except that I am barely overclocking them. It must be some kind of code bug.

Christian
hero member
Activity: 756
Merit: 502

cudaminer.exe  --algo=scrypt-jane -d 1 -l K59x2  -H 0 -o stratum+tcp://yac.coinmine.pl:9088 -O user:pwd


You forgot an -L 2 there.

The -L is not yet rolled into the kernel launch configurations. This is intended, but not done yet.

Later on the launch config might look like this instead -K59x2/2. Then the only use for passing -L would be to tell autotune about the intended Lookup gap.
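Until then, the lookup gap has to be passed next to the launch configuration every time. A minimal Python sketch of a launcher helper that always emits both flags together could look like this. The function and its parameters are hypothetical; only the flags themselves (--algo, -d, -l, -L, -H, -o, -O) are taken from the command lines quoted in this thread.

```python
def build_cudaminer_args(launch_config, lookup_gap, pool_url, credentials, device=1):
    """Assemble an argument list that always pairs -l with its -L value."""
    if lookup_gap < 1:
        raise ValueError("lookup_gap must be at least 1")
    return [
        "cudaminer.exe",
        "--algo=scrypt-jane",
        "-d", str(device),
        "-l", launch_config,
        "-L", str(lookup_gap),  # the flag that was missing in the quoted line
        "-H", "0",
        "-o", pool_url,
        "-O", credentials,
    ]


args = build_cudaminer_args(
    "K59x2", 2, "stratum+tcp://yac.coinmine.pl:9088", "user:pwd"
)
print(" ".join(args))
```

The list form can be handed to subprocess.run unchanged if you want to start the miner from a script.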
legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
Edit: I have broken 6 kh/s, but only about 80% were validated :( Nice to have a high range, but 80% of 6 is 4.8, so no real benefit, lol.

I am also having some validation issues with -L 5 on my GTX 780 Ti cards at 4.7 kHash/s. I wonder what is causing this.

Christian


I haven't tested this, but I suspect it's caused by too much overclock.
newbie
Activity: 11
Merit: 0
At first glance, 111 is much slower than the previous build: it went from 0.53 to 0.18/0.20. I'm off to work, though, and don't really have much time to test it right now.
ktf
newbie
Activity: 24
Merit: 0
Hi Christian,

Any idea why cudaminer fails when I run it with the -l parameter? If I let it autotune with -L 2, note which value it selects, and then start it again manually using that value, I get loads of errors:

[2014-01-22 12:22:34] GPU #1: cudaError 4 (unspecified launch failure) calling 'cudaEventRecord(context_serialize[stream][thr_id], context_streams[stream][thr_id])' (C:/__test/CudaMiner-master/salsa_kernel.cu line 820)
[2014-01-22 12:22:34] GPU #1: cudaError 4 (unspecified launch failure) calling 'cudaMemcpyAsync(X, context_odata[stream][thr_id], mem_size, cudaMemcpyDeviceToHost, context_streams[stream][thr_id])' (C:/__test/CudaMiner-master/salsa_kernel.cu line 852)
[2014-01-22 12:22:34] GPU #1: cudaError 4 (unspecified launch failure) calling 'cudaStreamQuery(context_streams[stream][thr_id])' (C:/__test/CudaMiner-master/salsa_kernel.cu line 826)
[2014-01-22 12:22:34] GPU #1: cudaError 4 (unspecified launch failure) calling 'cudaStreamSynchronize(context_streams[0][thr_id])' (C:/__test/CudaMiner-master/salsa_kernel.cu line 163)
[2014-01-22 12:22:34] GPU #1: cudaError 4 (unspecified launch failure) calling 'cudaStreamSynchronize(context_streams[1][thr_id])' (C:/__test/CudaMiner-master/salsa_kernel.cu line 164)

 I used :

cudaminer.exe  --algo=scrypt-jane -d 1 -l K59x2  -H 0 -o stratum+tcp://yac.coinmine.pl:9088 -O user:pwd

With :

cudaminer.exe  --algo=scrypt-jane -d 1 -l K59x1  -H 0 -o stratum+tcp://yac.coinmine.pl:9088 -O user:pwd

it works, but of course it is way too slow.

 And with :

cudaminer.exe  --algo=scrypt-jane -d 1 -L 2  -H 0 -o stratum+tcp://yac.coinmine.pl:9088 -O user:pwd

 it works, but sometimes it doesn't select the best performance, plus it takes quite a long time to autotune.
member
Activity: 85
Merit: 10
Oops, I did it again: I forgot to thank patoberli. You rock!
member
Activity: 85
Merit: 10
So sorry, I forgot to say thanks for the new v111 cudaminer. I have tried it and it runs no differently, but thanks for all of your very hard work.

Very, very grateful to everybody for your help.

member
Activity: 85
Merit: 10
Two new experimental kernels added to github - currently for Linux only. The Visual C++
project has not yet been updated. You will want to run ./autogen.sh and configure after
doing a git pull.

"Z" code submission by nVidia for Compute 3.5 devices (GTX 780 etc...). Good for scrypt.
"Y" code submission by nVidia, modified to run on Compute 3.0 devices also. Good for scrypt.

I find that scrypt-jane still runs faster with the "X" (Fermi) and "K/T" (Kepler/Titan) kernels
from the current github code.

Test away... Especially the Z kernel is expected to rule. I haven't tested it yet in detail.
Best config for "Z" is No. of SMX x 24, according to the engineer who wrote it.
Best config for "Y" is (guessing) No. of SMX x 32   - or just autotune.

The Z kernel is best run with -C 0 (it supports -C 1 and -C 2, but that is mostly pointless).

When you run kHash/s benchmarks, compare against the best scrypt values achieved with the
2013-12-18 release.

I got 86 kHash/s on GTX 750M with the -C 2 flag and -l Y4x32 in some quick tests, which
might be slightly faster than what the 2013-12-18 release delivered.

Christian


Thanks for the help, but you have lost me. I don't understand what you mean.

I am not as smart as you and the others here. Here is my .bat file below. What else should I be putting in it?

cudaminer -o stratum+tcp://asia.middlecoin.com:3333 -u 1MU4EAB6p5xcRPhZ8gFKZSq9znchJpt2iE -p 123

What else do I need to add to get a better hash rate?

My second laptop has an NVIDIA GTX 670M 3 GB GPU. It gets about 75 kh/s and reports something like F56x2, and I use the same .bat file for it. I know it's a different card, so I will probably have to add some extra options. What should I do? Can someone help me, please?



hero member
Activity: 756
Merit: 502
Edit: I have broken 6 kh/s, but only about 80% were validated :( Nice to have a high range, but 80% of 6 is 4.8, so no real benefit, lol.

I am also having some validation issues with -L 5 on my GTX 780 Ti cards at 4.7 kHash/s. I wonder what is causing this.

Christian
member
Activity: 70
Merit: 10
I built the latest commit (111) for you.
Please note that this comes without any warranties or anything. Donations please go to cbuchner!
Thanks @cbuchner for your continued work!
64-bit: https://www.dropbox.com/s/7qp3cwgufivu5jt/cudaminer_commit_111_x64.rar
32-bit: https://www.dropbox.com/s/z6aenjphoew7xs1/cudaminer_commit_111_x86.rar

Many Thanks for this.

Using Patoberli's build of commit 111 I was able to play around a bit. Unfortunately, the T kernel on Windows is very unstable on my Titan during autotune: anything that allocates more than 3 GB of VRAM just crashes cudaminer outright. I'm not sure what limitation is causing this, but it has been a consistent observation over several hours of manual configuration. The Titan kernel also heavily favors multiples of the old T16x1, such as T64x1 -L 1, T64x2 -L 2, etc. Not sure why, but it makes picking out optimal settings easy :)

On my Titan I was able to test and get 5.6-5.8 kh/s (varies but fairly even spread) using -i 0 -H 1 -l T32x8 -L 4 -a scrypt-jane:YAC with a mild Core OC of +250.

I will submit this and full details to the spreadsheet after a full night of stable submissions :)

Edit: I have broken 6 kh/s, but only about 80% were validated :( Nice to have a high range, but 80% of 6 is 4.8, so no real benefit, lol.
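To spell out the arithmetic in that edit: what counts is the raw rate multiplied by the share of results that validate. The short Python sketch below compares the two settings. It assumes the lower 5.6 kh/s setting validates fully, which the post does not state explicitly, and the function name is made up.

```python
def effective_rate(raw_khs, accepted_fraction):
    """Hash rate that actually counts: raw rate times the validated share."""
    return raw_khs * accepted_fraction


pushed = effective_rate(6.0, 0.80)  # about 80% validated at 6 kh/s
stable = effective_rate(5.6, 1.00)  # assumption: everything validates here

print(f"pushed: {pushed:.1f} kh/s, stable: {stable:.1f} kh/s")
print("stable setting wins" if stable > pushed else "pushed setting wins")
```

Under that assumption the lower raw rate still comes out ahead, which matches the "no real benefit" conclusion.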
hero member
Activity: 756
Merit: 502

Thanks for trying to help me. I am very grateful.

passing -C 2 might help a bit.

Also -i 0 if you can accept some sluggish video output.

Also remember the strongest configurations that autotune found for you and pass them with the -l flag.
Saves some time the next time you start it and it will always deliver the same performance.

Christian
hero member
Activity: 756
Merit: 502
I built the latest commit (111) for you.
Please note that this comes without any warranties or anything. Donations please go to cbuchner!
Thanks @cbuchner for your continued work!
64-bit: https://www.dropbox.com/s/7qp3cwgufivu5jt/cudaminer_commit_111_x64.rar
32-bit: https://www.dropbox.com/s/z6aenjphoew7xs1/cudaminer_commit_111_x86.rar

Thanks for the public service. ;)