
Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 947. (Read 3426921 times)

full member
Activity: 120
Merit: 100
Astrophotographer and Ham Radioist!
I have been away for a while, but I'm glad people are putting my and Christian's binaries to good use. I'm still mining Microcoin with them. It gives me about 168 kilohashes on an OC'ed 560 Ti. It even found around two blocks since yesterday. On towards becoming an MRC millionaire! If anyone needs any help, feel free to get in touch.

Christian, could you please explain why the F, X and f autotuning gives much lower values and chooses something other than F16x8 / F8x16 on my card? Those give me 168 kilohashes, anything else is 120-130, and autotune doesn't show those configurations giving more than 70-80 kilohashes, so another one gets chosen. I'm glad I found my "sweet spot", but others may not.
hero member
Activity: 840
Merit: 1000
Has anyone been able to get an EVGA GTX 780 Ti with ACX cooling to 640 khash?

The highest I can get is around 400. I'm just guessing different values now. But the old version could get very close to 650.
cudaminer -a scrypt:2048 -d gtx780ti -H 1 -l t15x16 -C 2 -i 0

For Vertcoin I'm getting 181 khash/sec. The GPU reaches 1.1 GHz when running cudaminer. I should be getting over 650. With the old version the GPU would only get up to around 1000 MHz. So now I'm getting 100 MHz more but very bad hash rates.



Have you tried dropping -C? I have generally seen that -C 1 and -C 2 drop the hashrate.
hero member
Activity: 840
Merit: 1000
Quote from: Quote from Readme
Example for Litecoin Mining on coinotron pool with GTX 660 Ti

cudaminer -d gtx660ti -l K28x32 -C 2 -i 0 -o stratum+tcp://coinotron.com:3334 -O workername:password

Is anyone else getting infinite "result does not validate on CPU" errors with these settings?
I have an Asus GTX 660Ti OC

I think you need to use k instead of K for 28x32, as far as I understood from Christian's post. K = Y in the new release. I am using K7x32 for my 660Ti non-OC.

Here is how launch configs from the cudaminer 2013-12-18 release translate to the cudaminer 2014-02-02 release
to get equivalent performance. This can be handy if you find some older launch configs posted by others.

 L b x w  ->                   sorry, legacy kernel was replaced by Fermi kernel. Autotune the F kernel.

 F b x w  ->  F b x w       ( no change to this one )

 K b x w  ->  k 4*b x w    (the previous K kernel is now named k and no. of blocks has to be quadrupled)

 T b x w ->  t 4*b x w     (the previous T kernel is now named t and no. of blocks has to be quadrupled)

 S b x w  ->                   Spinlock kernel is GONE.


The following kernels are new in the cudaminer 2014-02-02 release:  T (nVidia), K (nVidia), f (ported over from David Andersen's code).

Christian
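
The translation table above can be sketched as a small helper. This is only an illustration of the rules as posted; the function name `translate_launch_config` is made up here and is not part of cudaminer, and it assumes configs are written as a kernel letter followed by `<blocks>x<warps>`.

```python
import re

# Hypothetical helper (not part of cudaminer): maps a 2013-12-18 launch
# config to its 2014-02-02 equivalent, following the table in the post.
def translate_launch_config(old):
    m = re.fullmatch(r"([A-Za-z])(\d+)x(\d+)", old)
    if m is None:
        raise ValueError(f"not a launch config: {old!r}")
    kernel, blocks, warps = m.group(1), int(m.group(2)), int(m.group(3))
    if kernel == "F":
        # F b x w is unchanged between the two releases.
        return old
    if kernel in ("K", "T"):
        # Old K/T are now named k/t and the block count is quadrupled.
        return f"{kernel.lower()}{4 * blocks}x{warps}"
    if kernel in ("L", "S"):
        # Legacy and spinlock kernels have no equivalent; autotune instead.
        return None
    raise ValueError(f"kernel {kernel!r} is not covered by the table")
```

For example, `translate_launch_config("K7x32")` returns `"k28x32"`, while an old `L` or `S` config returns `None` to signal that there is nothing to translate to.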

member
Activity: 69
Merit: 10
Vertcoin on GTX780 with core overclocked to 1254Mhz = ~335kh/s
cudaminer.exe --algo=scrypt:2048 -d 0 -H 2 -C 1 -l T12x20 -i 0

GTX 670 overclocked to 1200mhz (using CPU for sha256) = ~155kh/s
cudaminer.exe --algo=scrypt:2048 -d 1 -H 1 -l K35x6 -i 0

Use these!

Donations welcome!
VTC = VouDyTwF5QNvmGukvdHgsZNwtJKjcY6aRD

I have a GTX 780 and with those arguments I get around 25kh/s. Adding -L 2 gets me to 200kh/s.
What model is your 780?

cheers bro the -L 2 took me to 200 or so on each card.
full member
Activity: 145
Merit: 101
A brave tester with 8 Fermi cards Tesla M2090 (thanks Choseh) just figured out the performance regression between 2013-12-18 and 2014-02-02. If you change the #if 0 in the fermi_kernel.cu to #if 1 (thereby enabling the previous version of the Salsa20/8 round function) you should see the previous performance figures again. Those who can compile the code themselves and want to mine on Fermi are welcome to make this change themselves. EDIT: False alarm apparently. My tester cannot reproduce this now

also there seems to be a bug in the autotuning code in salsa_kernel.cu

                            hash_sec = (double)WU_PER_LAUNCH / tdelta;

should very likely be

                            hash_sec = (double)WU_PER_LAUNCH * repeat / tdelta;

to factor in the number of repetitions in the measurement (we want to measure for 50ms minimum for better timer accuracy). So autotune was drunk after all!

So, it seems I should release fixes (new binary release) for these problems tonight.

Christian
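
To see why the missing `repeat` factor makes autotune underreport, here is a minimal sketch of the two formulas side by side. The function and its `fixed` switch are illustrative only, and the sample numbers are made up; only the two expressions themselves come from the post.

```python
# Sketch of the autotune rate calculation discussed above.
# wu_per_launch: work units (hashes) per kernel launch
# repeat:        how many launches were timed together
# tdelta:        total elapsed time in seconds for all repeats
def hash_rate(wu_per_launch, repeat, tdelta, fixed=True):
    if fixed:
        # Corrected form: count the work of every repetition.
        return wu_per_launch * repeat / tdelta
    # Buggy form: counts only one launch but divides by the total time.
    return wu_per_launch / tdelta

# Made-up sample values, chosen only to show the effect.
buggy = hash_rate(7168, repeat=5, tdelta=0.05, fixed=False)
good = hash_rate(7168, repeat=5, tdelta=0.05, fixed=True)
print(f"buggy: {buggy:.0f} H/s, fixed: {good:.0f} H/s")
```

The buggy value is too low by exactly the factor `repeat`, so fast kernels that need several repetitions to fill the 50 ms window look slower than they are and lose out during autotune.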


I've been experiencing problems with the 2-2 and 2-4 releases, both dropped my Kh/s about 20-30 on my GTX 560 Ti (Fermi). I've been using the 12-18 release to maximize my hashing power.

Here's my config:

cudaminer.exe --no-autotune -O user.worker:pass -o stratum+tcp://pool.com:3333 -C 1 -i 0 -H 1 -l F8x16


I have noticed that the newer releases report the maximum warps as 209, whereas 12-18 shows maximum warps as 211. I did a benchmark on 2-4 with the -C 1, -H 1 and -i 0 flags included, which gave me a config of F32x4. According to all of the resources I've read, F8x16 is the maximum my card can handle before giving CPU validation errors.
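
A quick way to compare launch configs against the reported "maximum warps" figure is to multiply blocks by warps per block. The sketch below assumes that a config `F8x16` means 8 blocks of 16 warps each and that the product is what gets compared to the maximum; both function names are hypothetical.

```python
import re

# Hypothetical helpers: read a launch config as <kernel><blocks>x<warps>.
def total_warps(config):
    m = re.fullmatch(r"[A-Za-z](\d+)x(\d+)", config)
    if m is None:
        raise ValueError(f"not a launch config: {config!r}")
    return int(m.group(1)) * int(m.group(2))

def fits(config, max_warps):
    # Assumption: a config is usable when blocks * warps <= maximum warps.
    return total_warps(config) <= max_warps

for cfg in ("F8x16", "F32x4", "F16x8"):
    print(cfg, total_warps(cfg), fits(cfg, 209))
```

Under this reading F8x16, F16x8 and F32x4 all launch the same 128 warps, so any hashrate difference between them would come from how the work is split into blocks rather than from the total amount of work in flight.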

I have the same problem with my GTX 560Ti.
newbie
Activity: 10
Merit: 0
Vertcoin on GTX780 with core overclocked to 1254Mhz = ~335kh/s
cudaminer.exe --algo=scrypt:2048 -d 0 -H 2 -C 1 -l T12x20 -i 0

GTX 670 overclocked to 1200mhz (using CPU for sha256) = ~155kh/s
cudaminer.exe --algo=scrypt:2048 -d 1 -H 1 -l K35x6 -i 0

Use these!

Donations welcome!
VTC = VouDyTwF5QNvmGukvdHgsZNwtJKjcY6aRD

I have a GTX 780 and with those arguments I get around 25kh/s. Adding -L 2 gets me to 200kh/s.
What model is your 780?
member
Activity: 116
Merit: 10
for what it's worth:
ultracoin
cudaminer -C 2 -H 2 -l -o xxxx -u xxxx -p xxxx --algo=scrypt-jane:UTC

card     | config  | hashrate          | warps
----------------------------------------------
gtx 570  | f30x32  | 173-193 Kh/s      |
old 8500 | f2x7    | 2.5 Kh/s          |
gtx 570  | F15x16  | 193-223 Kh/s (*)  | 260
old 8500 | F6x4    | 2.1 Kh/s          | 105

(*) occasionally drops to ~114 for a few sec.

Note: added warps from F set.
newbie
Activity: 1
Merit: 0
A brave tester with 8 Fermi cards Tesla M2090 (thanks Choseh) just figured out the performance regression between 2013-12-18 and 2014-02-02. If you change the #if 0 in the fermi_kernel.cu to #if 1 (thereby enabling the previous version of the Salsa20/8 round function) you should see the previous performance figures again. Those who can compile the code themselves and want to mine on Fermi are welcome to make this change themselves. EDIT: False alarm apparently. My tester cannot reproduce this now

also there seems to be a bug in the autotuning code in salsa_kernel.cu

                            hash_sec = (double)WU_PER_LAUNCH / tdelta;

should very likely be

                            hash_sec = (double)WU_PER_LAUNCH * repeat / tdelta;

to factor in the number of repetitions in the measurement (we want to measure for 50ms minimum for better timer accuracy). So autotune was drunk after all!

So, it seems I should release fixes (new binary release) for these problems tonight.

Christian


I've been experiencing problems with the 2-2 and 2-4 releases, both dropped my Kh/s about 20-30 on my GTX 560 Ti (Fermi). I've been using the 12-18 release to maximize my hashing power.

Here's my config:

cudaminer.exe --no-autotune -O user.worker:pass -o stratum+tcp://pool.com:3333 -C 1 -i 0 -H 1 -l F8x16


I have noticed that the newer releases report the maximum warps as 209, whereas 12-18 shows maximum warps as 211. I did a benchmark on 2-4 with the -C 1, -H 1 and -i 0 flags included, which gave me a config of F32x4. According to all of the resources I've read, F8x16 is the maximum my card can handle before giving CPU validation errors.
newbie
Activity: 37
Merit: 0
Vertcoin on GTX780 with core overclocked to 1254Mhz = ~335kh/s
cudaminer.exe --algo=scrypt:2048 -d 0 -H 2 -C 1 -l T12x20 -i 0

GTX 670 overclocked to 1200mhz (using CPU for sha256) = ~155kh/s
cudaminer.exe --algo=scrypt:2048 -d 1 -H 1 -l K35x6 -i 0

Use these!

Donations welcome!
VTC = VouDyTwF5QNvmGukvdHgsZNwtJKjcY6aRD
member
Activity: 84
Merit: 10
SizzleBits
CudaManager GUI got updated with the latest hotfix.
Failover features plus the dev added all the -H arguments and custom thermal limits.

http://www.reddit.com/r/CudaManager/comments/1x17bn/release_cuda_manager_v11_cudaminer_2414_added/
member
Activity: 112
Merit: 10
With a GTX 560 Ti I'm getting about 145 kH/s vs 160 kH/s before.
member
Activity: 67
Merit: 10
How do I mine Vertcoin with the latest cudaminer?
hero member
Activity: 938
Merit: 1000
Any ideas on settings for Vertcoin on a GTX 780? I am getting 380 total from two cards with autotune (which was the best of all the switches I tried). I have no idea about Vertcoin!

I get 250 kH/s per card with T12x20, but this setting only works well under Linux. In Windows I can only get up to 190 kH/s per card, about the same as you.

Amazing. Thank you. I went from 220 kH/s to 280 kH/s with your kernel setting. Autotune is very clumsy.
member
Activity: 67
Merit: 10
gtx 650 ti i get 153kh/s

cudaminer.exe -D -o 127.0.0.1:12588 -u u -p p -i 0 -H 1 -m 1 -C 1 -s 1
hero member
Activity: 938
Merit: 1000
So i broke something.  Hopefully someone can enlighten me to where my problem lies (other than in the chair)

Please keep in mind i make no claim to knowing what the hell i'm doing, but only one way to learn, right?

2 670 GTXs, not in sli, sep .bat files for each, x64 cudaminer


On VTC, K7x32 was producing ~133kh per card. After tinkering with some different kernels I received cuda error 30 and the display driver crashed. Rebooted and went back to K7x32, but that now produced cuda error 30 and a crash. Reinstalled cuda and vid drivers and was able to use K7x32 with the same results (133/per). Once again tinkered with different kernels (using the WHQL driver instead of the beta this time to see if there was any difference) and received cuda error 30 etc etc, but this time a cuda/driver reinstall didn't fix the issue.

Currently running K7x20 with 155kh on one card and 135 on the other atm, but what did I break exactly, and how?

Maybe the x32 config is at the limit of what the WDDM graphics card driver will allow you to allocate. Sometimes it works, and sometimes it doesn't. Leave out any -m 1 or -C 1/2 options, or reduce the x32 to something slightly smaller, e.g. x30 or x28.
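
A rough estimate shows why x32 might sit near an allocation limit. The sketch below assumes one scrypt scratchpad per GPU thread, 32 threads per warp, and the standard scrypt scratchpad size of 128 * N bytes for r = 1, divided by the lookup gap (-L). The function name is hypothetical and the real allocation in cudaminer may differ.

```python
import re

THREADS_PER_WARP = 32  # CUDA warp size

# Hypothetical estimate of scratchpad memory for a launch config.
# n:          scrypt N parameter (1024 for Litecoin, 2048 for scrypt:2048)
# lookup_gap: value passed via -L (1 means no lookup gap)
def scratchpad_mib(config, n=1024, lookup_gap=1):
    m = re.fullmatch(r"[A-Za-z](\d+)x(\d+)", config)
    if m is None:
        raise ValueError(f"not a launch config: {config!r}")
    warps = int(m.group(1)) * int(m.group(2))
    # Assumption: each thread stores ceil(N / lookup_gap) blocks of 128 bytes.
    bytes_per_hash = 128 * ((n + lookup_gap - 1) // lookup_gap)
    return warps * THREADS_PER_WARP * bytes_per_hash / 2**20

for cfg in ("K7x32", "K7x28", "K7x20"):
    print(cfg, scratchpad_mib(cfg, n=2048), "MiB")
```

Under these assumptions K7x32 with scrypt:2048 needs about 1792 MiB, which leaves very little headroom on a card with 2 GiB of memory, while K7x28 drops to 1568 MiB. The same arithmetic suggests why a lookup gap of 2 helps, since it halves the estimate.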



Hi Christian,

Could you give me some advice on how to fine-tune Kepler and Titan cards for mining low-factor scrypt-jane coins? For example, with Vertcoin N=10 I got:
Fermi GTX580: 130khs
Kepler GTX680: 150khs
Titan GTX780: 210khs

All of those were from autotune because I don't know how to tune a card for scrypt-jane. The b x w should be lower than the total warps, but the number of warps is very different even between cards from the same manufacturer, right?

There is a lot that I still don't understand.
Thanks in advance, Christian.

Good night and best regards. I assume that you are German.
hero member
Activity: 840
Merit: 1000
Any ideas on settings for Vertcoin on a GTX 780? I am getting 380 total from two cards with autotune (which was the best of all the switches I tried). I have no idea about Vertcoin!

I get 250 kH/s per card with T12x20, but this setting only works well under Linux. In Windows I can only get up to 190 kH/s per card, about the same as you.
newbie
Activity: 37
Merit: 0

On a more positive note: New build working great!

EVGA GTX660 SC w/slight oc (+71 engine, 110% power)
240Kh/s - scrypt (-l K5x32 -i 0 -H 1 -m 1 -C 1)
2.65Kh/s - scrypt-jane:YAC (-b 32768 -L 3 -l K59x3 -s 120 -i 0 -C 0 -m 0 -H 2)


Not sure why, but I tried my YAC config with an older commit_133 build, and I'm now getting Pi-Kh/s! (3.14Kh/s) I'm surprised at how much more coin I'm getting with just the little bump in hash rate. Haven't tested the 2/4/14 build yet.

And props to bathrobehero for my YAC config! Couldn't have done it without you.

Note: my 660 has 2 monitors plugged into it.
member
Activity: 69
Merit: 10
Any ideas on settings for Vertcoin on a GTX 780? I am getting 380 total from two cards with autotune (which was the best of all the switches I tried). I have no idea about Vertcoin!
newbie
Activity: 53
Merit: 0
I posted a new release 2014-02-04 fixing two important bugs.

- autotune underreporting kHash/s values if the kernel finished in under 50ms (forgot to divide time elapsed by number of measurements, doh!)
- Multi-GPU support was not working - it is now.



I've posted the Mac OS X binaries of the 2014-02-04 release for 10.6, 10.7, 10.8, and 10.9 here: http://www.johnchapman.net/crypto-currency/cudaminer-2-4-2014-release-now-available-for-os-x-10-6-10-7-10-8-and-10-9/

With this release I am seeing about a 15% increase in performance from my GT 650M.  Went from about 65 kh/s to 75 kh/s.  Well done!  Thanks.
newbie
Activity: 21
Merit: 0
Well, for Vertcoin it's even worse: sub 200. But for LTC the best I can get is around 450, which is around 200 less than the old cudaminer could do.

cudaminer -d gtx780ti -H 1 -l t15x20 -C 2 -i 0

15x20 got me around 479. Anything else starts to go lower, whether the number goes up or down.