Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 782. (Read 3426922 times)

member
Activity: 68
Merit: 10
hero member
Activity: 756
Merit: 502
Started mining pandacoin getting 300khash/s with my 780. Is that the highest one can get with the 780?

This is my bat file: "cudaminer.exe --algo=scrypt:2048 -o stratum+tcp://panda.nitro.org:3338 -O user:pwd"    

Launch config is T12x20
Bench hash/sec: 329142 hash/s
Max total warps (BxW): 322


-i 1 -C 0 -H 0 (Default)

try -C 2

I think -i 0 would make sense here as well; otherwise autotune will have a much smaller search space.
legendary
Activity: 1400
Merit: 1050
Started mining pandacoin getting 300khash/s with my 780. Is that the highest one can get with the 780?

This is my bat file: "cudaminer.exe --algo=scrypt:2048 -o stratum+tcp://panda.nitro.org:3338 -O user:pwd"    

Launch config is T12x20
Bench hash/sec: 329142 hash/s
Max total warps (BxW): 322


-i 1 -C 0 -H 0 (Default)

try -C 2

My 780 won't get higher than 180 khash/s, which I believe is the same as what Christian is seeing?
I am a bit surprised by this number too; this is what I get with the 780 Ti (~320 khash/s) at full throttle.


I just installed a Panda p2pool node on my computer; I'm curious to see whether this is better than running on a far, far away p2pool node... (right... first payment: 6 pandas...)
How come there are tiny panda blocks (I saw one of 4000 pandas)?
member
Activity: 67
Merit: 10
can't download zip file...
sr. member
Activity: 350
Merit: 250
Started mining pandacoin getting 300khash/s with my 780. Is that the highest one can get with the 780?

This is my bat file: "cudaminer.exe --algo=scrypt:2048 -o stratum+tcp://panda.nitro.org:3338 -O user:pwd"    

Launch config is T12x20
Bench hash/sec: 329142 hash/s
Max total warps (BxW): 322


-i 1 -C 0 -H 0 (Default)

try -C 2

My 780 won't get higher than 180 khash/s, which I believe is the same as what Christian is seeing?
newbie
Activity: 46
Merit: 0
Started mining pandacoin getting 300khash/s with my 780. Is that the highest one can get with the 780?

This is my bat file: "cudaminer.exe --algo=scrypt:2048 -o stratum+tcp://panda.nitro.org:3338 -O user:pwd"    

Launch config is T12x20
Bench hash/sec: 329142 hash/s
Max total warps (BxW): 322


-i 1 -C 0 -H 0 (Default)

try -C 2
member
Activity: 106
Merit: 10
Compiled commit 183 with CUDA 6 now. When I run it with display driver 332.21, I get this error:
[2014-02-19 13:38:30] Unable to query number of CUDA devices! Is an nVidia driver installed?

Drivers newer than 331 should already include CUDA 6 support. A quick check in NVIDIA's control panel under Windows also shows the CUDA 6.0.1 driver installed.

Just a heads-up.
hero member
Activity: 756
Merit: 502
Did the new nvidia drivers and new cudaminer give any increases?

On Keccak I seem to be getting 168 MH/s... (before updating the NVIDIA drivers). Is this good on a factory 780?

I am not getting more either. I will experiment with the way the 64-bit arithmetic is done. Maybe it will be faster using the uint2 type instead of uint64_t and doing the 64-bit arithmetic with inline PTX. I don't trust the compiler, in particular because the performance of the T kernel is actually worse than that of the K kernel, despite having more registers available and having the funnel shifter feature. Also, seemingly small changes to the K kernel have a catastrophic performance impact, such as manually unrolling the first and last loop iterations and removing a few variables known to be zero or irrelevant for the result.

Christian
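
To make the uint2 idea above concrete: the 64-bit rotations that Keccak relies on can be expressed entirely on two 32-bit halves, which is the kind of operation a funnel shifter performs in hardware (CUDA exposes it as the `__funnelshift_l` intrinsic). The following is only a minimal Python sketch of that arithmetic, not cudaminer's kernel code; the helper names (`split64`, `join64`, `funnel_shift_left`, `rotl64_halves`, `rotl64_ref`) are made up for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def split64(x):
    """Split a 64-bit integer into (lo, hi) 32-bit halves, like a uint2."""
    return x & MASK32, (x >> 32) & MASK32

def join64(lo, hi):
    """Recombine (lo, hi) 32-bit halves into one 64-bit integer."""
    return ((hi & MASK32) << 32) | (lo & MASK32)

def funnel_shift_left(lo, hi, n):
    """Upper 32 bits of the 64-bit value hi:lo shifted left by n (0 <= n < 32)."""
    return ((((hi << 32) | lo) << n) >> 32) & MASK32

def rotl64_halves(lo, hi, n):
    """Rotate the 64-bit value hi:lo left by n using only 32-bit halves."""
    n %= 64
    if n >= 32:
        lo, hi = hi, lo      # rotating by 32 simply swaps the halves
        n -= 32
    new_hi = funnel_shift_left(lo, hi, n)   # bits shifted out of lo enter hi
    new_lo = funnel_shift_left(hi, lo, n)   # bits shifted out of hi wrap into lo
    return new_lo, new_hi

def rotl64_ref(x, n):
    """Reference rotate on a plain 64-bit integer, used only for checking."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64

if __name__ == "__main__":
    x = 0x0123456789ABCDEF
    lo, hi = split64(x)
    print(hex(join64(*rotl64_halves(lo, hi, 20))), hex(rotl64_ref(x, 20)))
```

Each rotation by less than 32 bits costs two funnel shifts, and a rotation by exactly 32 is just a register swap. Whether this actually beats the compiler's uint64_t code on a given kernel is the open question raised above.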
hero member
Activity: 756
Merit: 502
I would wonder what would happen if the whole computation would be run inside the GPU on-die L2 memory and directly bypass the GPU memory.
Would that create a bottleneck on the PCIe bus?
Can the actual work be slimmed down to utilize "only" those 2 MB which are available?
I guess it would probably not be feasible or perform any better.

2 MB divided by the 128 kB required per scrypt-1024 hash means you can only do 16 hashes in parallel.

With the approach of David Andersen this would take 64 threads (four threads per hash).

Now your average 750Ti has 5*128 = 640 CUDA cores.

You would only be using 1/10th of the card's compute cores. Even less so because there will still be stalls in the pipeline (read after write hazards, instruction latencies, etc) which cannot be compensated for when running at such a low occupancy. So there would be plenty of idle cycles.

So overall: bad idea.

Christian
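
The arithmetic behind that verdict can be written out as a small Python sketch. The 2 MB of L2 comes from the question, the 128 kB scratchpad is the 128·N bytes scrypt needs for N = 1024 with r = 1, and the four threads per hash are an assumption inferred from the "16 hashes, 64 threads" figures above. The constant names are illustrative only.

```python
L2_BYTES = 2 * 1024 * 1024       # 2 MB of on-die L2, as stated in the question
SCRATCHPAD_BYTES = 128 * 1024    # scrypt-1024 (r = 1): 128 * N bytes per hash
THREADS_PER_HASH = 4             # assumption: 64 threads / 16 hashes
CUDA_CORES = 5 * 128             # 750 Ti: 5 multiprocessors x 128 cores

hashes_in_flight = L2_BYTES // SCRATCHPAD_BYTES
threads_in_flight = hashes_in_flight * THREADS_PER_HASH
core_utilisation = threads_in_flight / CUDA_CORES

if __name__ == "__main__":
    print("hashes in parallel:", hashes_in_flight)
    print("threads in flight: ", threads_in_flight)
    print("fraction of cores: ", core_utilisation)
```

This is an upper bound on utilisation: it ignores the pipeline stalls mentioned above, which a handful of resident threads cannot hide.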
legendary
Activity: 1400
Merit: 1050
I had a look at the new version from GitHub. The time-limit option is now pretty fast.
I didn't see any difference in performance with the new drivers (as someone reported yesterday).
I also downloaded CUDA 6, but I haven't installed it yet.

Regarding the autotuning and Ctrl-C:
Would it be possible that, when Ctrl-C (or some other key) is pressed during autotuning, the program is not killed but just leaves the autotune and runs with the best config found so far?
newbie
Activity: 46
Merit: 0
Did the new nvidia drivers and new cudaminer give any increases?

On Keccak I seem to be getting 168 MH/s... (before updating the NVIDIA drivers). Is this good on a factory 780?
member
Activity: 106
Merit: 10
I would wonder what would happen if the whole computation would be run inside the GPU on-die L2 memory and directly bypass the GPU memory.
Would that create a bottleneck on the PCIe bus?
Can the actual work be slimmed down to utilize "only" those 2 MB which are available?
I guess it would probably not be feasible or perform any better.
newbie
Activity: 27
Merit: 0
Started mining pandacoin getting 300khash/s with my 780. Is that the highest one can get with the 780?

This is my bat file: "cudaminer.exe --algo=scrypt:2048 -o stratum+tcp://panda.nitro.org:3338 -O user:pwd"    

Launch config is T12x20
Bench hash/sec: 329142 hash/s
Max total warps (BxW): 322


-i 1 -C 0 -H 0 (Default)

I can't seem to break 230 khash/s on my 780. My autotune always gives me something in the 120-126 range. Right now it's using T31x4, for example. When I use T12x20 it drops down to 10 khash/s or less :/ It just seems like I should be getting more performance out of this. Mining Pandacoin, btw. Anyone have any thoughts?

I'm overclocked to 1306core and 6800mem

Strange.

My 780 is clocked at 1080/1132 (base/boost) GPU clock and 6258 memory clock. I am getting a current clock reading of 1189 MHz from GPU-Z. Is your GPU stable at that OC? What kind of stress test did you perform? I'm not really sure; try updating your driver and BIOS. Run autotune a few times, making sure it tries a different config every time, to see what performance you get from each config. That might shed some light.

I read about some users getting low hash rates on their AMD cards, and the fix was using a different BIOS, but I wouldn't dabble there until you have exhausted all your other options.

Yeah can't figure it out. I was running skyn3t's custom bios so I decided to revert back to stock. Still same issue. When I mine normal Scrypt coins like Litecoin I hit 600khs easy - and it actually defaults to T12x20 from autotune! I'm only having this issue mining Vertcoins and Panda (scrypt 2048). Autotune will usually give me T126x1 or T60x2. Just seems like I should easily be hitting 300khs with my settings. Running the newest 334.89 drivers.

I also reverted my OC and I boost to 1267core on stock settings. I have the 780 classified hydro copper.
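
For comparing the launch configs mentioned in this exchange, it helps to multiply them out. The Python sketch below assumes the config string is an optional kernel letter followed by blocks x warps per block, which is how the "Max total warps (BxW)" log line reads; the function names are hypothetical and not part of cudaminer.

```python
import re

def parse_launch_config(cfg):
    """Split a string like 'T12x20' into (kernel letter, blocks, warps per block)."""
    m = re.fullmatch(r"([A-Za-z]?)(\d+)x(\d+)", cfg)
    if m is None:
        raise ValueError("unrecognised launch config: %r" % cfg)
    return m.group(1), int(m.group(2)), int(m.group(3))

def total_warps(cfg):
    """Blocks times warps per block, the quantity compared against the BxW limit."""
    _, blocks, warps = parse_launch_config(cfg)
    return blocks * warps

if __name__ == "__main__":
    max_total_warps = 322  # from the quoted log output
    for cfg in ("T12x20", "T31x4", "T126x1", "T60x2"):
        print(cfg, total_warps(cfg), total_warps(cfg) <= max_total_warps)
```

Under that reading, T12x20 is 240 warps in total, while the configs autotune picks for scrypt:2048 here (T31x4, T126x1, T60x2) are 124, 126 and 120 warps, which matches the "120-126 range" reported above.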
member
Activity: 84
Merit: 10
This evening I installed the new cudaminer release and, from reading the forum here, found out there are new NVIDIA drivers. Both are installed and running (so far) under heavy OC and OV conditions exactly as they did before with N-Scrypt on my GTX 650 Ti. (I have yet to try any other algos with the new software for performance tests.) It's time for me to turn in; this rig runs 24/7 except for coin changes. So far, so good! :-D

Thank You Christian & I'm really jazzed up about fail-overs on the way!

-Happy Mining
full member
Activity: 182
Merit: 100
Those new nVidia cards will be great for laptops, if it uses 60W it can easily be used by a laptop these days!
Why a laptop? Well, it's possible to get free power for days at a time, of course.
It will probably take a bit for them to come out though, and my current laptop has shitty Intel HD3000.
But yeah, a laptop beating the old GTX 670 with 300khash/s is just crazy!
I_M
full member
Activity: 135
Merit: 100
You know, dollar for dollar, the 750ti is also better than one of those 5 chip gridseed things.
member
Activity: 103
Merit: 10
Started mining pandacoin getting 300khash/s with my 780. Is that the highest one can get with the 780?

This is my bat file: "cudaminer.exe --algo=scrypt:2048 -o stratum+tcp://panda.nitro.org:3338 -O user:pwd"    

Launch config is T12x20
Bench hash/sec: 329142 hash/s
Max total warps (BxW): 322


-i 1 -C 0 -H 0 (Default)

I can't seem to break 230 khash/s on my 780. My autotune always gives me something in the 120-126 range. Right now it's using T31x4, for example. When I use T12x20 it drops down to 10 khash/s or less :/ It just seems like I should be getting more performance out of this. Mining Pandacoin, btw. Anyone have any thoughts?

I'm overclocked to 1306core and 6800mem

Strange.

My 780 is clocked at 1080/1132 (base/boost) GPU clock and 6258 memory clock. I am getting a current clock reading of 1189 MHz from GPU-Z. Is your GPU stable at that OC? What kind of stress test did you perform? I'm not really sure; try updating your driver and BIOS. Run autotune a few times, making sure it tries a different config every time, to see what performance you get from each config. That might shed some light.

I read about some users getting low hash rates on their AMD cards, and the fix was using a different BIOS, but I wouldn't dabble there until you have exhausted all your other options.
newbie
Activity: 27
Merit: 0
Started mining pandacoin getting 300khash/s with my 780. Is that the highest one can get with the 780?

This is my bat file: "cudaminer.exe --algo=scrypt:2048 -o stratum+tcp://panda.nitro.org:3338 -O user:pwd"    

Launch config is T12x20
Bench hash/sec: 329142 hash/s
Max total warps (BxW): 322


-i 1 -C 0 -H 0 (Default)

I can't seem to break 230 khash/s on my 780. My autotune always gives me something in the 120-126 range. Right now it's using T31x4, for example. When I use T12x20 it drops down to 10 khash/s or less :/ It just seems like I should be getting more performance out of this. Mining Pandacoin, btw. Anyone have any thoughts?

I'm overclocked to 1306core and 6800mem
full member
Activity: 173
Merit: 100
Torn between buying three of these GTX 750ti's or waiting till their big brothers (800 series) arrive...
member
Activity: 103
Merit: 10
I get 315-320 khash/s with a GTX 680 (not overclocked, no cudaminer parameters) mining standard scrypt.

For mining LTC and other scrypt coins, that seems to be in the range of the Google spreadsheet Christian presented. The highest was 415 khash/s.