
Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 974.

newbie
Activity: 5
Merit: 0
Hi Christian,

I noticed some nice development on the Nvidia Developer Zone. When will you make this version available?
https://devtalk.nvidia.com/default/topic/643428/cuda-programming-and-performance/could-anyone-benchmark-this-for-me-on-a-780-ti-or-titan-/

Dennis


Get it from GitHub, or wait for the next official release (only a few more days...)

I have Visual Studio 2012, but I can't load the solution file. So I probably have to wait a few more days.
sr. member
Activity: 350
Merit: 250
That would be nice to see.
I will gladly test it, Christian, if you want to send it through while you're working on it.
hero member
Activity: 756
Merit: 502
So my 780 getting over 5 isn't too bad then.

but 6 or 7 would be nicer.

I have one optimization in mind that swaps the state of threads within the lookup_gap loop. The intention is to order threads by the loop trip count (some have to run for 0 loops, others a couple more up to the specified lookup_gap). By ordering them, some of the warps will terminate much earlier and not consume any computational resources.

This would (in theory) reduce the workload by nearly a factor of 2, but it introduces some overhead for sorting the threads and for shuffling the state around. Whether a net speed gain remains is yet to be seen.
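A rough way to see where the "nearly a factor of 2" comes from, as a Python sketch. It assumes each thread's trip count in the lookup_gap loop is j % lookup_gap for a uniformly random scratchpad index j, and that a 32-thread warp keeps running until its slowest thread is done. The helper names (`warp_iterations`, `compare`) are made up for illustration, and the model ignores the sorting and shuffling overhead mentioned above.

```python
import random

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_iterations(trip_counts):
    """Loop iterations issued if every warp runs until its slowest thread is done."""
    total = 0
    for start in range(0, len(trip_counts), WARP_SIZE):
        total += max(trip_counts[start:start + WARP_SIZE])
    return total


def compare(num_threads, lookup_gap, seed=0):
    """Return (cost in original thread order, cost after sorting threads by trip count)."""
    rng = random.Random(seed)
    # Assumption: trip count = j % lookup_gap with j uniform, i.e. uniform in [0, lookup_gap - 1].
    trips = [rng.randrange(lookup_gap) for _ in range(num_threads)]
    return warp_iterations(trips), warp_iterations(sorted(trips))


unsorted_cost, sorted_cost = compare(num_threads=32 * 1024, lookup_gap=4)
print(f"unsorted: {unsorted_cost}, sorted: {sorted_cost}, "
      f"ratio: {unsorted_cost / sorted_cost:.2f}")
```

Unsorted, almost every warp contains at least one thread with the maximum trip count, so each warp pays for lookup_gap - 1 iterations; sorted, most warps hold threads with equal trip counts and pay about half that on average.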

I will save that optimization for February (it would delay this release...)

Christian

sr. member
Activity: 350
Merit: 250
So my 780 getting over 5 isn't too bad then.
hero member
Activity: 756
Merit: 502
Yes, that is weird. I'm going to do a full run from -L 2 to -L 6 and will post what I get for each.

How much is your 660 Ti getting, Christian?

3.7 kHash/s give or take.
hero member
Activity: 756
Merit: 502
Hi Christian,

I noticed some nice development on the Nvidia Developer Zone. When will you make this version available?
https://devtalk.nvidia.com/default/topic/643428/cuda-programming-and-performance/could-anyone-benchmark-this-for-me-on-a-780-ti-or-titan-/

Dennis


Get it from GitHub, or wait for the next official release (only a few more days...)
hero member
Activity: 756
Merit: 502
I've been experimenting with streams on the Y kernel. So far I've tested this on YAC and got 5.3 khash/s on my 660 Ti. Too bad it doesn't validate on the CPU, though. The kernel must not be concurrency-safe =).

Yes, right. There is one scratchpad but two streams. The scrypt_core kernels have to be serialized, or they would destroy each other's scratchpad. This is why I am using CUDA events.

Some overlap of memcpy and kernels would be desired (not happening now due to issue order of commands), and possibly the SHA256/Keccak kernels of one stream could be executed concurrently with the scrypt_core kernels of the other stream. This is also not happening now because my CUDA events currently also serialize these (need to change when events are generated and synchronized upon).
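The scheduling point can be illustrated with a toy timeline model in Python. Assumptions (not taken from the cudaMiner source): each batch runs a pre kernel (e.g. SHA256/Keccak setup), a scrypt_core kernel and a post kernel; each stream has its own input/output buffers, so only scrypt_core needs the shared scratchpad; the durations are made-up units. `serialize_all=True` models events that serialize everything across streams, `serialize_all=False` models serializing only the scrypt_core kernels. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Durations:
    pre: float = 1.0   # hypothetical SHA256/Keccak setup kernel time
    core: float = 4.0  # hypothetical scrypt_core time (uses the shared scratchpad)
    post: float = 1.0  # hypothetical final SHA256/Keccak kernel time


def makespan(num_batches, d, serialize_all, num_streams=2):
    """Finish time of num_batches batches issued round-robin over the streams."""
    stream_free = [0.0] * num_streams  # when each stream's previous batch is done
    core_free = 0.0                    # when the shared scratchpad becomes free
    last_end = 0.0                     # end of the previously issued batch
    for i in range(num_batches):
        s = i % num_streams
        start = stream_free[s]
        if serialize_all:
            # Events also order the SHA256/Keccak kernels across streams.
            start = max(start, last_end)
        pre_end = start + d.pre
        # scrypt_core may only start once the other stream's scrypt_core is done.
        core_start = max(pre_end, core_free)
        core_free = core_start + d.core
        post_end = core_free + d.post
        stream_free[s] = post_end
        last_end = post_end
    return max(stream_free)


d = Durations()
print("everything serialized:", makespan(4, d, serialize_all=True))
print("only scrypt_core serialized:", makespan(4, d, serialize_all=False))
```

With these made-up durations, four batches take 24 units when everything is serialized and 18 when only scrypt_core is, because the pre/post kernels of one stream hide under the other stream's scrypt_core.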

I intend to get rid of memcpy altogether by checking hashes on the GPU instead, so the memcpy/kernel overlap issue is moot.
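Checking hashes on the GPU boils down to a per-thread target comparison that reports only the winning nonces, so the full hash buffer never has to be copied back. The Python sketch below emulates that on the CPU; it assumes hashes and targets are stored as eight 32-bit words with the least significant word first, and `hash_meets_target`/`find_candidates` are hypothetical names, not cudaMiner functions.

```python
def hash_meets_target(hash_words, target_words):
    """Compare two 256-bit values stored as eight 32-bit words, least significant word first."""
    # Walk from the most significant word down; the first differing word decides.
    for h, t in zip(reversed(hash_words), reversed(target_words)):
        if h != t:
            return h < t
    return True  # a hash equal to the target also counts as meeting it


def find_candidates(hashes, target_words, first_nonce):
    """Emulate a device-side check: only nonces whose hash meets the target are reported."""
    return [first_nonce + i
            for i, h in enumerate(hashes)
            if hash_meets_target(h, target_words)]
```

On the GPU, each thread would run the equivalent comparison on its own hash and write its nonce to a small result buffer only on success, which is all the host then needs to read.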

Christian
full member
Activity: 125
Merit: 100
I've been experimenting with streams on the Y kernel. So far I've tested this on YAC and got 5.3 khash/s on my 660 Ti. Too bad it doesn't validate on the CPU, though. The kernel must not be concurrency-safe =).
sr. member
Activity: 350
Merit: 250
OK, so just a quick drop of numbers before I post the results tonight. The latest cudaminer on my 780 is now getting 5.03 khash/s.

I still have use of my desktop with it, and that is with T69x4 and -L 5. My CPU is also slightly freer now, so it does a constant 0.72 khash/s. So I've gone from 4.3 khash/s to 5.8 khash/s in total.

Not a bad jump
newbie
Activity: 5
Merit: 0
Hi Christian,

I noticed some nice development on the Nvidia Developer Zone. When will you make this version available?
https://devtalk.nvidia.com/default/topic/643428/cuda-programming-and-performance/could-anyone-benchmark-this-for-me-on-a-780-ti-or-titan-/

Dennis

sr. member
Activity: 350
Merit: 250
Yes, that is weird. I'm going to do a full run from -L 2 to -L 6 and will post what I get for each.

How much is your 660 Ti getting, Christian?
hero member
Activity: 756
Merit: 502
What build are you using and what os?
I just compiled the latest cudaminer from GitHub; running with this config I still only get 3.76 khash/s:

./cudaminer --algo=scrypt-jane -H 0 -i 0 -d 0 -l T20x1 -o http://127.0.0.1:3339 -u user -p pass -D

I may be missing some parameters; for example, I don't have -c set.

You want a lookup gap of up to 6 on GTX 780 cards; specify it with the -L parameter.

Try -L 2 first, let it autotune, and then increase the lookup gap one step at a time.
WARNING: autotuning may take a long time with the lookup gap enabled.

Stop when you find a power consumption vs. kHash/s trade-off that suits you.
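Why raising -L can help on a memory-limited card, as a back-of-the-envelope Python sketch. It assumes the usual scrypt time-memory trade-off: only every lookup_gap-th scratchpad entry is stored, and looking up entry j recomputes j % lookup_gap mix steps (BlockMix in scrypt terms), i.e. (lookup_gap - 1) / 2 on average, on top of the baseline 2N steps per hash. The function name is hypothetical, and the numbers are relative to -L 1; they say nothing about actual kHash/s on a given card.

```python
def lookup_gap_tradeoff(lookup_gap):
    """Return (relative scratchpad size, relative mix-step work) versus lookup_gap = 1.

    Assumed model: only every lookup_gap-th scratchpad entry is stored, and a
    lookup recomputes (lookup_gap - 1) / 2 mix steps on average.
    """
    memory = 1.0 / lookup_gap
    # Baseline per hash: N steps to fill the scratchpad + N steps in the mixing loop.
    work = (2 + (lookup_gap - 1) / 2) / 2
    return memory, work


for gap in range(1, 7):
    mem, work = lookup_gap_tradeoff(gap)
    print(f"-L {gap}: scratchpad x{mem:.2f}, mix work x{work:.2f}")
```

Memory shrinks as 1/L while the mix work grows only linearly, so on a card where memory caps the number of warps, the freed memory can buy enough extra parallelism to come out ahead; past some point the extra work wins, which is why stepping -L up and watching kHash/s (and power) is the practical approach.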

I find that my 660 Ti makes my GTX 780 look poor in comparison. Not sure why that is, exactly (as the 780 has Compute 3.5 and way more SMXes to work with).

Christian
sr. member
Activity: 350
Merit: 250
What build are you using and what os?
I just compiled the latest cudaminer from GitHub; running with this config I still only get 3.76 khash/s:

./cudaminer --algo=scrypt-jane -H 0 -i 0 -d 0 -l T20x1 -o http://127.0.0.1:3339 -u user -p pass -D

I may be missing some parameters; for example, I don't have -c set.
13G
newbie
Activity: 17
Merit: 0
Is that Windows or Linux? My 780 gets 3.77 khash/s, and sometimes higher, with only 3 GB of memory. On Linux, that is.

I found it best to run a stock config and check how much memory was used. Then increase it until it's close to full and compare the rates. I believe my YACoin launch config is 20x1 right now.

Win7 x64, driver 332.21, memory usage 2360 MB of 6144 MB.


GTX 780 hits 4.4 khash/s?

Please post your parameters, GPU MHz and build... incredible :-)
full member
Activity: 182
Merit: 100
My 780 is at stock clocks, with a build from the start of January, as I haven't bothered to compile a new one yet.

It is using 2.89 GB of memory, which is the main limiter.
My GTX 780 hits 4.1 khash/s in interactive mode and 4.4 khash/s when not interactive; it has a factory overclock, though... (an ASUS one)
sr. member
Activity: 350
Merit: 250
My 780 is at stock clocks, with a build from the start of January, as I haven't bothered to compile a new one yet.

It is using 2.89 GB of memory, which is the main limiter.
ktf
newbie
Activity: 24
Merit: 0
3.7 kh/s seems a bit low on a 780, seeing how my GTX 660 gets 3.3 or so @ 1200 MHz.
sr. member
Activity: 350
Merit: 250
Is that Windows or Linux? My 780 gets 3.77 khash/s, and sometimes higher, with only 3 GB of memory. On Linux, that is.

I found it best to run a stock config and check how much memory was used. Then increase it until it's close to full and compare the rates. I believe my YACoin launch config is 20x1 right now.
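The "check memory, then scale up" approach can be turned into a rough estimate, as a Python sketch. Assumptions, all hypothetical: each warp (blocks x warps per block in a launch config such as T20x1) works on 32 hashes, each hash stores ceil(N / lookup_gap) scratchpad entries of 128 * r bytes, and N is the coin's current scrypt-jane N, which you have to plug in yourself; the N = 2**15 below is a placeholder, not the real YACoin value. Leave headroom for the driver and desktop rather than filling memory completely.

```python
import math


def max_warps(free_bytes, n, lookup_gap, hashes_per_warp=32, r=1):
    """Largest total warp count whose scratchpad fits into free_bytes.

    Hypothetical model, not cudaMiner's actual allocator:
    - every warp works on hashes_per_warp hashes,
    - every hash keeps ceil(n / lookup_gap) scratchpad entries of 128 * r bytes.
    """
    bytes_per_hash = math.ceil(n / lookup_gap) * 128 * r
    return free_bytes // (bytes_per_hash * hashes_per_warp)


budget = 2560 * 2**20   # e.g. about 2.5 GiB of a 3 GB card left for the miner
hypothetical_n = 2**15  # placeholder, not the real YACoin N
for gap in range(1, 7):
    print(f"-L {gap}: up to {max_warps(budget, hypothetical_n, gap)} warps")
```

Autotune still decides how those warps are best split into blocks and which configuration actually runs fastest; the estimate only bounds what fits.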
13G
newbie
Activity: 17
Merit: 0
Any tips for the best settings for YACoin on a GTX TITAN?

3.73 khash/s @ 1097 MHz with "-a scrypt-jane -d 0 -i 0 -H 2 -C 0 -m 0 -b 32768 -L 3 -l K95x2 -s 120"
(custom bios clocks 930/6600 instead of stock 836/6000)

cudaminer 2014-01-20 (beta) x64 version downloaded from this forum