Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 1024. (Read 3426989 times)

hero member
Activity: 756
Merit: 502
Gotcha, thanks!

Finally got around to installing Visual Studio and compiling the newer version, and I got a drop of about 20 khash/s on my GTX 760 (~290 with the 12/18 build down to ~270). Is this normal? From what I read, there was supposed to be a bump up in hash rate if you used -H 2 now instead of -H 1. I built it in Release mode for x64.

the added flexibility for scrypt-jane indeed costs some performance. I may later decide to roll out dedicated scrypt kernels and separate out the jane code. But for now, I am focused on getting the scrypt-jane feature completed first (there's more mining profit in it).





member
Activity: 101
Merit: 10
Miner / Engineer
Require GPU-Z validation links (to verify the speed, voltages, etc.) and a screenshot of cudaminer's output alongside your MSI Afterburner/PrecisionX/Nvidia Inspector overclock settings and graph.

Can't get any more valid than that.

Btw, the lowest I can downvolt my Titan is 1.175 V @ 1202 MHz core, and that's because it's locked in the BIOS (no matter how low you go, it only goes down that far).  I am running the LLC mod, which adds another 0.025 V (it wasn't stable without it), for a total of 1.2 V.  Boo.

The "Temp Target" trick is to use your Temp Target limiter to bump down the core voltage even further.  I set mine at 80C, which down-volts my Titan further than I can do manually.  In some of my screenshots I show 1.100 V and 1.075 V at 1202 MHz core.  Mucho power savings!

You have to remember: power draw scales with voltage (roughly with its square).  If you lower the voltage, you lower the power usage, and can get away from the Power Target limiter.
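As a rough sketch of why this works (assuming the standard CMOS dynamic-power approximation, P ≈ C·f·V² - a general rule of thumb, not anything cudaminer-specific):

```python
def relative_dynamic_power(v_new, v_old, f_new=1.0, f_old=1.0):
    """Approximate ratio of dynamic power draw: P scales with f * V^2."""
    return (f_new / f_old) * (v_new / v_old) ** 2

# Dropping a Titan from 1.2 V to 1.075 V at the same core clock
# cuts dynamic power by roughly 20%:
savings = 1 - relative_dynamic_power(1.075, 1.2)
```

Static (leakage) power also drops with voltage, so the real wall-plug savings tend to be in the same ballpark or better.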

The 780 Ti guy followed my lead and set a temp limit around 72 or 74C, and got his power usage to just under 111% so as not to hit the 115% Power Target limit.  By doing this, it down-volted his card further than he could manually, and it's running a pretty high core speed (1200 MHz, I think he said).

His quote:

Quote
Thanks for the inspiration. My 780ti is chugging along, undervolted with 450w at the wall, doing 725khash at 73c relatively quietly. Very pleased and surprised. X86 does a lot better than x64 for me. Currently doing 15x32, 1200 core and using temp slider to throttle to 112tdp without bumping the core down. Memory OC doesn't do anything it seems.

ref: http://forums.evga.com/tm.aspx?m=2089066

 I call this the "Temp Target" down-volting trick.  Smiley  Enjoy!


EDIT: For clarification, the Temp Target limiter is part of Boost 2.0, which is only in the Nvidia 700 series.  If you have a 600 series card, I recommend editing the BIOS yourself to lower the voltages at the P3 state (or whatever the top P-state is).
newbie
Activity: 8
Merit: 0
Gotcha, thanks!

Finally got around to installing Visual Studio and compiling the newer version, and I got a drop of about 20 khash/s on my GTX 760 (~290 with the 12/18 build down to ~270). Is this normal? From what I read, there was supposed to be a bump up in hash rate if you used -H 2 now instead of -H 1. I built it in Release mode for x64.
legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
Looking at some of the entries for my card (GTX 760), I find a couple of results very suspicious. There's no way a 760 can get up to ~500 khash/s. 2x760 in SLI, maybe, but that entry doesn't have any sort of information. The most I've been able to get mine up to is ~300 khash/s, but it crashes after a while. Stable, it's about ~290-292 khash/s.

I also don't see how you can underclock the GPU memory clock by -442. As soon as I set those settings, it instantly froze and crashed (although I don't know TOO much about OC, so I could be wrong on this part).

Well, I'm just handling the spreadsheet, not guaranteeing its accuracy, though I have to admit some entries seem odd.


Memory speed is a confusing one: most cards today have GDDR5 memory on them, which means their effective clock speed is 4 times the base clock.
So for example my card has a 1502 MHz memory clock, which is 6008 MHz effective speed. So when I, let's say, downclock it by 2000 MHz in OC Guru, it will end up at 1000 MHz instead of 1502 due to the multiplier. To further confuse the matter, some OC tools use a 2 times multiplier - so for example in EVGA Precision a -502 MHz downclock (the max) results in ~1251 MHz - and some use no multiplier at all, just the base speed. Which is yet another reason to use GPU-Z when it comes to overclocking.
This was all new to me too, and I didn't clarify things in the survey when I made the spreadsheet, so the memory clock column is all over the place. So yeah... that's on me.
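A small sketch of the multiplier confusion above (the per-tool multipliers are as reported in this post - treat them as examples, not verified specs):

```python
def base_clock_after_offset(base_mhz, offset_mhz, tool_multiplier):
    """Resulting base memory clock when an OC tool quotes offsets in its
    own units. GDDR5 effective speed is 4x the base clock; some tools
    quote offsets at 4x, some at 2x, some at 1x (base)."""
    return base_mhz + offset_mhz / tool_multiplier

base_clock_after_offset(1502, -2000, 4)  # 4x-style offset -> 1002 MHz ("1000 instead of 1502")
base_clock_after_offset(1502, -502, 2)   # 2x-style offset -> 1251 MHz
```

GPU-Z sidesteps all of this by reporting the actual base clock, which is why it's the sane reference point for the spreadsheet.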
newbie
Activity: 8
Merit: 0
We're using the "Temp Target" trick I figured out to undervolt the cards even further than MSI Afterburner/PrecisionX will allow you.

You could probably downclock your memory to get more headroom for GPU overclock.


Please spread the word about our spreadsheet/survey combo:

Have a look at the Google Docs Spreadsheet for configuration and performance data. Please enter new data using this form.

Looking at some of the entries for my card (GTX 760), I find a couple of results very suspicious. There's no way a 760 can get up to ~500 khash/s. 2x760 in SLI, maybe, but that entry doesn't have any sort of information. The most I've been able to get mine up to is ~300 khash/s, but it crashes after a while. Stable, it's about ~290-292 khash/s.

I also don't see how you can underclock the GPU memory clock by -442. As soon as I set those settings, it instantly froze and crashed (although I don't know TOO much about OC, so I could be wrong on this part).

It needs to have some sort of moderation, or require screenshot proof of achieved hash rates.
legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
We're using the "Temp Target" trick I figured out to undervolt the cards even further than MSI Afterburner/PrecisionX will allow you.

You could probably downclock your memory to get more headroom for GPU overclock.


Please spread the word about our spreadsheet/survey combo:

Have a look at the Google Docs Spreadsheet for configuration and performance data. Please enter new data using this form.
member
Activity: 101
Merit: 10
Miner / Engineer
Been trying to find out more about this scrypt-jane news... Anyone got a link, info?  Or, willing to write an overview of what it is?  

scrypt-jane is a feature under development, and some fearless people are trying it out. It is used in Yacoin and other alt-coins, and currently nVidia cards have an edge (around a factor of 2) over ATI cards. Hence the mining returns (payouts) are higher - and power costs are reduced as well.

scrypt-jane currently does 32768 iterations and requires 4 MB of memory per hash. scrypt does 1024 iterations and requires 128 KB per hash. This makes scrypt-jane work most efficiently on mid-range GPUs with 2-4 GB of RAM. High-end nVidia GPUs are a bit faster, but only slightly.
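Those memory figures follow directly from scrypt's scratchpad layout (the V array is N blocks of 128·r bytes, with r = 1 here) - a property of the scrypt algorithm itself, not of any particular miner:

```python
def scrypt_scratchpad_bytes(n_iterations, r=1):
    """Size of scrypt's V array: N blocks of 128*r bytes each."""
    return n_iterations * 128 * r

scrypt_scratchpad_bytes(1024)   # classic scrypt:        131072 bytes = 128 KB
scrypt_scratchpad_bytes(32768)  # scrypt-jane at N=32768: 4194304 bytes = 4 MB
```

At 4 MB per concurrent hash, total card RAM (rather than raw compute) quickly becomes the limit on how many hashes can run at once - which is why mid-range cards with lots of memory do comparatively well.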

Thanks for that!  I might try this out.

my 2x Titans [are] getting 700.XX khash/s per GPU (1400-1430 khash/s for both cards).

Very good numbers! My GTX 780 Ti cards only yield 540 kHash/s each (on Linux, with a mild BIOS-based overclock).

Christian

We've got a 780 Ti reference getting 750 khash/s, and a 780 Ti Classified getting 720-ish khash/s (the Classified doesn't like undervolting).  I can get about 660 khash/s from my 780 Classy.  Anything higher (I can get up to 730 khash/s @ 1380 MHz) uses an ungodly amount of wattage.  A custom BIOS really lets the Classy loose for mining.

I believe they posted in my Titans @ 1350 khash/s thread I linked to above.  They are on stock BIOS as well.

We're using the "Temp Target" trick I figured out to undervolt the cards even further than MSI Afterburner/PrecisionX will allow you.
hero member
Activity: 756
Merit: 502
Been trying to find out more about this scrypt-jane news... Anyone got a link, info?  Or, willing to write an overview of what it is?  

scrypt-jane is a feature under development, and some fearless people are trying it out. It is used in Yacoin and other alt-coins, and currently nVidia cards have an edge (around a factor of 2) over ATI cards. Hence the mining returns (payouts) are higher - and power costs are reduced as well.

scrypt-jane currently does 32768 iterations and requires 4 MB of memory per hash. scrypt does 1024 iterations and requires 128 KB per hash. This makes scrypt-jane work most efficiently on mid-range GPUs with 2-4 GB of RAM. High-end nVidia GPUs are a bit faster, but only slightly.

my 2x Titans [are] getting 700.XX khash/s per GPU (1400-1430 khash/s for both cards).

Very good numbers! My GTX 780 Ti cards only yield 540 kHash/s each (on Linux, with a mild BIOS-based overclock).

Christian
member
Activity: 101
Merit: 10
Miner / Engineer
Been trying to find out more about this scrypt-jane news... Anyone got a link, info?  Or, willing to write an overview of what it is?  

I'm currently using the Dec 18th windows build of cudaminer on my 2x Titans and getting 700.XX khash/s per GPU (1400-1430 khash/s for both cards).

If I am reading that correctly, I am getting 1.4 Mhash/s, not 1.4 khash/s.  1.4 khash/s reads to me as just 1 khash/s, whereas I get 700 khash/s on my Titan.  

Am I missing something here?

cudaminer -d 0,1 -i 0,0 -m 1,1 -l T14x30,T14x30 -H 1
 
This combination now yields, at 80C on reference coolers:
 
[2014-01-05 12:58:32] GPU #0: GeForce GTX TITAN, 23802240 hashes, 700.59 khash/s
[2014-01-05 12:58:32] GPU #1: GeForce GTX TITAN, 10940160 hashes, 695.48 khash/s
[2014-01-05 12:58:39] GPU #0: GeForce GTX TITAN, 4945920 hashes, 701.44 khash/s
[2014-01-05 12:58:39] accepted: 360/363 (99.17%), 1397 khash/s (yay!!!)
[2014-01-05 12:58:43] GPU #0: GeForce GTX TITAN, 2244480 hashes, 704.56 khash/s
[2014-01-05 12:58:43] accepted: 361/364 (99.18%), 1400 khash/s (yay!!!)

Screenshots and proof are here:

http://forums.evga.com/tm.aspx?m=2089066

And here:

http://forums.evga.com/tm.aspx?m=2091247#2091885


As a comparison, my smallest GPU, EVGA GTX 460 EE, gets 140 khash/s overclocked and undervolted.
sr. member
Activity: 350
Merit: 250
Ahh, I have the UD3 at home in a drawer. Nice buy on the GPUs though. Let us know how it performs :-)
hero member
Activity: 756
Merit: 502
What board is it? :s Seems like it has way too many PCIe lanes

GA-990FXA-UD5

there is also a -UD7 version with one more PCIe slot, but it's too pricey.

Christian
sr. member
Activity: 350
Merit: 250
What board is it? :s Seems like it has way too many PCIe lanes
hero member
Activity: 756
Merit: 502
Just wanted to recompile newest version but received this error (downloaded up until commit 81):

I pushed a fix for that...

NOTE:

1) autotune was improved (with scrypt-jane it can now use up to 4 GB of memory on 32-bit systems, 12 GB on 64-bit systems - untested)
2) the -C 2 option now works with Kepler kernels in scrypt-jane mode...
3) automatic upgrade from -C 1 to -C 2 if more than 2 GB of RAM are used (-C 1 has an upper limit of 2 GB)
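Point 3 boils down to a simple rule. A hypothetical sketch of that decision (the 2 GB cap on -C 1 comes from the note above; the function name and exact placement in the code are made up for illustration):

```python
def effective_cache_mode(requested_mode, mem_bytes):
    """Auto-upgrade -C 1 to -C 2 when the working set exceeds
    the 2 GB addressing limit of the -C 1 texture cache mode."""
    TWO_GB = 2 * 1024**3
    if requested_mode == 1 and mem_bytes > TWO_GB:
        return 2
    return requested_mode
```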
hero member
Activity: 756
Merit: 502
good luck! my gpu cost almost as much as that entire build so it will be good to see how it runs. is it 3 or 4 660's you're running?

4 x GTX 640  (4x90 Euro)
1 x GTX 660  (150 Euro)

20 GB of Video RAM total, 4 GB of system memory. And a puny AMD Sempron 145 single core as CPU.

The mainboard has 5 PCIe x16 slots, however with single-slot spacing. And I don't want to buy risers.

Mainboard, SSD boot disk, RAM, CPU, power supply and case total around 300 Euros.
member
Activity: 106
Merit: 10
Just wanted to recompile newest version but received this error (downloaded up until commit 81):
Code:
 C:\Users\Patrick\Desktop\cudaminer\cudaminer>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2012 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin"  -I. -Icompat -Icompat\jansson -Icompat\getopt -I"..\pthreads\Pre-built.2\include" -I"..\curl-7.29.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" --opencc-options -LIST:source=on   --keep --keep-dir Release -maxrregcount=64 --ptxas-options=-v --machine 32 --compile -cudart static     -DWIN32 -DNDEBUG -D_CONSOLE -D_CRT_SECURE_NO_WARNINGS -DCURL_STATICLIB -DSCRYPT_KECCAK512 -DSCRYPT_CHACHA -DSCRYPT_CHOOSE_COMPILETIME -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /Zi  /MD  " -o Release\salsa_kernel.cu.obj "C:\Users\Patrick\Desktop\cudaminer\cudaminer\salsa_kernel.cu"
[color=red][b]C:/Users/Patrick/Desktop/cudaminer/cudaminer/salsa_kernel.cu(798): error : identifier "opt_benchmark" is undefined[/b][/color]
C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V110\BuildCustomizations\CUDA 5.5.targets(592,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin\nvcc.exe" -gencode=arch=compute_10,code=\"sm_10,compute_10\" --use-local-env --cl-version 2012 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin"  -I. -Icompat -Icompat\jansson -Icompat\getopt -I"..\pthreads\Pre-built.2\include" -I"..\curl-7.29.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" --opencc-options -LIST:source=on   --keep --keep-dir Release -maxrregcount=64 --ptxas-options=-v --machine 32 --compile -cudart static     -DWIN32 -DNDEBUG -D_CONSOLE -D_CRT_SECURE_NO_WARNINGS -DCURL_STATICLIB -DSCRYPT_KECCAK512 -DSCRYPT_CHACHA -DSCRYPT_CHOOSE_COMPILETIME -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /Zi  /MD  " -o Release\salsa_kernel.cu.obj "C:\Users\Patrick\Desktop\cudaminer\cudaminer\salsa_kernel.cu"" exited with code 2.
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
I guess the second line is the problem?
sr. member
Activity: 350
Merit: 250
hmm, that sucks but it's better than my rubbish 2.77 khash/s in Windows.
so on this, literally all that matters is how much memory it has, not so much computing power. and Linux clearly makes a huge difference to performance

good luck! my gpu cost almost as much as that entire build so it will be good to see how it runs. is it 3 or 4 660's you're running?
hero member
Activity: 756
Merit: 502
cbuchner1 this is my 780 running at over 3.7 khash/s!
it is just a straight screenshot, not cropped, sorry

Not bad, but half of that 780's compute capability is sitting unused. A GTX 660 Ti or GTX 760 would probably achieve similar hash rates...

I started to build a cheap (800 Euro) 12 kHash/s machine. Wish me luck.

NOTE: there are GTX 760 and GTX 660 (non-Ti) models with 4 GB of RAM. These can then run 31 warps simultaneously, however their compute capability is only 3.0.

For scrypt-jane, ideally per SMX you want to run 8 warps (internally this number is multiplied by 4 because 4 threads cooperate to compute 1 hash).

GTX 670Ti: 7 SMX  (not enough memory for optimal config K7x8)
GTX 760: 6 SMX     (not enough memory for optimal config K6x8)
GTX 660: 4 SMX (optimal config probably K4x8, but because of some CUDA memory overhead you have to use K31x1)
GTX 440: 2 SMX (optimal config: K2x8)
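Putting the numbers above together (a sketch assuming the 4 MB/hash figure and the x4 warp expansion described in this post; the exact bookkeeping inside cudaminer may differ): a KAxB launch config gives A×B configured warps, each expanded 4x, with 4 threads cooperating per hash - i.e. 32 hashes per configured warp.

```python
def scrypt_jane_footprint_mb(blocks, warps_per_block, mb_per_hash=4):
    """Approximate device memory needed for a KAxB launch config.
    Each configured warp is expanded 4x (4 cooperating threads per
    hash), so it covers 4 * 32 / 4 = 32 hashes."""
    hashes = blocks * warps_per_block * 32
    return hashes * mb_per_hash

scrypt_jane_footprint_mb(7, 8)   # K7x8  -> 7168 MB: too big even for a 6 GB Titan
scrypt_jane_footprint_mb(4, 8)   # K4x8  -> 4096 MB: just over a 4 GB card's budget
scrypt_jane_footprint_mb(31, 1)  # K31x1 -> 3968 MB: fits on 4 GB once CUDA overhead is allowed for
```

This is why the "optimal" per-SMX configs above keep bumping into memory limits, and why the K31x1 workaround exists for the 4 GB GTX 660.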

Christian


sr. member
Activity: 350
Merit: 250
I dont even know what a 51% attack is haha
hero member
Activity: 490
Merit: 501
so my gpu and cpu now mine together at over 4.4 khash/s!!!!

are you planning a 51% attack?  Grin
newbie
Activity: 15
Merit: 0
Can't seem to compile the latest git clone. I run ./autogen.sh (which doesn't output anything), then ./configure, then make, and I get the following errors:

Code:
nvcc -g -O2 -Xptxas "-abi=no -v" -arch=compute_10 --maxrregcount=64 --ptxas-options=-v -I./compat/jansson -o salsa_kernel.o -c salsa_kernel.cu
salsa_kernel.cu(479): error: too few arguments in function call

salsa_kernel.cu(742): error: more than one instance of overloaded function "cuda_scrypt_core" has "C" linkage

salsa_kernel.cu(760): error: too few arguments in function call

3 errors detected in the compilation of "/tmp/tmpxft_00004701_00000000-6_salsa_kernel.cpp1.ii".

Any ideas on a fix?