[ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 1064.

FSKT

newbie

Activity: 33

Merit: 0

So, my results for the two new versions. I am using a GTX 560, slightly overclocked, with an i5 4670K.

01-12 => 07-12 : I lost 10khash/s (140 => 130)

01-12 => 07-12 : I lost 5khash/s (140 => 135)

I am using the x64 version with these options:

Code:

cudaminer.exe -o stratum+tcp://azertyuiop.com:1234 -O ME.1:1 -H 1 -i 0

My launch config on the 01-12 version, which gives me my best results (142 khash/s):

Code:

 -l F111x2

01-12 => 07-12 & with -H 2 : I lost 7khash/s (140 => 133)

Am I doing something wrong?

Thanks !

DuckDodgers

newbie

Activity: 20

Merit: 0

The x86 binary of the new build is a tad faster on my GTX580. Still using the -H1 switch, since my oldie i7-920 is way too fast to be even bothered into a full power state anyway.

vosovich

newbie

Activity: 28

Merit: 0

Quote from: cbuchner1 on December 09, 2013, 06:58:01 PM

Tired of heavy CPU load? Change your -H 1 switch to -H 2 (or just remove the switch, as 2 is now the default). Your GPU will do ALL the work - it may run a little hotter though. This release is now suitable for 1 MHash/s mining rigs or even bigger, running on cheap mainboards with low end CPUs - even an Intel Atom would do ;)

Of course, the nVidia GPUs required for this kind of hash rate aren't cheap...

Download the 2013-12-10 release. I also cleaned up the Readme a bit and fixed a bug that negatively affected hash validation on some cards.

With -H 2 (full offloading to GPU) it may be more efficient to run the x86 binary of cudaminer as the x64 version has increased register pressure in some CUDA kernels, leading to slightly lower hash rates sometimes. Because the cudaminer binary is mostly idling now, there's almost no use running the more bloated x64 binary.

Christian

I just tried the newest version, both x64 and x86 with the various settings for -H. The x64 builds used to be the most efficient for my GTX560, but the 12-10 x86 build gives an all-around 4% hashrate improvement over the 12-07 x64 builds. This holds for all the -H settings. Excellent stuff!

blackraven1425

member

Activity: 98

Merit: 10

Quote from: cbuchner1 on December 09, 2013, 06:58:01 PM

2013-12-10 release

Using the new -H 2 option, with either x64 or x86, I'm sitting a few (3-4) khash lower than using -H 1. Obviously it's likely to have a much different effect on a lower end system like an Atom.

Vanderi

sr. member

Activity: 283

Merit: 250

Wow, great work Buchner. Again I love my twin GTX 680s a bit more, which isn't so little to begin with!

cbuchner1

hero member

Activity: 756

Merit: 502

Tired of heavy CPU load? Change your -H 1 switch to -H 2 (or just remove the switch, as 2 is now the default). Your GPU will do ALL the work - it may run a little hotter though. This release is now suitable for 1 MHash/s mining rigs or even bigger, running on cheap mainboards with low end CPUs - even an Intel Atom would do ;)

Of course, the nVidia GPUs required for this kind of hash rate aren't cheap...

Download the 2013-12-10 release. I also cleaned up the Readme a bit and fixed a bug that negatively affected hash validation on some cards.

With -H 2 (full offloading to GPU) it may be more efficient to run the x86 binary of cudaminer as the x64 version has increased register pressure in some CUDA kernels, leading to slightly lower hash rates sometimes. Because the cudaminer binary is mostly idling now, there's almost no use running the more bloated x64 binary.

Christian

Wizzard

newbie

Activity: 11

Merit: 0

I also thought so, but I have libcudart5.0 installed...

edit: finally, it compiled successfully, I don't understand why

cbuchner1

hero member

Activity: 756

Merit: 502

probably not using the CUDA 5.0 SDK? this function was added for Kepler type devices and probably came with CUDA release 5...

Wizzard

newbie

Activity: 11

Merit: 0

I am sorry, what am I doing wrong if I get this kind of output?

make[2]: Entering directory `/home/wizzard/CudaMiner'
g++ -g -O2 -o cudaminer -pthread -L/usr/local/cuda/lib64 cudaminer-cpu-miner.o cudaminer-util.o cudaminer-sha2.o cudaminer-scrypt.o salsa_kernel.o spinlock_kernel.o legacy_kernel.o fermi_kernel.o test_kernel.o titan_kernel.o -L/usr/lib/x86_64-linux-gnu -lcurl compat/jansson/libjansson.a -lpthread -lcudart -fopenmp
salsa_kernel.o: In function `find_optimal_blockcount(int, KernelInterface*&, bool&, int&)':
/home/wizzard/CudaMiner/salsa_kernel.cu:286: undefined reference to `cudaDeviceSetSharedMemConfig'
collect2: error: ld returned 1 exit status
make[2]: *** [cudaminer] Error 1
make[2]: Leaving directory `/home/wizzard/CudaMiner'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/wizzard/CudaMiner'
make: *** [all] Error 2

cbuchner1

hero member

Activity: 756

Merit: 502

the github repo contains something quite significant: SHA256 was moved onto the GPU! It isn't fully optimized code yet, but it works. Not only does it give an effective speed-up, it also lowers the CPU load to near zero.

Now when I upgrade the PSU on my dedicated mining rig to 1.1 kW, I should hopefully be able to power 3 GTX 780 Ti cards. At the moment the "800W" PSU craps out when I run two cards under load. The 12V rail of that thing is just so under-dimensioned, it's not funny.

I also found the cause for validation problems on some newer cards (the code to detect the card's ability for overlapping kernel execution was broken)

the brave can try github.... The not so brave have to wait for a new release...

-H 0 : single threaded CPU SHA256 hashing
-H 1 : multi threaded CPU SHA256 hashing
-H 2 : GPU based SHA256 hashing (now the default)
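For anyone scripting their launches, the three -H modes listed above can be captured in a small Python sketch. This is only an illustration, not part of cudaminer: the function name `cudaminer_args` is made up, and it uses nothing beyond the -o, -O and -H switches already seen in this thread.

```python
# -H values as described in the list above.
HASH_MODES = {
    0: "single threaded CPU SHA256 hashing",
    1: "multi threaded CPU SHA256 hashing",
    2: "GPU based SHA256 hashing (default)",
}

def cudaminer_args(url, userpass, hash_mode=2):
    """Build an argument list using only the -o, -O and -H switches."""
    if hash_mode not in HASH_MODES:
        raise ValueError(f"-H must be one of {sorted(HASH_MODES)}")
    return ["cudaminer", "-o", url, "-O", userpass, "-H", str(hash_mode)]
```

The returned list could be handed to something like `subprocess.run`; rejecting unknown -H values up front is just a convenience of this sketch.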

I also found out that my code to overlap memory transfers and kernels was completely NOT working. Which is why moving the SHA256 part to the GPU results in an effective speed-up (there are now only memory copies from the GPU to the CPU - and it is much less data!). I will fix the mentioned problem when I am in a fixing mood ;)

Christian

Wizzard

newbie

Activity: 11

Merit: 0

Thank you very much, but it does not work as I expected. After installing the necessary libcudart5.5 i386 from Debian and libgomp i386 from Ubuntu, it crashes (segmentation fault) with the same error as before: GPU #0: with compute capability 0.0

mrm0

member

Activity: 89

Merit: 10

Quote from: Wizzard on December 09, 2013, 10:04:12 AM

Any binary for Linux x64/x32, please? Cannot compile it on my Kubuntu x64 system.

A 32-bit cudaminer, version 2013-11-20 (alpha), compiled on Ubuntu 12.04.3 LTS:

download 'whitecuda.jpg' image from http://postimg.org/image/ep07wmzjh/

verify the downloaded size, should be 4391396 Bytes

now do this:

Code:

$ dd if=whitecuda.jpg of=cudaminer bs=1 skip=1784

and there is your cudaminer binary - size 4389612 Bytes.

BTW: you really shouldn't trust binaries from the Internet...
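The sizes quoted above are consistent with each other: 4391396 bytes downloaded minus the 1784 skipped bytes gives 4389612. A minimal Python sketch of the same extraction is below, with the size check built in. The function name `strip_prefix` is made up for illustration, and it assumes the file is small enough to read into memory in one go.

```python
from pathlib import Path

def strip_prefix(src, dst, skip, expected_size=None):
    """Copy src to dst, dropping the first `skip` bytes (like dd bs=1 skip=N)."""
    data = Path(src).read_bytes()
    # Refuse to continue if the download is not the size we were told to expect.
    if expected_size is not None and len(data) != expected_size:
        raise ValueError(f"{src}: got {len(data)} bytes, expected {expected_size}")
    payload = data[skip:]
    Path(dst).write_bytes(payload)
    return len(payload)
```

Called as `strip_prefix("whitecuda.jpg", "cudaminer", 1784, expected_size=4391396)`, it should report 4389612 bytes written if the download matches the size given above.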

Wizzard

newbie

Activity: 11

Merit: 0

Any binary for Linux x64/x32, please? Cannot compile it on my Kubuntu x64 system.

trell0z

newbie

Activity: 43

Merit: 0

Quote from: cbuchner1 on December 08, 2013, 09:55:04 PM

Quote from: trell0z on December 08, 2013, 05:27:45 PM

Problem is, with capital letters it's just "unknown", with non-capital the program just crashes, even with "f", which is the correct one for my card..

you'll get a warning that the launch config is unrecognized, but it does indeed work (assuming you use uppercase letters)

But you need to watch out for minimum requirements regarding compute capability.

Christian

Oh ok cool! Will try to experiment a bit then.
Do you think you could change the text there to be a bit more obvious about it actually working? I mean I understand this isn't exactly something everyone will do, but should be a minor code change I guess with just some text?

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: ?? on ??

EDIT: Hmmm, just restarted my computer for the first time today, immediately opened up the cudaminer upon getting back to my desktop and noticed the same problem where the GPU utilization is only at 50% or so (getting 114Khash/s). By the time I opened this browser and typed the first sentence of this edit, the GPU load was already at 97% and my hashrate climbed to 209Khash/s. Anyone else having similar issues?

What OS are you running on? I have experienced this kind of issue on Windows Server 2012 R2, which is somewhat similar to Windows 8.1, I suppose.

Greg121986

newbie

Activity: 8

Merit: 0

Quote from: ?? on ??

EDIT: Hmmm, just restarted my computer for the first time today, immediately opened up the cudaminer upon getting back to my desktop and noticed the same problem where the GPU utilization is only at 50% or so (getting 114Khash/s). By the time I opened this browser and typed the first sentence of this edit, the GPU load was already at 97% and my hashrate climbed to 209Khash/s. Anyone else having similar issues?

I am having the same sort of issue. I am using the -i 0 flag, which typically results in 99% utilization for my GTX 760. On the prior cudaminer release I was not getting more than 75%, usually 50%. The odd thing with the 12-07 release is that I get 99% utilization while I am using my PC, but if I leave the PC the utilization jumps up and down quite often. When I return after leaving the system running untouched, my hash rate is fluctuating between 155 and 201 khash/s.

Also, I really do not understand the varying use of kernels. Is there a list of the kernels we can try? Does this equate to optimizations available in the CUDA architecture for each generation of silicon?

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: trell0z on December 08, 2013, 05:27:45 PM

Problem is, with capital letters it's just "unknown", with non-capital the program just crashes, even with "f", which is the correct one for my card..

you'll get a warning that the launch config is unrecognized, but it does indeed work (assuming you use uppercase letters)

But you need to watch out for minimum requirements regarding compute capability.

Christian

trell0z

newbie

Activity: 43

Merit: 0

Yeah, it's the whole -l F / -l f thing (and that's even the correct kernel for my card, a 580) that doesn't work. At least as far as I can understand the readme, we're supposed to be able to do something like that, and it will autotune with my chosen settings, for the kernel that I specify from the list of L/F/K/T.
The only result I get though is either "Given launch config 'x' does not validate" or the program crashes, so yeah.. Otherwise I'm using the launch config of F16x14, which I got out of a normal autotune run with -D.
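For reference, the launch configs quoted in this thread (F16x14, F111x2, 98x2) all look like an optional kernel letter from the L/F/K/T list followed by two numbers joined by an x. A small Python sketch that checks a string against that shape is below. It only reflects the strings posted here and is not cudaminer's actual parser; the function name is made up, and what the two numbers mean is left open.

```python
import re

# Optional kernel letter (L, F, K or T), then two positive integers joined by "x".
LAUNCH_RE = re.compile(r"([LFKT])?([1-9]\d*)x([1-9]\d*)")

def parse_launch_config(text):
    """Return (kernel_letter_or_None, first_number, second_number), or None if malformed."""
    match = LAUNCH_RE.fullmatch(text)
    if match is None:
        return None
    kernel, first, second = match.groups()
    return kernel, int(first), int(second)
```

Note that a bare letter such as F is rejected by this sketch. That matches the "does not validate" behaviour reported here, but it may not be what the readme intends.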

vosovich

newbie

Activity: 28

Merit: 0

Quote from: blackraven1425 on December 08, 2013, 05:56:49 PM

Quote from: vosovich on December 08, 2013, 05:43:48 PM

So it does pick a configuration when you use (capital) F, but you don't know which? If so, use the -D flag for debugging mode. It will give you more information about the auto-tuning process.

Seems like using just a K for me shows "Given launch config 'K' does not validate".

Interesting aside, it seems the 98x2 I've been using doesn't show up in the debug for autotune, despite being the best config I've found so far. This debug panel gives a few more leads to check out, though.

I was convinced that using -l F had worked for me in the past. However, I just checked to make sure and I can confirm what you have said. It does not work for me either.

EDIT for your EDIT: Yes, that does seem to be the case. What I did is run autotune a bunch of times to pick a few candidates. I ran it until I was convinced that no new configurations would pop up. Then I let the candidates run for a while and finally I picked the one with the best hashrate.

blackraven1425

member

Activity: 98

Merit: 10

Quote from: vosovich on December 08, 2013, 05:43:48 PM

So it does pick a configuration when you use (capital) F, but you don't know which? If so, use the -D flag for debugging mode. It will give you more information about the auto-tuning process.

Seems like using just a K for me shows "Given launch config 'K' does not validate".

Interesting aside, it seems the 98x2 I've been using doesn't show up in the debug for autotune, despite being the best config I've found so far. This debug panel gives a few more leads to check out, though.

EDIT: Looking through the results of a couple of tries, it seems the results of autotune are pretty inconsistent. Is each configuration being tested only once? Maybe averaging over more tests per config would work better, as an option for people who are going to use the results to find an ideal configuration to set themselves rather than as a final configuration.
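To make the averaging idea concrete, here is a rough Python sketch. It is not how cudaminer's autotune works internally; it just assumes you have collected several hashrate readings (khash/s) per launch config by hand, and ranks the configs by their mean. The function name and the sample numbers are made up for illustration.

```python
from statistics import mean, stdev

def rank_configs(samples):
    """samples maps launch config -> list of measured hashrates (khash/s).

    Returns (config, mean, spread) tuples sorted by mean, best first.
    """
    ranked = []
    for config, rates in samples.items():
        # stdev needs at least two readings; report 0.0 for a single one.
        spread = stdev(rates) if len(rates) > 1 else 0.0
        ranked.append((config, mean(rates), spread))
    return sorted(ranked, key=lambda row: row[1], reverse=True)

# Hypothetical readings, not real measurements.
readings = {
    "98x2": [141.0, 143.0, 142.0],
    "16x14": [150.0, 128.0, 130.0],
}
for config, avg, spread in rank_configs(readings):
    print(f"{config}: mean {avg:.1f} khash/s, stdev {spread:.1f}")
```

In the made-up data, the config with the single highest reading loses to the steadier one once the runs are averaged, which is the kind of case a one-shot test would get wrong.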
