Author

Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 1001. (Read 3426989 times)

newbie
Activity: 55
Merit: 0
Sorry for the question, but could you help create a cudaMiner build for Microcoin? As far as I can see it should be similar to YAC, so it shouldn't be difficult, but I have never done this before and don't have the skills.
legendary
Activity: 1400
Merit: 1050
I just tried to run the latest version on Windows with scrypt, using my newest config from yesterday without -L, and it seems I lost 100 kHash/s (it was running at 700, overclocked, and now it barely makes 600).
Do I need to retune? Or has something changed more drastically?
newbie
Activity: 9
Merit: 0
Keep up the good work!

PS. Do you take Yacoin donations?

Yeah, you can donate to YBQ4hrUQqEb2EDip1NFwMAgZbvK8hJx5Tn

Good idea about starting a new thread for the scrypt-jane enabled cudaminer, once it is released.

I have made some changes to autotune reliability and speed. It will not assign fewer blocks than half the multiprocessor count in your card. For example, on a GTX 780 it will now start autotuning at 6 blocks (the card has 12 SMX).

Also I made changes to how memory is allocated. The backoff value on Windows is currently 12% of the largest allocation it was able to make. On Linux it is a mere 2%. If I don't back off, autotune will crash pretty badly. It can still occasionally crash with launch timeouts though.

I find that my GTX 660Ti is a better investment than my new GTX 780 card (3 GB each, but 7 vs 12 SMX). At -L 2 the 660Ti totally beats my 780. Meh.

My GTX 660 Ti uses -L 2 -l K64x2 -C 1 -b 32768 -i 0 and gets 3.7 kHash/s

Christian


Donation sent.

In case you guys didn't know they just released an update to the Yacoin wallet 0.42.
full member
Activity: 239
Merit: 103
No issues with the YAC wallet on Windows here, but mine does start horribly slowly on Linux (takes up to an hour). I pulled it from the official PPA repository for stable builds.

There is a new stable release on github which speeds up the time it takes to open the wallet on Linux. Not sure if it is in PPA already.
hero member
Activity: 756
Merit: 502
Keep up the good work!

PS. Do you take Yacoin donations?

Yeah, you can donate to YBQ4hrUQqEb2EDip1NFwMAgZbvK8hJx5Tn

Good idea about starting a new thread for the scrypt-jane enabled cudaminer, once it is released.

I have made some changes to autotune reliability and speed. It will not assign fewer blocks than half the multiprocessor count in your card. For example, on a GTX 780 it will now start autotuning at 6 blocks (the card has 12 SMX).
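As a rough illustration of the rule described here (not the actual cudaMiner code), the starting block count could be derived from the multiprocessor count as sketched below. The function name is hypothetical, and rounding up for odd counts is an assumption that the post does not confirm.

```cpp
#include <algorithm>

// Hypothetical helper, not taken from cudaMiner's source.
// Returns the smallest block count an autotune pass would try, assuming
// the rule "no fewer than half the multiprocessor count".
// Assumption: odd counts are rounded up, and the result is at least 1.
int min_autotune_blocks(int multiprocessor_count) {
    return std::max(1, (multiprocessor_count + 1) / 2);
}
```

In a CUDA program the input would typically come from the `multiProcessorCount` field of `cudaDeviceProp`; for the 12 SMX of a GTX 780 this sketch yields 6, matching the example in the post.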

Also I made changes to how memory is allocated. The backoff value on Windows is currently 12% of the largest allocation it was able to make. On Linux it is a mere 2%. If I don't back off, autotune will crash pretty badly. It can still occasionally crash with launch timeouts though.
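The backoff idea can be sketched as a simple percentage reduction of the largest allocation that succeeded. This is only an illustration under assumptions: the function name is hypothetical, and the real code may apply the backoff differently (for example per warp rather than per byte).

```cpp
#include <cstdint>

// Hypothetical helper, not taken from cudaMiner's source.
// Given the largest allocation that succeeded (in bytes) and a backoff
// percentage (12 on Windows and 2 on Linux according to the post),
// returns how many bytes would remain in use after backing off.
std::uint64_t usable_after_backoff(std::uint64_t largest_alloc_bytes,
                                   unsigned backoff_percent) {
    if (backoff_percent > 100) backoff_percent = 100;  // clamp bad input
    return largest_alloc_bytes - largest_alloc_bytes * backoff_percent / 100;
}
```

With a 12% backoff, an allocation of 1000 bytes would be reduced to 880 bytes; with 2% it would be reduced to 980 bytes.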

I find that my GTX 660Ti is a better investment than my new GTX 780 card (3 GB each, but 7 vs 12 SMX). At -L 2 the 660Ti totally beats my 780. Meh.

My GTX 660 Ti uses -L 2 -l K64x2 -C 1 -b 32768 -i 0 and gets 3.7 kHash/s

Christian
newbie
Activity: 9
Merit: 0
Should we roll the Lookup-Gap into kernel launch configurations?

I am also considering allowing devices to be specified by name, as in the following example, because whenever I swap cards around on my mainboards, all the device IDs get shuffled by CUDA, which is annoying. The strings, however, would keep working as is, unless you remove the card with the given name.

-d "GT 640, GTX 780 Ti, GTX 660 Ti, GTX 660 Ti#2"

Christian


This is your baby, but those sound like good ideas, in addition to the idea about setting warp ranges for auto tuning.

I would suggest clarifying/cleaning up the display and help pages for new people. You are beginning to make a real dent in the struggle for viable NVidia mining and getting attention across the web. Your baby ought to look its best, right? Maybe once you do a new release even open a new thread (with a link to this one obviously) so people aren't overwhelmed by 130+ pages of old comments pertaining mainly to old versions.

Keep up the good work!

PS. Do you take Yacoin donations?
newbie
Activity: 33
Merit: 0
Anyone having issues with the YAC wallet? Mine crashes as soon as I start it on Windows 7 64-bit...

What is the error you're getting if there is one?

I was getting a "not able to load block index" error, but was able to fix it.
hero member
Activity: 756
Merit: 502
Should we roll the Lookup-Gap into kernel launch configurations?

How does T12x32/6 look to you? ;-)
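A minimal sketch of how such a combined string could be parsed, assuming the format is an optional kernel letter, then blocks x warps, then an optional "/gap" suffix. The struct and function names are hypothetical, and since the syntax is only being proposed here, this is not how cudaMiner necessarily handles it.

```cpp
#include <cctype>
#include <cstdio>

// Hypothetical representation of a launch config such as "T12x32/6".
struct LaunchConfig {
    char kernel;      // kernel prefix letter, e.g. 'T' or 'K'; 0 if absent
    int blocks;       // number of blocks
    int warps;        // warps per block
    int lookup_gap;   // 1 when no "/gap" suffix is given (assumption)
};

// Parses "[letter]<blocks>x<warps>[/<gap>]"; returns false on malformed input.
bool parse_launch_config(const char* s, LaunchConfig& cfg) {
    cfg = LaunchConfig{0, 0, 0, 1};
    if (std::isalpha(static_cast<unsigned char>(*s))) cfg.kernel = *s++;

    int consumed = 0;
    if (std::sscanf(s, "%dx%d%n", &cfg.blocks, &cfg.warps, &consumed) != 2)
        return false;
    s += consumed;

    if (*s == '/') {
        int gap_len = 0;
        if (std::sscanf(s, "/%d%n", &cfg.lookup_gap, &gap_len) != 1)
            return false;
        s += gap_len;
    }
    // Reject trailing garbage and non-positive values.
    return *s == '\0' && cfg.blocks > 0 && cfg.warps > 0 && cfg.lookup_gap > 0;
}
```

Defaulting the gap to 1 keeps existing strings such as K59x2 valid, which seems like the least surprising behaviour if the suffix were ever adopted.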

No issues with the YAC wallet on Windows here, but mine does start horribly slowly on Linux (takes up to an hour). I pulled it from the official PPA repository for stable builds.

The reason for autotune crashes on Windows with lookup gap seems to be rising memory usage during the autotune process. For example, on my 780 Ti, as soon as the "Memory Used" value shown in GPU-Z hits 3072 MB, the driver crashes. I could fix it by adding a configurable "backoff" parameter in percent. The default value should be higher on Windows than on Linux, probably around 10% on Windows and 2% on Linux. Alternatively, I could also allow giving the backoff in MB.

For a very quick fix in the current source code, increase the constant 2 in the condition of this for loop in salsa_kernel.cu to something higher, e.g. 2*LOOKUP_GAP. It should fix auto-tuning when single-memory allocation is not enabled.

Code:
// Back off: free up to 2 allocations from the top of the per-warp list
for (int i=0; warp > 0 && i < 2; ++i) {
    warp--;
    checkCudaErrors(cudaFree(h_V[thr_id][warp]-h_V_extra[thr_id][warp]));
    h_V[thr_id][warp] = NULL; h_V_extra[thr_id][warp] = 0;
}

UPDATE: I also find that CUDA sometimes kills the autotuning process with the error message "the launch timed out and was terminated". This might be fixed by auto-tuning with a smaller batchsize (-b) parameter, e.g. 1024. CUDA has a watchdog timer that will kill kernel calls that take longer than 5 seconds. This is to avoid a permanent display freeze when some computation gets stuck.

I am also considering allowing devices to be specified by name, as in the following example, because whenever I swap cards around on my mainboards, all the device IDs get shuffled by CUDA, which is annoying. The strings, however, would keep working as is, unless you remove the card with the given name.

-d "GT 640, GTX 780 Ti, GTX 660 Ti, GTX 660 Ti#2"
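A minimal sketch of how such a device string could be split into names, assuming that a trailing "#2" selects the second card carrying that name. The struct and function names are hypothetical; matching the names against the cards actually present (for example via the `name` field of `cudaDeviceProp`) is left out.

```cpp
#include <string>
#include <vector>

// Hypothetical result type: one entry per comma-separated item.
struct DeviceRef {
    std::string name;  // card name as typed, e.g. "GTX 660 Ti"
    int ordinal;       // 1 for the first card of that name, 2 for "#2", ...
};

// Removes leading and trailing spaces or tabs.
static std::string trim(const std::string& s) {
    const std::size_t b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Splits "GT 640, GTX 660 Ti#2" into {"GT 640", 1}, {"GTX 660 Ti", 2}.
// Note: std::stoi throws if the text after '#' is not a number.
std::vector<DeviceRef> parse_device_list(const std::string& spec) {
    std::vector<DeviceRef> out;
    std::size_t pos = 0;
    while (pos <= spec.size()) {
        std::size_t comma = spec.find(',', pos);
        if (comma == std::string::npos) comma = spec.size();
        std::string item = trim(spec.substr(pos, comma - pos));
        if (!item.empty()) {
            int ordinal = 1;
            const std::size_t hash = item.rfind('#');
            if (hash != std::string::npos && hash + 1 < item.size()) {
                ordinal = std::stoi(item.substr(hash + 1));
                item = trim(item.substr(0, hash));
            }
            out.push_back(DeviceRef{item, ordinal});
        }
        pos = comma + 1;
    }
    return out;
}
```

Keeping the ordinal separate from the name means two identical cards stay distinguishable even after the CUDA device IDs are reshuffled.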

Christian
ktf
newbie
Activity: 24
Merit: 0
Anyone having issues with the YAC wallet? Mine crashes as soon as I start it on Windows 7 64-bit...
member
Activity: 84
Merit: 10
SizzleBits
When I try to compile to get lookup_gap, I end up with this error:

C:\Users\Zak Lantz\Desktop\cudaminer_vc2010_prerequisites\CudaMiner-master\cudaminer.vcxproj : error  : Unable to read the project file "cudaminer.vcxproj".
C:\Users\Zak Lantz\Desktop\cudaminer_vc2010_prerequisites\CudaMiner-master\cudaminer.vcxproj(50,5): The imported project "C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V120\BuildCustomizations\CUDA 5.5.props" was not found. Confirm that the path in the declaration is correct, and that the file exists on disk.

I think I understand the error: it is not finding CUDA because I have it installed on my H drive (my C drive is a 120 GB SSD). So how would I point Visual Studio to where CUDA is actually installed?
legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
On-topic: I tried the new build today, getting up to 4.5kh/s with T68x4 and -L4 on a GTX780. It usually hovers more around 4.3.

Hovering or jittering to me occurs when there's too much memory being used or at least it's borderline.
So for example, for me N 14 with L 3 results in 181 warps.
Autotune comes up with K59x3 (= 177), which results in a very stable hashrate, using 1931 MB of VRAM (with the default 3 measurements).
But K10x18 (= 180) jitters a bit yet is better on average, even though the VRAM usage keeps jumping between 1942 and 1963 MB, which, if I have to guess, is what causes the jittering.
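To look for such borderline configs by hand, one could list every blocks x warps pair whose product falls just under the warp limit. The sketch below assumes the total is simply blocks times warps per block, as the (= 177) and (= 180) figures suggest; all names and the per-block cap parameter are hypothetical.

```cpp
#include <vector>

// Hypothetical pair of launch dimensions.
struct Config {
    int blocks;
    int warps_per_block;
};

// Lists every pair with min_total <= blocks * warps_per_block <= max_total,
// with warps_per_block limited to max_warps_per_block.
std::vector<Config> configs_in_range(int min_total, int max_total,
                                     int max_warps_per_block) {
    std::vector<Config> out;
    for (int w = 1; w <= max_warps_per_block; ++w)
        for (int b = 1; b * w <= max_total; ++b)
            if (b * w >= min_total) out.push_back(Config{b, w});
    return out;
}

// True if the list contains the given pair.
bool contains(const std::vector<Config>& v, int blocks, int warps_per_block) {
    for (const Config& c : v)
        if (c.blocks == blocks && c.warps_per_block == warps_per_block)
            return true;
    return false;
}
```

Calling `configs_in_range(177, 181, 32)` would include both 59x3 and 10x18 among its candidates, which could then be benchmarked individually with -l.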

Here's a screenshot (with minimum/average/maximum hashrates added in the brackets).

So in addition to my previous post: you can find these borderline kernel configs if you leave the number of autotune measurements untouched, or maybe even increase it. But if your card is used as the primary one (has a monitor attached to it), you will be fine with a less accurate autotune, since VRAM usage is not static (desktop, background apps, etc.).


Also, I guess most of us have their cards overclocked at this point but as the new lookup gap puts more pressure on the cards, our pre-lookup gap overclocks are not that stable anymore, causing crashes.
hero member
Activity: 840
Merit: 1000
Really? Dude drop the entitlement...

Excuse me, but you need to drop something yourself. That being the assumption that you know my motives or what type of person I am. You don't, so knock it off.

It was a sort of tongue-in-cheek comment, but I can see how the humor doesn't come across very well without knowing the intent of the post. If it were intended as you framed it, why would I follow up the comment with a polite request for updated binaries? Anyway, I'm getting the prerequisites together as we speak so I can compile it myself. I was not aware that a trial of VS2010 could be used to compile, but now I know.

Thanks for the snap judgment, though. Makes my day when some snooty know-it-all gets something totally wrong. Next time drop the egoistic notion that you've got everything figured out, and you'll be less likely to make the same mistake again.

Thanks cbuchner1 for your continued effort.

Ok - I just don't find posts like that constructive when you could have just asked for help instead (I compiled a version of this for a guy that asked). You're right I shouldn't have judged what type of person you are from that post. I apologize, let's move on.

On-topic: I tried the new build today, getting up to 4.5kh/s with T68x4 and -L4 on a GTX780. It usually hovers more around 4.3.

legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
CudaMiner at the moment is the strongest around an N factor of 14 (compared to ATI/AMD GPUs and CPUs), and YaC is the only coin around that N factor, which makes it the most profitable.
YaC has some issues though, so I'm waiting for other coins to get close to N 14.


On another note, if anyone wants to speed up the autotuning process at the cost of some accuracy, you could decrease the number of measurements in salsa_kernel.cu (line 538):
Code:
while (repeat < 3)  // average up to 3 measurements for better exactness
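The trade-off can be illustrated with a small averaging helper: fewer repeats finish sooner but smooth out less noise. This is only a sketch under assumptions; the function name and the callback are hypothetical and do not mirror the actual loop in salsa_kernel.cu.

```cpp
#include <functional>

// Hypothetical helper: runs one timing measurement `repeats` times
// and returns the arithmetic mean of the results.
double average_hashrate(const std::function<double()>& measure_once,
                        int repeats) {
    if (repeats < 1) repeats = 1;  // always take at least one sample
    double sum = 0.0;
    for (int i = 0; i < repeats; ++i)
        sum += measure_once();
    return sum / repeats;
}
```

With `repeats` set to 1, each candidate config is timed only once, which is roughly what lowering the constant in the while condition would amount to.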


Also, you can interrupt autotuning with CTRL+C on Windows at any time. It will close cudaMiner, but it will first show you the best kernel launch config it has found up to that point (handy for skipping the last part in some cases).

member
Activity: 63
Merit: 10
Call me stupid, but why did YAC become a thing all of a sudden? I have used cudaminer for a while to mine alt coins, and I check this thread once in a while, but it's all about YAC now. Is it the most profitable coin to mine with an Nvidia card now? I did not see it traded on Cryptsy at all, so I'm not sure what it's all about.
legendary
Activity: 1400
Merit: 1050
The lookup gap has turned my 10 kHash/s 450 Watts Yacoin mining rig into a devilish 14 kHash/s 666 Watts mining rig. Not quite as high as I had hoped for, but the new Wattage is nice.

I run GTX 780 with -L 6 -l 12x32    up to 3.65 kHash/s
and GTX 780Ti with -L 6 -l 15x32   up to 4.7 kHash/s

still quite an easy to remember formula with a decent performance. There may be better values but that is what I found within an hour of tinkering.

Christian


Here is what I got with my 780 Ti:
L3  29x7   => 4.78 kHash/s
L4  137x2  => 5.09 kHash/s
L5  169x2  => 5.1 kHash/s
L6  60x8   => 5.22 kHash/s
In principle there should be somewhat better settings. In scrypt the best ones are multiples of the number of CUDA cores (no reason it shouldn't work this way for scrypt-jane).
I can't monitor the power usage on Linux. I use a self-modded BIOS that allows up to 150% of the TDP, but I don't think it has any impact, since I can't change the power limit.
full member
Activity: 182
Merit: 100
The lookup gap has turned my 10 kHash/s 450 Watts Yacoin mining rig into a devilish 14 kHash/s 666 Watts mining rig. Not quite as high as I had hoped for, but the new Wattage is nice.

I run GTX 780 with -L 6 -l 12x32    up to 3.65 kHash/s
and GTX 780Ti with -L 6 -l 15x32   up to 4.7 kHash/s

still quite an easy to remember formula with a decent performance. There may be better values but that is what I found within an hour of tinkering.

Christian


I am sure you can squeeze more out of your GTX 780, I get 3.87-3.90 khash/s with -l T64x2 -b 8192 -L 2 -i 0 --algo=scrypt-jane.
hero member
Activity: 756
Merit: 502
The lookup gap has turned my 10 kHash/s 450 Watts Yacoin mining rig into a devilish 14 kHash/s 666 Watts mining rig. Not quite as high as I had hoped for, but the new Wattage is nice.

I run GTX 780 with -L 6 -l 12x32    up to 3.65 kHash/s
and GTX 780Ti with -L 6 -l 15x32   up to 4.7 kHash/s

still quite an easy to remember formula with a decent performance. There may be better values but that is what I found within an hour of tinkering.

Christian

full member
Activity: 120
Merit: 100
Astrophotographer and Ham Radioist!
OP, how about autotune crashing on Fermi kernels? I think they need some love as well, any news on their progress?
ktf
newbie
Activity: 24
Merit: 0
It seems that when running with -L 2, autotune chose K59x2, which was netting almost 3 kHash/s.

If I try to specify -l K59x2 myself, however, I get errors:

[2014-01-19 17:49:25] GPU #1: cudaError 4 (unspecified launch failure) calling 'cudaStreamSynchronize(context_streams[1][thr_id])' (C:/__test/CudaMiner-master/salsa_kernel.cu line 164)

I tried different values and I get the same error. It only works if I don't use the -l flag.
legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
Is it just a lucky streak, getting a share accepted every few seconds? For me it sometimes takes several minutes. I do farm on another pool though (yac.m-s-t.org). Or is that due to the pool settings? I never really read up on all that stratum, TCP, PPS, PPLNS etc. stuff...

yac.coinmine.pl has a fixed, low difficulty (no vardiff), so you're submitting shares faster. You're not getting more coins, though.