[ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 1003.

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: coercion on January 19, 2014, 01:14:49 AM

I've been playing with the lookup gap. My GTX 780 went from 3.7 kH/s to 5.0 kH/s on Yacoin with -L 5. (-L 4 produces 4.948, -L 3 4.535, and -L 2 was almost no improvement)

[2014-01-18 22:09:22] GPU #0: Performing auto-tuning (Patience...)
[2014-01-18 22:09:22] GPU #0: cudaError 2 (out of memory) calling 'cudaMalloc((void **) &d_idata, mem_size)' (salsa_kernel.cu line 499)

can't wait to try -L on my three 780Ti cards at home - hoping for 5-6 kHash/s per device. Right now I am at a meeting of computer geeks demo'ing one of my mining rigs...

I will have to improve the memory management a lot, both on Windows and on Linux. This out of memory problem is annoying.

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: col_oddball on January 18, 2014, 08:47:24 PM

cbuchner1: do you plan to implement lookup gap for scrypt???

it's implemented but Salsa20/8 (N=1024) is mostly compute bound on nVidia and there is no benefit seen from this feature.

bathrobehero

legendary

Activity: 2002

Merit: 1051

ICO? Not even once.

Quote from: coercion on January 19, 2014, 01:14:49 AM

I've been mining an N Factor 13 coin as of late, and my 780 went from 10.7 to 16.0 kH/s with -L 3.

I'm doing the same (zcc) and lookup gap had no effect for me on my 660 (2GB), it stayed at 10.0 kH/s.

Here's my oversimplified take on lookup gap:
An increased lookup gap effectively gives the GPU more VRAM to play with, but that only helps if the GPU was bottlenecked by the amount of VRAM in the first place. It doesn't help at all if the GPU was already sweating to get the job done (i.e. it was compute bound).

So an increased lookup gap with my mediocre GPU and mediocre VRAM amount, at:
N 13 had no effect (GPU wasn't bottlenecked by VRAM);
N 14 had a 30% performance increase;
N 15 had a 100% performance increase because the memory bottleneck is the worst here.
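The memory side of this can be put into rough numbers. This is only a minimal sketch: it assumes each scratchpad entry is 128 bytes (scrypt-jane with r=1; an assumption, though it is consistent with the "8 hashes / 5.3 MB per warp" line cudaminer prints for Nfactor 13 with -L 3) and N = 2^(Nfactor+1), as in the "Nfactor is 13 (N=16384)" log line. The function names are made up for illustration, and everything else the miner allocates is ignored.

```python
ENTRY_BYTES = 128  # ASSUMED size of one scratchpad entry (scrypt-jane, r=1)


def scratchpad_bytes(nfactor, lookup_gap=1):
    """Approximate scratchpad size for one hash."""
    n = 2 ** (nfactor + 1)  # Nfactor 13 -> N = 16384
    return n * ENTRY_BYTES // lookup_gap


def hashes_that_fit(vram_bytes, nfactor, lookup_gap=1):
    """Upper bound on concurrent hashes, ignoring all other allocations."""
    return vram_bytes // scratchpad_bytes(nfactor, lookup_gap)


if __name__ == "__main__":
    vram = 2 * 1024 ** 3  # a 2 GB card
    for nfactor in (13, 14, 15):
        row = [hashes_that_fit(vram, nfactor, gap) for gap in (1, 2, 3, 4)]
        print(f"Nfactor {nfactor}: {row} hashes for -L 1..4")
```

Each step up in N factor halves the number of hashes that fit, so the higher N gets, the sooner VRAM rather than compute becomes the limit, and the more room a gap has to help.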

coercion

newbie

Activity: 34

Merit: 0

I've been playing with the lookup gap. My GTX 780 went from 3.7 kH/s to 5.0 kH/s on Yacoin with -L 5. (-L 4 produces 4.948, -L 3 4.535, and -L 2 was almost no improvement)

My GT 640s received no benefit at N Factor 14. At N Factor 15 they produce 0.684 kH/s. If I recall correctly, they maxed out at 0.6 previously.

I've been mining an N Factor 13 coin as of late, and my 780 went from 10.7 to 16.0 kH/s with -L 3. Cudaminer does not fare well with NF=13 when I try to autotune with -L 3 if I don't also specify -lT.

Code:

Masochist:CudaMiner mark$ ./cudaminer --algo=scrypt-jane:13 -d0 -m1 -i0 -L3 --benchmark -D
*** CudaMiner for nVidia GPUs by Christian Buchner ***
This is version 2013-12-18 (beta)
based on pooler-cpuminer 2.3.2 (c) 2010 Jeff Garzik, 2012 pooler
Cuda additions Copyright 2013 Christian Buchner
My donation address: LKS1WDKGED647msBQfLBHV3Ls8sveGncnm

[2014-01-18 22:09:13] 1 miner threads started, using 'scrypt-jane' algorithm.
[2014-01-18 22:09:13] DEBUG: got new work in 1 ms
[2014-01-18 22:09:13] Given scrypt-jane parameters: 13
[2014-01-18 22:09:13] Nfactor is 13 (N=16384)!
[2014-01-18 22:09:20] GPU #0: GeForce GTX 780 with compute capability 3.5
[2014-01-18 22:09:20] GPU #0: interactive: 0, tex-cache: 0 , single-alloc: 1
[2014-01-18 22:09:20] GPU #0: 8 hashes / 5.3 MB per warp.
[2014-01-18 22:09:22] GPU #0: Performing auto-tuning (Patience...)
[2014-01-18 22:09:22] GPU #0: cudaError 2 (out of memory) calling 'cudaMalloc((void **) &d_idata, mem_size)' (salsa_kernel.cu line 499)

[2014-01-18 22:09:22] GPU #0: cudaError 2 (out of memory) calling 'cudaMalloc((void **) &d_odata, mem_size)' (salsa_kernel.cu line 501)

[2014-01-18 22:09:22] GPU #0: cudaError 11 (invalid argument) calling 'cudaMemcpy(d_idata, h_idata, mem_size, cudaMemcpyHostToDevice)' (salsa_kernel.cu line 506)

[2014-01-18 22:09:22] GPU #0: maximum warps: 527
[2014-01-18 22:09:22] GPU #0: cudaError 4 (unspecified launch failure) calling 'cudaDeviceSynchronize()' (salsa_kernel.cu line 534)

It proceeds to repeat the last line for each warp in every block. More than a little spammy, though I guess I'm asking for it with the debug flag on. Still, I'm not really interested in waiting several hours for autotune to complete to find out what my optimal config is, particularly when I'm testing multiple lookup gaps with multiple N factors.

It seems to autotune fine with -L3 if I specify -lT. It will not autotune with -L4 in any case, and will fail in the same fashion as above, although if I just give it a config, -L4 seems to work fine. I'm not too worried about it; looking at the -L2 and -L3 charts makes it pretty clear I was experiencing diminishing returns.
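For anyone testing several lookup gaps against several N factors, the command lines can be generated rather than typed. A minimal sketch: it only reuses the flags from the benchmark command shown above, and the helper name `benchmark_commands` is hypothetical. It prints the commands instead of running them, since nothing here establishes whether `--benchmark` ever exits on its own.

```python
import itertools
import shlex


def benchmark_commands(nfactors, lookup_gaps, binary="./cudaminer"):
    """Build one benchmark command line per (N factor, lookup gap) pair."""
    commands = []
    for nfactor, gap in itertools.product(nfactors, lookup_gaps):
        commands.append([
            binary,
            f"--algo=scrypt-jane:{nfactor}",
            "-d0", "-m1", "-i0",      # same device/mode flags as the run above
            f"-L{gap}",
            "--benchmark", "-D",
        ])
    return commands


if __name__ == "__main__":
    for cmd in benchmark_commands([13, 14], [2, 3, 4]):
        print(shlex.join(cmd))
```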

bathrobehero

legendary

Activity: 2002

Merit: 1051

ICO? Not even once.

Quote from: bathrobehero on January 18, 2014, 03:31:23 PM

Quote from: cbuchner1 on January 17, 2014, 05:36:09 PM

Also some of you might want to check if it works for you to specify --algo=scrypt:2048 (or whatever "N" value it is currently at) to mine VertCoin. You can now directly give the N parameter if needed (not the N-factor like with scrypt-jane).

It starts hashing, but as soon as it would find/check a share it crashes, even with different scrypt arguments:

VertCoin schedule:

col_oddball

newbie

Activity: 4

Merit: 0

Quote from: cbuchner1 on January 18, 2014, 06:53:30 PM

Quote from: ManIkWeet on January 18, 2014, 06:51:31 PM

Does the lookup-gap decrease the "value" of a hash or is it only positive effects?

depends entirely on the card. cannot generalize here, sorry.

Also I do not recommend to use a lookup-gap with scrypt mining. I think it only has benefits with scrypt-jane.

That's interesting: for cgminer the default lookup-gap is 2 and you get an increased hash rate. I guess it comes down to how efficient the salsa20/8 implementation is. The fewer cycles salsa20/8 takes to complete, the more worthwhile the lookup gap becomes.

FYI:
I've been writing an FPGA implementation, and the lookup-gap helps increase the hash rate since you can increase the number of scrypt cores.
The table below shows what can be achieved on a Virtex 6 running @ 150MHz.
Total FPGA blockram memory: 1,024 kbytes. You need 128 kbytes per core for lookup-gap=1, therefore 1024/128 = 8 cores.

lookup_gap         1    2    4    8      
total cores:       8   16   32   64      
total FPGA hash   29   49   73   98   kh/s
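The core counts in the table follow directly from the block RAM budget, and dividing the totals by the core count shows what each gap costs per core. A minimal sketch using only the figures from this post (the names are made up for illustration):

```python
BLOCKRAM_KB = 1024    # total FPGA block RAM, from the post
SCRATCHPAD_KB = 128   # per scrypt core at lookup_gap=1

# total hash rate in kh/s per lookup_gap, copied from the table above
MEASURED_KHS = {1: 29, 2: 49, 4: 73, 8: 98}


def cores_for_gap(lookup_gap):
    """Number of scrypt cores whose scratchpads fit in block RAM."""
    return BLOCKRAM_KB * lookup_gap // SCRATCHPAD_KB


if __name__ == "__main__":
    for gap, total in MEASURED_KHS.items():
        cores = cores_for_gap(gap)
        print(f"gap {gap}: {cores:2d} cores, {total / cores:.2f} kh/s per core")
```

The per-core rate falls from roughly 3.6 to roughly 1.5 kh/s as the gap grows, because every core recomputes more scratchpad entries, but the total still rises since the core count doubles at each step.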

cbuchner1: do you plan to implement lookup gap for scrypt???

cheers
oddball

dereinehalt

newbie

Activity: 9

Merit: 0

I reach 570-600 khash/s with a GTX 780 Gainward Phantom GLH.
-H 1 -D -i 0 -l T24x26 -C1
Maybe someone has a use for it, or any suggestions for me? ^^

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: orrett3 on January 18, 2014, 05:47:36 PM

Right now i've compiled the latest source and this is what I've been getting, mining yacoin with a gtx 770 2GB card.

Other than this do you see anything wrong with what im getting? Is there any way to get more?

The values aren't stellar - but your card does not have enough RAM to make use of all its compute power.
So try my new lookup gap. I have a GTX 760 with 4 GB RAM and it helped a bit. It should help quite a bit
more on your 2 GB card.

Pass -L 2 and autotune (preferably with the -D flag also given so you see autotune results printed).
Afterwards maybe also check -L 3

ManIkWeet

full member

Activity: 182

Merit: 100

Quote from: cbuchner1 on January 18, 2014, 07:04:11 PM

if you get a higher kHash/s then yes...

GTX 780 is a Compute 3.5 part. I haven't finished the lookup-gap for that kernel yet.

I expect the higher end devices like 660Ti, 760, 770, 780, 780Ti to benefit from the lookup gap.

Also the lower end cards with 1GB (e.g. GT 640 GK208 with 1 GB DDR5 memory)

Very nice! I will patiently wait for you to implement it for the T kernel

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: orrett3 on January 18, 2014, 07:03:15 PM

Would I be adding that as a flag on the shortcut or bat file?

one of

-L 2
-L 3
-L 4

added to your bat file or shortcut (on Windows). Anything else stays quite the same.

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: ManIkWeet on January 18, 2014, 07:02:19 PM

I mean, let's say I solo mine YAC with a GTX 780 and increase the lookup-gap, would I find blocks more often? I don't exactly know how the hashrate works...

if you get a higher kHash/s then yes...

GTX 780 is a Compute 3.5 part. I haven't finished the lookup-gap for that kernel yet.

I expect the higher end devices like 660Ti, 760, 770, 780, 780Ti, Geforce Titan to benefit from the lookup gap.

Also the lower end cards with 1GB (e.g. GT 640 GK208 with 1 GB DDR5 memory)
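To put the "higher kHash/s then yes" answer in numbers: the lookup gap only changes how a hash is computed, not its result, so every hash has the same chance of solving a block and the expected number of blocks scales linearly with the hash rate. A minimal sketch; the per-hash probability below is a purely hypothetical placeholder (it cancels out of the ratio), and the two rates are the GTX 780 Yacoin figures reported elsewhere in this thread.

```python
def expected_blocks_per_day(hashrate_hs, p_block_per_hash):
    """Expected number of blocks found per day when solo mining."""
    return hashrate_hs * 86400 * p_block_per_hash


if __name__ == "__main__":
    p = 1e-9  # HYPOTHETICAL chance that a single hash solves a block
    before = expected_blocks_per_day(3700, p)  # 3.7 kH/s without a gap
    after = expected_blocks_per_day(5000, p)   # 5.0 kH/s with -L 5
    print(f"{after / before:.2f}x as many blocks expected")
```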

orrett3

newbie

Activity: 33

Merit: 0

Quote from: cbuchner1 on January 18, 2014, 06:47:45 PM

Try the lookup-gap now on Compute 3.0 devices (Kepler kernel). The Titan kernel will follow soon... always autotune for different gap numbers, as configurations will differ wildly

Would I be adding that as a flag on the shortcut or bat file?

ManIkWeet

full member

Activity: 182

Merit: 100

Quote from: cbuchner1 on January 18, 2014, 06:53:30 PM

Quote from: ManIkWeet on January 18, 2014, 06:51:31 PM

Does the lookup-gap decrease the "value" of a hash or is it only positive effects?

depends entirely on the card. cannot generalize here, sorry.

I mean, let's say I solo mine YAC with a GTX 780 and increase the lookup-gap, would I find blocks more often? I don't exactly know how the hashrate works...

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: ManIkWeet on January 18, 2014, 06:51:31 PM

Does the lookup-gap decrease the "value" of a hash or is it only positive effects?

depends entirely on the card. cannot generalize here, sorry.

Also I do not recommend to use a lookup-gap with scrypt mining. I think it only has benefits with scrypt-jane.

ManIkWeet

full member

Activity: 182

Merit: 100

Does the lookup-gap decrease the "value" of a hash or is it only positive effects?

cbuchner1

hero member

Activity: 756

Merit: 502

Try the lookup-gap now on Compute 3.0 devices (Kepler kernel). The Titan kernel will follow soon... always autotune for different gap numbers, as configurations will differ wildly

NOTE: a gap value of 1 actually means no gap. ;-) A gap value of 2 specifies that only every 2nd value is stored in the scratchpad (the intermediate values are recomputed on the fly), cutting memory use in half. Values of up to 4 may make sense IMHO. Start with 2 and work your way up...

the more SMX your card has and the less memory there is, the more benefit you may see.. power consumption may also rise... Users of 1GB and 2GB cards may finally see some better hash rates now.
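The store-every-Nth-value idea can be sketched in a few lines. This is a toy, not cudaminer's kernel: SHA-256 stands in for the real Salsa20/8 (or ChaCha) block mix, the index derivation is simplified, and all names are made up for illustration. It only shows that skipped scratchpad entries can be recomputed from the nearest stored one without changing the result.

```python
import hashlib


def mix(block):
    """Stand-in for scrypt's BlockMix (the real one uses Salsa20/8)."""
    return hashlib.sha256(block).digest()


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def romix(seed, n, gap=1):
    """scrypt-style ROMix that keeps only every gap-th scratchpad entry."""
    scratchpad = []
    x = seed
    for i in range(n):
        if i % gap == 0:
            scratchpad.append(x)          # stored entry V[i]
        x = mix(x)

    for _step in range(n):
        j = int.from_bytes(x[:4], "little") % n
        v = scratchpad[j // gap]          # nearest stored entry at or below j
        for _k in range(j % gap):         # recompute the skipped entries
            v = mix(v)
        x = mix(xor(x, v))
    return x, len(scratchpad)


if __name__ == "__main__":
    seed = hashlib.sha256(b"example").digest()
    full, stored_full = romix(seed, 1024, gap=1)
    gapped, stored_gapped = romix(seed, 1024, gap=2)
    print(full == gapped, stored_full, stored_gapped)
```

With a gap of 2 the result is identical while only half the entries are stored; the price is up to gap-1 extra mix calls per lookup, which is why the feature only pays off when memory, not compute, is the limit.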

bathrobehero

legendary

Activity: 2002

Merit: 1051

ICO? Not even once.

Hi, try autotune with -C 0.

orrett3

newbie

Activity: 33

Merit: 0

Hey guys, i've been running cuda miner for a very long time now and would like to say thanks to whoever contributed and also cbuchner1.

Right now i've compiled the latest source and this is what I've been getting, mining yacoin with a gtx 770 2GB card.

originally autotune tuned it to 37x1 up from 9x1 on the last official release, but i was able to manually configure it to 40x1, so autotune is a little off. Also if i do go to 41 i get an error message that is spammed on the screen.

EDIT: the error message is [2014-01-18 17:51:00] GPU #0: cudaError 4 (unspecified launch failure) calling 'cudaEventRecord(context_serialize[stream][thr_id], context_streams[stream][thr_id])' (C:/Users/Orrett3/Desktop/Build CudaMiner/source/salsa_kernel.cu line 820)

config: -i 1 -b 32768 -C 1 -l K40x1
Mem usage: 1561 MB
Utilization: 99%
Core offset: +160
Mem offset: -502
As you can see i have some error messages, but i don't think they are affecting the hashrate too much.

Other than this do you see anything wrong with what im getting? Is there any way to get more?

http://i1081.photobucket.com/albums/j348/Orrett3/MiningYacoinMax.png

bathrobehero

legendary

Activity: 2002

Merit: 1051

ICO? Not even once.

Quote from: cbuchner1 on January 17, 2014, 05:36:09 PM

Also some of you might want to check if it works for you to specify --algo=scrypt:2048 (or whatever "N" value it is currently at) to mine VertCoin. You can now directly give the N parameter if needed (not the N-factor like with scrypt-jane).

It starts hashing, but as soon as it would find/check a share it crashes, even with different scrypt arguments:

Scribbles646

newbie

Activity: 52

Merit: 0

(Running Scrypt on Christian's 12-10-2013 x64 binary)
Wanted to note some odd behavior on my GT 750m's. When my Lenovo laptop was set to Optimized Battery Health, while still plugged in and set to Maximum Performance, it cut performance and I ended up with some really random autoconfigs, until I set it back to Maximum Battery Life mode. Figured I'd note it in case someone else is having such an issue.
