[ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 1112.

nst6563

sr. member

Activity: 252

Merit: 254

Quote from: dbabo on April 22, 2013, 06:53:23 PM

Quote from: nst6563 on April 22, 2013, 06:48:33 PM

...
If you overclock that GT 430 a bit, you can get into the 30 kh/s range.
I currently get 36 kh/s on my GT 430 with configuration 20x8.

and how do I do that?

Google a tool called NVIDIA Inspector (I think it's from TechPowerUp). It lets you adjust the fan speed, voltage, and core/memory/shader clocks of almost all NVIDIA cards.

I use it to set my GT 430 to 882 MHz core, 810 MHz memory, 1760 MHz shader and 0.990 V. That combination yields between 36 kh/s and 38 kh/s. If I go any higher than that, the driver crashes.

Your mileage may vary on the clock speeds you can reach, though. I have an EVGA GT 430, so I'm not sure how it compares to other brands.

dbabo

newbie

Activity: 41

Merit: 0

Quote from: nst6563 on April 22, 2013, 06:48:33 PM

...
If you overclock that GT 430 a bit, you can get into the 30 kh/s range.
I currently get 36 kh/s on my GT 430 with configuration 20x8.

and how do I do that?

nst6563

sr. member

Activity: 252

Merit: 254

Quote from: dbabo on April 22, 2013, 06:29:09 PM

Quote from: cbuchner1 on April 22, 2013, 05:50:58 PM

Quote from: Misiolap on April 22, 2013, 05:48:47 PM

Yes, without this the kernel dies on 64-bit builds with a Warp Misaligned Address error on compute_10.

OK, then I'll put that in and make this the final upload for today. Thank you for the explanations and for developing the patch.

A well-deserved bottle of wine shall be uncorked:

Code:

[2013-04-22 19:29:03] 1 miner threads started, using 'scrypt' algorithm.
[2013-04-22 19:29:14] GPU #0: GeForce GT 430 with compute capability 2.1
[2013-04-22 19:29:14] GPU #0: interactive: 1, tex-cache: 0 , single-alloc: 0
[2013-04-22 19:29:14] GPU #0: Performing auto-tuning (Patience...)
[2013-04-22 19:29:21] GPU #0: 24.34 khash/s with configuration 4x6
[2013-04-22 19:29:21] GPU #0: using launch configuration 4x6
[2013-04-22 19:29:21] GPU #0: GeForce GT 430, 4608 hashes, 0.26 khash/s
[2013-04-22 19:29:21] GPU #0: GeForce GT 430, 1536 hashes, 15.62 khash/s
[2013-04-22 19:29:25] GPU #0: GeForce GT 430, 78336 hashes, 22.29 khash/s
[2013-04-22 19:29:30] GPU #0: GeForce GT 430, 112128 hashes, 22.11 khash/s
[2013-04-22 19:29:35] GPU #0: GeForce GT 430, 110592 hashes, 21.79 khash/s
[2013-04-22 19:29:40] GPU #0: GeForce GT 430, 109056 hashes, 21.37 khash/s
[2013-04-22 19:29:45] GPU #0: GeForce GT 430, 107520 hashes, 22.23 khash/s

and no i686 deps

first coin goes to you
thank you!

If you overclock that GT 430 a bit, you can get into the 30 kh/s range.
I currently get 36 kh/s on my GT 430 with configuration 20x8.

dbabo

newbie

Activity: 41

Merit: 0

Quote from: cbuchner1 on April 22, 2013, 05:50:58 PM

Quote from: Misiolap on April 22, 2013, 05:48:47 PM

Yes, without this the kernel dies on 64-bit builds with a Warp Misaligned Address error on compute_10.

OK, then I'll put that in and make this the final upload for today. Thank you for the explanations and for developing the patch.

A well-deserved bottle of wine shall be uncorked:

Code:

[2013-04-22 19:29:03] 1 miner threads started, using 'scrypt' algorithm.
[2013-04-22 19:29:14] GPU #0: GeForce GT 430 with compute capability 2.1
[2013-04-22 19:29:14] GPU #0: interactive: 1, tex-cache: 0 , single-alloc: 0
[2013-04-22 19:29:14] GPU #0: Performing auto-tuning (Patience...)
[2013-04-22 19:29:21] GPU #0: 24.34 khash/s with configuration 4x6
[2013-04-22 19:29:21] GPU #0: using launch configuration 4x6
[2013-04-22 19:29:21] GPU #0: GeForce GT 430, 4608 hashes, 0.26 khash/s
[2013-04-22 19:29:21] GPU #0: GeForce GT 430, 1536 hashes, 15.62 khash/s
[2013-04-22 19:29:25] GPU #0: GeForce GT 430, 78336 hashes, 22.29 khash/s
[2013-04-22 19:29:30] GPU #0: GeForce GT 430, 112128 hashes, 22.11 khash/s
[2013-04-22 19:29:35] GPU #0: GeForce GT 430, 110592 hashes, 21.79 khash/s
[2013-04-22 19:29:40] GPU #0: GeForce GT 430, 109056 hashes, 21.37 khash/s
[2013-04-22 19:29:45] GPU #0: GeForce GT 430, 107520 hashes, 22.23 khash/s

and no i686 deps

first coin goes to you
thank you!

Misiolap

newbie

Activity: 14

Merit: 0

Great, now it works out of the box for me (salsa_kernel), thanks.

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: Misiolap on April 22, 2013, 05:48:47 PM

Yes, without this the kernel dies on 64-bit builds with a Warp Misaligned Address error on compute_10.

OK, then I'll put that in and make this the final upload for today. Thank you for the explanations and for developing the patch.

Misiolap

newbie

Activity: 14

Merit: 0

Yes, without this the kernel dies on 64-bit builds with a Warp Misaligned Address error on compute_10.

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: Misiolap on April 22, 2013, 05:04:43 PM

Additionally, shared buffers in 64-bit builds must be 64-bit aligned.

Does this also apply when targeting compute_10, sm_10 (which is done in salsa_kernel.cu)?

Christian

Misiolap

newbie

Activity: 14

Merit: 0

Additionally, shared buffers in 64-bit builds must be 64-bit aligned.

If it's worth saving some memory in 32-bit builds, something like this can be done:

Code:

#if __x86_64__
#define _64BIT_ALIGN 1
#else
#define _64BIT_ALIGN 0
#endif

And for each buffer:

Code:

__shared__ uint32_t X[WARPS_PER_BLOCK][WU_PER_WARP][32+1+_64BIT_ALIGN];
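To see why the extra word matters, here is a minimal host-side C sketch (not the kernel itself) that computes row sizes and row offsets for the [32+1+pad] layout above. It assumes a 4-byte uint32_t and that the kernel reads rows with 8-byte loads; row_bytes, row_offset and build_row_bytes are hypothetical helpers. With 33 words a row is 132 bytes, so odd-numbered rows start only 4-byte aligned; the pad word makes a row 136 bytes, a multiple of 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same switch as in the post: one extra pad word on x86_64 builds. */
#if defined(__x86_64__)
#define _64BIT_ALIGN 1
#else
#define _64BIT_ALIGN 0
#endif

/* Bytes per row of X[..][..][32+1+pad] for a given pad (0 or 1). */
size_t row_bytes(int pad)
{
    return (size_t)(32 + 1 + pad) * sizeof(uint32_t);
}

/* Byte offset of row r from the start of the buffer. Every row starts
   8-byte aligned only if the row size is a multiple of 8. */
size_t row_offset(int r, int pad)
{
    return (size_t)r * row_bytes(pad);
}

/* Row size actually used by this build. */
size_t build_row_bytes(void)
{
    return row_bytes(_64BIT_ALIGN);
}

/* 34 words * 4 bytes = 136 bytes: the padded row keeps 8-byte alignment. */
static_assert((32 + 1 + 1) * sizeof(uint32_t) % 8 == 0,
              "padded row must be a multiple of 8 bytes");
```

For example, row_offset(1, 0) % 8 is 4 while row_offset(1, 1) % 8 is 0; on 32-bit builds, where _64BIT_ALIGN is 0, each row stays one word narrower.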

cbuchner1

hero member

Activity: 756

Merit: 502

Okay, this would be the 2nd attempt for the day.

uint32_t becomes typedef'd as unsigned int
ulong2 becomes uint2
ulong4 becomes uint4

and the Titan kernel now does uint2-based memory transactions on a shared memory buffer of width [16+2], which should reduce warp serialization.
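A rough way to see why widening a row from 16 to 16+2 uint2 can reduce serialization: the C sketch below counts, for a hypothetical access pattern where thread t of a warp reads column col of row t, how many threads land in the same shared memory bank. It assumes 32 banks of 4 bytes (Kepler can also run in an 8-byte bank mode) and only looks at the first 32-bit word of each uint2. It is a simplified model, not the actual Titan kernel's access pattern, and worst_bank_conflict is a hypothetical name.

```c
#include <assert.h>

#define NUM_BANKS 32 /* assumption: 32 banks, 4 bytes wide */
#define WARP_SIZE 32

/* Thread t reads the first word of the uint2 at [t][col] in a shared array
   whose rows are row_stride_uint2 uint2 wide. Returns the largest number of
   threads mapping to one bank, i.e. the serialized passes in this model. */
int worst_bank_conflict(int row_stride_uint2, int col)
{
    int hits[NUM_BANKS] = {0};
    int worst = 0;

    assert(row_stride_uint2 > 0 && col >= 0);
    for (int t = 0; t < WARP_SIZE; ++t) {
        int word = (t * row_stride_uint2 + col) * 2; /* one uint2 = 2 words */
        int bank = word % NUM_BANKS;
        if (++hits[bank] > worst)
            worst = hits[bank];
    }
    return worst;
}
```

With rows of 16 uint2 (32 words) every thread maps to the same bank, so worst_bank_conflict(16, 0) is 32; with rows of 18 uint2 (36 words) consecutive rows shift by 4 banks and the worst case drops to 4.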

now where's my bottle of wine?

peacefulmind

full member

Activity: 196

Merit: 100

Quote from: cbuchner1 on April 22, 2013, 03:51:29 PM

Quote from: peacefulmind on April 22, 2013, 03:43:39 PM

I am using the 4/22 version with the same command as before; however, the total has dropped from 520 to 410 kH/s.
This got 520 kH/s on the 2x Titan with the 4/17 release.

Wah! I need to empty a bottle of wine now.

Christian,

Perhaps it is something in my setup? This is not a dedicated mining rig; I use it for day-to-day work and gaming.

990x
12GB RAM
2x Titan
5760x1200 SLI
64bit win7

I have noticed that it freezes when I set interactive to 1,1, and also when I let it auto-tune.

I have a ton of games and applications on this machine - so it may be my system.

Perhaps the new 4/22 build needs different settings than the ones I used on the 4/17 build? I will try some more.

My dedicated mining machines are lean and mean and use AMD Radeon cards on Linux (a few on Win7), so it is hard to compare.

You are trailblazing new ground!

Misiolap

newbie

Activity: 14

Merit: 0

Quote from: cbuchner1 on April 22, 2013, 03:50:11 PM

Any suggestion for a type that is 32 bits in both 32-bit and 64-bit builds? I thought int changed size depending on architecture, long was always 32 bits, and long long was always 64 bits.

EDIT: I've been reading up on the differences between Microsoft's LLP64 model and the Unix/Linux LP64 model. I will have to change a few things in the code, then.

Christian

For general-purpose variables, use the uint*_t types from <stdint.h>.

I'm not sure what should be used for CUDA vector types for portability.

On 64-bit Linux:
sizeof(ulong2): 16, sizeof(uint2): 8

On 32-bit Linux it's probably 8 for both.
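The size differences above follow from the data models: ILP32 (32-bit builds), LLP64 (64-bit Windows, where long stays 32 bits) and LP64 (64-bit Linux/Mac OS X, where long is 64 bits). The C sketch below uses host-side structs as stand-ins for CUDA's uint2/ulong2 (host_uint2, host_ulong2 and current_data_model are hypothetical names; the real vector types come from CUDA's headers with their own alignment attributes) to show that fixed-width uint32_t keeps its size on every model while unsigned long does not.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-ins for CUDA's vector types. */
typedef struct { uint32_t x, y; } host_uint2;       /* 2 x 32 bit everywhere */
typedef struct { unsigned long x, y; } host_ulong2; /* size follows 'long'   */

/* Fixed-width types are the portable choice for "exactly 32 bits". */
static_assert(sizeof(uint32_t) == 4, "uint32_t is exactly 32 bits");
static_assert(sizeof(host_uint2) == 8, "two 32-bit words, no padding");

typedef enum { MODEL_ILP32, MODEL_LLP64, MODEL_LP64, MODEL_OTHER } data_model;

/* Classify the build from the sizes of int, long and pointers. */
data_model current_data_model(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return MODEL_ILP32; /* 32-bit Windows and Linux */
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 8)
        return MODEL_LLP64; /* 64-bit Windows */
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return MODEL_LP64;  /* 64-bit Linux / Mac OS X */
    return MODEL_OTHER;
}
```

On 64-bit Linux sizeof(host_ulong2) is 16 and sizeof(host_uint2) is 8, matching the numbers above; on 64-bit Windows both would be 8.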

K1773R

legendary

Activity: 1792

Merit: 1008

/dev/null

wait, one Titan is 7 kh/s slower than my 580? that's sad

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: peacefulmind on April 22, 2013, 03:43:39 PM

I am using the 4/22 version with the same command as before; however, the total has dropped from 520 to 410 kH/s.
This got 520 kH/s on the 2x Titan with the 4/17 release.

Wah! I need to empty a bottle of wine now.

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: Misiolap on April 22, 2013, 03:45:53 PM

Isn't ulong the same as uint in 32-bit builds?
On 64-bit Linux it breaks things, because ulong is 64 bits and uint is 32 bits.

Any suggestion for a type that is 32 bits in both 32-bit and 64-bit builds? I thought int changed size depending on architecture, long was always 32 bits, and long long was always 64 bits.

EDIT: I've been reading up on the differences between Microsoft's LLP64 model and the Unix/Linux LP64 model. I will have to change a few things in the code, then.

Christian

cbuchner1

hero member

Activity: 756

Merit: 502

Quote from: tacotime on April 22, 2013, 03:45:28 PM

Do you feel you are at release candidate level yet? I want to add this to guiminer-scrypt when it hits maturity.

Hmm, I am probably not going to change the console output and command line options now, but stability (error checking) has to improve before this can even reach beta status.

Misiolap

newbie

Activity: 14

Merit: 0

Isn't ulong the same as uint in 32-bit builds?
On 64-bit Linux it breaks things, because ulong is 64 bits and uint is 32 bits.

tacotime

legendary

Activity: 1484

Merit: 1005

Do you feel you are at release candidate level yet? I want to add this to guiminer-scrypt when it hits maturity.

peacefulmind

full member

Activity: 196

Merit: 100

I am using the 4/22 version with the same command as before; however, the total has dropped from 520 to 410 kH/s.

Same clocks; here is my .bat:

cudaminer.exe --url http://127.0.0.1:8332/ --userpass xxx.x:123 -i 0,0 -d 0,1 -m 1,1 -C 2,2 -l 84x4,84x4

This got 520 kH/s on the 2x Titan with the 4/17 release.
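For reference, a minimal C sketch of reading one entry of the -l option (here 84x4; the .bat passes one entry per device, comma-separated). It assumes the two numbers mean blocks x warps per block and that each GPU thread computes one hash, so one launch does blocks x warps x 32 hashes. parse_launch_config and hashes_per_launch are hypothetical helpers, not cudaMiner's own parser, and they only handle the plain numeric form seen in this thread. Under that assumption the 4x6 configuration in the GT 430 log on this page gives 768 hashes per launch, and every hash count in that log (1536, 4608, 78336, 107520, 112128, ...) is a multiple of 768.

```c
#include <assert.h>
#include <stdio.h>

#define THREADS_PER_WARP 32

/* Split one launch configuration such as "84x4" into its two numbers,
   read here as blocks x warps-per-block. Returns 1 on success, 0 otherwise. */
int parse_launch_config(const char *s, int *blocks, int *warps)
{
    char tail;

    assert(s != NULL && blocks != NULL && warps != NULL);
    /* Exactly two conversions: any trailing character (e.g. ',') is rejected. */
    if (sscanf(s, "%dx%d%c", blocks, warps, &tail) != 2)
        return 0;
    return *blocks > 0 && *warps > 0;
}

/* Assuming one hash per GPU thread: hashes computed by a single launch. */
long hashes_per_launch(int blocks, int warps)
{
    return (long)blocks * warps * THREADS_PER_WARP;
}
```

For the Titan setup above, hashes_per_launch(84, 4) is 10752 hashes per launch on each GPU.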

cbuchner1

hero member

Activity: 756

Merit: 502

I am like so close -----> <----- to throwing out the texture cache support in 64-bit builds.
