Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 1110. (Read 3426918 times)

full member
Activity: 176
Merit: 100
FYI: oughtta post the "put your results here" GDocs link into the first post so it's easy to find... I'm off to go look for it, since I just got access to a GTX 660 to play around with on cudaMiner. I'm installing Win7 on that test machine :)

edit: yeah, hidden on page 14, and I'm not even about to try following the discussion around to that post where I found the "sorted" one...

I fixed the sorted sheet. sort2 :)

Right, but the problem is trying to find the link to these docs in this complete clusterfuck of a 27-page thread. Forum layout isn't always conducive to information archiving and retrieval... posts between the first two and last two pages are pretty much lost in history. :/
sr. member
Activity: 247
Merit: 250
FYI: oughtta post the "put your results here" GDocs link into the first post so it's easy to find... I'm off to go look for it, since I just got access to a GTX 660 to play around with on cudaMiner. I'm installing Win7 on that test machine :)

edit: yeah, hidden on page 14, and I'm not even about to try following the discussion around to that post where I found the "sorted" one...

I fixed the sorted sheet. sort2 :)
full member
Activity: 176
Merit: 100
FYI: oughtta post the "put your results here" GDocs link into the first post so it's easy to find... I'm off to go look for it, since I just got access to a GTX 660 to play around with on cudaMiner. I'm installing Win7 on that test machine :)

edit: yeah, hidden on page 14, and I'm not even about to try following the discussion around to that post where I found the "sorted" one...
legendary
Activity: 1078
Merit: 1001
Quote

But the user said he was using cudaMiner 22-04 with the new drivers, while the screenshots show it running 2013-04-17. Unless he extracted to the same directory every time.

Check the File Path in the menu bar


This! I extracted it into the same directory; that's why I've edited my post.
hero member
Activity: 756
Merit: 502
Can you post your settings? I'm only pulling 15 kH/s. That's up from 2, but still, seeing your 32 makes me curious.

On these old cards it makes a difference whether you run Windows XP or Linux on the one hand, or Windows Vista/7/8 on the other. It's not a settings issue: I believe the WDDM driver model (used from Vista onwards) has issues with these old cards.
newbie
Activity: 20
Merit: 0
Thank you *very* much for this. I only have an Nvidia card, and before your project mining anything was just stupid. Cgminer would give me ~28 MH/s for BTC, but for LTC or FC it was only ~2.0 kH/s :< Cudaminer gives me ~32 kH/s!!!!! This is on a 9800GT.

tl;dr You are freakin' awesome and have my gratitude! I have never been able to mine a full unit of any cryptocurrency, but IMHO my best shot is Feathercoin. As soon as I'm able to mine some, I will be more than happy to donate a portion to you. Thanks again!!

Can you post your settings? I'm only pulling 15 kH/s. That's up from 2, but still, seeing your 32 makes me curious.
hero member
Activity: 756
Merit: 502
My bet is that this new code now violates the memory coalescing rules for CUDA devices, resulting in low throughput.

Back to the drawing board... Let's see if I can keep the elegance while maintaining memory coalescing.
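
To make the coalescing suspicion concrete, here is a small host-side C++ sketch (not CUDA, and not the cudaMiner code) that counts how many 128-byte memory segments a single warp touches for one uint4 store under two layouts. The 128-byte segment size, the per-thread region size of 8192 uint4 elements (1024 entries of 8 uint4 each) and both function names are assumptions made for illustration; real hardware applies further rules that this simplified model ignores.

Code:
```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Assumed values for this illustration only.
constexpr std::size_t kWarpSize     = 32;   // threads per warp
constexpr std::size_t kSegmentBytes = 128;  // assumed memory transaction size
constexpr std::size_t kElemBytes    = 16;   // size of one uint4

// Layout A: every lane owns a contiguous scratch region of `strideElems`
// uint4 elements and writes element `step` of its own region
// (the pattern of V = base + offset*lane; *V++ = ...).
std::size_t segments_per_thread_regions(std::size_t strideElems, std::size_t step) {
    assert(strideElems > 0);
    std::set<std::size_t> segments;
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        std::size_t byteAddr = (lane * strideElems + step) * kElemBytes;
        segments.insert(byteAddr / kSegmentBytes);
    }
    return segments.size();
}

// Layout B: element `step` of all 32 lanes is stored side by side,
// so neighboring lanes write neighboring addresses.
std::size_t segments_interleaved(std::size_t step) {
    std::set<std::size_t> segments;
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        std::size_t byteAddr = (step * kWarpSize + lane) * kElemBytes;
        segments.insert(byteAddr / kSegmentBytes);
    }
    return segments.size();
}
```

Under these assumptions, segments_per_thread_regions(8192, 0) reports 32 segments for one warp-wide store, while segments_interleaved(0) reports 4. That is an eightfold difference in memory traffic for the same amount of useful data, which is the kind of penalty a coalescing violation would cause.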
full member
Activity: 168
Merit: 100
But the user said he was using cudaMiner 22-04 with the new drivers, while the screenshots show it running 2013-04-17. Unless he extracted to the same directory every time.

Check the File Path in the menu bar
I saw the file path bar.  Maybe he did extract it that way.  We have no way of knowing.

So I eliminate shared memory altogether from salsa kernel A, like this (pure elegance). Note the switch to uint4.

Code:
template __global__ void
scrypt_core_kernelA(uint4 *g_idata)
{
    int warpIdx        = threadIdx.x / warpSize;
    int warpThread     = threadIdx.x % warpSize;

    uint4 *V = (uint4*)(c_V[blockIdx.x * WARPS_PER_BLOCK + warpIdx] + SCRATCH*warpThread);
    g_idata += 8 * (blockIdx.x * WU_PER_BLOCK + warpIdx * WU_PER_WARP + warpThread);

    uint4 B[4], C[4]; // registers to store an entire work unit

#define idxloop __pragma(unroll 4) for (int idx=0; idx < 4; idx++)

    idxloop { *V++ = B[idx] = *g_idata++; }
    idxloop { *V++ = C[idx] = *g_idata++; }

    for (int i = 1; i < 1024; i++) {

        xor_salsa8_uint4(B, C); xor_salsa8_uint4(C, B);

        idxloop { *V++ = B[idx]; }
        idxloop { *V++ = C[idx]; }
    }
}

And I get 108 kHash/sec on the 560 Ti 448 core.

I'm like "WTF dude?!"  I expected 250 from this.
Is this like the old sorting memory allocation problem?  Next Fit vs First Fit or whatever?  Sorting is expensive, so forcing organization is actually costing performance?
hero member
Activity: 756
Merit: 502
So I eliminate shared memory altogether from salsa kernel A, like this (pure elegance). Note the switch to uint4.

Code:
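template __global__ void
scrypt_core_kernelA(uint4 *g_idata)
{
    // which warp of the block this thread is in, and its lane within that warp
    int warpIdx        = threadIdx.x / warpSize;
    int warpThread     = threadIdx.x % warpSize;

    // per-thread scratchpad: the warp's base pointer plus an offset of SCRATCH per lane
    uint4 *V = (uint4*)(c_V[blockIdx.x * WARPS_PER_BLOCK + warpIdx] + SCRATCH*warpThread);
    // per-thread input: 8 uint4 per work unit
    g_idata += 8 * (blockIdx.x * WU_PER_BLOCK + warpIdx * WU_PER_WARP + warpThread);

    uint4 B[4], C[4]; // registers to store an entire work unit

#define idxloop __pragma(unroll 4) for (int idx=0; idx < 4; idx++)

    // load the work unit into registers and write it out as the first scratchpad entry
    idxloop { *V++ = B[idx] = *g_idata++; }
    idxloop { *V++ = C[idx] = *g_idata++; }

    // compute and store the remaining 1023 scratchpad entries
    for (int i = 1; i < 1024; i++) {

        xor_salsa8_uint4(B, C); xor_salsa8_uint4(C, B);

        idxloop { *V++ = B[idx]; }
        idxloop { *V++ = C[idx]; }
    }
}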

And I get 108 kHash/sec on the 560 Ti 448 core.

I'm like "WTF dude?!"  I expected 250 from this. PTX is more compact than ever, global memory loads and stores are all .v4.u32 (vectorized). So what's going wrong here?
newbie
Activity: 13
Merit: 0
My 680 was mining at 205 kH/s yesterday afternoon on the older WHQL drivers. I'll be interested to see if the betas help performance at all.

@cbuchner1: if you need anything, don't hesitate to PM me (w/ email since I don't have PM privs yet).
sr. member
Activity: 247
Merit: 250

So, lol. I touch scrypt_core_kernelA() a bit, rearrange shared memory and yay! The hash rate drops from 220kHash/s to 32kHash/s on my fastest card, a GTX 560Ti 448core edition.

Well, lol. Results are still correct though ;)

Christian


Sounds like you're on course to program for Apple! :D
hero member
Activity: 756
Merit: 502

So, lol. I touch scrypt_core_kernelA() a bit, rearrange shared memory and yay! The hash rate drops from 220kHash/s to 32kHash/s on my fastest card, a GTX 560Ti 448core edition.

Well, lol. Results are still correct though ;)

Christian
sr. member
Activity: 247
Merit: 250
new nvidia beta driver increases my khashes/s :o 140x2



cudaMiner 22-04 third

the cudaminer directory says 2013-04-17 though.
nVidia released a new driver today.  He was talking about that, not the new version of cudaMiner

I'd like to echo the other thoughts here... this tool has made mining on nVidia cards enjoyable again.  Not epicuberdoomminer, but effective when it wasn't previously.  Thanks.

But the user said he was using cudaMiner 22-04 with the new drivers, while the screenshots show it running 2013-04-17. Unless he extracted to the same directory every time.

Check the File Path in the menu bar
full member
Activity: 168
Merit: 100
new nvidia beta driver increases my khashes/s :o 140x2



cudaMiner 22-04 third

the cudaminer directory says 2013-04-17 though.
nVidia released a new driver today.  He was talking about that, not the new version of cudaMiner

I'd like to echo the other thoughts here... this tool has made mining on nVidia cards enjoyable again.  Not epicuberdoomminer, but effective when it wasn't previously.  Thanks.
newbie
Activity: 9
Merit: 0
Thank you *very* much for this. I only have an Nvidia card, and before your project mining anything was just stupid. Cgminer would give me ~28 MH/s for BTC, but for LTC or FC it was only ~2.0 kH/s :< Cudaminer gives me ~32 kH/s!!!!! This is on a 9800GT.

tl;dr You are freakin' awesome and have my gratitude! I have never been able to mine a full unit of any cryptocurrency, but IMHO my best shot is Feathercoin. As soon as I'm able to mine some, I will be more than happy to donate a portion to you. Thanks again!!
hero member
Activity: 756
Merit: 502
new nvidia beta driver increases my khashes/s :o 140x2



cudaMiner 22-04 third

the cudaminer directory says 2013-04-17 though.

legendary
Activity: 1078
Merit: 1001
new nvidia beta driver increases my khashes/s :o 140x2



cudaMiner 22-04 third
hero member
Activity: 756
Merit: 502
is there any way to determine where I can (safely) set my texture-cache variable?

try -l 48x5 -C 2 then

The worst that could happen is a temporary driver crash (which it should recover from in 99% of all cases)

I had an idea for how to cut shared memory use further, so that Kepler-based cards would see full occupancy on their SMXes, maybe gaining 10% performance on these devices.

Christian
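
For readers wondering why less shared memory can mean higher occupancy, here is a rough host-side C++ sketch of the arithmetic. It is a simplified model, not the logic of NVIDIA's occupancy calculator: it ignores register pressure and allocation granularity. The struct, the function names and the example numbers (48 KB shared memory, 2048 resident threads and 16 resident blocks per SMX, 128 threads per block) are assumptions for illustration, not values taken from cudaMiner.

Code:
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Per-multiprocessor limits; the caller supplies them (assumed values in the example).
struct SmLimits {
    std::size_t sharedBytes;  // shared memory available per multiprocessor
    std::size_t maxThreads;   // maximum resident threads per multiprocessor
    std::size_t maxBlocks;    // maximum resident blocks per multiprocessor
};

// Number of blocks that can be resident at once, limited by whichever
// resource runs out first (registers are deliberately ignored here).
std::size_t resident_blocks(const SmLimits& sm, std::size_t threadsPerBlock,
                            std::size_t sharedBytesPerBlock) {
    assert(threadsPerBlock > 0);
    std::size_t byThreads = sm.maxThreads / threadsPerBlock;
    // A block that uses no shared memory is not limited by it.
    std::size_t byShared = sharedBytesPerBlock == 0
                               ? sm.maxBlocks
                               : sm.sharedBytes / sharedBytesPerBlock;
    return std::min({sm.maxBlocks, byThreads, byShared});
}

// Occupancy as the fraction of the thread limit that is actually resident.
double occupancy(const SmLimits& sm, std::size_t threadsPerBlock,
                 std::size_t sharedBytesPerBlock) {
    std::size_t blocks = resident_blocks(sm, threadsPerBlock, sharedBytesPerBlock);
    return static_cast<double>(blocks * threadsPerBlock) /
           static_cast<double>(sm.maxThreads);
}
```

With the assumed limits {49152, 2048, 16} and 128 threads per block, a block that needs 8192 bytes of shared memory allows only 6 resident blocks (occupancy 0.375), while a block that needs 3072 bytes allows all 16 (occupancy 1.0).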

newbie
Activity: 28
Merit: 0
For anyone who has an NVIDIA Optimus card and is using Linux: executing ./cudaminer will return
Code:
Floating point exception (core dumped)

Here's my solution:

1. Install Bumblebee.
2. Run cudaminer with optirun (yes, you can use primusrun, but it had no effect)
Code:
optirun cudaminer blablabla
newbie
Activity: 20
Merit: 0

Oh, and to the other Falcon (above)... it's in the readme.

I swear, if it had teeth... I looked at that paragraph over and over and just simply failed to see it... sheesh

Thanks for the answer though, it makes sense. I just could not "see" it crashing like with guiminer: when that crashed, my screen would flicker for a driver reset and then it would be reporting Ghashes.
