Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] - page 949. (Read 3426989 times)

legendary
Activity: 1400
Merit: 1050
either way, this is surprising. There should not be much of a difference between N=1024 and N=2048 scrypt coins, really. At high N the low register count kernels have a significant advantage - they reach higher occupancy under tight memory constraints. And they can do a lookup gap without running into much register pressure. But N=2048 isn't high...

Christian


GTX 660:
N:1024, Y5x32, ~240 kH/s
N:2048, Y5x32, ~128 kH/s

Edit: Y5x32 seems to be the fastest kernel/config, even though autotune tends to find Y5x28 the fastest most of the time.
Same here with my GTX 660 OEM 1.5 GB as far as the kernel goes, although at N=2048 the fastest is Y6x20 instead.
But it is well known that the GTX 660 OEM is not really a GTX 660  Grin
hero member
Activity: 756
Merit: 502

1024 would get me ~280 with best results from Y14x20
2048 it's ~133 with the best results from K7x32

isn't Y an alias for K? Wink


Actually, it's funny you mention that.  If I try to run Y7x32, it fails horribly and crashes the driver.

according to the code, it shouldn't crash.... K and Y really do the same thing.

Code:
            switch (kernelid)
            {
                case 'T': case 'Z': *kernel = new NV2Kernel(); break;
                case 't':           *kernel = new TitanKernel(); break;
                case 'K': case 'Y': *kernel = new NVKernel(); break;
                case 'k':           *kernel = new KeplerKernel(); break;
                case 'F': case 'L': *kernel = new FermiKernel(); break;
                case 'f': case 'X': *kernel = new TestKernel(); break;
                case ' ': // choose based on device architecture
                    *kernel = Best_Kernel_Heuristics(props);
                    break;
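
For anyone who wants to double-check the aliasing without building cudaminer, here is a minimal standalone sketch. It only mirrors the letter-to-kernel mapping from the switch above; the `KernelFamily` enum and the `family_for` / `same_kernel` helpers are hypothetical names made up for illustration, not part of the cudaminer source.

```cpp
// Hypothetical stand-in for the kernel classes named in the snippet above.
// Only the letter-to-family mapping is modelled, not the kernels themselves.
enum class KernelFamily { NV2, Titan, NV, Kepler, Fermi, Test, Auto, Unknown };

// Map a launch-config letter to a kernel family, mirroring the quoted switch.
KernelFamily family_for(char kernelid) {
    switch (kernelid) {
        case 'T': case 'Z': return KernelFamily::NV2;
        case 't':           return KernelFamily::Titan;
        case 'K': case 'Y': return KernelFamily::NV;
        case 'k':           return KernelFamily::Kepler;
        case 'F': case 'L': return KernelFamily::Fermi;
        case 'f': case 'X': return KernelFamily::Test;
        case ' ':           return KernelFamily::Auto;  // chosen from device properties
        default:            return KernelFamily::Unknown;
    }
}

// True when two letters select the same kernel family.
bool same_kernel(char a, char b) {
    return family_for(a) == family_for(b);
}
```

With this mapping, `same_kernel('K', 'Y')` is true while `same_kernel('K', 'k')` is not, so a crash with Y7x32 but not K7x32 would have to come from somewhere other than kernel selection.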
sr. member
Activity: 280
Merit: 250

1024 would get me ~280 with best results from Y14x20
2048 it's ~133 with the best results from K7x32

isn't Y an alias for K? Wink


Actually, it's funny you mention that.  If I try to run Y7x32, it fails horribly and crashes the driver.
hero member
Activity: 756
Merit: 502

1024 would get me ~280 with best results from Y14x20
2048 it's ~133 with the best results from K7x32

isn't Y an alias for K? Wink
sr. member
Activity: 280
Merit: 250
either way, this is surprising. There should not be much of a difference between N=1024 and N=2048 scrypt coins, really. At high N the low register count kernels have a significant advantage - they reach higher occupancy under tight memory constraints. And they can do a lookup gap without running into much register pressure. But N=2048 isn't high...

Christian


GTX 660:
N:1024, Y5x32, ~240 kH/s
N:2048, Y5x32, ~128 kH/s

Edit: Y5x32 seems to be the fastest kernel/config, even though autotune tends to find Y5x28 the fastest most of the time.


Hi there, long time lurker.  Reg'd to post up for this.


I'm seeing the same trend on GTX 670s.
 
1024 would get me ~280 with best results from Y14x20
2048 it's ~133 with the best results from K7x32
legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
either way, this is surprising. There should not be much of a difference between N=1024 and N=2048 scrypt coins, really. At high N the low register count kernels have a significant advantage - they reach higher occupancy under tight memory constraints. And they can do a lookup gap without running into much register pressure. But N=2048 isn't high...

Christian


GTX 660:
N:1024, Y5x32, ~240 kH/s
N:2048, Y5x32, ~128 kH/s

Edit: Y5x32 seems to be the fastest kernel/config, even though autotune tends to find Y5x28 the fastest most of the time.
legendary
Activity: 1400
Merit: 1050
Yes, the Z kernel is the slowest for Vertcoin (this has been the case ever since it was introduced).
hero member
Activity: 756
Merit: 502
Something strange (or not, that's the question...).
For most coins, the kernel formerly known as Z is the fastest, especially with scrypt coins.
However, for Vertcoin (scrypt:2048) it is much slower (a difference of more than 50 kH/s) than the kernel formerly known as T.
Is there any reason for this?

so you're saying the current "T" (alias name Z) kernel is slower than the current "t" kernel (formerly known as T) for VertCoin?

or do you compare current cudaminer performance with some older prerelease version?

either way, this is surprising. There should not be much of a difference between N=1024 and N=2048 scrypt coins, really. At high N the low register count kernels have a significant advantage - they reach higher occupancy under tight memory constraints. And they can do a lookup gap without running into much register pressure. But N=2048 isn't high...
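
To put rough numbers on the memory side of this: a scrypt hash keeps a scratchpad of 128 * r * N bytes, and a lookup gap stores only every gap-th entry and recomputes the rest. The sketch below is just that arithmetic, assuming r = 1 as used by the common scrypt coins; `scratchpad_bytes` is a hypothetical helper, not a cudaminer function.

```cpp
#include <cassert>
#include <cstddef>

// Scratchpad bytes needed per scrypt hash.
// n:          the scrypt N parameter (number of scratchpad entries)
// r:          the scrypt block-size parameter (assumed 1 for these coins)
// lookup_gap: store only every lookup_gap-th entry, recompute the others
std::size_t scratchpad_bytes(std::size_t n, std::size_t r, std::size_t lookup_gap) {
    assert(lookup_gap >= 1);
    // Number of entries actually kept in memory, rounded up.
    std::size_t stored_entries = (n + lookup_gap - 1) / lookup_gap;
    return 128 * r * stored_entries;
}
```

For example, `scratchpad_bytes(1024, 1, 1)` is 131072 (128 KiB), `scratchpad_bytes(2048, 1, 1)` is 262144 (256 KiB), and a lookup gap of 2 at N=2048 brings it back to 131072 at the cost of extra recomputation.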

Christian
legendary
Activity: 1400
Merit: 1050
Something strange (or not, that's the question...).
For most coins, the kernel formerly known as Z is the fastest, especially with scrypt coins.
However, for Vertcoin (scrypt:2048) it is much slower (a difference of more than 50 kH/s) than the kernel formerly known as T.
Is there any reason for this?
newbie
Activity: 43
Merit: 0
Mate, any idea why, although I have 2 GTX 780 (and two cuda miners), one shows 520 kH/s and the other 605? Can it be that the one the monitor is plugged into loses hash power because of it? Any idea what is going on?

I have 3 GTX 780Ti in one PC and two of them hash 10-20 kHash/s less than the fastest one. I attribute this to subtle differences in the PCI express connectivity.

But 100 kHash/s difference - ouch? played with the -H options yet?

Since they are the same, I use the same configuration. My problem is that if I start either one alone, it does reach 605. If I start them together, they both reach 605, but after 2-3 minutes the GPU clock drops, and the voltage and the hash rate with it (down to 520). It's the upper card, the one the monitor is plugged into, meaning its PCIe slot is the most powerful. This card also runs hotter (85C) than the one that works at max hash (75C), but that is probably due to the limited space it has to breathe.

EDIT: I also saw the next 2 answers. Thanks, I'll try to play with -H (although I doubt I'll see any difference).

Have you tried undervolting your cards? With the new kernel I can undervolt massively and still have a high overclock (GTX 780). Currently running +310 on core, which gives me 1254 MHz, no memory OC, and -50 mV voltage, which makes it 1.100 V under load.
The lowered heat might make your cards stay at higher clocks more often. You should also use Afterburner to set the priority to the power target and not the temp target.
sr. member
Activity: 247
Merit: 250
Well I just got ripped off of a YACoin block, it said the Yay!!! thing but the damn client never actually showed the block.
The client even said it found a block but it never appeared in my wallet, so sad...
Yacoin takes ~520 confirms. That usually takes a few hours after a found block.

The same happened to me with an UltraCoin block. My wallet lists 3 transactions in total, but only 2 incoming transactions from mining are actually displayed.

If you find a way to recover that missing transaction, please let me know.


Command-line option:   -rescan   (Rescan the block chain for missing wallet transactions)


Have you tried this?
member
Activity: 112
Merit: 10
When recompiling this, is there anything wrong with doing a git pull, running autogen, configure and then make?  Or is it better to just delete and start from scratch?
Doesn't git make sure you're not mixing any files?

BTW the new commits are definitely improving the performance on Fermi, but it is still below what it was with 2014-01-20.
member
Activity: 69
Merit: 10
Mate, any idea why, although I have 2 GTX 780 (and two cuda miners), one shows 520 kH/s and the other 605? Can it be that the one the monitor is plugged into loses hash power because of it? Any idea what is going on?

I have 3 GTX 780Ti in one PC and two of them hash 10-20 kHash/s less than the fastest one. I attribute this to subtle differences in the PCI express connectivity.

But 100 kHash/s difference - ouch? played with the -H options yet?

Since they are the same, I use the same configuration. My problem is that if I start either one alone, it does reach 605. If I start them together, they both reach 605, but after 2-3 minutes the GPU clock drops, and the voltage and the hash rate with it (down to 520). It's the upper card, the one the monitor is plugged into, meaning its PCIe slot is the most powerful. This card also runs hotter (85C) than the one that works at max hash (75C), but that is probably due to the limited space it has to breathe.

EDIT: I also saw the next 2 answers. Thanks, I'll try to play with -H (although I doubt I'll see any difference).
newbie
Activity: 6
Merit: 0
Hi, i'm using 2xGTX560 here

with 2013-12-10 version I get about 290kh/s
with 2013-12-18 version I get about 310kh/s but it freezes often
with 2014-02-02 version I get about 270kh/s  Huh

I'm using: cudaminer.exe -d 0,1 -i 0,0 -l F7x16,F7x16 -H 1,1 -C 1,1
legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
Mate, any idea why, although I have 2 GTX 780 (and two cuda miners), one shows 520 kH/s and the other 605? Can it be that the one the monitor is plugged into loses hash power because of it? Any idea what is going on?

Primary cards are always going to perform worse, as they are stressed by the OS, your browser, background apps and so on. Also, the -H flag could cause it, so try -H 2 to exclude the CPU. If you're not using risers, chances are one of your cards is hotter than the other, or at least requires a higher fan speed to keep it at lower temps, so the fans are using more power on one card, which may well mean lower core frequencies when it comes to Kepler. And those are not the only possible explanations, but my break is over...
newbie
Activity: 43
Merit: 0
Mate, any idea why, although I have 2 GTX 780 (and two cuda miners), one shows 520 kH/s and the other 605? Can it be that the one the monitor is plugged into loses hash power because of it? Any idea what is going on?

I have 3 GTX 780Ti in one PC and two of them hash 10-20 kHash/s less than the fastest one. I attribute this to subtle differences in the PCI express connectivity.

But 100 kHash/s difference - ouch? played with the -H options yet?

Have you guys monitored your cards in Afterburner? The topmost cards might be throttling more than the bottom one, or they might just boost to different clock speeds. A custom BIOS with boost disabled is awesome in general.
hero member
Activity: 756
Merit: 502
Mate, any idea why, although I have 2 GTX 780 (and two cuda miners), one shows 520 kH/s and the other 605? Can it be that the one the monitor is plugged into loses hash power because of it? Any idea what is going on?

I have 3 GTX 780Ti in one PC and two of them hash 10-20 kHash/s less than the fastest one. I attribute this to subtle differences in the PCI express connectivity.

But 100 kHash/s difference - ouch? played with the -H options yet?
newbie
Activity: 4
Merit: 0
I think your bug report is the one that made my mind go http://www.digitalsherpa.com/wp-content/uploads/2012/11/lightbulb1.gif

The CUDA constant memory (the c_N loop trip count, etc...) of most CUDA kernels is only initialized properly for the first GPU (use of a single static variable to mark initialization instead of a thread-specific static variable). Which explains the majority of the crashes people are seeing with multi-GPU. Thank you. The Fermi owners use a kernel that doesn't yet make use of such constants, and hence the multi-GPU support is working fine for them.

So this is also on the FIXME list for tonight.
Awesome, looking forward to the fix. Thanks for the support Smiley

However I think that in your case where you run two cudaminer instances this cannot be the root cause. So we will have to keep looking.
Oh no, I don't run two instances. I meant that one of the GPUs within the same cudaMiner instance produced invalid results, which is in line with your explanation above. Running two instances of cudaMiner (one for each GPU) actually works perfectly, so this also confirms your hypothesis.
legendary
Activity: 1400
Merit: 1050
A brave tester with 8 Tesla M2090 Fermi cards (thanks Choseh) just figured out the performance regression between 2013-12-18 and 2014-02-02.

If you change the #if 0 in the fermi_kernel.cu to #if 1 (thereby enabling the previous version of the Salsa20/8 round function) you should see the previous performance figures again. Those who can compile the code themselves and want to mine on Fermi are welcome to make this change themselves.

also there seems to be a bug in the autotuning code in salsa_kernel.cu

                            hash_sec = (double)WU_PER_LAUNCH / tdelta;

should very likely be

                            hash_sec = (double)WU_PER_LAUNCH * repeat / tdelta;

to factor in the number of repetitions in the measurement (we want to measure for 50ms minimum for better timer accuracy). So autotune was drunk after all!
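
As a quick illustration of why the missing factor matters, here is a minimal sketch of the two formulas side by side. The names `hash_rate` and `hash_rate_uncorrected` are hypothetical, and the parameters simply stand in for WU_PER_LAUNCH, repeat and tdelta from the lines above; this is not the actual code from salsa_kernel.cu.

```cpp
#include <cassert>

// Hashes per second for a timed batch of kernel launches.
// wu_per_launch: work units hashed by a single launch
// repeat:        number of launches inside the timed window
// tdelta:        elapsed time of the whole window, in seconds
double hash_rate(unsigned wu_per_launch, unsigned repeat, double tdelta) {
    assert(tdelta > 0.0);
    return static_cast<double>(wu_per_launch) * repeat / tdelta;
}

// The formula without the repeat factor, kept only for comparison.
double hash_rate_uncorrected(unsigned wu_per_launch, double tdelta) {
    assert(tdelta > 0.0);
    return static_cast<double>(wu_per_launch) / tdelta;
}
```

With 1000 work units per launch, 4 repetitions and a 0.5 s window, the first function gives 8000 hashes/s and the second only 2000, so configurations that needed more repetitions to fill the measurement window would be underrated by exactly that factor.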

So, it seems I should release fixes (new binary release) for these problems tonight.

Christian

Yes, it works better this way. However, there is still the problem with the power increase between configs, but it is less apparent.
(Strangely, I don't have that problem with the GTX 660; its power stays at 100% and doesn't fluctuate.)


member
Activity: 69
Merit: 10
I'm having an issue mining scrypt with two GPUs: every time I start cudaMiner, one of them (not always the same) seems to return results which mostly don't validate. This is tested on Linux, using cudaMiner master, tested both on Z14x14 and T14x32. When I mine using a single card (selected with -d) all results validate... Any idea?

this is a long shot: Would it help to downgrade the video driver to the exact version that shipped with the CUDA 5.5 toolkit download from nVidia? Of course if your video cards are newer than this driver release, this is a no-go, as the driver would not recognize them ;-/


I'm using the NVIDIA drivers from the Debian repositories, so downgrading isn't that easy... That said, I'm currently using the 319 driver series, which seems like the earliest one supported by CUDA 5.5. I've tried upgrading to the 331 driver series, but that doesn't change a thing.

I think your bug report is the one that made my mind go http://www.digitalsherpa.com/wp-content/uploads/2012/11/lightbulb1.gif

The CUDA constant memory (the c_N loop trip count, etc...) of most CUDA kernels is only initialized properly for the first GPU (use of a single static variable to mark initialization instead of a thread-specific static variable). Which explains the majority of the crashes people are seeing with multi-GPU. Thank you. The Fermi owners use a kernel that doesn't yet make use of such constants, and hence the multi-GPU support is working fine for them.
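
The failure mode described here can be sketched in plain C++ without any CUDA calls. This is only a model under stated assumptions: the `Miner` struct and its members are hypothetical, a boolean array stands in for each GPU's constant memory, and a per-device flag stands in for the thread-specific variable mentioned above (which comes to the same thing if each GPU is driven by its own host thread).

```cpp
#include <array>
#include <cassert>

constexpr int kDevices = 2;  // assumed: two GPUs in one miner process

struct Miner {
    std::array<bool, kDevices> constants_uploaded{};  // models each GPU's constant memory
    bool shared_flag = false;                         // buggy: one "initialized" flag for all GPUs
    std::array<bool, kDevices> per_device_flag{};     // fixed: one flag per GPU

    // Buggy pattern: the first device sets the flag, later devices skip the upload.
    void init_buggy(int dev) {
        assert(dev >= 0 && dev < kDevices);
        if (!shared_flag) {
            constants_uploaded[dev] = true;
            shared_flag = true;
        }
    }

    // Fixed pattern: every device tracks its own initialization.
    void init_fixed(int dev) {
        assert(dev >= 0 && dev < kDevices);
        if (!per_device_flag[dev]) {
            constants_uploaded[dev] = true;
            per_device_flag[dev] = true;
        }
    }

    // Number of devices whose constants were actually uploaded.
    int uploaded_count() const {
        int count = 0;
        for (bool done : constants_uploaded) count += done ? 1 : 0;
        return count;
    }
};
```

Calling `init_buggy(0)` and then `init_buggy(1)` on a fresh `Miner` leaves only one device with its constants set, while the same two calls to `init_fixed` cover both, which matches the symptom of the second GPU producing garbage.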

So this is also on the FIXME list for tonight.

However I think that in your case where you run two cudaminer instances this cannot be the root cause. So we will have to keep looking.

Christian


Mate, any idea why, although I have 2 GTX 780 (and two cuda miners), one shows 520 kH/s and the other 605? Can it be that the one the monitor is plugged into loses hash power because of it? Any idea what is going on?