CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 966.

zTheWolfz

full member

Activity: 231

Merit: 150

Quote from: djm34 on September 02, 2015, 02:58:41 PM

Quote from: sp_ on September 02, 2015, 02:21:33 PM

Quote from: djm34 on September 02, 2015, 01:49:21 PM

Quote from: sp_ on September 02, 2015, 01:24:37 PM

I also submitted a 20% reduction in memory use for the lyra2v2 algo, so you can run higher intensities than before.
The default for the 750ti is now -X 16; in release 63 it was -X 8.
The default for the 970 is now -X 26; in release 63 it was -X 16.

Not entirely a good idea: you are clearly sacrificing the compute 3.5 cards. (I already noticed that in some previous change. I understand that you don't have these cards, but I don't understand why you are removing the optimization done here.) Also, it doesn't really matter how much memory you allocate if it isn't used; that's why I used "#define memshift 3" (or 4) to do that.
I don't really understand either why you are pushing the intensity so high. At some point it doesn't help anymore and creates more of a bottleneck on memory access...

Yes, this is the Maxwell mod. Less memory is good on Windows (8/10) because the operating system has problems allocating big blocks of GPU memory. With -X 16 the 750ti is using 1.8 GB of memory in lyra2v2 (+30 khash).

No, actually it doesn't have a problem allocating a big block, as long as it doesn't use it in full (I checked that with ethereum). I suspect the compiler knows what really needs to be allocated at compile time. Actually, the memshift of 4 was to allow smoother accesses to global mem, at least with the 780ti... (the problem was reversed in classic lyra)

Link to your latest release please, thanks.

Grim

sr. member

Activity: 506

Merit: 252

release 63 (different codebase) --> solomining works
any other release including 64 --> solomining won't work

djm34

legendary

Activity: 1400

Merit: 1050

Quote from: sp_ on September 02, 2015, 02:21:33 PM

Quote from: djm34 on September 02, 2015, 01:49:21 PM

Quote from: sp_ on September 02, 2015, 01:24:37 PM

I also submitted a 20% reduction in memory use for the lyra2v2 algo, so you can run higher intensities than before.
The default for the 750ti is now -X 16; in release 63 it was -X 8.
The default for the 970 is now -X 26; in release 63 it was -X 16.

Not entirely a good idea: you are clearly sacrificing the compute 3.5 cards. (I already noticed that in some previous change. I understand that you don't have these cards, but I don't understand why you are removing the optimization done here.) Also, it doesn't really matter how much memory you allocate if it isn't used; that's why I used "#define memshift 3" (or 4) to do that.
I don't really understand either why you are pushing the intensity so high. At some point it doesn't help anymore and creates more of a bottleneck on memory access...

Yes, this is the Maxwell mod. Less memory is good on Windows (8/10) because the operating system has problems allocating big blocks of GPU memory. With -X 16 the 750ti is using 1.8 GB of memory in lyra2v2 (+30 khash).

No, actually it doesn't have a problem allocating a big block, as long as it doesn't use it in full (I checked that with ethereum). I suspect the compiler knows what really needs to be allocated at compile time. Actually, the memshift of 4 was to allow smoother accesses to global mem, at least with the 780ti... (the problem was reversed in classic lyra)

Slava_K

hero member

Activity: 677

Merit: 500

Now testing lyra2v2 on sm50 (750Ti).
The miner shows a good hashrate. On the pool side (nicehash) the connection is reset every 2.5 minutes (when the diff goes from 2 up to 4, and for 40 seconds the miner doesn't get any yay!!). The problem is the same.
The pool shows a speed of 3-6 Mh; my 2x750Ti make 8.4 Mh...

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

Quote from: djm34 on September 02, 2015, 01:49:21 PM

Quote from: sp_ on September 02, 2015, 01:24:37 PM

I also submitted a 20% reduction in memory use for the lyra2v2 algo, so you can run higher intensities than before.
The default for the 750ti is now -X 16; in release 63 it was -X 8.
The default for the 970 is now -X 26; in release 63 it was -X 16.

Not entirely a good idea: you are clearly sacrificing the compute 3.5 cards. (I already noticed that in some previous change. I understand that you don't have these cards, but I don't understand why you are removing the optimization done here.) Also, it doesn't really matter how much memory you allocate if it isn't used; that's why I used "#define memshift 3" (or 4) to do that.
I don't really understand either why you are pushing the intensity so high. At some point it doesn't help anymore and creates more of a bottleneck on memory access...

Yes, this is the Maxwell mod. Less memory is good on Windows (8/10) because the operating system has problems allocating big blocks of GPU memory. With -X 16 the 750ti is using 1.8 GB of memory in lyra2v2 (+30 khash).

t-nelson

member

Activity: 70

Merit: 10

Quote from: djm34 on September 02, 2015, 01:41:55 PM

Quote from: t-nelson on September 02, 2015, 01:35:57 PM

sp_,

Is there some way to do "minimal rebuilds" of the CUDA kernels on Windows?

It's pretty lame sitting around for 10min while all that unchanged stuff rebuilds after I change one line.

I don't think it is possible, or you would have to write a separate file for each kernel. But no matter what, you will still need to recompile it (for what it's worth, it is a lot better than opencl).

Actually this seems to be working. I think the commit I reverted touched WAY more than I thought it did is all.

djm34

legendary

Activity: 1400

Merit: 1050

Quote from: tsiv on September 02, 2015, 08:44:18 AM

Quote from: djm34 on September 02, 2015, 08:09:31 AM

and a little more Grin

(the 750ti is oc at +200 mem/core, actually core matters more than mem)

Nice, didn't take you too long

I'm thinking we agree on how it should be done, tried that one thing I hadn't had time to try yet Grin

most likely (however some variations around that are probably possible) Grin

djm34

legendary

Activity: 1400

Merit: 1050

Quote from: sp_ on September 02, 2015, 01:24:37 PM

I also submitted a 20% reduction in memory use for the lyra2v2 algo, so you can run higher intensities than before.

The default for the 750ti is now -X 16; in release 63 it was -X 8.
The default for the 970 is now -X 26; in release 63 it was -X 16.

Not entirely a good idea: you are clearly sacrificing the compute 3.5 cards. (I already noticed that in some previous change. I understand that you don't have these cards, but I don't understand why you are removing the optimization done here.) Also, it doesn't really matter how much memory you allocate if it isn't used; that's why I used "#define memshift 3" (or 4) to do that.
I don't really understand either why you are pushing the intensity so high. At some point it doesn't help anymore and creates more of a bottleneck on memory access...

djm34

legendary

Activity: 1400

Merit: 1050

Quote from: t-nelson on September 02, 2015, 01:35:57 PM

sp_,

Is there some way to do "minimal rebuilds" of the CUDA kernels on Windows?

It's pretty lame sitting around for 10min while all that unchanged stuff rebuilds after I change one line.

I don't think it is possible, or you would have to write a separate file for each kernel. But no matter what, you will still need to recompile it (for what it's worth, it is a lot better than opencl).

Slava_K

hero member

Activity: 677

Merit: 500

Commit 997: lyra2v2 is slower on sm52 (from 27.9 Mh down to 26.7-27.6 Mh) and not stable in the miner. sm50 not tested.
The Nicehash pool side shows no more than 20 Mh. When the pool sets the diff to 8, the miner doesn't find any hashes. Then the diff resets to 4 and hashes are found again. GPU usage jumps up and down.
After the diff resets, the accepted speed on the pool starts from zero. The stratum connection is reset.
PS. 2x980GTX and 960GTX in one rig.
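To make the gap between miner-side and pool-side numbers easier to reason about, here is a minimal sketch of how a pool can estimate hashrate from accepted shares alone. It is an illustration, not nicehash's actual formula: the function names are made up for this example, and `hashes_per_diff1` (the expected number of hashes behind one difficulty-1 share) is an assumption that depends on how the pool scales difficulty for the algo.

```python
# Sketch only: estimating hashrate from accepted shares.
# hashes_per_diff1 is an ASSUMPTION. 2**32 is the Bitcoin-style convention;
# a pool may use a different scaling for lyra2v2.
HASHES_PER_DIFF1 = 2 ** 32


def pool_side_hashrate(accepted_shares, share_diff, window_seconds,
                       hashes_per_diff1=HASHES_PER_DIFF1):
    """Hashrate implied by the shares accepted within a time window."""
    return accepted_shares * share_diff * hashes_per_diff1 / window_seconds


def expected_seconds_per_share(hashrate, share_diff,
                               hashes_per_diff1=HASHES_PER_DIFF1):
    """Average wait between shares for a miner running at `hashrate`."""
    return share_diff * hashes_per_diff1 / hashrate


# No accepted shares in the window means the pool sees zero speed,
# no matter what the miner reports locally.
stalled = pool_side_hashrate(accepted_shares=0, share_diff=8, window_seconds=40)

# Doubling the share difficulty doubles the average wait per share.
wait_diff4 = expected_seconds_per_share(1_000_000, 4)
wait_diff8 = expected_seconds_per_share(1_000_000, 8)
```

Under these assumptions, any stretch with no accepted shares (a reconnect, or a difficulty the miner never satisfies) pulls the pool-side average down even while the local hashrate display stays high.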

t-nelson

member

Activity: 70

Merit: 10

sp_,

Is there some way to do "minimal rebuilds" of the CUDA kernels on Windows?

It's pretty lame sitting around for 10min while all that unchanged stuff rebuilds after I change one line.

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

I also submitted a 20% reduction in memory use for the lyra2v2 algo, so you can run higher intensities than before.

The default for the 750ti is now -X 16; in release 63 it was -X 8.
The default for the 970 is now -X 26; in release 63 it was -X 16.
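As a rough sanity check on what a 20% memory reduction buys, the arithmetic can be sketched as follows. The byte counts and the memory budget are hypothetical values chosen for illustration, not measurements from ccminer, and the sketch says nothing about how -X maps to thread counts.

```python
def max_threads(memory_budget_bytes: int, bytes_per_thread: int) -> int:
    """Largest number of threads whose scratch buffers fit in the budget."""
    return memory_budget_bytes // bytes_per_thread


# Hypothetical numbers for illustration only.
budget = 1536 * 1024 * 1024                     # 1.5 GiB reserved for scratch memory
old_per_thread = 10_000                         # made-up bytes per thread before the change
new_per_thread = old_per_thread * 80 // 100     # 20% less memory per thread

old_threads = max_threads(budget, old_per_thread)
new_threads = max_threads(budget, new_per_thread)

# The ratio approaches 1 / 0.8 = 1.25 regardless of the budget chosen.
ratio = new_threads / old_threads
```

In other words, with a fixed memory budget a 20% cut per thread leaves room for about 25% more threads.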

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

5MHASH on the gtx 750ti
10MHASH on the gtx 970

sp-mod 64-git

ccminer -q -i 19.2 -g 2 -a lyra2v2 -C (--benchmark)

But there is a bug: the miner will crash after a while when mining on a pool. Also, use quiet mode (-q), because you get too much information on the ccminer screen without it.
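For anyone scripting benchmark runs, the command line above can be assembled programmatically. This is a minimal sketch: the helper name and its parameters are made up for this example, and only the flags that appear in the post are used. The value passed to -g is copied from the post, which does not explain its meaning.

```python
import shlex


def build_ccminer_args(algo, intensity, g=None, quiet=True,
                       cpu_help=False, benchmark=False):
    """Assemble a ccminer argument list using only flags shown in the post."""
    args = ["ccminer"]
    if quiet:
        args.append("-q")              # quiet mode, as recommended in the post
    args += ["-i", str(intensity)]     # intensity, e.g. 19.2
    if g is not None:
        args += ["-g", str(g)]         # value copied from the post; meaning not stated there
    args += ["-a", algo]
    if cpu_help:
        args.append("-C")              # the new switch from release 64
    if benchmark:
        args.append("--benchmark")
    return args


cmd = build_ccminer_args("lyra2v2", 19.2, g=2, cpu_help=True, benchmark=True)
command_line = shlex.join(cmd)         # shell-safe string form of the same command
```

The list form can be passed directly to `subprocess.run(cmd)` on a machine where ccminer is on the PATH; the sketch itself does not launch anything.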

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

-Blakecoin 70% faster
-Added a new switch -C (--cpu-mining) if you want the cpu to help the gpu. Works on all algos.
-Fixed some bugs, like the color output and rapid hash outputs.

1.5.64(sp-MOD) is available here: (02-09-2015)

https://github.com/sp-hash/ccminer/releases/

The sourcecode is available here:

https://github.com/sp-hash/ccminer

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

Quote from: scryptr on September 02, 2015, 03:56:23 AM

And 50Watt more power on the rig.. So 50 watt for 1KHASH gain

DETAILS, DETAILS--
I do not know the exact details, I was considering 10.2Mh/s vs. 10.6Mh/s for Quark. I am not the dev, just the miner. If the CPU can be tasked to improve a few of the algos, I am in favor of the command line switch. Quark hash rates are a "big deal" to any miner just now. Neoscrypt rates matter also, but, like Lyra2v2, you might need to mine the coin instead of a BTC exchange rate. VTC markets are looking up, I haven't traded FTC and the merge-mined companion coins for a while, but I just might.
--scryptr

I have made the switch now. Quark seems to do 30 KHASH better per card, and the cpu load is 80%, +30 watt (depending on your cpu). The new switch is called -C (--cpu-mining). It should work on all algos, but that is not tested.

I will test some more and submit to github.

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

I submitted a speedup for blakecoin: 70% faster.

(-a blakecoin)

tsiv

full member

Activity: 137

Merit: 100

Quote from: djm34 on September 02, 2015, 08:09:31 AM

and a little more Grin

(the 750ti is oc at +200 mem/core, actually core matters more than mem)

Nice, didn't take you too long

I'm thinking we agree on how it should be done, tried that one thing I hadn't had time to try yet Grin

Weirdest thing is power use dropped from 440ish to 400ish Huh

pallas

legendary

Activity: 2716

Merit: 1094

Black Belt Developer

Quote from: djm34 on September 02, 2015, 08:09:31 AM

and a little more Grin

(the 750ti is oc at +200 mem/core, actually core matters more than mem)

so the question is once again: what do you wanna do with it? :-)

djm34

legendary

Activity: 1400

Merit: 1050

and a little more Grin

(the 750ti is oc at +200 mem/core, actually core matters more than mem)

Genoil

sr. member

Activity: 438

Merit: 250

Quote from: restless on September 02, 2015, 06:47:44 AM

Quote from: sp_ on September 02, 2015, 06:34:12 AM

No, the most profitable is Ethereum.

But if the 750ti was hashing lyra2v2 at 7,636 MHASH, it would be more profitable than the public quark kernels. But it isn't, even with the private kernels.

Ethereum?!
Mining with 750 is ok only under Linux, Windows has too many bugs - like getting 5MH instead of 8-9 for unknown reason, or even getting 300-400KH under 8.1
Of 3 750Ti cards I have, 2 can mine with ~ 5MH under Win7, the third - 300KH/s under 8.1

You're absolutely right. It's killing me Angry

Meanwhile I've managed to speed up the kernel by about 2-3% using the lop3.b32 PTX instruction (new in CUDA 7.5) in keccak. It might be useful for other coins as well.

https://github.com/Genoil/cpp-ethereum/blob/cudaminer-frontier/libethash-cuda/keccak.cuh
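For readers unfamiliar with lop3.b32: it evaluates an arbitrary three-input boolean function, selected by an 8-bit lookup table, on every bit of three 32-bit operands in one instruction. The sketch below is a plain software model of that behaviour, not the code from the linked file. The two example tables (a three-way XOR, and the a ^ (~b & c) pattern from Keccak's chi step) are natural candidates in keccak, but whether the linked kernel uses exactly these is an assumption.

```python
MASK32 = 0xFFFFFFFF


def lop3(a, b, c, imm_lut):
    """Software model of lop3.b32: apply an 8-bit truth table bitwise."""
    result = 0
    for bit in range(32):
        # The three input bits form an index into the truth table (a is the high bit).
        index = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        result |= ((imm_lut >> index) & 1) << bit
    return result


def make_lut(f):
    """Derive the table by evaluating f on the patterns 0xF0, 0xCC, 0xAA."""
    return f(0xF0, 0xCC, 0xAA) & 0xFF


XOR3_LUT = make_lut(lambda a, b, c: a ^ b ^ c)        # three-way XOR
CHI_LUT = make_lut(lambda a, b, c: a ^ (~b & c))      # Keccak chi pattern

a, b, c = 0x12345678, 0x9ABCDEF0, 0x0F1E2D3C
xor3_result = lop3(a, b, c, XOR3_LUT)
chi_result = lop3(a, b, c, CHI_LUT)
```

Without lop3, the three-way XOR takes two logic instructions and the chi pattern three, so folding each into a single instruction is where a gain of a few percent could plausibly come from.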
