Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 966. (Read 2347664 times)

full member
Activity: 231
Merit: 150
I also submitted a 20% reduction in memory on the lyra2v2 algo so you can have higher intensities than before..
The default for the 750ti is -X 16. in release 63 it was -X 8
The default for the 970 is -X 26, in release 63 it was -X 16
Not entirely a good idea: you are clearly sacrificing the compute 3.5 cards (I already noticed that in some previous changes; I understand that you don't have these cards, but I don't understand why you are removing the optimizations done here). Also, it doesn't really matter how much memory you allocate if it isn't used; that's why I used "#define memshift 3" (or 4) to do that.
I don't really understand either why you are pushing the intensity so high; at some point it doesn't help anymore and just creates more of a bottleneck on memory access...

Yes, this is the Maxwell mod. Less memory is good on Windows (8/10) because the operating system has problems allocating big blocks of GPU memory. With -X 16 the 750ti is using 1.8 GB of memory in lyra2v2 (+30 khash).
No, actually it doesn't have a problem allocating big blocks, as long as it doesn't use them in full (I checked that with Ethereum). I suspect the compiler knows what really needs to be allocated at compile time, and the memshift of 4 was actually there to allow smoother accesses to global memory, at least with the 780ti... (the problem was reversed in classic Lyra)

Link to your latest release please, thanks.
sr. member
Activity: 506
Merit: 252
release 63 (different codebase) --> solomining works
any other release including 64  --> solomining won't work
legendary
Activity: 1400
Merit: 1050
I also submitted a 20% reduction in memory on the lyra2v2 algo so you can have higher intensities than before..
The default for the 750ti is -X 16. in release 63 it was -X 8
The default for the 970 is -X 26, in release 63 it was -X 16
Not entirely a good idea: you are clearly sacrificing the compute 3.5 cards (I already noticed that in some previous changes; I understand that you don't have these cards, but I don't understand why you are removing the optimizations done here). Also, it doesn't really matter how much memory you allocate if it isn't used; that's why I used "#define memshift 3" (or 4) to do that.
I don't really understand either why you are pushing the intensity so high; at some point it doesn't help anymore and just creates more of a bottleneck on memory access...

Yes, this is the Maxwell mod. Less memory is good on Windows (8/10) because the operating system has problems allocating big blocks of GPU memory. With -X 16 the 750ti is using 1.8 GB of memory in lyra2v2 (+30 khash).
No, actually it doesn't have a problem allocating big blocks, as long as it doesn't use them in full (I checked that with Ethereum). I suspect the compiler knows what really needs to be allocated at compile time, and the memshift of 4 was actually there to allow smoother accesses to global memory, at least with the 780ti... (the problem was reversed in classic Lyra)
hero member
Activity: 677
Merit: 500
Now testing lyra2v2 on sm50 (750Ti).
The miner shows a good hashrate. On the pool side (nicehash) the connection is reset every 2.5 minutes (when the diff goes from 2 up to 4, and for 40 seconds the miner doesn't get any "yay!!"). The problem is the same.
The pool shows a speed of 3-6 Mh, while my 2x750Ti make 8.4 Mh...
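
Editor's sketch: a pool can only estimate speed from the shares it accepts, so any stretch without accepted shares pulls the pool-side number below what the miner prints. The Python below is a rough illustration of that arithmetic, not how nicehash actually computes it. The function names, the `hashes_per_diff1` default of 2**32 (the Bitcoin-style convention) and the 150-second averaging window are all assumptions made for the example.

```python
def pool_side_hashrate(accepted_shares, share_diff, window_seconds,
                       hashes_per_diff1=2**32):
    """Estimate the hashrate a pool would report from accepted shares.

    hashes_per_diff1 is an ASSUMPTION: 2**32 is the Bitcoin-style
    convention; a pool may scale difficulty differently per algorithm.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return accepted_shares * share_diff * hashes_per_diff1 / window_seconds


def hashrate_with_stall(true_hashrate, window_seconds, stall_seconds):
    """Pool-side estimate when no shares are accepted for stall_seconds.

    Assumes (hypothetically) that the pool averages over one full window
    and that the miner runs at true_hashrate for the rest of it.
    """
    productive = max(window_seconds - stall_seconds, 0)
    return true_hashrate * productive / window_seconds


if __name__ == "__main__":
    # Numbers from the post: 8.4 Mh locally, a 2.5 minute cycle,
    # 40 seconds of it without any accepted share.
    estimate = hashrate_with_stall(8.4e6, 150, 40)
    print(f"pool-side estimate: {estimate / 1e6:.2f} Mh")
```

Under these assumptions a 40-second gap in every 150-second cycle already costs more than a quarter of the reported speed; shares lost when the connection is reset would lower it further.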
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
I also submitted a 20% reduction in memory on the lyra2v2 algo so you can have higher intensities than before..
The default for the 750ti is -X 16. in release 63 it was -X 8
The default for the 970 is -X 26, in release 63 it was -X 16
Not entirely a good idea: you are clearly sacrificing the compute 3.5 cards (I already noticed that in some previous changes; I understand that you don't have these cards, but I don't understand why you are removing the optimizations done here). Also, it doesn't really matter how much memory you allocate if it isn't used; that's why I used "#define memshift 3" (or 4) to do that.
I don't really understand either why you are pushing the intensity so high; at some point it doesn't help anymore and just creates more of a bottleneck on memory access...

Yes, this is the Maxwell mod. Less memory is good on Windows (8/10) because the operating system has problems allocating big blocks of GPU memory. With -X 16 the 750ti is using 1.8 GB of memory in lyra2v2 (+30 khash).
member
Activity: 70
Merit: 10
sp_,

Is there some way to do "minimal rebuilds" of the CUDA kernels on Windows?

It's pretty lame sitting around for 10min while all that unchanged stuff rebuilds after I change one line.
I don't think it is possible, or you would have to write different files for different kernels. But no matter what, you will still need to recompile it (for what it is worth, it is a lot better than OpenCL)

Actually this seems to be working.  I think the commit I reverted touched WAY more than I thought it did is all.
legendary
Activity: 1400
Merit: 1050
and a little more  Grin (the 750ti is oc at +200 mem/core, actually core matters more than mem)


Nice, didn't take you too long Smiley

I'm thinking we agree on how it should be done, tried that one thing I hadn't had time to try yet Grin
most likely (however, some variations around that are probably possible) Grin
legendary
Activity: 1400
Merit: 1050
I also submitted a 20% reduction in memory on the lyra2v2 algo so you can have higher intensities than before..

The default for the 750ti is -X 16. in release 63 it was -X 8
The default for the 970 is -X 26, in release 63 it was -X 16

Not entirely a good idea: you are clearly sacrificing the compute 3.5 cards (I already noticed that in some previous changes; I understand that you don't have these cards, but I don't understand why you are removing the optimizations done here). Also, it doesn't really matter how much memory you allocate if it isn't used; that's why I used "#define memshift 3" (or 4) to do that.
I don't really understand either why you are pushing the intensity so high; at some point it doesn't help anymore and just creates more of a bottleneck on memory access...
legendary
Activity: 1400
Merit: 1050
sp_,

Is there some way to do "minimal rebuilds" of the CUDA kernels on Windows?

It's pretty lame sitting around for 10min while all that unchanged stuff rebuilds after I change one line.
I don't think it is possible, or you would have to write different files for different kernels. But no matter what, you will still need to recompile it (for what it is worth, it is a lot better than OpenCL)
hero member
Activity: 677
Merit: 500
Commit 997: lyra2v2 is slower on sm52 (from 27.9 Mh down to 26.7-27.6 Mh) and the hashrate is not stable in the miner. sm50 not tested.
The Nicehash pool side shows no more than 20 Mh. When the pool sets the diff to 8, the miner doesn't find any hashes. Then the diff resets to 4 and hashes are found again. GPU usage is jumping up and down.
After the diff is reset on the pool, the accepted speed starts from zero. The stratum connection is reset.
PS. 2x GTX 980 and a GTX 960 in one rig.
member
Activity: 70
Merit: 10
sp_,

Is there some way to do "minimal rebuilds" of the CUDA kernels on Windows?

It's pretty lame sitting around for 10min while all that unchanged stuff rebuilds after I change one line.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
I also submitted a 20% reduction in memory on the lyra2v2 algo so you can have higher intensities than before..

The default for the 750ti is -X 16. in release 63 it was -X 8
The default for the 970 is -X 26, in release 63 it was -X 16
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
5 MHASH on the GTX 750ti
10 MHASH on the GTX 970

sp-mod 64-git

ccminer -q -i 19.2 -g 2 -a lyra2v2  -C (--benchmark)

But there is a bug: the miner will crash after a while when mining on a pool. Also, use quiet mode (-q), because without it you get too much information on the ccminer screen.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
-Blakecoin 70% faster
-Added a new switch -C (--cpu-mining) if you want the CPU to help the GPU. Works on all algos.
-Fixed some bugs like the color output and rapid hash outputs.

1.5.64(sp-MOD) is available here: (02-09-2015)

https://github.com/sp-hash/ccminer/releases/

The sourcecode is available here:

https://github.com/sp-hash/ccminer
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
And 50Watt more power on the rig.. So 50 watt for 1KHASH gain Smiley
DETAILS, DETAILS--
I do not know the exact details,  I was considering 10.2Mh/s vs. 10.6Mh/s for Quark.  I am not the dev, just the miner.  If the CPU can be tasked to improve a few of the algos, I am in favor of the command line switch.  Quark hash rates are a "big deal" to any miner just now.  Neoscrypt rates matter also, but, like Lyra2v2, you might need to mine the coin instead of a BTC exchange rate.  VTC markets are looking up, I haven't traded FTC and the merge-mined companion coins for a while, but I just might.
      --scryptr

I have made the switch now. Quark seems to do 30 KHASH better per card, and the CPU load is 80%, +30 watt (depending on your CPU). The new switch is called -C (--cpu-mining). It should work on all algos, but it is not tested.

I will test some more and submit it to github.

sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
I submitted a speedup in the blakecoin. 70% faster

 (-a blakecoin)
full member
Activity: 137
Merit: 100
and a little more  Grin (the 750ti is oc at +200 mem/core, actually core matters more than mem)





Nice, didn't take you too long Smiley

I'm thinking we agree on how it should be done, tried that one thing I hadn't had time to try yet Grin



Weirdest thing is power use dropped from 440ish to 400ish  Huh
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
and a little more  Grin (the 750ti is oc at +200 mem/core, actually core matters more than mem)

so the question is once again: what do you wanna do with it? :-)
legendary
Activity: 1400
Merit: 1050
and a little more  Grin (the 750ti is oc at +200 mem/core, actually core matters more than mem)



sr. member
Activity: 438
Merit: 250
No, the most profitable is Ethereum.

But if the 750ti were hashing lyra2v2 at 7.636 MHASH, it would be more profitable than the public quark kernels. But it isn't, even with the private kernels.
Ethereum?!
Mining with 750 is ok only under Linux, Windows has too many bugs - like getting 5MH instead of 8-9 for unknown reason, or even getting 300-400KH under 8.1
Of 3 750Ti cards I have, 2 can mine with ~ 5MH under Win7, the third - 300KH/s under 8.1

You're absolutely right. It's killing me  Angry

Meanwhile I've managed to speed up the kernel by about 2-3% using the lop3.b32 PTX instruction (new in CUDA 7.5) in keccak. It might be useful for other coins as well.

https://github.com/Genoil/cpp-ethereum/blob/cudaminer-frontier/libethash-cuda/keccak.cuh
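
Editor's sketch: lop3.b32 evaluates any bitwise function of three 32-bit inputs in a single instruction, selected by an 8-bit lookup table (immLut). That is why it fits keccak, where theta is built from chained XORs and chi computes a ^ (~b & c). The Python below only emulates the semantics on the host so the lookup-table values can be checked; it is not taken from the linked file, and the names `lop3`, `lop3_lut`, `XOR3` and `CHI` are made up for this example.

```python
MASK32 = 0xFFFFFFFF


def lop3_lut(fn):
    """Derive the 8-bit immLut for a 3-input bitwise function.

    Applying fn to the constants 0xF0, 0xCC and 0xAA yields its truth
    table, which is the immediate that lop3 expects.
    """
    return fn(0xF0, 0xCC, 0xAA) & 0xFF


def lop3(a, b, c, lut):
    """Host-side emulation of 'lop3.b32 d, a, b, c, lut' (illustrative)."""
    result = 0
    for k in range(8):
        if (lut >> k) & 1:
            # Minterm k: bit 2 selects a, bit 1 selects b, bit 0 selects c.
            ta = a if k & 4 else ~a
            tb = b if k & 2 else ~b
            tc = c if k & 1 else ~c
            result |= ta & tb & tc
    return result & MASK32


# Two functions that appear in keccak rounds.
XOR3 = lop3_lut(lambda a, b, c: a ^ b ^ c)      # three-way XOR (theta)
CHI = lop3_lut(lambda a, b, c: a ^ (~b & c))    # chi step

if __name__ == "__main__":
    print(hex(XOR3), hex(CHI))
```

Keccak lanes are 64 bits wide, so a GPU kernel would apply the 32-bit instruction to each half of a lane. Each lop3 replaces two ordinary logic instructions in the XOR case and up to three in the chi case, which is presumably where a gain of a few percent comes from.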