
Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 1044. (Read 2347601 times)

sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
I have built a new version:

-Merged and modded the new DJM34 lyra implementation. Lyra is 66% faster on the GTX 970 and 46% faster on the 750ti.
-Small improvement in quark on the 750ti at standard clocks.

1.5.54(sp-MOD) is available here: (08-07-2015)

https://github.com/sp-hash/ccminer/releases/tag/1.5.54

The source code is available here:

https://github.com/sp-hash/ccminer
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
has anyone got cryptonight hashrates for the 960? Was thinking of buying one today - tigerdirect has em for 190$ or something.
CRYPTONIGHT--
A few pages back, page 202, SP_ has a screen shot of a GTX 960 mining Cryptonight at 280H/s next to a 750ti mining at 300H/s.  Properly coded, the 960 should outperform the 750ti.
The Lyra2 code may have something in common, and yield a clue for optimization.       --scryptr

This is right. I think that by using the djm34 techniques the cryptonight miner will improve a lot, by using vectors with the shuffle instruction to avoid global memory. A great task for DJM34. He likes this stuff, and does it well. A 1000+ hour job.

I like to mod the kernels. Spend a few hours and gain a few percent. ;)

I don't use NSIGHT and I am not a registered CUDA developer.
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
lyra2:

Submitted a 90 kH/s improvement on the GTX 970 (6.3%)
and a 20 kH/s improvement on the 750ti.

Again by reducing the code size to reduce instruction cache fetching.
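
For anyone who wants the absolute numbers behind that percentage, the baseline can be backed out with a little arithmetic. This is only a rough sketch: it assumes the 6.3% is measured against the hashrate before the commit, and the function name is made up for illustration.

```python
def implied_baseline(gain_khs: float, gain_fraction: float) -> float:
    """Return the pre-change hashrate (kH/s) implied by an absolute and a relative gain."""
    return gain_khs / gain_fraction

# GTX 970 figures from the post: +90 kH/s reported as a 6.3% gain.
before = implied_baseline(90.0, 0.063)
after = before + 90.0
print(f"GTX 970: about {before:.0f} kH/s before, about {after:.0f} kH/s after")
```

No percentage was given for the 750ti, so the same calculation cannot be done for that card from this post alone.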
legendary
Activity: 1797
Merit: 1028

COMMIT/BUILD #843--

Each commit to GitHub increments the commit number (upper left-hand corner).  There is also a commit hash, but it is not sequential, so I don't use it.  I just checked and your GitHub commit number is 843.  That is the commit that I built and am currently mining with; it is maybe 12 hours old now.       --scryptr

How do you do checkouts then?  I'm used to the command line and using the sha for checkouts, so that is why I am wondering.

COMMAND LINE--

I use the command line, and refer to the commit number when posting about performance.  The sha will verify checksum, and is very precise for that purpose.  Commit numbers are sequential.

The line, "git clone https://github.com/sp-hash/ccminer", should clone the latest commit.  If I am wrong, please tell me!

--scryptr
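
The two ways of naming a build discussed above can be put side by side. The sketch below only builds the git command lines; the helper names and the "ccminer" directory are assumptions for illustration, and the hash is the one from the commit link posted elsewhere in this thread, used purely as an example of the format. The count from `git rev-list --count HEAD` should match the number GitHub shows only if the local clone is on the same branch and is not a shallow clone.

```python
import shlex
import subprocess
from typing import List

def commit_count_cmd(repo_dir: str) -> List[str]:
    """argv that counts the commits reachable from HEAD (the sequential 'commit number')."""
    return ["git", "-C", repo_dir, "rev-list", "--count", "HEAD"]

def checkout_cmd(repo_dir: str, sha: str) -> List[str]:
    """argv that checks out one exact commit by its hash."""
    return ["git", "-C", repo_dir, "checkout", sha]

def run(cmd: List[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()

# Assumed directory name: what "git clone https://github.com/sp-hash/ccminer" creates by default.
repo = "ccminer"
example_sha = "384d4cc461d38fdfb2243cb806806cdccad98074"

# Only print the commands here; pass them to run() inside a real clone.
print(shlex.join(commit_count_cmd(repo)))
print(shlex.join(checkout_cmd(repo, example_sha)))
```

The count is handy for talking about builds in order, while the hash is what pins down one exact commit regardless of branch.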
legendary
Activity: 1260
Merit: 1008
has anyone got cryptonight hashrates for the 960? Was thinking of buying one today - tigerdirect has em for 190$ or something.

CRYPTONIGHT--

A few pages back, page 202, SP_ has a screen shot of a GTX 960 mining Cryptonight at 280H/s next to a 750ti mining at 300H/s.  Properly coded, the 960 should outperform the 750ti.

The Lyra2 code may have something in common, and yield a clue for optimization.       --scryptr

thanks for that, I apologize for my inability to scan through these pages. "Properly coded"... how do we make that happen? I was going to try and go through tsiv's ccminer and just replace all of the fixed values with variables that could be set from the command line and modified via an internal optimizer routine... but then I remembered I can't program, and just stared at the code and got really frustrated cause I'm like "I KNOW IT'S IN THERE!"

bah.
sr. member
Activity: 249
Merit: 250
something is strange here. CCminer must be still doing something but I can't find it in task manager.
legendary
Activity: 1797
Merit: 1028
GTX 960 SSC 2GB and Lyra2--

It just works!  With a setting of "-i 16.3", and +150/+300 core/mem, I get this:

[Screenshot: GTX 960 mining Lyra2 with DJM34 Windows binary.]

Actually, the hash rate fluctuates between 1200kh/s and the 1236kh/s shown, depending on system load.  The 960 is the only card in the Win 7 x64 system.       --scryptr
legendary
Activity: 1510
Merit: 1003
wow. new lyra2re code is here, thanks to djm and sp! My "hero" gtx750 was taken off the shelf for this occasion. With a 1500/1500 gpu/mem clock it shows 1115 khash/s with only 40% memory controller load. The old code was near 100% mem load.
And it is still not a very hot algo. 25-30% less power than quark.
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
yiimp is a "test pool" I am trying to set up... without auto exchange. I will update the main page to explain better soon...

it will not be like the yaamp multipool system, which requires a lot of attention to trades

else... CUDA 7.5 really improves ccminer, on almost all algos :p

I don't know how you can say CUDA7.5 does so much better. It was just put out to developers, hence the 'RC' designation. Stands for 'Release Candidate', meaning it's in the early stages.

Concerning your 'test pool'; I wouldn't broadcast you are trying this until you are ready to pay-up! I almost started mining there thinking I would be paid for the work I was doing. Just sayin'!

It's written on the main page: Yiimp is not an "autotrade" platform... So, like other pools, you mine the currency you want with the -right- currency address. I don't want to pay out in VTC (or BTC) to the whole of China, which is using SHA farms

The pool is working and pays what is mined... I don't want a second full-time job running an exchange ;) Consider the fees as a donation for the new algos... Some are set very high because we are doing "private" tests... you can still mine on those, but it's made to reduce "anonymous" users...
legendary
Activity: 1797
Merit: 1028
DJM34, SP_ --

I flipped you each a nickel.  Thank you for your hard work!  I hope my 960s will be able to mine Lyra2 on Windows at about 1250kh/s soon, maybe more!

Thanks!       --scryptr

P.S.  I was able to get DJM34's Windows binary to run on my Win 7 x64 system with a 2GB GTX960 SSC with ONLY the performance setting of "-i 16.3".  No other performance settings were used.  Algo, username, password were as standard.

Result:  1175kh/s mining Lyra2       --scryptr
thanks,
don't forget to run at the P0 state using nvidia-smi; that makes it possible to OC the memclock (it will run also at a somewhat)

From what I've seen memory OCs aren't worth it. They give you a tiny bit of extra hash and they wreck your efficiency. Like 5% more hashrate for 15% more power.
well, they are worth it for memory-hard algos, as it decreases the frame buffer usage... (meaning less bottleneck at that level)
and if you have a large number of cards, you probably want moderate power usage; if you are limited in GPU resources, you want the highest hashrate
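
To put a number on the trade-off being argued here, the "5% more hashrate for 15% more power" figure can be turned into hash per watt. This is just arithmetic on the figures quoted above, not a measurement; the function name is hypothetical.

```python
def relative_efficiency(hash_gain: float, power_gain: float) -> float:
    """Hash-per-watt after a tweak, relative to before (1.0 means unchanged)."""
    return (1.0 + hash_gain) / (1.0 + power_gain)

# Figures quoted in the post: +5% hashrate for +15% power.
eff = relative_efficiency(0.05, 0.15)
print(f"hash per watt changes by {100 * (eff - 1):+.1f}%")
```

So on those figures each watt buys roughly 9% less hash, which matters on a power-limited farm and matters much less if the number of cards is the limit.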

OVERCLOCKING--

I am running DJM34's Windows binary with an intensity setting of "-i 16.3" and a +100 core / +300 mem overclock utilizing EVGA PrecisionX 16.  The result is 1200kh/s on my single 2GB GTX 960 SSC.  I earlier reported 1175kh/s with a lower overclock of +80/+240.  The overclock of +100/+300 was the highest stable overclock when mining Quark, but recent Quark code changes made it less stable.

If the 960 remains stable for a day or so, I may increase the overclock again.  For some reason, I had difficulty launching DJM34's Windows binary at default intensity, or my former setting of "-i 16.5" for Lyra2.  The card is mining Lyra2 instead of Quark, and making a big difference in my total Lyra2 hash power.

The card should be mining in the "P0" state, but PrecisionX 16 doesn't have a specific indicator for that.

I also play games with it.  :)       --scryptr
default intensity is set per compute version... since my compute_52 card is a 980 with 4GB of memory it works well... obviously with a 960 with only 2GB it might not work; it should use the same setting as the 750ti
the P0 state is shown in nvidia inspector. (however, if you didn't change it, it is most likely running at P2, an issue for all the 900 cards), and the mem OC is probably not applied at all

I think there is an option in the latest sp version (on which my release is actually based) to set the P0 state (haven't tried it though...); I used the command line
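
For checking the P state from the command line rather than from a GUI tool, something along these lines may work. It is a sketch, not the exact commands used above: it assumes the driver's nvidia-smi supports the `--query-gpu` fields `index`, `pstate` and `clocks.mem` on the card in question, and that the CSV output is comma-separated as shown. It only reads the state; it does not change it. The sample line is made up to show the assumed format.

```python
import subprocess
from typing import List, Tuple

# Assumed query; field support can vary by driver version and card.
QUERY = ["nvidia-smi", "--query-gpu=index,pstate,clocks.mem", "--format=csv,noheader"]

def parse_pstates(csv_text: str) -> List[Tuple[int, str, str]]:
    """Parse lines of 'index, pstate, memory clock' into tuples."""
    rows = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        index, pstate, mem_clock = [field.strip() for field in line.split(",")]
        rows.append((int(index), pstate, mem_clock))
    return rows

def query_pstates() -> List[Tuple[int, str, str]]:
    """Run nvidia-smi and parse its output; needs an NVIDIA driver installed."""
    out = subprocess.run(QUERY, check=True, capture_output=True, text=True).stdout
    return parse_pstates(out)

# Made-up sample in the assumed format, not a real reading from any card.
sample = "0, P2, 3004 MHz\n"
for index, pstate, mem_clock in parse_pstates(sample):
    note = "" if pstate == "P0" else " (not P0)"
    print(f"GPU {index}: {pstate}{note}, memory clock {mem_clock}")
```

On a mining rig, calling query_pstates() while the miner is running would show whether the card has dropped to P2.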

nVidia Inspector--

OK, I switched to using nVidia Inspector (by Orbmu2k).  It detects the "P" state, and I have selected P0.  I have been playing with the clock settings, and the best hash rate I have reached is 1230kh/s.  I think that "P2" is the default P state for the card.  For the moment, I am running at +150/+300, and the hash rate varies between 1200kh/s and 1230kh/s.

This data is pretty raw, milder clocks most likely would give a stable 1200kh/s.       --scryptr
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
The Lyra2 profit has dropped a lot.
One week ago the rental sites paid 1.3 BTC/day for 1 gigahash of lyra.
Today it's down to 0.6-0.8 BTC/day.
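
Scaled down to a single card, those rental rates look like this. A rough sketch only: the 1200 kH/s figure is roughly the GTX 960 Lyra2 rate reported in this thread, the function name is made up, and pool or rental fees are ignored.

```python
def btc_per_day(hashrate_khs: float, rate_btc_per_ghs_day: float) -> float:
    """Daily income for a given hashrate at a rental rate quoted per GH/s per day."""
    ghs = hashrate_khs / 1_000_000  # 1 GH/s = 1,000,000 kH/s
    return ghs * rate_btc_per_ghs_day

card_khs = 1200.0  # assumed single-card Lyra2 hashrate
for label, rate in [("a week ago", 1.3), ("today, low", 0.6), ("today, high", 0.8)]:
    print(f"{label}: {btc_per_day(card_khs, rate):.5f} BTC/day")
```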
legendary
Activity: 1797
Merit: 1028
On my Linux boxes, SP_'s build 843 compiled and mines Lyra2 at 1850kh/s on my 970 FTW+ cards, and at 1050kh/s on my 750ti FTW cards.  The 750ti FTW cards were running at 825kh/s on the SP_'s release dot 50.

Did you try after my latest commit? I get 40 kH/s more on my Gigabyte Windforce cards with a 6-pin connector (750ti). The compute 5.2 cards are unchanged as they use another kernel.

Here is the commit:

https://github.com/sp-hash/ccminer/commit/384d4cc461d38fdfb2243cb806806cdccad98074

The commit is not big, but it reduces the register usage from 185 to 113 and reduces the code size, which puts less pressure on the instruction cache.
(less memory usage)
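
One way to see why the register count matters is to estimate how many warps can be resident on one multiprocessor at each count. The sketch below is a simplified upper bound, not the output of NVIDIA's occupancy calculator: it assumes 65536 registers per SM, per-warp allocation in units of 256 registers and a 64-warp limit (the values I believe apply to compute 5.x), and it ignores block size, shared memory and any coarser allocation granularity, so real figures can be lower.

```python
import math

REGS_PER_SM = 65536      # assumed: 64K 32-bit registers per multiprocessor
WARP_SIZE = 32
ALLOC_UNIT = 256         # assumed: registers are allocated per warp in units of 256
MAX_WARPS_PER_SM = 64    # assumed hardware limit on resident warps

def max_resident_warps(regs_per_thread: int) -> int:
    """Upper bound on resident warps per SM when registers are the only limit."""
    regs_per_warp = math.ceil(regs_per_thread * WARP_SIZE / ALLOC_UNIT) * ALLOC_UNIT
    return min(MAX_WARPS_PER_SM, REGS_PER_SM // regs_per_warp)

for regs in (185, 113):
    warps = max_resident_warps(regs)
    print(f"{regs} registers/thread -> up to {warps} warps ({warps * WARP_SIZE} threads) per SM")
```

Under these assumptions the drop from 185 to 113 registers lets noticeably more warps stay resident, which gives the scheduler more work to hide latency with. Whether that or the smaller code size is responsible for the measured gain is not something this estimate can tell.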

sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
the pragma unrolls were chosen with care and they enhance the hashrate by about that same amount on one of my cards (most likely the 980; that might decrease the hashrate on the 900 series...)

Yes, and they work well on the high-end cards, but not so well on the 750ti (compute 5.0). You have 2 kernels, one for compute 5.0 and one for the others. I halved the threads per block and removed some of the pragma unrolls in the 5.0 kernel. 3.5-4% gain.