
Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 1213. (Read 2347597 times)

sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
Your makefile uses 128 registers and mine has 80 registers as default, and this will cause big differences.
Actually, this should be removed from any makefile or setup (even though I didn't do it myself...). This method of allocating registers is deprecated and should be replaced by __launch_bounds__ (I am paraphrasing the CUDA doc...)
It depends on the case. Sometimes you can't change the TPB without rewriting the code (I'm thinking of shared-memory code), and you can't fine-tune the max regs with launch bounds.
That is right if you want to support all past and possible future cards, but not for the 90% of users with a 750 Ti or 970/980.

The problem is that when you change the number of registers for one kernel, all the other kernels need to be recompiled, and sometimes the performance of the other kernels gets worse. Use launch bounds. It's faster and better.
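To see why 128 versus 80 registers per thread can matter so much, here is a rough occupancy estimate. It is only a sketch: the per-SM limits are assumptions for a Maxwell-class card, the 256 threads per block is a made-up example value, and registers are treated as the only limit (real allocation granularity, shared memory and block limits are ignored).

```python
# Rough occupancy estimate for one SM.
# The hardware limits are assumptions for a Maxwell-class GPU,
# not values taken from this thread.
REGS_PER_SM = 65536      # 32-bit registers per SM (assumed)
MAX_WARPS_PER_SM = 64    # resident warp limit per SM (assumed)
WARP_SIZE = 32


def resident_warps(regs_per_thread, threads_per_block):
    """Warps that fit on one SM when registers are the only limit."""
    warps_per_block = -(-threads_per_block // WARP_SIZE)  # ceiling division
    regs_per_block = regs_per_thread * WARP_SIZE * warps_per_block
    blocks = REGS_PER_SM // regs_per_block  # whole blocks only
    return min(blocks * warps_per_block, MAX_WARPS_PER_SM)


if __name__ == "__main__":
    for regs in (128, 80):
        # 256 threads per block is a hypothetical TPB for illustration
        warps = resident_warps(regs, threads_per_block=256)
        print(regs, "regs/thread ->", warps, "of", MAX_WARPS_PER_SM, "warps")
```

More resident warps is not automatically faster (a lower register cap can also cause spills), so this only shows how far the two defaults sit apart, not which one wins for a given kernel.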
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
else your echo improvement works on linux too... 2870 yesterday 2920 now we are close to the 3MH on the 750Ti ^^ I need to inspect the changes
you have also remains of a missing simd free with current commit (does not build)

The new Echo does 8.xx rounds of Echo instead of 10. The previous version did 9.25 rounds, and the original version does 10 rounds.

On the 980 we will probably get 300 KHASH more.

Less work, less power, more hash.
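A small sketch of the "less work" arithmetic, assuming the time spent in the Echo stage scales linearly with the number of rounds (that is an assumption, not something measured here). Only the 9.25 and 10 round figures from the post are used; the exact 8.xx figure is not given, so it is left as a parameter.

```python
FULL_ROUNDS = 10.0  # original Echo version, per the post above


def echo_work_ratio(rounds):
    """Fraction of the original Echo round work that is still performed."""
    return rounds / FULL_ROUNDS


def max_echo_speedup(rounds):
    """Upper bound on the Echo-stage speedup if time scales with rounds."""
    return FULL_ROUNDS / rounds


if __name__ == "__main__":
    previous = 9.25  # previous version, per the post above
    print("work ratio:", echo_work_ratio(previous))
    print("max Echo speedup:", max_echo_speedup(previous))
```

Echo is only the final stage of the x11 chain, so the gain in total hashrate is smaller than the Echo-stage figure.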
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
else your echo improvement works on linux too... 2870 yesterday 2920 now we are close to the 3MH on the 750Ti ^^ I need to inspect the changes

2.92 MHASH on the 750 Ti Windforce Black, am I right? I only get 2700-2750 on my Gainward Ti.
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
Not done, wait for a new check-in tonight. Multipools show high numbers, but solo mining is broken.
 
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
else your echo improvement works on linux too... 2870 yesterday 2920 now we are close to the 3MH on the 750Ti ^^ I need to inspect the changes

you have also remains of a missing simd free with current commit (does not build)
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
Will not make a new build before I have run it through the night.

The version I checked in last night on github needs some more work...
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
Your makefile uses 128 registers and mine has 80 registers as default, and this will cause big differences.

Actually, this should be removed from any makefile or setup (even though I didn't do it myself...). This method of allocating registers is deprecated and should be replaced by __launch_bounds__ (I am paraphrasing the CUDA doc...)

It depends on the case. Sometimes you can't change the TPB without rewriting the code (I'm thinking of shared-memory code), and you can't fine-tune the max regs with launch bounds.

That is right if you want to support all past and possible future cards, but not for the 90% of users with a 750 Ti or 970/980.
legendary
Activity: 2296
Merit: 1031
--help or README.txt at github (or in my releases)


thx!
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
--help or README.txt at github (or in my releases)
legendary
Activity: 2296
Merit: 1031
Could folks share their command line? This is mine:

Code:
ccminer.exe -q -r 3 -R 10 -a x13 --no-color -o stratum+tcp://yaamp.com:3633 -u xxx -p xxx


Thank you.  Is there a 'read me' that explains the -flags?  Not familiar with -q -r or -R.  seems like the -q maybe 'quiets' some of the output?
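For reference, here is the command line above rebuilt piece by piece. The flag descriptions are my reading of the cpuminer-style --help text, so treat them as assumptions and check `ccminer --help` for your build. `build_command` is just an illustrative helper, not part of ccminer.

```python
# Flag meanings as understood from cpuminer-style help output (assumed;
# verify with `ccminer --help`).
FLAGS = {
    "-q": "quiet: disable per-thread hashmeter output",
    "-r": "number of times to retry if a network call fails",
    "-R": "seconds to pause between retries",
    "-a": "algorithm to mine",
    "-o": "pool URL",
    "-u": "username",
    "-p": "password",
}


def build_command(exe, algo, url, user, password, retries=3, retry_pause=10):
    """Return the argument list for the miner (hypothetical helper)."""
    return [
        exe, "-q",
        "-r", str(retries),
        "-R", str(retry_pause),
        "-a", algo,
        "--no-color",
        "-o", url,
        "-u", user,
        "-p", password,
    ]


if __name__ == "__main__":
    args = build_command("ccminer.exe", "x13",
                         "stratum+tcp://yaamp.com:3633", "xxx", "xxx")
    print(" ".join(args))
    for flag, meaning in FLAGS.items():
        print(flag, "=", meaning)
```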
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
Will not make a new build before I have run it through the night.
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
Today I improved the final hashing in x11 (Echo). The 1GB 750 gains +50 KHASH, the Ti 50-100 KHASH. Reverted BMW to an earlier version.
My Gainward 750 Ti is peaking at 2.750 MHASH, while it was at 2650-2700 earlier.
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
Your makefile uses 128 registers and mine has 80 registers as default, and this will cause big differences.
Actually, this should be removed from any makefile or setup (even though I didn't do it myself...). This method of allocating registers is deprecated and should be replaced by __launch_bounds__ (I am paraphrasing the CUDA doc...)

Yes, most of my modded kernels use __launch_bounds__.
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
I started on the merge of tpruvot 1.5.1, but there were so many files and I got lazy. I tested the latest version on compute 3.0 (GTX 650) and it crashed. I will look into it.
When you see commits changing miner.h and structures, it's important to do a clean build (make clean on Linux or ... Regenerate in VStudio)
I also tried to merge some of your changes, but... for the moment I had weird results on Linux.
Shavite seems to be suboptimal on mine... will check that later (Conference today, the whole afternoon)

I think the best would be to refork, and just change some of the kernel code.
I will probably revert BMW to an earlier version. It seems to go faster on the 750 Ti without the new changes, even if it spills more registers and the code is longer. But measuring is a bit hard with the boost clock setting. If you push the cards too hard or they get hot, they will downclock and lose performance. Different Ti cards have different performance/voltage/wattage.
legendary
Activity: 1400
Merit: 1050
Your makefile uses 128 registers and mine has 80 registers as default, and this will cause big differences.

Actually, this should be removed from any makefile or setup (even though I didn't do it myself...). This method of allocating registers is deprecated and should be replaced by __launch_bounds__ (I am paraphrasing the CUDA doc...)
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
Your makefile uses 128 registers and mine has 80 registers as default, and this will cause big differences.
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer

I started on the merge of tpruvot 1.5.1, but there were so many files and I got lazy. I tested the latest version on compute 3.0 (GTX 650) and it crashed. I will look into it.


When you see commits changing miner.h and structures, it's important to do a clean build (make clean on Linux or ... Regenerate in VStudio)

I also tried to merge some of your changes, but... for the moment I had weird results on Linux.

Shavite seems to be suboptimal on mine... will check that later (Conference today, the whole afternoon)
legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
SP anyway to make the miner/cmd window minimize to the tray
that would be amazing

*answered myself!
http://rbtray.sourceforge.net/

You can use the start command as in:
start /min ccminer.exe .....

You can also set the priority that way (/low /normal /high, etc).
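If you launch the miner from a script, the start line can be assembled like this. It is only a sketch: `start_command` is a hypothetical helper, and the empty "" title is added on the assumption that cmd.exe treats the first quoted argument to start as the window title.

```python
ALLOWED_PRIORITIES = {"low", "normal", "high"}  # the ones named above


def start_command(miner_args, priority="low", minimized=True):
    """Build a cmd.exe `start` line for launching the miner (hypothetical helper)."""
    if priority not in ALLOWED_PRIORITIES:
        raise ValueError("unsupported priority: " + priority)
    parts = ["start", '""']  # empty window title
    if minimized:
        parts.append("/min")
    parts.append("/" + priority)
    parts.extend(miner_args)
    return " ".join(parts)


if __name__ == "__main__":
    print(start_command(["ccminer.exe", "-a", "x13"]))
```

The resulting line can be pasted into a .bat file next to the miner.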
sr. member
Activity: 285
Merit: 250
SP anyway to make the miner/cmd window minimize to the tray
that would be amazing

*answered myself!
http://rbtray.sourceforge.net/
member
Activity: 61
Merit: 10
Here's my first impression of ccminer 1.5.0. The miner shows 15.2-15.3 mh + occasional hw. Pool shows this:



Left: release14
from 12:00: release8
from 20:30: ccminer 1.5.0

So apparently some of the hash magically disappears. :)
