Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 841. (Read 2347659 times)

sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
I've found a solution to the neoscrypt problem: building a cuda 6.5/7.5 hybrid.
This is not a solution, this is a workaround. :)
I don't see a problem.
Most people already have both cuda 6.5 and 7.5.
With some little changes to the Makefile, you could compile each kernel with its best cuda version, in a single executable.

Don't you want to know if cuda 7.5 can make your code faster? A recoding is probably required.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I've found a solution to the neoscrypt problem: building a cuda 6.5/7.5 hybrid.

This is not a solution, this is a workaround. :)


I don't see a problem.
Most people already have both cuda 6.5 and 7.5.
With some little changes to the Makefile, you could compile each kernel with its best cuda version, in a single executable.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
Since quark was the focus of the most recent changes it proves that cuda 7.5 can perform better than 6.5. I hope these results translate to the other algos.

I have shown that it can be done with quark.
I believe the other algos can be tuned to run faster as well with more work.

0.01 BTC guys. This is all I am asking :)
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I've found a solution to the neoscrypt problem: building a cuda 6.5/7.5 hybrid.
Tested working on Linux. Here's the procedure:

- build ccminer with cuda 7.5 as usual
- remove all the object files in the neoscrypt folder: rm neoscrypt/*.o
- edit Makefile
- replace all the instances of "7.5" with "6.5"
- run make again
- you just made a ccminer executable with all the algos on 7.5 except neoscrypt on 6.5 :-)
- revert the Makefile changes to build it again in the future

If you find this useful, please donate to the BTC address in my signature.

If I understand the result, neo is compiled with 6.5 and everything else with 7.5. Then it is all linked
with 6.5. I'm not sure linking object files from different compilers is safe.

I prefer to use a script to select the preferred executable based on the algo. Less work, less risk, more
flexible.

Edit: but it's still a workaround. ;)

Linking object files from different compilers: I've often linked object files created with a C compiler and others created with an assembler.
In ccminer, some objects are compiled with gcc, some others with nvcc... you get the picture.
legendary
Activity: 1470
Merit: 1114
I've found a solution to the neoscrypt problem: building a cuda 6.5/7.5 hybrid.
Tested working on Linux. Here's the procedure:

- build ccminer with cuda 7.5 as usual
- remove all the object files in the neoscrypt folder: rm neoscrypt/*.o
- edit Makefile
- replace all the instances of "7.5" with "6.5"
- run make again
- you just made a ccminer executable with all the algos on 7.5 except neoscrypt on 6.5 :-)
- revert the Makefile changes to build it again in the future

If you find this useful, please donate to the BTC address in my signature.

If I understand the result, neo is compiled with 6.5 and everything else with 7.5. Then it is all linked
with 6.5. I'm not sure linking object files from different compilers is safe.

I prefer to use a script to select the preferred executable based on the algo. Less work, less risk, more
flexible.

Edit: but it's still a workaround. ;)
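
A minimal sketch of such a launcher script, assuming two separately built executables. The paths `./ccminer-cuda65` and `./ccminer-cuda75` and the mapping itself are hypothetical placeholders, not files from any release; only the `-a` algorithm option is taken from ccminer's usual command line.

```python
# Hypothetical mapping from algorithm name to the build that runs it fastest.
# The executable paths are placeholders chosen for illustration.
BEST_BUILD = {
    "neoscrypt": "./ccminer-cuda65",  # assumed: neoscrypt is faster on the cuda 6.5 build
    "quark": "./ccminer-cuda75",      # assumed: quark is faster on the cuda 7.5 build
}
DEFAULT_BUILD = "./ccminer-cuda75"


def build_command(algo, extra_args=()):
    """Return the argument list for the executable chosen for this algo."""
    exe = BEST_BUILD.get(algo, DEFAULT_BUILD)
    return [exe, "-a", algo, *extra_args]


# Example: show which command line would be launched for neoscrypt.
print(" ".join(build_command("neoscrypt")))
```

The returned list can be handed to `subprocess.run` as is. Keeping the choice in a table means a new build only needs a new entry, with no relinking.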
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
I've found a solution to the neoscrypt problem: building a cuda 6.5/7.5 hybrid.

This is not a solution, this is a workaround. :)

x11 was hashing at 1900 KHASH in release 74 on the 750ti when built with cuda 7.5 (x86). I modded it to reach 2940 in release 76, which is a bit below release 74 built with cuda 6.5.

After changing the compiler version in the project, all the kernels need to be retuned.
I haven't finished yet.

My 5% increase in the quark hashrate has only given me a little more than 0.1 btc in donations.
Please donate some more guys. I need your support.

Here you can see my latest open-source work:

https://github.com/sp-hash/ccminer/commits/windows

...

Since cuda 7.5 increases the register and stack usage in the table-based AES algos, I reduced the number of threads per block to compensate. I also had to move some precalc tables into the instruction cache (inlined in the instructions). I have also reduced the code size a bit, since cuda 7.5 seems to loop better, etc.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
Got the test results on an EVGA 980 reference at standard clocks.

            76-7.5    76-6.5    74-6.5
quark       19.9      19.3      19.3
x11         9850      9920      10000
lyra2v2     10.7      11.4      11.6
neo         220       635       640

Thanks for testing. Can you please try to compile release 74 with x86 build and cuda 7.5?
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I've found a solution to the neoscrypt problem: building a cuda 6.5/7.5 hybrid.
Tested working on Linux. Here's the procedure:

- build ccminer with cuda 7.5 as usual
- remove all the object files in the neoscrypt folder: rm neoscrypt/*.o
- edit Makefile
- replace all the instances of "7.5" with "6.5"
- run make again
- you just made a ccminer executable with all the algos on 7.5 except neoscrypt on 6.5 :-)
- revert the Makefile changes to build it again in the future
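
The "replace all the instances" step can be sketched in Python as below. This is only an illustration: the sample Makefile fragment is made up (it assumes the toolkit lives under `/usr/local/cuda-7.5`), and `swap_cuda_version` is a hypothetical helper, not part of ccminer.

```python
def swap_cuda_version(makefile_text, old="7.5", new="6.5"):
    """Replace every occurrence of the old CUDA version string with the new one."""
    return makefile_text.replace(old, new)


# Made-up Makefile fragment, for illustration only.
sample = (
    "NVCC = /usr/local/cuda-7.5/bin/nvcc\n"
    "CUDA_LIBS = -L/usr/local/cuda-7.5/lib64\n"
)

patched = swap_cuda_version(sample)
print(patched)

# Swapping back restores the original text for the next full build.
restored = swap_cuda_version(patched, old="6.5", new="7.5")
```

A blind text replace also rewrites any unrelated "7.5" in the file, so it is worth checking the result before running make again.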

If you find this useful, please donate to the BTC address in my signature.
legendary
Activity: 1470
Merit: 1114
Compiling on Windows is a pain. I have to rebuild my compile environment every month because VS shuts down
unless I register. I had to create a virtual machine snapshot before installing VS the first time, otherwise the
tombstone from the previous install would trigger the forced registration immediately.

Compile with visual studio express 2013. It is free

VS community is advertised as free but it only works for a month without registering. The registration might be
free but I haven't tried it. I don't recall being able to download VS Express, I'll take another look, thanks.

Got the test results on an EVGA 980 reference at standard clocks.

            76-7.5    76-6.5    74-6.5
quark       19.9      19.3      19.3
x11         9850      9920      10000
lyra2v2     10.7      11.4      11.6
neo         220       635       640

These results confirm the increase in quark is purely due to cuda 7.5. They also show no degradation
in cuda 6.5 performance, a win-win for quark.

The neo degradation is also purely due to cuda 7.5 with no significant difference between r74 & r76
when compiled with cuda 6.5.

X11 is interesting, the 76-6.5 performance is lower than 74-6.5 and the 76-7.5 performance is lower still, a lose-lose.

Since quark was the focus of the most recent changes it proves that cuda 7.5 can perform better than 6.5. I hope
these results translate to the other algos.
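
The comparison can be reproduced with a few lines of Python. This is a sketch that copies the numbers from the EVGA 980 table, in the units as posted. Comparing 76-6.5 with 76-7.5 is read here as the compiler effect, and 74-6.5 with 76-6.5 as the code-change effect, which is an interpretation of the column labels.

```python
# Hash rates copied from the EVGA 980 table (units as posted).
results = {
    "quark":   {"76-7.5": 19.9, "76-6.5": 19.3, "74-6.5": 19.3},
    "x11":     {"76-7.5": 9850, "76-6.5": 9920, "74-6.5": 10000},
    "lyra2v2": {"76-7.5": 10.7, "76-6.5": 11.4, "74-6.5": 11.6},
    "neo":     {"76-7.5": 220,  "76-6.5": 635,  "74-6.5": 640},
}


def pct_change(old, new):
    """Percentage change from old to new; negative means slower."""
    return (new - old) / old * 100.0


for algo, r in results.items():
    cuda_effect = pct_change(r["76-6.5"], r["76-7.5"])  # same release, different cuda
    code_effect = pct_change(r["74-6.5"], r["76-6.5"])  # same cuda, different release
    print(f"{algo:8s} cuda 6.5->7.5: {cuda_effect:+6.1f}%   r74->r76: {code_effect:+6.1f}%")
```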
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
Compiling on Windows is a pain. I have to rebuild my compile environment every month because VS shuts down
unless I register. I had to create a virtual machine snapshot before installing VS the first time, otherwise the
tombstone from the previous install would trigger the forced registration immediately.

Compile with visual studio express 2013. It is free
legendary
Activity: 1470
Merit: 1114
yep 76 is a 50% slow down for neo scrypt  from 586 to 234
 and quark is up 1.04 % from 17206 to 17972
time to run two folders

Has anyone compiled r76 for sm5.2 using cuda 6.5 (windows or linux) to do a direct comparison with cuda 7.5?
I could only do it with 750ti's on Linux and there was virtually no difference.

Edit: however, they were both slower than 1.5.74-cuda6.5.

The numbers (gpu0/gpu1), both EVGA 750ti SC, no OC:


            1.5.74(6.5)    1.5.76(6.5)     1.5.76(7.5)
x11         3090/3145      2985/3050       2980/3045
quark       6360/6450      6335/6380       6340/6400
lyra2v2     4715/4755      4680/4715       4680/4715


What a pain getting this to display correctly, this ain't WYSIWYG.




im compiling the latest as we speak ... i was well enough to drive - so im here in the office at the moment ...

the only thing i can give a comparison rate to is quark ( from 74 - 76 ) on c7.5 ...

that doesnt exactly help what you are asking - but it may be of interest to you regarding the last compiles i had with fedora ...

btw - i have upgraded to fedora23x64 on the test machine - and compiling with c7.5 and ccminer-spmod76 ...

#crysx

Unfortunately with Linux there are very few versions that support both cuda 6.5 & 7.5 so it's difficult to do
direct comparisons. I'm compiling r76 for cuda 6.5 on Windows (had to fiddle with the project file) so I can
directly compare the difference between r74-cuda6.5 vs r76-cuda6.5 and r76-cuda6.5 vs r76-cuda7.5.

Compiling on Windows is a pain. I have to rebuild my compile environment every month because VS shuts down
unless I register. I had to create a virtual machine snapshot before installing VS the first time, otherwise the
tombstone from the previous install would trigger the forced registration immediately.
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
at least on my end anyway ... as you know - my systems seem to go the complete opposite of what most people were getting ...

look out for the hashrate to show on your link ... https://www.nicehash.com/?p=miners&addr=1CTiNJyoUmbdMRACtteRWXhGqtSETYd6Vd&a=12&l=0 ...

ill do the test on quark as that is a good comparison from the previous v74 ( that was compiled in f22x64c75 ) ...

#crysx

Thanks crysx. The difference is mainly for the compute 5.2 devices, but I hope you will get a boost on your system as well.

all good sp ...

compiled and rebooted just now ...

lets see ...

edit - its active ...

-------

Compiled with GCC 5.1 using Nvidia CUDA Toolkit 7.5

  Based on pooler cpuminer 2.3.2 and the tpruvot@github fork
  CUDA support by Christian Buchner, Christian H. and DJM34
  Includes optimizations implemented by sp, klaust, tpruvot, tsiv and pallas.

[2015-12-12 18:39:17] Adding 3774720 threads to intensity 22, 7969024 cuda threads
[2015-12-12 18:39:21] Starting Stratum on stratum+tcp://donate-sp.granitecoin.com:7012/
[2015-12-12 18:39:21] NVML GPU monitoring enabled.
[2015-12-12 18:39:21] 3 miner threads started, using 'quark' algorithm.
[2015-12-12 18:39:21] Binding thread 0 to cpu 0 (mask 1)
[2015-12-12 18:39:21] Binding thread 2 to cpu 0 (mask 4)
[2015-12-12 18:39:21] Binding thread 1 to cpu 1 (mask 2)
[2015-12-12 18:39:22] Stratum difficulty set to 0.04
[2015-12-12 18:39:23] donate-sp.granitecoin.com:7012/ quark block 2557680
[2015-12-12 18:39:28] donate-sp.granitecoin.com:7012/ quark block 2557681
[2015-12-12 18:39:36] donate-sp.granitecoin.com:7012/ quark block 2557682

------

#crysx
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
at least on my end anyway ... as you know - my systems seem to go the complete opposite of what most people were getting ...

look out for the hashrate to show on your link ... https://www.nicehash.com/?p=miners&addr=1CTiNJyoUmbdMRACtteRWXhGqtSETYd6Vd&a=12&l=0 ...

ill do the test on quark as that is a good comparison from the previous v74 ( that was compiled in f22x64c75 ) ...

#crysx

Thanks crysx. The difference is mainly for the compute 5.2 devices, but I hope you will get a boost on your system as well.
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---

Here is the checklist to reach the optimal speed:

1. Latest drivers
2. Windows x86
3. Compile for cuda 7.5 (or use my binary)

I get a 5% boost on my testrigs on compute 5.2 devices in the quark algo.

If you do this:

1. Linux
2. 64bit build
3. cuda 6.5

You get a 0% boost in the quark algo.

though linux build will show in a few minutes as to what it does compiled in f23x64c75 ( ie - fedora 23 x64 cuda 7.5 - im shorthanding from now on ) ...

at least on my end anyway ... as you know - my systems seem to go the complete opposite of what most people were getting ...

look out for the hashrate to show on your link ... https://www.nicehash.com/?p=miners&addr=1CTiNJyoUmbdMRACtteRWXhGqtSETYd6Vd&a=12&l=0 ...

ill do the test on quark as that is a good comparison from the previous v74 ( that was compiled in f22x64c75 ) ...

#crysx
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
yep 76 is a 50% slow down for neo scrypt  from 586 to 234
 and quark is up 1.04 % from 17206 to 17972
time to run two folders

Has anyone compiled r76 for sm5.2 using cuda 6.5 (windows or linux) to do a direct comparison with cuda 7.5?
I could only do it with 750ti's on Linux and there was virtually no difference.

Edit: however, they were both slower than 1.5.74-cuda6.5.

The numbers (gpu0/gpu1), both EVGA 750ti SC, no OC:


            1.5.74(6.5)    1.5.76(6.5)     1.5.76(7.5)
x11         3090/3145      2985/3050       2980/3045
quark       6360/6450      6335/6380       6340/6400
lyra2v2     4715/4755      4680/4715       4680/4715


What a pain getting this to display correctly, this ain't WYSIWYG.




im compiling the latest as we speak ... i was well enough to drive - so im here in the office at the moment ...

the only thing i can give a comparison rate to is quark ( from 74 - 76 ) on c7.5 ...

that doesnt exactly help what you are asking - but it may be of interest to you regarding the last compiles i had with fedora ...

btw - i have upgraded to fedora23x64 on the test machine - and compiling with c7.5 and ccminer-spmod76 ...

it is this test machine i will be placing to mine for sp as a donation ... so if all is well - it should be a nice gain - unless fedora / cuda is broken in this version ... if not - im looking forward to some gains ... and sp - expect some hashrate on the donation link for you ...

#crysx
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer

Here is the checklist to reach the optimal speed:

1. Latest drivers
2. Windows x86
3. Compile for cuda 7.5 (or use my binary)

I get a 5% boost on my testrigs on compute 5.2 devices in the quark algo.

If you do this:

1. Linux
2. 64bit build
3. cuda 6.5

You get a 0% boost in the quark algo.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
Has anyone compiled r76 for sm5.2 using cuda 6.5 (windows or linux) to do a direct comparison with cuda 7.5?
I could only do it with 750ti's on Linux and there was virtually no difference.
Edit: however, they were both slower than 1.5.74-cuda6.5.

This fork is now a cuda 7.5 fork. Upgrade to the latest drivers and compile with cuda 7.5 to get a boost in quark on compute 5.2 devices (gtx 950, gtx 960, gtx 970, gtx 980, gtx 980ti...).
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
tanx for the update sp ... cant wait to compile and see ...
I just released 76 with increased hashrate all over the place. But some of the algos are still slower than release 74. Quark has the most boost this time, and mostly on compute 5.2 devices.
ok great ...
then thats the first thing i will test when i get into the office tomorrow ...
a miner will hash quark on your donation address ( donate-sp.granitecoin.com ) and see how it tests ...
look out for the test on nicehash ( eu stratum ) ...
tanx again mate ... i hope it compiles and works well on the new system ...
#crysx

Just tested quark release 76 on my 980ti g1 gigabyte windforce oc.
27.3 MHASH (+1.4 MHASH) on the standard clocks (new WORLD RECORD)!
Release 74 is only doing 25.9 MHASH.

On the EVGA SSC gtx 960: 11 MHASH on standard clocks, up 0.5 MHASH from release 74.



is that stock clocks sp? ...

#crysx
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
yep 76 is a 50% slow down for neo scrypt  from 586 to 234
 and quark is up 1.04 % from 17206 to 17972
time to run two folders

Your math is wrong.

From 17206 to 17972 is a 4.4519353713821% increase in the hashrate.
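
The arithmetic is easy to check with a short Python sketch. The helper name `pct_change` is just for illustration; the hash rates are the ones quoted above.

```python
def pct_change(old, new):
    """Percentage change from old to new; negative means a slowdown."""
    return (new - old) / old * 100.0


quark = pct_change(17206, 17972)  # quark, release 74 -> 76
neo = pct_change(586, 234)        # neoscrypt, release 74 -> 76

print(f"quark: {quark:+.2f}%")  # prints "quark: +4.45%"
print(f"neo:   {neo:+.2f}%")    # prints "neo:   -60.07%"
```

The quoted 1.04 looks like the ratio 17972 / 17206 rather than a percentage. By the same formula the neoscrypt drop from 586 to 234 comes out closer to 60% than 50%.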