
Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 1015. (Read 2347601 times)

legendary
Activity: 2716
Merit: 1094
Black Belt Developer
There is something wrong with the dependencies of groestl_functions_quad.cu: modifying it doesn't trigger recompile of the files that include it, like groestlcoin.cu
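Until the build dependencies are fixed, one workaround is to find every file that includes the changed one and touch or rebuild it by hand. A minimal sketch of that lookup, assuming the sources sit under one directory and use quoted `#include "..."` lines (the function name `files_including` is made up for illustration and is not part of ccminer):

```python
import re
from pathlib import Path

# Matches lines such as:  #include "groestl_functions_quad.cu"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)

SOURCE_SUFFIXES = {".cu", ".cuh", ".c", ".cpp", ".h"}


def files_including(header_name, root="."):
    """Return source files under root that directly #include header_name."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        text = path.read_text(errors="ignore")
        # Compare base names only, so "quark/foo.cu" and "foo.cu" both match.
        included = {Path(name).name for name in INCLUDE_RE.findall(text)}
        if header_name in included:
            hits.append(path)
    return hits


if __name__ == "__main__":
    for path in files_including("groestl_functions_quad.cu"):
        print(path)  # candidates to touch so that make rebuilds them
```

This only catches direct includes; a file that pulls the header in through another header would need a second pass.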
legendary
Activity: 1470
Merit: 1114
You don't have to use the distribution packages to use cuda... all the versions can be used (and even mixed on linux) for a project.

I repeat it again... there was an improvement in the 7.5 RC (over the 7.0 ;) ) and a big part of the speed reduction is related to the fine tuning, which should be redone due to the different register count in the output "binaries". Kernels which were "oppressed" (given a register limit that is low compared to the count they require) are slower.

There are two issues here and I don't know which one you are responding to.

Yes, multiple versions of cuda can be used on the same Linux installation; you just need to specify
--with-cuda when running ./configure. The RPM repo supports this as well with separate meta-packages
for cuda 6.5 and 7.
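As a rough sketch of picking a toolkit per build: the helper below lists side-by-side installs and prints a configure line for each. It assumes the toolkits live in `/usr/local/cuda-<version>` and that `--with-cuda` takes the toolkit path in the usual `=PATH` form; both are assumptions, so check `./configure --help` for the fork you are building. The function names are made up for illustration.

```python
from pathlib import Path


def find_cuda_toolkits(prefix="/usr/local"):
    """Return {version: path} for directories named cuda-<version> under prefix."""
    toolkits = {}
    for path in sorted(Path(prefix).glob("cuda-*")):
        if path.is_dir():
            toolkits[path.name[len("cuda-"):]] = path
    return toolkits


def configure_command(toolkit_path):
    """Build the ./configure invocation for one toolkit (the =PATH form is assumed)."""
    return ["./configure", f"--with-cuda={toolkit_path}"]


if __name__ == "__main__":
    for version, path in find_cuda_toolkits().items():
        print(version, " ".join(configure_command(path)))
```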

Given that cuda 6.5 is still the best for mining at this time, the other issue is finding a Linux distro that
has 6.5 available from Nvidia, either in the distro's package format or as a run file. The last Ubuntu release with 6.5 was
14.04; for Fedora it was 20.

I know that in Fedora some RPMs for release n will work in release n+1. In fact the Fedora 18 version of
VirtualBox was carried over all the way to 21. It's only in Fedora 22 that VBox created a new release version.

I don't know if this is the case with cuda but it might be worth a try.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I see, that makes more sense... Was just wondering why you guys are optimizing groestl if it's not that popular...

Groestl is used in quark (possibly others), so faster Groestl means faster quark.

and x11 and x13 and lyra2 etc....
but, the code is only shared at times, so it takes work to spread to all the algos.
member
Activity: 111
Merit: 10
I see, that makes more sense... Was just wondering why you guys are optimizing groestl if it's not that popular...

Groestl is used in quark (possibly others), so faster Groestl means faster quark.
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
You don't have to use the distribution packages to use cuda... all the versions can be used (and even mixed on linux) for a project.

I repeat it again... there was an improvement in the 7.5 RC (over the 7.0 ;) ) and a big part of the speed reduction is related to the fine tuning, which should be redone due to the different register count in the output "binaries". Kernels which were "oppressed" (given a register limit that is low compared to the count they require) are slower.
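To illustrate why a changed register count can invalidate earlier tuning, here is a back-of-the-envelope estimate of how many threads fit on one multiprocessor. It assumes Maxwell-class limits (65536 registers, 2048 resident threads and 32 resident blocks per SM) and ignores allocation granularity and shared memory, so treat it as a sketch rather than a real occupancy calculator. It is not the tuning method used in any of the forks.

```python
def resident_threads_per_sm(regs_per_thread, threads_per_block,
                            regs_per_sm=65536, max_threads_per_sm=2048,
                            max_blocks_per_sm=32):
    """Rough upper bound on resident threads per SM from register use alone."""
    regs_per_block = regs_per_thread * threads_per_block
    blocks_by_regs = regs_per_sm // regs_per_block
    blocks_by_threads = max_threads_per_sm // threads_per_block
    blocks = min(blocks_by_regs, blocks_by_threads, max_blocks_per_sm)
    return blocks * threads_per_block


if __name__ == "__main__":
    # Same launch configuration, different register counts from the compiler.
    for regs in (32, 64, 80, 128):
        print(regs, resident_threads_per_sm(regs, threads_per_block=256))
```

A limit set with `__launch_bounds__` or nvcc's `--maxrregcount` that suited the old register count can force spilling once the new compiler wants more registers, which would be consistent with the slowdown described above.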
legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
They probably don't care about mining, and rendering and other CUDA calculations might even have improved.

the funny thing is - they originally built cuda to spearhead the 'render farm' movie niche and scientific market that was wide open a little while ago ...

why would they screw things up now? ...

you are probably right though - they dont care about mining and the rest of the community :| ...

#crysx

We should probably make them more aware of the mining community.

To add to my previous comment:


legendary
Activity: 2912
Merit: 1091
--- ChainWorks Industries ---
They probably don't care about mining, and rendering and other CUDA calculations might even have improved.

the funny thing is - they originally built cuda to spearhead the 'render farm' movie niche and scientific market that was wide open a little while ago ...

why would they screw things up now? ...

you are probably right though - they dont care about mining and the rest of the community :| ...

#crysx
sr. member
Activity: 271
Merit: 251
Not for sale. But you can keep donating, so I can publish small increases in the hashrates for free.

Next up is another groestlcoin optimization.

From 23.7 to 24 MHASH on the GTX 970 Windforce OC (stock) (will submit later tonight)

The speed on AMD cards with the pallas opensource is:

Quote
v1 - to be compiled with catalyst 14.6 or 14.7:

R9 290x @1125 Mhz: ~26.4 Mh/s
R9 290 @1200: ~25 Mh/s
R9 280x (stock): ~18 Mh/s
7950 @1200: ~16 Mh/s
R9 270X: ~9.7 Mh/s

v2 - experimental hawaii only bin:

R9 290x @1125 Mhz: ~34.4 Mh/s
R9 290 @1100: ~30.6 Mh/s


Mine is faster on Tahiti - bin is public.

that falls into the "v2" category ;-)
in order to avoid confusion, I might link your tahiti bin in the OP and add its hashrate to that list.
are you ok with that?

Yeah, go ahead; just credit me.

I was looking around about mining groestl and it's only groestlcoin, right?
Only one exchange without volume, so bag holding only?

you can mine diamond as well.

I see, that makes more sense... Was just wondering why you guys are optimizing groestl if it's not that popular...
legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
They probably don't care about mining, and rendering and other CUDA calculations might even have improved.
legendary
Activity: 2912
Merit: 1091
--- ChainWorks Industries ---
My fork only works with cuda 6.5 or 7.5.

Cuda 7.5beta shows a drop in hashrate of around 30% in the x11 algorithm but it validates. A 750ti is down from 3MHASH to 1.9


ok - so cuda 7.0 is really a dead way to go ...

i wonder if im better to downgrade back and get the test machine running under 6.5 again ... this will take some work and time - but better to go back to a working system than a step forward with less hash ...

i think waiting for cuda 7.5 to be running mainstream is probably the better way to go ...

tanx again sp ...

#crysx

Problem is cuda 6.5 isn't supported on newer Linux releases. The latest supported Fedora is 20
which EOLed a month ago.

Seems odd that Nvidia would drop cuda 6.5 if cuda 7 was performing poorly.

i agree - it does seem odd ...

unless they have something up their sleeve that they are about to release - then i would have surmised that the future releases would not only be backward compatible but also at the same level of competency for compilations ...

instead - its worse hashrates - even worse compilation issues - and a 7.5 rc that gets released very shortly after the 7.0 release ... only to push fedora cuda repos forward without backward compatibility ...

its just downright silly ...

ill be reinstalling fedora 20 x64 with cuda 6.5 on the test machine again ... tomorrow or the next day ...

nite all ...

#crysx
member
Activity: 94
Merit: 10
Pretty nice increase in hashrate for Axiom with latest cpu miner:

legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Not for sale. But you can keep donating, so I can publish small increases in the hashrates for free.

Next up is another groestlcoin optimization.

From 23.7 to 24 MHASH on the GTX 970 Windforce OC (stock) (will submit later tonight)

The speed on AMD cards with the pallas opensource is:

Quote
v1 - to be compiled with catalyst 14.6 or 14.7:

R9 290x @1125 Mhz: ~26.4 Mh/s
R9 290 @1200: ~25 Mh/s
R9 280x (stock): ~18 Mh/s
7950 @1200: ~16 Mh/s
R9 270X: ~9.7 Mh/s

v2 - experimental hawaii only bin:

R9 290x @1125 Mhz: ~34.4 Mh/s
R9 290 @1100: ~30.6 Mh/s


Mine is faster on Tahiti - bin is public.

that falls into the "v2" category ;-)
in order to avoid confusion, I might link your tahiti bin in the OP and add its hashrate to that list.
are you ok with that?

Yeah, go ahead; just credit me.

I was looking around about mining groestl and it's only groestlcoin, right?
Only one exchange without volume, so bag holding only?

you can mine diamond as well.
sr. member
Activity: 271
Merit: 251
Not for sale. But you can keep donating, so I can publish small increases in the hashrates for free.

Next up is another groestlcoin optimization.

From 23.7 to 24 MHASH on the GTX 970 Windforce OC (stock) (will submit later tonight)

The speed on AMD cards with the pallas opensource is:

Quote
v1 - to be compiled with catalyst 14.6 or 14.7:

R9 290x @1125 Mhz: ~26.4 Mh/s
R9 290 @1200: ~25 Mh/s
R9 280x (stock): ~18 Mh/s
7950 @1200: ~16 Mh/s
R9 270X: ~9.7 Mh/s

v2 - experimental hawaii only bin:

R9 290x @1125 Mhz: ~34.4 Mh/s
R9 290 @1100: ~30.6 Mh/s


Mine is faster on Tahiti - bin is public.

that falls into the "v2" category ;-)
in order to avoid confusion, I might link your tahiti bin in the OP and add its hashrate to that list.
are you ok with that?

Yeah, go ahead; just credit me.

I was looking around about mining groestl and it's only groestlcoin, right?
Only one exchange without volume, so bag holding only?
legendary
Activity: 1470
Merit: 1114
My fork only works with cuda 6.5 or 7.5.

Cuda 7.5beta shows a drop in hashrate of around 30% in the x11 algorithm but it validates. A 750ti is down from 3MHASH to 1.9


ok - so cuda 7.0 is really a dead way to go ...

i wonder if im better to downgrade back and get the test machine running under 6.5 again ... this will take some work and time - but better to go back to a working system than a step forward with less hash ...

i think waiting for cuda 7.5 to be running mainstream is probably the better way to go ...

tanx again sp ...

#crysx

Problem is cuda 6.5 isn't supported on newer Linux releases. The latest supported Fedora is 20
which EOLed a month ago.

Seems odd that Nvidia would drop cuda 6.5 if cuda 7 was performing poorly.
legendary
Activity: 2912
Merit: 1091
--- ChainWorks Industries ---
My fork only works with cuda 6.5 or 7.5.

Cuda 7.5beta shows a drop in hashrate of around 30% in the x11 algorithm but it validates. A 750ti is down from 3MHASH to 1.9


ok - so cuda 7.0 is really a dead way to go ...

i wonder if im better to downgrade back and get the test machine running under 6.5 again ... this will take some work and time - but better to go back to a working system than a step forward with less hash ...

i think waiting for cuda 7.5 to be running mainstream is probably the better way to go ...

tanx again sp ...

#crysx
legendary
Activity: 2912
Merit: 1091
--- ChainWorks Industries ---
These clock and power limit flags only work for x64 builds on windows and for linux (they require nvml.dll, which doesn't exist for 32 bit binaries)
sp just calls nvidia-smi, but mine is more complex: I first check the possible values and set the clocks accordingly. But about these clocks, it's just a limit, not an overclock... same with sp's version... The plimit works (I even exploded a PSU with that)

NVIDIA has removed support for x86 in their API, so I just call their command line tool. Seems to work to change the clocks on the gtx970/980 if you know the valid clocks. 750ti is not supported.

tanx sp ...

#crysx
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
These clock and power limit flags only work for x64 builds on windows and for linux (they require nvml.dll, which doesn't exist for 32 bit binaries)
sp just calls nvidia-smi, but mine is more complex: I first check the possible values and set the clocks accordingly. But about these clocks, it's just a limit, not an overclock... same with sp's version... The plimit works (I even exploded a PSU with that)

NVIDIA has removed support for x86 in their API, so I just call their command line tool. Seems to work to change the clocks on the gtx970/980 if you know the valid clocks. 750ti is not supported.
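For reference, a small sketch of the kind of nvidia-smi calls involved, written in Python rather than the miner's own C/C++ and not taken from the fork's source. It only builds the command lines; the GPU index, clock and wattage values are placeholders, and changing clocks or power limits usually needs administrator rights.

```python
import shlex


def supported_clocks_command(gpu_index):
    """nvidia-smi call that lists the clock pairs the GPU accepts."""
    return ["nvidia-smi", "-i", str(gpu_index), "-q", "-d", "SUPPORTED_CLOCKS"]


def app_clock_command(gpu_index, mem_mhz, graphics_mhz):
    """nvidia-smi call that sets application clocks as memory,graphics in MHz."""
    return ["nvidia-smi", "-i", str(gpu_index), "-ac", f"{mem_mhz},{graphics_mhz}"]


def power_limit_command(gpu_index, watts):
    """nvidia-smi call that sets the power limit in watts."""
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


if __name__ == "__main__":
    # Placeholder values only; query the supported clocks first.
    for cmd in (supported_clocks_command(0),
                app_clock_command(0, 3505, 1392),
                power_limit_command(0, 150)):
        print(shlex.join(cmd))
```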
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
My fork only works with cuda 6.5 or 7.5.

Cuda 7.5beta shows a drop in hashrate of around 30% in the x11 algorithm but it validates. A 750ti is down from 3MHASH to 1.9
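As a quick check of those 750ti numbers, the relative drop can be computed directly; going from 3 to 1.9 works out to a bit more than 30%. The helper name is made up for illustration.

```python
def relative_drop(before, after):
    """Fractional drop from before to after, e.g. 0.25 means 25% slower."""
    return (before - after) / before


if __name__ == "__main__":
    drop = relative_drop(3.0, 1.9)  # 750ti x11 figures from the post, in MHASH
    print(f"{drop:.1%}")
```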
legendary
Activity: 2912
Merit: 1091
--- ChainWorks Industries ---

Tpruv's windows version does allow setting gpu clock on the command line.   This is a feature I'd like to see in SP's mods.
 
fyi: I was Angora but lost that account as pw got changed and could not get the reset email.

These clock and power limit flags only work for x64 builds on windows and for linux (they require nvml.dll, which doesn't exist for 32 bit binaries)

sp just calls nvidia-smi, but mine is more complex: I first check the possible values and set the clocks accordingly. But about these clocks, it's just a limit, not an overclock... same with sp's version... The plimit works (I even exploded a PSU with that)

so do you have instructions on how to use these settings? ...

what the command-line parameters are - how to implement them on the command line - what limits you can have ...

im curious as to how to get all this running in fedora 20 x64 - cuda 6.5 ...

epsylon3 - maybe you can answer this ... i have upgraded one of the systems to fedora 21 x64 and cuda 7.0 ... compiled both sps fork and tpruvots fork and both are giving cpu validation errors on the two algos that i tested - x11 and quark ...

what could this be? ... and how do i fix it? ...

tanx ...

#crysx
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer

Tpruv's windows version does allow setting gpu clock on the command line.   This is a feature I'd like to see in SP's mods.
 
fyi: I was Angora but lost that account as pw got changed and could not get the reset email.

These clock and power limit flags only work for x64 builds on windows and for linux (they require nvml.dll, which doesn't exist for 32 bit binaries)

sp just calls nvidia-smi, but mine is more complex: I first check the possible values and set the clocks accordingly. But about these clocks, it's just a limit, not an overclock... same with sp's version... The plimit works (I even exploded a PSU with that)
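The "check the possible values first" step could look roughly like the sketch below. The selection rule (round down to the nearest supported clock, since the value acts as a limit) is an assumption, as the post does not say how the value is chosen; the function name and the clock list are made up for illustration.

```python
def pick_supported_clock(requested_mhz, supported_mhz):
    """Return the highest supported clock that does not exceed the request.

    Falls back to the lowest supported clock when the request is below all of them.
    """
    if not supported_mhz:
        raise ValueError("no supported clocks reported")
    candidates = [clock for clock in supported_mhz if clock <= requested_mhz]
    return max(candidates) if candidates else min(supported_mhz)


if __name__ == "__main__":
    supported = [1392, 1253, 1114]  # illustrative values, not read from a real card
    print(pick_supported_clock(1300, supported))
```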