
Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 961. (Read 2347664 times)

legendary
Activity: 1400
Merit: 1050
Error   18   error : identifier "uint8" is undefined   c:\ccminer-windows\groestl_functions_quad.cu   489   1   ccminer
Error   19   error : expected an expression   c:\ccminer-windows\groestl_functions_quad.cu   489   1   ccminer
Error   20   error : expected a ")"   c:\ccminer-windows\groestl_functions_quad.cu   489   1   ccminer
Error   21   error : expected an expression   c:\ccminer-windows\groestl_functions_quad.cu   489   1   ccminer
Error   22   error : expected a ")"   c:\ccminer-windows\groestl_functions_quad.cu   489   1   ccminer
Error   23   error : expected an expression   c:\ccminer-windows\groestl_functions_quad.cu   496   1   ccminer
Error   24   error : expected a ")"   c:\ccminer-windows\groestl_functions_quad.cu   496   1   ccminer
Error   25   error : expected an expression   c:\ccminer-windows\groestl_functions_quad.cu   496   1   ccminer
Error   26   error : expected a ")"   c:\ccminer-windows\groestl_functions_quad.cu   496   1   ccminer
Error   27   error : expected an expression   c:\ccminer-windows\groestl_functions_quad.cu   497   1   ccminer
Error   28   error : expected a ")"   c:\ccminer-windows\groestl_functions_quad.cu   497   1   ccminer
Error   29   error : expected an expression   c:\ccminer-windows\groestl_functions_quad.cu   497   1   ccminer
Error   30   error : expected a ")"   c:\ccminer-windows\groestl_functions_quad.cu   497   1   ccminer
Error   31   error : expected an expression   c:\ccminer-windows\groestl_functions_quad.cu   501   1   ccminer
Error   32   error : expected a ")"   c:\ccminer-windows\groestl_functions_quad.cu   501   1   ccminer
Error   33   error : expected an expression   c:\ccminer-windows\groestl_functions_quad.cu   501   1   ccminer
Error   34   error : expected a ")"   c:\ccminer-windows\groestl_functions_quad.cu   501   1   ccminer
Error   35   error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\bin\nvcc.exe" -gencode=arch=compute_50,code=\"sm_50,compute_50\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin"  -I. -Icompat -I"compat\curl-for-windows\curl\include" -Icompat\jansson -Icompat\getopt -Icompat\pthreads -I"compat\curl-for-windows\openssl\openssl\include" -I"compat\curl-for-windows\zlib" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include"    --keep --keep-dir Release -maxrregcount=80 --ptxas-options=-v --machine 32 --compile -cudart static --ptxas-options="-O3"     -DWIN32 -DNDEBUG -D_CONSOLE -D_CRT_SECURE_NO_WARNINGS -DCURL_STATICLIB -DUSE_WRAPNVML -DSCRYPT_KECCAK512 -DSCRYPT_CHACHA -DSCRYPT_CHOOSE_COMPILETIME -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Ox /Zi  /MT  " -o Release\cuda_groestlcoin.cu.obj "C:\ccminer-windows\cuda_groestlcoin.cu"" exited with code 2.   C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V120\BuildCustomizations\CUDA 6.5.targets   593   9   ccminer



It needs to include cuda_vector.h instead of cuda_helper.h.
hero member
Activity: 677
Merit: 500
Error   18   error : identifier "uint8" is undefined   c:\ccminer-windows\groestl_functions_quad.cu   489   1   ccminer
Error   19   error : expected an expression   c:\ccminer-windows\groestl_functions_quad.cu   489   1   ccminer
Error   20   error : expected a ")"   c:\ccminer-windows\groestl_functions_quad.cu   489   1   ccminer
Error   21   error : expected an expression   c:\ccminer-windows\groestl_functions_quad.cu   489   1   ccminer
Error   22   error : expected a ")"   c:\ccminer-windows\groestl_functions_quad.cu   489   1   ccminer
Error   23   error : expected an expression   c:\ccminer-windows\groestl_functions_quad.cu   496   1   ccminer
Error   24   error : expected a ")"   c:\ccminer-windows\groestl_functions_quad.cu   496   1   ccminer
Error   25   error : expected an expression   c:\ccminer-windows\groestl_functions_quad.cu   496   1   ccminer
Error   26   error : expected a ")"   c:\ccminer-windows\groestl_functions_quad.cu   496   1   ccminer
Error   27   error : expected an expression   c:\ccminer-windows\groestl_functions_quad.cu   497   1   ccminer
Error   28   error : expected a ")"   c:\ccminer-windows\groestl_functions_quad.cu   497   1   ccminer
Error   29   error : expected an expression   c:\ccminer-windows\groestl_functions_quad.cu   497   1   ccminer
Error   30   error : expected a ")"   c:\ccminer-windows\groestl_functions_quad.cu   497   1   ccminer
Error   31   error : expected an expression   c:\ccminer-windows\groestl_functions_quad.cu   501   1   ccminer
Error   32   error : expected a ")"   c:\ccminer-windows\groestl_functions_quad.cu   501   1   ccminer
Error   33   error : expected an expression   c:\ccminer-windows\groestl_functions_quad.cu   501   1   ccminer
Error   34   error : expected a ")"   c:\ccminer-windows\groestl_functions_quad.cu   501   1   ccminer
Error   35   error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\bin\nvcc.exe" -gencode=arch=compute_50,code=\"sm_50,compute_50\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin"  -I. -Icompat -I"compat\curl-for-windows\curl\include" -Icompat\jansson -Icompat\getopt -Icompat\pthreads -I"compat\curl-for-windows\openssl\openssl\include" -I"compat\curl-for-windows\zlib" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.5\include"    --keep --keep-dir Release -maxrregcount=80 --ptxas-options=-v --machine 32 --compile -cudart static --ptxas-options="-O3"     -DWIN32 -DNDEBUG -D_CONSOLE -D_CRT_SECURE_NO_WARNINGS -DCURL_STATICLIB -DUSE_WRAPNVML -DSCRYPT_KECCAK512 -DSCRYPT_CHACHA -DSCRYPT_CHOOSE_COMPILETIME -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Ox /Zi  /MT  " -o Release\cuda_groestlcoin.cu.obj "C:\ccminer-windows\cuda_groestlcoin.cu"" exited with code 2.   C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V120\BuildCustomizations\CUDA 6.5.targets   593   9   ccminer


legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Pallas patches broke compilation...
Oops... Redownload and try to recompile!
Errors in groestl_functions_quad.cu and cuda_helper.h
Win10 and MS VS2013

I didn't modify cuda_helper.h
Try compiling from scratch.

EDIT: could you please paste the errors here?
hero member
Activity: 677
Merit: 500
Pallas patches broke compilation...
Oops... Redownload and try to recompile!
Errors in groestl_functions_quad.cu and cuda_helper.h
Win10 and MS VS2013
member
Activity: 111
Merit: 10
Sorry Sp_ for the 4 pull requests instead of one: I made them on github directly and couldn't find a way to send a single pull request with more patches (or make a single patch from multiple file edits).
I don't believe there is a way via the web interface. This could be a workaround, but I haven't tried it (from: http://stackoverflow.com/questions/17815895/can-i-edit-two-files-then-make-one-commit-using-github-web-based-editor):
    Create a temporary branch, switch to it;
    Edit multiple files, commit each file separately;
    Make pull request;
    Merge pull request and delete temporary branch.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I've pushed some little patches with more compact code and little speedups for quark and x11 (also less build warnings).
Sorry Sp_ for the 4 pull requests instead of one: I made them on github directly and couldn't find a way to send a single pull request with more patches (or make a single patch from multiple file edits).
legendary
Activity: 1400
Merit: 1050
hmmm. I doubt that...
I tried to use your modified kernels (cubehash, blakekeccak, bmw) and I mostly see no difference.
There is some variability in the results, but over a medium/long run they settle to the same values I get with the standard kernels...

If the values go down over time, it means that your cards are throttling because of heat or too low a voltage. On my GTX 970 the miner is mining 500 kH/s faster than yours.

Release 62 standard clocks:

(the 980ti is clocked at 1260 on the core)

Well, the argument isn't really relevant: if throttling happens, it happens in the same way for every kernel (slow or fast), so if a kernel is faster it will remain faster regardless of any throttling, and that isn't the case here...

(test was done using default clock and tdp target of 100%)

If a kernel is faster, it probably also draws more power, which in turn means more heat and so a higher chance of throttling.
If an enhancement to a kernel has the same performance/watt ratio as the original, the card may throttle and deliver the same performance using the same power but at a lower clock speed.
I'm speaking generally, as I don't know if it's valid for this specific case.
Heat isn't really part of the equation: the 980 never runs hotter than 75°C (the cards are currently running at 71°C, and the limit before throttling is the standard NVIDIA one, 81°C). Also, the power-hungry part of this miner is lyra (well... the private one...). The other algos don't use a lot of power in the lyra2re setup: keccak, bmw256 and blake are very fast algos and are used with a much smaller throughput than they can handle alone, so there is no big difference in power consumption in this setup.
legendary
Activity: 1764
Merit: 1024
hmmm. I doubt that...
I tried to use your modified kernels (cubehash, blakekeccak, bmw) and I mostly see no difference.
There is some variability in the results, but over a medium/long run they settle to the same values I get with the standard kernels...

If the values go down over time, it means that your cards are throttling because of heat or too low a voltage. On my GTX 970 the miner is mining 500 kH/s faster than yours.

Release 62 standard clocks:

(the 980ti is clocked at 1260 on the core)

Well, the argument isn't really relevant: if throttling happens, it happens in the same way for every kernel (slow or fast), so if a kernel is faster it will remain faster regardless of any throttling, and that isn't the case here...

(test was done using default clock and tdp target of 100%)

Hypothetical... Faster kernel uses more power > cards get hotter > cards slow down. Although not always true. Sometimes it could mean that the card will use the same amount of power, it'll just be more heavily utilized and produce more heat.

(Pallas beat me to this)
legendary
Activity: 1512
Merit: 1000
quarkchain.io
Just got some GTX 950s for weekend testing... Very excited...
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
hmmm. I doubt that...
I tried to use your modified kernels (cubehash, blakekeccak, bmw) and I mostly see no difference.
There is some variability in the results, but over a medium/long run they settle to the same values I get with the standard kernels...

If the values go down over time, it means that your cards are throttling because of heat or too low a voltage. On my GTX 970 the miner is mining 500 kH/s faster than yours.

Release 62 standard clocks:

(the 980ti is clocked at 1260 on the core)

Well, the argument isn't really relevant: if throttling happens, it happens in the same way for every kernel (slow or fast), so if a kernel is faster it will remain faster regardless of any throttling, and that isn't the case here...

(test was done using default clock and tdp target of 100%)

If a kernel is faster, it probably also draws more power, which in turn means more heat and so a higher chance of throttling.
If an enhancement to a kernel has the same performance/watt ratio as the original, the card may throttle and deliver the same performance using the same power but at a lower clock speed.
I'm speaking generally, as I don't know if it's valid for this specific case.
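
The performance/watt argument can be illustrated with a toy model. This is only a sketch under strong assumptions: the names `Kernel`, `hashes_per_joule`, `watts_uncapped` and `sustained_hashrate` are hypothetical, the numbers are made up, and efficiency is treated as constant when the card clocks down, which real hardware does not guarantee.

```cpp
#include <algorithm>

// Toy model (assumption): a kernel is described only by its efficiency
// and by the power it would draw if the card never throttled.
struct Kernel {
    double hashes_per_joule;  // assumed constant across clock speeds
    double watts_uncapped;    // power draw with no power limit
};

// Sustained hashrate when the card limits power to power_cap_watts.
// Below the cap the kernel runs freely; above it the card throttles
// down to the cap.
double sustained_hashrate(const Kernel& k, double power_cap_watts) {
    const double watts = std::min(k.watts_uncapped, power_cap_watts);
    return k.hashes_per_joule * watts;  // H/J * J/s = H/s
}
```

In this model, two kernels with equal efficiency that both exceed the power cap end up at the same sustained hashrate, while a kernel that improves hashes per joule stays faster even when throttled.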
legendary
Activity: 1400
Merit: 1050
hmmm. I doubt that...
I tried to use your modified kernels (cubehash, blakekeccak, bmw) and I mostly see no difference.
There is some variability in the results, but over a medium/long run they settle to the same values I get with the standard kernels...

If the values go down over time, it means that your cards are throttling because of heat or too low a voltage. On my GTX 970 the miner is mining 500 kH/s faster than yours.

Release 62 standard clocks:

(the 980ti is clocked at 1260 on the core)

Well, the argument isn't really relevant: if throttling happens, it happens in the same way for every kernel (slow or fast), so if a kernel is faster it will remain faster regardless of any throttling, and that isn't the case here...

(test was done using default clock and tdp target of 100%)
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
hmmm. I doubt that...
I tried to use your modified kernels (cubehash, blakekeccak, bmw) and I mostly see no difference.
There is some variability in the results, but over a medium/long run they settle to the same values I get with the standard kernels...

If the values go down over time, it means that your cards are throttling because of heat or too low a voltage. On my GTX 970 the miner is mining 500 kH/s faster than yours.

Release 62 standard clocks:

(the 980ti is clocked at 1260 on the core)

legendary
Activity: 1400
Merit: 1050
Nah, 6.5 on both boxes. Slightly older 6.5.12 on Linux and 6.5.19 (the latest 6.5 + compute 5.2 support I think) on Windows. Tried x64 builds too, doesn't seem to make much of a difference either way. Weird shit. I did manage to make the win build a little better by manually unrolling stuff, just looks like the win version of nvcc isn't really trying to figure stuff out itself. Which brings me back to weird shit.

You should fork my branch and merge the lyra2 changes. My fork is already 500 kH/s faster than djm34's open-source version without modding lyra2 (only the other algos). Big donations are waiting.


hmmm. I doubt that...
I tried to use your modified kernels (cubehash, blakekeccak, bmw) and I mostly see no difference.
There is some variability in the results, but over a medium/long run they settle to the same values I get with the standard kernels...

Edit: actually, the main difference I saw from my original settings came from raising the intensity (a parameter the user can adjust even in my release).
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
If you look in my bmw256 mod this code would not unroll:

//   #pragma unroll
//   for (i = 0; i<2; i++)
//      Q[i + 16] = expand32_1(i + 16, M32, H, Q);

So I had to manually unroll it. And with the manual unroll I got fewer instructions and faster code.
That's interesting. Any idea why it's not unrolling it automatically in this case?
Did you try this:
for (i = 16; i<18; i++)
   Q[i] = expand32_1(i, M32, H, Q);

I don't know. But I know that in my change the loop was working on constant data, and when I unrolled it manually the constant data was not calculated, and fewer instructions were the result.
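
For readers following the unrolling discussion, here is a minimal host-side sketch of the two forms being compared. It is an illustration only: `expand_step` is a hypothetical stand-in and not the real `expand32_1` from the bmw256 kernel, and the function names and array sizes are assumptions. It shows only that the loop and the hand-unrolled version compute the same values; whether a given nvcc version unrolls the loop by itself, and how many instructions result, has to be checked in the generated code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for expand32_1 (NOT the real bmw256 function).
// It mixes earlier Q words with message and chaining words chosen by index.
static inline std::uint32_t expand_step(std::size_t i,
                                        const std::uint32_t* m,
                                        const std::uint32_t* h,
                                        const std::uint32_t* q) {
    return q[i - 16] + (q[i - 1] ^ m[i & 15]) + h[(i + 7) & 15]
         + static_cast<std::uint32_t>(i) * 0x05555555u;
}

// Loop form: the index is a loop variable, so the compiler must unroll
// the loop before it can treat the index-dependent terms as constants.
void expand_loop(const std::uint32_t* m, const std::uint32_t* h,
                 std::uint32_t* q) {
    for (std::size_t i = 16; i < 18; ++i)
        q[i] = expand_step(i, m, h, q);
}

// Manually unrolled form: each call sees a literal index, so the
// index-dependent terms are visible as constants at compile time.
void expand_unrolled(const std::uint32_t* m, const std::uint32_t* h,
                     std::uint32_t* q) {
    q[16] = expand_step(16, m, h, q);
    q[17] = expand_step(17, m, h, q);
}
```

Both functions expect `m` and `h` to hold at least 16 words and `q` at least 18, with `q[0..15]` already filled.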
legendary
Activity: 1797
Merit: 1028
11 Mh/s QUARK, RELEASE 65---

With a little tuning, and using the "cpu-mining", "-C" switch, I was able to get these results with my Win 7 x64 work computer and an EVGA GTX 960 SSC graphics card:


EVGA GTX 960 SSC mining Quark

The card is mining with SP-mod release 65, and a +80 core/+240 mem overclock.  There may be room for better and faster tuning, but this appears to be a stable setting for my machine and card.  Higher overclocks bring as much as 11.2Mh/s results, but have been less stable.

Earlier this week, this card was mining Quark at 10.6Mh/s. With the cpu-mining switch, "-C", performance has improved.  My other rigs on Linux have shown similar gains.  My 6x EVGA 750ti FTW rig mines at 40Mh/s, up from 38.5Mh/s, and the cards run from 6.6Mh/s to 6.8Mh/s each.  My EVGA GTX 970 FTW+ cards now mine Quark at 16.5Mh/s each, up from 14-15Mh/s each.

I hope the other bugs are worked out, I'd like to try solo-mining VertCoin.  I also noticed the lower poolside VTC hash-rate reports within the last 2 releases, hope it is fixed.       --scryptr

EDIT:  Better results are obtained when using an intensity slightly less than the maximum acceptable/stable.  My launch string: ./ccminer -a quark -i 23.9 -C --cpu-priority 5 -o stratum+tcp://quark.pool.com:port -u a -p x


EVGA GTX 960 SSC with OverClock, mining Quark

My clocks are currently +90core/+270mem.  My results are +160kh/s from the first (top) posted pic.  Adjust per your hardware.       --scryptr
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
Nah, 6.5 on both boxes. Slightly older 6.5.12 on Linux and 6.5.19 (the latest 6.5 + compute 5.2 support I think) on Windows. Tried x64 builds too, doesn't seem to make much of a difference either way. Weird shit. I did manage to make the win build a little better by manually unrolling stuff, just looks like the win version of nvcc isn't really trying to figure stuff out itself. Which brings me back to weird shit.

You should fork my branch and merge the lyra2 changes. My fork is already 500 kH/s faster than djm34's open-source version without modding lyra2 (only the other algos). Big donations are waiting.

You made some improvements in x11 (simd) and x13, so your handle is still in the credits. But that was a year ago.

legendary
Activity: 2716
Merit: 1094
Black Belt Developer
If you look in my bmw256 mod this code would not unroll:

//   #pragma unroll
//   for (i = 0; i<2; i++)
//      Q[i + 16] = expand32_1(i + 16, M32, H, Q);

So I had to manually unroll it. And with the manual unroll I got fewer instructions and faster code.

That's interesting. Any idea why it's not unrolling it automatically in this case?
Did you try this:

for (i = 16; i<18; i++)
   Q[i] = expand32_1(i, M32, H, Q);
member
Activity: 70
Merit: 10
That in addition to VS insisting on rebuilding EVERYTHING after changing something for a single source file in the project file, gotta love it.

Pretty sure one of my PRs from yesterday should've taken care of that. Unless you're touching a header, in which case you're probably up a creek. I think every header includes every other header.
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
@joblo, glad you got linux OC working. which release are you running?

it takes a little digging with google and man pages, and it's definitely not user friendly (i.e. work locally first to set up the initial OC of multiple cards, then deploy as headless SSH), but it does work.

Currently using Fedora 20. Although it's EOL it's the last Fedora release supported by cuda 6.5.
I'm crossing my fingers that cuda 7.5 gets optimized soon so I can upgrade to Fedora 22 or Centos 7.

same here joblo ...

all nvidia based miners - running fedora 20 x64 with all the latest dnf updates ...

all amd based miners - running fedora 19 x64 with all the latest dnf updates also ...

f20x64 - cuda 6.5 ...

i have one f22x64 cuda 7.0.28 machine that IS running - ccminer-spmod 1.5.64 using x11 - and its fine ... though hashrate is about 200KH under the 6.5 compiles ... many of the other algos ( including quark and lyra2v2 ) are 'cpu validation error' persistent ...

when the donation links are up and running - and all my other jobs are done ( official rename of granitecoin and logo and website ) - then ill work on the recompile and adjustment of oc and OS test also - with centos 7 x64 vps ... i have about 7 of those at the moment - and soon to grow to a LOT more vps in centos 7 x64 for various applications ...

i would be VERY interested if there is a dedicated page / link / site specifically for linux / fedora / oc - so that we can reference it all to ... if not - ill make one ... i think we all need it when trying to setup ( and also help ) the systems for the linux savvy ... there is just too much to wade through to get the 'right' info ...

im back online for the next few days - so off to compile the 'new' ccminer-spmod 1.5.65 in both cuda 6.5 ( f20x64 ) and 7.0 ( f22x64 ) ...

wish me luck ...

btw - tsiv ... if you read this ... i have not heard back from the pm i sent you ... i would really like your details also - as i cant setup a donation server without them ... ill be publishing them in the next day or so when i iron out the little issues i have currently with them ...

tanx ...

#crysx
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
If you look in my bmw256 mod this code would not unroll:

//   #pragma unroll
//   for (i = 0; i<2; i++)
//      Q[i + 16] = expand32_1(i + 16, M32, H, Q);

So I had to manually unroll it. And with the manual unroll I got fewer instructions and faster code.