
Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 862. (Read 2347659 times)

sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
Nope. Compute 3.5 also has max 255 regs per thread. https://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capabilities Table 13.

I can see it in the link. I don't have a compute 3.5 card. Maybe there are some possible speedups to be made on the 780ti.


Is there anyone with a 780ti card who can compile the latest version (add compute 3.5 in the project file or makefile)?

What hashrates are you getting?
sr. member
Activity: 438
Merit: 250
Doesn't help with a 1.5x faster hardware when the software is 2x slower.. So you will need someone to create a good compiler, and someone to mod the code..
The 980ti is around 3x faster than the 780ti mining quark.
The 980ti is around 2x faster than the 780ti mining x11.
The 980ti is around 1.5x faster than the 780ti mining lyra2v2.
That is interesting, considering the 780ti and 980ti look very similar hardware-spec-wise, other than a bit more memory (3GB) and a tiny number of extra CUDA cores (16 more) on the 980ti.
Why the big difference?

The 780ti only has 64 registers while the Maxwell has 256.

Nope. Compute 3.5 also has max 255 regs per thread. https://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capabilities Table 13.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
Doesn't help with a 1.5x faster hardware when the software is 2x slower.. So you will need someone to create a good compiler, and someone to mod the code..
The 980ti is around 3x faster than the 780ti mining quark.
The 980ti is around 2x faster than the 780ti mining x11.
The 980ti is around 1.5x faster than the 780ti mining lyra2v2.
That is interesting, considering the 780ti and 980ti look very similar hardware-spec-wise, other than a bit more memory (3GB) and a tiny number of extra CUDA cores (16 more) on the 980ti.
Why the big difference?

The 780ti only has 64 registers while the Maxwell has 256. With 64 registers, the 780ti spills to the stack. But in the memory algos like Lyra2v2 the performance is better, mostly because djm34 has made separate kernels for compute 3.5 and 5.0. In my Maxwell mod I removed the optimized compute 3.5 kernels, because they used 25% more memory. The Maxwell caches only 32 bits in a cacheline while the Kepler caches 128 bits.
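
As a rough aid for the register discussion, here is a minimal sketch of how the registers a kernel uses per thread bound the number of threads that can be resident on one SM. The constants are the compute 3.5 / 5.x figures from the compute-capability table in the CUDA C Programming Guide (64K 32-bit registers per SM, 2048 threads per SM, 255 registers per thread). The function name is made up for illustration, and the estimate ignores register allocation granularity, shared memory and block limits, so real occupancy can be lower.

```cpp
#include <algorithm>

// Per-SM limits for compute 3.5 and 5.x (CUDA C Programming Guide table).
constexpr int kRegsPerSM = 65536;        // 64K 32-bit registers per SM
constexpr int kMaxThreadsPerSM = 2048;   // max resident threads per SM
constexpr int kMaxRegsPerThread = 255;   // max registers per thread

// Hypothetical helper: upper bound on resident threads per SM for a kernel
// that uses regs_per_thread registers. This is an estimate only; it ignores
// allocation granularity and every other occupancy limiter.
int max_resident_threads(int regs_per_thread) {
    int regs = std::max(1, std::min(regs_per_thread, kMaxRegsPerThread));
    return std::min(kMaxThreadsPerSM, kRegsPerSM / regs);
}
```

With this estimate, a kernel capped at 64 registers per thread can keep up to 1024 threads resident per SM, while one using all 255 registers drops to 257. A cap on registers (for example through the compiler's register limit) trades spills to local memory for more resident threads.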
legendary
Activity: 1049
Merit: 1001
Doesn't help with a 1.5x faster hardware when the software is 2x slower.. So you will need someone to create a good compiler, and someone to mod the code..

The 980ti is around 3x faster than the 780ti mining quark.
The 980ti is around 2x faster than the 780ti mining x11.
The 980ti is around 1.5x faster than the 780ti mining lyra2v2.


That is interesting, considering the 780ti and 980ti look very similar hardware-spec-wise, other than a bit more memory (3GB) and a tiny number of extra CUDA cores (16 more) on the 980ti.


Why the big difference?
legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
10x speed isn't going to happen, maybe 2x tops if I had to guess. The big advantage will come from better efficiency due to the switch from 28nm to 16nm architecture.

I actually hope Pascal won't be very good because that would render our current cards pretty much useless for mining with a huge drop in resell value.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
Doesn't help with a 1.5x faster hardware when the software is 2x slower.. So you will need someone to create a good compiler, and someone to mod the code..

The 980ti is around 3x faster than the 780ti mining quark.
The 980ti is around 2x faster than the 780ti mining x11.
The 980ti is around 1.5x faster than the 780ti mining lyra2v2.
legendary
Activity: 1049
Merit: 1001
IBM (Xilinx) and Intel (Altera) are both working with FPGA makers to produce CPUs with FPGAs built in.

Nvidia Pascal looks to be 10 times faster than Maxwell and is expected to be released in 2016

If the Pascal specs are true, it would breathe some new life into GPU mining

10 times?
I don't think it's possible with the current technology, and even if it was, nvidia wouldn't break the upgrade path releasing such a product.
I guess max 1.5x maxwell with same power consumption.

1.5x sounds more realistic considering it will be 16nm
sr. member
Activity: 438
Merit: 250
IBM (Xilinx) and Intel (Altera) are both working with FPGA makers to produce CPUs with FPGAs built in.

Nvidia Pascal looks to be 10 times faster than Maxwell and is expected to be released in 2016

If the Pascal specs are true, it would breathe some new life into GPU mining



The 10 times figure came from some marketing talk by NVidia CEO Jen-Hsun Huang, which had nothing to do with hard benchmark figures:



I think this had something to do with machine learning with a crapload of interconnected cards.
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
IBM (Xilinx) and Intel (Altera) are both working with FPGA makers to produce CPUs with FPGAs built in.

Nvidia Pascal looks to be 10 times faster than Maxwell and is expected to be released in 2016

If the Pascal specs are true, it would breathe some new life into GPU mining

10 times?
I don't think it's possible with the current technology, and even if it was, nvidia wouldn't break the upgrade path releasing such a product.
I guess max 1.5x maxwell with same power consumption.

damn ...

well - when we see pascal out in the stores - we should see some degree of increase ... and if its anything larger than 25% then we are doing well ...

if it gets much larger than 25% - say 50-100% - then we will be in for a wild ride pallas ... and some massive hashrates ...

ill start a miner system and btc address just for them next year ...

#crysx
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
IBM (Xilinx) and Intel (Altera) are both working with FPGA makers to produce CPUs with FPGAs built in.

Nvidia Pascal looks to be 10 times faster than Maxwell and is expected to be released in 2016

If the Pascal specs are true, it would breathe some new life into GPU mining

10 times?
I don't think it's possible with the current technology, and even if it was, nvidia wouldn't break the upgrade path releasing such a product.
I guess max 1.5x maxwell with same power consumption.
full member
Activity: 231
Merit: 150

-snip-

have you tried this with v74? ...

or with ftc? ...

#crysx

No not yet with either.

Edit: Will test with .74 now.

Sorry real life has had me working, not much time to keep up with the thread and versions.

i think a lot of us know how you feel mate ... im in exactly the same boat ... Wink ...

tanx ...

#crysx

v74 is working, with some loss of hash compared to v72:
v72 at -i 11: 320
v74 at -i 11: 266/316

[2015-11-17 01:37:28] accepted: 2/3 (66.67%), 275.05 kH/s yes!
Getting noooo's with both at times, didn't see these nooo's with sp v54

Edit: after double checking looks like v72 & v74 are doing about the same hash numbers on good found blocks.
Not doing too bad: 350 coins found since my first post today, but I do have 3 other cards besides the 960 working.
Three of the 50 coin blocks were found on a pair of R270 non X cards. Nothing found on the 760GTX yet.

legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
IBM (Xilinx) and Intel (Altera) are both working with FPGA makers to produce CPUs with FPGAs built in.

Nvidia Pascal looks to be 10 times faster than Maxwell and is expected to be released in 2016

If the Pascal specs are true, it would breathe some new life into GPU mining



I already have an ARM dual-core with an FPGA on the same chip.

that can process x11? ... ...

Wink ...

#crysx

The Cyclone V itself isn't big enough. I do know how to get boards that are on the cheap.

would they be difficult to code to do x11 optimized efficiently? ...

#crysx

It's not really coding, it's chip design. And it'd be VERY tedious, but doable.

tedious and doable - but worth doing? ...

#crysx

DEFINITELY.

well - that says it all doesnt it Smiley ...

ill pm you for any details you wish to share - and whether you are interested in maybe doing it as a project ...

you know all the details - so its just a matter of when where and how much? ...

hang on a moment ... thats a proposition for a service - but not this one ... ok ...

Tongue ...

#crysx
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---

-snip-

have you tried this with v74? ...

or with ftc? ...

#crysx

No not yet with either.

Edit: Will test with .74 now.

Sorry real life has had me working, not much time to keep up with the thread and versions.

i think a lot of us know how you feel mate ... im in exactly the same boat ... Wink ...

tanx ...

#crysx
full member
Activity: 231
Merit: 150

-snip-

have you tried this with v74? ...

or with ftc? ...

#crysx

No not yet with either.

Edit: Will test with .74 now.

Sorry real life has had me working, not much time to keep up with the thread and versions.
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
Yeah. The Neoscrypt kernel is pretty good.

Looks like the latest changes in lyra2v2 are hurting the 980/980ti performance. Since I have added a memory access (level 1 cache), I think the tpb (threads per block) needs to be retuned.
search for these lines in the code (lyra2rev2.cu)

and change tpb

1,2,3,4,5,6,7.... 256?

   else if (strstr(props.name, "980"))
   {
      intensity = 256 * 256 * 18;
      tpb = 8;
   }

and...

   else if (strstr(props.name, "980 Ti"))
   {
      intensity = 256 * 256 * 18;
      tpb = 8;
   }
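
A minimal sketch of the tpb retuning idea, for anyone who wants to try several values without editing both branches by hand. It assumes the two checks live in a single if/else chain; since strstr() is a substring match, "980" would also match a "980 Ti" name, so the sketch tests the more specific name first. `LaunchConfig`, `pick_config`, `grid_size` and the fallback values are hypothetical names and numbers for illustration, not taken from lyra2rev2.cu; only the `256 * 256 * 18` intensity and `tpb = 8` come from the lines quoted above.

```cpp
#include <cstring>

// Hypothetical container for the two values tuned per card.
struct LaunchConfig {
    unsigned intensity;  // total threads per launch
    unsigned tpb;        // threads per block
};

// Pick a launch config from the device name. The tpb values are passed in
// so different candidates can be benchmarked without recompiling each time.
LaunchConfig pick_config(const char* device_name,
                         unsigned tpb_980, unsigned tpb_980ti) {
    LaunchConfig cfg{256u * 256u * 8u, 8u};  // made-up fallback for other cards
    if (std::strstr(device_name, "980 Ti")) {
        // Checked first: the plain "980" test below would also match this name.
        cfg = LaunchConfig{256u * 256u * 18u, tpb_980ti};
    } else if (std::strstr(device_name, "980")) {
        cfg = LaunchConfig{256u * 256u * 18u, tpb_980};
    }
    return cfg;
}

// Number of blocks needed to cover all threads, rounded up so that a tpb
// which does not divide the intensity evenly still covers every thread.
unsigned grid_size(const LaunchConfig& cfg) {
    return (cfg.intensity + cfg.tpb - 1u) / cfg.tpb;
}
```

With the quoted values (intensity 256 * 256 * 18 and tpb 8) this gives 147456 blocks. An odd value such as tpb 7 rounds up to 168522 blocks, so the kernel would need a bounds check on the thread index.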


NEOSCRYPT--

Neoscrypt will not solo-mine.  None of the wallets (phoenixcoin, feathercoin, UFOcoin, etc) communicate properly with the miner.  I keep getting "Invalid JSON data", and "Unable to start getwork" errors with tpruvot's ccminer, and no error information with your ccminer.  These are really low diff coins, with small network hash rates.       --scryptr

For sp's .72 don't forget to use the flag --broken-neo-wallet
It appears to work but gets nooo's so far with my test of phoenixcoin.

Command line:
ccminer.exe -a neoscrypt --no-gbt --no-longpoll -o 127.0.0.1:9556 -u phoenixcoinpc -p x -d 0 -i 10 --broken-neo-wallet

phoenixcoin.conf:
Code:
server=1
daemon=1
defaultkey=1
logtimestamps=1
dns=1
addnode=prometheus.phoenixcoin.org:9555
addnode=menoetius.phoenixcoin.org:9555
addnode=atlas.phoenixcoin.org:9555
rpcuser=phoenixcoinpc
rpcpassword=x
rpcport=9556
rpcconnect=127.0.0.1
rpcallowip=192.168.*.*
algo=phoenixcoin

The .conf should be located in the data folder: C:\Phoenixcoin-win64-0.6.5.1\Phoenixcoin\data
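
When solo mining fails, one thing worth ruling out is a mismatch between the miner's -o/-u/-p arguments and the wallet's rpcport/rpcuser/rpcpassword. Below is a small sketch that parses key=value lines like the phoenixcoin.conf above and compares them with the miner arguments. `parse_conf` and `miner_matches_conf` are hypothetical helpers written for this post, not part of ccminer or the wallet, and they only check the port, user and password.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parse "key=value" lines. For repeated keys such as addnode the last
// value wins, which is fine here since only the rpc* keys are checked.
std::map<std::string, std::string> parse_conf(const std::string& text) {
    std::map<std::string, std::string> kv;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF files
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;  // skip lines without '='
        kv[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return kv;
}

// True when the port in the miner's -o host:port and its -u / -p values
// agree with rpcport, rpcuser and rpcpassword from the conf.
bool miner_matches_conf(const std::map<std::string, std::string>& kv,
                        const std::string& host_port,
                        const std::string& user,
                        const std::string& pass) {
    auto get = [&kv](const char* key) {
        auto it = kv.find(key);
        return it == kv.end() ? std::string() : it->second;
    };
    std::size_t colon = host_port.rfind(':');
    if (colon == std::string::npos) return false;  // no port given
    return host_port.substr(colon + 1) == get("rpcport") &&
           user == get("rpcuser") && pass == get("rpcpassword");
}
```

For the command line and conf shown above, the check passes (port 9556, user phoenixcoinpc, password x), so any remaining errors would have to come from somewhere else, such as the getwork handling itself.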



have you tried this with v74? ...

or with ftc? ...

#crysx
full member
Activity: 231
Merit: 150
Yeah. The Neoscrypt kernel is pretty good.

Looks like the latest changes in lyra2v2 are hurting the 980/980ti performance. Since I have added a memory access (level 1 cache), I think the tpb (threads per block) needs to be retuned.
search for these lines in the code (lyra2rev2.cu)

and change tpb

1,2,3,4,5,6,7.... 256?

   else if (strstr(props.name, "980"))
   {
      intensity = 256 * 256 * 18;
      tpb = 8;
   }

and...

   else if (strstr(props.name, "980 Ti"))
   {
      intensity = 256 * 256 * 18;
      tpb = 8;
   }


NEOSCRYPT--

Neoscrypt will not solo-mine.  None of the wallets (phoenixcoin, feathercoin, UFOcoin, etc) communicate properly with the miner.  I keep getting "Invalid JSON data", and "Unable to start getwork" errors with tpruvot's ccminer, and no error information with your ccminer.  These are really low diff coins, with small network hash rates.       --scryptr

For sp's .72 don't forget to use the flag --broken-neo-wallet
It appears to work but gets nooo's so far with my test of phoenixcoin.

Command line:
ccminer.exe -a neoscrypt --no-gbt --no-longpoll -o 127.0.0.1:9556 -u phoenixcoinpc -p x -d 0 -i 10 --broken-neo-wallet

phoenixcoin.conf:
Code:
server=1
daemon=1
defaultkey=1
logtimestamps=1
dns=1
addnode=prometheus.phoenixcoin.org:9555
addnode=menoetius.phoenixcoin.org:9555
addnode=atlas.phoenixcoin.org:9555
rpcuser=phoenixcoinpc
rpcpassword=x
rpcport=9556
rpcconnect=127.0.0.1
rpcallowip=192.168.*.*
algo=phoenixcoin

The .conf should be located in the data folder: C:\Phoenixcoin-win64-0.6.5.1\Phoenixcoin\data

legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
IBM (Xilinx) and Intel (Altera) are both working with FPGA makers to produce CPUs with FPGAs built in.

Nvidia Pascal looks to be 10 times faster than Maxwell and is expected to be released in 2016

If the Pascal specs are true, it would breathe some new life into GPU mining



I already have an ARM dual-core with an FPGA on the same chip.

that can process x11? ... ...

Wink ...

#crysx

The Cyclone V itself isn't big enough. I do know how to get boards that are on the cheap.

would they be difficult to code to do x11 optimized efficiently? ...

#crysx

It's not really coding, it's chip design. And it'd be VERY tedious, but doable.

tedious and doable - but worth doing? ...

#crysx
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
IBM (Xilinx) and Intel (Altera) are both working with FPGA makers to produce CPUs with FPGAs built in.

Nvidia Pascal looks to be 10 times faster than Maxwell and is expected to be released in 2016

If the Pascal specs are true, it would breathe some new life into GPU mining



I already have an ARM dual-core with an FPGA on the same chip.

that can process x11? ... ...

Wink ...

#crysx

The Cyclone V itself isn't big enough. I do know how to get boards that are on the cheap.

would they be difficult to code to do x11 optimized efficiently? ...

#crysx
legendary
Activity: 1049
Merit: 1001
IBM (Xilinx) and Intel (Altera) are both working with FPGA makers to produce CPUs with FPGAs built in.

Nvidia Pascal looks to be 10 times faster than Maxwell and is expected to be released in 2016

If the Pascal specs are true, it would breathe some new life into GPU mining



I already have an ARM dual-core with an FPGA on the same chip.

I remember looking at the Parallella with a dual-core ARM A9 and a 16-core Epiphany coprocessor / FPGA.

What brand/model did you go with?
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
IBM (Xilinx) and Intel (Altera) are both working with FPGA makers to produce CPUs with FPGAs built in.

Nvidia Pascal looks to be 10 times faster than Maxwell and is expected to be released in 2016

If the Pascal specs are true, it would breathe some new life into GPU mining



I already have an ARM dual-core with an FPGA on the same chip.

that can process x11? ... ...

Wink ...

#crysx