Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 1043. (Read 2347601 times)

sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
http://hashpower.co/ (yaamp clone) is currently paying 0.7 BTC/GHASH for quark. Has anyone tried this pool?
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Same on AMD (Omega drivers): performance is lost compared to the 14.6 and 14.7 drivers.

Depends on the hash and on the card's chip. Hawaii groestl got a 25% boost on 14.12 and whirlpoolx +10% on 15.3, for example.
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
slower than 6.5 on windows
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
X11 with CUDA 7 is 10% slower. How fast is it with CUDA 7.5?
legendary
Activity: 1400
Merit: 1050
The compiler in CUDA 7 produces shitty code. All the AES algos got an increased register count and the program is spilling to memory; performance is lost, and the hash is broken.

Same on AMD (Omega drivers): performance is lost compared to the 14.6 and 14.7 drivers.


Did you get an access violation in cuda_hefty? How did you solve it?

CUDA 7.5 does, however, give some performance boost in lyra  Grin

The main problem is that it will be difficult to stay on some older version of cuda forever...
legendary
Activity: 1400
Merit: 1050
7x970 - Lyra2 won't start (out of memory)
4GB system memory
7x 970?!  Roll Eyes Mobos are so expensive...
You need at least as much RAM as VRAM here, something like 28 GB, to run that kind of system.



Surely there has to be a workaround. I mean the memory/swap doesn't even seem to be allocated let alone used, not even for a second.
Something like initializing the cards one after the other instead of all at the same time or something? Or giving the cards different jobs instead of working together on one big job? I have no idea but I'm sure there's a way.

Also agree, the same thing was happening to me with Neoscrypt and just had to throw more memory at it even though system memory basically isn't used at all.

If it just uses it to 'load' into the VRAM on the miner, an asynchronous load should help with it (load each card into memory, then into VRAM one at a time). Right now though I don't really see any indication of memory usage on the system.
If you open MSI Afterburner and watch both the RAM and pagefile graphs, you'll see it gets allocated (more on the pagefile than in memory), so maybe increasing the pagefile could work.
There isn't really a workaround on the code side: global memory variables have to be allocated from the host, and cudaMalloc works in mysterious ways...
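The rule of thumb quoted above (host RAM plus pagefile at least equal to total VRAM) can be put into numbers. This is only a sketch of that arithmetic; the function name and the idea of treating the pagefile as filling the gap are illustrative assumptions, not something ccminer or the driver reports.

```python
def host_memory_shortfall_gb(num_cards, vram_per_card_gb, system_ram_gb):
    """Return (needed_gb, shortfall_gb) for the 'as much RAM as VRAM' rule of thumb.

    Hypothetical helper: it only restates the forum rule, it does not
    query the driver or measure real allocations.
    """
    needed = num_cards * vram_per_card_gb          # total VRAM across all cards
    shortfall = max(0, needed - system_ram_gb)     # what pagefile/extra RAM must cover
    return needed, shortfall


# The rig from this thread: 7 x GTX 970 (4 GB each) with 4 GB of system memory.
needed, shortfall = host_memory_shortfall_gb(7, 4, 4)
print(f"needed: {needed} GB, shortfall to cover with pagefile or RAM: {shortfall} GB")
```

For the 7x970 rig this gives the 28 GB figure mentioned above, with about 24 GB left to cover by extra RAM or pagefile.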
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
The compiler in CUDA 7 produces shitty code. All the AES algos got an increased register count and the program is spilling to memory; performance is lost, and the hash is broken.

Same on AMD (Omega drivers): performance is lost compared to the 14.6 and 14.7 drivers.

legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer

The windows machines allow for easy software overclocking.  I still need to learn the command line API flags for Linux, and probably need to re-install Linux with the latest drivers for proper use.

If I move to CUDA Toolkit 7.5, will the SP_ releases still compile on Linux?       --scryptr

Yes, it will compile, but most of the sp "fine tuning" is done "at the register" level... A kernel uses a given number of registers, which can differ with the platform (OS and 32/x64) and also the GPU SM version.
If you change the OS or the SDK, some kernels will need to be retuned to fit a certain number of registers (one more register can reduce the overall speed a lot). I did this work for Linux with nvprof because it's faster to do and in general it also benefits Windows...
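To illustrate why a single extra register can matter, here is a rough occupancy sketch. It is a simplified model, not the real occupancy calculator: the hardware figures (65536 registers and 64 resident warps per Maxwell SM, registers allocated per warp in units of 256) are assumptions stated in the code, and shared memory and block-size limits are ignored.

```python
import math

REGS_PER_SM = 65536      # assumed: 32-bit registers per Maxwell SM
MAX_WARPS_PER_SM = 64    # assumed: 2048 resident threads / 32 threads per warp
WARP_SIZE = 32
ALLOC_UNIT = 256         # assumed: per-warp register allocation granularity


def resident_warps(regs_per_thread):
    """Warps that fit on one SM when registers are the only limit (simplified)."""
    # Registers are rounded up to the allocation unit for each warp.
    regs_per_warp = math.ceil(regs_per_thread * WARP_SIZE / ALLOC_UNIT) * ALLOC_UNIT
    return min(MAX_WARPS_PER_SM, REGS_PER_SM // regs_per_warp)


for regs in (32, 33, 40, 41):
    warps = resident_warps(regs)
    print(f"{regs} regs/thread -> {warps} warps ({warps * WARP_SIZE} threads) per SM")
```

Under these assumptions the step from 32 to 33 registers per thread already drops the number of resident warps, while 33 to 40 changes nothing, which is one way to read the "fit a certain number of registers" remark.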

The Linux driver in the CUDA 7.5 RC is 352.07, which is older than the one I recommend (352.21), which has the power limit features. Otherwise, you can install it easily on all distributions with the .run (tested on Ubuntu 14, Debian 7, Slackware and Fedora 22), and both can be installed at once.

Regarding the "overclocking" functions, I think NVIDIA made a step but didn't really finish the implementation... Power limit values seem to work; application clocks I'm not sure about, except that they change the pstate to P0...

http://yiimp.ccminer.org/

This pool looks promising. Can you please add

sharkcoin(quark),
Digibyte(skein)
Myriadcoin(skein)


I'm actually working on a proper way to mine only the coin set by your address; that's why there is only one coin per algo for the moment...
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
SP_ RELEASE dot 54--
SP_'s reduction of register use in Lyra2 appears to have reduced some of the memory requirement.    I have been able to increase my intensity setting from my old standard of "-i 16.5" to higher values and see hash rate improvement, but the setting varies per machine.  My initial results, all for Lyra2:
  GTX 750ti FTW - 1080-1100kh/s per card  (Linux)
  GTX 750ti SC - 1140-1150kh/s per card (Win 8)
  GTX 960 2GB SSC - 1220-1240kh/s per card (Win 7)
  GTX 960 4GB FTW - 1220-1240kh/s per card (Win 8)
  GTX 970 4GB FTW+ - 2Mh/s per card (Linux)
The windows machines allow for easy software overclocking.  I still need to learn the command line API flags for Linux, and probably need to re-install Linux with the latest drivers for proper use.
If I move to CUDA Toolkit 7.5, will the SP_ releases still compile on Linux?       --scryptr

Since I use half the threads per block compared to djm34's version, you can add 1 to the max intensity used in the old miner and it still runs on the 750ti.
But running it without the intensity parameter should give a performance increase as well. Crypto Mining Blog measured it to be +150 KHASH on the GTX 980 with the default intensity.

Since I am not a registered CUDA developer, I cannot download CUDA 7.5 and test.
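For readers wondering what "adding 1 to the intensity" means in terms of work, here is a small sketch. It assumes that an integer intensity i makes the miner launch about 2^i threads per pass, which is my understanding of how ccminer treats -i; fractional values such as 16.5 are not modelled, and the threads-per-block values are made up for illustration.

```python
def threads_for_intensity(intensity):
    """Assumed mapping: an integer intensity i gives 2**i threads per pass."""
    return 1 << intensity


def blocks_for_launch(intensity, threads_per_block):
    """Number of blocks needed to cover the threads (rounded up)."""
    threads = threads_for_intensity(intensity)
    return (threads + threads_per_block - 1) // threads_per_block


# Hypothetical values: going from -i 17 to -i 18 doubles the threads per pass.
print(threads_for_intensity(17), threads_for_intensity(18))
# With half the threads per block (e.g. 64 instead of 128, values assumed),
# the same intensity needs twice as many blocks.
print(blocks_for_launch(17, 128), blocks_for_launch(17, 64))
```

Each step of 1 doubles the threads per pass under this assumption, so the per-thread memory footprint matters a lot when raising -i.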
legendary
Activity: 1764
Merit: 1024
7x970 - Lyra2 won't start (out of memory)
4GB system memory

-Upgrade to the latest NVidia drivers (22-jun-2015)
-Add 16GB virtual ram

NVIDIA fixed a memory allocation bug in their latest driver.

If it still doesn't work, reduce the intensity,

e.g. -i 17

I understand the whole throw more memory at it thing or reduce intensity, we basically went over the same thing when I was having Neo problems due to out of memory. Is there any reason the system doesn't actually use any memory and it's still getting these messages? The memory usage goes up slightly, but if you look at Resource Monitor, the system is barely using any of the available memory... Pagefile or Hardware.
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
It's written on the main page: Yiimp is not an "autotrade" platform... So, like on other pools, you mine the currency you want with the -right- currency address. I don't want to pay the whole of China, which is using SHA farms, in VTC (or BTC).
The pool is working and pays what is mined... I don't want a second full-time job running an exchange Wink Consider the fees a donation for the new algos... Some are set very high because we are doing "private" tests... You can still mine on those, but it's done to reduce "anonymous" users...

http://yiimp.ccminer.org/

This pool looks promising. Can you please add

sharkcoin(quark),
Digibyte(skein)
Myriadcoin(skein)

sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
7x970 - Lyra2 won't start (out of memory)
4GB system memory

-Upgrade to the latest NVidia drivers (22-jun-2015)
-Add 16GB virtual ram

NVIDIA fixed a memory allocation bug in their latest driver.

If it still doesn't work, reduce the intensity,

e.g. -i 17
legendary
Activity: 1764
Merit: 1024
7x970 - Lyra2 won't start (out of memory)
4GB system memory
7x 970?!  Roll Eyes Mobos are so expensive...
You need at least as much RAM as VRAM here, something like 28 GB, to run that kind of system.



Surely there has to be a workaround. I mean the memory/swap doesn't even seem to be allocated let alone used, not even for a second.
Something like initializing the cards one after the other instead of all at the same time or something? Or giving the cards different jobs instead of working together on one big job? I have no idea but I'm sure there's a way.

Also agree, the same thing was happening to me with Neoscrypt and just had to throw more memory at it even though system memory basically isn't used at all.

If it just uses it to 'load' into the VRAM on the miner, an asynchronous load should help with it (load each card into memory, then into VRAM one at a time). Right now though I don't really see any indication of memory usage on the system.
newbie
Activity: 54
Merit: 0
Release 54

2 Gtx 970 / Nicehash (QUARK still the best payout BTC BTC  Grin )
EVGA 04G-2974-KR GeForce GTX 970 Superclocked 4GB

QUARK (0.014 BTC / day atm)
ccminer.exe -i 22.9 -r 5 -R 10 --cpu-priority 5 -q -a quark -o stratum+tcp://quark.usa.nicehash.com:3345 -u xxxxxxxxxxx -p x
31 350 khash/s

LYRA2 (0.003 BTC / day atm)
ccminer.exe -i 18 -r 5 -R 10 --cpu-priority 5 -q -a lyra2 -o stratum+tcp://quark.usa.nicehash.com:3342 -u xxxxxxxxxxx -p x
3950 khash/s VS 2383 khash/s Release53

QUBIT (0.006 BTC / day atm)
ccminer.exe -i 21 -r 5 -R 10 --cpu-priority 5 -a Qubit -o stratum+tcp://qubit.usa.nicehash.com:3344 -u xxxxxxxxxxx -p x
25 500 khash/s

X11 (0.0087 BTC / day atm)
ccminer.exe -i 21 -r 5 -R 10 --cpu-priority 5 -o stratum+tcp://quark.usa.nicehash.com:3336 -u xxxxxxxxxxx -p x
16 450 khash/s

X13 (0.006135 BTC / day atm)
ccminer.exe -i 19 -r 5 -R 10 --cpu-priority 5 -o stratum+tcp://x13.usa.nicehash.com:3337 -u xxxxxxxxxxx -p x
15 600 khash/s

X15 (0.0071 BTC / day atm)
ccminer.exe -i 21 -r 5 -R 10 --cpu-priority 5 -o stratum+tcp://x15.usa.nicehash.com:3339 -u xxxxxxxxxxx -p x
15 200 khash/s

KECCAK (0.0024 BTC / day atm)
ccminer.exe -i 22.9 -r 5 -R 10 --cpu-priority 5 -q -a Keccak -o stratum+tcp://keccak.usa.nicehash.com:3338 -u xxxxxxxxxxx -p x
878 800 khash/s


Thanks SP, djm34 and all others who help and contribute!!
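The Lyra2 figures in the post above (3950 khash/s on Release 54 vs 2383 khash/s on Release 53, for the two GTX 970s) can be turned into a relative gain. A minimal sketch; the helper name is just for illustration.

```python
def speedup_percent(old_rate, new_rate):
    """Relative improvement of new_rate over old_rate, in percent."""
    return (new_rate / old_rate - 1.0) * 100.0


# Lyra2 on 2x GTX 970, khash/s, as reported in the post above.
gain = speedup_percent(2383, 3950)
print(f"Release 54 vs Release 53 on Lyra2: +{gain:.1f}%")
```

That works out to roughly a two-thirds increase in Lyra2 hash rate on this rig.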
legendary
Activity: 1797
Merit: 1028
SP_ RELEASE dot 54--

SP_'s reduction of register use in Lyra2 appears to have reduced some of the memory requirement.    I have been able to increase my intensity setting from my old standard of "-i 16.5" to higher values and see hash rate improvement, but the setting varies per machine.  My initial results, all for Lyra2:

  GTX 750ti FTW - 1080-1100kh/s per card  (Linux)
  GTX 750ti SC - 1140-1150kh/s per card (Win 8)
  GTX 960 2GB SSC - 1220-1240kh/s per card (Win 7)
  GTX 960 4GB FTW - 1220-1240kh/s per card (Win 8)
  GTX 970 4GB FTW+ - 2Mh/s per card (Linux)

The windows machines allow for easy software overclocking.  I still need to learn the command line API flags for Linux, and probably need to re-install Linux with the latest drivers for proper use.

If I move to CUDA Toolkit 7.5, will the SP_ releases still compile on Linux?       --scryptr
legendary
Activity: 1400
Merit: 1050
7x970 - Lyra2 won't start (out of memory)
4GB system memory
7x 970?!  Roll Eyes Mobos are so expensive...
You need at least as much RAM as VRAM here, something like 28 GB, to run that kind of system.



Surely there has to be a workaround. I mean the memory/swap doesn't even seem to be allocated let alone used, not even for a second.
Something like initializing the cards one after the other instead of all at the same time or something? Or giving the cards different jobs instead of working together on one big job? I have no idea but I'm sure there's a way.
If you open MSI Afterburner and watch both the RAM and pagefile graphs, you'll see it gets allocated (more on the pagefile than in memory), so maybe increasing the pagefile could work.
There isn't really a workaround on the code side: global memory variables have to be allocated from the host, and cudaMalloc works in mysterious ways...
legendary
Activity: 1470
Merit: 1114
7x970 - Lyra2 won't start (out of memory)
4GB system memory
7x 970?!  Roll Eyes Mobos are so expensive...
You need at least as much RAM as VRAM here, something like 28 GB, to run that kind of system.



Surely there has to be a workaround. I mean the memory/swap doesn't even seem to be allocated let alone used, not even for a second.
Something like initializing the cards one after the other instead of all at the same time or something? Or giving the cards different jobs instead of working together on one big job? I have no idea but I'm sure there's a way.

How many cards can you run? It's cheaper to add more RAM than to split the cards into multiple rigs.
legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
7x970 - Lyra2 won't start (out of memory)
4GB system memory
7x 970?!  Roll Eyes Mobos are so expensive...
You need at least as much RAM as VRAM here, something like 28 GB, to run that kind of system.



Surely there has to be a workaround. I mean the memory/swap doesn't even seem to be allocated let alone used, not even for a second.
Something like initializing the cards one after the other instead of all at the same time or something? Or giving the cards different jobs instead of working together on one big job? I have no idea but I'm sure there's a way.
legendary
Activity: 1400
Merit: 1050
7x970 - Lyra2 won't start (out of memory)
4GB system memory
7x 970?!  Roll Eyes Mobos are so expensive...
You need at least as much RAM as VRAM here, something like 28 GB, to run that kind of system.

sr. member
Activity: 271
Merit: 251
7x970 - Lyra2 won't start (out of memory)
4GB system memory