CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 898.

djeZo

hero member

Activity: 588

Merit: 520

Quote from: Grim on October 15, 2015, 04:22:31 PM

Quote from: Genoil on October 15, 2015, 03:15:58 PM

Quote from: Grim on October 15, 2015, 08:13:48 AM

Quote from: Genoil on October 15, 2015, 03:18:40 AM

Apparently with this n-factor, the dataset grows over 1GB...well you get the idea.

Uhm, it seems the 970 suffers from the same slowdown above ~2.1 GB.
That is about double the point for a 750ti (~1.05 GB)...

devtalk gurus agree:
https://devtalk.nvidia.com/default/topic/878455/cuda-programming-and-performance/gtx750ti-and-buffers-gt-1gb-on-win7/post/4696955/#4696955
Grin

wow, that's actually bad news, because it seems memory-heavy algos are the future for GPUs

I experienced the same kind of issue when I was writing the Axiom CUDA algo. I have a 980 Ti, which packs 6 GB of memory, and whenever I set the algo to use more than about 2.5 GB, there was a massive slowdown: bus interface load jumped up and TDP dropped. Since the 980 Ti is my primary GPU, it constantly has a memory load of about 400 MB even when idle - and that would explain why the actual memory cutoff is at around 2.1 GB, same as on other v2 Maxwell cards.

I don't have an account there to post, but try measuring bus interface load during these bottlenecks - maybe it reveals another hint (I used GPU-Z to measure bus interface load).

Bus interface load is - to my knowledge - how heavily the PCIe bus is loaded with data. My algorithm implementation was sending very little data over this bus, certainly not enough to load PCIe 3.0 x16 to the 30-50% it showed. I could not explain why the bus load was so high, googling gave no results, and I kind of gave up. But now that you have shown this slowdown happening with other algorithms and other cards, I suspect these problems are related. My first idea: what if CUDA is automatically syncing GPU and CPU memory, as if some part of GPU memory were set to stay in sync with CPU memory? That would explain the massive bus load, as my algo was causing a lot of changes in this huge allocated buffer. I believe CUDA even has a name for this: Unified Memory. To my knowledge, it is only active when you explicitly enable it. What if it is active even when you do not? Or maybe there is a bug in the CUDA software that sends data over the bus even though there is no need for a synced memory space?

djeZo

hero member

Activity: 588

Merit: 520

Quote from: s7icky on October 15, 2015, 12:16:27 PM

Quote from: djeZo on October 15, 2015, 12:01:26 PM

What speeds do you get on GTX 980 Ti and GTX 950 Lyra2REv2?

I get
GTX 980 Ti ... 17.450 khs
GTX 950 ... 5.480 khs

clocks? OS? build?

around 1400 MHz on both cards, Windows, latest SP... but I tweaked some params; originally I was getting 17khs on the 980 Ti and 5khs on the 950.

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

Quote from: sp_ on October 15, 2015, 01:41:57 AM

Quote from: rednoW on October 14, 2015, 10:00:47 PM

2 sp_
The latest github compile is 20-30khs slower than release 70 on lyra2v2. gtx750, win7x64, cuda6.5 x32 build.

yes, I tested last night. The latest is slower on the 750/750ti. I will fix it.

I have submitted 2 new checkins in the lyra2v2 algo. Can you please test?

Note that the default intensity is low at -X 5; it is probably better with -X 8.

On the 750ti, use -X 10 or 11.

With the latest checkin, the gtx 970 G1 windforce oc is doing 9525KHASH on standard clocks, 10+ MHASH with overclocking.
The asus strix 750ti is doing 4485KHASH with -X 11 on standard clocks and 5MHASH with overclocking.

pallas

legendary

Activity: 2716

Merit: 1094

Black Belt Developer

Quote from: dominuspro on October 16, 2015, 12:24:42 AM

Quote from: rekphiv on October 15, 2015, 09:33:27 PM

Is anyone else having issues with random closes while mining quark with the latest .70?
I have 2 different gtx 970 rigs, windows 10. Both computers will crash within the hour mining at completely stock clocks.
Can someone post a quark bat file so I can see how yours are setup.

TYIA.

I have the same random crashes on 1 of 2 machines. Both running windows7x64. The loop works as a workaround...

same happens to me on linux, so I assume it's a ccminer quark specific issue.

hashbrown9000

sr. member

Activity: 427

Merit: 250

Ok I found the switches in cudaminer help file:

Code:

--launch-config [-l] specify the kernel launch configuration per device.
This replaces autotune or heuristic selection. You can
pass the strings "auto" or just a kernel prefix like
F or K or T to autotune for a specific card generation
or a kernel prefix plus a lauch configuration like F28x8
if you know what kernel runs best (from a previous
autotune).

--lookup-gap [-L] values > 1 enable a tradeoff between memory
savings and extra computation effort, in order to
improve efficiency with high N-factor scrypt-jane
coins. Defaults to 1.
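
For anyone wondering what the lookup gap buys at high N-factor, here is a rough back-of-the-envelope sketch in Python. It assumes the usual scrypt-jane convention N = 2^(Nfactor+1) with r = 1 and a 128*r*N byte scratchpad per hash. `concurrent_hashes` is a made-up stand-in for however many hashes a launch configuration keeps in flight, and the real cudaminer kernels may lay out memory differently.

```python
def scratchpad_bytes(nfactor: int, lookup_gap: int = 1, r: int = 1) -> int:
    """Approximate scratchpad size in bytes for one scrypt-jane hash.

    Assumes N = 2 ** (nfactor + 1) and 128 * r bytes per scratchpad entry.
    With a lookup gap G only every G-th entry is stored; the skipped ones
    are recomputed on demand, trading extra computation for memory.
    """
    if lookup_gap < 1:
        raise ValueError("lookup_gap must be >= 1")
    n = 1 << (nfactor + 1)
    stored_entries = -(-n // lookup_gap)  # ceiling division
    return 128 * r * stored_entries


def dataset_bytes(nfactor: int, concurrent_hashes: int, lookup_gap: int = 1) -> int:
    """Total buffer size when `concurrent_hashes` hashes run at once (hypothetical knob)."""
    return concurrent_hashes * scratchpad_bytes(nfactor, lookup_gap)


if __name__ == "__main__":
    GIB = 1 << 30
    for gap in (1, 2, 4):
        size = dataset_bytes(nfactor=14, concurrent_hashes=256, lookup_gap=gap)
        print(f"nfactor=14, 256 hashes, lookup gap {gap}: {size / GIB:.2f} GiB")
```

Under these assumptions the buffer shrinks roughly in proportion to the lookup gap, which is one way to stay under a size where the driver starts to misbehave, at the cost of recomputing the skipped entries.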

hashbrown9000

sr. member

Activity: 427

Merit: 250

yes i checked help file but i don't see these switches listed.

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

Code:

-d, --devices Comma separated list of CUDA devices to use. \n\
   Device IDs start counting from 0! Alternatively takes\n\
   string names of your cards like gtx780ti or gt640#2\n\
   (matching 2nd gt640 in the PC)\n\
  -i --intensity=N GPU intensity 8-31 (default: auto) \n\
   Decimals are allowed for fine tuning \n\
  -f, --diff Divide difficulty by this factor (std is 1) \n\
  -v, --vote=VOTE block reward vote (for HeavyCoin)\n\
  -m, --trust-pool trust the max block reward vote (maxvote) sent by the pool\n\
  -o, --url=URL URL of mining server\n\
  -O, --userpass=U:P username:password pair for mining server\n\
  -u, --user=USERNAME username for mining server\n\
  -p, --pass=PASSWORD password for mining server\n\
   --cert=FILE certificate for mining server using SSL\n\
  -x, --proxy=[PROTOCOL://]HOST[:PORT] connect through a proxy\n\
  -t, --threads=N number of miner threads (default: number of nVidia GPUs)\n\
  -g, --gputhreads=N number of threads per gpu (default: 1)\n\
  -r, --retries=N number of times to retry if a network call fails\n\
   (default: retry indefinitely)\n\
  -R, --retry-pause=N time to pause between retries, in seconds (default: 30)\n\
   --time-limit maximum time [s] to mine before exiting the program.\n\
  -T, --timeout=N network timeout, in seconds (default: 270)\n\
  -s, --scantime=N upper bound on time spent scanning current work when\n\
   long polling is unavailable, in seconds (default: 5)\n\
  -N, --statsavg number of samples used to display hashrate (default: 30)\n\
   --no-gbt disable getblocktemplate support (height check in solo)\n\
   --no-longpoll disable X-Long-Polling support\n\
   --no-stratum disable X-Stratum support\n\
  -q, --quiet disable per-thread hashmeter output\n\
   --no-color disable colored output\n\
  -D, --debug enable debug output\n\
  -P, --protocol-dump verbose dump of protocol-level activities\n\
   --cpu-affinity set process affinity to cpu core(s), mask 0x3 for cores 0 and 1\n\
   --cpu-priority set process priority (default: 0 idle, 2 normal to 5 highest)\n\
  -b, --api-bind IP/Port for the miner API (default: 127.0.0.1:4068)\n\
  -S, --syslog use system log for output messages\n\
  --syslog - prefix = ... allow to change syslog tool name\n\
   -B, --background run the miner in the background\n\
--benchmark run in offline benchmark mode\n\
   --cputest debug hashes from cpu algorithms\n\
  -c, --config=FILE load a JSON-format configuration file\n\
  -C, --cpu-mining Enable the cpu to aid the gpu. (warning: uses more power)\n\
  -V, --version display version information and exit\n\
  -h, --help display this help text and exit\n\
  -X, --XIntensity intensity GPU intensity(default: auto) \n\
   --broken-neo-wallet Use 84byte data for broken neoscrypt wallets.\n\
";

hashbrown9000

sr. member

Activity: 427

Merit: 250

Anyone know what the long-form entry is for these switches: "-L" , "-l" ? Upper and lowercase - "el". I run ccminer with a .conf file so I need to write out the switch names.

i.e.

Code:

-X

becomes

Code:

"XIntensity": 2,

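The switch-to-key mapping can be sketched in Python as below. This assumes, going only by the -X example, that the .conf file takes the long option name as the JSON key. The table is copied from the help text posted in this thread, and `switches_to_config` is a hypothetical helper, not part of ccminer. -l and -L are left out on purpose, since this build's help text does not list them.

```python
import json

# Short switch -> long option name, copied from the help text posted in this thread.
SHORT_TO_LONG = {
    "-d": "devices",
    "-i": "intensity",
    "-o": "url",
    "-u": "user",
    "-p": "pass",
    "-X": "XIntensity",
}


def _convert(value: str):
    """Store numeric values as JSON numbers, everything else as strings."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def switches_to_config(args: list) -> dict:
    """Turn ['-X', '2', '-o', 'url'] style switch/value pairs into a config dict."""
    if len(args) % 2 != 0:
        raise ValueError("expected switch/value pairs")
    config = {}
    for switch, value in zip(args[0::2], args[1::2]):
        if switch not in SHORT_TO_LONG:
            raise KeyError(f"no long form known for {switch}")
        config[SHORT_TO_LONG[switch]] = _convert(value)
    return config


if __name__ == "__main__":
    example = ["-X", "2", "-o", "stratum+tcp://poolspecificaddress:port",
               "-u", "username.worker1"]
    print(json.dumps(switches_to_config(example), indent=2))
```

A switch that is missing from the table raises a KeyError instead of being guessed, which matches the situation with -l and -L here.
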
Genoil

sr. member

Activity: 438

Merit: 250

Quote from: Grim on October 15, 2015, 04:22:31 PM

Quote from: Genoil on October 15, 2015, 03:15:58 PM

Quote from: Grim on October 15, 2015, 08:13:48 AM

Quote from: Genoil on October 15, 2015, 03:18:40 AM

Apparently with this n-factor, the dataset grows over 1GB...well you get the idea.

Uhm, it seems the 970 suffers from the same slowdown above ~2.1 GB.
That is about double the point for a 750ti (~1.05 GB)...

devtalk gurus agree:
https://devtalk.nvidia.com/default/topic/878455/cuda-programming-and-performance/gtx750ti-and-buffers-gt-1gb-on-win7/post/4696955/#4696955
Grin

wow, that's actually bad news, because it seems memory-heavy algos are the future for GPUs

As this is a (proven) Windows driver issue, it's down to the driver guys to fix it. But something tells me they won't be in a hurry until the game devs start complaining. Too bad GTX cards can't be put in TCC mode...

dominuspro

full member

Activity: 201

Merit: 100

Quote from: rekphiv on October 15, 2015, 09:33:27 PM

Is anyone else having issues with random closes while mining quark with the latest .70?
I have 2 different gtx 970 rigs, windows 10. Both computers will crash within the hour mining at completely stock clocks.
Can someone post a quark bat file so I can see how yours are setup.

TYIA.

I have the same random crashes on 1 of 2 machines. Both running windows7x64. The loop works as a workaround...

Slava_K

hero member

Activity: 677

Merit: 500

Latest commit: speed degradation on lyra2rev2, 750Ti 15kH, 980GTX 20kH.

rekphiv

newbie

Activity: 2

Merit: 0

Thanks, for some reason my Quark file did not have the loop set up. It appears to be working correctly after a crash now. Will test tonight.

ldp5500

full member

Activity: 173

Merit: 100

Quote from: rekphiv on October 15, 2015, 09:33:27 PM

Is anyone else having issues with random closes while mining quark with the latest .70?
I have 2 different gtx 970 rigs, windows 10. Both computers will crash within the hour mining at completely stock clocks.
Can someone post a quark bat file so I can see how yours are setup.

TYIA.

:loop

ccminer -a quark -o stratum+tcp://poolspecificaddress:port -u username.worker1 -p pswd

goto loop

ccminer -a quark -o stratum+tcp://poolspecificaddress:port -u username.worker1 -p pswd

pause
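
For rigs where a batch file is not an option (the crashes are reported on Linux too), the same restart-on-exit workaround can be sketched in Python. This is only an illustration of the loop idea: `run_with_restart` and its parameters are made-up names, and the pool address and credentials are the same placeholders as in the batch file.

```python
import subprocess
import sys
import time

# Placeholder command line, mirroring the batch file; fill in real pool details.
MINER_CMD = ["ccminer", "-a", "quark",
             "-o", "stratum+tcp://poolspecificaddress:port",
             "-u", "username.worker1", "-p", "pswd"]


def run_with_restart(cmd, max_launches=None, pause_seconds=5.0):
    """Run `cmd` and relaunch it whenever it exits, like :loop / goto loop.

    Returns the number of launches. max_launches=None keeps looping forever.
    """
    launches = 0
    while max_launches is None or launches < max_launches:
        launches += 1
        exit_code = subprocess.call(cmd)
        print(f"launch {launches} exited with code {exit_code}", file=sys.stderr)
        # Short pause before the next launch so a broken setup does not spin.
        if max_launches is None or launches < max_launches:
            time.sleep(pause_seconds)
    return launches
```

Calling `run_with_restart(MINER_CMD)` would keep the miner running until the script itself is stopped. The pause between launches is a design choice, not something the batch version has.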

rekphiv

newbie

Activity: 2

Merit: 0

Is anyone else having issues with random closes while mining quark with the latest .70?
I have 2 different gtx 970 rigs, windows 10. Both computers will crash within the hour mining at completely stock clocks.
Can someone post a quark bat file so I can see how yours are setup.

TYIA.

airwalker2662

member

Activity: 116

Merit: 10

sp_ , I have been using your miner for quite some time and check on updates daily. You have made my little mining hobby quite fun and I really respect what you do for the community. Here is some long overdue beer funds:

9f332fa25272960df3147fdc946eed6e11a025df8bff2197260263af8a7b4fd6

Thanks again, and I can't wait to see the presents in 71 Wink

scryptr

legendary

Activity: 1797

Merit: 1028

Quote from: albysprx on October 15, 2015, 04:38:30 PM

Quote from: sp_ on October 15, 2015, 02:40:19 PM

Quote from: albysprx on October 15, 2015, 02:04:19 PM

Quote from: sp_ on October 15, 2015, 12:52:16 PM

Is it a compute 3.5 card?

nvidia geforce gt 540m driver 358.50

This card is not supported..

check here. Compute 5.0 and up is supported. (Maxwell)

https://en.wikipedia.org/wiki/CUDA

your gpu is a compute 2.1 device

Is there an updated gpu miner for older CUDA graphics cards?

KBOMBA--

KBomba wrote the last CCminer version that supported compute 2.1. Look at GitHub.com/Kbomba for release v1.02. --scryptr
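
The support rule quoted above ("Compute 5.0 and up") boils down to a version comparison. A minimal Python sketch, with compute capability values for the cards mentioned in this thread taken from NVIDIA's public tables; `is_supported` is a hypothetical helper, not something ccminer provides.

```python
# Compute capability (major, minor) for cards mentioned in this thread.
COMPUTE_CAPABILITY = {
    "GT 540M": (2, 1),
    "GTX 750 Ti": (5, 0),
    "GTX 950": (5, 2),
    "GTX 970": (5, 2),
    "GTX 980 Ti": (5, 2),
}

MIN_SUPPORTED = (5, 0)  # "Compute 5.0 and up is supported", per sp_


def is_supported(card: str) -> bool:
    """True if the card meets the minimum compute capability (tuple comparison)."""
    return COMPUTE_CAPABILITY[card] >= MIN_SUPPORTED


if __name__ == "__main__":
    for card in COMPUTE_CAPABILITY:
        status = "supported" if is_supported(card) else "not supported"
        print(f"{card}: {status}")
```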

cinnamon_carter

legendary

Activity: 1148

Merit: 1018

It's about time -- All merit accepted !!!

As RAM becomes less expensive, memory-intensive algos are likely to become more popular. However, from a cryptographic and mathematical standpoint, security is not necessarily better just because something needs more RAM.

antantti

legendary

Activity: 1176

Merit: 1015

Skål!

542ad358908973837664d71eb04bc134fd51cf33593c7b9811c42dfdd1bd2d89

albysprx

sr. member

Activity: 247

Merit: 250

Quote from: sp_ on October 15, 2015, 02:40:19 PM

Quote from: albysprx on October 15, 2015, 02:04:19 PM

Quote from: sp_ on October 15, 2015, 12:52:16 PM

Is it a compute 3.5 card?

nvidia geforce gt 540m driver 358.50

This card is not supported..

check here. Compute 5.0 and up is supported. (Maxwell)

https://en.wikipedia.org/wiki/CUDA

your gpu is a compute 2.1 device

Is there an updated gpu miner for older CUDA graphics cards?

Grim

sr. member

Activity: 506

Merit: 252

Quote from: Genoil on October 15, 2015, 03:15:58 PM

Quote from: Grim on October 15, 2015, 08:13:48 AM

Quote from: Genoil on October 15, 2015, 03:18:40 AM

Apparently with this n-factor, the dataset grows over 1GB...well you get the idea.

Uhm, it seems the 970 suffers from the same slowdown above ~2.1 GB.
That is about double the point for a 750ti (~1.05 GB)...

devtalk gurus agree:
https://devtalk.nvidia.com/default/topic/878455/cuda-programming-and-performance/gtx750ti-and-buffers-gt-1gb-on-win7/post/4696955/#4696955
Grin

wow, that's actually bad news, because it seems memory-heavy algos are the future for GPUs

Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 898. (Read 2347659 times)