
Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 898. (Read 2347659 times)

hero member
Activity: 588
Merit: 520
Apparently with this n-factor, the dataset grows over 1GB...well you get the idea.

Hm, it seems the 970 suffers from the same slowdown above ~2.1 GB,
at about double the point of the 750 Ti (~1.05 GB)...

The devtalk gurus agree:
https://devtalk.nvidia.com/default/topic/878455/cuda-programming-and-performance/gtx750ti-and-buffers-gt-1gb-on-win7/post/4696955/#4696955
 Grin


Wow, that's actually bad news, because it seems memory-heavy algos are the future for GPUs.

I was experiencing the same kind of issue when I was working on the Axiom CUDA algo. On a 980 Ti, which packs 6 GB of memory, whenever I set the algo to use more than about 2.5 GB there was a massive slowdown: bus interface load jumped up and TDP dropped. Since the 980 Ti is my primary GPU, it constantly has a memory load of about 400 MB even when idle - which would explain why the actual cutoff sits at around 2.1 GB, same as the other v2 Maxwell cards.

I don't have an account there to post, but measure the bus interface load during these bottlenecks - maybe it can reveal another hint for tracking this down (I used GPU-Z to measure bus interface load).

Bus interface load is - to my knowledge - how heavily the PCIe bus is loaded with data. My algorithm implementation was sending very little data over this bus - nothing that should load a PCIe 3.0 x16 link so heavily that it shows 30-50% utilization. I could not explain why the bus load was so high, googling gave no results, and I kind of gave up. But now that you have revealed this slowdown happening with other algorithms and other cards, I suspect these problems are related. My first idea: what if CUDA is automatically syncing GPU and CPU memory - as if some part of GPU memory were set to stay in sync with CPU memory? That would explain the massive bus load, since my algo was making a lot of changes in this large allocated buffer. I believe CUDA even has a name for this - unified memory. To my knowledge it is only active when you explicitly request it. What if it is active even in cases when you do not explicitly request it? Or maybe it's a bug in the CUDA software - sending data over the bus even though there is no synced memory space that needs it?
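The sync suspicion is at least numerically plausible: if even a fraction of a multi-GB buffer were being copied back over the bus on every pass, you would land in the observed 30-50% range. A rough back-of-envelope in Python - every input here (buffer size, dirty fraction, pass rate) is a made-up illustrative figure, not a measurement from any of the posts above:

```python
# Back-of-envelope: could implicit host<->device syncing explain a
# 30-50% PCIe bus load? All inputs below are illustrative assumptions.

PCIE3_X16_GBPS = 15.75   # theoretical PCIe 3.0 x16 bandwidth in GB/s
BUFFER_GB = 2.5          # hypothetical GPU working-set size
DIRTY_FRACTION = 0.10    # fraction of the buffer dirtied per kernel pass
PASSES_PER_SEC = 20      # hypothetical kernel launch rate

# Traffic generated if every dirtied byte had to cross the bus each pass.
traffic_gbps = BUFFER_GB * DIRTY_FRACTION * PASSES_PER_SEC
bus_load_pct = 100.0 * traffic_gbps / PCIE3_X16_GBPS
print(f"implied PCIe traffic: {traffic_gbps:.1f} GB/s "
      f"(~{bus_load_pct:.0f}% of PCIe 3.0 x16)")
```

Even a modest dirty fraction gets into the range GPU-Z was showing, so silent host syncing is consistent with the readings, though hardly proven by them.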
hero member
Activity: 588
Merit: 520
What speeds do you get on the GTX 980 Ti and GTX 950 with Lyra2REv2?

I get
GTX 980 Ti ... 17,450 kH/s
GTX 950 ... 5,480 kH/s

clocks? OS? build?


Around 1400 MHz on both cards, Windows, latest SP build... but I tweaked some params; originally I was getting 17 MH/s on the 980 Ti and 5 MH/s on the 950.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
Latest github compile is 20-30 kH/s slower than release 70 on lyra2v2. GTX 750, Win7 x64, CUDA 6.5 x32 build.
Yes, I tested last night. The latest is slower on the 750/750 Ti. I will fix it.

I have submitted two new checkins in the lyra2v2 algo. Can you please test?

Note that the default intensity is low at -X 5; -X 8 is probably better.

On the 750 Ti, try -X 10 or 11.

With the latest checkin the GTX 970 G1 Windforce OC is doing 9525 kH/s at stock clocks, 10+ MH/s with overclocking.
The Asus Strix 750 Ti is doing 4485 kH/s with -X 11 at stock clocks and 5 MH/s with overclocking.

legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Is anyone else having issues with random closes while mining quark with the latest .70?
I have 2 different GTX 970 rigs on Windows 10. Both computers will crash within the hour, mining at completely stock clocks.
Can someone post a quark bat file so I can see how yours are set up.

TYIA.



I have the same random crashes on 1 of 2 machines. Both running Windows 7 x64. The loop works as a workaround...

The same happens to me on Linux, so I assume it's a ccminer quark-specific issue.
sr. member
Activity: 427
Merit: 250
OK, I found the switches in the cudaminer help file:

Code:
--launch-config  [-l] specify the kernel launch configuration per device.
                 This replaces autotune or heuristic selection. You can
                 pass the strings "auto" or just a kernel prefix like
                 F or K or T to autotune for a specific card generation
                 or a kernel prefix plus a launch configuration like F28x8
                 if you know what kernel runs best (from a previous
                 autotune).

--lookup-gap     [-L] values > 1 enable a tradeoff between memory
                 savings and extra computation effort, in order to
                 improve efficiency with high N-factor scrypt-jane
                 coins. Defaults to 1.
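The --lookup-gap tradeoff described in that help text can be illustrated with a toy scrypt-style mixing function (a sketch, not cudaminer's actual kernel): with gap g, only every g-th scratchpad element is stored, and the rest are recomputed on demand, cutting memory by roughly a factor of g at the cost of extra hashing.

```python
import hashlib

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def romix(seed: bytes, N: int, rounds: int, gap: int = 1) -> bytes:
    """Toy scrypt-like ROMix with a lookup-gap time/memory tradeoff."""
    x = H(seed)  # normalize input to digest size

    # Fill phase: V[i] = H^(i+1)(seed), but store only every gap-th element.
    stored = {}
    for i in range(N):
        if i % gap == 0:
            stored[i] = x
        x = H(x)

    def lookup(j: int) -> bytes:
        # Recompute V[j] from the nearest stored checkpoint below it.
        base = (j // gap) * gap
        v = stored[base]
        for _ in range(j - base):
            v = H(v)
        return v

    # Mix phase: data-dependent lookups into V, as in scrypt's ROMix.
    for _ in range(rounds):
        j = int.from_bytes(x[:4], "little") % N
        x = H(bytes(a ^ b for a, b in zip(x, lookup(j))))
    return x
```

Any gap produces the same final hash; only the memory footprint and the amount of recomputation change, which is why -L > 1 helps on cards that hit the buffer-size slowdowns discussed above.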
sr. member
Activity: 427
Merit: 250
Yes, I checked the help file but I don't see these switches listed.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
Code:
 -d, --devices         Comma separated list of CUDA devices to use. \n\
                        Device IDs start counting from 0! Alternatively takes\n\
                        string names of your cards like gtx780ti or gt640#2\n\
                        (matching 2nd gt640 in the PC)\n\
  -i, --intensity=N     GPU intensity 8-31 (default: auto) \n\
                        Decimals are allowed for fine tuning \n\
  -f, --diff            Divide difficulty by this factor (std is 1) \n\
  -v, --vote=VOTE       block reward vote (for HeavyCoin)\n\
  -m, --trust-pool      trust the max block reward vote (maxvote) sent by the pool\n\
  -o, --url=URL         URL of mining server\n\
  -O, --userpass=U:P    username:password pair for mining server\n\
  -u, --user=USERNAME   username for mining server\n\
  -p, --pass=PASSWORD   password for mining server\n\
      --cert=FILE       certificate for mining server using SSL\n\
  -x, --proxy=[PROTOCOL://]HOST[:PORT]  connect through a proxy\n\
  -t, --threads=N       number of miner threads (default: number of nVidia GPUs)\n\
  -g, --gputhreads=N    number of threads per gpu (default: 1)\n\
  -r, --retries=N       number of times to retry if a network call fails\n\
                          (default: retry indefinitely)\n\
  -R, --retry-pause=N   time to pause between retries, in seconds (default: 30)\n\
      --time-limit      maximum time [s] to mine before exiting the program.\n\
  -T, --timeout=N       network timeout, in seconds (default: 270)\n\
  -s, --scantime=N      upper bound on time spent scanning current work when\n\
                          long polling is unavailable, in seconds (default: 5)\n\
  -N, --statsavg        number of samples used to display hashrate (default: 30)\n\
      --no-gbt          disable getblocktemplate support (height check in solo)\n\
      --no-longpoll     disable X-Long-Polling support\n\
      --no-stratum      disable X-Stratum support\n\
  -q, --quiet           disable per-thread hashmeter output\n\
      --no-color        disable colored output\n\
  -D, --debug           enable debug output\n\
  -P, --protocol-dump   verbose dump of protocol-level activities\n\
      --cpu-affinity    set process affinity to cpu core(s), mask 0x3 for cores 0 and 1\n\
      --cpu-priority    set process priority (default: 0 idle, 2 normal to 5 highest)\n\
  -b, --api-bind        IP/Port for the miner API (default: 127.0.0.1:4068)\n\
  -S, --syslog          use system log for output messages\n\
      --syslog-prefix=... allow to change syslog tool name\n\
  -B, --background      run the miner in the background\n\
      --benchmark       run in offline benchmark mode\n\
      --cputest         debug hashes from cpu algorithms\n\
  -c, --config=FILE     load a JSON-format configuration file\n\
  -C, --cpu-mining      Enable the cpu to aid the gpu. (warning: uses more power)\n\
  -V, --version         display version information and exit\n\
  -h, --help            display this help text and exit\n\
  -X, --XIntensity      GPU intensity (default: auto) \n\
      --broken-neo-wallet Use 84byte data for broken neoscrypt wallets.\n\
";
sr. member
Activity: 427
Merit: 250
Anyone know what the long-form entries are for these switches: "-L" and "-l"? Upper- and lowercase "el". I run ccminer with a .conf file, so I need to write out the full switch names.

i.e.

Code:
-X

becomes

Code:
"XIntensity": 2,
sr. member
Activity: 438
Merit: 250
Apparently with this n-factor, the dataset grows over 1GB...well you get the idea.

Hm, it seems the 970 suffers from the same slowdown above ~2.1 GB,
at about double the point of the 750 Ti (~1.05 GB)...

The devtalk gurus agree:
https://devtalk.nvidia.com/default/topic/878455/cuda-programming-and-performance/gtx750ti-and-buffers-gt-1gb-on-win7/post/4696955/#4696955
 Grin


Wow, that's actually bad news, because it seems memory-heavy algos are the future for GPUs.

As this is a (proven) Windows driver issue, it's down to the driver guys to fix it. But something tells me they won't be in a hurry until the game devs start complaining Smiley. Too bad GTX cards can't be put in TCC mode...
full member
Activity: 201
Merit: 100
Is anyone else having issues with random closes while mining quark with the latest .70?
I have 2 different GTX 970 rigs on Windows 10. Both computers will crash within the hour, mining at completely stock clocks.
Can someone post a quark bat file so I can see how yours are set up.

TYIA.



I have the same random crashes on 1 of 2 machines. Both running Windows 7 x64. The loop works as a workaround...
hero member
Activity: 677
Merit: 500
Latest commit: speed degrade on lyra2REv2 - 15 kH/s on the 750 Ti, 20 kH/s on the 980 GTX.
newbie
Activity: 2
Merit: 0
Thanks, for some reason my quark file did not have the loop set up. It appears to be working correctly after a crash now; I will test tonight.
full member
Activity: 173
Merit: 100
Is anyone else having issues with random closes while mining quark with the latest .70?
I have 2 different GTX 970 rigs on Windows 10. Both computers will crash within the hour, mining at completely stock clocks.
Can someone post a quark bat file so I can see how yours are set up.

TYIA.



:loop

ccminer -a quark -o stratum+tcp://poolspecificaddress:port -u username.worker1 -p pswd

goto loop
newbie
Activity: 2
Merit: 0
Is anyone else having issues with random closes while mining quark with the latest .70?
I have 2 different GTX 970 rigs on Windows 10. Both computers will crash within the hour, mining at completely stock clocks.
Can someone post a quark bat file so I can see how yours are set up.

TYIA.

member
Activity: 116
Merit: 10
sp_, I have been using your miner for quite some time and check for updates daily. You have made my little mining hobby quite fun, and I really respect what you do for the community. Here are some long-overdue beer funds:

9f332fa25272960df3147fdc946eed6e11a025df8bff2197260263af8a7b4fd6

Thanks again, and I can't wait to see the presents in .71 Wink

legendary
Activity: 1797
Merit: 1028
Is it a compute 3.5 card?
nvidia geforce gt 540m driver 358.50

This card is not supported.

Check here; compute 5.0 and up (Maxwell) is supported.

https://en.wikipedia.org/wiki/CUDA

Your GPU is a compute 2.1 device.
Is there an updated GPU miner for old CUDA graphics cards?

KBOMBA--

KBomba wrote the last CCminer version that supported compute 2.1.  Look at GitHub.com/Kbomba for release v1.02.       --scryptr
legendary
Activity: 1148
Merit: 1018
It's about time -- all merit accepted!
As RAM becomes less expensive, memory-intensive algos are likely to become more popular. However, from a foundation of cryptography and mathematics, the security is not necessarily better just because something needs more RAM.
legendary
Activity: 1176
Merit: 1015
Skål!

542ad358908973837664d71eb04bc134fd51cf33593c7b9811c42dfdd1bd2d89
sr. member
Activity: 247
Merit: 250
Is it a compute 3.5 card?
nvidia geforce gt 540m driver 358.50

This card is not supported.

Check here; compute 5.0 and up (Maxwell) is supported.

https://en.wikipedia.org/wiki/CUDA

Your GPU is a compute 2.1 device.
Is there an updated GPU miner for old CUDA graphics cards?
sr. member
Activity: 506
Merit: 252
Apparently with this n-factor, the dataset grows over 1GB...well you get the idea.

Hm, it seems the 970 suffers from the same slowdown above ~2.1 GB,
at about double the point of the 750 Ti (~1.05 GB)...

The devtalk gurus agree:
https://devtalk.nvidia.com/default/topic/878455/cuda-programming-and-performance/gtx750ti-and-buffers-gt-1gb-on-win7/post/4696955/#4696955
 Grin


Wow, that's actually bad news, because it seems memory-heavy algos are the future for GPUs.