Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 1013. (Read 2347664 times)

Quote from: scryptr on August 07, 2015, 04:28:50 PM

Yesterday I submitted a tiny speedup in x11. 2-5 KHASH on the 750ti. (simd) I think there are some more easy pickings here. might try to find them this weekend.

Don't take this the wrong way, sp_, as I think you do really good work - but you might want to establish a margin of error. I wouldn't be surprised if you reverted it later with a commit message saying it seems to have no effect - but it sure looked like it at 3AM after having been coding for 12h or so. :P

I've totally done that - and assuming that the max difference caused was 5Kh/s, and the usual 750Ti speed on X11 is 3Mh/s with your mods (do correct me if I'm mistaken; while I pay attention to the developments, the exact numbers I no longer watch as closely since I stopped working with CUDA myself), the percentage difference on that would be 0.000166%. My personal opinion on a VERY lax margin of error would probably be at least around 0.05% - 0.10%.
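As a quick sanity check on the arithmetic above, here is a minimal C sketch that computes the relative gain from the figures quoted in the post (5 Kh/s gained on roughly 3 Mh/s). The function name `relative_gain_percent` is made up for illustration. With those inputs the ratio comes out near 0.17%, so the 0.000166% figure looks like a slipped decimal and is worth double-checking before comparing it against any margin of error.

```c
#include <assert.h>

/* Hypothetical helper: relative gain in percent, given a gain and a
 * baseline expressed in the same unit (here: hashes per second). */
double relative_gain_percent(double gain_hs, double baseline_hs)
{
    assert(baseline_hs > 0.0); /* a zero baseline has no meaningful ratio */
    return 100.0 * gain_hs / baseline_hs;
}
```

Calling `relative_gain_percent(5e3, 3e6)` evaluates 100 * 5000 / 3000000, which is about 0.167.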

EVERY LITTLE BIT COUNTS--

The sum of many small changes will creep over the statistical barrier. --scryptr

Correct - assuming it's not something you've hallucinated, or the result of a single clock tick propagating across a certain circuit in the GPU somewhat faster right when you check hash, or some other freak accident...
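One way to make the "margin of error" concrete is to take several hashrate readings before and after a change and compare the difference in means against the spread of the readings. The sketch below is only an assumption about how one might do that; it is not something ccminer provides, and the function names and the threshold factor `k` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples (n must be > 0). */
double mean(const double *x, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Unbiased sample variance of n samples (n must be > 1). */
double sample_variance(const double *x, size_t n)
{
    assert(n > 1);
    double m = mean(x, n);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (x[i] - m) * (x[i] - m);
    return acc / (double)(n - 1);
}

/* Returns 1 if mean(after) - mean(before) is larger than k standard
 * errors of that difference, 0 otherwise. Squares are compared so that
 * no sqrt() call is needed. */
int gain_exceeds_noise(const double *before, size_t nb,
                       const double *after, size_t na, double k)
{
    double diff = mean(after, na) - mean(before, nb);
    if (diff <= 0.0)
        return 0; /* no gain at all */
    double se2 = sample_variance(before, nb) / (double)nb
               + sample_variance(after, na) / (double)na;
    return diff * diff > k * k * se2;
}
```

With a handful of readings per build and something like `k = 2.0`, a gain that only shows up in one lucky reading would not pass, while a consistent shift across all readings would.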

CODE REVISION--
Thanks for the nod. If code is reduced to fewer instructions and loops, with precalculations and assembly code in place of bulkier, higher level code, it will run faster. A shorter path to the same end.
Lately, I don't compile every increment submitted, but have seen progress mining Quark on my cards over the last 2 months. It trends...

--scryptr

Exactly. Removing a few assembly instructions may give a speed increase too small to measure, but if you do it 200 times, the increase will be visible.

The last SIMD improvement is something the compiler should have optimized away on its own, but it didn't.

You have a buffer that is set to zero:

Code:

Somebuffer[x] = 0;

Further down you have a loop that uses this buffer in a multiplication and does some other stuff:

Somebuffer[x] = Somebuffer[x] * constmemory ^ blablabla

In my changeset I just removed some of the instructions that operated on those zero values, and the saving was around 40 assembly instructions.
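The idea can be sketched in plain C. This is not the actual x11 SIMD kernel code, which isn't shown in the thread; the buffer length `N` and the names `update_naive`, `update_simplified`, `k` and `mix` are hypothetical stand-ins for `Somebuffer`, `constmemory` and `blablabla`. Since the buffer holds zero when the loop starts, the multiply contributes nothing and the result is just the XOR operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 8 /* hypothetical buffer length */

/* Straightforward version: clear the buffer, then run the update loop.
 * In C, '*' binds tighter than '^', so each step is (buf[i] * k) ^ mix[i]. */
void update_naive(uint32_t buf[N], uint32_t k, const uint32_t mix[N])
{
    for (size_t i = 0; i < N; i++)
        buf[i] = 0;
    for (size_t i = 0; i < N; i++)
        buf[i] = (buf[i] * k) ^ mix[i];
}

/* Hand-simplified version: buf[i] is known to be 0 on entry to the loop,
 * so (0 * k) ^ mix[i] == mix[i]; the clear and the multiply are dropped. */
void update_simplified(uint32_t buf[N], uint32_t k, const uint32_t mix[N])
{
    (void)k; /* no longer needed once the multiply by zero is removed */
    for (size_t i = 0; i < N; i++)
        buf[i] = mix[i];
}
```

In a toy case like this a compiler would normally do the folding itself; the point made in the post is that in the real kernel it apparently did not, so the dead work had to be removed by hand.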

lawrencelyl

member

Activity: 94

Merit: 10

Once in a while I get the following error message after running ccminer for a while:
Cuda error in func 'cuda_check_cpu_setTarget' at line 28 : the launch timed out and was terminated.

Anyone have any idea what might be the issue? Thank you.

scryptr

legendary

Activity: 1797

Merit: 1028

Quote from: scryptr on August 07, 2015, 04:28:50 PM

Quote from: dominuspro on August 07, 2015, 04:31:24 AM

Yesterday I submitted a tiny speedup in x11. 2-5 KHASH on the 750ti. (simd) I think there are some more easy pickings here. might try to find them this weekend.

Don't take this the wrong way, sp_, as I think you do really good work - but you might want to establish a margin of error. I wouldn't be surprised if you reverted it later with a commit message saying it seems to have no effect - but it sure looked like it at 3AM after having been coding for 12h or so. :P

I've totally done that - and assuming that the max difference caused was 5Kh/s, and the usual 750Ti speed on X11 is 3Mh/s with your mods (do correct me if I'm mistaken; while I pay attention to the developments, the exact numbers I no longer watch as closely since I stopped working with CUDA myself), the percentage difference on that would be 0.000166%. My personal opinion on a VERY lax margin of error would probably be at least around 0.05% - 0.10%.

EVERY LITTLE BIT COUNTS--

The sum of many small changes will creep over the statistical barrier. --scryptr

Correct - assuming it's not something you've hallucinated, or the result of a single clock tick propagating across a certain circuit in the GPU somewhat faster right when you check hash, or some other freak accident...

CODE REVISION--

Thanks for the nod. If code is reduced to fewer instructions and loops, with precalculations and assembly code in place of bulkier, higher level code, it will run faster. A shorter path to the same end.

Lately, I don't compile every increment submitted, but have seen progress mining Quark on my cards over the last 2 months. It trends...

--scryptr

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: djm34 on August 07, 2015, 03:31:49 AM

Quote from: bathrobehero on August 06, 2015, 10:37:09 PM

Quote from: scryptr on August 06, 2015, 09:41:29 PM

RESTORING DRIVER--

I use PrecisionX 16 to restore the driver without rebooting the system. These are my steps for a card on Windows that has low hash (crashed driver):

   1) Start or re-open PrecisionX 16.
   2) Turn off K-Boost with the toggle switch (upper right-hand corner).
   3) Turn on K-Boost with the same toggle switch.
   4) Re-select the boost profile that you prefer (important!).
   5) Verify that the fan profile and boost settings are again in place, and that temperature is appropriate.
   6) Close or minimize PrecisionX 16.
   7) Restart miner, it should again have appropriate hash readings for boost/overclock settings.

You may also have to open nVidia control panel and reset the display resolution if your graphics now "look odd". It isn't every time, but frequently I have to reset display resolution for normal graphics. This is for my work computer, Win 7 X64, with a GTX 960 that I mine with when not playing games. The GTX 960 will get 10.6Mh/s on Quark, but if it crashes with a segmentation fault, it will only get 3Mh/s on miner restart. I then perform the steps above and restart the miner.

There is a memory leak somewhere, but I was suspecting poorly programmed flash-media websites, like my local news site. I need to reboot about once a day because of increasing memory bloat. --scryptr

I'm using an ancient ccminer so I'm not sure about the issue, but it does sound like a simple soft crash to me, where the card reverts to a lower P-state at 405 MHz. That is how my cards crash if I have too high an OC on them, set too high an intensity, or accidentally mine on the same card with 2 instances. A memory leak would imply a leak of some sort causing excess memory usage and/or slow performance degradation over time.

I hadn't checked the p state when my card degrades but I think you're right about that. I don't know anything about
a "soft crash". When I set too high intensity ccminer errors out with an out of memory error. When I start two instances
on the same card they each hash at lower rates but the card doesn't crash and never gets stuck in a degraded state. I have also seen
driver crashes due to too high OC where I lose the display for a few seconds. If the degradation is the result of some sort of soft crash
or exception why leave the gpu degraded? Why not reset automatically like it does for a hard crash? (rhetorical questions,
I don't expect an answer)

and how do you do that exactly ?

I have a different solution if the driver crashes and sets the gpu to some lower state. I just go into the device manager and disable the problematic gpu and re-enable it.
It restores the default GPU state. Then you just have to click the profile in MSI Afterburner and you have it back.

I had another degradation on my Fedora20/GTX970 mining neoscrypt and took some notes.

- The performance level was still at 2 and the GPU clock was still around 1500 (+120 OC)
- gpu utilization was 100%
- mem utilization 69%, normally 41% (of 4 GB)
- hash rate was 61K, normally 540K

This eliminates a memory leak, at least to the point of mem exhaustion.
The GPU was pegged but only hashing at 1/9 normal, interesting that it is
almost an exact multiple.

Was it only using 1/9 of the cuda cores? If so what were the other cores doing?
Maybe a runaway process, a crash seems unlikely.

Could there be a problem with the way cuda processes are killed when the host process
dies leaving an orphan still running?

nvidia-smi has a reset command and a process monitoring command but I'm not sure
if they work on lowly geforce cards. I'll try that next time.

djm34

legendary

Activity: 1400

Merit: 1050

Quote from: pallas on August 07, 2015, 09:58:21 AM

Yesterday I submitted a tiny speedup in x11. 2-5 KHASH on the 750ti. (simd) I think there are some more easy pickings here. might try to find them this weekend.

Don't take this the wrong way, sp_, as I think you do really good work - but you might want to establish a margin of error. I wouldn't be surprised if you reverted it later with a commit message saying it seems to have no effect - but it sure looked like it at 3AM after having been coding for 12h or so. :P

I've totally done that - and assuming that the max difference caused was 5Kh/s, and the usual 750Ti speed on X11 is 3Mh/s with your mods (do correct me if I'm mistaken; while I pay attention to the developments, the exact numbers I no longer watch as closely since I stopped working with CUDA myself), the percentage difference on that would be 0.000166%. My personal opinion on a VERY lax margin of error would probably be at least around 0.05% - 0.10%.

+100%

djm34

legendary

Activity: 1400

Merit: 1050

Quote from: joblo on August 07, 2015, 09:53:53 AM

In the continuing story of cuda 6.5 vs cuda 7 I tried using the Fedora 20 version of cuda 6.5
on Fedora 22 but it didn't work. The problem is with the gnu compiler on f22. The cuda library
headers fail to compile producing lots of syntax errors. I gave up.

It looks like the best option for Linux users is to use an LTS release like Centos 6 to have a supported
OS that also supports cuda 6.5.

on ubuntu you can have more than one gcc version installed, I'm sure you can do the same on fedora.
that's what I do to compile with cuda 6.5 which requires gcc 4.7, while the system's default is 4.9

actually it is also possible to compile cuda with "--override" to use whatever gcc version there is on the system
(always had bad experience when trying to install different version of gcc)

scryptr

legendary

Activity: 1797

Merit: 1028

Yesterday I submitted a tiny speedup in x11. 2-5 KHASH on the 750ti. (simd) I think there are some more easy pickings here. might try to find them this weekend.

Don't take this the wrong way, sp_, as I think you do really good work - but you might want to establish a margin of error. I wouldn't be surprised if you reverted it later with a commit message saying it seems to have no effect - but it sure looked like it at 3AM after having been coding for 12h or so. :P

I've totally done that - and assuming that the max difference caused was 5Kh/s, and the usual 750Ti speed on X11 is 3Mh/s with your mods (do correct me if I'm mistaken; while I pay attention to the developments, the exact numbers I no longer watch as closely since I stopped working with CUDA myself), the percentage difference on that would be 0.000166%. My personal opinion on a VERY lax margin of error would probably be at least around 0.05% - 0.10%.

EVERY LITTLE BIT COUNTS--

The sum of many small changes will creep over the statistical barrier. --scryptr

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: joblo on August 07, 2015, 07:29:24 AM

Just to be clear, I wasn't expecting ccminer to do it, just lamenting the poor handling of the fault by NVIDIA.

That's not poorly handling a fault - now AMD... they know how to poorly handle a fault. Not only do cards have degraded hash after a driver crash, quite often I have seen the entire OpenCL portion of shit in the driver just refuse to service calls. I mean, SGMiner won't work, CodeXL can't compile OpenCL for analysis... the calls just hang. They will do this until reboot - the instance of SGMiner responsible for the driver crash simply hangs as well. And I mean HANGS - kill -9 does fuck all for it. :D

LOL. I guess I'm just used to more sophisticated fault handling that includes fault detection, identification,
isolation, mitigation, recovery, post-analysis and reporting. But then again I'm not talking about consumer
grade systems but a fully redundant mission critical (non military) control system. I need to reset my expectations.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: pallas on August 07, 2015, 09:58:21 AM

Quote from: joblo on August 07, 2015, 09:53:53 AM

In the continuing story of cuda 6.5 vs cuda 7 I tried using the Fedora 20 version of cuda 6.5
on Fedora 22 but it didn't work. The problem is with the gnu compiler on f22. The cuda library
headers fail to compile producing lots of syntax errors. I gave up.

It looks like the best option for Linux users is to use an LTS release like Centos 6 to have a supported
OS that also supports cuda 6.5.

on ubuntu you can have more than one gcc version installed, I'm sure you can do the same on fedora.
that's what I do to compile with cuda 6.5 which requires gcc 4.7, while the system's default is 4.9

My only choices in the Fedora repo are 4.9 and 5.1. It's not urgent for me as I can live with a stale
Fedora 20 for a while. Hopefully by then the cuda 7, 7.5 issues will be solved.

pallas

legendary

Activity: 2716

Merit: 1094

Black Belt Developer

Quote from: joblo on August 07, 2015, 09:53:53 AM

In the continuing story of cuda 6.5 vs cuda 7 I tried using the Fedora 20 version of cuda 6.5
on Fedora 22 but it didn't work. The problem is with the gnu compiler on f22. The cuda library
headers fail to compile producing lots of syntax errors. I gave up.

It looks like the best option for Linux users is to use an LTS release like Centos 6 to have a supported
OS that also supports cuda 6.5.

on ubuntu you can have more than one gcc version installed, I'm sure you can do the same on fedora.
that's what I do to compile with cuda 6.5 which requires gcc 4.7, while the system's default is 4.9

joblo

legendary

Activity: 1470

Merit: 1114

In the continuing story of cuda 6.5 vs cuda 7 I tried using the Fedora 20 version of cuda 6.5
on Fedora 22 but it didn't work. The problem is with the gnu compiler on f22. The cuda library
headers fail to compile producing lots of syntax errors. I gave up.

It looks like the best option for Linux users is to use an LTS release like Centos 6 to have a supported
OS that also supports cuda 6.5.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: djm34 on August 07, 2015, 03:31:49 AM

Quote from: bathrobehero on August 06, 2015, 10:37:09 PM

Quote from: scryptr on August 06, 2015, 09:41:29 PM

RESTORING DRIVER--

I use PrecisionX 16 to restore the driver without rebooting the system. These are my steps for a card on Windows that has low hash (crashed driver):

1) Start or re-open PrecisionX 16.
2) Turn off K-Boost with the toggle switch (upper right-hand corner).
3) Turn on K-Boost with the same toggle switch.
4) Re-select the boost profile that you prefer (important!).
5) Verify that the fan profile and boost settings are again in place, and that temperature is appropriate.
6) Close or minimize PrecisionX 16.
7) Restart miner, it should again have appropriate hash readings for boost/overclock settings.

You may also have to open nVidia control panel and reset the display resolution if your graphics now "look odd". It isn't every time, but frequently I have to reset display resolution for normal graphics. This is for my work computer, Win 7 X64, with a GTX 960 that I mine with when not playing games. The GTX 960 will get 10.6Mh/s on Quark, but if it crashes with a segmentation fault, it will only get 3Mh/s on miner restart. I then perform the steps above and restart the miner.

There is a memory leak somewhere, but I was suspecting poorly programmed flash-media websites, like my local news site. I need to reboot about once a day because of increasing memory bloat. --scryptr

I'm using an ancient ccminer so I'm not sure about the issue, but it does sound like a simple soft crash to me, where the card reverts to a lower P-state at 405 MHz. That is how my cards crash if I have too high an OC on them, set too high an intensity, or accidentally mine on the same card with 2 instances. A memory leak would imply a leak of some sort causing excess memory usage and/or slow performance degradation over time.

I hadn't checked the p state when my card degrades but I think you're right about that. I don't know anything about
a "soft crash". When I set too high intensity ccminer errors out with an out of memory error. When I start two instances
on the same card they each hash at lower rates but the card doesn't crash and never gets stuck in a degraded state. I have also seen
driver crashes due to too high OC where I lose the display for a few seconds. If the degradation is the result of some sort of soft crash
or exception why leave the gpu degraded? Why not reset automatically like it does for a hard crash? (rhetorical questions,
I don't expect an answer)

and how do you do that exactly ?

Exactly, I don't know. But I would assume the same way the card recovers automatically from a hard crash.
Whatever triggers the card to go into a lower performance state when under heavy load could also trigger the
reset. Just to be clear, I wasn't expecting ccminer to do it, just lamenting the poor handling of the fault by NVIDIA.

dominuspro

full member

Activity: 201

Merit: 100

Quote from: djm34 on August 07, 2015, 03:31:49 AM

Quote from: bathrobehero on August 06, 2015, 10:37:09 PM

Quote from: scryptr on August 06, 2015, 09:41:29 PM

RESTORING DRIVER--

I use PrecisionX 16 to restore the driver without rebooting the system. These are my steps for a card on Windows that has low hash (crashed driver):

1) Start or re-open PrecisionX 16.
2) Turn off K-Boost with the toggle switch (upper right-hand corner).
3) Turn on K-Boost with the same toggle switch.
4) Re-select the boost profile that you prefer (important!).
5) Verify that the fan profile and boost settings are again in place, and that temperature is appropriate.
6) Close or minimize PrecisionX 16.
7) Restart miner, it should again have appropriate hash readings for boost/overclock settings.

You may also have to open nVidia control panel and reset the display resolution if your graphics now "look odd". It isn't every time, but frequently I have to reset display resolution for normal graphics. This is for my work computer, Win 7 X64, with a GTX 960 that I mine with when not playing games. The GTX 960 will get 10.6Mh/s on Quark, but if it crashes with a segmentation fault, it will only get 3Mh/s on miner restart. I then perform the steps above and restart the miner.

There is a memory leak somewhere, but I was suspecting poorly programmed flash-media websites, like my local news site. I need to reboot about once a day because of increasing memory bloat. --scryptr

I'm using an ancient ccminer so I'm not sure about the issue, but it does sound like a simple soft crash to me, where the card reverts to a lower P-state at 405 MHz. That is how my cards crash if I have too high an OC on them, set too high an intensity, or accidentally mine on the same card with 2 instances. A memory leak would imply a leak of some sort causing excess memory usage and/or slow performance degradation over time.

I hadn't checked the p state when my card degrades but I think you're right about that. I don't know anything about
a "soft crash". When I set too high intensity ccminer errors out with an out of memory error. When I start two instances
on the same card they each hash at lower rates but the card doesn't crash and never gets stuck in a degraded state. I have also seen
driver crashes due to too high OC where I lose the display for a few seconds. If the degradation is the result of some sort of soft crash
or exception why leave the gpu degraded? Why not reset automatically like it does for a hard crash? (rhetorical questions,
I don't expect an answer)

and how do you do that exactly ?

I have a different solution if the driver crashes and sets the gpu to some lower state. I just go into the device manager and disable the problematic gpu and re-enable it.
It restores the default GPU state. Then you just have to click the profile in MSI Afterburner and you have it back.

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

Yesterday I submitted a tiny speedup in x11. 2-5 KHASH on the 750ti. (simd) I think there are some more easy pickings here. might try to find them this weekend.

djm34

legendary

Activity: 1400

Merit: 1050