[OS] nvOC easy-to-use Linux Nvidia Mining - page 28.

infowire

newbie

Activity: 96

Merit: 0

Quote from: TAKCuCT on May 14, 2018, 10:03:58 PM

Quote from: infowire on May 14, 2018, 05:48:23 PM

I don't know if they are related errors:

I have been messing with it for 2 days, installed the OS again... nothing

Did you try to change drivers? Try from 384.

No, I haven't. The current drivers are 387.34, so I should go back? All my rigs are on 387.34, but this one has different cards in it.
Do I just download the older drivers and run the installer, or do I have to do something else? Has anyone upgraded to 390.48? https://www.geforce.com/drivers/results/132530

------------------------------------- EDIT

So it was one card that caused the NVML error "NVML: cannot get fan speed, error 15". I removed it and everything seems fine. I plugged the culprit card into a Windows machine and it's fine... for now. Yes, I tried switching risers before that.

So, to summarize my ordeal:

• Bad SSD: slow-running miner.
• A card that gave me an NVML error (EVGA 1060). I don't know if it was the driver's fault or something else.
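
A card like this can sometimes be spotted without pulling hardware by checking which GPU fails to report its sensors. The following is a minimal sketch, assuming the input has the format printed by `nvidia-smi --query-gpu=index,name,temperature.gpu,fan.speed --format=csv,noheader,nounits`. The function name `flag_unreadable_sensors` is hypothetical, and the error text in the sample data is only illustrative.

```shell
#!/usr/bin/env bash
# Sketch: flag GPUs whose temperature or fan reading is not a plain number.
# Expected input format (one line per GPU): index, name, temperature, fan

flag_unreadable_sensors() {
    awk -F', *' '
        # A healthy row has integer values in fields 3 and 4
        $3 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/ {
            print "GPU " $1 " (" $2 "): sensor unreadable"
        }
    '
}

# Demo on made-up sample data (the error string is illustrative only)
printf '%s\n' \
    '0, GeForce GTX 1060 6GB, 49, 50' \
    '1, GeForce GTX 1060 6GB, [Unknown Error], [Unknown Error]' \
    | flag_unreadable_sensors
```

On a live rig the nvidia-smi query output would be piped into the function instead of the sample lines.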

TAKCuCT

newbie

Activity: 25

Merit: 0

Quote from: infowire on May 14, 2018, 05:48:23 PM

I don't know if they are related errors:

I have been messing with it for 2 days, installed the OS again... nothing

Did you try to change drivers? Try from 384.

infowire

newbie

Activity: 96

Merit: 0

I don't know if they are related errors:

Code:

Claymore log:
15:01:25:998 f1bb9700 NVML: cannot get current temperature, error 15
15:01:25:998 f1bb9700 NVML: cannot get fan speed, error 15

Watchdog:
Mon May 14 15:01:23 MST 2018 - Lost GPU so restarting system. Found GPU's:
Unable to determine the device handle for GPU 0000:0A:00.0: GPU is lost. Reboot the system to recover this GPU

MINER_TEMP_CONTROL

ERROR: Error assigning value 50 to attribute 'GPUTargetFanSpeed'
   (m1-desktop:0[fan:9]) as specified in assignment
   '[fan:9]/GPUTargetFanSpeed=50' (Unknown Error).

ERROR: Error assigning value 50 to attribute 'GPUTargetFanSpeed'
   (m1-desktop:0[fan:10]) as specified in assignment
   '[fan:10]/GPUTargetFanSpeed=50' (Unknown Error).

I restored the default xorg. Nothing.

My 9th and 10th cards do not have a fan speed control in the Nvidia panel.
Also, only X screens 0-8 are listed.

https://preview.ibb.co/kKZaGJ/1.png

This is how it's supposed to be. All the cards 0-8 have a fan speed control bar.
https://preview.ibb.co/cNwBqd/2.png

I have been messing with it for 2 days, installed the OS again... nothing
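
When the same "GPU is lost" message keeps coming back, tallying the PCI bus IDs from the watchdog output may show whether one card is responsible. This is a minimal sketch that works on a saved log file; the function name `lost_gpu_ids` and the log path in the usage comment are assumptions, not nvOC names.

```shell
#!/usr/bin/env bash
# Sketch: count how often each PCI bus ID is reported as lost.

lost_gpu_ids() {
    # $1: path to a log file containing "GPU is lost" lines
    grep 'GPU is lost' "$1" \
        | grep -oE '[0-9A-Fa-f]{4}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-9A-Fa-f]' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $2, $1 }'   # print "bus_id count", most frequent first
}

# Example usage (the path is hypothetical):
#   lost_gpu_ids /home/m1/watchdog.log
```

If one bus ID dominates the list, that slot, riser, or card is a reasonable first suspect.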

urnzwy

newbie

Activity: 44

Merit: 0

Well, after much frustration with my rigs crashing, I thought I had it figured out, but the problem continued.

Originally I had set up SMOS, but I really didn't care for the simplicity of the interface (or should I say the lack thereof), and then I found nvOC.

As my rigs are remote in another state, it was even more difficult to diagnose.

For ease, I had my brother-in-law load SMOS on both rigs, and they have been running for over 4 days mining ETH with undervolting (100W for the 1070 & 1070 Ti, 175W for the 1080 & 1080 Ti) and overclocking (-50 core on the 1070/1070 Ti, 200 core on the 1080 & 1080 Ti, with 1100 mem for the 1070/1070 Ti and 1000 for the 1080 & 1080 Ti).

Whatever my issue was, it was an issue within nvOC. The question is what was causing the crashes, and why.

Later this week, I plan on loading a fresh copy of nvOC to see if the issue continues. Unfortunately, I cleared the USB sticks we were using that had the nvOC OS on them. Had I thought about it, it would have been good to keep them so that someone could look for any issues. The only thing I really did was recompile the miners.

It is difficult to pinpoint when the issue really showed up, as we were having to reset the miners at least once a week, and eventually they would crash once daily, both at the same time, even on different algorithms. I tried multiple different mining programs, no overclock, half overclock, full overclock. Nothing seemed to matter: within 18-24 hours the crash was inevitable.

I wish I had more information on this, but I wanted to get it into the forum in case this ever happens to anyone else. I will update once I reload nvOC and see what happens.

chem1990

newbie

Activity: 12

Merit: 0

That lost temperature is almost every time caused by too much OC on the GPU. Try lowering the values. If it persists, then change the riser...

infowire

newbie

Activity: 96

Merit: 0

Quote from: papampi on May 12, 2018, 12:47:45 AM

Quote from: infowire on May 11, 2018, 08:36:04 PM

Quote from: papampi on May 11, 2018, 03:19:25 AM

Still getting the same errors after restoring the default xorg. It doesn't seem to change anything as far as crashes go.

Code:

ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:10]) as specified in assignment
'[fan:10]/GPUTargetFanSpeed=75' (Unknown Error).

ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:11]) as specified in assignment
'[fan:11]/GPUTargetFanSpeed=75' (Unknown Error).

Is it OK if I leave the setting like this? I tested my cards grouped by type, and they seem to be all right after 4 hours. Now I'm going to see if they play nice with each other.

Code:

if [ $NUM_GPU_BLW_THRSHLD -gt 12 ]

EDIT:
Temperature control hangs again for all the cards. Blah.

That error is from tempcontrol, not the watchdog.

It's weird. I had this error once when I added some cards to a rig, but restoring the default xorg solved it.
Maybe try re-installing the image.

Edit:
Are you using salfter switcher?

No, I am not.

I got another error after it ran fine all night:

Code:

Sun May 13 00:30:08 MST 2018 - Lost GPU so restarting system. Found GPU's:
Unable to determine the device handle for GPU 0000:08:00.0: GPU is lost. Reboot the system to recover this GPU

but it does not reboot, since I don't have a reboot file, or maybe it freezes.

EDIT: The OS freezes and the GUI restarts rather slowly... is that a sign of a bad card?

I think the SSD went bad. HDD Raw Copy doesn't seem to be able to write to it, even though Windows Disk Management says it's healthy.

papampi

full member

Activity: 686

Merit: 140

Linux FOREVER! Resistance is futile!!!

Quote from: infowire on May 11, 2018, 08:36:04 PM

Quote from: papampi on May 11, 2018, 03:19:25 AM

Still getting the same errors after restoring the default xorg. It doesn't seem to change anything as far as crashes go.

Code:

ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:10]) as specified in assignment
'[fan:10]/GPUTargetFanSpeed=75' (Unknown Error).

ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:11]) as specified in assignment
'[fan:11]/GPUTargetFanSpeed=75' (Unknown Error).

Is it OK if I leave the setting like this? I tested my cards grouped by type, and they seem to be all right after 4 hours. Now I'm going to see if they play nice with each other.

Code:

if [ $NUM_GPU_BLW_THRSHLD -gt 12 ]

EDIT:
Temperature control hangs again for all the cards. Blah.

That error is from tempcontrol, not the watchdog.

It's weird. I had this error once when I added some cards to a rig, but restoring the default xorg solved it.
Maybe try re-installing the image.

Edit:
Are you using salfter switcher?

infowire

newbie

Activity: 96

Merit: 0

Quote from: papampi on May 11, 2018, 03:19:25 AM

Still getting the same errors after restoring the default xorg. It doesn't seem to change anything as far as crashes go.

Code:

ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:10]) as specified in assignment
'[fan:10]/GPUTargetFanSpeed=75' (Unknown Error).

ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:11]) as specified in assignment
'[fan:11]/GPUTargetFanSpeed=75' (Unknown Error).

Is it OK if I leave the setting like this? I tested my cards grouped by type, and they seem to be all right after 4 hours. Now I'm going to see if they play nice with each other.

Code:

if [ $NUM_GPU_BLW_THRSHLD -gt 12 ]

EDIT:
Temperature control hangs again for all the cards. Blah.

papampi

full member

Activity: 686

Merit: 140

Quote from: martyroz on May 11, 2018, 08:33:34 AM

What's the current state of this community release?
Is there support for X16R algo? (raven)

Add to 0miner and 1bash:

0miner:

Code:

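if [ "$COIN" = "RVN" ]
then
  # Path to the z-enemy binary
  HCD='/home/m1/z-enemy/z-enemy_miner'
  ADDR="$RVN_ADDRESS.$RVN_WORKER"
  # Start the miner in a detached, logged screen session named "miner"
  screen -dmSL miner "$HCD" -a x16r -o "stratum+tcp://$RVN_POOL:$RVN_PORT" -u "$ADDR" -p "$MINER_PWD" -i "$RVN_INTENSITY"
fi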

1bash:

Code:

RVN_WORKER="$WORKERNAME"
RVN_ADDRESS="Account name or RVN_address"
RVN_POOL="pool address without stratum+tcp://"
RVN_PORT="pool port"
RVN_INTENSITY="19"

Run:

Code:

mkdir -p /home/m1/z-enemy/
wget -O- https://raw.githubusercontent.com/papampi/nvOC_miners/master/z-enemy/z-enemy-1.09a-cuda80.tar.gz | tar -xzC /home/m1/z-enemy/ --strip 1
chmod a+x /home/m1/z-enemy/z-enemy_miner

Change the coin in 1bash, and restart the miner with:

Code:

pkill -e miner

The watchdog will detect that no miner is running and will restart it.
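
Before restarting the miner, it can help to print the command line that the 0miner block would build, to catch a wrong pool or address. This is a minimal dry-run sketch, not part of nvOC: `build_rvn_cmd` is a hypothetical helper, and every value assigned below is a placeholder. It also strips an accidental `stratum+tcp://` prefix from `RVN_POOL`, since the 0miner line adds that prefix itself.

```shell
#!/usr/bin/env bash
# Sketch: assemble the z-enemy command line and print it instead of running it.
# All values are placeholders standing in for the 1bash settings.

COIN="RVN"
WORKERNAME="rig1"
MINER_PWD="x"
RVN_WORKER="$WORKERNAME"
RVN_ADDRESS="YOUR_RVN_ADDRESS"
RVN_POOL="pool.example.com"
RVN_PORT="3333"
RVN_INTENSITY="19"

build_rvn_cmd() {
    local hcd='/home/m1/z-enemy/z-enemy_miner'
    local addr="$RVN_ADDRESS.$RVN_WORKER"
    # Drop an accidental stratum+tcp:// prefix so it is not doubled
    local pool="${RVN_POOL#stratum+tcp://}"
    echo "$hcd -a x16r -o stratum+tcp://$pool:$RVN_PORT -u $addr -p $MINER_PWD -i $RVN_INTENSITY"
}

if [ "$COIN" = "RVN" ]; then
    build_rvn_cmd
fi
```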

martyroz

full member

Activity: 325

Merit: 110

What's the current state of this community release?
Is there support for X16R algo? (raven)

infowire

newbie

Activity: 96

Merit: 0

Quote from: papampi
Those errors are usually an xorg problem. Did you restore the nvOC default xorg as I posted?

The threshold is the minimum GPU utilization; below it, the watchdog triggers.
There is no option in nvOC to disable a GPU yet,
but you can raise the number of GPUs allowed below the threshold in the watchdog check.
Open 5watchdog, find this line, and change 0 to the number of cards you disabled in the miner:

Code:

if [ $NUM_GPU_BLW_THRSHLD -gt 0 ]

So if you disabled 2 cards in the miner command, it will be:

Code:

if [ $NUM_GPU_BLW_THRSHLD -gt 2 ]


I did restore the xorg. I will try the GPU threshold number, thank you.

papampi

full member

Activity: 686

Merit: 140

Quote from: infowire on May 10, 2018, 10:58:32 AM

Do threshold and utilization relate to memory, or to memory and core?

Utilization is too low
GPUs below threshold

I have disabled cards with -di in Claymore; maybe that is what causes the system to reboot. How do you disable cards in nvOC?

I also get this:

Code:

ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:10]) as specified in assignment
'[fan:10]/GPUTargetFanSpeed=75' (Unknown Error).

ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:11]) as specified in assignment
'[fan:11]/GPUTargetFanSpeed=75' (Unknown Error).

Those errors are usually an xorg problem. Did you restore the nvOC default xorg as I posted?

The threshold is the minimum GPU utilization; below it, the watchdog triggers.
There is no option in nvOC to disable a GPU yet,
but you can raise the number of GPUs allowed below the threshold in the watchdog check.
Open 5watchdog, find this line, and change 0 to the number of cards you disabled in the miner:

Code:

if [ $NUM_GPU_BLW_THRSHLD -gt 0 ]

So if you disabled 2 cards in the miner command, it will be:

Code:

if [ $NUM_GPU_BLW_THRSHLD -gt 2 ]
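
The logic behind that comparison can be sketched as follows. This is an illustration under stated assumptions, not the actual 5watchdog code: `THRESHOLD`, `DISABLED_GPUS`, and both function names are made up for the example, and the utilization values are sample numbers. On a live rig they could come from `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`.

```shell
#!/usr/bin/env bash
# Sketch of the "GPUs below threshold" check with deliberately disabled cards.

count_below_threshold() {
    # $1: utilization threshold in percent; remaining args: per-GPU utilization
    local threshold=$1 count=0 util
    shift
    for util in "$@"; do
        if [ "$util" -lt "$threshold" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

should_restart() {
    # $1: GPUs below threshold, $2: cards disabled on purpose in the miner
    [ "$1" -gt "$2" ]
}

# Example: 4 GPUs, two of them disabled in the miner and therefore idle
THRESHOLD=80
DISABLED_GPUS=2
NUM_GPU_BLW_THRSHLD=$(count_below_threshold "$THRESHOLD" 100 99 0 0)
if should_restart "$NUM_GPU_BLW_THRSHLD" "$DISABLED_GPUS"; then
    echo "restart miner"
else
    echo "ok: $NUM_GPU_BLW_THRSHLD idle GPU(s) are expected"
fi
```

With the comparison left at 0, the two idle cards in this example would trigger a restart on every check, which matches the reboot loop described in the question.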

infowire

newbie

Activity: 96

Merit: 0

Do threshold and utilization relate to memory, or to memory and core?

Utilization is too low
GPUs below threshold

I have disabled cards with -di in Claymore; maybe that is what causes the system to reboot. How do you disable cards in nvOC?

I also get this:

Code:

ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:10]) as specified in assignment
'[fan:10]/GPUTargetFanSpeed=75' (Unknown Error).

ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:11]) as specified in assignment
'[fan:11]/GPUTargetFanSpeed=75' (Unknown Error).

papampi

full member

Activity: 686

Merit: 140

Quote from: infowire on May 09, 2018, 11:38:43 AM

I got a new problem.. ugh

I get

and it freezes and I have to restart manually. It helps if I close miner_temp_control, until the miner restarts and it gets loaded again.

Did you change the cards' slots or add a new one?
I think that's an xorg problem.
Restore xorg with:

Code:

sudo wget -N https://raw.githubusercontent.com/papampi/nvOC_by_fullzero_Community_Release/19-2.1/xorg.conf -O /etc/X11/xorg.conf.default
sudo cp '/etc/X11/xorg.conf.default' '/etc/X11/xorg.conf'
sudo cp '/etc/X11/xorg.conf.default' '/etc/X11/xorg.conf.backup'
sudo reboot
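
Since these commands overwrite both xorg.conf and xorg.conf.backup, it may be worth keeping a copy of the current file first so it can be compared with the default afterwards. A minimal sketch; `backup_file` is a hypothetical helper, not something nvOC provides.

```shell
#!/usr/bin/env bash
# Sketch: keep a timestamped copy of a config file before it is overwritten.

backup_file() {
    # $1: file to copy; prints the path of the backup on success
    local src=$1
    local dst
    dst="${src}.$(date +%Y%m%d-%H%M%S).bak"
    cp -p -- "$src" "$dst" && echo "$dst"
}

# Example (run as root, since /etc/X11 is not user-writable):
#   backup_file /etc/X11/xorg.conf
```

After the restore, running `diff` on the saved copy and the new xorg.conf shows what changed, for example missing Device or Screen sections for the higher-numbered cards.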

infowire

newbie

Activity: 96

Merit: 0

I got a new problem.. ugh

I get
https://preview.ibb.co/haGOuS/IMG_20180509_112347.jpg
https://preview.ibb.co/bLSUZS/IMG_20180508_234639.jpg

and it freezes and I have to restart manually. It helps if I close miner_temp_control, until the miner restarts and it gets loaded again.

papampi

full member

Activity: 686

Merit: 140

Quote from: kostik2022 on May 08, 2018, 05:48:14 AM

Hello all

Got a strange issue.

I have a new installation of nvOC. The farm is intended to run in headless mode.
After running

Code:

nvidia-xconfig -a --allow-empty-initial-configuration --cool-bits=28 --use-display-device="DFP-0" --connected-monitor="DFP-0"

the system reboots, and I get an "Xorg problems" message, and then the system reboots automatically after 5 seconds. After the reboot, xorg.conf looks truncated and incorrect, and because of this no Xorg processes are running.

I really have no idea how to fix it. Please help with some advice.

As soon as the rig has started, close gnome-terminal.
Set p106 headless mode in 1bash to yes, so 3main won't check xorg.conf and restore it from backup.
Put XORG_UPDATED in /home/m1/xorg_flag:

Code:

echo "XORG_UPDATED" > /home/m1/xorg_flag

Then run your nvidia-xconfig and reboot.

Hope it helps.

Edit:
If your cards are not p106, open 3main and change:

Code:

if grep -q "P106-100" /tmp/tempa;
then
___1050_or_1050ti="YES"
P106_100="YES"
fi

To:

Code:

if grep -q -E 'P106|P104|P102' /tmp/tempa;
then
___1050_or_1050ti="YES"
P106_100="YES"
fi
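
To see what the extended pattern matches before editing 3main, it can be tried on a few sample names. This is a small sketch: the function name `is_mining_card` and the sample names are illustrative, and on a rig the names in /tmp/tempa may look different.

```shell
#!/usr/bin/env bash
# Sketch: how the extended grep pattern behaves on sample GPU names.

is_mining_card() {
    # Reads GPU names on stdin; succeeds if any line names a P106/P104/P102 card
    grep -q -E 'P106|P104|P102'
}

for name in "P106-100" "P104-100" "P102-100" "GeForce GTX 1070"; do
    if printf '%s\n' "$name" | is_mining_card; then
        echo "$name: treated as headless mining card"
    else
        echo "$name: not matched"
    fi
done
```

The `-E` flag is what lets `|` act as alternation; without it, grep would look for the literal text containing the pipe characters.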

kostik2022

newbie

Activity: 2

Merit: 0

Hello all

Got a strange issue.

I have a new installation of nvOC. The farm is intended to run in headless mode.
After running

Code:

nvidia-xconfig -a --allow-empty-initial-configuration --cool-bits=28 --use-display-device="DFP-0" --connected-monitor="DFP-0"

the system reboots, and I get an "Xorg problems" message, and then the system reboots automatically after 5 seconds. After the reboot, xorg.conf looks truncated and incorrect, and because of this no Xorg processes are running.

I really have no idea how to fix it. Please help with some advice.

papampi

full member

Activity: 686

Merit: 140

Quote from: terex on May 07, 2018, 12:07:06 PM

Hi Pap,

I saw the MSFT ccminer for RVN in the miners update. I tried the code and it segfaults pretty quickly. I am running suprminer (no fee), and it runs without issue. I tried it on two different frames and it's been stable.

https://github.com/ocminer/suprminer.git

./suprminer/build.sh

Don

Test frame.

ID,VENDOR,MODEL,PSTATE,TEMP,FAN,UTILIZATION,POWER,POWERLIMIT,MAXPOWER,GPUCLOCK,MEMCLOCK
--------------------------------------------------------------------------------
0, GIGABYTE, P106-100, P0, 49, 50, 100, 67.39, 85.00, 140.00, 1493, 3905
1, ASUS, P106-100, P0, 54, 50, 100, 66.95, 85.00, 140.00, 1594, 3905
2, ASUS, P106-100, P0, 41, 50, 99, 55.56, 85.00, 140.00, 1493, 3905
3, ASUS, P106-100, P0, 54, 50, 100, 51.85, 85.00, 140.00, 1657, 3905
4, ASUS, P106-100, P0, 56, 50, 99, 96.83, 85.00, 140.00, 1721, 3905
5, ASUS, P106-100, P0, 54, 50, 100, 57.54, 85.00, 140.00, 1493, 3905
6, ASUS, P106-100, P0, 57, 50, 100, 69.79, 85.00, 140.00, 1620, 3905
7, ASUS, P106-100, P0, 55, 50, 99, 61.98, 85.00, 140.00, 1493, 3905
8, MSI, P106-100, P0, 52, 50, 100, 51.64, 85.00, 140.00, 1468, 3905
9, MSI, P106-100, P0, 51, 50, 100, 54.11, 85.00, 140.00, 1493, 3905
10, MSI, P106-100, P0, 49, 50, 100, 52.74, 85.00, 140.00, 1468, 3905

Yes, suprminer is better than MSFT, and Z-Enemy is better still.

X16R - RVN - Miner head to head test log

Zealot/Enemy (z-enemy) NVIDIA GPU miner.

urnzwy

newbie

Activity: 44

Merit: 0

I would like to give a special shout-out to Stubo for helping me solve my crashing issue so far, through multiple replies within the forum and additionally through PM. Neither of my two 13-card Nvidia rigs has crashed in over 33 hours, up from 12-18ish hours before crashing and freezing.

I had both machines at a 0/0 core/memory overclock running at 120W of power (1070 / 1070 Ti's) and 175W (1080 / 1080 Ti).

I did not rename the rigs, so both were named m1@m1-Desktop. This may have been causing an issue with the DNS registration (correct me if I am wrong, Stubo).

The second thing I did was set my core to -100 and memory to +700 for mining ETH, the negative core being Stubo's suggestion, while keeping the same power input. Nothing else was modified or changed.

Not only did my Mh/s increase (from 27 to 29.5), but the rigs seem stable thus far.

I will keep everyone posted if I come across anything else, as I plan on going back to the original lack of overclock to see whether that had any effect or whether not renaming the rigs was the issue.

terex

newbie

Activity: 7

Merit: 0

Hi Pap,

I saw the MSFT ccminer for RVN in the miners update. I tried the code and it segfaults pretty quickly. I am running suprminer (no fee), and it runs without issue. I tried it on two different frames and it's been stable.

https://github.com/ocminer/suprminer.git

./suprminer/build.sh

Don

Test frame.

ID,VENDOR,MODEL,PSTATE,TEMP,FAN,UTILIZATION,POWER,POWERLIMIT,MAXPOWER,GPUCLOCK,MEMCLOCK
--------------------------------------------------------------------------------
0, GIGABYTE, P106-100, P0, 49, 50, 100, 67.39, 85.00, 140.00, 1493, 3905
1, ASUS, P106-100, P0, 54, 50, 100, 66.95, 85.00, 140.00, 1594, 3905
2, ASUS, P106-100, P0, 41, 50, 99, 55.56, 85.00, 140.00, 1493, 3905
3, ASUS, P106-100, P0, 54, 50, 100, 51.85, 85.00, 140.00, 1657, 3905
4, ASUS, P106-100, P0, 56, 50, 99, 96.83, 85.00, 140.00, 1721, 3905
5, ASUS, P106-100, P0, 54, 50, 100, 57.54, 85.00, 140.00, 1493, 3905
6, ASUS, P106-100, P0, 57, 50, 100, 69.79, 85.00, 140.00, 1620, 3905
7, ASUS, P106-100, P0, 55, 50, 99, 61.98, 85.00, 140.00, 1493, 3905
8, MSI, P106-100, P0, 52, 50, 100, 51.64, 85.00, 140.00, 1468, 3905
9, MSI, P106-100, P0, 51, 50, 100, 54.11, 85.00, 140.00, 1493, 3905
10, MSI, P106-100, P0, 49, 50, 100, 52.74, 85.00, 140.00, 1468, 3905
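
A table like this can be summarized with awk, for example to total the power draw and list cards running above their power limit. This is a minimal sketch that assumes the same comma-separated column order as the table above (POWER in column 8, POWERLIMIT in column 9); the function name `summarize_gpu_stats` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: total power and over-limit GPUs from a stats table on stdin.
# Input: header line, dashed separator line, then one CSV row per GPU.

summarize_gpu_stats() {
    awk -F', *' '
        NR <= 2 { next }                 # skip header and dashed separator
        NF < 12 { next }                 # skip incomplete rows
        {
            n++
            total += $8                  # POWER column (watts)
            if ($8 > $9)                 # POWER above POWERLIMIT
                over = (over == "" ? $1 : over "," $1)
        }
        END {
            printf "gpus=%d total_power=%.2f over_limit=%s\n",
                n, total, (over == "" ? "none" : over)
        }
    '
}

# Example usage (the file name is hypothetical):
#   summarize_gpu_stats < stats.csv
```

On the table above, the only row with POWER above POWERLIMIT is GPU 4 (96.83 against 85.00), so that is the ID the sketch would list.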

Topic: [OS] nvOC easy-to-use Linux Nvidia Mining - page 28. (Read 418313 times)