
Topic: [OS] nvOC easy-to-use Linux Nvidia Mining - page 28. (Read 418313 times)

newbie
Activity: 96
Merit: 0
I don't know if they are related errors:


I have been messing with it for 2 days, installed the OS again... nothing



Did you try to change drivers? Try from 384.

No, I haven't; the current drivers are 387.34. So I should go back? All my rigs are on 387.34, but this one has different cards in it.
Do I just download the older drivers and run the installer, or do I have to do something else? Has anyone upgraded to 390.48? https://www.geforce.com/drivers/results/132530

------------------------------------- EDIT

So it was one card that caused the NVML error "NVML: cannot get fan speed, error 15". I removed it and everything seems fine. I plugged the culprit card into a Windows machine and it's fine... for now. Yes, I tried switching risers before that.

So, to summarize my ordeal:

• Bad SSD: slow-running miner.
• A card that gave me an NVML error (EVGA 1060). I don't know if it was the driver's fault or something else.



newbie
Activity: 25
Merit: 0
I don't know if they are related errors:


I have been messing with it for 2 days, installed the OS again... nothing



Did you try to change drivers? Try from 384.
newbie
Activity: 96
Merit: 0
I don't know if they are related errors:

Code:
Claymore log:
15:01:25:998 f1bb9700 NVML: cannot get current temperature, error 15
15:01:25:998 f1bb9700 NVML: cannot get fan speed, error 15

Watchdog:
Mon May 14 15:01:23 MST 2018 - Lost GPU so restarting system. Found GPU's:
Unable to determine the device handle for GPU 0000:0A:00.0: GPU is lost.  Reboot the system to recover this GPU

MINER_TEMP_CONTROL

ERROR: Error assigning value 50 to attribute 'GPUTargetFanSpeed'
       (m1-desktop:0[fan:9]) as specified in assignment
       '[fan:9]/GPUTargetFanSpeed=50' (Unknown Error).


ERROR: Error assigning value 50 to attribute 'GPUTargetFanSpeed'
       (m1-desktop:0[fan:10]) as specified in assignment
       '[fan:10]/GPUTargetFanSpeed=50' (Unknown Error).




I restored the default xorg. Nothing changed.

My 9th and 10th cards do not have a fan speed control in the NVIDIA panel.
Also, there are only screens x0 - x8.
 
https://preview.ibb.co/kKZaGJ/1.png

This is how it's supposed to be. All the cards 0-8 have a fan speed control bar.
https://preview.ibb.co/cNwBqd/2.png
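
The errors above name the fan indices that could not be set. As a rough sketch, the indices can be pulled out of a saved copy of that output and compared with the fan controls that actually show up in the NVIDIA panel. The function name `failed_fans` is hypothetical and not part of nvOC; it only assumes the error text has the same layout as the messages quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nvOC): list the fan indices named in
# "Error assigning value ... GPUTargetFanSpeed" messages like the ones above.
failed_fans() {
  # Reads log text on stdin; prints each failing fan index once, sorted numerically.
  grep -o '\[fan:[0-9][0-9]*\]/GPUTargetFanSpeed=' \
    | sed 's/[^0-9]//g' \
    | sort -n -u
}
```

Usage, assuming the temp control output was saved to a file of your choosing: `failed_fans < saved_output.txt`.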




I have been messing with it for 2 days, installed the OS again... nothing

newbie
Activity: 44
Merit: 0
Well after much frustration with my rigs crashing, I thought I had it figured out but the problem continued.

Originally I had set up SMOS, but I really didn't care for the simplicity of the interface, or should I say lack thereof, and found nvOC.

As my rigs are remote in another state, it was even more difficult to diagnose.

For ease, I had my brother-in-law load up SMOS on both rigs and they have been running for over 4 days with under-volting (100W 1070 & 1070 Ti / 175W 1080 & 1080 Ti) and overclocking (-50 core 1070/1070 Ti, 200 core 1080 & 1080 Ti with 1100 Mem for the 1070/1070 Ti and 1000 for the 1080 & 1080 Ti). Mining ETH.

Whatever my issue was, it was an issue within nvOC. The question is why and what was causing the crashing issue.

Later this week, I plan on loading up a fresh copy of nvOC to see if the issue continues. Unfortunately, I cleared the USB sticks we were using that had the nvOC OS on them. Had I thought about it, it would have been good to keep them to see if anyone could identify any issues. The only thing I really did was re-compile the miners.

It is difficult to pinpoint when the issue really showed up, as we were having to reset the miners at least once a week, and eventually they would crash once daily, both rigs at the same time, even on different algorithms. I tried multiple different mining programs with no overclock, half overclock, and full overclock. Nothing seemed to matter: after 18-24 hours the crash was inevitable.

I wish I had more information on this, but I wanted to get it into the forum in case this ever happens to anyone else. I will update once I reload nvOC and see what happens.
newbie
Activity: 12
Merit: 0
That lost temperature reading is almost always caused by too much OC on the GPU. Try lowering the values. If it persists, then change the riser...
newbie
Activity: 96
Merit: 0

Still getting the same errors after restoring the default xorg. They don't seem to do anything as far as crashes go.

Code:
ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:10]) as specified in assignment
       '[fan:10]/GPUTargetFanSpeed=75' (Unknown Error).


ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:11]) as specified in assignment
       '[fan:11]/GPUTargetFanSpeed=75' (Unknown Error).




Is it OK if I leave the setting like this? I tested my cards grouped by card type, and they seem to be all right after 4 hours. Now I'm going to see if they play nice with each other.

Code:
if [ $NUM_GPU_BLW_THRSHLD -gt 12 ]

EDIT:
Temperature control hung again on all the cards. Blah.


That error is from tempcontrol, not watchdog.

It's weird. I had this error once when I added some cards to a rig, but restoring the default xorg solved it.
Maybe try re-installing the image.

Edit:
Are you using salfter switcher?


No, I am not.


I got a different error after it ran fine all night:


Code:
Sun May 13 00:30:08 MST 2018 - Lost GPU so restarting system. Found GPU's:
Unable to determine the device handle for GPU 0000:08:00.0: GPU is lost.  Reboot the system to recover this GPU

but it does not reboot, since I don't have a reboot file, or maybe it freezes.
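
To work out which physical card the "GPU is lost" message refers to, the PCI bus ID can be extracted from the message and matched against a listing taken while the rig is still healthy. This is only a sketch; `lost_gpu_bus_id` is a hypothetical name and not part of nvOC.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nvOC): extract the PCI bus ID from a
# "GPU is lost" message like the one quoted above.
lost_gpu_bus_id() {
  # Reads text on stdin; prints bus IDs such as 0000:08:00.0, one per line.
  sed -n 's/.*device handle for GPU \([0-9A-Fa-f:.]*\): GPU is lost.*/\1/p'
}
```

While all cards are still visible, `nvidia-smi --query-gpu=index,pci.bus_id,name --format=csv` lists the bus ID next to each GPU index, so the failing card can be identified later. The domain part of the bus ID may be printed with more leading zeros there.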


EDIT: The OS freezes and restarts the GUI kind of slowly... is that a sign of a bad card?

I think the SSD went bad. It doesn't seem to take writes from HDD Raw Copy... even though Windows Disk Management says it's healthy.
full member
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!

Still getting the same errors after restoring the default xorg. They don't seem to do anything as far as crashes go.

Code:
ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:10]) as specified in assignment
       '[fan:10]/GPUTargetFanSpeed=75' (Unknown Error).


ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:11]) as specified in assignment
       '[fan:11]/GPUTargetFanSpeed=75' (Unknown Error).




Is it OK if I leave the setting like this? I tested my cards grouped by card type, and they seem to be all right after 4 hours. Now I'm going to see if they play nice with each other.

Code:
if [ $NUM_GPU_BLW_THRSHLD -gt 12 ]

EDIT:
Temperature control hung again on all the cards. Blah.


That error is from tempcontrol, not watchdog.

It's weird. I had this error once when I added some cards to a rig, but restoring the default xorg solved it.
Maybe try re-installing the image.

Edit:
Are you using salfter switcher?
newbie
Activity: 96
Merit: 0

Still getting the same errors after restoring the default xorg. They don't seem to do anything as far as crashes go.

Code:
ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:10]) as specified in assignment
       '[fan:10]/GPUTargetFanSpeed=75' (Unknown Error).


ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:11]) as specified in assignment
       '[fan:11]/GPUTargetFanSpeed=75' (Unknown Error).




Is it OK if I leave the setting like this? I tested my cards grouped by card type, and they seem to be all right after 4 hours. Now I'm going to see if they play nice with each other.

Code:
if [ $NUM_GPU_BLW_THRSHLD -gt 12 ]

EDIT:
Temperature control hung again on all the cards. Blah.
full member
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
What's the current state of this community release?
Is there support for X16R algo? (raven)


Add to 0miner and 1bash:

0miner:
Code:
 if [ $COIN == "RVN" ]
  then
    HCD='/home/m1/z-enemy/z-enemy_miner'
    ADDR="$RVN_ADDRESS.$RVN_WORKER"
    screen -dmSL miner $HCD -a x16r -o stratum+tcp://$RVN_POOL:$RVN_PORT -u $ADDR -p $MINER_PWD -i $RVN_INTENSITY
  fi

1bash:
Code:
RVN_WORKER="$WORKERNAME"
RVN_ADDRESS="Account name or RVN_address"
RVN_POOL="pool address without stratum+tcp://"
RVN_PORT="pool port"
RVN_INTENSITY="19"


Run:
Code:
mkdir -p /home/m1/z-enemy/
wget -O- https://raw.githubusercontent.com/papampi/nvOC_miners/master/z-enemy/z-enemy-1.09a-cuda80.tar.gz | tar -xzC /home/m1/z-enemy/ --strip 1
chmod a+x /home/m1/z-enemy/z-enemy_miner


Change coin in 1bash, and restart miner with:
Code:
pkill -e miner
Watchdog will notice that no miner is running and restart it.
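
Since the 0miner snippet above adds stratum+tcp:// itself, a pool address pasted with that prefix would end up with it twice. A small sketch of a guard against that follows; `normalize_pool` is a hypothetical helper, not something nvOC ships, and pool.example.com in the usage line is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nvOC): clean up a pool address before it
# is used in a miner command that already prepends stratum+tcp://.
normalize_pool() {
  local pool="$1"
  pool="${pool#"${pool%%[![:space:]]*}"}"   # trim leading whitespace
  pool="${pool%"${pool##*[![:space:]]}"}"   # trim trailing whitespace
  pool="${pool#stratum+tcp://}"             # drop the scheme if it was pasted in
  printf '%s\n' "$pool"
}
```

Usage: `RVN_POOL="$(normalize_pool ' stratum+tcp://pool.example.com ')"` leaves RVN_POOL set to the bare host name.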

full member
Activity: 325
Merit: 110
What's the current state of this community release?
Is there support for X16R algo? (raven)
newbie
Activity: 96
Merit: 0
 
Those errors are usually an xorg problem. Did you restore the nvOC default xorg as I posted?

Threshold is the minimum GPU utilization before watchdog triggers.
There is no option in nvOC to disable a GPU yet,
but you can raise the number of GPUs allowed below the threshold in the watchdog check.
Open 5watchdog, find this line, and change 0 to the number of cards you disabled in the miner:
Code:
  if [ $NUM_GPU_BLW_THRSHLD -gt 0 ]
So if you disabled 2 cards in the miner command, it will be:
Code:
  if [ $NUM_GPU_BLW_THRSHLD -gt 2 ]



I did restore the xorg. I will try the GPU threshold number, thank you.
full member
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
Do threshold and utilization relate to memory, or to memory and core?

Utilization is too low
GPUs below threshold

I have disabled cards with -di in Claymore; maybe that is what causes the system to reboot. How do you disable cards in nvOC?


Also, I get this:

Code:
ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:10]) as specified in assignment
       '[fan:10]/GPUTargetFanSpeed=75' (Unknown Error).


ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:11]) as specified in assignment
       '[fan:11]/GPUTargetFanSpeed=75' (Unknown Error).


Those errors are usually an xorg problem. Did you restore the nvOC default xorg as I posted?

Threshold is the minimum GPU utilization before watchdog triggers.
There is no option in nvOC to disable a GPU yet,
but you can raise the number of GPUs allowed below the threshold in the watchdog check.
Open 5watchdog, find this line, and change 0 to the number of cards you disabled in the miner:
Code:
  if [ $NUM_GPU_BLW_THRSHLD -gt 0 ]
So if you disabled 2 cards in the miner command, it will be:
Code:
  if [ $NUM_GPU_BLW_THRSHLD -gt 2 ]
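
To illustrate the idea behind that comparison, here is a minimal sketch of counting GPUs below a utilization threshold and only reacting when the count exceeds the number of cards disabled on purpose. Only the variable name NUM_GPU_BLW_THRSHLD comes from the post; `count_below_threshold` and `should_restart` are hypothetical names, and this is not how 5watchdog is actually written.

```shell
#!/usr/bin/env bash
# Sketch only: count_below_threshold and should_restart are hypothetical,
# not taken from 5watchdog.
count_below_threshold() {
  # $1 = utilization threshold in percent; stdin = one utilization value per line.
  awk -v t="$1" '$1 ~ /^[0-9]+$/ && $1 + 0 < t + 0 { n++ } END { print n + 0 }'
}

should_restart() {
  # $1 = GPUs below threshold, $2 = GPUs intentionally disabled in the miner.
  # Succeeds only when more GPUs are idle than were disabled on purpose.
  [ "$1" -gt "$2" ]
}
```

Per-GPU utilization values for such a count can be read with `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`. With 2 cards disabled, `should_restart 2 2` fails (no restart), while `should_restart 3 2` succeeds.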
newbie
Activity: 96
Merit: 0
Do threshold and utilization relate to memory, or to memory and core?

Utilization is too low
GPUs below threshold

I have disabled cards with -di in Claymore; maybe that is what causes the system to reboot. How do you disable cards in nvOC?


Also, I get this:

Code:
ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:10]) as specified in assignment
       '[fan:10]/GPUTargetFanSpeed=75' (Unknown Error).


ERROR: Error assigning value 75 to attribute 'GPUTargetFanSpeed' (m1-desktop:0[fan:11]) as specified in assignment
       '[fan:11]/GPUTargetFanSpeed=75' (Unknown Error).

full member
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
I got a new problem.. ugh

I get



and it freezes and I have to restart manually. It helps if I close miner_temp_control until the miner restarts and it is loaded again.

Did you change the card slots or add a new card?
I think that's an xorg problem.
Restore xorg with:

Code:
sudo wget -N https://raw.githubusercontent.com/papampi/nvOC_by_fullzero_Community_Release/19-2.1/xorg.conf -O /etc/X11/xorg.conf.default
sudo cp '/etc/X11/xorg.conf.default' '/etc/X11/xorg.conf'
sudo cp '/etc/X11/xorg.conf.default' '/etc/X11/xorg.conf.backup'
sudo reboot
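
After restoring, one quick sanity check is to count the Device sections in xorg.conf and compare that with the number of cards installed. The sketch below assumes the file uses the usual `Section "Device"` layout; `count_xorg_devices` is a hypothetical name, not an nvOC script.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of nvOC): count the Device sections in an
# xorg.conf so the number can be compared with the number of installed GPUs.
count_xorg_devices() {
  # $1 = path to an xorg.conf file; prints the number of Section "Device" lines.
  grep -c -i '^[[:space:]]*Section[[:space:]]*"Device"' "$1"
}
```

Usage: `count_xorg_devices /etc/X11/xorg.conf`, then compare the result with `nvidia-smi -L | wc -l`. A lower number in xorg.conf would be consistent with some cards missing their fan controls.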
newbie
Activity: 96
Merit: 0
I got a new problem.. ugh

I get
https://preview.ibb.co/haGOuS/IMG_20180509_112347.jpg
https://preview.ibb.co/bLSUZS/IMG_20180508_234639.jpg

and it freezes and I have to restart manually. It helps if I close miner_temp_control until the miner restarts and it is loaded again.
full member
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
Hello all

Got a strange issue.

I have a new installation of nvOC. The farm is intended to run in headless mode.
After running
Code:
nvidia-xconfig -a --allow-empty-initial-configuration --cool-bits=28 --use-display-device="DFP-0" --connected-monitor="DFP-0"
the system reboots, and I get an "Xorg problems" message, and then the system reboots automatically after 5 seconds. After the reboot, xorg.conf looks truncated and incorrect, and because of this no xorg processes are running.

I have no idea how to fix it. Please help with advice :)

As soon as the rig has started, close gnome-terminal.
Set p106 headless mode in 1bash to yes, so 3main won't check the xorg.conf and restore it from backup.
Put XORG_UPDATED in /home/m1/xorg_flag:
Code:
echo "XORG_UPDATED" > /home/m1/xorg_flag

Then run your nvidia-xconfig and reboot.

Hope it helps.


Edit:
If your cards are not p106, open 3main and change:
Code:
if grep -q "P106-100" /tmp/tempa;
then
  ___1050_or_1050ti="YES"
  P106_100="YES"
fi

To:
Code:
if grep -q -E 'P106|P104|P102' /tmp/tempa;
then
  ___1050_or_1050ti="YES"
  P106_100="YES"
fi
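
To see which model strings the widened pattern would catch before editing 3main, the same `grep -E` expression can be tried on its own. `is_headless_model` below is a hypothetical wrapper used for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: is_headless_model is a hypothetical wrapper around the same
# grep -E pattern used in the edited 3main test above.
is_headless_model() {
  # $1 = GPU model name; succeeds if it contains P106, P104 or P102.
  printf '%s\n' "$1" | grep -q -E 'P106|P104|P102'
}
```

Usage: `is_headless_model "P104-100" && echo match` prints match, while a name such as "GeForce GTX 1070" does not match.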


newbie
Activity: 2
Merit: 0
Hello all

Got a strange issue.

I have a new installation of nvOC. The farm is intended to run in headless mode.
After running
Code:
nvidia-xconfig -a --allow-empty-initial-configuration --cool-bits=28 --use-display-device="DFP-0" --connected-monitor="DFP-0"
the system reboots, and I get an "Xorg problems" message, and then the system reboots automatically after 5 seconds. After the reboot, xorg.conf looks truncated and incorrect, and because of this no xorg processes are running.

I have no idea how to fix it. Please help with advice :)
full member
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
Hi Pap,

I saw the MSFT ccminer for RVN in the miners update. I tried the code and it segfaults pretty quickly. I am running suprminer (no fee) and it runs without issue. I tried it on two different frames, and it's been stable.

https://github.com/ocminer/suprminer.git

./suprminer/build.sh

Don




Test frame.   

ID,VENDOR,MODEL,PSTATE,TEMP,FAN,UTILIZATION,POWER,POWERLIMIT,MAXPOWER,GPUCLOCK,MEMCLOCK
--------------------------------------------------------------------------------
0, GIGABYTE, P106-100, P0, 49, 50, 100, 67.39, 85.00, 140.00, 1493, 3905
1, ASUS, P106-100, P0, 54, 50, 100, 66.95, 85.00, 140.00, 1594, 3905
2, ASUS, P106-100, P0, 41, 50, 99, 55.56, 85.00, 140.00, 1493, 3905
3, ASUS, P106-100, P0, 54, 50, 100, 51.85, 85.00, 140.00, 1657, 3905
4, ASUS, P106-100, P0, 56, 50, 99, 96.83, 85.00, 140.00, 1721, 3905
5, ASUS, P106-100, P0, 54, 50, 100, 57.54, 85.00, 140.00, 1493, 3905
6, ASUS, P106-100, P0, 57, 50, 100, 69.79, 85.00, 140.00, 1620, 3905
7, ASUS, P106-100, P0, 55, 50, 99, 61.98, 85.00, 140.00, 1493, 3905
8, MSI, P106-100, P0, 52, 50, 100, 51.64, 85.00, 140.00, 1468, 3905
9, MSI, P106-100, P0, 51, 50, 100, 54.11, 85.00, 140.00, 1493, 3905
10, MSI, P106-100, P0, 49, 50, 100, 52.74, 85.00, 140.00, 1468, 3905





Yes, suprminer is better than MSFT, and Z-Enemy is better still.

X16R - RVN - Miner head to head test log

Zealot/Enemy (z-enemy) NVIDIA GPU miner.
newbie
Activity: 44
Merit: 0
I would like to give a special shout-out to Stubo for helping me solve my crashing issue so far, through multiple replies within the forum and additionally through PM. Neither of my two 13-card Nvidia rigs has crashed in over 33 hours, up from 12-18ish hours before crashing and freezing.

I had both machines at a 0/0 core/memory overclock running at 120W of power (1070 / 1070 Ti's) and 175W (1080 / 1080 Ti).

I did not rename the rigs, so both were named m1@m1-Desktop. This may have been causing an issue with the DNS registration (correct me if I am wrong, Stubo).

The second thing I did was set my core to -100 and memory to +700 for mining ETH, keeping the same power input. The negative core offset was suggested by Stubo. Nothing else was modified or changed.

Not only did my Mh/s increase (from 27 to 29.5), but the rigs seem stable thus far.

I will keep everyone posted if I come across anything else, as I plan on going back to the original settings with no overclock to see whether that had any effect or whether the missing rename was the issue.
newbie
Activity: 7
Merit: 0
Hi Pap,

I saw the MSFT ccminer for RVN in the miners update. I tried the code and it segfaults pretty quickly. I am running suprminer (no fee) and it runs without issue. I tried it on two different frames, and it's been stable.

https://github.com/ocminer/suprminer.git

./suprminer/build.sh

Don




Test frame.   

ID,VENDOR,MODEL,PSTATE,TEMP,FAN,UTILIZATION,POWER,POWERLIMIT,MAXPOWER,GPUCLOCK,MEMCLOCK
--------------------------------------------------------------------------------
0, GIGABYTE, P106-100, P0, 49, 50, 100, 67.39, 85.00, 140.00, 1493, 3905
1, ASUS, P106-100, P0, 54, 50, 100, 66.95, 85.00, 140.00, 1594, 3905
2, ASUS, P106-100, P0, 41, 50, 99, 55.56, 85.00, 140.00, 1493, 3905
3, ASUS, P106-100, P0, 54, 50, 100, 51.85, 85.00, 140.00, 1657, 3905
4, ASUS, P106-100, P0, 56, 50, 99, 96.83, 85.00, 140.00, 1721, 3905
5, ASUS, P106-100, P0, 54, 50, 100, 57.54, 85.00, 140.00, 1493, 3905
6, ASUS, P106-100, P0, 57, 50, 100, 69.79, 85.00, 140.00, 1620, 3905
7, ASUS, P106-100, P0, 55, 50, 99, 61.98, 85.00, 140.00, 1493, 3905
8, MSI, P106-100, P0, 52, 50, 100, 51.64, 85.00, 140.00, 1468, 3905
9, MSI, P106-100, P0, 51, 50, 100, 54.11, 85.00, 140.00, 1493, 3905
10, MSI, P106-100, P0, 49, 50, 100, 52.74, 85.00, 140.00, 1468, 3905
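
A table like this is easy to scan with awk. As a small sketch, the function below prints the IDs of GPUs whose POWER reading is above their POWERLIMIT; `over_power_limit` is a hypothetical name, and it assumes the column order shown in the header above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IDs of GPUs whose POWER column (8th)
# exceeds their POWERLIMIT column (9th) in a table laid out like the one above.
over_power_limit() {
  # stdin = comma-separated rows; header and separator lines are skipped
  # because their first field is not a plain number.
  awk -F', *' '$1 ~ /^[0-9]+$/ && $8 + 0 > $9 + 0 { print $1 }'
}
```

Fed the rows above, it prints 4, the only card drawing more (96.83) than its 85.00 limit at the moment of the snapshot.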

