[ mining os ] nvoc - page 25. | Bitcointalksearch.org

fk1

full member

Activity: 216

Merit: 100

After months of mining without problems I somehow messed it up again :(

I wanted to update to Claymore 11.8, so I ran the papampi update script that was posted here around February. When I recompiled the miners it took forever (>30 min) and seemed to loop, so I decided to press Ctrl+C. Now the mining process no longer starts when I reboot.

Code:

m1@m1-desktop:~$ screen -r miner
There is no screen to be resumed matching miner.

I tried ./nvOC patch but the problem remains. Here is the output of ./nvOC report:

Code:

Software info:
Report ver : v0019-2.0.002
nvOC (1bash) : nvOC v0019-2.0 - Community Release
nvOC (3main) : nvOC v0019-2.0 - Community Release
1bash ver : v0019-2.0.003
3main ver : v0019-2.0.006
5watchdog ver : v0019-2.0.011
6tempcontrol v: v0019-2.0.003
wtm switch ver: v0019-2.0.0011
Kernel : 4.4.0-97-generic
OS : Ubuntu 16.04.3 LTS
System : (gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
nvidia driver : 390.59

Hardware info:
Motherboard : ASUS PRIME Z270-A Rev 1.xx
BIOS ver. : 1009 07/23/2017
CPU Model : Celeron(R) CPU G3900 @ 2.80GHz
CPU Cores : 2 (Cores + Threads)
Mem Total : 3984604 kB
Mem Free : 2848876 kB
Swap Total : 0 kB
Swap Free : 0 kB
Ethernet : Intel I219-V

HDD and Partion info:
NAME SIZE FSTYPE TYPE ROTA HOTPLUG TRAN VENDOR MODEL REV MOUNTPOINT
sda 74.5G disk 1 0 sata ATA TOSHIBA MK8052GS 0A
├─sda1 9M vfat part 1 0
└─sda2 74.5G ext4 part 1 0 /

VGA info:
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
04:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
05:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
08:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
09:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)

1bash settings:
1bash version .............: v0019-2.0.003
LOCAL or REMOTE ...........: REMOTE
TEAMVIEWER started ........: NO
SSH daemon started ........: YES
SLOW_USB_KEY_MODE .........: NO
SRR .......................: NO
Watchdog ..................: YES

Temp Control ..............: YES
TARGET_TEMP ...............: 82
__FAN_ADJUST ..............: 5
POWER_ADJUST ..............: 5
ALLOWED_TEMP_DIFF .........: 2
RESTORE_POWER_LIMIT .......: 90
MINIMAL_FAN_SPEED .........: 95

CLEAR_LOGS_ON_BOOT ........: NO
AUTO_UPDATE ...............: STABLE
AUTO_REBOOT ...............: NO
_Parallax_MODE (upPaste)...: NO

TELEGRAM_MESSAGES .: YES
TELEGRAM_ALERTS .: YES
TELEGRAM_TYPE .............: papampi
TELEGRAM_TIMEOUT_IN_MINUTES: 480
TELEGRAM_CHATID ...........: CHATID_NOT_SHOWN
TELEGRAM_APIKEY ...........: APIKEY_NOT_SHOWN

P106_100_FULL_HEADLESS_MODE: NO
GPUPowerMizerMode_Adjust...: NO
POWERLIMIT (global) .......: YES
POWERLIMIT_WATTS ..........: 80
CORE_OVERCLOCK (global)....: 150
MEMORY_OVERCLOCK (global) .: 800
MANUAL_FAN ................: NO
ALGO_SPECIFIC_OC ..........: YES

GLOBAL_WORKERNAME .........: YES
AUTO_WORKERNAME ...........: CUSTOM
CUSTOM_WORKERNAME .........: Luise01
plusCPU ...................: NO

ZM_or_EWBF ................: EWBF
EWBF_VERSION ..............: 3_4
EWBF_PERCENT ..............: 0

COIN ......................: FTC
FTC_WORKER ................: Luise01
FTC_ADDRESS ...............: LuisenMi...
FTC_POOL ..................: hub.miningpoolhub.com
FTC_PORT ..................: 20510
FTC_INTENSITY .............: 23

PS: When I run 'bash 3main' it seems to work, but only for ETH; i.e. for FTC, ccminer starts but doesn't mine. When I run 'bash 3main' I also have to press Ctrl+C twice on the following error before it continues:

Code:

INFO:guake.guake_app:Logging configuration complete
/usr/lib/python2.7/dist-packages/guake/guake_app.py:1785: GtkWarning: gtk_box_pack: assertion 'child->parent == NULL' failed
self.mainframe.pack_start(self.notebook, expand=True, fill=True, padding=0)

** (guake:3394): WARNING **: Binding 'F12' failed!

CryptAtomeTrader44

full member

Activity: 340

Merit: 103

It is easier to break an atom than partialities AE

Miner update for xmr-stak:
https://github.com/fireice-uk/xmr-stak/releases

Changelog:

fix job update race conditions #1537 #1553 #1554 #1568 #1592
fix OpenBSD compile #1468
add currency:
  turtlecoin #1469
  IPBC coin #1486
  masari coin #1515
support for stellite v4 fork #1512 #1555
extend benchmark option #1473
fix that cli option --noAMDCache #1486
fix possible NVIDIA Volta bug #1569
add support for CUDA 9.2 #1580
fix broken AMD APP SDK download #1609

mtx_demon

full member

Activity: 273

Merit: 100

Quote from: mtx_demon on June 03, 2018, 06:59:07 PM

I love you man!! hahahah just kidding!!

2 minutes to reboot and miner to start

Now I want to see whether it can restart without issues when ccminer stops

Even though HiveOS is really easy to use, it's not what I am looking for here.

For me nvOC is the best. It was kind of hard 6 months ago to set it up the way I wanted, but
now that I can run the new miners it is the best: all the control is in your hands.

Thanks again!!

mtx_demon

full member

Activity: 273

Merit: 100

I love you man!! hahahah just kidding!!

2 minutes to reboot and miner to start

Now I want to see whether it can restart without issues when ccminer stops

papampi

full member

Activity: 686

Merit: 140

Linux FOREVER! Resistance is futile!!!

Quote from: mtx_demon on June 03, 2018, 12:35:33 PM

Did you update the rig recently?
I started updating my rigs and the update pulled in the latest nvidia 390 driver. No problem on my 1070 rigs, but my 1060 rigs showed the same symptom.
If you check with "top" you will see nvidia-settings at full 100% cpu usage
The only way I found to solve it was to remove nvidia completely, then download and install nvidia-387 manually

These are the steps I took to solve my issue:

Code:

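cd ~/Downloads/
# Download the 387.34 installer (-c resumes a partial download)
wget -c http://us.download.nvidia.com/XFree86/Linux-x86_64/387.34/NVIDIA-Linux-x86_64-387.34.run
# Stop the display manager so the driver can be replaced
sudo service lightdm stop
# Remove all packaged nvidia drivers (quoted so the shell does not expand the glob)
sudo apt purge 'nvidia*'
chmod +x NVIDIA-Linux-x86_64-387.34.run
sudo ./NVIDIA-Linux-x86_64-387.34.run
# Keep apt from pulling the packaged drivers back in
sudo apt-mark hold nvidia-387
sudo apt-mark hold nvidia-390
sudo apt install nvidia-settings
# Restore the nvOC default xorg.conf
sudo wget https://raw.githubusercontent.com/papampi/nvOC_by_fullzero_Community_Release/19-2.1/xorg.conf -O /etc/X11/xorg.conf.default
sudo cp /etc/X11/xorg.conf.default /etc/X11/xorg.conf
sudo reboot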

Or if you want NVIDIA-384 use this link

Code:

http://us.download.nvidia.com/XFree86/Linux-x86_64/384.111/NVIDIA-Linux-x86_64-384.111.run
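
To confirm the symptom described in this post without watching top interactively, something like the following could be used. It is only a sketch: the function name high_cpu_procs and the 90% threshold are my own assumptions, not part of nvOC. Note that ps reports CPU usage averaged over the lifetime of the process, so a process that only just started spinning may read lower than in top.

```shell
#!/bin/bash
# Print "name cpu%" for every process at or above a CPU threshold.
# Expects "pcpu comm" lines on stdin, e.g. from: ps -eo pcpu,comm --no-headers
# The function name and the default threshold of 90 are illustrative assumptions.
high_cpu_procs() {
    threshold="${1:-90}"
    # Compare numerically; print the command name followed by its CPU share
    awk -v t="$threshold" '$1 + 0 >= t + 0 { print $2, $1 }'
}

# Usage on a rig:
#   ps -eo pcpu,comm --no-headers | high_cpu_procs 90
```

If nvidia-settings shows up in the output, the rig is probably hitting the same problem.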

mtx_demon

full member

Activity: 273

Merit: 100

So it's been a couple of months now that I have had this issue.

I tried HiveOS and ethOS; neither takes as long to load as nvOC.

Something must be wrong: the load goes too high, and then it takes around 10 minutes for the rig to start mining.
After that everything works well, except that the load stays around 10
when using ccminer.

I tried using z-enemy to mine x16r: no problems, and the load was low.

On HiveOS using klaust the load was around 8, but the system worked fine and a reboot took less than 1 minute.

Another problem I have here: when ccminer restarts, temp control takes too long to start the 12 GPUs and the watchdog kills it before it finishes. How can I make this delay longer?
The watchdog kills it at around GPU 9, so I would need about 30 more seconds to be safe.

Is anybody else having issues running 12 GPUs?

papampi

full member

Activity: 686

Merit: 140

Linux FOREVER! Resistance is futile!!!

Quote from: Stubo on June 02, 2018, 01:16:01 PM

I think it depends on the card count and types.
As Stubo said, I think 4 GB is enough too.

1x 1070 + 2x 1060 test rig:

Code:

              total        used        free      shared  buff/cache   available
Mem:        8135504     1061156     5751296       31272     1323052     6703400
Swap:       8388604           0     8388604

8 x 1070 :

Code:

              total        used        free      shared  buff/cache   available
Mem:        8130664     2414736     4060072       62760     1655856     5228228
Swap:       8388604           0     8388604

12 x 1060 :

Code:

              total        used        free      shared  buff/cache   available
Mem:        8130656     2281908     4463608       86440     1385140     5317732
Swap:       8388604           0     8388604

6 X 1060 :

Code:

              total        used        free      shared  buff/cache   available
Mem:        8135612     1571280     5353460       56224     1210872     6125424
Swap:       8388604           0     8388604
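
To compare rigs like these more easily, the "used" column can be turned into a percentage of total RAM. A minimal sketch, assuming the procps column order shown in the outputs above (total, used, free, ...); the function name mem_used_percent is made up for illustration:

```shell
#!/bin/bash
# Print the used share of RAM in percent (one decimal) from `free` output on stdin.
# Assumes the "Mem:" line has total in column 2 and used in column 3.
mem_used_percent() {
    awk '$1 == "Mem:" { printf "%.1f\n", 100 * $3 / $2 }'
}

# Usage on a rig:
#   free | mem_used_percent
```

For the 8 x 1070 rig this works out to 29.7, i.e. under a third of the 8 GB, which fits the view that 4 GB is enough.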

papampi

full member

Activity: 686

Merit: 140

Linux FOREVER! Resistance is futile!!!

Quote from: abdeldev on June 02, 2018, 09:26:06 PM

The full sequence didn't work, so I'm back to:

#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger
echo b > /proc/sysrq-trigger

Thanks for the confirmation.
I wanted to know whether it fails only on my test rig or elsewhere too.

Added your suggestion for next release (your name included)
sysrq_reboot pull request

abdeldev

newbie

Activity: 4

Merit: 0

The full sequence didn't work, so I'm back to:

#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger
echo b > /proc/sysrq-trigger
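
One plausible reason the full sequence fails when run from a script (not verified on nvOC): the e and i keys signal every process except init, which would include the shell running the script, so it may never get as far as s, u and b. The sketch below wraps the shorter sequence in a function with a dry-run switch so it can be tried without rebooting. The names sysrq_sequence, DRY_RUN, STEP_DELAY, SYSRQ_ENABLE and SYSRQ_TRIGGER are assumptions for illustration only.

```shell
#!/bin/bash
# Send a sequence of SysRq keys, e.g. "sub" (sync, remount read-only, reboot).
# With DRY_RUN=1 the keys are only printed and /proc is not touched.
# All variable and function names here are illustrative assumptions.
SYSRQ_ENABLE="${SYSRQ_ENABLE:-/proc/sys/kernel/sysrq}"
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

sysrq_sequence() {
    keys="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the keys separated by spaces instead of sending them
        printf '%s\n' "$keys" | sed 's/./& /g; s/ $//'
        return 0
    fi
    echo 1 > "$SYSRQ_ENABLE"           # enable all SysRq functions
    while [ -n "$keys" ]; do
        key="${keys%"${keys#?}"}"      # first character of the remaining keys
        keys="${keys#?}"               # drop that character
        echo "$key" > "$SYSRQ_TRIGGER"
        sleep "${STEP_DELAY:-1}"       # give sync/remount a moment to finish
    done
}

# Usage (must run as root):
#   DRY_RUN=1 sysrq_sequence sub    # prints: s u b
#   sysrq_sequence sub              # syncs, remounts read-only, reboots
```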

abdeldev

newbie

Activity: 4

Merit: 0

Quote from: WaveFront on June 02, 2018, 09:47:26 AM

Hey, very interesting subject. After several tries, I approached the kernel panic problem with a hardware solution.
I have a Raspberry Pi with one of its GPIOs wired to the reset pins of the motherboard. The RPi checks the status of the rig's SSH port every 30 seconds (I find that more reliable than just pinging the mobo).
If the mobo is unresponsive for more than 10 minutes, the RPi resets the rig.
I will publish the RPi scripts and schematics as soon as I have 10 minutes :-D

Yeah, I'm trying a hardware solution with a Raspberry Pi too, and right now I'm happy with the sysrq workaround, as it prevents any freeze caused by a faulty riser.
I'm testing the full REISUB sequence as papampi suggested and will give you feedback tomorrow.

Stubo

member

Activity: 224

Merit: 13

Quote from: WaveFront on June 02, 2018, 09:49:18 AM

Hey :-)
What do you think is the ideal RAM size for a motherboard running nvOC?

Here is a look at how much it uses on one of my rigs with 7 1070s:

Code:

m1@Miner1:~$ free
              total        used        free      shared  buff/cache   available
Mem:        8107204     1632364     5565128       42012      909712     6055112

As you can see, I have 8 GB but it uses less than 2 GB in REMOTE mode (no local display). I tend to buy hardware that I can repurpose in the event that mining goes south, so I recommend running a single 8 GB stick per mining host. I typically purchase them as a 16 GB "kit" with two 8 GB DIMMs. Clearly, you could go as low as 4 GB with no issues, but I have no use for DIMMs that small outside of mining.

WaveFront

member

Activity: 126

Merit: 10

Hey :-)
What do you think is the ideal RAM size for a motherboard running nvOC?

WaveFront

member

Activity: 126

Merit: 10

Quote from: papampi on June 02, 2018, 06:33:28 AM

I think the 2nd (S)ync won't do anything, as the filesystems are already remounted read-only by the previous (U)mount step.
Can you please test the full REISUB sequence and see how it works?

Hey, very interesting subject. After several tries, I approached the kernel panic problem with a hardware solution.
I have a Raspberry Pi with one of its GPIOs wired to the reset pins of the motherboard. The RPi checks the status of the rig's SSH port every 30 seconds (I find that more reliable than just pinging the mobo).
If the mobo is unresponsive for more than 10 minutes, the RPi resets the rig.
I will publish the RPi scripts and schematics as soon as I have 10 minutes :-D
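
Until the real scripts are published, the polling logic described in this post could look roughly like the sketch below, meant to run on the Raspberry Pi. The 30 second interval and the 10 minute limit come from the post; everything else (the function names, RIG_HOST, and the reset_rig placeholder that stands in for the actual GPIO pulse) is a hypothetical illustration, not WaveFront's code.

```shell
#!/bin/bash
# Poll the rig's SSH port and request a reset after too much downtime.
# Function names, RIG_HOST and the GPIO placeholder are assumptions.
CHECK_INTERVAL=30      # seconds between SSH port checks (from the post)
MAX_DOWNTIME=600       # reset after more than 10 minutes without a response

# Succeeds if a TCP connection to host:port opens within 5 seconds.
port_open() {
    timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Succeeds once the accumulated downtime (seconds) exceeds the limit.
should_reset() {
    [ "$1" -gt "$MAX_DOWNTIME" ]
}

# Placeholder: a real version would pulse the GPIO wired to the reset pins.
reset_rig() {
    echo "reset requested (GPIO pulse not implemented in this sketch)"
}

watch_rig() {
    host="$1"
    down=0
    while true; do
        if port_open "$host" 22; then
            down=0                              # rig answered, clear the counter
        else
            down=$((down + CHECK_INTERVAL))     # one more missed check
        fi
        if should_reset "$down"; then
            reset_rig
            down=0                              # give the rig time to boot again
        fi
        sleep "$CHECK_INTERVAL"
    done
}

# Usage: watch_rig "$RIG_HOST"
```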

Oleg_filin

newbie

Activity: 31

Merit: 0

Can I run OhGodAnETHlargementPill on nvOC 19.1.4?

papampi

full member

Activity: 686

Merit: 140

Linux FOREVER! Resistance is futile!!!

Quote from: abdeldev on June 02, 2018, 05:16:56 AM

Yeah, I've tested it for 2 days now, and the rig reboots as soon as a GPU is lost, which prevents it from freezing.
At first I was issuing only:
echo 1 > /proc/sys/kernel/sysrq
and
echo b > /proc/sysrq-trigger
and it does work.
Then Doftorul suggested that I go with the SUSB sequence.

I think the 2nd (S)ync won't do anything, as the filesystems are already remounted read-only by the previous (U)mount step.
Can you please test the full REISUB sequence and see how it works?

LuKePicci

jr. member

Activity: 128

Merit: 1

Quote from: martyroz on May 28, 2018, 05:39:08 PM

Will there be another ISO version after 2.0? I'm not confident in the process to convert to beta 2.1

The 2.1 tree has been patched (after a lot of work) to be 100% compatible side-by-side with other nvOC versions. It's also very easy to switch back to 2.0 if you find issues or bugs while testing, and just as easy to get the latest patches with fixes in a matter of seconds by running (in most cases) a single command. You can even do it remotely without having to detach your SSD/HDD/USB drive, and it only takes a few minutes. It's not a conversion: nvOC 2.0 remains untouched, ready to fall back on if needed. Future (at least minor) updates will likely be released in the same way, so I would strongly encourage you all to get familiar with this easy workflow.

EDIT: you can always find an up-to-date install/testing guide in the GitHub nvOC wiki: https://github.com/papampi/nvOC_by_fullzero_Community_Release/wiki/nvOC-19-2.1-beta-testing

abdeldev

newbie

Activity: 4

Merit: 0

Quote from: papampi on June 02, 2018, 12:31:10 AM

Yeah, I've tested it for 2 days now, and the rig reboots as soon as a GPU is lost, which prevents it from freezing.
At first I was issuing only:
echo 1 > /proc/sys/kernel/sysrq
and
echo b > /proc/sysrq-trigger
and it does work.
Then Doftorul suggested that I go with the SUSB sequence.

LuKePicci

jr. member

Activity: 128

Merit: 1

Quote from: abdeldev on June 01, 2018, 09:45:15 PM

As explained in the Wikipedia page you linked, the magic SysRq keys cannot work once a kernel panic has occurred. Months ago I had a faulty riser as well and got the "GPUx fallen off the bus" error, but it never caused a kernel panic: the watchdog detected the miner error state and restarted the miner (with one GPU less) without rebooting. What is this script intended to do?

Note that I also have the Intel WDT driver in use; maybe that is why I never experienced the reboot hangs you mentioned.

papampi

full member

Activity: 686

Merit: 140

Linux FOREVER! Resistance is futile!!!

Quote from: abdeldev on June 01, 2018, 09:45:15 PM

Nice idea.
Have you tested it?
I once tried to implement R.E.I.S.U.B with a different approach but was not successful.
Does it need to sync twice?
(u) remounts the filesystems read-only, so I don't think it can sync data to disk after u.

Isn't it better to do the full REISUB sequence?

Edit:
If you have a faulty GPU that causes a system freeze, can you please test this full REISUB sequence:

Code:

#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq

# (un*R*aw) Takes back control of keyboard from X
echo r > /proc/sysrq-trigger

# (t*E*rminate) Send SIGTERM to all processes.
echo e > /proc/sysrq-trigger

# (k*I*ll) Send SIGKILL to all processes.
echo i > /proc/sysrq-trigger

# (*S*ync) Sync all cached disk operations to disk
echo s > /proc/sysrq-trigger

# (*U*nmount) Remounts all mounted filesystems read-only
echo u > /proc/sysrq-trigger

# (re*B*oot) Reboots the system
echo b > /proc/sysrq-trigger

abdeldev

newbie

Activity: 4

Merit: 0

Hello guys.

First of all, big thanks to all the team for the great job on this project.

I've been using your distro for 3 months now and it rocks, guys. Lately I've had an issue with one of my rigs: a faulty riser causes the rig to freeze without being able to complete its reboot routine.
As Doftorul pointed out, more often than not a GPU dropping off the bus triggers a kernel panic.
So I'm sharing my workaround for those who have the same issue and only have remote access to the rig:
Create a magic reboot script (magicreboot.sh) that contains:
#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger
echo s > /proc/sysrq-trigger
echo b > /proc/sysrq-trigger

Edit 5watchdog:
Change every occurrence of sudo reboot to sudo magicreboot.sh
Do the same in 6tempcontrol

For more reading about SysRq: https://en.wikipedia.org/wiki/Magic_SysRq_key
Hope this helps.
Big thanks to Doftorul, sizzlephizzle and all the nvOC team ;)
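
The edit to 5watchdog and 6tempcontrol described in this post can also be done with sed instead of by hand. This is a sketch under assumptions: the function name use_magic_reboot and the MAGIC_REBOOT variable are made up, and sudo will only find magicreboot.sh if it is on root's PATH, so a full path may be needed.

```shell
#!/bin/bash
# Replace every "sudo reboot" in the given files with a call to the magic
# reboot script. A .bak copy of each file is kept next to the original.
# The function name and the MAGIC_REBOOT variable are illustrative assumptions.
use_magic_reboot() {
    script="${MAGIC_REBOOT:-magicreboot.sh}"
    for f in "$@"; do
        # "|" is used as the sed delimiter so the script path may contain slashes
        sed -i.bak "s|sudo reboot|sudo $script|g" "$f"
    done
}

# Usage (from the directory holding the nvOC scripts):
#   use_magic_reboot 5watchdog 6tempcontrol
```

Keeping the .bak copies makes it easy to go back if the rig does not behave as expected.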
