
Topic: [ mining os ] nvoc - page 25. (Read 418529 times)

fk1
full member
Activity: 216
Merit: 100
June 06, 2018, 01:42:28 AM
After months of mining without problems I messed it up again somehow :(
I wanted to update to Claymore 11.8, so I ran the papampi update script that was posted here around February. When I recompiled the miners it took forever (>30 min) and seemed to loop, so I decided to press Ctrl+C. Now the mining process no longer starts when I reboot.

Code:
m1@m1-desktop:~$ screen -r miner
There is no screen to be resumed matching miner.
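A quick way to script the same check is to parse what `screen -ls` prints. This is only a sketch: the function name `screen_session_exists` is made up for illustration, and the session name "miner" is taken from the command above.

```shell
#!/bin/bash
# Sketch: report whether a named screen session exists by parsing the
# text printed by `screen -ls` (read from stdin so it can be tried offline).
# The function name is hypothetical, not part of nvOC.
screen_session_exists() {
    local name="$1"
    # Session lines look like: "<tab>1234.miner<tab>(Detached)"
    grep -Eq "^[[:space:]]*[0-9]+\.${name}[[:space:]]"
}

# Usage on a rig:
#   if screen -ls | screen_session_exists miner; then
#       screen -r miner
#   else
#       echo "no miner session found; the miner was probably never launched"
#   fi
```

If the function reports no session right after boot, the problem is more likely in whatever launches the miner than in the miner itself.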

I tried ./nvOC patch but the problem remains. Here is the ./nvOC report:

Code:

Software info:
Report ver    :  v0019-2.0.002
nvOC (1bash)  :  nvOC v0019-2.0 - Community Release
nvOC (3main)  :  nvOC v0019-2.0 - Community Release
1bash ver     :  v0019-2.0.003
3main ver     :  v0019-2.0.006
5watchdog ver :  v0019-2.0.011
6tempcontrol v:  v0019-2.0.003
wtm switch ver:  v0019-2.0.0011
Kernel        :  4.4.0-97-generic
OS            :  Ubuntu 16.04.3 LTS
System        :  (gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
nvidia driver :  390.59

Hardware info:
Motherboard   :  ASUS PRIME Z270-A  Rev 1.xx
BIOS ver.     :  1009 07/23/2017
CPU Model     :  Celeron(R) CPU G3900 @ 2.80GHz
CPU Cores     :  2 (Cores + Threads)
Mem Total     :         3984604 kB
Mem Free      :         2848876 kB
Swap Total    :               0 kB
Swap Free     :               0 kB
Ethernet      :  Intel I219-V

HDD and Partion info:
NAME    SIZE FSTYPE TYPE ROTA HOTPLUG TRAN   VENDOR   MODEL             REV MOUNTPOINT
sda    74.5G        disk    1       0 sata   ATA      TOSHIBA MK8052GS 0A
├─sda1    9M vfat   part    1       0
└─sda2 74.5G ext4   part    1       0                                       /

VGA info:
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
04:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
05:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
08:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)
09:00.0 VGA compatible controller: NVIDIA Corporation Device 1c03 (rev a1)

1bash settings:
1bash version .............: v0019-2.0.003
LOCAL or REMOTE ...........: REMOTE
TEAMVIEWER started ........: NO
SSH daemon started ........: YES
SLOW_USB_KEY_MODE .........: NO
SRR .......................: NO
Watchdog ..................: YES

Temp Control ..............: YES
TARGET_TEMP ...............: 82
__FAN_ADJUST ..............: 5
POWER_ADJUST ..............: 5
ALLOWED_TEMP_DIFF .........: 2
RESTORE_POWER_LIMIT .......: 90
MINIMAL_FAN_SPEED .........: 95

CLEAR_LOGS_ON_BOOT ........: NO
AUTO_UPDATE ...............: STABLE
AUTO_REBOOT ...............: NO
_Parallax_MODE (upPaste)...: NO

TELEGRAM_MESSAGES .: YES
TELEGRAM_ALERTS .: YES
TELEGRAM_TYPE .............: papampi
TELEGRAM_TIMEOUT_IN_MINUTES: 480
TELEGRAM_CHATID ...........: CHATID_NOT_SHOWN
TELEGRAM_APIKEY ...........: APIKEY_NOT_SHOWN

P106_100_FULL_HEADLESS_MODE: NO
GPUPowerMizerMode_Adjust...: NO
POWERLIMIT (global) .......: YES
POWERLIMIT_WATTS ..........: 80
CORE_OVERCLOCK (global)....: 150
MEMORY_OVERCLOCK (global) .: 800
MANUAL_FAN ................: NO
ALGO_SPECIFIC_OC ..........: YES

GLOBAL_WORKERNAME .........: YES
AUTO_WORKERNAME ...........: CUSTOM
CUSTOM_WORKERNAME .........: Luise01
plusCPU ...................: NO

ZM_or_EWBF ................: EWBF
EWBF_VERSION ..............: 3_4
EWBF_PERCENT ..............: 0

COIN ......................: FTC
FTC_WORKER ................: Luise01
FTC_ADDRESS ...............: LuisenMi...
FTC_POOL ..................: hub.miningpoolhub.com
FTC_PORT ..................: 20510
FTC_INTENSITY .............: 23


PS: When I run 'bash 3main' it seems to work, but only for ETH; with FTC, ccminer starts but doesn't mine. When I run 'bash 3main' I also have to press Ctrl+C twice at the following error in order to continue:

Code:
INFO:guake.guake_app:Logging configuration complete
/usr/lib/python2.7/dist-packages/guake/guake_app.py:1785: GtkWarning: gtk_box_pack: assertion 'child->parent == NULL' failed
  self.mainframe.pack_start(self.notebook, expand=True, fill=True, padding=0)

** (guake:3394): WARNING **: Binding 'F12' failed!
full member
Activity: 340
Merit: 103
It is easier to break an atom than partialities AE
June 04, 2018, 06:55:09 PM
Miner update for xmr-stak:
https://github.com/fireice-uk/xmr-stak/releases

Changelog:

    fix job update race conditions #1537 #1553 #1554 #1568 #1592
    fix OpenBSD compile #1468
    add currency
        turtlecoin #1469
        IPBC coin #1486
        masari coin #1515
    support for stellite v4 fork #1512 #1555
    extend benchmark option #1473
    fix the CLI option --noAMDCache #1486
    fix possible NVIDIA Volta bug #1569
    add support for CUDA 9.2 #1580
    fix broken AMD APP SDK download #1609
full member
Activity: 273
Merit: 100
June 03, 2018, 07:30:34 PM
I love you man!! hahahah just kidding!!

2 minutes to reboot and miner to start


Now I want to see whether ccminer can restart without issues when it stops.

Even though HIVEOS is really easy to use, it's not what I am looking for here.

NVOC is the best for me. Six months ago it was kind of hard to set up the way I wanted, but
now that I can run the new miners it is the best: all the control is in your hands.

Thanks again!!
full member
Activity: 273
Merit: 100
June 03, 2018, 06:59:07 PM
I love you man!! hahahah just kidding!!

2 minutes to reboot and miner to start


Now I want to see whether ccminer can restart without issues when it stops.
full member
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
June 03, 2018, 01:43:48 PM
So it's been a couple of months now that I have this issue.

I tried HIVEOS and ETHOS; neither takes as long to load as NVOC.

Something must be wrong: the load goes too high, and then it takes around 10 minutes for the rig to start mining.
After that everything works fine, except that the load stays around 10
using ccminer.

I tried using z-enemy to mine x16r: no problems, and the load was low.

On HIVEOS using klaust the load was around 8, but the system worked fine and a reboot took less than 1 minute.

Another problem I have here is that when ccminer restarts, the temp takes too long to start the 12 GPUs and the watchdog kills it before it finishes. How can I make this timeout longer?
The watchdog kills it at around GPU 9, so I would need 30 more seconds to be safe.

Is anybody else having issues running 12 GPUs?

Did you update the rig recently?
I started updating rigs and it updated nvidia to the latest 390. No problem on my 1070 rigs, but my 1060 rigs showed the same symptom.
If you check with "top" you will see nvidia-settings at a full 100% CPU usage.
The only way I found to solve it was to remove nvidia completely, then download and install nvidia-387 manually.

These are the steps I took to solve my issue:

Code:
cd Downloads/
wget -c http://us.download.nvidia.com/XFree86/Linux-x86_64/387.34/NVIDIA-Linux-x86_64-387.34.run
sudo service lightdm stop
sudo apt purge 'nvidia*'
chmod +x NVIDIA-Linux-x86_64-387.34.run
sudo ./NVIDIA-Linux-x86_64-387.34.run
sudo apt-mark hold nvidia-387
sudo apt-mark hold nvidia-390
sudo apt install nvidia-settings
sudo wget https://raw.githubusercontent.com/papampi/nvOC_by_fullzero_Community_Release/19-2.1/xorg.conf -O /etc/X11/xorg.conf.default
sudo cp /etc/X11/xorg.conf.default /etc/X11/xorg.conf
sudo reboot


Or if you want NVIDIA-384, use this link:
Code:
http://us.download.nvidia.com/XFree86/Linux-x86_64/384.111/NVIDIA-Linux-x86_64-384.111.run
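To check for the nvidia-settings symptom described above without watching "top", something like the following could be used. It is a rough sketch: `cpu_usage_of` is a made-up helper name, and note that the %CPU reported by `ps` is averaged over the life of the process, so it will not match the instantaneous figure in "top" exactly.

```shell
#!/bin/bash
# Sketch: sum the %CPU of all processes with a given command name.
# Reads "pcpu comm" pairs from stdin, as printed by
#   ps -eo pcpu,comm --no-headers
# The helper name is hypothetical.
cpu_usage_of() {
    local name="$1"
    awk -v n="$name" '$2 == n { sum += $1 } END { printf "%.1f\n", sum + 0 }'
}

# Usage on a rig:
#   LC_ALL=C ps -eo pcpu,comm --no-headers | cpu_usage_of nvidia-settings
```

A value that stays near 100 on an affected rig would match the symptom; a value near 0 suggests the high load has another cause.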
full member
Activity: 273
Merit: 100
June 03, 2018, 12:35:33 PM
So it's been a couple of months now that I have this issue.

I tried HIVEOS and ETHOS; neither takes as long to load as NVOC.

Something must be wrong: the load goes too high, and then it takes around 10 minutes for the rig to start mining.
After that everything works fine, except that the load stays around 10
using ccminer.

I tried using z-enemy to mine x16r: no problems, and the load was low.

On HIVEOS using klaust the load was around 8, but the system worked fine and a reboot took less than 1 minute.

Another problem I have here is that when ccminer restarts, the temp takes too long to start the 12 GPUs and the watchdog kills it before it finishes. How can I make this timeout longer?
The watchdog kills it at around GPU 9, so I would need 30 more seconds to be safe.

Is anybody else having issues running 12 GPUs?
full member
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
June 03, 2018, 02:01:49 AM
Hey :-)
What do you think is the ideal RAM size for a motherboard running nvOC?

Here is a look at how much it uses on one of my rigs with 7 1070's:
Code:
m1@Miner1:~$ free
              total        used        free      shared  buff/cache   available
Mem:        8107204     1632364     5565128       42012      909712     6055112

As you can see, I have 8 GB but it uses less than 2 GB in REMOTE (no local display). I tend to buy HW that I can re-purpose in the event that mining goes south, so I recommend running a single 8 GB stick per mining host. I typically purchase them as a 16 GB "kit" with two 8 GB DIMMs. Clearly, you could go as low as 4 GB with no issues, but I have no use for DIMMs that small outside of mining.

I think it depends on card counts and types.
As stubo said, I think 4 GB is enough too.

1x 1070 + 2x 1060 test rig:
Code:
             total        used        free      shared  buff/cache   available
Mem:        8135504     1061156     5751296       31272     1323052     6703400
Swap:       8388604           0     8388604


8 x 1070  :
Code:
             total        used        free      shared  buff/cache   available
Mem:        8130664     2414736     4060072       62760     1655856     5228228
Swap:       8388604           0     8388604

12 x 1060 :
Code:
             total        used        free      shared  buff/cache   available
Mem:        8130656     2281908     4463608       86440     1385140     5317732
Swap:       8388604           0     8388604


6 X 1060 :
Code:
             total        used        free      shared  buff/cache   available
Mem:        8135612     1571280     5353460       56224     1210872     6125424
Swap:       8388604           0     8388604
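To compare rigs more easily, the "used" column can be turned into a percentage of total RAM. A small sketch, assuming the `free` layout shown above (total in column 2, used in column 3); the function name `mem_used_percent` is made up:

```shell
#!/bin/bash
# Sketch: print used memory as a percentage of total, from `free` output
# read on stdin. Assumes the column layout shown in the posts above.
mem_used_percent() {
    awk '$1 == "Mem:" { printf "%.1f\n", $3 * 100 / $2 }'
}

# Usage on a rig:
#   free | mem_used_percent
```

With the numbers posted above, the 7 x 1070 rig comes out at about 20% of 8 GB and the 12 x 1060 rig at about 28%, which fits the view that 4 GB would be enough.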
full member
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
June 03, 2018, 01:56:16 AM
The full sequence didn't work, so I'm back to:

#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger
echo b > /proc/sysrq-trigger




Thanks for the confirmation.
I wanted to know whether it fails only on my test rig or elsewhere too ...

Added your suggestion for the next release (your name included):
sysrq_reboot pull request
newbie
Activity: 4
Merit: 0
June 02, 2018, 09:26:06 PM
The full sequence didn't work, so I'm back to:

#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger
echo b > /proc/sysrq-trigger
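For anyone who wants to try different sequences without editing the script each time, here is a sketch of the same idea as a function. Everything not in the posts is an assumption: the function name `send_sysrq_sequence`, the trigger-file argument, and the pause between keys. One plausible reason the full sequence failed (not confirmed in this thread) is that E and I signal every process except init, which would include the script itself, so it may never reach B.

```shell
#!/bin/bash
# Sketch: write each letter of a SysRq sequence to the trigger file.
#   $1  sequence, e.g. "sub" or "reisub"
#   $2  trigger file (default /proc/sysrq-trigger)
#   $3  pause in seconds between keys (assumption; the posted scripts have none)
# Function name and arguments are hypothetical.
send_sysrq_sequence() {
    local sequence="$1"
    local trigger="${2:-/proc/sysrq-trigger}"
    local pause="${3:-2}"
    local i key
    for (( i = 0; i < ${#sequence}; i++ )); do
        key="${sequence:i:1}"
        echo "$key" > "$trigger" || return 1
        sleep "$pause"
    done
}

# Usage as root on a rig (this reboots the machine immediately):
#   echo 1 > /proc/sys/kernel/sysrq
#   send_sysrq_sequence sub
```

Passing a plain file as the second argument lets you dry-run the function on a desktop without rebooting anything.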


newbie
Activity: 4
Merit: 0
June 02, 2018, 07:02:22 PM
Hello guys.

First of all, big thanks to all the team for the great job on this project.

I've been using your distro for 3 months now and it rocks, guys. Lately I've had an issue with one of my rigs: a faulty riser causes the rig to freeze without being able to complete its reboot routine.
As Doftorul pointed out, more often than not a GPU dropping off the bus triggers a kernel panic.
So I'm sharing my workaround for those who have the same issue and only have remote access to the rig:
Create a magic reboot script (magicreboot.sh) that contains:
#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger
echo s > /proc/sysrq-trigger
echo b > /proc/sysrq-trigger


Edit 5watchdog:
Change every occurrence of sudo reboot to sudo magicreboot.sh
Do the same in 6tempcontrol

For more reading about SysRq: https://en.wikipedia.org/wiki/Magic_SysRq_key
Hope this helps.
Big thanks to Doftorul, sizzlephizzle and all the nvOC team ;)



Nice idea.
Have you tested it?
I once tried to implement R.E.I.S.U.B with a different approach, but was not successful.
Does it need to sync twice?
(u) remounts the filesystems read-only, so I don't think it can sync data to disk after u.


Isn't it better to do the full REISUB sequence?


Edit:
If you have a faulty GPU that causes system freezes, can you please test this full REISUB sequence:

Code:
#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq

# (un*R*aw) Takes back control of keyboard from X
echo r > /proc/sysrq-trigger

# (t*E*rminate) Send SIGTERM to all processes.
echo e > /proc/sysrq-trigger

# (k*I*ll) Send SIGKILL to all processes.
echo i > /proc/sysrq-trigger

# (*S*ync) Sync all cached disk operations to disk
echo s > /proc/sysrq-trigger

# (*U*nmount) Remounts all mounted filesystems read-only
echo u > /proc/sysrq-trigger

# (re*B*oot) Reboots the system
echo b > /proc/sysrq-trigger

Yeah, I've tested it for 2 days now, and the rig reboots as soon as a GPU is lost, which prevents it from freezing.
At first I was issuing only:
echo 1 > /proc/sys/kernel/sysrq
and
echo b > /proc/sysrq-trigger
and it does work.
Then Doftorul suggested that I go with the SUSB sequence.



I think the 2nd (S)ync won't do anything, as the filesystems are already remounted read-only by the previous (U)mount step.
Can you please try the full REISUB and see how it works?
Hey, very interesting subject. After several tries, I approached the kernel panic problem with a hardware solution.
I have a Raspberry Pi with one of its GPIOs wired to the reset pins of the motherboard. The RPi checks the status of the rig's SSH port every 30 seconds (I find that more reliable than just pinging the mobo).
If the mobo is unresponsive for more than 10 minutes, the RPi resets the rig.
I will publish the RPi scripts and schematics as soon as I have 10 minutes :-D

Yeah, I'm trying a hardware solution too, using a Raspberry Pi, but for now I'm happy with the sysrq workaround, as it prevents any freeze caused by a faulty riser.
I'm testing the full REISUB sequence as papampi suggested; I'll give you feedback tomorrow.
member
Activity: 224
Merit: 13
June 02, 2018, 01:16:01 PM
Hey :-)
What do you think is the ideal RAM size for a motherboard running nvOC?

Here is a look at how much it uses on one of my rigs with 7 1070's:
Code:
m1@Miner1:~$ free
              total        used        free      shared  buff/cache   available
Mem:        8107204     1632364     5565128       42012      909712     6055112

As you can see, I have 8 GB but it uses less than 2 GB in REMOTE (no local display). I tend to buy HW that I can re-purpose in the event that mining goes south, so I recommend running a single 8 GB stick per mining host. I typically purchase them as a 16 GB "kit" with two 8 GB DIMMs. Clearly, you could go as low as 4 GB with no issues, but I have no use for DIMMs that small outside of mining.
member
Activity: 126
Merit: 10
June 02, 2018, 09:49:18 AM
Hey :-)
What do you think is the ideal RAM size for a motherboard running nvOC?
member
Activity: 126
Merit: 10
June 02, 2018, 09:47:26 AM
Hello guys.

First of all, big thanks to all the team for the great job on this project.

I've been using your distro for 3 months now and it rocks, guys. Lately I've had an issue with one of my rigs: a faulty riser causes the rig to freeze without being able to complete its reboot routine.
As Doftorul pointed out, more often than not a GPU dropping off the bus triggers a kernel panic.
So I'm sharing my workaround for those who have the same issue and only have remote access to the rig:
Create a magic reboot script (magicreboot.sh) that contains:
#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger
echo s > /proc/sysrq-trigger
echo b > /proc/sysrq-trigger


Edit 5watchdog:
Change every occurrence of sudo reboot to sudo magicreboot.sh
Do the same in 6tempcontrol

For more reading about SysRq: https://en.wikipedia.org/wiki/Magic_SysRq_key
Hope this helps.
Big thanks to Doftorul, sizzlephizzle and all the nvOC team ;)



Nice idea.
Have you tested it?
I once tried to implement R.E.I.S.U.B with a different approach, but was not successful.
Does it need to sync twice?
(u) remounts the filesystems read-only, so I don't think it can sync data to disk after u.


Isn't it better to do the full REISUB sequence?


Edit:
If you have a faulty GPU that causes system freezes, can you please test this full REISUB sequence:

Code:
#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq

# (un*R*aw) Takes back control of keyboard from X
echo r > /proc/sysrq-trigger

# (t*E*rminate) Send SIGTERM to all processes.
echo e > /proc/sysrq-trigger

# (k*I*ll) Send SIGKILL to all processes.
echo i > /proc/sysrq-trigger

# (*S*ync) Sync all cached disk operations to disk
echo s > /proc/sysrq-trigger

# (*U*nmount) Remounts all mounted filesystems read-only
echo u > /proc/sysrq-trigger

# (re*B*oot) Reboots the system
echo b > /proc/sysrq-trigger

Yeah, I've tested it for 2 days now, and the rig reboots as soon as a GPU is lost, which prevents it from freezing.
At first I was issuing only:
echo 1 > /proc/sys/kernel/sysrq
and
echo b > /proc/sysrq-trigger
and it does work.
Then Doftorul suggested that I go with the SUSB sequence.



I think the 2nd (S)ync won't do anything, as the filesystems are already remounted read-only by the previous (U)mount step.
Can you please try the full REISUB and see how it works?
Hey, very interesting subject. After several tries, I approached the kernel panic problem with a hardware solution.
I have a Raspberry Pi with one of its GPIOs wired to the reset pins of the motherboard. The RPi checks the status of the rig's SSH port every 30 seconds (I find that more reliable than just pinging the mobo).
If the mobo is unresponsive for more than 10 minutes, the RPi resets the rig.
I will publish the RPi scripts and schematics as soon as I have 10 minutes :-D
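Until the real scripts are published, here is a rough sketch of what the checking logic described above could look like. It is not the author's script. The 30 second interval and 10 minute limit come from the post; every function name is made up, and `reset_rig` is only a placeholder because the GPIO wiring is not described.

```shell
#!/bin/bash
# Sketch of an external watchdog loop (all names hypothetical).
CHECK_INTERVAL=30   # seconds between SSH port checks (from the post)
MAX_DOWNTIME=600    # reset once the rig has been unreachable this long

# Succeeds if a TCP connection to host:port opens within 5 seconds.
port_open() {
    local host="$1" port="$2"
    timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Prints the new downtime given the old one and the last check status
# (0 means the port answered).
next_downtime() {
    local downtime="$1" check_status="$2"
    if [ "$check_status" -eq 0 ]; then
        echo 0
    else
        echo $(( downtime + CHECK_INTERVAL ))
    fi
}

# Succeeds once the accumulated downtime has reached the limit.
needs_reset() {
    [ "$1" -ge "$MAX_DOWNTIME" ]
}

# Placeholder: pulsing the GPIO wired to the reset pins would go here.
reset_rig() {
    echo "reset pulse would be sent here"
}

# Main loop; not started automatically. Example: watch_rig 192.168.1.50
watch_rig() {
    local host="$1" port="${2:-22}" downtime=0 status
    while true; do
        port_open "$host" "$port"; status=$?
        downtime=$(next_downtime "$downtime" "$status")
        if needs_reset "$downtime"; then
            reset_rig
            downtime=0   # give the rig a full window to boot again
        fi
        sleep "$CHECK_INTERVAL"
    done
}
```

Keeping the counting logic in small functions makes it possible to test the timing on a desktop before wiring anything to a motherboard.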
newbie
Activity: 31
Merit: 0
June 02, 2018, 07:51:02 AM
Can I run OhGodAnETHlargementPill on nvOC 19.1.4?
full member
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
June 02, 2018, 06:33:28 AM
Hello guys.

First of all, big thanks to all the team for the great job on this project.

I've been using your distro for 3 months now and it rocks, guys. Lately I've had an issue with one of my rigs: a faulty riser causes the rig to freeze without being able to complete its reboot routine.
As Doftorul pointed out, more often than not a GPU dropping off the bus triggers a kernel panic.
So I'm sharing my workaround for those who have the same issue and only have remote access to the rig:
Create a magic reboot script (magicreboot.sh) that contains:
#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger
echo s > /proc/sysrq-trigger
echo b > /proc/sysrq-trigger


Edit 5watchdog:
Change every occurrence of sudo reboot to sudo magicreboot.sh
Do the same in 6tempcontrol

For more reading about SysRq: https://en.wikipedia.org/wiki/Magic_SysRq_key
Hope this helps.
Big thanks to Doftorul, sizzlephizzle and all the nvOC team ;)



Nice idea.
Have you tested it?
I once tried to implement R.E.I.S.U.B with a different approach, but was not successful.
Does it need to sync twice?
(u) remounts the filesystems read-only, so I don't think it can sync data to disk after u.


Isn't it better to do the full REISUB sequence?


Edit:
If you have a faulty GPU that causes system freezes, can you please test this full REISUB sequence:

Code:
#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq

# (un*R*aw) Takes back control of keyboard from X
echo r > /proc/sysrq-trigger

# (t*E*rminate) Send SIGTERM to all processes.
echo e > /proc/sysrq-trigger

# (k*I*ll) Send SIGKILL to all processes.
echo i > /proc/sysrq-trigger

# (*S*ync) Sync all cached disk operations to disk
echo s > /proc/sysrq-trigger

# (*U*nmount) Remounts all mounted filesystems read-only
echo u > /proc/sysrq-trigger

# (re*B*oot) Reboots the system
echo b > /proc/sysrq-trigger

Yeah, I've tested it for 2 days now, and the rig reboots as soon as a GPU is lost, which prevents it from freezing.
At first I was issuing only:
echo 1 > /proc/sys/kernel/sysrq
and
echo b > /proc/sysrq-trigger
and it does work.
Then Doftorul suggested that I go with the SUSB sequence.



I think the 2nd (S)ync won't do anything, as the filesystems are already remounted read-only by the previous (U)mount step.
Can you please try the full REISUB and see how it works?
jr. member
Activity: 128
Merit: 1
June 02, 2018, 05:41:02 AM
Will there be another ISO version after 2.0? I'm not confident in the process to convert to beta 2.1

The 2.1 tree has been patched (after a lot of work) to be 100% compatible side-by-side with other nvOC versions. It's also very easy to switch back to 2.0 if you find issues or bugs while testing, and just as easy to get the latest patches with fixes in a matter of seconds by running (in most cases) a single command. You can even do it remotely without having to detach your ssd/hdd/usb drive, and it takes only a few minutes. It's not a conversion: nvOC 2.0 remains untouched, ready as a fallback if needed. Future (at least minor) updates will likely be released in the same way, so I would strongly encourage you all to get familiar with that easy workflow.

EDIT: you can always find updated, valid install/testing guide in the GitHub nvOC wiki: https://github.com/papampi/nvOC_by_fullzero_Community_Release/wiki/nvOC-19-2.1-beta-testing
newbie
Activity: 4
Merit: 0
June 02, 2018, 05:16:56 AM
Hello guys.

First of all, big thanks to all the team for the great job on this project.

I've been using your distro for 3 months now and it rocks, guys. Lately I've had an issue with one of my rigs: a faulty riser causes the rig to freeze without being able to complete its reboot routine.
As Doftorul pointed out, more often than not a GPU dropping off the bus triggers a kernel panic.
So I'm sharing my workaround for those who have the same issue and only have remote access to the rig:
Create a magic reboot script (magicreboot.sh) that contains:
#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger
echo s > /proc/sysrq-trigger
echo b > /proc/sysrq-trigger


Edit 5watchdog:
Change every occurrence of sudo reboot to sudo magicreboot.sh
Do the same in 6tempcontrol

For more reading about SysRq: https://en.wikipedia.org/wiki/Magic_SysRq_key
Hope this helps.
Big thanks to Doftorul, sizzlephizzle and all the nvOC team ;)



Nice idea.
Have you tested it?
I once tried to implement R.E.I.S.U.B with a different approach, but was not successful.
Does it need to sync twice?
(u) remounts the filesystems read-only, so I don't think it can sync data to disk after u.


Isn't it better to do the full REISUB sequence?


Edit:
If you have a faulty GPU that causes system freezes, can you please test this full REISUB sequence:

Code:
#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq

# (un*R*aw) Takes back control of keyboard from X
echo r > /proc/sysrq-trigger

# (t*E*rminate) Send SIGTERM to all processes.
echo e > /proc/sysrq-trigger

# (k*I*ll) Send SIGKILL to all processes.
echo i > /proc/sysrq-trigger

# (*S*ync) Sync all cached disk operations to disk
echo s > /proc/sysrq-trigger

# (*U*nmount) Remounts all mounted filesystems read-only
echo u > /proc/sysrq-trigger

# (re*B*oot) Reboots the system
echo b > /proc/sysrq-trigger

Yeah, I've tested it for 2 days now, and the rig reboots as soon as a GPU is lost, which prevents it from freezing.
At first I was issuing only:
echo 1 > /proc/sys/kernel/sysrq
and
echo b > /proc/sysrq-trigger
and it does work.
Then Doftorul suggested that I go with the SUSB sequence.
jr. member
Activity: 128
Merit: 1
June 02, 2018, 05:02:33 AM
Hello guys.

First of all, big thanks to all the team for the great job on this project.

I've been using your distro for 3 months now and it rocks, guys. Lately I've had an issue with one of my rigs: a faulty riser causes the rig to freeze without being able to complete its reboot routine.
As Doftorul pointed out, more often than not a GPU dropping off the bus triggers a kernel panic.
So I'm sharing my workaround for those who have the same issue and only have remote access to the rig:
Create a magic reboot script (magicreboot.sh) that contains:
#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger
echo s > /proc/sysrq-trigger
echo b > /proc/sysrq-trigger


Edit 5watchdog:
Change every occurrence of sudo reboot to sudo magicreboot.sh
Do the same in 6tempcontrol

For more reading about SysRq: https://en.wikipedia.org/wiki/Magic_SysRq_key
Hope this helps.
Big thanks to Doftorul, sizzlephizzle and all the nvOC team ;)



As explained in the Wikipedia page you linked, those magic SysRq keys cannot work once a kernel panic has occurred. Months ago I had a faulty riser as well and experienced the "GPUx fallen off the bus" error, but it never caused a kernel panic; the watchdog detected the miner error state and restarted it (with one GPU less) without rebooting. What is this script intended to do?

Note that I also have the Intel WDT driver in use; maybe that is why I never experienced the reboot hangs you mentioned.
full member
Activity: 686
Merit: 140
Linux FOREVER! Resistance is futile!!!
June 02, 2018, 12:31:10 AM
Hello guys.

First of all, big thanks to all the team for the great job on this project.

I've been using your distro for 3 months now and it rocks, guys. Lately I've had an issue with one of my rigs: a faulty riser causes the rig to freeze without being able to complete its reboot routine.
As Doftorul pointed out, more often than not a GPU dropping off the bus triggers a kernel panic.
So I'm sharing my workaround for those who have the same issue and only have remote access to the rig:
Create a magic reboot script (magicreboot.sh) that contains:
#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger
echo s > /proc/sysrq-trigger
echo b > /proc/sysrq-trigger


Edit 5watchdog:
Change every occurrence of sudo reboot to sudo magicreboot.sh
Do the same in 6tempcontrol

For more reading about SysRq: https://en.wikipedia.org/wiki/Magic_SysRq_key
Hope this helps.
Big thanks to Doftorul, sizzlephizzle and all the nvOC team ;)



Nice idea.
Have you tested it?
I once tried to implement R.E.I.S.U.B with a different approach, but was not successful.
Does it need to sync twice?
(u) remounts the filesystems read-only, so I don't think it can sync data to disk after u.


Isn't it better to do the full REISUB sequence?


Edit:
If you have a faulty GPU that causes system freezes, can you please test this full REISUB sequence:

Code:
#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq

# (un*R*aw) Takes back control of keyboard from X
echo r > /proc/sysrq-trigger

# (t*E*rminate) Send SIGTERM to all processes.
echo e > /proc/sysrq-trigger

# (k*I*ll) Send SIGKILL to all processes.
echo i > /proc/sysrq-trigger

# (*S*ync) Sync all cached disk operations to disk
echo s > /proc/sysrq-trigger

# (*U*nmount) Remounts all mounted filesystems read-only
echo u > /proc/sysrq-trigger

# (re*B*oot) Reboots the system
echo b > /proc/sysrq-trigger
newbie
Activity: 4
Merit: 0
June 01, 2018, 09:45:15 PM
Hello guys.

First of all, big thanks to all the team for the great job on this project.

I've been using your distro for 3 months now and it rocks, guys. Lately I've had an issue with one of my rigs: a faulty riser causes the rig to freeze without being able to complete its reboot routine.
As Doftorul pointed out, more often than not a GPU dropping off the bus triggers a kernel panic.
So I'm sharing my workaround for those who have the same issue and only have remote access to the rig:
Create a magic reboot script (magicreboot.sh) that contains:
#!/bin/bash
echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger
echo s > /proc/sysrq-trigger
echo b > /proc/sysrq-trigger


Edit 5watchdog:
Change every occurrence of sudo reboot to sudo magicreboot.sh
Do the same in 6tempcontrol

For more reading about SysRq: https://en.wikipedia.org/wiki/Magic_SysRq_key
Hope this helps.
Big thanks to Doftorul, sizzlephizzle and all the nvOC team ;)
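The edit to 5watchdog and 6tempcontrol can also be done with sed instead of by hand. A sketch only: the function name `use_magic_reboot` and the install path /usr/local/bin/magicreboot.sh are assumptions (the post does not say where the script lives), and a .bak copy is kept so the change is easy to undo.

```shell
#!/bin/bash
# Sketch: replace every "sudo reboot" in a script with a call to the
# magic reboot script, keeping the original as <file>.bak.
#   $1  file to patch (e.g. 5watchdog or 6tempcontrol)
#   $2  replacement command (default path is an assumption)
# The replacement must not contain "|" or "&", which are special to sed here.
use_magic_reboot() {
    local file="$1"
    local replacement="${2:-sudo /usr/local/bin/magicreboot.sh}"
    cp -p "$file" "${file}.bak" || return 1
    sed "s|sudo reboot|${replacement}|g" "${file}.bak" > "$file"
}

# Usage, from the directory holding the nvOC scripts:
#   use_magic_reboot 5watchdog
#   use_magic_reboot 6tempcontrol
```

Using a full path in the replacement avoids relying on the script being on root's PATH when it is called through sudo.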
