Topic: miner process won't exit & rig won't reboot (Read 129 times)

jr. member
Activity: 95
Merit: 2
January 06, 2018, 02:43:51 AM
#7
Try
Code:
$ sudo systemctl reboot --force

Thanks! I'll keep this in mind for next time it happens (hopefully it doesn't!).

Edit: ok, you made me nervous...

Quote
-f, --force
    When used with halt, poweroff, reboot or kexec, execute the selected operation without shutting down all units. However, all processes will be killed forcibly and all file systems are unmounted or remounted read-only. This is hence a drastic but relatively safe option to request an immediate reboot. If --force is specified twice for these operations (with the exception of kexec), they will be executed immediately, without terminating any processes or unmounting any file systems. Warning: specifying --force twice with any of these operations might result in data loss. Note that when --force is specified twice the selected operation is executed by systemctl itself, and the system manager is not contacted. This means the command should succeed even when the system manager has crashed.
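For reference, a rough sketch of the three escalation levels the man page text above describes. The reboot_cmd helper is a made-up name for illustration; it only prints the command so you can read it before running it.

```shell
#!/bin/sh
# Print the systemctl reboot command for a given escalation level.
# reboot_cmd is a hypothetical helper; it never reboots anything by itself.
#   0 = normal reboot (units are shut down cleanly)
#   1 = --force once  (processes killed, file systems unmounted or remounted read-only)
#   2 = --force twice (immediate, nothing terminated or unmounted, risk of data loss)
reboot_cmd() {
    case "$1" in
        0) echo "systemctl reboot" ;;
        1) echo "systemctl reboot --force" ;;
        2) echo "systemctl reboot --force --force" ;;
        *) echo "usage: reboot_cmd 0|1|2" >&2; return 2 ;;
    esac
}
```

Something like `sudo $(reboot_cmd 1)` would run the single --force variant. Level 2 skips the unmount step, so the data loss warning in the quote applies.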
hero member
Activity: 630
Merit: 502
January 06, 2018, 02:41:02 AM
#6
Try
Code:
$ sudo systemctl reboot --force
jr. member
Activity: 95
Merit: 2
January 06, 2018, 02:37:57 AM
#5
When Ctrl-C, kill -9 and a server reboot all fail, it means there is an error inside the NVIDIA kernel driver. The miner process gets stuck and the system can't properly unload the NVIDIA driver during shutdown... The whole system locks up and never completes the reboot, which is why ssh never comes back.

So... for the miner freeze issue, I suggest:
First of all, run the "dmesg" command to get more info about your error (is it always the same GPU? The same PCIe port? The same kind of error? etc.), then try to:
  • change your PCIe risers (bad-quality ones can really become a pain in the ass)
  • update your OS kernel
  • update your NVIDIA / CUDA drivers
  • disable your overclocking settings and/or underclock your GPUs
In my experience, 99% of rig problems are caused by bad PCIe risers, so always check this hypothesis first.

Another tip: when your system is stuck like this, you can reboot without using the physical power button with "reboot -f".

Hope this helps you ;)

Thanks for the comprehensive response!

I did reboot the computer from the command line already. That's what I was saying - it fails to reboot, I need to physically power it down and start it up again.

Made some changes to the physical layout of the system, hopefully that helps with riser connections.
newbie
Activity: 13
Merit: 0
January 05, 2018, 07:37:49 PM
#4
On Ubuntu Linux 16.04, 6x GTX 1070s, doesn't matter what miner I'm using. Related to overclocking probably.

  • can't Ctrl-C to quit process
  • can't kill -9 the process
  • can't successfully reboot the server (ssh disconnects, but I can't reconnect afterwards)

The only thing that will let me back in is physically powering down the system by holding the power button down for a few seconds (I have no reset button) and starting the system back up. Then I can ssh back into it.

Can anyone help me with what's going on here? How to address it? It happens randomly. Is there anything I can do to prevent this? I'm sure lowering my overclock setting would help, but normally I'd just restart the miner to continue mining.

When Ctrl-C, kill -9 and a server reboot all fail, it means there is an error inside the NVIDIA kernel driver. The miner process gets stuck and the system can't properly unload the NVIDIA driver during shutdown... The whole system locks up and never completes the reboot, which is why ssh never comes back.
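A process that ignores kill -9 is usually sitting in uninterruptible sleep (state D in ps), waiting inside a kernel driver. A minimal sketch for spotting that, assuming the procps version of ps that Ubuntu ships; d_state is a hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# Read `ps` output on stdin and keep the header line plus every process
# whose STAT column (second field) starts with D (uninterruptible sleep).
d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage on the rig:
#   ps -eo pid,stat,wchan:32,comm | d_state
# The WCHAN column shows the kernel function the process is waiting in,
# which may hint at whether the GPU driver is involved.
```

If the miner shows up here in state D, signals cannot reach it until the kernel call it is waiting on returns, which would match the symptoms described.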

So... for the miner freeze issue, I suggest:
First of all, run the "dmesg" command to get more info about your error (is it always the same GPU? The same PCIe port? The same kind of error? etc.), then try to:
  • change your PCIe risers (bad-quality ones can really become a pain in the ass)
  • update your OS kernel
  • update your NVIDIA / CUDA drivers
  • disable your overclocking settings and/or underclock your GPUs
In my experience, 99% of rig problems are caused by bad PCIe risers, so always check this hypothesis first.
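To check whether it is always the same GPU, something like the sketch below could help. It relies on the NVIDIA driver logging its faults with the prefix "NVRM: Xid" followed by the PCI address in parentheses; xid_lines and xid_per_gpu are names made up for this example.

```shell
#!/bin/sh
# Keep only NVIDIA driver fault reports from kernel log text on stdin.
xid_lines() {
    grep 'NVRM: Xid'
}

# Count fault reports per PCI address, most affected device first.
# The address is whatever sits inside the parentheses after "Xid".
xid_per_gpu() {
    xid_lines |
        sed -n 's/.*NVRM: Xid (\([^)]*\)).*/\1/p' |
        sort | uniq -c | sort -rn
}

# Usage on the rig (may need sudo depending on kernel settings):
#   dmesg | xid_per_gpu
```

If one address dominates the count, swapping that card's riser or moving it to another slot is a cheap way to test the riser hypothesis.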

Another tip: when your system is stuck like this, you can reboot without using the physical power button with "reboot -f".
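If the reboot command itself hangs, the kernel's magic SysRq interface is another thing to try, since it asks the kernel directly instead of going through the normal shutdown path. A sketch only, assuming root access and a kernel that exposes /proc/sysrq-trigger; sysrq_reboot, SYSRQ_TRIGGER and SYSRQ_DELAY are names invented here, and there is no guarantee it works once the driver has wedged the machine badly enough.

```shell
#!/bin/sh
# Emergency reboot through /proc/sysrq-trigger (must be run as root).
# SYSRQ_TRIGGER and SYSRQ_DELAY can be overridden, e.g. to dry-run the
# function against an ordinary file instead of the real trigger.
sysrq_reboot() {
    trigger=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}
    delay=${SYSRQ_DELAY:-2}
    echo s > "$trigger" || return 1   # s: sync dirty data to disk
    sleep "$delay"
    echo u > "$trigger" || return 1   # u: remount file systems read-only
    sleep "$delay"
    echo b > "$trigger"               # b: reboot immediately, no clean shutdown
}
```

The sync and remount steps are there to reduce the chance of file system damage, because the final "b" does not shut anything down cleanly.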

Hope this helps you ;)
sr. member
Activity: 861
Merit: 281
January 05, 2018, 01:05:06 PM
#3
Well, tbh I don't know where the problem lies, but since you are using Linux as your mining OS, I'd suggest you give nvOC a try and see if you can reproduce the same issue on that OS.
jr. member
Activity: 95
Merit: 2
January 05, 2018, 11:08:47 AM
#2
bump
jr. member
Activity: 95
Merit: 2
January 05, 2018, 01:00:38 AM
#1
On Ubuntu Linux 16.04, 6x GTX 1070s, doesn't matter what miner I'm using. Related to overclocking probably.

  • can't Ctrl-C to quit process
  • can't kill -9 the process
  • can't successfully reboot the server (ssh disconnects, but I can't reconnect afterwards)

The only thing that will let me back in is physically powering down the system by holding the power button down for a few seconds (I have no reset button) and starting the system back up. Then I can ssh back into it.

Can anyone help me with what's going on here? How to address it? It happens randomly. Is there anything I can do to prevent this? I'm sure lowering my overclock setting would help, but normally I'd just restart the miner to continue mining.