[OpenWrt-Devel] Sysupgrade and Failed to kill all processes

Jo-Philipp Wich jo at mein.io
Wed May 13 04:57:50 EDT 2020


Hi,

> 
>     That loop-kill-all thing should be a kind of last resort really, what's
>     actually needed is some kind of "init 1" procd equivalent which shuts down all
>     services in a more or less clean manner.
> 
> 
> Oddly enough, the /lib/upgrade/stage2 script has some aspect of this. It
> explicitly shuts down (kill -9) telnet, dropbear, and ash before looping with
> sigTERM, and then again with sigKILL.
> 
> I find it very odd that it's explicitly singling out telnet, dropbear, and
> ash. My OpenWRT build doesn't have any of these installed in the first place.
> E.g. I have OpenSSH, and it's jumping straight to kill -9 instead of sending
> sigTERM first like it should.

These are (in the case of telnet, were) the default services offering shell
access in standard images the sysupgrade script was tailored for.

The intention is to kill all user shell sessions to prevent interference with
the subsequent upgrade process. An OpenSSH case simply hasn't been added since
it is uncommon, especially on lower-end devices.

The subsequent TERM / KILL loops are a poor man's attempt to cleanly shut down
services. That obviously won't work for services with expensive teardown
procedures (databases, squid proxy, etc.) - those really should be stopped
manually by the user before invoking sysupgrade. One could of course extend
the grace period, but I guess there will always be unhandled cases.
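
To illustrate the TERM-then-KILL pattern with a configurable grace period,
here is a minimal sketch. It is not the actual stage2 code; the function name
term_then_kill and its arguments are made up for illustration, and it only
handles PIDs passed in explicitly.

```shell
# Hypothetical helper, not part of /lib/upgrade/stage2.
# Usage: term_then_kill GRACE_SECONDS PID...
term_then_kill() {
	grace="$1"
	shift

	# First pass: request a clean shutdown.
	for pid in "$@"; do
		kill -TERM "$pid" 2>/dev/null || true
	done

	# Poll once per second until the grace period runs out
	# or none of the listed processes can be signalled any more.
	while [ "$grace" -gt 0 ]; do
		alive=0
		for pid in "$@"; do
			if kill -0 "$pid" 2>/dev/null; then
				alive=1
			fi
		done
		if [ "$alive" -eq 0 ]; then
			return 0
		fi
		sleep 1
		grace=$((grace - 1))
	done

	# Second pass: force anything that did not exit in time.
	for pid in "$@"; do
		kill -KILL "$pid" 2>/dev/null || true
	done
	return 0
}
```

A larger GRACE_SECONDS gives slow services more time to exit, but anything
that needs longer than that still gets a SIGKILL in the end.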

> I imagine this is the reason why I've had my SSH sessions hang
> indefinitely when sysupgrading a board with dropbear.

Hm, maybe. I usually see a "commencing upgrade" message and afterwards my SSH
connection is cleanly terminated.

>     I'm just not sure offhand how much possible error conditions there are besides
>     the actual image writing itself, which you cannot recover from if it dies
>     midway.
> 
> I would expect that if the image writing fails, at least one more attempt
> should be made before giving up. Rendering the device soft-bricked is very
> much not desirable...

Uhm, yeah sure, we could try writing the image again I guess. But eventually
you have to give up if the storage device simply cannot be written cleanly.
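
A bounded retry could look roughly like the sketch below. The helper name
retry is hypothetical and nothing like it exists in sysupgrade as far as this
mail is concerned; the command to run is passed in by the caller.

```shell
# Hypothetical helper: run a command up to MAX_TRIES times.
# Usage: retry MAX_TRIES COMMAND [ARGS...]
retry() {
	tries="$1"
	shift
	attempt=1
	while [ "$attempt" -le "$tries" ]; do
		# Stop at the first successful run.
		if "$@"; then
			return 0
		fi
		echo "attempt $attempt of $tries failed" >&2
		attempt=$((attempt + 1))
	done
	# Every attempt failed: give up and report the error.
	return 1
}
```

The non-zero return after the last attempt is the "give up" case; what to do
then (reboot, stay in the ramdisk, etc.) is a separate question.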

> [...]
> Perhaps a way to address this in a reliable way:
> 
> [...]
These points make sense, yes.

> 4) Now /lib/upgrade/stage2 doesn't need to worry about terminating processes,
> and can focus entirely on handling the ramdisk chroot logic.

Stuff like unmounting external disks, fsync / swapoff etc. comes to mind as
well; that should be doable at this point too.


~ Jo
