[OpenWrt-Devel] Sysupgrade and Failed to kill all processes

Wed May 13 13:29:18 EDT 2020

On Wed, May 13, 2020 at 3:58 AM Jo-Philipp Wich <jo at mein.io> wrote:

> Hi,
>
> >
> >     That loop-kill-all thing should be a kind of last resort really, what's
> >     actually needed is some kind of "init 1" procd equivalent which shuts
> >     down all services in a more or less clean manner.
> >
> >
> > Oddly enough, the /lib/upgrade/stage2 script has some aspect of this. It
> > explicitly shuts down (kill -9) telnet, dropbear, and ash before looping
> > with sigTERM, and then again with sigKILL.
> >
> > I find it very odd that it's explicitly singling out telnet, dropbear, and
> > ash. My OpenWRT build doesn't have any of these installed in the first
> > place. E.g. I have OpenSSH, and it's jumping straight to kill -9 instead of
> > sending sigTERM first like it should.
>
> These are (in the case of telnet, were) the default services offering shell
> access in standard images the sysupgrade script was tailored for.
>
> The intention is to kill all user shell sessions to prevent interference with
> the subsequent upgrade process. An openssh case simply hasn't been added since
> it is uncommon, especially on lower end devices.
>
> The subsequent TERM / KILL loops are a poor man's attempt to cleanly shut down
> services. It obviously won't work for things having expensive teardown
> procedures (databases, squid proxy, etc.) - those really should be handled
> manually by the user before invoking sysupgrade. I mean obviously one can
> extend the grace period, but I guess there will always be unhandled cases.
>
>
I merely meant that I find it odd that, instead of sending SIGTERM to the
user-interactive processes first, we jump straight to SIGKILL.

I don't really see what singling out the user-interactive processes gains
us, since they would be sent SIGTERM and then SIGKILL like everything else
anyway.
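
To make that concrete, here is a rough sketch of the behaviour I would expect
for any single process. It is not what stage2 does today; term_then_kill and
the 3-second default grace period are names and numbers I made up for
illustration.

```shell
# Hypothetical helper, not part of /lib/upgrade/stage2.
# $1: PID to stop, $2: grace period in seconds (default 3, an assumption).
# Returns 0 if the process is gone without force, 1 if SIGKILL was needed.
term_then_kill() {
	local pid grace
	pid="$1"
	grace="${2:-3}"

	# Ask politely first; if the signal cannot be sent, the PID is already gone.
	kill -TERM "$pid" 2>/dev/null || return 0

	# Poll once per second until the process exits or the grace period ends.
	while [ "$grace" -gt 0 ]; do
		kill -0 "$pid" 2>/dev/null || return 0
		sleep 1
		grace=$((grace - 1))
	done

	# Final check, then fall back to SIGKILL as the last resort.
	kill -0 "$pid" 2>/dev/null || return 0
	kill -KILL "$pid" 2>/dev/null
	return 1
}
```

Called once per PID of dropbear, sshd, or whatever else offers a shell, this
gives the process a chance to exit on its own and only forces it if the grace
period runs out.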

> Uhm, yeah sure, we could try writing the image again I guess. But eventually
> you have to give up if the storage device simply cannot be written cleanly.
>
>
Of course. At some point we have to accept that it won't succeed, but flaky
storage doesn't necessarily mean that a second attempt will fail, or that an
attempt to write the data in smaller pieces will.

My concern is that giving up after one error will lead to more soft-bricks
than giving up after two.

We could bikeshed on this forever though. I merely meant that one retry
isn't unreasonable. 50 probably is.
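
A bounded retry does not need much code. The following is only a sketch under
my own assumptions: write_with_retry is a hypothetical wrapper, and the
command passed to it stands in for whatever the platform code actually uses to
write the image.

```shell
# Hypothetical wrapper, not existing sysupgrade code.
# $1: maximum number of attempts; remaining arguments: the write command.
# Returns 0 as soon as one attempt succeeds, 1 once all attempts have failed.
write_with_retry() {
	local max_tries try
	max_tries="$1"
	shift
	try=1

	while [ "$try" -le "$max_tries" ]; do
		# Run the write command as given; stop at the first success.
		if "$@"; then
			return 0
		fi
		echo "write attempt $try of $max_tries failed" >&2
		try=$((try + 1))
	done

	return 1
}
```

With a limit of 2 this gives exactly the one retry I am arguing for, and the
limit stays a single number that is easy to argue about later.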

> Stuff like umounting external disks, fsync / swapoff etc. come to mind as well
> which should be doable at this point.
>
>
>
Right, that's also feasible.

In fact I don't see any code at all for unmounting existing filesystem
mounts. Thanks for pointing that out.
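
As a starting point, something like the sketch below could collect the mount
points to unmount. It rests on assumptions of mine: list_external_mounts is a
hypothetical helper, and it treats everything mounted below /mnt/ as
"external", which may not match every setup.

```shell
# Hypothetical helper, not existing sysupgrade code.
# $1: mounts table to read (default /proc/mounts)
# $2: prefix that marks a mount as external (default /mnt/, an assumption)
# Prints matching mount points, last-mounted first, so that nested mounts
# are listed before their parents.
# Mount points containing spaces appear escaped in /proc/mounts (\040) and
# are not decoded here.
list_external_mounts() {
	local table prefix dev mp rest out
	table="${1:-/proc/mounts}"
	prefix="${2:-/mnt/}"
	out=""

	while read -r dev mp rest; do
		case "$mp" in
			"$prefix"*)
				# Prepend, so later (possibly nested) mounts come first.
				out="$mp
$out"
				;;
		esac
	done < "$table"

	printf '%s' "$out"
}
```

The output could then be fed to umount one line at a time, after a sync and
before a swapoff -a, ahead of the TERM / KILL loops.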