[OpenWrt-Devel] Sysupgrade and Failed to kill all processes

Wed May 13 04:12:49 EDT 2020

> That loop-kill-all thing should be a kind of last resort really, what's
> actually needed is some kind of "init 1" procd equivalent which shuts down
> all services in a more or less clean manner.
>
>
Oddly enough, the /lib/upgrade/stage2 script already does some of this. It
explicitly kills telnet, dropbear, and ash with kill -9 before looping
with SIGTERM, and then again with SIGKILL.

I find it very odd that it explicitly singles out telnet, dropbear, and
ash. My OpenWrt build doesn't have any of these installed in the first
place. For example, I use OpenSSH, and it's jumping straight to kill -9
instead of sending SIGTERM first like it should.

I imagine this is the reason why I've had my SSH sessions hang
indefinitely when sysupgrading a board with dropbear.
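For comparison, a TERM-first sweep might look something like the following minimal C sketch. It is not the stage2 code or anything from procd: the function names are hypothetical, and the /proc root is a parameter only so the sketch can be pointed at a scratch directory for testing (on a device it would be "/proc"). Every listed PID gets SIGTERM, then after a fixed grace period everything still listed gets SIGKILL, always skipping PID 1 and the calling process.

```c
#define _XOPEN_SOURCE 700
#include <ctype.h>
#include <dirent.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Parse a /proc entry name as a PID; returns 0 for anything non-numeric. */
static pid_t entry_pid(const char *name)
{
    char *end;
    long v;

    if (!isdigit((unsigned char)name[0]))
        return 0;
    v = strtol(name, &end, 10);
    return (*end == '\0' && v > 0) ? (pid_t)v : 0;
}

/* Send sig to every PID listed under proc_root except PID 1 and ourselves.
 * Returns how many kill() calls succeeded, or -1 if proc_root can't be read. */
static int signal_all(const char *proc_root, int sig)
{
    DIR *d = opendir(proc_root);
    struct dirent *e;
    pid_t self = getpid();
    int n = 0;

    if (!d)
        return -1;
    while ((e = readdir(d)) != NULL) {
        pid_t pid = entry_pid(e->d_name);

        if (pid <= 1 || pid == self)
            continue;   /* never signal init or the process doing the upgrade */
        if (kill(pid, sig) == 0)
            n++;
    }
    closedir(d);
    return n;
}

/* Hypothetical TERM-first sweep: SIGTERM everything, wait a fixed grace
 * period, then SIGKILL whatever is still listed (signalling a zombie that
 * has already exited is harmless). Returns the number sent SIGTERM. */
int term_then_kill_all(const char *proc_root, unsigned int grace_seconds)
{
    int n = signal_all(proc_root, SIGTERM);

    if (n < 0)
        return -1;
    sleep(grace_seconds);
    signal_all(proc_root, SIGKILL);
    return n;
}
```

On Linux, kill(-1, sig) already skips PID 1 and the calling process, so the sweep could be reduced to two kill(-1, ...) calls separated by a sleep; walking /proc mainly helps if you also want to log or exclude particular PIDs.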

> I'm just not sure offhand how much possible error conditions there are
> besides the actual image writing itself, which you cannot recover from if it
> dies midway.
>

I would expect that if the image writing fails, at least one more attempt
should be made before giving up. Rendering the device soft-bricked is very
much not desirable...

> No it is not. When the logic was implemented there wasn't any cgroup support
> in OpenWrt. Sysupgrade was introduced in 2007 when we still supported Linux
> 2.4 on some targets. Using the freezer cgroup probably makes sense nowadays,
> it will however further bloat the kernel which might hurt various lower end
> targets, flash space wise.
>

Ok, noted.

I suppose I should point out that I'm not personally interested in the
lower end devices, but I understand where you're coming from there.

Perhaps one way to address this reliably:

1) If cgroup support is detected at runtime (or enabled through conditional
compilation, to save even more space in the binary), procd, acting in its
role as PID 1, places each service it creates into its own cgroup. I don't
know how this interacts with procd jails, but perhaps some code from that can
be adapted and reused.
1.a) I would even add that there should be a top-level cgroup containing all
service cgroups as nested cgroups, so that *everything* can be terminated in
one fell swoop.

2) On sysupgrade, just prior to the execvp of /sbin/upgraded, procd gracefully
shuts down all running services.
2.a) If cgroups are available, then after shutting down all services, use
the cgroup freezer to freeze and then kill the processes in any service
cgroup that still has live processes.
2.b) Use the global cgroup to nuke everything from orbit.

3) /sbin/upgraded handles terminating any remaining processes. This isn't
something that can practically be handled in a shell script. Moving this
logic into /sbin/upgraded means that the only safety check needed is that it
does not try to terminate PID 1.

4) Now /lib/upgrade/stage2 doesn't need to worry about terminating
processes, and can focus entirely on handling the ramdisk chroot logic.
_______________________________________________
openwrt-devel mailing list
openwrt-devel at lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel