[OpenWrt-Devel] Sysupgrade and Failed to kill all processes

Wed May 13 03:47:27 EDT 2020

Hi Michael,

> [...]
> 
> Now that the very rough summary is out of the way, I have 4 questions.
> 
> 1) I notice that the shell script /lib/upgrade/stage2 is doing a tight loop
> with kill -9 to terminate processes. However, it's only looping a maximum of
> 10 times, and it's going as fast as the shell can loop.
> 
> What's to stop this loop from quickly going through every process almost
> immediately 10 times, before a process that would be about to terminate
> terminates? The process in question may be handling some kind of IO, so the
> kernel wouldn't immediately terminate it.
> 
> Shouldn't there be some very brief sleep at the end of each loop iteration to
> ensure that the processes that are going to practically terminate have done so?

Yes, this likely makes sense. That killing logic was only ported forward from
each sysupgrade iteration to the next, without ever being revised.
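
To illustrate the pause suggested in question 1, here is a minimal sketch of
a bounded kill loop that sleeps between rounds. It is not the actual
/lib/upgrade/stage2 code; the function names kill_with_pause and is_alive,
the one-second pause and the zombie check via /proc are all assumptions made
for this example.

```shell
# Hypothetical helper: a PID counts as alive if it can still be signalled
# and is not merely a zombie waiting to be reaped (Linux /proc assumed).
is_alive() {
    kill -0 "$1" 2>/dev/null || return 1
    if [ -r "/proc/$1/status" ] &&
       grep -q '^State:[[:space:]]*Z' "/proc/$1/status"; then
        return 1
    fi
    return 0
}

# Hypothetical helper: send SIGKILL to the given PIDs for at most 10 rounds,
# pausing after each round so that processes already on their way out
# (for example, ones blocked in I/O) get time to disappear.
kill_with_pause() {
    local rounds=10 pid remaining
    while [ "$rounds" -gt 0 ]; do
        remaining=""
        for pid in "$@"; do
            if is_alive "$pid"; then
                kill -9 "$pid" 2>/dev/null || true
                remaining="$remaining $pid"
            fi
        done
        # Nothing was left to kill in this round: success.
        [ -z "$remaining" ] && return 0
        sleep 1
        rounds=$((rounds - 1))
    done
    # Out of rounds: report failure if anything survived.
    for pid in "$@"; do
        if is_alive "$pid"; then
            return 1
        fi
    done
    return 0
}
```

A caller would pass the PIDs it wants gone, e.g. kill_with_pause 1234 5678,
and treat a non-zero return as "some process refused to die".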

That loop-kill-all thing should really be a kind of last resort. What's
actually needed is some kind of "init 1" procd equivalent which shuts down all
services in a more or less clean manner.

> 2) Why is the behavior on failure to terminate processes to just give up? That
> leaves devices hanging without any network connectivity. 
> A reboot with some logging on disk would allow for remote sysupgrades to have
> some kind of recoverability.

I do not know of a particular reason. IIRC, I added some sysrq triggers in
the past because it sometimes failed to reboot, but it likely makes sense to
trap errors as well and handle them somehow.

I'm just not sure offhand how many possible error conditions there are besides
the actual image writing itself, which you cannot recover from if it dies midway.

> 3) Is looping over sigkill a reliable way to terminate all processes?
> I was under the impression that the only reliable way to ensure all processes
> terminate is to use cgroups, and put the processes to terminate in the freezer
> group and then kill them off after they've been frozen. Otherwise you have
> basically a race condition between the termination of processes and the
> creation of children. E.g. a fork-bomb could prevent all processes from being
> terminated.

No, it is not. When the logic was implemented there wasn't any cgroup support
in OpenWrt. Sysupgrade was introduced in 2007 when we still supported Linux
2.4 on some targets. Using the freezer cgroup probably makes sense nowadays.
It will, however, further bloat the kernel, which might hurt various lower-end
targets in terms of flash space.
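
For reference, the freeze-then-kill idea from question 3 could look roughly
like the sketch below. It assumes a cgroup v1 freezer hierarchy that is
already mounted and already contains the processes to terminate; the
function name freeze_and_kill, the directory argument and the 10-second
timeout are made up for this example and nothing like it exists in
sysupgrade today.

```shell
# Hypothetical helper: freeze every task in a cgroup v1 freezer group,
# queue SIGKILL for each process, then thaw the group so the signals
# take effect. $1 is the cgroup directory, e.g. a freezer group path.
freeze_and_kill() {
    local cg="$1" tries=10 pid
    # Ask the kernel to freeze the group. While frozen, the tasks cannot
    # fork, which closes the race between killing and child creation.
    echo FROZEN > "$cg/freezer.state" || return 1
    # The state may read FREEZING for a while; wait until it is FROZEN.
    while [ "$(cat "$cg/freezer.state")" != "FROZEN" ]; do
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] || return 1
        sleep 1
    done
    # Queue SIGKILL for every process in the group.
    while read -r pid; do
        kill -9 "$pid" 2>/dev/null || true
    done < "$cg/cgroup.procs"
    # With the v1 freezer the signal is typically only acted on after
    # the group is thawed, so thaw it again.
    echo THAWED > "$cg/freezer.state"
}
```

The point of the design is that no new children can appear between reading
cgroup.procs and delivering the signals, which a plain kill loop cannot
guarantee.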

> 4) Why doesn't procd, prior to execvp the /sbin/upgraded program, shut down all
> the services that are running?

No technical reason; nobody has bothered to implement something like that yet.
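
As a very rough sketch of what such a clean shutdown step might do, the
function below walks a directory of stop-ordered scripts and calls each one
with "stop". It assumes the usual OpenWrt layout where /etc/rc.d/K* entries
point at init scripts that accept a stop argument; the function name
stop_all_services is hypothetical and procd has no such hook today.

```shell
# Hypothetical helper: run every executable K* script in the given
# directory (default /etc/rc.d) with the "stop" argument, in glob order,
# and report failure if any of them returned non-zero.
stop_all_services() {
    local dir="${1:-/etc/rc.d}" script failed=0
    for script in "$dir"/K*; do
        # Skip the unexpanded glob and anything that is not executable.
        [ -x "$script" ] || continue
        "$script" stop || failed=$((failed + 1))
    done
    [ "$failed" -eq 0 ]
}
```

Running something like this before the kill loop would leave far fewer
processes for kill -9 to deal with, so the loop could stay a last resort.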

~ Jo
