[OpenWrt-Devel] Notes on (dangerous ?) sysupgrade
jo at mein.io
Sun Jan 13 07:31:10 EST 2019
> After having several unpleasant encounters using sysupgrade, I had a
> quick glance at the code, after more or less successfully implementing
> workarounds for incomplete sysupgrades, resulting in inconsistent systems.
> My questions are:
> - Is it safe, simply to kill running processes during sysupgrade ? As
> there might be services, restarted automatically (by procd ?).
Roughly, the sysupgrade process is as follows:
1) /sbin/sysupgrade (shell script)
Parses the arguments, sets defaults, assembles the list of conffiles to back
up, runs the partial scripts in /lib/upgrade, checks the image and finally
invokes `ubus call system sysupgrade`. All fatal exit conditions (such as an
invalid image) should be handled here.
2) ubus call system sysupgrade (procd ubus procedure)
Invokes a procedure in procd that instructs procd to terminate itself
and exec into /sbin/upgraded (which has first been copied to a ramdisk at
/tmp/root), turning /tmp/root/sbin/upgraded into pid 1 so that pid 1 no
longer holds on to /.
3) /tmp/root/sbin/upgraded (binary)
Functions as a pid 1 placeholder to prevent the kernel from panicking. It
does two things: it keeps serving the watchdog to prevent spontaneous
resets, and it executes /lib/upgrade/stage2.
4) /lib/upgrade/stage2 (shell script)
Assembles the backup tarball, writes the image and appends the backup
tarball to the just-written image. The exact procedure depends on the platform.
So yes, it is safe to simply kill processes, in the sense that at this
point there is no procd running anymore that could relaunch them.
Merely killing processes instead of shutting them down through their
respective init scripts is not ideal though; that eventually needs rework.
Ideally sysupgrade should try to cleanly stop as many services as possible
through their respective init scripts before invoking stage2, and then
only apply the 'kill TERM; sleep 3; kill KILL' sequence to processes that
somehow failed to stop initially (buggy init scripts, timeouts, ...).
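A minimal sketch of that fallback sequence in POSIX shell, for illustration only. The function names term_then_kill and all_other_pids are hypothetical and not taken from the actual stage2 code. The candidate PID list is re-read before the KILL pass, so a process that was respawned in between is still caught.

```shell
#!/bin/sh
# Hypothetical sketch of the 'kill TERM; sleep 3; kill KILL' fallback,
# not the actual stage2 implementation.

# all_other_pids: print every PID found in /proc except PID 1 and this shell
# ($$ stays the PID of this shell even inside a $(...) subshell).
# Kernel threads are listed too but generally ignore these signals.
# Only meant for the upgrade context, where nothing else has to survive.
all_other_pids() {
	local dir pid
	for dir in /proc/[0-9]*; do
		pid=${dir#/proc/}
		[ "$pid" = 1 ] && continue
		[ "$pid" = "$$" ] && continue
		echo "$pid"
	done
}

# term_then_kill LISTER [GRACE]
# LISTER is a command printing candidate PIDs, one per line. It is called
# once per pass, so a process restarted between the TERM and the KILL pass
# (e.g. pppd respawned by a still running netifd) is hit by the KILL.
term_then_kill() {
	local lister="$1" grace="${2:-3}" pid
	for pid in $("$lister"); do
		kill -TERM "$pid" 2>/dev/null || :   # process may already be gone
	done
	sleep "$grace"
	for pid in $("$lister"); do
		kill -KILL "$pid" 2>/dev/null || :
	done
}
```

In the upgrade context the call would then look like `term_then_kill all_other_pids 3`.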
> - What about a killed process, simply taking some time to shut down ?
> (example: squid closing lot of open files on block-device; having
> internal shutdown timer 30s by default)
Such services are not gracefully handled atm, see above.
> - What about open swap file on block-device ?
From a cursory look, it does not appear that sysupgrade currently
performs any swapoff at all; adding a `swapoff -a` after the process
termination would certainly make sense.
> - What about mounted block-device for mass storage ?
Same as with swap: as far as I can see, there is no umount handling
either. I think this should be added along with the swapoff. Since
sysupgrade runs off a pivot_root'ed /tmp/root at this point, all
filesystems should be free to be unmounted. (Might still need two or three
cycles due to [...])
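A rough sketch of such a swapoff/umount step, assuming BusyBox `swapoff -a` and a /proc/mounts-style input. The helper names and the set of mount points that are kept (/proc, /sys, /dev, /tmp) are assumptions for illustration, not existing sysupgrade code. Mount points are unmounted deepest first, and the whole pass is repeated a few times since a single pass may not free everything.

```shell
#!/bin/sh
# Hypothetical sketch, not existing sysupgrade code: release swap and
# mounted storage before stage2 writes the image.

# Mount points the running upgrade still needs (assumption for this sketch).
DEFAULT_KEEP='^/(proc|sys|dev|tmp)(/|$)'

# mountpoints_deepest_first [MOUNTS_FILE] [KEEP_REGEX]
# Print the mount points from a /proc/mounts-style file, longest path first,
# so nested mounts come before their parents. "/" and anything matching
# KEEP_REGEX are left out.
mountpoints_deepest_first() {
	local mounts keep
	mounts=${1:-/proc/mounts}
	keep=${2:-$DEFAULT_KEEP}
	awk '$2 != "/" { print length($2), $2 }' "$mounts" \
		| sort -rn | cut -d' ' -f2- | grep -Ev "$keep"
}

# unmount_all [PASSES] [MOUNTS_FILE] [UMOUNT_CMD]
# Run several unmount passes; a mount that is still busy in one pass may
# be freed in the next one. Failures are ignored here and simply retried.
unmount_all() {
	local passes mounts umount_cmd i mp
	passes=${1:-3}
	mounts=${2:-/proc/mounts}
	umount_cmd=${3:-umount}
	i=0
	while [ "$i" -lt "$passes" ]; do
		for mp in $(mountpoints_deepest_first "$mounts"); do
			$umount_cmd "$mp" 2>/dev/null || :
		done
		i=$((i + 1))
	done
}

# release_storage: disable all swap first, since an active swap file keeps
# the filesystem it lives on busy, then unmount.
release_storage() {
	swapoff -a 2>/dev/null || :
	unmount_all 3
}
```

After the process termination, stage2 would then only need to call `release_storage` before writing the image.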
> - What about (slow) wwan connection, managed by pppd. When killed by
> sysupgrade, will netifd restart pppd ?
It should not. In theory, pppd could be killed first while netifd is
still running; netifd would then try to restart pppd shortly before
netifd itself gets killed, but the second KILL loop three seconds later
should catch this rare case.
However, as discussed above, a graceful service shutdown would be better.
> As a workaround, before calling sysupgrade I
> - explicitly use /etc/init.d/most_services stop
> - explicitly kill squid and wait for termination
> - explicitly disable swap
> - explicitly dismount mounted block-device
> - ifdown wwan
That certainly makes a lot of sense, and most of this should probably go
into sysupgrade (stage1, aka /sbin/sysupgrade) directly. One slight
difficulty I see is how to identify "most_services", but I guess a
hardcoded whitelist for things like "dropbear", "openssh" or "telnetd" [...]
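For illustration, a hypothetical helper that stops every init script except the whitelisted ones, which stay running (e.g. to keep remote access). The function name and the whitelist contents are assumptions. Iterating alphabetically is a simplification; a real implementation would probably follow the stop order of the K* links in /etc/rc.d.

```shell
#!/bin/sh
# Hypothetical sketch: stop all init scripts in INITDIR except the ones
# named on the whitelist, which are left running.

# stop_services_except INITDIR [KEEP...]
stop_services_except() {
	local initdir="$1" script name
	shift
	for script in "$initdir"/*; do
		[ -x "$script" ] || continue    # skip non-executables (and an empty glob)
		name=${script##*/}
		case " $* " in
			*" $name "*) continue ;;      # whitelisted, leave it running
		esac
		"$script" stop
	done
}
```

Called e.g. as `stop_services_except /etc/init.d dropbear`; which names belong on that list is exactly the open question above.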
As for awaiting squid termination: if that is not already the case, I
think the squid init script should be reworked so that
/etc/init.d/squid stop does not return (successfully) before squid has
actually stopped.
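A sketch of a stop routine that only reports success once the daemon is really gone, and fails after a timeout otherwise. The helper names are hypothetical; the timeout should be at least as long as the daemon's own shutdown timer (30s by default for squid, per the question above). Reading the process state from /proc/PID/status is Linux-specific.

```shell
#!/bin/sh
# Hypothetical sketch: stop a daemon and only report success once it is gone.

# is_alive PID: true while the process exists and is not a zombie.
# A zombie has already exited and merely waits to be reaped.
is_alive() {
	kill -0 "$1" 2>/dev/null || return 1
	! grep -q '^State:[[:space:]]*Z' "/proc/$1/status" 2>/dev/null
}

# stop_and_wait PID [TIMEOUT]
# Send TERM, then poll once per second. Returns 0 once the process is gone,
# 1 if it is still running after TIMEOUT seconds.
stop_and_wait() {
	local pid="$1" timeout="${2:-30}" waited=0
	kill -TERM "$pid" 2>/dev/null || return 0    # nothing left to stop
	while is_alive "$pid"; do
		[ "$waited" -ge "$timeout" ] && return 1
		sleep 1
		waited=$((waited + 1))
	done
	return 0
}
```

A stop action built on this would return non-zero on timeout, letting sysupgrade fall back to the TERM/KILL sequence for that process.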
> Before I had several cases, that
> sysupgrade -n -v -f /tmp/newfiles.tar.gz /tmp/new_fw.bin
> updated all files from /tmp/newfiles.tar.gz, but did not do the flash of
This is quite strange, as appending the /tmp/newfiles.tar.gz archive only
happens after /tmp/new_fw.bin has been written. I can only imagine that
the image write procedure itself somehow failed while appending the
archive still worked.
How exactly this could fail depends on the platform. Can you provide
some more details about the device this issue occurred on?