sysupgrade broken on imx nand targets (and maybe others too)

Koen Vandeputte koen.vandeputte at citymesh.com
Tue Jan 24 13:08:52 PST 2023


On Tue, Jan 24, 2023 at 9:46 PM Lanchon <lanchon at gmail.com> wrote:
>
>
>
> On 1/24/23 17:35, Lanchon wrote:
> >
> >
> > On 1/24/23 13:25, Koen Vandeputte wrote:
> >> On Tue, Jan 24, 2023 at 4:26 PM Koen Vandeputte
> >> <koen.vandeputte at citymesh.com> wrote:
> >>> On Tue, Jan 24, 2023 at 7:59 AM Lanchon <lanchon at gmail.com> wrote:
> >>>> hi Koen, thanks again.
> >>>>
> >>>> i copied your log here for ease of reference:
> >>>> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db
> >>>>
> >>>>
> >>>> first let me say:
> >>>>
> >>>> - ubinized sysupgrade is not used by any of my devices.
> >>>>
> >>>> - ubinized sysupgrade happens when when an ubi partition image is
> >>>> fed as
> >>>> an upgrade file. the image contains the complete set of ubi volumes
> >>>> that
> >>>> are normally stored within the ubi partition on your device: typically
> >>>> kernel (raw image), R/O rootfs (sqfs), and R/W overlay (ubifs). during
> >>>> said sysupgrade, the current configuration is first copied to RAM,
> >>>> then
> >>>> the ubi partition image is written, and finally -if current config is
> >>>> kept- the RAM contents are written back to the new overlay volume.
> >>>>
> >>>> - ubinized sysupgrades were known to be broken by other people at the
> >>>> time of my commits, and they wanted to remove the feature altogether
> >>>> because it was unused (look it up in the relevant pull request for my
> >>>> commits on github). as i remember it, it was broken because some ubi
> >>>> volumes within the ubi partition were sometimes mounted or R/O block
> >>>> devices were created on top of them (/dev/ubiblock*), locking the ubi
> >>>> partition and preventing the upgrade.
> >>>>
> >>>> - although my devices would normally not use such upgrades, i could
> >>>> still take a whole ubi partition backup and then test ubinized
> >>>> sysupgrade with it on my devices. in fact, if you restore the ubi
> >>>> partition image without conserving the current configuration, then
> >>>> this
> >>>> procedure is the best way to do a backup/restore of the complete state
> >>>> of the router: kernel, rootfs, and overlay are completely saved and
> >>>> restored. btw, i think this should be documented. (and this is mostly
> >>>> the reason why i added gzip support on sysupgrade: ubinized images of
> >>>> backed-up ubi partitions compress like crazy.)
> >>>>
> >>>> - my tests of ubinized sysupgrade worked after these changes but not
> >>>> before. specifically the fix is in: af34733593 base-files: fix
> >>>> ubinized
> >>>> nand sysupgrade
> >>>>
> >>>>
> >>>> regarding your log:
> >>>>
> >>>> - nand_do_platform_check succeeds:
> >>>> https://github.com/openwrt/openwrt/blob/ac21dff5b67698c09f54a4b98d6f9f12af17edd4/package/base-files/files/lib/upgrade/nand.sh#L438-L469
> >>>>
> >>>> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L192
> >>>>
> >>>>
> >>>> - next comes the actual nand_do_flash_file:
> >>>> https://github.com/openwrt/openwrt/blob/ac21dff5b67698c09f54a4b98d6f9f12af17edd4/package/base-files/files/lib/upgrade/nand.sh#L379-L405
> >>>>
> >>>> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L2061
> >>>>
> >>>>
> >>>> - it is determined that passed file is a ubi partition image, so
> >>>> nand_upgrade_ubinized is invoked:
> >>>> https://github.com/openwrt/openwrt/blob/ac21dff5b67698c09f54a4b98d6f9f12af17edd4/package/base-files/files/lib/upgrade/nand.sh#L260-L269
> >>>>
> >>>> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L2088
> >>>>
> >>>>
> >>>> nand_upgrade_ubinized is basically a one-liner:
> >>>> ${gz}cat "$ubi_file" | ubiformat "/dev/mtd$mtdnum" -y -f - &&
> >>>> ubiattach
> >>>> -m "$mtdnum"
> >>>>
> >>>> cat/zcat the image, feeding that to ubiformat -f - which writes it.
> >>>>
> >>>>
> >>>> and the write does take place, but take a look:
> >>>>
> >>>> -------------------
> >>>>
> >>>> + cat /tmp/nandnew.ubi
> >>>> + ubiformat /dev/mtd2 -y -f -
> >>>> ubiformat: mtd2 (nand), size 250609664 bytes (239.0 MiB), 1912
> >>>> eraseblocks of 131072 bytes (128.0 KiB), min. I/O size 2048 bytes
> >>>>
> >>>> libscan: scanning eraseblock 0 --  0 % complete
> >>>> libscan: scanning eraseblock 1 --  0 % complete
> >>>> libscan: scanning eraseblock 2 --  0 % complete
> >>>>      ...
> >>>> libscan: scanning eraseblock 1868 -- 97 % complete
> >>>>
> >>>> libscan: scanning eraseblock
> >>>> [  207.876200] ci_hdrc ci_hdrc.1: remove, state 1
> >>>> 1869 -- 97 % complete
> >>>>
> >>>> libscan:
> >>>> [  207.883339] usb usb2: USB disconnect, device number 1
> >>>> scanning eraseblock 1870 -- 97 %
> >>>> [  207.891238] usb 2-1: USB disconnect, device number 2
> >>>> complete
> >>>>
> >>>> libscan: scanning eras
> >>>> [  207.901522] ci_hdrc ci_hdrc.1: USB bus 2 deregistered
> >>>> eblock 1871 -- 97 % complete
> >>>>
> >>>> libscan: scanning eraseblock 1872
> >>>> [  207.910396] ci_hdrc ci_hdrc.0: remove, state 4
> >>>>    -- 97 % complete
> >>>>
> >>>> libscan: scan
> >>>> [  207.917055] usb usb1: USB disconnect, device number 1
> >>>> ning eraseblock 1873 -- 98 % comp
> >>>> [  207.925651] ci_hdrc ci_hdrc.0: USB bus 1 deregistered
> >>>> lete
> >>>>
> >>>> libscan: scanning eraseblo
> >>>> [  207.934010] imx2-wdt 20bc000.watchdog: Device shutdown: Expect
> >>>> reboot!
> >>>> ck 1874 -- 98 % complete
> >>>>
> >>>> libsca
> >>>> [  207.942382] reboot: Restarting system
> >>>>
> >>>> -----------------------
> >>>>
> >>>>
> >>>> while sysupgrade is flashing UBI the partition, the system is
> >>>> rebooted;
> >>>> i don't know why.
> >>>>
> >>>> here are the cleaned-up kernel messages:
> >>>>
> >>>> [  207.876200] ci_hdrc ci_hdrc.1: remove, state 1
> >>>> [  207.883339] usb usb2: USB disconnect, device number 1
> >>>> [  207.891238] usb 2-1: USB disconnect, device number 2
> >>>> [  207.901522] ci_hdrc ci_hdrc.1: USB bus 2 deregistered
> >>>> [  207.910396] ci_hdrc ci_hdrc.0: remove, state 4
> >>>> [  207.917055] usb usb1: USB disconnect, device number 1
> >>>> [  207.925651] ci_hdrc ci_hdrc.0: USB bus 1 deregistered
> >>>> [  207.934010] imx2-wdt 20bc000.watchdog: Device shutdown: Expect
> >>>> reboot!
> >>>> [  207.942382] reboot: Restarting system
> >>>>
> >>>>
> >>>> next the reboot takes place. u-boot mounts what it knows as mtd3 as an
> >>>> ubi partition:
> >>>> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L2183
> >>>>
> >>>>
> >>>> but openwrt used mtd2 to write to ubi, not mtd3. i don't know if in
> >>>> your
> >>>> device what openwrt calls mtd2 is called mtd3 by your current
> >>>> (default)
> >>>> u-boot config.
> >>>>
> >>>>
> >>>> so some takeaways...
> >>>>
> >>>> - something is rebooting the system while i write the sysupgrade
> >>>> image.
> >>>> maybe another thread? maybe an unattended watchdog?
> >>>>
> >>>> - the reboot happens while 98% of the image is written. maybe this
> >>>> issue
> >>>> is time dependent and only shows up with my code because it is a
> >>>> little
> >>>> slower than the previous code. maybe the previous code reached 100%
> >>>> before being rebooted and thus the upgrade went through.
> >>>>
> >>>> - the image *IS* written, albeit partially, which means that the
> >>>> previous image that was there is definitely erased. what is booted
> >>>> then?
> >>>> IDK, it depends on the details of your device. maybe it is booting a
> >>>> recovery image? or maybe the ubi partition format is not finished, but
> >>>> the ubi volumes within are fully written before the reboot, so the
> >>>> system doesn't brick by chance. (but then the newer image would be
> >>>> booted, but you say that the sysupgrade has no effect and the prior
> >>>> image is booted instead, so that can't be it.) maybe two ubi
> >>>> partitions
> >>>> are used on your device to implement an A/B dual system boot. so
> >>>> maybe a
> >>>> flag needs to be toggled to switch between A and B after the image is
> >>>> written, but since that code is never executed, the previous system is
> >>>> booted instead.
> >>>>
> >>>> - several issues cropped up with a set of sysupgrade changes i did
> >>>> (among them, these you mention here). there are many device types and
> >>>> several sysupgrade mechanisms with their own files, and then some
> >>>> common
> >>>> files. i assumed other types of upgrades would invoke common routines
> >>>> but not -for instance- nand flash routines. i was wrong: the
> >>>> codebase is
> >>>> spaghetti calling any file form any sysupgrade method, and this
> >>>> caused a
> >>>> couple of issues with my nand sysupgrade changes. i don't think
> >>>> this is
> >>>> one of those instances though. i think that the sysupgrade code is
> >>>> doing
> >>>> the right thing and the fault is elsewhere, but i may be wrong.
> >>>> without
> >>>> knowledge of your device and without a device to test, it is hard
> >>>> to tell.
> >>>>
> >>>>
> >>>> maybe we should could call the attention of the device maintainer to
> >>>> this thread now?
> >>>>
> >>>>
> >>>> thanks!
> >>>>
> >>>>
> >>> Hi,
> >>>
> >>> Many thanks for the very detailed reply.
> >>> It aided hugely in debugging this further.
> >>>
> >>> I'm happy to share that I found the culprit and it works nicely
> >>> again now.
> >>>
> >>> within nand.sh:
> >>>
> >>>
> >>> nand_upgrade_ubinized() {
> >>>          local ubi_file="$1"
> >>>          local gz="$2"
> >>>
> >>>          nand_detach_ubi "$CI_UBIPART" || return 1
> >>>
> >>>          local mtdnum="$( find_mtd_index "$CI_UBIPART" )"
> >>>          ${gz}cat "$ubi_file" | ubiformat "/dev/mtd$mtdnum" -y -f - &&
> >>> ubiattach -m "$mtdnum"
> >>> }
> >>>
> >>> I changed the last line to:
> >>>
> >>>          ubiformat "/dev/mtd$mtdnum" -y -f "$ubi_file" && ubiattach
> >>> -m "$mtdnum"
> >>>
> >>>
> >>> Although ubiformat indeed states "-f -" to use stdinput, it does not
> >>> seem to work.
> >>> Writing the image seemed to be simply skipped using that method.
> >>>
> >>> actually presenting the absolute filepath to "-f" fixed it.
> >>>
> >>> Any thoughts?
> >>>
> >>> Thanks again for your prompt reply,
> >>>
> >>> Koen
> >>>
> >>>
> >
> > hi,
> >
> > my answers inline...
> >
> >> Summarized after more testing:
> >>
> >> --> ${gz}cat "$ubi_file" | ubiformat "/dev/mtd$mtdnum" -y -f - &&
> >> ubiattach -m "$mtdnum"
> >> Does not work.
> > it works fine on other openwrt platforms. also, the commands cat and
> > ubiformat are mature and stable and they work fine on your platform
> > too. i am repeating myself, but according to your posted log,
> > something is rebooting the system before the command ends.
> >
> > from your posted log:
> >
> > -----
> > libscan: scan[  207.917055] usb usb1: USB disconnect, device number 1
> > ning eraseblock 1873 -- 98 % comp[  207.925651] ci_hdrc ci_hdrc.0: USB
> > bus 1 deregistered
> > lete
> >
> > libscan: scanning eraseblo[  207.934010] imx2-wdt 20bc000.watchdog:
> > Device shutdown: Expect reboot!
> > ck 1874 -- 98 % complete
> >
> > // check this out
> > //    |
> > //    v
> > libsca[  207.942382] reboot: Restarting system
> > \FE
> > U-Boot SPL 2020.04-dirty (May 20 2020 - 12:35:32 +0200)
> > Booting from NAND
> > -----
> >
> > as you can see, the process is interrupted in the middle of printing a
> > status line beginning with "libsca". there is no way that ubiformat
> > will fail without a diagnostic message, much less fail in the middle
> > of printing a diagnostics message.
> >
> > this is a platform issue and you will not fix it by changing this code.
> >
> >> Looks like ubiformat gets empty/invalid input from stdin, writing the
> >> image seems to be skipped completely.
> >> The board simply boots again in it's original state without any changes
> > possibly an A/B dual boot system, possibly with dual ubi partitions.
> > IDK, but the platform maintainer knows.
> >
> >> I confirmed that this command expands to "cat" and not "zcat"
> > yes, this is visible in your log.
> >
> >>
> >>
> >> --> ubiformat "/dev/mtd$mtdnum" -y -f "$ubi_file" && ubiattach -m
> >> "$mtdnum"
> >> Works fine
> >> image is written as expected and boots properly using the newly
> >> written image
> > possibly because it is a few percent faster than piping.
> >
> > IMHO, this is an extremely brittle workaround, not a fix. as soon as
> > your flash becomes slower to erase or write or the openwrt image grows
> > a bit, the problem will reappear.
> >
> >>
> >>
> >> --> ${gz}cat "$ubi_file" | ubiformat "/dev/mtd$mtdnum" -y && ubiattach
> >> -m "$mtdnum"
> >> Omitting "-f -" here
> >> 'cat' throws a "Broken Pipe" error during the flash process.
> > this is expected. without "-f -" ubiformat will not open stdin and
> > then cat has nowhere to pipe to.
> >
> >> The data is written, but the image is corrupted and does not boot
> >> afterwards.
> > no. without -f the ubi partition is formatted but nothing is written
> > onto it.
> >> Is this error/corruption not present on other targets ??
> >>
> >>
> >> So unless anyone has other ideas .. it looks for 'ubinized' the file
> >> needs to be provided as-is, and not using stdin.
> >> This drops support for gzip'd files .. but it does not seem to be
> >> gzipped anyway,  at least on my target.
> > the other idea is that the platform maintainer needs to be involved to
> > track down what is rebooting the device in the middle of an update.
> >
> > i suppose this problem is platform-specific. in that case it might be
> > a watchdog on your platform. my suspicion: the normal userland is
> > brought down before sysupgrade (as rootfs is not available) and thus
> > userland tickle of a watchdog ceases to happen. the watchdog expires
> > and reboots the system mid upgrade. the correct fix is disabling the
> > watchdog before sysupgrade. alternatively, the userland tickle could
> > be kept working during sysupgrade.
> >
> > in particular, note this log line just 8 milliseconds before the reboot:
> > [  207.934010] imx2-wdt 20bc000.watchdog: Device shutdown: Expect reboot!
> > [  207.942382] reboot: Restarting system
> >
> > i don't know nor have the platform, so please contact the maintainer.
> >
> >>
> >> Regards,
> >>
> >> Koen
> >
> >
> > thanks!
>
>
> could you collect more logs during sysupgrade? let's see if the reboot
> happens always approximately at the same time.
>
> also, you could you collect a full bootlog?
>
> thanks!

Hi,

I think our previous mails overlapped a bit as  I didn't notice your
previous response :)

I'll send the data tomorrow.
I'm also wondering if adding a sleep before ubiformat in the old way
would influence/break it's behaviour?


Piotr,
Would you have any idea here?


Thanks again for your efforts,

Koen



More information about the openwrt-devel mailing list