sysupgrade broken on imx nand targets (and maybe others too)

Koen Vandeputte koen.vandeputte at citymesh.com
Tue Jan 24 07:26:11 PST 2023


On Tue, Jan 24, 2023 at 7:59 AM Lanchon <lanchon at gmail.com> wrote:
>
> hi Koen, thanks again.
>
> i copied your log here for ease of reference:
> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db
>
>
> first let me say:
>
> - ubinized sysupgrade is not used by any of my devices.
>
> - ubinized sysupgrade happens when when an ubi partition image is fed as
> an upgrade file. the image contains the complete set of ubi volumes that
> are normally stored within the ubi partition on your device: typically
> kernel (raw image), R/O rootfs (sqfs), and R/W overlay (ubifs). during
> said sysupgrade, the current configuration is first copied to RAM, then
> the ubi partition image is written, and finally -if current config is
> kept- the RAM contents are written back to the new overlay volume.
>
> - ubinized sysupgrades were known to be broken by other people at the
> time of my commits, and they wanted to remove the feature altogether
> because it was unused (look it up in the relevant pull request for my
> commits on github). as i remember it, it was broken because some ubi
> volumes within the ubi partition were sometimes mounted or R/O block
> devices were created on top of them (/dev/ubiblock*), locking the ubi
> partition and preventing the upgrade.
>
> - although my devices would normally not use such upgrades, i could
> still take a whole ubi partition backup and then test ubinized
> sysupgrade with it on my devices. in fact, if you restore the ubi
> partition image without conserving the current configuration, then this
> procedure is the best way to do a backup/restore of the complete state
> of the router: kernel, rootfs, and overlay are completely saved and
> restored. btw, i think this should be documented. (and this is mostly
> the reason why i added gzip support on sysupgrade: ubinized images of
> backed-up ubi partitions compress like crazy.)
>
> - my tests of ubinized sysupgrade worked after these changes but not
> before. specifically the fix is in: af34733593 base-files: fix ubinized
> nand sysupgrade
>
>
> regarding your log:
>
> - nand_do_platform_check succeeds:
> https://github.com/openwrt/openwrt/blob/ac21dff5b67698c09f54a4b98d6f9f12af17edd4/package/base-files/files/lib/upgrade/nand.sh#L438-L469
> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L192
>
> - next comes the actual nand_do_flash_file:
> https://github.com/openwrt/openwrt/blob/ac21dff5b67698c09f54a4b98d6f9f12af17edd4/package/base-files/files/lib/upgrade/nand.sh#L379-L405
> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L2061
>
> - it is determined that passed file is a ubi partition image, so
> nand_upgrade_ubinized is invoked:
> https://github.com/openwrt/openwrt/blob/ac21dff5b67698c09f54a4b98d6f9f12af17edd4/package/base-files/files/lib/upgrade/nand.sh#L260-L269
> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L2088
>
> nand_upgrade_ubinized is basically a one-liner:
> ${gz}cat "$ubi_file" | ubiformat "/dev/mtd$mtdnum" -y -f - && ubiattach
> -m "$mtdnum"
>
> cat/zcat the image, feeding that to ubiformat -f - which writes it.
>
>
> and the write does take place, but take a look:
>
> -------------------
>
> + cat /tmp/nandnew.ubi
> + ubiformat /dev/mtd2 -y -f -
> ubiformat: mtd2 (nand), size 250609664 bytes (239.0 MiB), 1912
> eraseblocks of 131072 bytes (128.0 KiB), min. I/O size 2048 bytes
>
> libscan: scanning eraseblock 0 --  0 % complete
> libscan: scanning eraseblock 1 --  0 % complete
> libscan: scanning eraseblock 2 --  0 % complete
>     ...
> libscan: scanning eraseblock 1868 -- 97 % complete
>
> libscan: scanning eraseblock
> [  207.876200] ci_hdrc ci_hdrc.1: remove, state 1
> 1869 -- 97 % complete
>
> libscan:
> [  207.883339] usb usb2: USB disconnect, device number 1
> scanning eraseblock 1870 -- 97 %
> [  207.891238] usb 2-1: USB disconnect, device number 2
> complete
>
> libscan: scanning eras
> [  207.901522] ci_hdrc ci_hdrc.1: USB bus 2 deregistered
> eblock 1871 -- 97 % complete
>
> libscan: scanning eraseblock 1872
> [  207.910396] ci_hdrc ci_hdrc.0: remove, state 4
>   -- 97 % complete
>
> libscan: scan
> [  207.917055] usb usb1: USB disconnect, device number 1
> ning eraseblock 1873 -- 98 % comp
> [  207.925651] ci_hdrc ci_hdrc.0: USB bus 1 deregistered
> lete
>
> libscan: scanning eraseblo
> [  207.934010] imx2-wdt 20bc000.watchdog: Device shutdown: Expect reboot!
> ck 1874 -- 98 % complete
>
> libsca
> [  207.942382] reboot: Restarting system
>
> -----------------------
>
>
> while sysupgrade is flashing UBI the partition, the system is rebooted;
> i don't know why.
>
> here are the cleaned-up kernel messages:
>
> [  207.876200] ci_hdrc ci_hdrc.1: remove, state 1
> [  207.883339] usb usb2: USB disconnect, device number 1
> [  207.891238] usb 2-1: USB disconnect, device number 2
> [  207.901522] ci_hdrc ci_hdrc.1: USB bus 2 deregistered
> [  207.910396] ci_hdrc ci_hdrc.0: remove, state 4
> [  207.917055] usb usb1: USB disconnect, device number 1
> [  207.925651] ci_hdrc ci_hdrc.0: USB bus 1 deregistered
> [  207.934010] imx2-wdt 20bc000.watchdog: Device shutdown: Expect reboot!
> [  207.942382] reboot: Restarting system
>
>
> next the reboot takes place. u-boot mounts what it knows as mtd3 as an
> ubi partition:
> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L2183
>
> but openwrt used mtd2 to write to ubi, not mtd3. i don't know if in your
> device what openwrt calls mtd2 is called mtd3 by your current (default)
> u-boot config.
>
>
> so some takeaways...
>
> - something is rebooting the system while i write the sysupgrade image.
> maybe another thread? maybe an unattended watchdog?
>
> - the reboot happens while 98% of the image is written. maybe this issue
> is time dependent and only shows up with my code because it is a little
> slower than the previous code. maybe the previous code reached 100%
> before being rebooted and thus the upgrade went through.
>
> - the image *IS* written, albeit partially, which means that the
> previous image that was there is definitely erased. what is booted then?
> IDK, it depends on the details of your device. maybe it is booting a
> recovery image? or maybe the ubi partition format is not finished, but
> the ubi volumes within are fully written before the reboot, so the
> system doesn't brick by chance. (but then the newer image would be
> booted, but you say that the sysupgrade has no effect and the prior
> image is booted instead, so that can't be it.) maybe two ubi partitions
> are used on your device to implement an A/B dual system boot. so maybe a
> flag needs to be toggled to switch between A and B after the image is
> written, but since that code is never executed, the previous system is
> booted instead.
>
> - several issues cropped up with a set of sysupgrade changes i did
> (among them, these you mention here). there are many device types and
> several sysupgrade mechanisms with their own files, and then some common
> files. i assumed other types of upgrades would invoke common routines
> but not -for instance- nand flash routines. i was wrong: the codebase is
> spaghetti calling any file form any sysupgrade method, and this caused a
> couple of issues with my nand sysupgrade changes. i don't think this is
> one of those instances though. i think that the sysupgrade code is doing
> the right thing and the fault is elsewhere, but i may be wrong. without
> knowledge of your device and without a device to test, it is hard to tell.
>
>
> maybe we should could call the attention of the device maintainer to
> this thread now?
>
>
> thanks!
>
>

Hi,

Many thanks for the very detailed reply.
It aided hugely in debugging this further.

I'm happy to share that I found the culprit and it works nicely again now.

within nand.sh:


nand_upgrade_ubinized() {
        local ubi_file="$1"
        local gz="$2"

        nand_detach_ubi "$CI_UBIPART" || return 1

        local mtdnum="$( find_mtd_index "$CI_UBIPART" )"
        ${gz}cat "$ubi_file" | ubiformat "/dev/mtd$mtdnum" -y -f - &&
ubiattach -m "$mtdnum"
}

I changed the last line to:

        ubiformat "/dev/mtd$mtdnum" -y -f "$ubi_file" && ubiattach -m "$mtdnum"


Although ubiformat indeed states "-f -" to use stdinput, it does not
seem to work.
Writing the image seemed to be simply skipped using that method.

actually presenting the absolute filepath to "-f" fixed it.

Any thoughts?

Thanks again for your prompt reply,

Koen




>
> On 1/23/23 12:37, Koen Vandeputte wrote:
> > Hi Rodrigo,
> >
> > After a long absence and now testing the latest master, I noticed that
> > imx nand flash sysupgrade was broken:
> >
> > expected behaviour:
> > - scan the nand
> > - write image
> > - format empty space
> > -reboot
> >
> > Seen behaviour:
> > - Scan the nand
> > - reboots
> >
> > I traced it back to this batch of commits by you:
> >
> > 9d1e687da3 base-files: verify nand sysupgrade images
> > 9710712120 base-files: accept gzipped nand sysupgrade images
> > af34733593 base-files: fix ubinized nand sysupgrade
> > e25e6d8e54 base-files: fix and clean up nand sysupgrade code
> >
> >
> > It can be easily confirmed by reverting /lib/upgrade/nand.sh with a
> > version before these commits are applied to it.
> >
> > I added a "set -x" to nand.sh to get more detailed logs:
> > https://pastebin.com/raw/yxY0SK1x
> >
> > Any idea?
> >
> > Thanks,
> >
> > Koen
>



More information about the openwrt-devel mailing list