sysupgrade broken on imx nand targets (and maybe others too)

Lanchon lanchon at gmail.com
Mon Jan 23 22:59:48 PST 2023


hi Koen, thanks again.

i copied your log here for ease of reference:
https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db


first let me say:

- ubinized sysupgrade is not used by any of my devices.

- ubinized sysupgrade happens when when an ubi partition image is fed as 
an upgrade file. the image contains the complete set of ubi volumes that 
are normally stored within the ubi partition on your device: typically 
kernel (raw image), R/O rootfs (sqfs), and R/W overlay (ubifs). during 
said sysupgrade, the current configuration is first copied to RAM, then 
the ubi partition image is written, and finally -if current config is 
kept- the RAM contents are written back to the new overlay volume.

- ubinized sysupgrades were known to be broken by other people at the 
time of my commits, and they wanted to remove the feature altogether 
because it was unused (look it up in the relevant pull request for my 
commits on github). as i remember it, it was broken because some ubi 
volumes within the ubi partition were sometimes mounted or R/O block 
devices were created on top of them (/dev/ubiblock*), locking the ubi 
partition and preventing the upgrade.

- although my devices would normally not use such upgrades, i could 
still take a whole ubi partition backup and then test ubinized 
sysupgrade with it on my devices. in fact, if you restore the ubi 
partition image without conserving the current configuration, then this 
procedure is the best way to do a backup/restore of the complete state 
of the router: kernel, rootfs, and overlay are completely saved and 
restored. btw, i think this should be documented. (and this is mostly 
the reason why i added gzip support on sysupgrade: ubinized images of 
backed-up ubi partitions compress like crazy.)

- my tests of ubinized sysupgrade worked after these changes but not 
before. specifically the fix is in: af34733593 base-files: fix ubinized 
nand sysupgrade


regarding your log:

- nand_do_platform_check succeeds:
https://github.com/openwrt/openwrt/blob/ac21dff5b67698c09f54a4b98d6f9f12af17edd4/package/base-files/files/lib/upgrade/nand.sh#L438-L469
https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L192

- next comes the actual nand_do_flash_file:
https://github.com/openwrt/openwrt/blob/ac21dff5b67698c09f54a4b98d6f9f12af17edd4/package/base-files/files/lib/upgrade/nand.sh#L379-L405
https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L2061

- it is determined that passed file is a ubi partition image, so 
nand_upgrade_ubinized is invoked:
https://github.com/openwrt/openwrt/blob/ac21dff5b67698c09f54a4b98d6f9f12af17edd4/package/base-files/files/lib/upgrade/nand.sh#L260-L269
https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L2088

nand_upgrade_ubinized is basically a one-liner:
${gz}cat "$ubi_file" | ubiformat "/dev/mtd$mtdnum" -y -f - && ubiattach 
-m "$mtdnum"

cat/zcat the image, feeding that to ubiformat -f - which writes it.


and the write does take place, but take a look:

-------------------

+ cat /tmp/nandnew.ubi
+ ubiformat /dev/mtd2 -y -f -
ubiformat: mtd2 (nand), size 250609664 bytes (239.0 MiB), 1912 
eraseblocks of 131072 bytes (128.0 KiB), min. I/O size 2048 bytes

libscan: scanning eraseblock 0 --  0 % complete
libscan: scanning eraseblock 1 --  0 % complete
libscan: scanning eraseblock 2 --  0 % complete
    ...
libscan: scanning eraseblock 1868 -- 97 % complete

libscan: scanning eraseblock
[  207.876200] ci_hdrc ci_hdrc.1: remove, state 1
1869 -- 97 % complete

libscan:
[  207.883339] usb usb2: USB disconnect, device number 1
scanning eraseblock 1870 -- 97 %
[  207.891238] usb 2-1: USB disconnect, device number 2
complete

libscan: scanning eras
[  207.901522] ci_hdrc ci_hdrc.1: USB bus 2 deregistered
eblock 1871 -- 97 % complete

libscan: scanning eraseblock 1872
[  207.910396] ci_hdrc ci_hdrc.0: remove, state 4
  -- 97 % complete

libscan: scan
[  207.917055] usb usb1: USB disconnect, device number 1
ning eraseblock 1873 -- 98 % comp
[  207.925651] ci_hdrc ci_hdrc.0: USB bus 1 deregistered
lete

libscan: scanning eraseblo
[  207.934010] imx2-wdt 20bc000.watchdog: Device shutdown: Expect reboot!
ck 1874 -- 98 % complete

libsca
[  207.942382] reboot: Restarting system

-----------------------


while sysupgrade is flashing UBI the partition, the system is rebooted; 
i don't know why.

here are the cleaned-up kernel messages:

[  207.876200] ci_hdrc ci_hdrc.1: remove, state 1
[  207.883339] usb usb2: USB disconnect, device number 1
[  207.891238] usb 2-1: USB disconnect, device number 2
[  207.901522] ci_hdrc ci_hdrc.1: USB bus 2 deregistered
[  207.910396] ci_hdrc ci_hdrc.0: remove, state 4
[  207.917055] usb usb1: USB disconnect, device number 1
[  207.925651] ci_hdrc ci_hdrc.0: USB bus 1 deregistered
[  207.934010] imx2-wdt 20bc000.watchdog: Device shutdown: Expect reboot!
[  207.942382] reboot: Restarting system


next the reboot takes place. u-boot mounts what it knows as mtd3 as an 
ubi partition:
https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L2183

but openwrt used mtd2 to write to ubi, not mtd3. i don't know if in your 
device what openwrt calls mtd2 is called mtd3 by your current (default) 
u-boot config.


so some takeaways...

- something is rebooting the system while i write the sysupgrade image. 
maybe another thread? maybe an unattended watchdog?

- the reboot happens while 98% of the image is written. maybe this issue 
is time dependent and only shows up with my code because it is a little 
slower than the previous code. maybe the previous code reached 100% 
before being rebooted and thus the upgrade went through.

- the image *IS* written, albeit partially, which means that the 
previous image that was there is definitely erased. what is booted then? 
IDK, it depends on the details of your device. maybe it is booting a 
recovery image? or maybe the ubi partition format is not finished, but 
the ubi volumes within are fully written before the reboot, so the 
system doesn't brick by chance. (but then the newer image would be 
booted, but you say that the sysupgrade has no effect and the prior 
image is booted instead, so that can't be it.) maybe two ubi partitions 
are used on your device to implement an A/B dual system boot. so maybe a 
flag needs to be toggled to switch between A and B after the image is 
written, but since that code is never executed, the previous system is 
booted instead.

- several issues cropped up with a set of sysupgrade changes i did 
(among them, these you mention here). there are many device types and 
several sysupgrade mechanisms with their own files, and then some common 
files. i assumed other types of upgrades would invoke common routines 
but not -for instance- nand flash routines. i was wrong: the codebase is 
spaghetti calling any file form any sysupgrade method, and this caused a 
couple of issues with my nand sysupgrade changes. i don't think this is 
one of those instances though. i think that the sysupgrade code is doing 
the right thing and the fault is elsewhere, but i may be wrong. without 
knowledge of your device and without a device to test, it is hard to tell.


maybe we should could call the attention of the device maintainer to 
this thread now?


thanks!



On 1/23/23 12:37, Koen Vandeputte wrote:
> Hi Rodrigo,
>
> After a long absence and now testing the latest master, I noticed that
> imx nand flash sysupgrade was broken:
>
> expected behaviour:
> - scan the nand
> - write image
> - format empty space
> -reboot
>
> Seen behaviour:
> - Scan the nand
> - reboots
>
> I traced it back to this batch of commits by you:
>
> 9d1e687da3 base-files: verify nand sysupgrade images
> 9710712120 base-files: accept gzipped nand sysupgrade images
> af34733593 base-files: fix ubinized nand sysupgrade
> e25e6d8e54 base-files: fix and clean up nand sysupgrade code
>
>
> It can be easily confirmed by reverting /lib/upgrade/nand.sh with a
> version before these commits are applied to it.
>
> I added a "set -x" to nand.sh to get more detailed logs:
> https://pastebin.com/raw/yxY0SK1x
>
> Any idea?
>
> Thanks,
>
> Koen



More information about the openwrt-devel mailing list