sysupgrade broken on imx nand targets (and maybe others too)

Lanchon lanchon at gmail.com
Tue Jan 24 08:13:22 PST 2023



On 1/24/23 12:26, Koen Vandeputte wrote:
> On Tue, Jan 24, 2023 at 7:59 AM Lanchon <lanchon at gmail.com> wrote:
>> hi Koen, thanks again.
>>
>> i copied your log here for ease of reference:
>> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db
>>
>>
>> first let me say:
>>
>> - ubinized sysupgrade is not used by any of my devices.
>>
>> - ubinized sysupgrade happens when when an ubi partition image is fed as
>> an upgrade file. the image contains the complete set of ubi volumes that
>> are normally stored within the ubi partition on your device: typically
>> kernel (raw image), R/O rootfs (sqfs), and R/W overlay (ubifs). during
>> said sysupgrade, the current configuration is first copied to RAM, then
>> the ubi partition image is written, and finally -if current config is
>> kept- the RAM contents are written back to the new overlay volume.
>>
>> - ubinized sysupgrades were known to be broken by other people at the
>> time of my commits, and they wanted to remove the feature altogether
>> because it was unused (look it up in the relevant pull request for my
>> commits on github). as i remember it, it was broken because some ubi
>> volumes within the ubi partition were sometimes mounted or R/O block
>> devices were created on top of them (/dev/ubiblock*), locking the ubi
>> partition and preventing the upgrade.
>>
>> - although my devices would normally not use such upgrades, i could
>> still take a whole ubi partition backup and then test ubinized
>> sysupgrade with it on my devices. in fact, if you restore the ubi
>> partition image without conserving the current configuration, then this
>> procedure is the best way to do a backup/restore of the complete state
>> of the router: kernel, rootfs, and overlay are completely saved and
>> restored. btw, i think this should be documented. (and this is mostly
>> the reason why i added gzip support on sysupgrade: ubinized images of
>> backed-up ubi partitions compress like crazy.)
>>
>> - my tests of ubinized sysupgrade worked after these changes but not
>> before. specifically the fix is in: af34733593 base-files: fix ubinized
>> nand sysupgrade
>>
>>
>> regarding your log:
>>
>> - nand_do_platform_check succeeds:
>> https://github.com/openwrt/openwrt/blob/ac21dff5b67698c09f54a4b98d6f9f12af17edd4/package/base-files/files/lib/upgrade/nand.sh#L438-L469
>> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L192
>>
>> - next comes the actual nand_do_flash_file:
>> https://github.com/openwrt/openwrt/blob/ac21dff5b67698c09f54a4b98d6f9f12af17edd4/package/base-files/files/lib/upgrade/nand.sh#L379-L405
>> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L2061
>>
>> - it is determined that passed file is a ubi partition image, so
>> nand_upgrade_ubinized is invoked:
>> https://github.com/openwrt/openwrt/blob/ac21dff5b67698c09f54a4b98d6f9f12af17edd4/package/base-files/files/lib/upgrade/nand.sh#L260-L269
>> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L2088
>>
>> nand_upgrade_ubinized is basically a one-liner:
>> ${gz}cat "$ubi_file" | ubiformat "/dev/mtd$mtdnum" -y -f - && ubiattach
>> -m "$mtdnum"
>>
>> cat/zcat the image, feeding that to ubiformat -f - which writes it.
>>
>>
>> and the write does take place, but take a look:
>>
>> -------------------
>>
>> + cat /tmp/nandnew.ubi
>> + ubiformat /dev/mtd2 -y -f -
>> ubiformat: mtd2 (nand), size 250609664 bytes (239.0 MiB), 1912
>> eraseblocks of 131072 bytes (128.0 KiB), min. I/O size 2048 bytes
>>
>> libscan: scanning eraseblock 0 --  0 % complete
>> libscan: scanning eraseblock 1 --  0 % complete
>> libscan: scanning eraseblock 2 --  0 % complete
>>      ...
>> libscan: scanning eraseblock 1868 -- 97 % complete
>>
>> libscan: scanning eraseblock
>> [  207.876200] ci_hdrc ci_hdrc.1: remove, state 1
>> 1869 -- 97 % complete
>>
>> libscan:
>> [  207.883339] usb usb2: USB disconnect, device number 1
>> scanning eraseblock 1870 -- 97 %
>> [  207.891238] usb 2-1: USB disconnect, device number 2
>> complete
>>
>> libscan: scanning eras
>> [  207.901522] ci_hdrc ci_hdrc.1: USB bus 2 deregistered
>> eblock 1871 -- 97 % complete
>>
>> libscan: scanning eraseblock 1872
>> [  207.910396] ci_hdrc ci_hdrc.0: remove, state 4
>>    -- 97 % complete
>>
>> libscan: scan
>> [  207.917055] usb usb1: USB disconnect, device number 1
>> ning eraseblock 1873 -- 98 % comp
>> [  207.925651] ci_hdrc ci_hdrc.0: USB bus 1 deregistered
>> lete
>>
>> libscan: scanning eraseblo
>> [  207.934010] imx2-wdt 20bc000.watchdog: Device shutdown: Expect reboot!
>> ck 1874 -- 98 % complete
>>
>> libsca
>> [  207.942382] reboot: Restarting system
>>
>> -----------------------
>>
>>
>> while sysupgrade is flashing UBI the partition, the system is rebooted;
>> i don't know why.
>>
>> here are the cleaned-up kernel messages:
>>
>> [  207.876200] ci_hdrc ci_hdrc.1: remove, state 1
>> [  207.883339] usb usb2: USB disconnect, device number 1
>> [  207.891238] usb 2-1: USB disconnect, device number 2
>> [  207.901522] ci_hdrc ci_hdrc.1: USB bus 2 deregistered
>> [  207.910396] ci_hdrc ci_hdrc.0: remove, state 4
>> [  207.917055] usb usb1: USB disconnect, device number 1
>> [  207.925651] ci_hdrc ci_hdrc.0: USB bus 1 deregistered
>> [  207.934010] imx2-wdt 20bc000.watchdog: Device shutdown: Expect reboot!
>> [  207.942382] reboot: Restarting system
>>
>>
>> next the reboot takes place. u-boot mounts what it knows as mtd3 as an
>> ubi partition:
>> https://gist.github.com/Lanchon/f24ea9c16eda5ffaa5085a7b240238db#file-imx6dl-gw52xx-ubinized-sysupgrade-log-txt-L2183
>>
>> but openwrt used mtd2 to write to ubi, not mtd3. i don't know if in your
>> device what openwrt calls mtd2 is called mtd3 by your current (default)
>> u-boot config.
>>
>>
>> so some takeaways...
>>
>> - something is rebooting the system while i write the sysupgrade image.
>> maybe another thread? maybe an unattended watchdog?
>>
>> - the reboot happens while 98% of the image is written. maybe this issue
>> is time dependent and only shows up with my code because it is a little
>> slower than the previous code. maybe the previous code reached 100%
>> before being rebooted and thus the upgrade went through.
>>
>> - the image *IS* written, albeit partially, which means that the
>> previous image that was there is definitely erased. what is booted then?
>> IDK, it depends on the details of your device. maybe it is booting a
>> recovery image? or maybe the ubi partition format is not finished, but
>> the ubi volumes within are fully written before the reboot, so the
>> system doesn't brick by chance. (but then the newer image would be
>> booted, but you say that the sysupgrade has no effect and the prior
>> image is booted instead, so that can't be it.) maybe two ubi partitions
>> are used on your device to implement an A/B dual system boot. so maybe a
>> flag needs to be toggled to switch between A and B after the image is
>> written, but since that code is never executed, the previous system is
>> booted instead.
>>
>> - several issues cropped up with a set of sysupgrade changes i did
>> (among them, these you mention here). there are many device types and
>> several sysupgrade mechanisms with their own files, and then some common
>> files. i assumed other types of upgrades would invoke common routines
>> but not -for instance- nand flash routines. i was wrong: the codebase is
>> spaghetti calling any file form any sysupgrade method, and this caused a
>> couple of issues with my nand sysupgrade changes. i don't think this is
>> one of those instances though. i think that the sysupgrade code is doing
>> the right thing and the fault is elsewhere, but i may be wrong. without
>> knowledge of your device and without a device to test, it is hard to tell.
>>
>>
>> maybe we should could call the attention of the device maintainer to
>> this thread now?
>>
>>
>> thanks!
>>
>>
> Hi,
>
> Many thanks for the very detailed reply.
> It aided hugely in debugging this further.
>
> I'm happy to share that I found the culprit and it works nicely again now.
>
> within nand.sh:
>
>
> nand_upgrade_ubinized() {
>          local ubi_file="$1"
>          local gz="$2"
>
>          nand_detach_ubi "$CI_UBIPART" || return 1
>
>          local mtdnum="$( find_mtd_index "$CI_UBIPART" )"
>          ${gz}cat "$ubi_file" | ubiformat "/dev/mtd$mtdnum" -y -f - &&
> ubiattach -m "$mtdnum"
> }
>
> I changed the last line to:
>
>          ubiformat "/dev/mtd$mtdnum" -y -f "$ubi_file" && ubiattach -m "$mtdnum"
>
>
> Although ubiformat indeed states "-f -" to use stdinput, it does not
> seem to work.
> Writing the image seemed to be simply skipped using that method.
>
> actually presenting the absolute filepath to "-f" fixed it.
>
> Any thoughts?
>
> Thanks again for your prompt reply,
>
> Koen


with that change you broke gzipped ubinized upgrades.

$gz might contain nothing or the character "z", invoking either cat or zcat.
(it is also used elsewhere to conditionally add "z" to tar commands.)

it seems:
   ubiformat -f "$ubi_file"
is a little bit faster than:
   cat "$ubi_file" | ubiformat -f -
thus making it work for your hardware.

again, the latter form gets interrupted by a reboot at 98%, while the 
former (original) form is a bit faster and reaches 100% before the reboot.


the problem lies elsewhere: on your platform something is rebooting the 
system asynchronously while it is updating. this is very dangerous and 
must be fixed elsewhere in code.

somebody should ping the device/platform maintainer for this one.


thanks!



>
>> On 1/23/23 12:37, Koen Vandeputte wrote:
>>> Hi Rodrigo,
>>>
>>> After a long absence and now testing the latest master, I noticed that
>>> imx nand flash sysupgrade was broken:
>>>
>>> expected behaviour:
>>> - scan the nand
>>> - write image
>>> - format empty space
>>> -reboot
>>>
>>> Seen behaviour:
>>> - Scan the nand
>>> - reboots
>>>
>>> I traced it back to this batch of commits by you:
>>>
>>> 9d1e687da3 base-files: verify nand sysupgrade images
>>> 9710712120 base-files: accept gzipped nand sysupgrade images
>>> af34733593 base-files: fix ubinized nand sysupgrade
>>> e25e6d8e54 base-files: fix and clean up nand sysupgrade code
>>>
>>>
>>> It can be easily confirmed by reverting /lib/upgrade/nand.sh with a
>>> version before these commits are applied to it.
>>>
>>> I added a "set -x" to nand.sh to get more detailed logs:
>>> https://pastebin.com/raw/yxY0SK1x
>>>
>>> Any idea?
>>>
>>> Thanks,
>>>
>>> Koen



More information about the openwrt-devel mailing list