[OpenWrt-Devel] Squashfs breakage lottery with UBI WAS: [PATCH RFC 2/2] amp821xx: use newly added pad-squashfs for Meraki MR24

Jonas Gorski jonas.gorski at gmail.com
Sun Sep 1 06:36:29 EDT 2019


On Sat, 31 Aug 2019 at 15:31, Christian Lamparter <chunkeey at gmail.com> wrote:
>
> Hello,
>
> On Saturday, August 31, 2019 2:09:55 PM CEST Jonas Gorski wrote:
> > On Sat, 31 Aug 2019 at 01:19, Christian Lamparter <chunkeey at gmail.com> wrote:
> > >
> > > On Friday, August 30, 2019 11:10:54 PM CEST Russell Senior wrote:
> > > > >>>>> "Christian" == Christian Lamparter <chunkeey at gmail.com> writes:
> > > >
> > > > Christian> Ok.
> > > >
> > > > Christian> I did push a patch titled: "build: remove harmful -nopad
> > > > Christian> option from mksquashfs"
> > > > Christian> <https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=1c0290c5cc6258c48b8ba46b4f9c85a21de4f875>
> > > >
> > > > Christian> so, let's see if this triggers more undefined behaviour and
> > > > Christian> exposes more hidden broken code.
> > > >
> > > > Just to re-iterate my earlier worry, if for example:
> > > >
> > > >   kernel_size + rootfs_with_padding_size
> > > >
> > > > crosses an erase block boundary that:
> > > >
> > > >   kernel_size + rootfs_without_padding_size
> > > >
> > > > does not, then we'll be wasting an erase block of flash space on NOR.
> > > Not to worry. I guess that part of the magic is also explained in the wiki:
> > > <https://openwrt.org/docs/techref/flash.layout#example_flash_partitioning>
> > > <https://openwrt.org/docs/techref/flash.layout#explanations>
> > > In case my attempt now gets too confusing :-/.
> > >
> > > The "rootfs" partition also contains the rootfs_data inside. It should
> > > be possible to verify this with any of the routers that use the "firmware"
> > > parition label.
> > >
> > > Please check /proc/mtd. For example on my WNDR3700v2 (which is very similar
> > > to your WNDR3800). It looks like this
> > >
> > > |# cat /proc/mtd
> > > |dev:    size   erasesize  name
> > > |mtd0: 00050000 00010000 "u-boot"
> > > |mtd1: 00020000 00010000 "u-boot-env"
> > > |mtd2: 00f80000 00010000 "firmware"
> > > |mtd3: 0018c440 00010000 "kernel"
> > > |mtd4: 00df3bc0 00010000 "rootfs"
> > > |mtd5: 00620000 00010000 "rootfs_data"
> > > |mtd6: 00010000 00010000 "art
> > >
> > > The rootfs is 14629824 Bytes = 13.95 MiB. (The kernel + uboot + env + art
> > > fills out the remaining ~2MiB to a total of 16 MiB). So the padding we both
> > > are talking about already has to exists between the squashfs portion and the
> > > jffs2/overlay portion inside the "rootfs" partition and it's a full erase-size
> > > block. Sadly, OpenWrt does not readily print the "end" of just the squashfs
> > > partition (as it does with the kernel partition above) and your message from
> > > earlier with the three routers didn't include the "squashfs-split" and its
> > > results. (I think that maybe this is the missing information that got lost?)
> >
> > After a bit more investigation, those devices that use padjffs2 (or
> > have the rootfs
> > start at an at least 4k aligned offset) will be fine.
> >
> > The issue are those devices with an unaligned rootfs start, which put their own
> > EOF marker in the image by the firmware util (e.g. brcm63xx or tp-link devices).
> >
> > There it can happen that e.g. we get
> >
> > 0x18fff2  <squashfs end> 00 00 00 00
> > 0x19000 00 00 00 00 ... <end of padding> FF FF
> > 0x19010 FF FF FF ...
> > *
> > 0x20000 DE AD C0 DE
> >
> > The 0xff are only a guess, I haven't checked what the different
> > firmware utils use for padding the end of the rootfs - but mksquashfs
> > definitely uses 0x00.
> >
> > which leads to:
> >
> > 1. squashfs-split will put the roofs_data partition start at 0x19000
> > because that's the first aligned erase block after the end of the
> > rootfs start + squashfs length
> > 2. jffs2 will see that the first block is neither a valid jff2 node,
> > an EOF marker, nor all 0xff
> > 3. jffs2 will refuse to erase/mount the filesystem (AFAIU the code) [1]
> >
> > So removing the -nopad might actually break those devices.
> >
> > We could fix this case by making sure that mksquashfs and all firmware
> > utils use 0xff's to pad (so the erase block will then be treated as
> > empty/all 0xff). But then there is the question how jffs2 reacts if
> > the first block is 0xff, and the second block is a valid jffs2 node,
> > which happens when we sysupgrade with keeping config on NOR devices.
> > The jffs2 code isn't the most readable code ... .
> >
> No need to worry, see one of the previous mails in this thread:
>
> http://lists.infradead.org/pipermail/openwrt-devel/2019-August/018638.html
>
> It contains a patch at the end titled:
> "[PATCH] base-files: pad root.squashfs to 64KiB in ubi volumes"
> This is another approach that just deals with the UBI+squashfs
> issue but works with "-nopad". Soooooo.... do we all agree there?

a) 64k is excessive, we only need 4k (actually 1k would be enough,
since we don't enable CONFIG_SQUASHFS_4K_DEVBLK_SIZE).

The referenced issue with 64k page size happens when loop-mounting a
squashfs, since loop defaults to PAGE_SIZE as its block size. But we
never do that in OpenWrt, and we don't support any targets with that
huge PAGE_SIZE - biggest is ARC with 8k.

b) it misses the squashfs's in generic sysupgrade images itself - we
need to pad their length as well, to avoid breaking devices with a
sysupgrade image hitting the corner case being flashed from an unfixed
firmware with the old nand.sh.

Also IMHO "1c0290c5cc6258c48b8ba46b4f9c85a21de4f875" should be
reverted, for the previously mentioned issues.

Regards



Jonas

_______________________________________________
openwrt-devel mailing list
openwrt-devel at lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel



More information about the openwrt-devel mailing list