SquashFS mixed errors (decompression failed and others)
Ibrahim Tachijian
barhom at gmail.com
Sun May 23 01:21:06 PDT 2021
> Is your firmware (sysupgrade) bigger than 16MB?
No, the sysupgrade file is currently 13MB.
> So maybe it has to do with switching to 4-address-mode...
What is this exactly?
> My guess is that the error already happens when reading the flash.
At least we know that the flash is not being written incorrectly:
after a reboot the flash is intact and does not produce any
errors. Whether the system boots into this "faulty state" appears
to be random (it happens on approximately 1-2% of boots).
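As a rough aid for planning retests: if each boot independently enters the faulty state with probability p, the number of boots until the first failure has mean 1/p, so 1-2% corresponds to 50-100 boots, in line with the roughly 50 reboots seen in the loop test quoted below. The Python sketch here estimates how many consecutive clean reboots would be needed before a change can reasonably be called a fix. The function names are illustrative, and independence between boots is an assumption, not something that has been measured.

```python
import math


def expected_boots_until_failure(failure_rate):
    """Mean number of boots until the first faulty boot (geometric model)."""
    return 1.0 / failure_rate


def clean_reboots_needed(failure_rate, confidence=0.95):
    """Smallest n such that n clean boots in a row would happen with
    probability below (1 - confidence) if the fault were still present.

    Assumes every boot fails independently with the same probability.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))
```

Under this model a retest after the kernel bump would need on the order of a few hundred clean reboots, not just 50, before the absence of errors says much.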
Does anyone know how I can re-read the squashfs partition and
verify its integrity while the system is booted, to see whether I
encounter the squashfs errors?
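One possible approach, sketched here in Python under several assumptions: read the raw partition twice, hash it in fixed-size chunks, and compare the passes. This assumes Python 3 is installed on the router and that the squashfs lives on a device node such as /dev/mtdblockN, where N is a placeholder to be looked up in /proc/mtd. The function names are illustrative. On OpenWrt the rootfs partition may also cover the writable overlay, so the `length` argument should limit the read to the read-only squashfs portion.

```python
import hashlib


def chunk_digests(path, chunk_size=65536, length=None):
    """Read `path` sequentially and return one SHA-256 hex digest per chunk.

    `length` optionally limits how many bytes are read from the start.
    """
    digests = []
    remaining = length
    with open(path, "rb") as dev:
        while remaining is None or remaining > 0:
            want = chunk_size if remaining is None else min(chunk_size, remaining)
            data = dev.read(want)
            if not data:
                break
            digests.append(hashlib.sha256(data).hexdigest())
            if remaining is not None:
                remaining -= len(data)
    return digests


def drop_page_cache():
    """Ask the kernel to drop clean caches so the next pass is more likely
    to be read from flash again instead of from RAM (needs root)."""
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def differing_chunks(first, second):
    """Return the indices of chunks whose digests differ between two passes."""
    longest = max(len(first), len(second))
    return [
        i for i in range(longest)
        if i >= len(first) or i >= len(second) or first[i] != second[i]
    ]
```

Typical use would be to call `chunk_digests()` once, then `drop_page_cache()`, then `chunk_digests()` again, and pass both lists to `differing_chunks()`. A non-empty result on a healthy-looking boot, or a mismatch against digests taken from the known-good image file, would point at the flash read path and not at the stored data.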
I'm really at a loss here - no idea where to even start diagnosing
the issue.
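As a starting point, the SQUASHFS lines in the quoted dmesg output below can be grouped by block address. In each excerpt a single block fails repeatedly (0x3a803e, 1b99a, 0x732ae2), and counting per block makes that pattern easy to compare across faulty boots. The Python sketch below is only illustrative: the function name is made up, the patterns cover just the three message types seen in this thread, and it expects each kernel message on one line (as dmesg prints it, not as wrapped by the mail client).

```python
import re
from collections import Counter

# Patterns for the three SQUASHFS messages quoted in this thread that
# carry a block address; all addresses are treated as hexadecimal.
BLOCK_PATTERNS = [
    re.compile(r"SQUASHFS error: squashfs_read_data failed to read block (0x[0-9a-fA-F]+)"),
    re.compile(r"SQUASHFS error: Unable to read page, block ([0-9a-fA-F]+)"),
    re.compile(r"SQUASHFS error: Unable to read data cache entry \[([0-9a-fA-F]+)\]"),
]


def count_failing_blocks(log_text):
    """Return a Counter mapping block address (int) to number of error lines."""
    counts = Counter()
    for line in log_text.splitlines():
        for pattern in BLOCK_PATTERNS:
            match = pattern.search(line)
            if match:
                counts[int(match.group(1), 16)] += 1
                break
    return counts
```

The input can be the text of a saved dmesg dump. If the failing block differs on every faulty boot while the flash content stays intact, that would fit a read-time problem better than a damaged image.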
On Fri, May 21, 2021 at 6:16 PM Vincent Wiemann
<vincent.wiemann at ironai.com> wrote:
>
>
>
> On 5/21/21 3:58 PM, Koen Vandeputte wrote:
> >
> > On 21.05.21 13:19, Ibrahim Tachijian wrote:
> >> Hello,
> >>
> >> We use approximately 10k IPQ40XX devices and we have noticed that
> >> every time we run "sysupgrade -n" we lose approximately 1% of the
> >> routers in the process.
> >> After further investigation I'm almost confident that it is not the
> >> sysupgrade process that is the culprit - so what I did was that I put
> >> one test router into a reboot loop.
> >>
> >> This is what I do;
> >>
> >> Boot the router in a fresh state after a newly installed image.
> >> The image contains a reboot loop that consists of a shell script that
> >> runs every minute.
> >>
> > >> The shell script tries to run a php-script which simply echoes "Hello
> > >> World". If the php-script exits normally then we reboot the router.
> > >>
> > >> However, if the php-script exits abnormally then the router stops and
> > >> does nothing other than inform me that there was a bus-error that made
> > >> php unable to process the hello world script.
> > >>
> > >> When this process runs, the router reboots approximately 50 times
> > >> before it boots into a faulty state in which I see bus-errors
> > >> when I try to run php scripts, for example.
> >>
> >>
> >> Looking into dmesg you can see some errors such as,
> >>
> >> [10985.209438] SQUASHFS error: squashfs_read_data failed to read block
> >> 0x3a803e
> >> [11045.218685] SQUASHFS error: xz decompression failed, data probably
> >> corrupt
> >> [11045.218731] SQUASHFS error: squashfs_read_data failed to read block
> >> 0x3a803e
> >> [11105.228157] SQUASHFS error: xz decompression failed, data probably
> >> corrupt
> >> [11105.228203] SQUASHFS error: squashfs_read_data failed to read block
> >> 0x3a803e
> >>
> >> or
> >>
> >> [26218.687905] SQUASHFS error: Unable to read page, block 1b99a, size
> >> 10234
> >> [26221.057472] SQUASHFS error: Unable to read data cache entry [1b99a]
> >> [26221.057551] SQUASHFS error: Unable to read page, block 1b99a, size
> >> 10234
> >> [26221.062926] SQUASHFS error: Unable to read data cache entry [1b99a]
> >> [26221.069742] SQUASHFS error: Unable to read page, block 1b99a, size
> >> 10234
> >> [26224.460239] SQUASHFS error: Unable to read data cache entry [1b99a]
> >> [26224.460320] SQUASHFS error: Unable to read page, block 1b99a, size
> >> 10234
> >>
> >> or
> >>
> >> [62745.801178] SQUASHFS error: squashfs_read_data failed to read block
> >> 0x732ae2
> >> [62773.347234] SQUASHFS error: xz decompression failed, data probably
> >> corrupt
> >> [62773.347281] SQUASHFS error: squashfs_read_data failed to read block
> >> 0x732ae2
> >> [62790.132661] SQUASHFS error: xz decompression failed, data probably
> >> corrupt
> >> [62790.132706] SQUASHFS error: squashfs_read_data failed to read block
> >> 0x732ae2
> >> [62790.216746] SQUASHFS error: xz decompression failed, data probably
> >> corrupt
> >> [62790.216792] SQUASHFS error: squashfs_read_data failed to read block
> >> 0x732ae2
> >> [62800.810525] SQUASHFS error: xz decompression failed, data probably
> >> corrupt
> >> [62800.810570] SQUASHFS error: squashfs_read_data failed to read block
> >> 0x732ae2
> >> [62828.336267] SQUASHFS error: xz decompression failed, data probably
> >> corrupt
> >>
> >>
> >>
> >> Now, you would assume that the squashfs-partition is broken - but if
> >> this was the case then a reboot should not help. It does.
> >> Rebooting the router after it boots in this faulty state fixes the issue.
> >>
> >> So approximately 1-2% of my reboots make the router go into this
> >> faulty state.
> >>
> > >> I am clueless on how to further investigate this issue. For now my
> > >> workaround is restarting the router via a bash script if it
> > >> notices bus-errors or i/o errors.
> >>
> >> Thanks
> >>
> > In the next kernel bump, following patch is also present:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.10.38&id=2ed1d90162a0c0683ecbe0c4802187fa22d641c3
> >
> >
> > I think it's worth a shot to retry the tests once it's bumped.
> >
> > Koen
> >
>
> My guess is that the error already happens when reading the flash.
> Is your firmware (sysupgrade) bigger than 16MB?
> So maybe it has to do with switching to 4-address-mode...
>
> Best,
>
> Vincent
>
> _______________________________________________
> openwrt-devel mailing list
> openwrt-devel at lists.openwrt.org
> https://lists.openwrt.org/mailman/listinfo/openwrt-devel
--
Ibrahim Tachijian