SquashFS mixed errors (decompression failed and others)

Koen Vandeputte koen.vandeputte at citymesh.com
Fri May 21 06:58:08 PDT 2021


On 21.05.21 13:19, Ibrahim Tachijian wrote:
> Hello,
>
> We use approximately 10k IPQ40XX devices and we have noticed that
> every time we run "sysupgrade -n" we lose approximately 1% of the
> routers in the process.
> After further investigation I'm fairly confident that the sysupgrade
> process is not the culprit, so I put one test router into a reboot
> loop.
>
> This is what I do:
>
> Boot the router in a fresh state after a newly installed image.
> The image contains a reboot loop that consists of a shell script that
> runs every minute.
>
> The shell script tries to run a php-script which simply echoes "Hello
> World". If the php-script exits normally then we reboot the router.
>
> However, if the php-script exits abnormally then the router stops and
> does nothing other than inform me that there was a bus-error that
> prevented php from processing the hello world script.
>
> When this process runs, the router reboots approximately 50 times
> before it boots into a faulty state in which I see bus-errors when I
> try to run php scripts, for example.
>
>
> Looking into dmesg, you can see errors such as:
>
> [10985.209438] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
> [11045.218685] SQUASHFS error: xz decompression failed, data probably corrupt
> [11045.218731] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
> [11105.228157] SQUASHFS error: xz decompression failed, data probably corrupt
> [11105.228203] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
>
> or
>
> [26218.687905] SQUASHFS error: Unable to read page, block 1b99a, size 10234
> [26221.057472] SQUASHFS error: Unable to read data cache entry [1b99a]
> [26221.057551] SQUASHFS error: Unable to read page, block 1b99a, size 10234
> [26221.062926] SQUASHFS error: Unable to read data cache entry [1b99a]
> [26221.069742] SQUASHFS error: Unable to read page, block 1b99a, size 10234
> [26224.460239] SQUASHFS error: Unable to read data cache entry [1b99a]
> [26224.460320] SQUASHFS error: Unable to read page, block 1b99a, size 10234
>
> or
>
> [62745.801178] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
> [62773.347234] SQUASHFS error: xz decompression failed, data probably corrupt
> [62773.347281] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
> [62790.132661] SQUASHFS error: xz decompression failed, data probably corrupt
> [62790.132706] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
> [62790.216746] SQUASHFS error: xz decompression failed, data probably corrupt
> [62790.216792] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
> [62800.810525] SQUASHFS error: xz decompression failed, data probably corrupt
> [62800.810570] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
> [62828.336267] SQUASHFS error: xz decompression failed, data probably corrupt
>
>
>
> Now, you would assume that the squashfs-partition is broken - but if
> that were the case then a reboot should not help. It does.
> Rebooting the router after it boots into this faulty state fixes the issue.
>
> So approximately 1-2% of my reboots make the router go into this faulty state.
>
> I am clueless on how to investigate this issue further. For now my
> workaround is a bash script that restarts the router if it notices
> bus-errors or I/O errors.
>
> Thanks
>
The next kernel bump will also include the following patch:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.10.38&id=2ed1d90162a0c0683ecbe0c4802187fa22d641c3

I think it's worth a shot to retry the tests once it's bumped.
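
To make the retest easier to automate, the faulty state could be detected
from the kernel log rather than only from the php bus-error. Below is a
minimal sketch, assuming the "SQUASHFS error:" prefix seen in the dmesg
output above is a good enough marker; the function names
count_squashfs_errors and check_boot are hypothetical and not part of the
original test setup.

```shell
#!/bin/sh
# Hypothetical helper: count SQUASHFS error lines in a kernel log on stdin.
# grep -c prints 0 and exits non-zero when nothing matches, so the exit
# status is masked to keep the function usable in strict scripts.
count_squashfs_errors() {
    grep -c 'SQUASHFS error:' || true
}

# Hypothetical check for one iteration of the reboot loop.
# Reads a kernel log on stdin, prints "ok" and returns 0 when no SquashFS
# errors are found, otherwise prints a summary and returns 1.
check_boot() {
    n=$(count_squashfs_errors)
    if [ "$n" -gt 0 ]; then
        echo "faulty ($n SQUASHFS errors)"
        return 1
    fi
    echo "ok"
}
```

In the per-minute script this could be used as "dmesg | check_boot && reboot",
so the router keeps looping while the log is clean and stops for inspection
as soon as a SquashFS error shows up.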

Koen



