SquashFS mixed errors (decompression failed and others)
Ibrahim Tachijian
barhom at gmail.com
Fri May 21 04:19:03 PDT 2021
Hello,
We use approximately 10k IPQ40XX devices and we have noticed that
every time we run "sysupgrade -n" we lose approximately 1% of the
routers in the process.
After further investigation I am fairly confident that the sysupgrade
process itself is not the culprit, so I put one test router into a
reboot loop.
This is what I do:
Boot the router in a fresh state after a newly installed image.
The image contains a reboot loop that consists of a shell script that
runs every minute.
The shell script tries to run a PHP script which simply echoes "Hello
World". If the PHP script exits normally, we reboot the router.
However, if the PHP script exits abnormally, the router stops and
does nothing other than inform me that a bus error prevented PHP from
running the hello world script.
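For illustration, the decision step of such a loop could look roughly
like the sketch below. It is not the exact script running on the
router: the function name decide_action, the php-cli binary name and
the /root/hello.php path are assumptions made for the example.

```shell
#!/bin/sh
# Sketch of the per-minute check (names are hypothetical).
# Runs the given command, then prints "reboot" if it exited with
# status 0 and printed exactly "Hello World"; otherwise prints a
# "halt" line carrying the exit status for later inspection.
decide_action() {
    # "if" form keeps this safe under "set -e"
    if out="$("$@" 2>/dev/null)"; then
        status=0
    else
        status=$?   # a process killed by a signal reports 128 + signal number
    fi

    if [ "$status" -eq 0 ] && [ "$out" = "Hello World" ]; then
        echo "reboot"
    else
        echo "halt: exit status $status"
    fi
}
```

From cron it could be used along the lines of
[ "$(decide_action php-cli /root/hello.php)" = "reboot" ] && reboot
so that any abnormal exit leaves the router up in its faulty state.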
With this loop running, the router reboots approximately 50 times
before it boots into a faulty state, in which I see bus errors when I
try to run PHP scripts, for example.
Looking into dmesg you can see errors such as:
[10985.209438] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
[11045.218685] SQUASHFS error: xz decompression failed, data probably corrupt
[11045.218731] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
[11105.228157] SQUASHFS error: xz decompression failed, data probably corrupt
[11105.228203] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
or
[26218.687905] SQUASHFS error: Unable to read page, block 1b99a, size 10234
[26221.057472] SQUASHFS error: Unable to read data cache entry [1b99a]
[26221.057551] SQUASHFS error: Unable to read page, block 1b99a, size 10234
[26221.062926] SQUASHFS error: Unable to read data cache entry [1b99a]
[26221.069742] SQUASHFS error: Unable to read page, block 1b99a, size 10234
[26224.460239] SQUASHFS error: Unable to read data cache entry [1b99a]
[26224.460320] SQUASHFS error: Unable to read page, block 1b99a, size 10234
or
[62745.801178] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62773.347234] SQUASHFS error: xz decompression failed, data probably corrupt
[62773.347281] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62790.132661] SQUASHFS error: xz decompression failed, data probably corrupt
[62790.132706] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62790.216746] SQUASHFS error: xz decompression failed, data probably corrupt
[62790.216792] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62800.810525] SQUASHFS error: xz decompression failed, data probably corrupt
[62800.810570] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
[62828.336267] SQUASHFS error: xz decompression failed, data probably corrupt
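To see at a glance which blocks are affected within one boot, the
"failed to read block" lines can be tallied. The helper below is only
a sketch; the function name summarise_squashfs_blocks is made up for
this example, and it only looks at the squashfs_read_data message
format shown above.

```shell
#!/bin/sh
# Read kernel log text on stdin and print "<count> <block>" for every
# block address that appears in a "failed to read block" line.
summarise_squashfs_blocks() {
    sed -n 's/.*squashfs_read_data failed to read block \(0x[0-9a-f]*\).*/\1/p' |
        sort |
        uniq -c |
        awk '{ print $1, $2 }'   # strip the padding uniq adds before the count
}
```

Usage would be "dmesg | summarise_squashfs_blocks". In each of the
excerpts above, every such line names the same block.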
Now, you would assume that the squashfs partition is broken, but if
that were the case then a reboot should not help. It does.
Rebooting the router after it boots into this faulty state fixes the
issue.
So approximately 1-2% of my reboots put the router into this faulty
state.
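One way to test the "partition is not broken" reasoning more directly
would be to hash the raw squashfs area on a good boot and again in the
faulty state. The sketch below assumes sha256sum and "head -c" are
available on the image; the function names are made up, and the device
path and byte count have to be filled in for the actual flash layout
(only the squashfs part should be hashed, not a writable overlay that
may follow it).

```shell
#!/bin/sh
# Print the SHA-256 of the first $2 bytes of the file or device $1.
hash_prefix() {
    head -c "$2" "$1" | sha256sum | cut -d' ' -f1
}

# Succeed if that hash equals $3, a value recorded on a known-good boot.
prefix_matches() {
    [ "$(hash_prefix "$1" "$2")" = "$3" ]
}
```

If the hash differs only in the faulty state and matches again after a
reboot, that would point at reads returning bad data rather than at
the stored image, though this is an interpretation and not something I
have verified.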
I am clueless on how to further investigate this issue. For now my
workaround is a bash script that restarts the router if it notices
bus errors or I/O errors.
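The kernel-log half of that check can be sketched as below. This is a
simplified illustration rather than the script I actually run; the
function name log_has_read_errors and the two patterns it matches are
assumptions.

```shell
#!/bin/sh
# Succeed (exit status 0) if the kernel log text on stdin contains a
# SquashFS error or a generic I/O error line.
log_has_read_errors() {
    grep -q -E 'SQUASHFS error|I/O error'
}
```

It would be called as "dmesg | log_has_read_errors && reboot".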
Thanks
--
Ibrahim Tachijian