SquashFS mixed errors (decompression failed and others)

Vincent Wiemann vincent.wiemann at ironai.com
Sun May 23 01:28:37 PDT 2021


On 5/23/21 10:21 AM, Ibrahim Tachijian wrote:
>> Is your firmware (sysupgrade) bigger than 16MB?
> 
> No, the sysupgrade file is currently 13MB.
> 
>> So maybe it has to do with switching to 4-address-mode...
> What is this exactly?

The 4-byte address mode is needed on flash chips larger than 16 MiB,
such as 32 MiB ones, because a 3-byte address only reaches 16 MiB.
We had similar issues with other 32 MiB devices in the past,
which were fixed at some point by Felix Fietkau.
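
To make the address arithmetic concrete, here is a minimal sketch.
The function names are made up for illustration and are not taken
from any driver. Note that whether a given read actually needs the
fourth address byte depends on where the data sits on the flash
(the partition layout), not on the size of the sysupgrade file alone.

```python
# Illustration only: how far an n-byte flash address can reach.
# Function names are hypothetical, not from any kernel driver.
MIB = 1024 * 1024


def addressable_bytes(address_bytes: int) -> int:
    """Number of bytes reachable with an address of the given width."""
    return 2 ** (8 * address_bytes)


def needs_4byte_mode(flash_size_bytes: int) -> bool:
    """True if the chip is larger than what 3-byte addresses can reach."""
    return flash_size_bytes > addressable_bytes(3)


print(addressable_bytes(3) // MIB)   # 16
print(needs_4byte_mode(16 * MIB))    # False
print(needs_4byte_mode(32 * MIB))    # True
```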

>> My guess is that the error already happens when reading the flash.
> At least we know that the flash is not being written to incorrectly
> since after a reboot the flash is intact and does not produce any
> errors. It's simply random if the system boots into this "faulty
> state" or not (happens approx 1-2% of the time).
> 
> Does anyone know how I can re-read the squashfs partition and
> verify its integrity while the system is booted, to see whether I
> encounter the squashfs errors?
> I'm really at a loss here - no idea where to even look into diagnosing
> the issue.
> 

I guess the reset line of the flash chip is not held long enough, so
the chip is left in an unclean state. I think the reset duration during
booting needs to be increased. But I don't know the code and can't point
you there. It's just a guess...
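
On the question of re-reading the partition while the system is up:
one way to test this, as a rough sketch and not something I have run
on your hardware, is to read the same flash region several times and
compare checksums. If the digests differ between passes, or differ
from a digest taken after a known-good boot, the corruption happens
on read. The function names below are hypothetical. I assume here
that you read the raw MTD character device (look up the right one in
/proc/mtd), which as far as I know is not served from the page cache,
and that you pass the squashfs length so the writable overlay that
follows it is not hashed. If Python is not on your image, a few
sha256sum runs on the same device give you the same information.

```python
import hashlib


def sha256_of(path, length=None, chunk_size=64 * 1024):
    """Hash the first `length` bytes of `path` (the whole file if None)."""
    digest = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            want = chunk_size if remaining is None else min(chunk_size, remaining)
            chunk = f.read(want)
            if not chunk:
                break  # end of device or file
            digest.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return digest.hexdigest()


def read_is_stable(path, passes=3, length=None):
    """Read `path` several times; report whether every pass hashed the same."""
    digests = [sha256_of(path, length) for _ in range(passes)]
    return len(set(digests)) == 1, digests
```

Usage would be something like read_is_stable("/dev/mtdN", passes=5,
length=squashfs_size), where both the device name and the size are
placeholders you have to fill in for your board.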

> 
> 
> 
> On Fri, May 21, 2021 at 6:16 PM Vincent Wiemann
> <vincent.wiemann at ironai.com> wrote:
>>
>>
>>
>> On 5/21/21 3:58 PM, Koen Vandeputte wrote:
>>>
>>> On 21.05.21 13:19, Ibrahim Tachijian wrote:
>>>> Hello,
>>>>
>>>> We use approximately 10k IPQ40XX devices and we have noticed that
>>>> every time we run "sysupgrade -n" we lose approximately 1% of the
>>>> routers in the process.
>>>> After further investigation I'm almost confident that it is not the
>>>> sysupgrade process that is the culprit - so what I did was that I put
>>>> one test router into a reboot loop.
>>>>
>>>> This is what I do;
>>>>
>>>> Boot the router in a fresh state after a newly installed image.
>>>> The image contains a reboot loop that consists of a shell script that
>>>> runs every minute.
>>>>
>>>> The shell script tries to run a php-script which simply echoes "Hello
>>>> World". If the php-script exits normally then we reboot the router.
>>>>
>>>> However, if the php-script exits abnormally then the router stops and
>>>> does nothing other than informing me that there was a bus-error making
>>>> php not able to process the hello world script.
>>>>
>>>> When this process runs the router reboots approximately 50 times
>>>> before it boots into a state which is faulty where I see bus-errors
>>>> when I try to run php scripts for example.
>>>>
>>>>
>>>> Looking into dmesg you can see some errors such as,
>>>>
>>>> [10985.209438] SQUASHFS error: squashfs_read_data failed to read block
>>>> 0x3a803e
>>>> [11045.218685] SQUASHFS error: xz decompression failed, data probably
>>>> corrupt
>>>> [11045.218731] SQUASHFS error: squashfs_read_data failed to read block
>>>> 0x3a803e
>>>> [11105.228157] SQUASHFS error: xz decompression failed, data probably
>>>> corrupt
>>>> [11105.228203] SQUASHFS error: squashfs_read_data failed to read block
>>>> 0x3a803e
>>>>
>>>> or
>>>>
>>>> [26218.687905] SQUASHFS error: Unable to read page, block 1b99a, size
>>>> 10234
>>>> [26221.057472] SQUASHFS error: Unable to read data cache entry [1b99a]
>>>> [26221.057551] SQUASHFS error: Unable to read page, block 1b99a, size
>>>> 10234
>>>> [26221.062926] SQUASHFS error: Unable to read data cache entry [1b99a]
>>>> [26221.069742] SQUASHFS error: Unable to read page, block 1b99a, size
>>>> 10234
>>>> [26224.460239] SQUASHFS error: Unable to read data cache entry [1b99a]
>>>> [26224.460320] SQUASHFS error: Unable to read page, block 1b99a, size
>>>> 10234
>>>>
>>>> or
>>>>
>>>> [62745.801178] SQUASHFS error: squashfs_read_data failed to read block
>>>> 0x732ae2
>>>> [62773.347234] SQUASHFS error: xz decompression failed, data probably
>>>> corrupt
>>>> [62773.347281] SQUASHFS error: squashfs_read_data failed to read block
>>>> 0x732ae2
>>>> [62790.132661] SQUASHFS error: xz decompression failed, data probably
>>>> corrupt
>>>> [62790.132706] SQUASHFS error: squashfs_read_data failed to read block
>>>> 0x732ae2
>>>> [62790.216746] SQUASHFS error: xz decompression failed, data probably
>>>> corrupt
>>>> [62790.216792] SQUASHFS error: squashfs_read_data failed to read block
>>>> 0x732ae2
>>>> [62800.810525] SQUASHFS error: xz decompression failed, data probably
>>>> corrupt
>>>> [62800.810570] SQUASHFS error: squashfs_read_data failed to read block
>>>> 0x732ae2
>>>> [62828.336267] SQUASHFS error: xz decompression failed, data probably
>>>> corrupt
>>>>
>>>>
>>>>
>>>> Now, you would assume that the squashfs-partition is broken - but if
>>>> this was the case then a reboot should not help. It does.
>>>> Rebooting the router after it boots in this faulty state fixes the issue.
>>>>
>>>> So approximately 1-2% of my reboots make the router go into this
>>>> faulty state.
>>>>
>>>> I am clueless on how to further investigate this issue. For now my
>>>> workaround is restarting the router via a bash script should it
>>>> notice there are bus-errors or i/o errors.
>>>>
>>>> Thanks
>>>>
>>> In the next kernel bump, following patch is also present:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.10.38&id=2ed1d90162a0c0683ecbe0c4802187fa22d641c3
>>>
>>>
>>> I think it's worth a shot to retry the tests once it's bumped.
>>>
>>> Koen
>>>
>>
>> My guess is that the error already happens when reading the flash.
>> Is your firmware (sysupgrade) bigger than 16MB?
>> So maybe it has to do with switching to 4-address-mode...
>>
>> Best,
>>
>> Vincent
>>
>> _______________________________________________
>> openwrt-devel mailing list
>> openwrt-devel at lists.openwrt.org
>> https://lists.openwrt.org/mailman/listinfo/openwrt-devel
> 
> 
> 
> --
> Ibrahim Tachijian
> 



