SquashFS mixed errors (decompression failed and others)
Ibrahim Tachijian
barhom at gmail.com
Sun May 23 15:06:24 PDT 2021
> The 4-"byte"-address mode is used on 32 MiB flash chips.
> We had similar issues with other 32 MiB devices in the past
> which were fixed at some point by Felix Fietkau.
My device is 32MiB. I'll check with Felix if he can give me any clues.
@Everyone else reading this, do you know how one can increase "the reset
duration during booting" for the flash chip? (Not even sure I fully
understand what this means)
On Sun, May 23, 2021 at 10:28 AM Vincent Wiemann
<vincent.wiemann at ironai.com> wrote:
>
> On 5/23/21 10:21 AM, Ibrahim Tachijian wrote:
> >> Is your firmware (sysupgrade) bigger than 16MB?
> >
> > No, the sysupgrade file is currently 13MB.
> >
> >> So maybe it has to do with switching to 4-address-mode...
> > What is this exactly?
>
> The 4-"byte"-address mode is used on 32 MiB flash chips.
> We had similar issues with other 32 MiB devices in the past
> which were fixed at some point by Felix Fietkau.
>
> >> My guess is that the error already happens when reading the flash.
> > At least we know that the flash is not being written to incorrectly
> > since after a reboot the flash is intact and does not produce any
> > errors. It's simply random if the system boots into this "faulty
> > state" or not (happens approx 1-2% of the time).
> >
> > Does anyone know how I can re-read the squashfs partition and
> > verify its integrity while the system is booted, to see whether I
> > encounter the squashfs errors?
> > I'm really at a loss here - no idea where to even look into diagnosing
> > the issue.
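
One way to approach the question above is to read the same flash region
several times while the system is up and compare digests. If the stored
data is intact but reads are unreliable, the digests should differ between
passes. What follows is a minimal Python sketch, not a tested tool. The
function names and the chunk size are assumptions and do not come from
this thread. It also assumes that every pass really reaches the flash
instead of the page cache, for example by reading the raw MTD character
device or by dropping caches between passes.

```python
import hashlib


def digest_of(path, length=None, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of the first `length` bytes of `path`.

    With length=None the whole file or device is read.
    """
    h = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            want = chunk_size if remaining is None else min(chunk_size, remaining)
            chunk = f.read(want)
            if not chunk:
                break  # end of device or file
            h.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return h.hexdigest()


def repeated_read_check(path, passes=3, length=None):
    """Read `path` several times and report whether all digests agree.

    Returns (all_equal, digests). A mismatch suggests that reads are
    unreliable, not necessarily that the stored data is damaged.
    """
    digests = [digest_of(path, length=length) for _ in range(passes)]
    return len(set(digests)) == 1, digests
```

The path to pass in is the device that holds the squashfs image; which
one that is must be looked up on the device itself (for example in
/proc/mtd). If the partition also contains a writable area, limiting
`length` to the size of the squashfs image keeps legitimate writes from
changing the digest between passes.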
> >
>
> I guess the reset line of the flash chip is not held long enough, so
> the chip is left in an unclean state. I think the reset duration during
> booting needs to be increased. But I don't know the code and can't point
> you there. It's just a guess...
>
> >
> >
> >
> > On Fri, May 21, 2021 at 6:16 PM Vincent Wiemann
> > <vincent.wiemann at ironai.com> wrote:
> >>
> >>
> >>
> >> On 5/21/21 3:58 PM, Koen Vandeputte wrote:
> >>>
> >>> On 21.05.21 13:19, Ibrahim Tachijian wrote:
> >>>> Hello,
> >>>>
> >>>> We use approximately 10k IPQ40XX devices and we have noticed that
> >>>> every time we run "sysupgrade -n" we lose approximately 1% of the
> >>>> routers in the process.
> >>>> After further investigation I'm almost confident that it is not the
> >>>> sysupgrade process that is the culprit - so what I did was that I put
> >>>> one test router into a reboot loop.
> >>>>
> >>>> This is what I do:
> >>>>
> >>>> Boot the router in a fresh state after a newly installed image.
> >>>> The image contains a reboot loop that consists of a shell script that
> >>>> runs every minute.
> >>>>
> >>>> The shell script tries to run a php-script which simply echoes "Hello
> >>>> World". If the php-script exits normally then we reboot the router.
> >>>>
> >>>> However, if the php-script exits abnormally then the router stops and
> >>>> does nothing other than informing me that there was a bus-error making
> >>>> php not able to process the hello world script.
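
The reboot-loop test described above can be sketched as follows. This is
an illustration in Python, not the original shell script, which is not
shown in this thread. The function names, the timeout, and the return
values "reboot" and "halt" are assumptions made for the example. The probe
and reboot commands are passed in by the caller.

```python
import subprocess


def probe_ok(cmd, timeout=30):
    """Run the probe command and return True only if it exits with status 0.

    A process killed by a signal (such as a bus error) has a negative
    return code in Python, so it counts as a failure here.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The command could not be started or did not finish in time.
        return False
    return result.returncode == 0


def loop_step(probe_cmd, reboot_cmd, report=print):
    """One iteration: reboot on success, stop and report on failure."""
    if probe_ok(probe_cmd):
        subprocess.run(reboot_cmd)
        return "reboot"
    report("probe failed, leaving the system up for inspection")
    return "halt"
```

Run from a once-per-minute scheduler, `loop_step` keeps rebooting a healthy
system and stops at the first boot in which the probe fails, which leaves
the faulty state available for inspection.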
> >>>>
> >>>> When this process runs the router reboots approximately 50 times
> >>>> before it boots into a state which is faulty where I see bus-errors
> >>>> when I try to run php scripts for example.
> >>>>
> >>>>
> >>>> Looking into dmesg you can see some errors such as,
> >>>>
> >>>> [10985.209438] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
> >>>> [11045.218685] SQUASHFS error: xz decompression failed, data probably corrupt
> >>>> [11045.218731] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
> >>>> [11105.228157] SQUASHFS error: xz decompression failed, data probably corrupt
> >>>> [11105.228203] SQUASHFS error: squashfs_read_data failed to read block 0x3a803e
> >>>>
> >>>> or
> >>>>
> >>>> [26218.687905] SQUASHFS error: Unable to read page, block 1b99a, size 10234
> >>>> [26221.057472] SQUASHFS error: Unable to read data cache entry [1b99a]
> >>>> [26221.057551] SQUASHFS error: Unable to read page, block 1b99a, size 10234
> >>>> [26221.062926] SQUASHFS error: Unable to read data cache entry [1b99a]
> >>>> [26221.069742] SQUASHFS error: Unable to read page, block 1b99a, size 10234
> >>>> [26224.460239] SQUASHFS error: Unable to read data cache entry [1b99a]
> >>>> [26224.460320] SQUASHFS error: Unable to read page, block 1b99a, size 10234
> >>>>
> >>>> or
> >>>>
> >>>> [62745.801178] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
> >>>> [62773.347234] SQUASHFS error: xz decompression failed, data probably corrupt
> >>>> [62773.347281] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
> >>>> [62790.132661] SQUASHFS error: xz decompression failed, data probably corrupt
> >>>> [62790.132706] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
> >>>> [62790.216746] SQUASHFS error: xz decompression failed, data probably corrupt
> >>>> [62790.216792] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
> >>>> [62800.810525] SQUASHFS error: xz decompression failed, data probably corrupt
> >>>> [62800.810570] SQUASHFS error: squashfs_read_data failed to read block 0x732ae2
> >>>> [62828.336267] SQUASHFS error: xz decompression failed, data probably corrupt
> >>>>
> >>>>
> >>>>
> >>>> Now, you would assume that the squashfs-partition is broken - but if
> >>>> this was the case then a reboot should not help. It does.
> >>>> Rebooting the router after it boots in this faulty state fixes the issue.
> >>>>
> >>>> So approximately 1-2% of my reboots make the router go into this
> >>>> faulty state.
> >>>>
> >>>> I am clueless on how to further investigate this issue. For now my
> >>>> workaround is restarting the router via a bash script should it
> >>>> notice there are bus-errors or i/o errors.
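
The detection half of such a workaround could look roughly like the Python
sketch below. It is not the bash script mentioned above, which is not shown
in this thread. It only recognises kernel log lines in the "SQUASHFS error"
format quoted earlier, one entry per line. The names `squashfs_errors` and
`looks_faulty` and the threshold parameter are assumptions made for this
example. How the log text is obtained and what action follows is left to
the caller.

```python
import re

# Matches lines such as:
# [11045.218685] SQUASHFS error: xz decompression failed, data probably corrupt
SQUASHFS_ERROR = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+SQUASHFS error: (?P<msg>.*)$"
)


def squashfs_errors(log_text):
    """Return a list of (timestamp, message) for each SQUASHFS error line."""
    errors = []
    for line in log_text.splitlines():
        match = SQUASHFS_ERROR.match(line.strip())
        if match:
            errors.append((float(match.group("ts")), match.group("msg")))
    return errors


def looks_faulty(log_text, threshold=1):
    """Return True if the log holds at least `threshold` SQUASHFS errors."""
    return len(squashfs_errors(log_text)) >= threshold
```

A periodic job could feed the current kernel log into `looks_faulty` and
trigger the restart only when it returns True. Raising the threshold makes
the check less sensitive to a single transient error.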
> >>>>
> >>>> Thanks
> >>>>
> >>> In the next kernel bump, the following patch is also present:
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.10.38&id=2ed1d90162a0c0683ecbe0c4802187fa22d641c3
> >>>
> >>>
> >>> I think it's worth a shot to retry the tests once it's bumped.
> >>>
> >>> Koen
> >>>
> >>
> >> My guess is that the error already happens when reading the flash.
> >> Is your firmware (sysupgrade) bigger than 16MB?
> >> So maybe it has to do with switching to 4-address-mode...
> >>
> >> Best,
> >>
> >> Vincent
> >>
> >> _______________________________________________
> >> openwrt-devel mailing list
> >> openwrt-devel at lists.openwrt.org
> >> https://lists.openwrt.org/mailman/listinfo/openwrt-devel
> >
> >
> >
> > --
> > Ibrahim Tachijian
> >
>
--
Ibrahim Tachijian