[OpenWrt-Devel] Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")

Richard Weinberger richard at nod.at
Fri Oct 19 10:45:53 EDT 2018


Rafał,

----- Original Message -----
> From: "Rafał Miłecki" <zajec5 at gmail.com>
> To: "Amir Goldstein" <amir73il at gmail.com>, "Miklos Szeredi" <miklos at szeredi.hu>, linux-unionfs at vger.kernel.org,
> linux-fsdevel at vger.kernel.org, "richard" <richard at nod.at>, "Artem Bityutskiy" <dedekind1 at gmail.com>, "Adrian Hunter"
> <adrian.hunter at intel.com>, linux-mtd at lists.infradead.org, "Russell Senior" <russell at personaltelco.net>, "OpenWrt
> Development List" <openwrt-devel at lists.openwrt.org>
> Sent: Friday, 19 October 2018 14:31:29
> Subject: Regression in handling power cuts since 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up")

> Hi,
> 
> Since OpenWrt switched from kernel 4.9 to 4.14, users have been randomly
> reporting file system corruption. OpenWrt uses overlay(fs) with
> squashfs as lowerdir and ubifs as upperdir. Russell managed to isolate
> and describe a test case that reproduces the corruption by doing a power
> cut after the first boot.
> 
> Interestingly, it cannot be reproduced on all devices (NAND dependent?
> arch dependent?!). I couldn't reproduce the problem on any of my
> Broadcom devices (ARM=y ARCH_BCM_5301X=y), so I had to buy a Ubiquiti
> EdgeRouter X (ER-X) (MIPS=y RALINK=y). I reproduced it there and
> bisected it down to the commit 3a1e819b4e80 ("ovl: store file handle of
> lower inode on copy up").
> 
> FWIW I was told it also affects:
> Asus RT-AC58U (ARCH_IPQ40XX=y)
> powerpc
> RB493G, DIR-860L (ATH79=y)
> 
> Steps to reproduce the problem:
> 1) Flash firmware
> 2) Boot (for the first time)
> 3) Let the init script copy config files from lowerdir to the upperdir
> 4) Wait for boot to finish
> 5) Verify content of some unmodified config on overlay, using either:
> hexdump -C /etc/config/dropbear
> hexdump -C /overlay/upper/etc/config/dropbear
> 6) Power cut & boot again
> 7) Check the content of the same file
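
The check in steps 5 and 7 can be scripted. A minimal sketch, assuming the corruption shows up as a file of the expected size that contains only 00 bytes, as described in the report; `is_all_zero` is a hypothetical helper name, not an existing tool:

```shell
# is_all_zero FILE
# Succeeds (exit status 0) if FILE is non-empty and every byte is 0x00.
# Fails for empty files, missing files, and files with any non-zero byte.
is_all_zero() {
    # -s: file exists and has a size greater than zero
    [ -s "$1" ] || return 1
    # Delete all NUL bytes and count what remains; 0 means only NULs
    [ "$(tr -d '\000' < "$1" | wc -c)" -eq 0 ]
}
```

Running `is_all_zero /overlay/upper/etc/config/dropbear && echo corrupted` before the power cut and again after the reboot would flag the zero-filled case without reading a hexdump by eye.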

Do you also have something I can test?
A C reproducer? An xfstest case?

> After the regressing commit above, the later check shows that the file
> size looks correct, but the file contains only 00 bytes.
> 
> Can I ask you to check if there is something possibly wrong with the
> above ovl commit? Or does it expose some problem with the ubifs? Or
> maybe the whole UBI?

Well, I fear it uncovers a problem in UBIFS. We have already had problems with overlayfs.
Did you bisect the problem, and are you sure that the said commit is the first bad commit?
 
> FWIW, testing the above commit (and the one before it) always results in
> a single error in the kernel log:
> [   14.250184] UBIFS error (ubi0:1 pid 637): ubifs_add_orphan: orphaned twice

Please show the full log.
The orphan thing rings a bell; we have had such a bug before.

Thanks,
//richard

_______________________________________________
openwrt-devel mailing list
openwrt-devel at lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel

