[PATCH] kernel: fix mtd partition erase<parent_erasesize writing to wrong address
Thibaut
hacks at slashdirt.org
Sat Sep 19 07:46:53 EDT 2020
Hi,
Thanks for the summary.
I’m however under the impression that two distinct things are being confused here: partial erase and 4K_LIMIT.
Specifically, prior to the recent (4.17) changes in the mtd subsystem, Mikrotik devices made use of two separate hacks:
1) the ability to set different EB sizes (e.g. 64K or 4K) for different mtd partitions (which AIUI https://lists.infradead.org/pipermail/linux-mtd/2020-August/081636.html is trying to bring back to life)
2) the ability to perform partial erases when e.g. a small (4KB) partition with 4K EB was overlapping the boundary of an adjacent 64K EB
As far as MikroTik devices are concerned, the only reason we need 1) & 2) above is to be able to edit the bootloader configuration which is stored in “soft_config”, a 4K partition (registered with 4K EB) that is typically adjacent to a larger, read-only, bootloader (using 64K EB).
However, the other issue that has arisen and the reason for the sysupgrade borkage is somewhat different: until said recent changes, Mikrotik (and others AIUI) devices used the CONFIG_MTD_SPI_NOR_USE_4K_SECTORS_LIMIT hack so that the small r/w partition (such as soft_config) properly used 4K EB, while the larger partitions (e.g. rootfs) kept using 64K EB (for performance reasons). That hack is also broken.
On Mikrotik devices, the sysupgrade failures are due to the fact that when 4K_SECTORS is turned on, 4K_SECTORS_LIMIT is ignored and so _all_ partitions end up using 4K EB. However rootfs is properly 64K-aligned and thus can (and should) use 64K EB. That's also what the build system expects, which results in jffs2 failure to recover sysupgrade saved data.
That second problem is AIUI addressed by John’s attempt to fix upstream so that mtd-spi-nor-attempt-4K-erase.patch works.
HTH,
Thibaut
> On 19 Sep 2020, at 13:09, Adrian Schmutzler <mail at adrianschmutzler.de> wrote:
>
> Hi,
>
>> -----Original Message-----
>> From: openwrt-devel [mailto:openwrt-devel-bounces at lists.openwrt.org]
>> On Behalf Of John Thomson
>> Sent: Mittwoch, 5. August 2020 23:14
>> To: openwrt-devel at lists.openwrt.org
>> Cc: John Thomson <git at johnthomson.fastmail.com.au>
>> Subject: [PATCH] kernel: fix mtd partition erase<parent_erasesize writing to
>> wrong address
>>
>> This bug applied where mtd partition end address, or erase start address,
>> was not cleanly divisible by parent mtd erasesize.
>>
>> This error would cause the bits following the end of the partition to the next
>> erasesize block boundary to be erased, and this partition-overflow data to be
>> written to the partition erase address (missing additional partition offset
>> address) of the mtd (top) parent device.
>
> We had a longer discussion about that on IRC yesterday, where we also discussed (stylistically) different solutions.
>
> As it appears, kernel did a major overhaul of mtdpart.c in a more recent major kernel version:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=46b5889cc2c54bac7d7e727a44d28a298df23cef
>
> So, any cosmetic improvement would probably be in vain anyway. Therefore, I decided to merge the patch in its current (tested) state.
>
> However, this essentially means we will meet the same problem again with the next kernel bump, and it won't become easier.
> In the discussion, jow had the idea to circumvent these problems and handle the non-aligned partitions in user-space. Maybe one can exploit the mtd-rw tool (available as kmod-mtd-rw in packages repo) for unlocking the relevant areas and then do everything else in user-space.
>
> If that seems suitable, we could use the merged patch for the 20.xx release, and then migrate master to a better solution when we have one.
>
> I put the discussion below in case somebody feels inspired to play around with that (first half is the general discussion, second half mentions the user-space solution).
>
> Best
>
> Adrian
>
> ---
>
> [...]
> 12:31 jow: adrianschmutzler: I took a look at the difference and in hindsight it seems to make a lot of sense
> 12:33 jow: adrianschmutzler: plain diff: (Link: https://pastebin.com/AqE89TXB)https://pastebin.com/AqE89TXB - code context: (Link: https://pastebin.com/Hn2ny4Lt)https://pastebin.com/Hn2ny4Lt
> 12:34 jow: adrianschmutzler: so the "instr->addr -= part->offset;" address manipulation is moved after the _write() which makes sense since we're supposed to both erase and (partially) re-write the same block
> 12:34 jow: (my second paste shows the old, not the patched code)
> 12:35 jow: the way the code is right now, it would erase the entire block partly occupied by a partition, then overwrite the first block of the mtd device
> 12:35 jow: this is my interpretation at least, I am not an mtd subsystem expert
> 12:38 adrianschmutzler: okay, thanks. will have a look at the older kernel now, as in the PR it is called "a merge bug in 4.19 & 5.4" ...
> 12:39 jow: yeah, my first thought was the same, probably a merge/patch rebase quirk
> 12:41 jow: tbh, I'd refactor the code while at it, directly substract instr->addr after part->parent->_erase(part->parent, instr); like it is done in the current broken code and explicitly pass instr->addr + part->offset and instr->addr + instr->len + part->offset respectively to part->parent->_write() to mirror what is done for mtd_read() a few further lines up
> 12:41 adrianschmutzler: unfortunately upstream also doesn't seem to be interested in commenting on the patches
> 12:43 jow: mtd_read() and part->parent->_write() take explicit offsets as arguments (part->offset + instr->addr + ...) while part->parent->_erase() uses the ->addr member from instr, hence the address manipulation
> 12:43 jow: and I'd revert the address to its original value as soon as possible and continue passing calculated offsets later
> 12:44 adrianschmutzler: jow: So, it's actually bad design after all?
> 12:45 jow: this would be my fix on top of the current broken code: (Link: https://pastebin.com/diff/EjUdXshw)https://pastebin.com/diff/EjUdXshw
> 12:45 adrianschmutzler: the need to increase/decrease instr->addr ...
> 12:47 adrianschmutzler: jow: okay. IIRC, they stirred around mtd in one of the recent kernel anyway, and also did some fairly similarly looking changes with offsets there as well
> 12:47 jow: I do not know enough about the struct erase_info and part->parent->_erase() semantics to understand why _erase() apparently uses instr->addr relative to the partition offset while part->parent->_write() expects an absolute offset relative to the mtd block
> 12:48 jow: that seems inconsitent to me, maybe the _erase() operates on a wrong offset as well
> 12:53 jow: adrianschmutzler: I took at the upstream code as well: (Link: https://lxr.missinglinkelectronics.com/linux/drivers/mtd/mtdpart.c#L197)https://lxr.missinglinkelectronics.com/linux/drivers/mtd/mtdpart.c#L197 and I think I got a better idea now. So the various part_*() functions expect an erase_info struct with partition-relative addresses, they then interally manipulate those addresses to be mtd-relative, invoke the lowlevel whole mtd ops, then revert the address back to their
> 12:53 jow: original partition-relative value
> 12:54 jow: so the originally proposed patch which simply moves the reversal (instr->addr -= part->offset;) directly before the `return ret;` is fine and matches the style of the kernel part_*() functions
> 12:58 jow: adrianschmutzler: just for completeness, and even better fix would be (Link: https://pastebin.com/diff/mFSb8Uqp)https://pastebin.com/diff/mFSb8Uqp
> 13:02 adrianschmutzler: jow: why is the additional mtd_to_part(mtd) superior?
> 13:03 jow: it should be optimized away by the compiler, the code is superior because the intention is clearer
> 13:03 jow: likewise the mtd_read() at the beginning could be changed to part_read() and the `part->offset + ` removed in each invocation
> 13:04 jow: this way the only "address fiddling" that remains is in the original function body of part_erase()
> 13:04 jow: (we can't obviously call part_erase() from within part_erase() since that'd recurse forever)
> 13:05 jow: but this is all just a matter of style and taste, not sure what the subsystem maintainers would prefer
> 13:06 jow: I guess however that this partial erase will likely will never go upstream
> 13:06 adrianschmutzler: okay, I see your point now
> 13:06 adrianschmutzler: probably that's why they remained silent on the other issues
> 13:07 jow: I actually wonder why this can't be done in userspace
> 13:08 jow: or in wahtever code using the mtd api
> 13:08 jow: after all it is just a matter of reading an entire block, holding it in memory, erasing the entire block and then simply partially write back the parts we did not want to erase
> 13:09 jow: the only obstacle would be the write protection for non-eb aligned parts I guess
> 13:10 adrianschmutzler: no idea; but I will throw that idea towards the affected people, maybe they want to go into that direction
> 13:11 jow: in the meanwhile I'd say we should apply the proposed fix
> 13:11 jow: it makes sense
> 13:11 jow: and its not the submitters fault that the orginal code is not beautiful to begin with
> 13:11 adrianschmutzler: sysupgrade won't be a problem for a user-space solution, as other stuff there happens in user-space as well?
> 13:12 jow: if I remember correctly, that partial erase thing was introduced years ago to support sysupgrade on Redboot systems
> 13:13 adrianschmutzler: that's one of currently broken systems and the patch fixes them apparently
> 13:13 jow: there we wanted to write accross the FIS table dictated kernel/rootfs boundary using the mtd util and nbd implemented a kernel patch and corresponding functionality in mtd to do exactly that
> 13:13 jow: and I believe one of (the?) problem was that the kernel mtd subsystem write-protects all non-aligned partitions by default
> 13:14 adrianschmutzler: but that sounds more like the subsequent patch:
> 13:14 adrianschmutzler: 412-mtd-partial_eraseblock_unlock.patch
> 13:15 adrianschmutzler: which states it's for ixp4xx only, which probably isn't true anymore ...
> 13:16 jow: adrianschmutzler: I think both is needed
> 13:19 jow: adrianschmutzler: I think the partial erase was needed to rewrite the FIS partition while keeping the Redboot config (often both share one eraseblock)
> 13:20 adrianschmutzler: okay, then since this is tested and waiting some time already, and since part_write is removed with newer kernel anyway, I merge the existing patch.
> 13:20 jow: adrianschmutzler: and the partial erase unlock patch covers cases where the partition you manipulate is locked because its not spanning an entire eb
> 13:21 jow: at least this is my understanding
> 13:22 adrianschmutzler: jow: okay. but then the 412 patch is probably relevant to all devices with redboot partitions and the ixp4xx references should be removed from the description ...
> 13:22 jow: or rather, it allows unlocking non-aligned parts by simply tweaking the flash area begin and end offsets to start at the previous eb boundary and end on the next
> 13:22 jow: so you're not only unlocking the target area but a little bit of stuff before and after too
> 13:23 jow: otherwise the mtd subsystem would reject the unlock
> 13:24 adrianschmutzler: so, it's effectively a hack
> 13:24 jow: yes
> 13:24 adrianschmutzler: btw just found that Rafał wanted to remove these scripts already in March: (Link: https://patchwork.ozlabs.org/project/openwrt/patch/20200309114302.14383-1-zajec5@gmail.com/)https://patchwork.ozlabs.org/project/openwrt/patch/20200309114302.14383-1-zajec5@gmail.com/
> 13:25 adrianschmutzler: scripts->patches
> 13:26 jow: I think that if we somehow have a mechanism to forcibly unlock all flash areas (which could be done by a much leaner and more isolated patch I suppose) we could handle all the other edge cases in userspace
> 13:26 jow: by disregarding any partitioning and simply treating the entire mtd as continuous block device with offsets
> 13:27 jow: like you would e.g. dd an x86 image to the blockdev
> 13:28 jow: ah, just remebered this: (Link: https://github.com/jclehner/mtd-rw)https://github.com/jclehner/mtd-rw
> 13:28 jow: something like this could be a solution for sysupgrade
> 13:28 jow: not sure if its still compatible to 5.4 but if yes the we wouldn't even need a patch at all
> 13:29 adrianschmutzler: mtd-rw is in packages repo and works fine AFAIK
> 13:29 jow: a slightly less dangerous solution would be a kernel module or some other facility which would allow us to create mtd partitions on demand, not sure if something like this is possible
> 13:30 jow: one could thne simply "spwan" a new mtd partition for the flash area relevant for sysupgrade, then "mtd write" in there
> 13:30 adrianschmutzler: I just hesitate to include and enable a tool like that by default
> 13:30 jow: while "masking" other sensible parts like bootloader etc. to avoid bricks if something goes wrong
> 13:33 jow: hm well, yeah I understand your concerns
> 13:33 jow: but on x86 for example you can also happily dd over your entire disk
> 13:33 jow: granted, it is way easier to recover, but still...
> 13:34 adrianschmutzler: I will push that forward to the Mikrotik people anyway, as they seem to be the most affected at the moment. Maybe they have time to play around with idea going into that direction
> 13:34 adrianschmutzler: "Mikrotik people" -> contributors working with Mikrotik devices, not the vendor of course
> 13:37 jow: I could imagine that adding mtd-rw by defualt and adding some kind of sysctl to trigger it would be good enough
> 13:37 adrianschmutzler: together with Tomasz on the Redboot front there should be enough people interested
> 13:37 adrianschmutzler: hmm, I really have to think about that
> 13:37 jow: that sysctl could then be toggled by "mtd unlock" which never actually worked for me when I needed it
> 13:38 jow: or some other equivalent commend to not break existing "mtd unlock" users
> 13:38 jow: *command
> 13:38 adrianschmutzler: I fear this may end up in a lot of scripts then, where people will use it to take shortcuts ...
> 13:39 adrianschmutzler: but if it solves the problem, we should consider it
> 13:40 adrianschmutzler: because maintaining these patches won't become easier
> 13:41 jow: I agree
> 14:12 rmilecki: adrianschmutzler: jow: omg, that parial erase size crap again?
> 14:13 jow: rmilecki: we've been discussing its original intnetion and possible, non-invasive alternatives
> 14:14 jow: rmilecki: current working idea is to ship some kind of mtd unlock mechanism and handling the overlapping write/erase things in userspace
> 14:15 jow: rmilecki: (Link: https://github.com/jclehner/mtd-rw)https://github.com/jclehner/mtd-rw plus some clever scripting should already achive that
> 14:15 adrianschmutzler: rmilecki: the problem is that this partial erase stuff is needed for the redboot devices and all of the mikrotik ones as well (which currently all have "broken" sysupgrade)
> 14:16 rmilecki: the problem is it exists in a current form
> 14:16 rmilecki: afair it enables those partial wries for all ops, without checking if access is aligned or not
> 14:18 rmilecki: adrianschmutzler: i see jow posted few improvement ideas, do you want to implement them? if so, as part of submitted patch? or on top of that (later)?
> 14:21 adrianschmutzler: rmilecki: most of jow's improvement were about style. however, in (Link: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=46b5889cc2c54bac7d7e727a44d28a298df23cef)https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=46b5889cc2c54bac7d7e727a44d28a298df23cef kernel changes partition handling anyway
> 14:21 rmilecki: oh, that one is a hure change
> 14:21 adrianschmutzler: so, I don't think it's worth to care too much about style when we have to do a different implementation for next stable kernel anyway.
> 14:21 rmilecki: it's in some recent kernel though, right?
> 14:21 adrianschmutzler: and the existing patch is tested and known to be working on the affected devices
> 14:22 rmilecki: adrianschmutzler: ok, i agree, let's just make it work
> 14:22 adrianschmutzler: haven't checked explicitly, but somewhere after 5.4 of course
More information about the openwrt-devel
mailing list