[OpenWrt-Devel] Firmware loading in user space via hotplug (procd) is racy, radios don’t come up at first boot

Mark Mentovai mark at moxienet.com
Mon Jan 14 13:38:49 EST 2019


User-space firmware loading is handled by hotplug in procd. It’s directed
by /etc/hotplug.json. Paraphrasing:

[
  [ "case", "ACTION", {
    "add": [
      [ "if",
        [ "has", "FIRMWARE" ],
        [
          [ "exec", "/sbin/hotplug-call", "%SUBSYSTEM%" ],
          [ "load-firmware", "/lib/firmware" ],
          [ "return" ]
        ]
      ]
    ]
  } ]
]

hotplug-call is responsible for locating the correct firmware and placing
it in /lib/firmware. load-firmware instructs hotplug to feed the firmware
from /lib/firmware to the kernel via the appropriate protocol.

There’s no synchronization between the hotplug-call exec and load-firmware
actions, so they’re effectively done in parallel. If hotplug-call takes any
significant time, load-firmware runs before the firmware has been placed in
/lib/firmware and fails to find it.

I discovered this in an ath79 build for Netgear WNDR3x00 (3700, 3700v2,
3800) devices. At first boot following a sysupgrade, the radios don’t come
up, although they’re functional on subsequent boots. Calibration data for
the radios is requested by owl-loader via the firmware mechanism.
/etc/hotplug.d/firmware/10-ath9k-eeprom is able to pull it out of the art
MTD partition, but this is fairly time-consuming on first boot, taking
around 15 seconds of wall clock time. 10-ath9k-eeprom does eventually place
a file in /lib/firmware where it’s usable on subsequent boots, but this
isn’t finished until well after the load-firmware action runs (and doesn’t
find any firmware).

As a local workaround, for now, I’ve gotten rid of the load-firmware action
and replaced hotplug-call with another script that runs hotplug-call and
then, in sequence, follows the firmware protocol itself. This works well
(although it’s still somewhat troubling that it takes 15 seconds per radio
to read less than 4 kB each from art), but it strikes me that it just
papers over a procd bug.
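
A minimal sketch of what such a wrapper could look like, for illustration
only. It assumes the kernel's sysfs firmware fallback interface (the
loading and data files under /sys/$DEVPATH) and the DEVPATH and FIRMWARE
variables of a firmware hotplug event. The function name
firmware_sync_load and the overridable HOTPLUG_CALL, FW_DIR and SYSFS_ROOT
variables are hypothetical, added so the sketch can be tried off-device;
this is not the script actually used.

```shell
#!/bin/sh
# Hypothetical wrapper to run in place of the hotplug-call exec.
# The overridable paths below are assumptions for off-device testing.
: "${HOTPLUG_CALL:=/sbin/hotplug-call}"
: "${FW_DIR:=/lib/firmware}"
: "${SYSFS_ROOT:=/sys}"

# Run the hotplug scripts to completion, then feed the firmware to the
# kernel through the sysfs loading/data files. Expects DEVPATH and
# FIRMWARE in the environment, as for a firmware hotplug event.
firmware_sync_load() {
	fsl_subsystem="$1"
	fsl_dev="$SYSFS_ROOT$DEVPATH"

	# Returns only after every hotplug script has finished, so the
	# firmware file is complete (or definitely absent) afterwards.
	"$HOTPLUG_CALL" "$fsl_subsystem"

	if [ ! -f "$FW_DIR/$FIRMWARE" ]; then
		# Abort the request instead of letting the kernel time out.
		echo -1 > "$fsl_dev/loading"
		return 1
	fi

	echo 1 > "$fsl_dev/loading"
	cat "$FW_DIR/$FIRMWARE" > "$fsl_dev/data"
	echo 0 > "$fsl_dev/loading"
}
```

On a device this would be invoked as firmware_sync_load "$1" from a script
that /etc/hotplug.json execs with %SUBSYSTEM% as its argument, with the
load-firmware action removed.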

Is there any reason that procd doesn’t waitpid() for hotplug-call before
advancing to load-firmware? It seems like a problem that load-firmware
doesn’t wait for an exec that precedes it. As it stands now, it’s likely
that when load-firmware runs, firmware either won’t be present or will only
have been partially written to /lib/firmware.
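
The difference between the two orderings can be reproduced outside procd
with ordinary shell job control. This is an illustration of the race, not
procd's code: slow_helper stands in for a long-running hotplug-call,
try_load stands in for load-firmware, and all four function names are
hypothetical.

```shell
#!/bin/sh
# Stand-in for a slow hotplug-call: takes a while, then writes the file.
slow_helper() {
	sleep 1
	printf 'firmware' > "$1"
}

# Stand-in for load-firmware: succeeds only if the file is already
# present and non-empty.
try_load() {
	[ -s "$1" ]
}

# Behavior as described above: start the helper and move on at once.
load_without_wait() {
	slow_helper "$1" &
	try_load "$1"
}

# Behavior being asked for: reap the helper first (the shell analogue
# of waitpid()), then look for the file.
load_with_wait() {
	slow_helper "$1" &
	wait "$!"
	try_load "$1"
}
```

Called on a path that doesn’t exist yet, load_without_wait returns
non-zero because the check runs while the helper is still sleeping, and
load_with_wait returns zero.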

ath79 relies much more heavily on the firmware loading mechanism for
calibration data than ar71xx did, which handled it very differently. I
expect that this affects a number of other ath79 devices, as well as
devices on other platforms that rely on hotplug-call to locate firmware.

Mark