[OpenWrt-Devel] Sysupgrade possibly broken in recent development snapshots: "message": "Firmware image couldn't be validated"

Hannu Nyman hannu.nyman at iki.fi
Thu Jan 2 10:48:08 EST 2020

Petr Štetiar wrote on 1.1.2020 at 22.46:
> Petr Novák <petrn at me.com> [2020-01-01 21:11:30]:
>> But how come the workaround was to use an older libubox and ubus - was there
>> any new check which was not there before?
> I don't have a definitive answer, as I would need RPi-4 (or any other real
> hardware with Cortex-A72 core) to find the actual bit in the libubox which
> caused this change in the behavior, but here is a part of the commit
> description[1] which might help answering that:
>   It seems like the recent fixes in the libubox library, particularly in
>   the jshn sub-component (which empowers json_dump used in the shell
>   script executed by the child process) made the execution somehow faster,
>   thus exposing this racy behaviour in the validate_firmware_image_call at
>   least on RPi-4 (Cortex-A72) target.
> As I was unable to trigger this issue even in QEMU/Cortex-A72, I assume
> that it was simply some kind of race that needed specific timing, provided
> precisely only by that RPi-4 hardware.

I think there may be an older race condition that has only now surfaced
more visibly on the RPi4 after the recent changes. It has manifested itself
earlier on some routers, but more rarely.

I have seen occasional sysupgrade failures on one of my routers since
October (ar71xx or ath79 / WNDR3700v2). I wrote about that to the mailing
list in November, although at the time I thought that it might just be a
failure of the "force" option:

Others have seen it too, based on forum discussion:

Petr Novák describes something similar to my error: "it does just reboot but
does not flash anything."

I have tried to debug this on my WNDR3800, which has a serial console
connection, but I have not managed to reproduce the error there: on the 3800
sysupgrade has always succeeded. However, on my 3700v2 (which has identical
hardware except for the RAM size) on the other side of the building, I still
occasionally see a LuCI-based sysupgrade start ok, but the router then boots
back into the same firmware after an invisible error. After that reboot, the
next sysupgrade attempt via LuCI usually works fine. (It sounds like a
sysupgrade from a recently booted system usually works, but sysupgrading a
system after some runtime sometimes does not.)

I first thought that it was related to using force in the ar71xx/ath79 jump,
but it has occurred in normal sysupgrades too.

This is possibly a manifestation of the same race condition in
sysupgrade/procd/libubox, so hopefully your patches will fix that as well.

