Routers go 'rogue' randomly with recent trunk build
Leon Busch-George
leon at georgemail.eu
Tue Sep 23 12:04:12 PDT 2025
It looks as if the underlying problem is a memory leak that just spontaneously started to occur in existing (untouched for years) code linking against ubus, ubox, json-c, and openssl.
Sadly, there also appears to be a problem with musl+valgrind that makes it a lot harder to track down the cause. Everything I tried to run under valgrind segfaults before main() is entered. It looks to me as if a strcpy corrupts valgrind's memory. This is what it looks like:
$ valgrind --leak-check=full --track-origins=yes busybox echo
==24829== Memcheck, a memory error detector
==24829== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==24829== Using Valgrind-3.23.0 and LibVEX; rerun with -h for copyright info
==24829== Command: /bin/sh
==24829==
==24829== Conditional jump or move depends on uninitialised value(s)
==24829== at 0x408CBAC: stpcpy (in /lib/libc.so)
==24829== by 0x408D054: strcpy (in /lib/libc.so)
==24829== Uninitialised value was created by a stack allocation
==24829== at 0x40A1FF8: ??? (in /lib/libc.so)
==24829==
==24829==
==24829== Process terminating with default action of signal 4 (SIGILL)
==24829== Illegal opcode at address 0x4097F1
==24829== at 0x4097F1: ??? (in /bin/busybox)
==24829== by 0x4025CDC: ??? (in /lib/libc.so)
The memory at that address is:
00000000 63 7f 67 07 d3 08 d4 09 d5 23 b4 00 1a 65 24 00 |c.g......#...e$.|
00000010 65 09 92 04 |e...|
No idea whether this memory corruption also occurs without valgrind running (why wouldn't it?).
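One way to check for the suspected leak without valgrind would be to sample the resident set size of the affected process over time. The following is only a minimal sketch, assuming a Linux /proc filesystem and a Python interpreter on the device or on a host that can read the same file. The function names, the interval and the sample count are made up for illustration and are not part of any OpenWrt tool.

```python
import time


def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t    1234 kB".
            return int(line.split()[1])
    # Kernel threads have no VmRSS line.
    return None


def read_vmrss_kib(pid):
    """Read the current resident set size of a process in KiB."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kib(f.read())


def watch(pid, interval_s=60, samples=10):
    """Print timestamped RSS samples (hypothetical helper, defaults are arbitrary)."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), read_vmrss_kib(pid), "KiB")
        time.sleep(interval_s)
```

Calling watch() with the PID of the suspected daemon prints one sample per interval. Steady growth while the process is idle would be consistent with a leak, although it says nothing about where the leak is.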
I'm trying to let this rest for a few days.
On Mon, 22 Sep 2025 20:29:42 +0200
Leon Busch-George <leon at georgemail.eu> wrote:
> Hi,
>
> I've noticed that, in the week after updating to 125c974bf7ee, a
> sizeable number of routers have stopped working, seemingly at random
> times of day.
>
> Addressing is (still?) intact but connections to the router don't
> work. In this state, WiFi is broken as well (bridged to ethernet).
> Interestingly, the behaviour is slightly different on ath79/ramips:
>
> On ath79, a TCP connection to SSH is accepted. All traffic is ACK'ed
> but not answered by dropbear (.pcap attached). On ramips, the
> connection is just refused.
>
> There's a daemon sending out broadcast probes at a regular interval.
> At least 'hopefully' there is. That traffic also can't be seen from
> the outside.
>
> Is anyone else seeing something like this?
>
> kind regards,
> Leon