Mesh bug: rough routes causing blackhole
Federico Capoano
f.capoano at openwisp.io
Mon Sep 27 16:35:02 PDT 2021
Hi everyone,
I am quite confident there is a bug affecting mesh mode when mesh
interfaces are bridged with the LAN.
I have been seeing this bug since 21.02 rc1, but it also happens on
the latest master of OpenWrt (today) and the latest master of hostapd.
Somehow the MAC addresses of devices connected to the LAN end up in
the routing table of the mesh interfaces of the root node (the one
which has the internet connection) with an invalid next hop.
If this goes on unchecked, after a tipping point I have not been able
to identify, packets are not routed to the root node anymore, even
though the mesh link is up and running.
It took me months to understand in detail what was going on, because
no error or warning is logged to syslog. The only symptom I have
been able to observe consistently is the routing table of the mesh
filling up with routes that look like garbage.
For example:
iw mesh1 mpath dump
DEST ADDR NEXT HOP IFACE SN METRIC QLEN EXPTIME DTIM DRET FLAGS HOP_COUNT PATH_CHANGE
90:38:3d:dd:42:61 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
3a:bd:b8:ec:00:da 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
2e:bc:05:55:1b:fe 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
b6:75:f9:d4:75:21 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
2a:81:a7:88:b2:3d 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
38:cb:3b:46:a3:b0 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
1c:48:1f:b9:ff:2f 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
a4:e1:c5:1d:0b:67 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
0c:33:7e:b6:6f:aa 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
e4:fa:0b:af:d5:eb 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
32:e7:70:d6:00:b8 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
16:90:cc:07:c7:f4 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
f8:71:4d:52:77:92 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
0e:7e:0d:96:51:37 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
9a:bb:4e:36:2a:bc 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
a4:79:30:19:69:3c 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
[...cut...(there were more)...]
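To make entries like the ones above easier to spot and count over time, here is a minimal Python sketch that scans the text of an mpath dump and lists the destinations whose next hop is all zeros. The function name find_zero_next_hop and the SAMPLE constant are made up for illustration, and the sketch assumes the whitespace-separated layout shown above, with the destination in the first field and the next hop in the second. It only flags entries; it does not explain why they appear.

```python
import re

# Matches a colon-separated MAC address such as 90:38:3d:dd:42:61.
MAC_RE = re.compile(r"^(?:[0-9a-f]{2}:){5}[0-9a-f]{2}$", re.IGNORECASE)
ZERO_MAC = "00:00:00:00:00:00"


def find_zero_next_hop(dump_text):
    """Return destination MACs whose next hop is all zeros.

    Assumes each path entry is one whitespace-separated line with the
    destination address first and the next hop second.
    """
    suspects = []
    for line in dump_text.splitlines():
        fields = line.split()
        # Skip the header and anything else that is not a path entry.
        if len(fields) < 2 or not MAC_RE.match(fields[0]):
            continue
        if fields[1] == ZERO_MAC:
            suspects.append(fields[0].lower())
    return suspects


# Two entries copied from the dump above, preceded by a shortened header.
SAMPLE = """\
DEST ADDR NEXT HOP IFACE
90:38:3d:dd:42:61 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
3a:bd:b8:ec:00:da 00:00:00:00:00:00 mesh1 0 0 0 0 1600 4 0x0 0 0
"""
```

One way to use it would be to save the output of "iw mesh1 mpath dump" to a file at regular intervals and pass each file's contents to find_zero_next_hop, so the growth in the number of such entries can be tracked.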
I am not the only one affected by this; there's a detailed discussion here:
https://forum.openwrt.org/t/mesh-802-11s-routing-table-gets-filled-with-garbage-causing-a-black-hole-openwrt-21-02-rc4-mt7603e-mt7615e/104808
I would be very grateful if anyone could advise how to collect more
detailed data that would help me identify where the bug originates,
so that I can send a proper bug report and keep investigating to
nail down what's going on (it's incomprehensible to me why something
like this is happening).
Best regards
Federico Capoano