nslookup failures with coarse CLOCK_REALTIME

Uwe Kleine-König uwe+openwrt at kleine-koenig.org
Fri Oct 7 16:04:25 PDT 2022


Hello,

on a TP-Link RE200 v1 (platform = ramips/mt7620) I often see the following:

  root@ares:~# nslookup www.openwrt.org
  Server:		127.0.0.1
  Address:	127.0.0.1:53

  Non-authoritative answer:
  www.openwrt.org	canonical name = wiki-01.infra.openwrt.org
  Name:	wiki-01.infra.openwrt.org
  Address: 2a03:b0c0:3:d0::1af1:1

  *** Can't find www.openwrt.org: No answer

I narrowed the problem down to the following:

nslookup creates and sends two queries (for A and AAAA) using 
res_mkquery(). Each query has a more or less random ID, and nslookup 
uses these IDs to match the received responses to the sent queries.

Looking at the sent queries using tcpdump, I saw that in the above 
scenario the two IDs are identical. Then nslookup matches the first 
received answer to the first query and discards the second reply, as 
it's matched to the already handled first query, too.

In a few cases where both lookups succeed, I saw the following pairs of IDs:

17372 37373
40961 60961
45955 419
47302 1766

Musl does the following to create the 16-bit ID:

          /* Make a reasonably unpredictable id */
          clock_gettime(CLOCK_REALTIME, &ts);
          id = ts.tv_nsec + ts.tv_nsec/65536UL & 0xffff;
          q[0] = id/256;
          q[1] = id;

(from musl's src/network/res_mkquery.c). My hypothesis is that 
CLOCK_REALTIME has a resolution of only 20 µs on this device. So if the 
two res_mkquery() calls happen within the same 20 µs tick, the IDs end up 
identical. If they happen in two consecutive ticks, the IDs differ by 
20000 or 20001 (modulo 65536), which matches the four cases observed above.

To improve the situation I suggest something like:

diff --git a/src/network/res_mkquery.c b/src/network/res_mkquery.c
index 614bf7864b48..78b3095fe959 100644
--- a/src/network/res_mkquery.c
+++ b/src/network/res_mkquery.c
@@ -11,6 +11,7 @@ int __res_mkquery(int op, const char *dname, int class, int type,
         struct timespec ts;
         size_t l = strnlen(dname, 255);
         int n;
+       static unsigned int querycnt;

         if (l && dname[l-1]=='.') l--;
         if (l && dname[l-1]=='.') return -1;
@@ -34,6 +35,8 @@ int __res_mkquery(int op, const char *dname, int class, int type,

         /* Make a reasonably unpredictable id */
         clock_gettime(CLOCK_REALTIME, &ts);
+       /* force a different ID if mkquery was called twice during the same tick */
+       ts.tv_nsec += querycnt++;
         id = ts.tv_nsec + ts.tv_nsec/65536UL & 0xffff;
         q[0] = id/256;
         q[1] = id;

Would that make sense?

Note I'm not subscribed to the musl mailing list, so please Cc: me on 
replies.

Best regards
Uwe


