[OpenWrt-Devel] latency on PPPoA ADSL Annex A on using Lantiq

Leonid Evdokimov leon at darkk.net.ru
Sun Aug 7 05:33:00 EDT 2016


On Wed, Aug 3, 2016 at 10:02 AM, Martin Blumenstingl
<martin.blumenstingl at googlemail.com> wrote:
> I did not have time to look into this yet, but maybe Hauke or John
> have an idea what's going on.

I have a testcase, maybe it'll be useful to debug the issue.

brie.darkk.net.ru (139.162.172.21) is server hosted in London.
2.30.18.250 is ADSL router in UK.

Non-empty TCP packets are delayed till TCP RTO timer fires starting
with the third datagram.
I get following RTT estimates using tcp-echo, different `--delay`
between packets does not change the result:

brie $ ./mturtt --host 2.30.18.250 --port 4242 --count 10 --tcp --size
1400 --delay 1
sz      seq rtt     there   back
1400    1   0.036   0.006   0.030
1400    2   0.036   0.006   0.030
1400    3   0.261   0.006   0.255 # woops! plus TCP RTO
1400    4   0.263   0.007   0.256
1400    5   0.264   0.006   0.258
1400    6   0.265   0.007   0.258
1400    7   0.266   0.006   0.260
1400    8   0.267   0.006   0.260
1400    9   0.266   0.006   0.260
1400    10  0.265   0.006   0.259

Running busy-loop at the router with `ss  -4t 'sport = 4242' -ieom`
reports following timer inforation:

ESTAB   0   0      2.30.18.250:4242   139.162.172.21:53328   ino:42363
sk:80f19ae0 rto:0.23 ato:0.04 qack:15 bidir cwnd:10 reordering:0
ESTAB   0   0      2.30.18.250:4242   139.162.172.21:53328   ino:42363
sk:80f19ae0 rto:0.23 ato:0.04 qack:15 bidir cwnd:10 reordering:0
ESTAB   0   0      2.30.18.250:4242   139.162.172.21:53328   ino:42363
sk:80f19ae0 rto:0.23 ato:0.04 qack:15 bidir cwnd:10 reordering:0·
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,230ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,220ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,200ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,190ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,180ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,160ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,150ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,140ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,130ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,110ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,100ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,090ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,080ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,060ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,050ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,040ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,020ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(persist,010ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04
qack:16 cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(on,230ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:15
bidir cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(on,220ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:15
bidir cwnd:10 reordering:0
ESTAB   0   1400   2.30.18.250:4242   139.162.172.21:53328
timer:(on,210ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:15
bidir cwnd:10 reordering:0
ESTAB   0   0      2.30.18.250:4242   139.162.172.21:53328   ino:42363
sk:80f19ae0 rto:0.23 ato:0.04 qack:15 bidir cwnd:10 reordering:0

Same behavior is observed with small payload (50 bytes).

Looking at tcpdump[1] recorded at the router:

00:00:00.000000 IP brie.darkk.net.ru.52566 > 2.30.18.250.4242: Flags
[S], seq 2422862909, win 29200, options [mss 1460,sackOK,TS val
1141259943 ecr 0,nop,wscale 7], length 0
00:00:00.000314 IP 2.30.18.250.4242 > brie.darkk.net.ru.52566: Flags
[S.], seq 4132184401, ack 2422862910, win 28800, options [mss
1452,sackOK,TS val 3539818 ecr 1141259943,nop,wscale 3], length 0
00:00:00.022541 IP brie.darkk.net.ru.52566 > 2.30.18.250.4242: Flags
[.], ack 1, win 229, options [nop,nop,TS val 1141259950 ecr 3539818],
length 0
00:00:00.022999 IP brie.darkk.net.ru.52566 > 2.30.18.250.4242: Flags
[P.], seq 1:51, ack 1, win 229, options [nop,nop,TS val 1141259950 ecr
3539818], length 50
00:00:00.023192 IP 2.30.18.250.4242 > brie.darkk.net.ru.52566: Flags
[.], ack 51, win 3600, options [nop,nop,TS val 3539824 ecr
1141259950], length 0
00:00:00.029527 IP 2.30.18.250.4242 > brie.darkk.net.ru.52566: Flags
[P.], seq 1:51, ack 51, win 3600, options [nop,nop,TS val 3539826 ecr
1141259950], length 50 #### ~6ms to echo
00:00:00.052116 IP brie.darkk.net.ru.52566 > 2.30.18.250.4242: Flags
[.], ack 51, win 229, options [nop,nop,TS val 1141259959 ecr 3539826],
length 0
00:00:00.553865 IP brie.darkk.net.ru.52566 > 2.30.18.250.4242: Flags
[P.], seq 51:101, ack 51, win 229, options [nop,nop,TS val 1141260110
ecr 3539826], length 50
00:00:00.556725 IP 2.30.18.250.4242 > brie.darkk.net.ru.52566: Flags
[P.], seq 51:101, ack 101, win 3600, options [nop,nop,TS val 3539957
ecr 1141260110], length 50 #### ~3ms
00:00:00.579130 IP brie.darkk.net.ru.52566 > 2.30.18.250.4242: Flags
[.], ack 101, win 229, options [nop,nop,TS val 1141260117 ecr
3539957], length 0
00:00:01.080645 IP brie.darkk.net.ru.52566 > 2.30.18.250.4242: Flags
[P.], seq 101:151, ack 101, win 229, options [nop,nop,TS val
1141260267 ecr 3539957], length 50
00:00:01.120859 IP 2.30.18.250.4242 > brie.darkk.net.ru.52566: Flags
[.], ack 151, win 3600, options [nop,nop,TS val 3540099 ecr
1141260267], length 0
00:00:01.304887 IP 2.30.18.250.4242 > brie.darkk.net.ru.52566: Flags
[P.], seq 101:151, ack 151, win 3600, options [nop,nop,TS val 3540145
ecr 1141260267], length 50 #### ~224ms!
00:00:01.327310 IP brie.darkk.net.ru.52566 > 2.30.18.250.4242: Flags
[.], ack 151, win 229, options [nop,nop,TS val 1141260341 ecr
3540145], length 0
...

`strace` shows no evidence[2] that it's userspace delay, the echo
server gets the data promptly and responds within couple of
milliseconds.
TCP_NODELAY is turned on, TCP_CORK guard is added to tcp server code
to get clear timestamps in strace.

[1] https://gist.github.com/darkk/6b7f3127c1c343a1ccb8e0b59a44f5bf#file-trace-2-pcap-txt
[2] https://gist.github.com/darkk/6b7f3127c1c343a1ccb8e0b59a44f5bf#file-trace-2-strace

tc qdisc is default one and does not report drops, moreover, changing
qdisc to plain fifo does not help:

qdisc fq_codel 0: dev pppoa-wan root refcnt 2 limit 1024p flows 1024
quantum 1496 target 5.0ms interval 100.0ms ecn
 Sent 10859227 bytes 18757 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

I've also run testcase using the link with different RTT (thus
different RTO), got same results — the third and following packets are
delayed by the router for one RTO so it does not look like hardcoded
Nagle's delay. It looks like TCP retransmission, but the first packet
is never logged. It's not captured by `tcpdump` or corresponding
`iptables -j LOG` rule. The kernel has no /proc/net/snmp, so I can't
tell if the RetransSegs counter is bumped.

You can get the tools I use to reproduce the issue at
https://gist.github.com/darkk/6b7f3127c1c343a1ccb8e0b59a44f5bf

UDP and ICMP are not affected. I've also seen some issues in IP
fragmentation/reassembly of over-MTU UDP datagrams, but they were not
reproducible with another ISP.

Versions:
MOTD: DESIGNATED DRIVER (Bleeding Edge, r49162)
uname: Linux OpenWrt 4.4.6 #1 Fri May 13 12:56:05 UTC 2016 mips GNU/Linux

-- 
WBRBW, Leonid Evdokimov
xmpp:leon at darkk.net.ru && http://darkk.net.ru && tel:+79816800702
_______________________________________________
openwrt-devel mailing list
openwrt-devel at lists.openwrt.org
https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel


More information about the openwrt-devel mailing list