[OpenWrt-Devel] [PATCH] Fix 'Dropping frame due to full tx queue' for Ralink wifi get stuck.

N.Leiten nickleiten at gmail.com
Fri Sep 11 08:43:21 EDT 2015


In email dated Пятница - 11 сентября 2015 13:49:26 user Felix Fietkau wrote:
> On 2015-09-11 13:33, N.Leiten wrote:
> > Fix instability of Ralink WiFi general queue management on high load.
> > rt2x00 driver logs in dmesg "Dropping frame due to full queue ..." several times and at some point get stuck.
> > 
> > Solutions in patch:
> > 1) Increasing number of frames in each TX queue helps with speed and
> > decreases queue overflows. Actually 256 frames can be increased to
> > 512 (this number of frames used in proprietary drivers for every
> > queue).
> 512 frames seems to be overly excessive. Ever heard of bufferbloat?
> It seems to me that the driver should simply call ieee80211_stop_queue
> earlier to ensure that mac80211 does not attempt to fill the queues as
> much. The driver queue length should probably be around 64 or less.
> 

I agree. The patch was made for internal use on own hardware reference design in our company, so I had to look through proprietary and opensource drivers to see the difference. The point was to use recent versions of OpenWRT with mac80211 stack because proprietary still uses linux-2.6. And it makes porting full of pain in nails and bloody eyes reading that piece of code. For now mac80211 is preferrable.
As to bufferbloat - yes I'm aware of it. Maybe I'm wrong in observations, but simply increasing queue num to 128 frames make driver more stable but not enough, so I started digging in ralink drivers for maximum value, tried it and it increased stability even more, but still hang at some point. So I decreased it to 256. Well, I'll try to remove this part and test again.

> > 2) Setting number of frames in TX/RX queues to equal values
> > resolves async speed behaviour with better RX on AP-side (uplink from
> > STAs), where it needs to be at least equal or better on TX queue on
> > AP (download to STA).
> That also doesn't make much sense to me as queueing behavior is
> completely different between rx and tx. Rx queue processing speed is
> determined by CPU speed, Tx queue processing speed is determined by
> shared medium speed.
> 

I agree, that was another step in digging. As I can understand it's better even use NAPI for RX queue. But I found only one mention of it in mac80211 sources, need to explore this.
I'll remove it, test again and make new patch. Can you direct me with this kind of behaviour? On rt3092 (2T2R) we can get 30-40Mbit/s download and 80-120Mbit/s upload speeds in separate tests, on 1T1R and 2T2R stations.

> > 3) In rt2x00mac.c additional check for queue
> > full added and reassignment in this case, so interface will not drop
> > frame.
> Why? If the queues are full, it's better to just drop packets instead of
> making the problem worse.
> 

It was the simpliest solution at this point. I'll try to use ieee80211_stop_queue instead, but don't know how much time it'll consume.
Maybe there's problem in "kick/pause" queue mechanism? I managed to explore behaviour after hang-point. It seems that only QID_AC_* queues stuck, station assoc/auth goes fine but actual data not transmitted on AP-side, but it receives DHCP-requests from station and it looks like no network access on hanged AP.

> > 4) Fixes in queue initialization. Default values for AC_BK,
> > AC_BE, AC_VI, AC_VO set from WMM.
> Why do you hardcode that stuff inside the driver? What difference do
> these values make?
> 
> > Tested on RT3883, RT5350, MT7620 SoCs and on RT3092 pcie interface for 10 days.
> > 
> > I'm planning to send this patch to mac80211 soon, but need to be sure that it works on other Ralink/Mediatek platforms and it's appropriate to do so.
> > 
> > 
> > Signed-off-by: Nick Leiten <nickleiten at gmail.com>
> > diff --git a/package/kernel/mac80211/patches/999-rt2x00-queue-update.patch b/package/kernel/mac80211/patches/999-rt2x00-queue-update.patch
> > new file mode 100644
> > index 0000000..9239bec
> > --- /dev/null
> > +++ b/package/kernel/mac80211/patches/999-rt2x00-queue-update.patch
> > @@ -0,0 +1,142 @@
> > +Only in compat-wireless-2015-03-09/drivers/net/wireless/rt2x00: limit
> > +diff -c -r compat-wireless-2015-03-09-orig/drivers/net/wireless/rt2x00/rt2800mmio.c compat-wireless-2015-03-09/drivers/net/wireless/rt2x00/rt2800mmio.c
> > +*** compat-wireless-2015-03-09-orig/drivers/net/wireless/rt2x00/rt2800mmio.c	2015-06-16 13:02:30.000000000 +0300
> > +--- compat-wireless-2015-03-09/drivers/net/wireless/rt2x00/rt2800mmio.c	2015-09-04 11:50:09.665148666 +0300
> > +***************
> > +*** 700,706 ****
> > +  
> > +  	switch (queue->qid) {
> > +  	case QID_RX:
> > +! 		queue->limit = 128;
> > +  		queue->data_size = AGGREGATION_SIZE;
> > +  		queue->desc_size = RXD_DESC_SIZE;
> > +  		queue->winfo_size = rxwi_size;
> > +--- 700,706 ----
> > +  
> > +  	switch (queue->qid) {
> > +  	case QID_RX:
> > +! 		queue->limit = 256;
> > +  		queue->data_size = AGGREGATION_SIZE;
> > +  		queue->desc_size = RXD_DESC_SIZE;
> > +  		queue->winfo_size = rxwi_size;
> > +***************
> > +*** 711,717 ****
> > +  	case QID_AC_VI:
> > +  	case QID_AC_BE:
> > +  	case QID_AC_BK:
> > +! 		queue->limit = 64;
> > +  		queue->data_size = AGGREGATION_SIZE;
> > +  		queue->desc_size = TXD_DESC_SIZE;
> > +  		queue->winfo_size = txwi_size;
> > +--- 711,717 ----
> > +  	case QID_AC_VI:
> > +  	case QID_AC_BE:
> > +  	case QID_AC_BK:
> > +! 		queue->limit = 256;
> > +  		queue->data_size = AGGREGATION_SIZE;
> > +  		queue->desc_size = TXD_DESC_SIZE;
> > +  		queue->winfo_size = txwi_size;
> Wrong patch style, please run make package/mac80211/refresh.
> 
Ok, will fix it.
> - Felix
_______________________________________________
openwrt-devel mailing list
openwrt-devel at lists.openwrt.org
https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel


More information about the openwrt-devel mailing list