[PATCH v2] ubusd: convert tx_queue to linked list

Felix Fietkau nbd at nbd.name
Thu Mar 25 22:56:14 GMT 2021


On 2021-03-25 21:43, Arnout Vandecappelle wrote:
> 
> 
> On 25/03/2021 10:48, Felix Fietkau wrote:
>> 
>> On 2021-03-24 13:27, Arnout Vandecappelle (Essensium/Mind) wrote:
>>> ubusd maintains a per-client tx_queue containing references to message
>>> buffers that have not been sent yet (due to the socket blocking). This
>>> is a fixed-size, 64-element queue.
>>>
>>> When more than 64 elements are queued, subsequent elements are simply
>>> dropped. Thus, a client that is waiting for those messages will block
>>> indefinitely. In particular, this happens when more than roughly 250 objects
>>> are registered on the bus and either "ubus list" or "ubus wait_for" is
>>> called. The responses to these requests consist of a message buffer per
>>> object. Since in practice, ubusd will not yield between the sends of
>>> these message buffers, the client has no time to process them and
>>> eventually the output socket blocks. After 64 more objects, the rest is
>>> dropped, including the final message that indicates termination. Thus,
>>> the client waits indefinitely for the termination message.
>>>
>>> To solve this, turn the tx_queue into a variable-sized linked list
>>> instead of a fixed-size queue.
>> In order to reduce memory allocation overhead, I would propose the
>> following:
>> 
>> struct ubus_msg_backlog {
>> 	struct ubus_msg_backlog *next;
>> 	struct ubus_msg_buf *msg[UBUSD_CLIENT_BACKLOG];
>> 	int tail;
>> };
> 
>  This saves space by open-coding a single-linked list rather than using the
> normal double-linked list. This comes at the cost of iterating over the entire
> list in order to append to it.
> 
>  It also saves space by dropping the offset member. But for that, we don't need
> to go to this extra structure.
> 
>  Applying those two "optimisations" to struct ubus_msg_buf_list would make that
> one 8 bytes (compared to 16 now). So that's 8 bytes per queued buffer, in
> addition to the 24 bytes of struct ubus_msg_buf and the actual buffer itself,
> which you can expect to be hundreds of bytes.
> 
>  struct ubus_msg_backlog is 24 bytes (assuming UBUSD_CLIENT_BACKLOG is 4, on a
> 32-bit system), so that's 6 bytes per element. Not much is gained here.
> Increasing UBUSD_CLIENT_BACKLOG would decrease the overhead for a large
> backlog, but gives a much bigger overhead for a small one. So not a clear win.
UBUSD_CLIENT_BACKLOG is currently 32. I'm not so much counting bytes for
the optimization; I'm more interested in reducing the number of memory
allocations. This helps reduce memory allocator overhead and fragmentation.
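
A minimal sketch of appending to such a backlog (only struct ubus_msg_buf
and UBUSD_CLIENT_BACKLOG come from the proposal above; the helper name and
everything else is made up for illustration) could look like this:

#include <stdlib.h>

struct ubus_msg_buf;

#define UBUSD_CLIENT_BACKLOG	32

struct ubus_msg_backlog {
	struct ubus_msg_backlog *next;
	struct ubus_msg_buf *msg[UBUSD_CLIENT_BACKLOG];
	int tail;
};

/* one allocation per UBUSD_CLIENT_BACKLOG queued messages instead of
 * one allocation per message */
static int ubus_msg_backlog_add(struct ubus_msg_backlog **head,
				struct ubus_msg_buf *ub)
{
	struct ubus_msg_backlog *bl = *head;

	/* walk to the last chunk; cheap while the backlog stays short */
	while (bl && bl->next)
		bl = bl->next;

	/* allocate a new chunk only when there is none yet or the last one is full */
	if (!bl || bl->tail == UBUSD_CLIENT_BACKLOG) {
		struct ubus_msg_backlog *n = calloc(1, sizeof(*n));

		if (!n)
			return -1;

		if (bl)
			bl->next = n;
		else
			*head = n;
		bl = n;
	}

	bl->msg[bl->tail++] = ub;
	return 0;
}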

>  Finally, the backlog should normally be empty. It's pretty unusual for a ubus
> message to take more than one write to transfer.
> 
>  In conclusion:
> - I think your proposal doesn't save much;
> - and it complicates things quite a bit;
> - in addition, the single-linked list potentially poses significant time overhead.
With a reasonable upper limit for the maximum number of messages and the
current UBUSD_CLIENT_BACKLOG of 32, the time overhead for walking the
pointers is actually insignificant. If it ever becomes significant, we
can simply add a tail pointer to struct ubus_client and still save space.
I also expect it to be less than the malloc overhead for lots of
single-message list entries.
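
A minimal sketch of that tail-pointer variant (the field and helper names,
and the reduced struct ubus_client layout, are hypothetical, not the actual
ubusd structures):

#include <stddef.h>

struct ubus_msg_buf;

struct ubus_msg_backlog {
	struct ubus_msg_backlog *next;
	struct ubus_msg_buf *msg[32];		/* UBUSD_CLIENT_BACKLOG */
	int tail;
};

/* reduced, hypothetical client layout, for illustration only */
struct ubus_client {
	/* ... existing members elided ... */
	struct ubus_msg_backlog *txb_head;	/* oldest chunk, drained on write */
	struct ubus_msg_backlog *txb_tail;	/* newest chunk, appended to */
};

/* O(1) append of a newly allocated chunk: no list walk required */
static void ubus_client_backlog_append(struct ubus_client *cl,
				       struct ubus_msg_backlog *bl)
{
	bl->next = NULL;
	if (cl->txb_tail)
		cl->txb_tail->next = bl;
	else
		cl->txb_head = bl;
	cl->txb_tail = bl;
}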

>  If you want to save a few bytes, it does make sense to move the offset back to
> struct ubus_client.
> 
>  If you *really* want to save space, and time as well, it would be better to
> optimise ubusd_handle_lookup. That currently sends a separate, relatively small,
> message for every object. The overhead of the struct ubus_msg_buf dwarfs the
> overhead of struct ubus_msg_buf_list by quite a lot, and in addition there's
> overhead on the wire as well. It shouldn't be too hard to change ubus_lookup_cb
> to handle a list rather than a single object.
> 
>  Maybe I should have gone down that path. I hadn't thought of it at the time.
I think that's a good idea.
>>> The txq_off variable was used to keep track of which part of the current
>>> message was already written, in case only a partial write was possible.
>>> We take this opportunity to also move that variable into the
>>> ubus_msg_buf_list structure (renaming it to "offset" and changing its
>>> type to size_t). This makes it clearer that it is related to that
>>> particular queued message and not to the queue as a whole.
>>>
>>> Note that this infinite tx_queue opens the door to a DoS attack. You can
>>> open a client and a server connection, then send messages from the
>>> client to the server without ever reading anything on the server side.
>>> This will eventually lead to an out-of-memory condition. However, such a
>>> DoS already exists anyway; it just requires opening multiple server
>>> connections and filling up the fixed-size queue on each one. To protect
>>> against such DoS attacks, we'd need to:
>> I don't fully agree with your reasoning regarding the potential for DoS
>> attacks. It's true that the potential for *intentional* DoS attacks
>> exists in the current code already. What I'm worried about with this
>> patch is that you're adding extra potential for *accidental* DoS attacks
>> (i.e. excessive ubusd memory use for hanging processes receiving events).
> 
>  Well, *that* DoS attack already exists, and is the reason I need to fix this
> issue :-) "ubus list" hangs if there are more than 250-ish objects. This implies
> that netifd will eventually hang as well, because it does a "ubus wait_for"
> (which is also a lookup).
I fully agree that the issue you're pointing out needs fixing. What
you're describing isn't really the "excessive ubusd memory use"
accidental DoS that I had in mind, though.
Let's not trade one accidental DoS for another, but fix both instead :)

>>> - keep a global maximum queue size that applies to all rx and tx queues
>>>   together;
>>> - stop reading from any connection when the maximum is reached;
>>> - close any connection when it hasn't become writeable after some
>>>   timeout.
>> I think we should have both a local and a global maximum queue size.
>> Other than that, I agree with the above.
> 
>  "Should", as in "won't merge unless this is implemented"?
> 
>  So, what's next? I'm willing to move the offset back to the client structure,
> but not much more than that. Well, in a pinch, I could rework the lookup method
> as well, but then I first want confirmation that this is an acceptable solution.
> It changes the wire format; is that acceptable?
The data structure changes that I pointed out are not a blocker for me.
I do want to see at least a per-client message limit (in order to not
regress on accidental DoS). A global one would be nice too, but is not a
must-have. I do agree with your proposal regarding lookup method rework,
and wire format changes are acceptable too, as long as the libubus API
doesn't change.
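
A per-client limit could be little more than a counter checked before
queueing. A rough sketch (the constant, the txq_len field and the helper
are placeholders, not part of the actual patch):

struct ubus_msg_buf;

struct ubus_client {
	/* ... existing members elided ... */
	unsigned int txq_len;			/* hypothetical: queued message count */
};

#define UBUSD_CLIENT_MAX_TXQ_LEN	10000	/* hypothetical per-client cap */

/* returns 0 if the message was queued, -1 if the client is over its limit
 * and should be disconnected instead */
static int ubusd_client_txq_add(struct ubus_client *cl, struct ubus_msg_buf *ub)
{
	if (cl->txq_len >= UBUSD_CLIENT_MAX_TXQ_LEN)
		return -1;

	cl->txq_len++;
	/* ... append ub to the per-client tx list here ... */
	(void)ub;
	return 0;
}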

The wire format change should probably be something generic like adding
support for having a single ubus message contain an array of data blobs
(and having libubus call data_cb() for each element of that array).
That way its use is not limited to lookups; it could also be used by
other ubus clients for sending multiple messages with less overhead.
The changes for this could probably be very simple without any major
refactoring.
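
A rough sketch of the receive side of that idea (the function itself is
hypothetical; struct ubus_request, its data_cb member, UBUS_MSG_DATA and
blobmsg_for_each_attr() are existing libubus/libubox API):

#include <libubox/blobmsg.h>
#include <libubus.h>

/* hypothetical helper: deliver each element of a nested blob array to the
 * request's data callback, as if each had arrived as a separate message */
static void ubus_req_deliver_array(struct ubus_request *req,
				   struct blob_attr *array)
{
	struct blob_attr *cur;
	size_t rem;

	blobmsg_for_each_attr(cur, array, rem) {
		if (req->data_cb)
			req->data_cb(req, UBUS_MSG_DATA, cur);
	}
}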

Thanks,

- Felix


