[PATCH v2] ubusd: convert tx_queue to linked list

Arnout Vandecappelle arnout at mind.be
Thu Mar 25 20:43:38 GMT 2021



On 25/03/2021 10:48, Felix Fietkau wrote:
> 
> On 2021-03-24 13:27, Arnout Vandecappelle (Essensium/Mind) wrote:
>> ubusd maintains a per-client tx_queue containing references to message
>> buffers that have not been sent yet (due to the socket blocking). This
>> is a fixed-size, 64-element queue.
>>
>> When more than 64 elements are queued, subsequent elements are simply
>> dropped. Thus, a client that is waiting for those messages will block
>> indefinitely. In particular, this happens when more than about 250 objects
>> are registered on the bus and either "ubus list" or "ubus wait_for" is
>> called. The responses to these requests consist of a message buffer per
>> object. Since, in practice, ubusd will not yield between the sends of
>> these message buffers, the client has no time to process them and
>> eventually the output socket blocks. After 64 more objects, the rest is
>> dropped, including the final message that indicates termination. Thus,
>> the client waits indefinitely for the termination message.
>>
>> To solve this, turn the tx_queue into a variable-sized linked list
>> instead of a fixed-size queue.
> In order to reduce memory allocation overhead, I would propose the
> following:
> 
> struct ubus_msg_backlog {
> 	struct ubus_msg_backlog *next;
> 	struct ubus_msg_buf *msg[UBUSD_CLIENT_BACKLOG];
> 	int tail;
> };

 This saves space by open-coding a singly-linked list rather than using the
normal doubly-linked list. That comes at the cost of iterating over the entire
list on every append, since there is no tail pointer.

 It also saves space by dropping the offset member. But for that, we don't
need this extra structure at all.
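
 Concretely, dropping just the offset would look something like this (a sketch;
the sizes in the comments assume 32-bit pointers, as on typical OpenWrt
targets):

	struct ubus_msg_buf_list {
		struct list_head list;		/* prev + next: 8 bytes */
		struct ubus_msg_buf *msg;	/* 4 bytes */
	};
	/* 12 bytes per node instead of 16; the partial-write offset
	 * moves back into struct ubus_client */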

 Applying those two "optimisations" to struct ubus_msg_buf_list would make that
one 8 bytes (compared to 16 now, with 32-bit pointers). So that's 8 bytes per
queued buffer, in addition to the 24 bytes of struct ubus_msg_buf and the
actual buffer itself, which you can expect to be hundreds of bytes.

 struct ubus_msg_backlog is 24 bytes (assuming UBUSD_CLIENT_BACKLOG is 4),
i.e. 6 bytes per element. So not much is gained here. Increasing
UBUSD_CLIENT_BACKLOG would decrease the overhead for a large backlog, but gives
a much bigger overhead for a small one. So it's not a clear win.
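
 To make the accounting concrete (again assuming 32-bit pointers):

	linked list node:	 8 B	->  8 B per queued message
	backlog of 4 slots:	24 B	->  6 B per queued message

 Either way it sits next to the 24 B of struct ubus_msg_buf and a payload of
hundreds of bytes, so the 2 B difference per message is noise.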

 Finally, the backlog should normally be empty. It's pretty unusual for a ubus
message to take more than one write to transfer.

 In conclusion:
- I think your proposal doesn't save much;
- and it complicates things quite a bit;
- in addition, the singly-linked list adds potentially significant time
  overhead (linear-time appends).

 If you want to save a few bytes, it does make sense to move the offset back to
struct ubus_client.
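
 The write path would then look roughly like this (a sketch: txq_ofs is my name
for the offset moved back into the client, and ubus_msg_len() stands in for
however the total message length is computed):

	while (!list_empty(&cl->tx_queue)) {
		struct ubus_msg_buf_list *bl =
			list_first_entry(&cl->tx_queue,
					 struct ubus_msg_buf_list, list);
		ssize_t written = ubus_msg_writev(cl->sock.fd, bl->msg,
						  cl->txq_ofs);

		if (written < 0)
			break;		/* socket blocked again, retry later */

		cl->txq_ofs += written;
		if (cl->txq_ofs < ubus_msg_len(bl->msg))
			break;		/* partial write, keep it queued */

		cl->txq_ofs = 0;	/* fully sent: pop and drop our ref */
		list_del(&bl->list);
		ubus_msg_free(bl->msg);
		free(bl);
	}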

 If you *really* want to save space, and time as well, it would be better to
optimise ubusd_handle_lookup. That currently sends a separate, relatively small
message for every object. The overhead of struct ubus_msg_buf dwarfs that of
struct ubus_msg_buf_list, and in addition there's overhead on the wire as well.
It shouldn't be too hard to change ubus_lookup_cb to handle a list rather than
a single object.
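
 On the ubusd side that could look roughly like this (very much a sketch: the
helper and iteration names are loose borrowings, not the actual API, and the
nested layout is a new, incompatible wire format):

	/* sketch: pack all matching objects into one blob and send a
	 * single data message, instead of one ubus_msg_buf per object */
	blob_buf_init(&b, 0);
	avl_for_each_element(&path, obj, path) {
		void *c = blob_nest_start(&b, UBUS_ATTR_DATA);
		blob_put_string(&b, UBUS_ATTR_OBJPATH, obj->path.key);
		blob_put_int32(&b, UBUS_ATTR_OBJID, obj->id.id);
		blob_nest_end(&b, c);
	}
	ubus_proto_send_msg_from_blob(cl, ub, UBUS_MSG_DATA);

 The client-side ubus_lookup_cb would then iterate the nested entries instead
of expecting one object per message.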

 Maybe I should have gone down that path. I hadn't thought of it at the time.


> 
> To struct ubus_client add these:
> 
> 	struct ubus_msg_backlog *backlog;
> 	int backlog_head;
> 
> After sending messages from tx_queue, you pull in messages from
> cl->backlog, incrementing backlog_head.
> Once cl->backlog_head == backlog->tail, you set cl->backlog to
> backlog->next and free backlog.
> 
>> To maintain the linked list, an additional structure ubus_msg_buf_list
>> is created. We could also have added the linked list to the ubus_msg_buf
>> struct itself, but it is not relevant in the many other uses of the
>> ubus_msg_buf struct so it would just complicate things.
> Adding the linked list to ubus_msg_buf doesn't work, because a single
> message can be queued for multiple receivers. This mistake was already
> made by previous attempts at introducing a linked list for messages.

 Right, ubus_msg_free doesn't actually free it; it decreases the refcount. I
had forgotten that. Good thing I didn't implement it that way then :-)
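
 From memory, the pattern is roughly the following; only the last reference
actually frees the buffer:

	void ubus_msg_free(struct ubus_msg_buf *ub)
	{
		if (--ub->refcount == 0)
			free(ub);	/* last reference gone */
	}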

>> The txq_off variable was used to keep track of which part of the current
>> message was already written, in case only a partial write was possible.
>> We take this opportunity to also move that variable under the
>> ubus_msg_buf_list structure (and rename it to "offset", and change its
>> type to size_t). This makes it clearer that it is related to that
>> particular queued message and not to the queue as a whole.
>>
>> Note that this infinite tx_queue opens the door to a DoS attack. You can
>> open a client and a server connection, then send messages from the
>> client to the server without ever reading anything on the server side.
>> This will eventually lead to an out-of-memory condition. However, such a DoS
>> already existed anyway; it just requires opening multiple server
>> connections and filling up the fixed-size queue on each one. To protect
>> against such DoS attacks, we'd need to:
> I don't fully agree with your reasoning regarding the potential for DoS
> attacks. It's true that the potential for *intentional* DoS attacks
> exists in the current code already. What I'm worried about with this
> patch is that you're adding extra potential for *accidental* DoS attacks
> (i.e. excessive ubusd memory use for hanging processes receiving events).

 Well, *that* DoS attack already exists, and is the reason I need to fix this
issue :-) "ubus list" hangs if there are more than 250-ish objects. This implies
that netifd will eventually hang as well, because it does a "ubus wait_for"
(which is also a lookup).


>> - keep a global maximum queue size that applies to all rx and tx queues
>>   together;
>> - stop reading from any connection when the maximum is reached;
>> - close any connection when it hasn't become writeable after some
>>   timeout.
> I think we should have both a local and a global maximum queue size.
> Other than that, I agree with the above.

 "Should", as in "won't merge unless this is implemented"?


 So, what's next? I'm willing to move the offset back to the client structure,
but not much more than that. Well, in a pinch, I could rework the lookup method
as well, but then I first want confirmation that this is an acceptable
approach. It changes the wire format; is that acceptable?


 Regards,
 Arnout



