[PATCH] umbim: fix invalid mbim message string encoding

Tue May 10 03:57:34 PDT 2022

Daniel Danzberger <daniel at dd-wrt.com> writes:

> Strings in mbim messages have to follow these formatting rules:
>  - 4 byte alignment, padded if not.

ack. MBIM spec section 10.3:

    All fields (either static or variable size) start at a 4-byte
    boundary. Hence, all field offsets shall be multiples of 4. Between
    the end of a field (not necessarily at a 4-byte boundary) and the
    beginning of the next field (always at a 4-byte boundary), padding
    may be required. Padding bytes must be set to 0 upon transmission
    and ignored upon reception.
    ..
    If the size of the payload in the variable field is not a multiple
    of 4 bytes, the field shall be padded up to the next 4 byte
    multiple. This shall be true even for the last payload in
    DataBuffer.

>  - utf-16 little endian.

ack. MBIM spec section 10.4:

    Unless otherwise specified, all strings use UNICODE UTF-16LE
    encodings limited to characters from the Basic Multilingual
    Plane. Strings shall not be terminated by a NULL character.

> +	/* convert to utf-16 little endian */
>  	for (i = 0; i < l; i++)
> -		p[i * 2] = in[i];
> +		p[i * 2] = htole16(in[i]);
>  
>  	return 0;
>  }

This new code is buggy.  It fails on BE and makes no difference on
LE. Both p and in are byte arrays. The byte you write into p on a BE
system will be aither 0x00 or 0xff depending on the MSB of in[i].

The previous code was sort of correct, assuming that the input is mostly
ascii mapping to the same value in unicode. Ideally we should do real
utf16le conversion of the input. But then someone has to decide if the
input is utf8 or something else first.  And you need a real utf16
implementation.  Not sure it's worth it.  Did you ever see an operator
use a non-ascii APN, username, password or pin code?

Bjørn