[PATCH V3 1/3] base-files: sysupgrade: add tar.sh with helpers for building archives

Thu Feb 29 20:54:18 PST 2024

Hello Rafał,

Sorry for the late review.

I know this will be used for a limited context but it can be
constructed in such a way it could be used for more cases without
increasing its complexity.

> From: Jo-Philipp Wich <jo at mein.io>
>
> This allows building uncompressed tar archives from shell scripts (and
> compressing them later if needed)
>
> Signed-off-by: Rafał Miłecki <rafal at milecki.pl>
> ---
> V2: Simplify dd in __tar_print_padding (I still think helper is useful)
>     Hardcode 0/0/ root/root for now as most likely it'll be enough
>     Simplify name validation (leasing slash)
>     Reorder some variables
> V3: Fix dd in __tar_print_padding
>     Rename functions
>     Drop unused functions
>     Document usage
>
>  package/base-files/files/lib/upgrade/tar.sh | 71 +++++++++++++++++++++
>  1 file changed, 71 insertions(+)
>  create mode 100644 package/base-files/files/lib/upgrade/tar.sh
>
> diff --git a/package/base-files/files/lib/upgrade/tar.sh b/package/base-files/files/lib/upgrade/tar.sh
> new file mode 100644
> index 0000000000..a9d1d559e6
> --- /dev/null
> +++ b/package/base-files/files/lib/upgrade/tar.sh
> @@ -0,0 +1,71 @@
> +# SPDX-License-Identifier: GPL-2.0-or-later OR MIT
> +
> +# Example usage:
> +#
> +# {
> +#         tar_print_member "date.txt" "It's $(date +"%Y")"
> +#         tar_print_trailer
> +# } > test.tar
> +
> +__tar_print_padding() {
> +       dd if=/dev/zero bs=1 count=$1 2>/dev/null
> +}
> +
> +tar_print_member() {
> +       local name="$1"
> +       local content="$2"
> +       local mtime="${3:-$(date +%s)}"
> +       local mode=644
> +       local uid=0
> +       local gid=0
> +       local size=${#content}
> +       local type=0
> +       local link=""
> +       local username="root"
> +       local groupname="root"
> +
> +       # 100 byte of padding bytes, using 0x01 since the shell does not tolerate null bytes in strings
> +       local pad=$'\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1'

\1 is still a valid filename character. Shell cannot save a string
with \0 but printf can handle it nicely (as well as pipes). If you
build the header using a function at the moment you use it, you could
avoid the \1 hack.

print_header() {
   local filename=$1
   local mode=$2
   ...
   local checksum=$7
   ...
   printf '%s' "$filename"; printf '\x00%.0s' $(seq $((100 - ${#filename})))
   ...
}

print_header "$filename" "$mode" \
   "$uid" "$gid" .. | hexdump -ve '1/1 "%u

If you don't like the printf trick to repeat a char, you could use
__tar_print_padding (or use printf for both).

You could call it twice, first without a checksum to calculate it and
then a new one to generate the final output.

> +
> +       # validate name (strip leading slash if present)
> +       name=${name#/}
> +
> +       # truncate string header values to their maximum length
> +       name=${name:0:100}

It feels strange to simply truncate the name. If you need to truncate
it, you are implying it might be larger. In that case, it should fail
if it cannot fit a specific filename. Otherwise, you could end up
overwriting two filenames with the same first 100 bytes. And BTW,
shouldn't ustar support 255 or 256 filenames?

> +       link=${link:0:100}
> +       username=${username:0:32}
> +       groupname=${groupname:0:32}
> +
> +       # construct header part before checksum field
> +       local header1="${name}${pad:0:$((100 - ${#name}))}"
> +       header1="${header1}$(printf '%07d\1' $mode)"
> +       header1="${header1}$(printf '%07o\1' $uid)"
> +       header1="${header1}$(printf '%07o\1' $gid)"
> +       header1="${header1}$(printf '%011o\1' $size)"
> +       header1="${header1}$(printf '%011o\1' $mtime)"
> +
> +       # construct header part after checksum field
> +       local header2="$(printf '%d' $type)"
> +       header2="${header2}${link}${pad:0:$((100 - ${#link}))}"
> +       header2="${header2}ustar  ${pad:0:1}"
> +       header2="${header2}${username}${pad:0:$((32 - ${#username}))}"
> +       header2="${header2}${groupname}${pad:0:$((32 - ${#groupname}))}"

It seems to be a strange ustar format. Shouldn't it have an ustar
version before the username? And Device major/minor after groupname,
followed by more 150 filename prefix? It might still work if the
reader interprets it as a v7.

> +       # calculate checksum over header fields
> +       local checksum=0
> +       for byte in $(printf '%s%8s%s' "$header1" "" "$header2" | tr '\1' '\0' | hexdump -ve '1/1 "%u "'); do
> +               checksum=$((checksum + byte))
> +       done
> +

Maybe use a single line like:

checksum=$(print header... | hexdump -ve '1/1 "%u\n"' | awk '{s+=$1}
END {print s}')

?

> +       # print member header, padded to 512 byte
> +       printf '%s%06o\0 %s' "$header1" $checksum "$header2" | tr '\1' '\0'
> +       __tar_print_padding 183
> +
> +       # print content data, padded to multiple of 512 byte
> +       printf "%s" "$content"
> +       __tar_print_padding $((512 - (size % 512)))
> +}
> +
> +tar_print_trailer() {
> +       __tar_print_padding 1024
> +}
> --
> 2.35.3
>