Conclusions from CVE-2024-3094 (libxz disaster)
Daniel Golle
daniel at makrotopia.org
Sun Mar 31 09:46:22 PDT 2024
On Sun, Mar 31, 2024 at 12:05:03PM +0200, Thibaut wrote:
>
> > Le 31 mars 2024 à 01:07, Elliott Mitchell <ehem+openwrt at m5p.com> a écrit :
> >
> >> Normally upstream publishes release tarballs that are different than the
> >> automatically generated ones in GitHub. In these modified tarballs, a
> >> malicious version of build-to-host.m4 is included to execute a script
> >> during the build process.
> >
> > So the malicious source code was part of all tarballs, but only the
> > tarballs with the modified `build-to-host.m4` would trigger the malicious
> > payload.
> >
> > So obtaining GitHub's tarballs which came directly from the Git
> > repository *does* avoid the breach.
>
> https://git.tukaani.org/?p=xz.git;a=commitdiff;h=f9cf4c05edd14dedfe63833f8ccbe41b55823b00
>
> Let’s not lure ourselves into thinking that not using upstream-provided tarballs but upstream-provided repo instead is inherently safer. With adversarial upstream, *nothing* is safe anyway.
Just using git checkouts (or **reproducible** tarballs generated from a
repo's git ref, i.e. a tag or commit) by itself of course doesn't help
much.
But for myself, maintaining a medium two-digit number of packages, using
git checkouts (or **reproducible** tarballs generated from git
checkouts) would mean that I can at least be sure that the git
commits I've been seeing and the diff between version tags **really
correspond to the content of the tarball**, without having to put in
extra work just for that (which imho nobody does).
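To make it concrete what "reproducible tarballs generated from git
checkouts" could look like, here's a rough sketch in shell (project
name, URL and tag are made up; the exact prefix and compressor flags
would need to be agreed upon, and git's tar output is only stable
across comparable git versions):

    # Deterministic tarball for tag v1.2.3 of a hypothetical repo.
    # git archive derives all mtimes from the commit date of the ref,
    # so the tar stream depends only on repository content;
    # gzip -n keeps the timestamp out of the .gz header.
    git clone https://example.org/foo.git
    cd foo
    git archive --format=tar --prefix=foo-1.2.3/ v1.2.3 \
        | gzip -n > ../foo-1.2.3.tar.gz
    sha256sum ../foo-1.2.3.tar.gz  # anyone with the repo can recompute this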
I've never claimed that this alone is the solution, but if we are
already used to
a) the content of a release tarball not matching the git repo
(because of `make dist` autotools nonsense, for example),
b) the hash of such a tarball being different depending on who
generates it, due to subtle differences such as the folder name,
c) people "fixing" PKG_MIRROR_HASH all the time without anyone having
any way to validate the cause of the "wrong" hash in the first
place (see the sketch below),
then the added security of PKG_HASH and esp. PKG_MIRROR_HASH is very
small. Too small, if you ask me. And unlike the complex
social/economic/political problems which lead to something like the
xz backdoor (no question: those are the bigger problems), this is a
technical problem we could quite easily improve, **and doing so would
have been sufficient to prevent the attack** in this case.
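Just to illustrate the validation that is currently skipped whenever
someone "fixes" PKG_MIRROR_HASH: before accepting a new hash one could
at least diff the content of the fetched tarball against the
corresponding git tag. Rough sketch, with made-up URL and version:

    # unpack the tarball whose hash suddenly changed
    mkdir -p /tmp/tarball
    tar -xzf foo-1.2.3.tar.gz -C /tmp/tarball
    # fetch the same ref from git and drop the repository metadata
    git clone --branch v1.2.3 --depth 1 https://example.org/foo.git /tmp/git-foo
    rm -rf /tmp/git-foo/.git
    # whatever shows up here is exactly the part you cannot audit
    # through git history
    diff -r /tmp/tarball/foo-1.2.3 /tmp/git-foo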
There is a reason the attacker(s) went to great lengths to move the
official mirror site of the project, change the PGP key and hide the
key piece of the exploit in the tarballs they generated (and signed)
instead of in a git commit. This is not by chance.
What we need is "Reproducible Source/Release Tarballs", not as a
solution to all our problems, but as a **pre-condition** which
currently isn't met for obvious reasons.
Hence I'm still arguing that the lower resource use of downloading
GitHub archive/codeload/release tarballs is not worth the loss of the
integrity and audit trail of git.
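For comparison, this is the kind of check a git checkout gives us more
or less for free: pin the full commit hash and refuse to build if the
fetched tree doesn't resolve to it (hash and URL below are just
placeholders):

    pinned=0123456789abcdef0123456789abcdef01234567  # full commit hash
    git clone --branch v1.2.3 https://example.org/foo.git
    cd foo
    # the commit hash covers the whole tree and, transitively, its
    # entire history
    [ "$(git rev-parse HEAD)" = "$pinned" ] || {
        echo "commit mismatch" >&2
        exit 1
    }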
Yes, I know SHA-1 is outdated, but in the context of git it's not so
easy to add the large amount of random padding which would be required
to generate a hash collision, and such a collision has yet to be seen
even in contexts with much more freedom than the narrow syntax of a
git diff (and commit message). So sure, it's not perfect, but it's
better than nothing.
And while release tarballs (being *deliberately* different from the
content of the source repo at their corresponding tag, for things like
an added VERSION or ChangeLog file, i.e. information the build process
could otherwise learn from .git) have some small, arguable value,
hard- or impossible-to-reproduce GitHub-generated tarballs really do
NOT have any value. They are an obstacle, and lure people into bad
practices such as all those "Fix PKG_MIRROR_HASH" commits which have
become the norm (and really should not be).
And regarding the first case (deliberately added VERSION or ChangeLog
information and such), we should aim for a **standardized** way to add
them in a **reproducible** way. But that's a longer story, certainly
boring and trivial, but worth debating nevertheless.
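Just to sketch the direction (not a concrete format proposal): such
files could be derived purely from git metadata, which is identical
for everyone who has the repository, e.g.:

    # VERSION from the tag itself, ChangeLog from commit metadata only;
    # both depend on repository content, not on who runs the commands
    git describe --tags --exact-match > VERSION
    git log --date=format:%Y-%m-%d --pretty='%ad %an: %s' \
        v1.2.2..v1.2.3 > ChangeLog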
On the other hand, what does "maintained" actually mean in the context
of an OpenWrt package? It can be anything from
0. I'm not even using this and don't understand the language it is
written in. I just somehow ended up maintaining it.
1. I occasionally bump the version to the newest release or merge PRs
from other people suggesting that.
2. I actually validate GPG signatures while bumping the release.
3. I follow up on git history of that project between releases.
4. I have at least a rough understanding of the code and the purpose of
each file of that project.
5. I've contributed to that project myself in the past.
6. I at least quickly read the git diff of that project between releases.
7. I study each commit at the time it is made.
[...]
up to
X. I'm the author, I've written that code, I know the reason for every
line of code to be there.
Obviously (X) is also kinda problematic; the sweet spot is somewhere
around (6) or (7) imho. But I must admit that for most packages I
maintain, the level of maintenance is often closer to (4), sometimes
just (2).
However, we should probably define that in some kind of "maintainer
guideline" somewhere, and give maintainers the option to communicate
(as a self-assessment) the level of maintenance they put into a
package.
>
> And even when upstream repo isn’t entirely under adversarial control, a bad actor can sneak stuff in:
> https://github.com/libarchive/libarchive/commit/6110e9c82d8ba830c3440f36b990483ceaaea52c
I've seen that, and by itself it does not present a security risk in
the context in which libarchive is intended to be used.
libarchive is not thread-safe and has never been intended for use in a
multi-threaded context. Probably the actor just wanted to find out how
well suggested changes are reviewed and how deep the reviewers'
knowledge goes...