A Method of Breaking Git

Mon Oct 23 19:02:53 PDT 2023

Differences between source code management systems have some influence
on development approach.  Some approaches which work well in one version
control system work poorly in others.  For this reason I would like to
highlight one technique which works extremely poorly in Git.

Different Linux kernel versions sometimes need different kernel
configurations.  Kernel configurations are therefore named config-X.YY,
but the update process causes problems.  Roughly, for every kernel
version change two separate commits are made:

git rm config-5.10
git commit -m 'remove old Linux 5.10 kernel configuration'

cp config-5.15 config-6.1
git add config-6.1
git commit -m 'copy Linux 5.15 kernel configuration to Linux 6.1 configuration'

The problem with this is that the history of "config-5.10" is buried and
effectively lost, while "config-6.1" is created without any history.
This breaks most of Git's functionality and makes many development tasks
*much* harder.

For instance, `git blame config-6.1` will attribute almost all lines to
the copy commit.  For minor contributors, `git rebase` loses all context
and even the simplest patch needs manual intervention.  For larger
outside projects, `git merge` cannot map changes onto the new filenames,
so they are not merged into the correct files.

Git can be told to search harder, but this is massively slower.
Running `git blame` on a file in a huge (1GB) repository with 30 years of
history takes 10 seconds.  Running `git blame --find-copies-harder` on
OpenWRT's repository takes more than 20 minutes.
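The effect can be reproduced in a throwaway repository.  The sketch below
assumes a POSIX shell with git and mktemp available; the file contents and
commit messages are made up for illustration.  It performs the two-commit
update and then asks which commit is credited for the CONFIG_FUSION=y
line.  Plain `git blame` should credit the copy commit, while
`git blame -C -C` (which, per the git-blame documentation, additionally
looks for copies from other files in the commit that creates the file)
should trace the line back to the commit that introduced it.

```shell
#!/bin/sh
# Sketch: reproduce the two-commit configuration update in a scratch
# repository.  File contents and commit messages are hypothetical.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name 'Demo User'
git config user.email 'demo@example.invalid'
git config commit.gpgsign false

# Two configurations sharing history.
printf 'CONFIG_SMP=y\nCONFIG_PCI=y\nCONFIG_SCSI=y\nCONFIG_BLK_DEV_SD=y\n' > config-5.10
cp config-5.10 config-5.15
git add config-5.10 config-5.15
git commit -q -m 'add Linux 5.10 and 5.15 kernel configurations'

# The change whose rationale we later want to find (line 5 of each file).
echo 'CONFIG_FUSION=y' >> config-5.10
echo 'CONFIG_FUSION=y' >> config-5.15
git commit -q -a -m 'enable CONFIG_FUSION (hypothetical rationale)'
fusion_commit=$(git rev-parse HEAD)

# The version update done as two separate commits.
git rm -q config-5.10
git commit -q -m 'remove old Linux 5.10 kernel configuration'
cp config-5.15 config-6.1
git add config-6.1
git commit -q -m 'copy Linux 5.15 kernel configuration to Linux 6.1 configuration'
copy_commit=$(git rev-parse HEAD)

# Porcelain header lines are "<hash> <orig line> <final line> [<count>]";
# print the hash credited for final line 5.  The whole file is blamed
# (no -L) so copy detection sees the full copied block, not one short line.
line5='length($1) >= 40 && $1 ~ /^[0-9a-f]+$/ && $3 == 5 { print $1; exit }'
plain=$(git blame --porcelain config-6.1 | awk "$line5")
harder=$(git blame --porcelain -C -C config-6.1 | awk "$line5")

echo "copy commit:      $copy_commit"
echo "FUSION commit:    $fusion_commit"
echo "git blame:        $plain"
echo "git blame -C -C:  $harder"
```

Giving `-C` a third time makes blame look for copies from other files in
any commit, which is more thorough and correspondingly more expensive.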

Perhaps the core members of OpenWRT rarely use much of Git's
functionality, but this greatly impacts others.  Has anyone noticed
CONFIG_FUSION=y in target/linux/x86/config-6.1?  At first sight that
seems an awfully strange option to enable, but if you are willing to
spend the time digging, it turns out there is a good reason.

There are several ways to address this problem; let me suggest two.

The core of the problem is that the update is split across two separate
commits.  One solution is simply to combine the steps:

git rm config-5.10
cp config-5.15 config-6.1
git add config-6.1
git commit -m 'remove Linux 5.10 configuration and add Linux 6.1 configuration'

Effectively this attaches the history of config-5.10 to config-6.1, since
Git's rename detection can then pair the deleted file with the newly
added one.  This is suboptimal since you end up with two intertwined
histories instead of a single history.  Yet this is good enough to
preserve Git functionality in a useful state.
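A sketch of the combined commit in a scratch repository, again with
hypothetical contents and commit messages and assuming a POSIX shell
with git available.  Because the toy files are identical, Git sees an
exact rename from config-5.10 to config-6.1; with real configurations
the pairing depends on Git's default 50% similarity threshold for rename
detection.  Plain `git blame`, which follows renames, should then credit
CONFIG_FUSION=y to the commit that introduced it.

```shell
#!/bin/sh
# Sketch: the combined update in a scratch repository.  File contents
# and commit messages are hypothetical.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name 'Demo User'
git config user.email 'demo@example.invalid'
git config commit.gpgsign false

printf 'CONFIG_SMP=y\nCONFIG_PCI=y\nCONFIG_SCSI=y\nCONFIG_BLK_DEV_SD=y\n' > config-5.10
cp config-5.10 config-5.15
git add config-5.10 config-5.15
git commit -q -m 'add Linux 5.10 and 5.15 kernel configurations'

echo 'CONFIG_FUSION=y' >> config-5.10
echo 'CONFIG_FUSION=y' >> config-5.15
git commit -q -a -m 'enable CONFIG_FUSION (hypothetical rationale)'
fusion_commit=$(git rev-parse HEAD)

# Removal and addition in one commit, so the deleted config-5.10 is
# available as a rename source for the new config-6.1.
git rm -q config-5.10
cp config-5.15 config-6.1
git add config-6.1
git commit -q -m 'remove Linux 5.10 configuration and add Linux 6.1 configuration'

# Hash credited for final line 5 (CONFIG_FUSION=y) by plain `git blame`,
# read from the porcelain header "<hash> <orig line> <final line>".
blamed=$(git blame --porcelain config-6.1 |
    awk 'length($1) >= 40 && $1 ~ /^[0-9a-f]+$/ && $3 == 5 { print $1; exit }')

echo "FUSION commit: $fusion_commit"
echo "git blame:     $blamed"
```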

Another approach is to keep the configurations in files named "config"
and "config.old", with version-numbered symbolic links pointing at them
(Git supports symbolic links).  A version change would then be a commit
series along the lines of:

git rm config-5.10 config.old
git commit -m 'remove old Linux 5.10 kernel configuration'

ln -s config config-6.1
git add config-6.1
git commit -m 'copy Linux 5.15 kernel configuration to Linux 6.1 configuration'

cp config config.old
rm config-5.15
ln -s config.old config-5.15
git add config.old config-5.15
git commit -m 'split Linux 5.15 and 6.1 configurations apart'

The idea is that "config" ends up holding the full history of the file,
whereas the symbolic links record which file each kernel version
currently uses.  The two later steps could be combined if there were a
preference to avoid temporarily shared configurations.
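A sketch of this layout in a scratch repository, with hypothetical
contents and commit messages, assuming a POSIX shell with git available
and a filesystem with symbolic links (core.symlinks enabled, the default
on Linux).  It runs the three commits above, then checks that
`git blame config` still credits CONFIG_FUSION=y to the commit that
introduced it, and that the version-numbered names are stored by Git as
symbolic links (tree mode 120000, blob content being the link target).

```shell
#!/bin/sh
# Sketch: "config"/"config.old" plus version-numbered symbolic links.
# File contents and commit messages are hypothetical.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name 'Demo User'
git config user.email 'demo@example.invalid'
git config commit.gpgsign false

# Starting point: 5.10 uses config.old, 5.15 uses config.
printf 'CONFIG_SMP=y\nCONFIG_PCI=y\nCONFIG_SCSI=y\nCONFIG_BLK_DEV_SD=y\n' > config
cp config config.old
ln -s config.old config-5.10
ln -s config config-5.15
git add config config.old config-5.10 config-5.15
git commit -q -m 'add Linux 5.10 and 5.15 kernel configurations'

# The change whose rationale we later want to find (line 5 of config).
echo 'CONFIG_FUSION=y' >> config
git commit -q -a -m 'enable CONFIG_FUSION (hypothetical rationale)'
fusion_commit=$(git rev-parse HEAD)

# The three-commit version change described above.
git rm -q config-5.10 config.old
git commit -q -m 'remove old Linux 5.10 kernel configuration'

ln -s config config-6.1
git add config-6.1
git commit -q -m 'copy Linux 5.15 kernel configuration to Linux 6.1 configuration'

cp config config.old
rm config-5.15
ln -s config.old config-5.15
git add config.old config-5.15
git commit -q -m 'split Linux 5.15 and 6.1 configurations apart'

# "config" was never renamed or recreated, so plain blame keeps its history.
blamed=$(git blame --porcelain config |
    awk 'length($1) >= 40 && $1 ~ /^[0-9a-f]+$/ && $3 == 5 { print $1; exit }')

echo "FUSION commit: $fusion_commit"
echo "git blame:     $blamed"
git ls-tree HEAD config-5.15 config-6.1
```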

Other approaches would work too; these are simply two candidates which
would address the issue.  As someone who expects tools to work, I find
it troublesome when a workflow breaks an otherwise fine tool.  I doubt
I'm the only one affected by this.

-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg at m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445