Switch issues and CI to GitHub

Sam Kuper sampablokuper at posteo.net
Wed Jan 19 07:02:28 PST 2022


On Tue, Jan 18, 2022 at 03:38:43PM +0100, Paul Spooren wrote:
> ## Bug Tracker
> 
> I looked today into migrating issues from bugs.openwrt.org over to
> GitHub.com, codeberg.org (GiTea) and todo.sr.ht (Sourcehut). [..]
>
> While sr.ht allows to import the large collection of issues, each
> message is limited to about 16.000 characters which would require us
> to truncate existing tasks and comments (and instead have them on some
> paste service). This limit is likely tied to the first class email
> support, users without an account can write to a special email address
> and create tickets without registering at all.  Try it by sending
> something to ~aparcar/openwrt-bugs-import-test-2 at todo.sr.ht
> 
> [..] If we decide to move there, tools like gh2srht[3] would allow a
> quick migration. To get a feel what the bug tracker over at sr.ht
> would look like I migrated as much as possible, feel free to have a
> look[4].

A big thank you for doing this.

Must confess: I was unaware of the ~16k issue body character limit when
I proposed SourceHut.  Did you find a public bug report or feature
request about that?  (I looked just now.  Could not find one myself, but
perhaps my search-fu is off today.)

If not, I will aim to post one, referencing this discussion thread.

Thanks again,

Sam


> A quick bug tracker conclusion, I'd be happy to use codeberg.org for
> issue tracking. Both sr.ht and codeberg.org are FOSS, GitHub not so
> much. [..]

GitHub not at all, last time I checked.


> As an immediate action, we might as well close down bugs.openwrt.org
> and open issues on GitHub.com without any migration of existing
> issues. Both users and developers already know the workflows over
> there and issues have a higher visibility. A migration away from
> GitHub over to coderberg or sr.ht is possible with much less effort
> than migrating away from flyspray.

I wish to caution against this.

Here are some reasons not to use GitHub for hosting issue/bug-reports.


# BROKEN HANDLING OF USER ACCOUNT DELETIONS

Best practice for handling user account deletions is to either:

1.  If the user is happy for a record of their contributions to remain
    attributed to them:

    Leave the username shown unchanged in the remaining webpages where
    it was used, so that at-mentions ("@username") within discussions
    still work (aren't broken), and quotations remain correctly
    attributed ("username commented MMM DD, YYYY").

    Or...

2.  If the user is *not* happy for that:

    Replace all instances of the username (at-mentions, quotation
    attributions, etc) with a non-personally-identifying pseudonym, e.g.
    "user12345".

    This, too, retains comprehensibility and avoids link-breakage.
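
Option 2 above can be sketched in a few lines of Python.  This is only
an illustration of the idea, not how any existing forge implements it:
the function name, the sample thread and the "user12345" pseudonym
format are all assumptions made for the example.

```python
import re


def pseudonymize(text: str, username: str, pseudonym: str) -> str:
    """Replace one deleted user's name with that user's own pseudonym.

    Covers at-mentions ("@username") and attribution lines
    ("username commented ..."). Hypothetical helper, for illustration.
    """
    # Match the username only as a whole token, optionally preceded by
    # "@", so that deleting "bob" does not rewrite "bobby".
    pattern = re.compile(
        r"(?<![\w-])(@?)" + re.escape(username) + r"(?![\w-])"
    )
    return pattern.sub(lambda m: m.group(1) + pseudonym, text)


# Made-up discussion thread, for illustration only.
thread = (
    "alice commented Jan 19, 2022\n"
    "Thanks @bob, that fixed it. cc @bobby\n"
    "bob commented Jan 20, 2022\n"
    "Glad to help.\n"
)

print(pseudonymize(thread, "bob", "user12345"))
```

Because each deleted account gets its own pseudonym, at-mentions and
attributions still agree with each other, and two deleted users in the
same discussion remain distinguishable.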


GitHub does neither.

Instead, GitHub replaces *some but not all* instances of a deleted
user's username with "Ghost".  That can make it difficult to follow a
discussion (bug report, pull request, etc) featuring a now-deleted user.

See e.g. https://github.com/GothenburgBitFactory/taskwarrior/issues/2088 .
If you didn't know that the comments there now attributed to "Ghost"
were in fact made by me, you would find the discussion confusing to
follow.

(I later closed my GitHub account due to the increasing accessibility
problems I encountered on GitHub.)

That would be bad enough.  But because *every* deleted user account is
processed this way by GitHub, it effectively conflates *all* deleted
users into one confusing account.  For instance, the "Ghost" account
here is *not* me: https://github.com/matrix-org/synapse/issues/5778  .
But a third party would be unable to know that.



This is especially problematic if more than one now-deleted user
contributed to a single discussion.  All of those users' posts would now
be attributed - by GitHub's incompetence - to the same "Ghost" user,
making it look as though one person, rather than several, made those
comments.  (I don't have an example at hand, but I'd be amazed if this
hasn't happened several times now, given GitHub's size.)


Worse still, because GitHub is proprietary and doesn't have a good way
for users to report GitHub bugs or submit patches to fix them, bugs like
this tend to go undiscussed and unfixed for years, leading to
progressive corruption in GitHub discussions.



# BROKEN SEARCH

There is no way within GitHub to avoid irrelevant search results.  For
instance, if I search in the TaskWarrior repo for

    is:issue in:title "TW-10"

I get results like "[TW-1733] taskwarrior 2.5.0 can not compile FreeBSD
10.1", because they have a "TW" and a "10" in the title.  In other
words, GitHub fails to perform exact string matching.

Try it yourself:
https://github.com/GothenburgBitFactory/taskwarrior/issues?utf8=%E2%9C%93&q=is%3Aissue+in%3Atitle+%22TW-10%22

This makes GitHub's search feature a real pain to use.
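
The difference between the two behaviours can be sketched in Python.
This is a guess at the kind of token-based matching that would produce
the result described above, not a description of GitHub's actual search
implementation; the function names and the first sample title are made
up for the example.

```python
import re


def tokens(s: str) -> set:
    """Split on anything that is not a letter or digit, lower-cased."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def token_match(query: str, title: str) -> bool:
    """True if every token of the query appears somewhere in the title."""
    return tokens(query) <= tokens(title)


def exact_match(query: str, title: str) -> bool:
    """True only if the query appears verbatim (ignoring case)."""
    return query.lower() in title.lower()


titles = [
    "[TW-10] hypothetical example issue",  # made-up title
    "[TW-1733] taskwarrior 2.5.0 can not compile FreeBSD 10.1",
]

for title in titles:
    print(token_match("TW-10", title), exact_match("TW-10", title), title)
```

Under token matching the second title is a hit, because "TW" and "10"
both occur in it separately; under exact matching it is not.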

Again, because GitHub is proprietary and lacks good ways to track or fix
GitHub bugs, ones like this go unfixed for years.



# ACCESSIBILITY, PRIVACY, AND ETHICS PROBLEMS

As previously discussed, e.g.:

https://lists.openwrt.org/pipermail/openwrt-devel/2022-January/037546.html

Understand that moving OpenWRT's issue-hosting to GitHub would make it
impossible for some users to subscribe to OpenWRT's bug tracker to
receive bug reports by email.


Also remember, Microsoft is a key player in the surveillance-industrial
complex:

https://www.theguardian.com/technology/2016/may/02/google-microsoft-pact-antitrust-surveillance-capitalism

https://theintercept.com/2020/07/14/microsoft-police-state-mass-surveillance-facial-recognition/


Sure, comments on OpenWRT issues might be public, but do you really want
OpenWRT users giving Microsoft their browser fingerprints or IP
addresses in order to participate?

(You might say: users can work around this by using Tor.  But can they?
What if they live in jurisdictions where Tor usage would get them
flagged by law enforcement?  What if GitHub blocks sign-ups from Tor?
And do you really want a situation where people have to weigh up their
threat models and what steps to take to protect themselves from OpenWRT
infrastructure because it has been outsourced to a malevolent entity?)



# PROBLEMATIC ISSUE NUMBERING AND LINK BREAKAGE

If OpenWRT were, as you said, to "open issues on GitHub.com without any
migration of existing issues", then this could lead to broken links in
OpenWRT commit messages, bug reports, and comments.

One reason for this is that the issue numbering on GitHub might not
remain coordinated with the issue numbering on bugs.openwrt.org .  For
instance, there might end up being two bug reports with the same number.

That would be like the ambiguity of "#4206", which could currently
(sadly, as a result of OpenWRT allowing pull requests on GitHub) refer
to either

https://bugs.openwrt.org/index.php?do=details&task_id=4206

or

https://github.com/openwrt/openwrt/pull/4206

but worse.
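
To make the ambiguity concrete, here is a small Python sketch that
expands a bare "#N" reference into every tracker it could point at.
The function name is made up for the example; the two URL patterns are
the ones shown above.

```python
def candidate_targets(ref: str) -> list:
    """Return every URL a bare '#N' reference could plausibly mean."""
    number = int(ref.lstrip("#"))
    return [
        # Flyspray task on the existing bug tracker
        f"https://bugs.openwrt.org/index.php?do=details&task_id={number}",
        # Pull request on GitHub
        f"https://github.com/openwrt/openwrt/pull/{number}",
    ]


for url in candidate_targets("#4206"):
    print(url)
```

A reader (or a script) that meets "#4206" in a commit message has no
way to pick between these without extra context.  Opening fresh issues
on GitHub would add further numbers that collide with existing task
IDs.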


And even if OpenWRT resists, for now, the *migration* of issues to
GitHub, every additional endorsement of GitHub by OpenWRT sadly
increases the likelihood that some future OpenWRT dev/maintainer will
attempt such a migration - probably in ignorance of these problems.

I have seen projects mess up such migrations quite badly, in ways that
have knock-on effects for years to come.  TaskWarrior is one example:
its devs/maintainers did not notice that quite a lot of data corruption
and link-breakage had occurred during the migration until it was too
late to correct, because people on GitHub had already started to refer
to issue numbers that should properly have been reserved for existing
issues.

As a result, the many references from one bug report or pull request to
another (e.g. "Fixes #XX" or "See #YY", that sort of thing) that were
silently auto-linked by GitHub to the wrong bug report or pull request
could no longer be fixed without extensive effort (more than could be
spared - and so IIUC the issue still persists).  See e.g.
https://github.com/GothenburgBitFactory/taskwarrior/issues/2088 .

This is a subtle and insidious kind of corruption that GitHub makes it
hard to avoid.
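
One way a migration script could guard against this is to rewrite every
"#N" reference through an explicit old-number to new-number map, and to
leave unmapped references visibly unresolved instead of letting them
auto-link to whatever now holds that number.  The Python below is a
minimal sketch under that assumption; the function name, the map and
the sample comment are hypothetical and not taken from any real
migration tool.

```python
import re

REF = re.compile(r"#(\d+)")


def rewrite_refs(text: str, old_to_new: dict) -> str:
    """Rewrite '#N' references using an old -> new issue-number map."""

    def repl(m):
        old = int(m.group(1))
        if old in old_to_new:
            return f"#{old_to_new[old]}"
        # No known target: do not emit a bare '#N', which the new
        # tracker would auto-link to an unrelated issue.
        return f"(old #{old}, unmapped)"

    return REF.sub(repl, text)


# Made-up numbers, for illustration only.
old_to_new = {12: 101, 15: 102}
comment = "Fixes #12. See #15 and #99."

print(rewrite_refs(comment, old_to_new))
# Fixes #101. See #102 and (old #99, unmapped).
```

This only helps if it is run before anyone starts filing new issues on
the target tracker.  Once the numbers are taken, the map can no longer
be applied cleanly.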




> [..]
>
> ## Conclusion
> 
> From a FOSS perspective I'd skip GitHub entirely and move to Codeberg
> or sr.ht. Codeberg (Gitea) is a fine clone of GitHub and sr.ht comes
> with a great _no bloat_ attitude and priority on email integration for
> tickets and git (they created git-send-email.io).

Yes.  Those are both great options.


> [OpenWRT] community repositories are on GitHub, people are actively
> and happy contributing there

Except for the people who aren't.


> and mostly think about "how to make OpenWrt better" and less "how to
> improve our workflow and infrastructure".

Those are not contradictory goals!


Thanks again for your effort and reflections,

Sam


-- 
A: When it messes up the order in which people normally read text.
Q: When is top-posting a bad thing?

()  ASCII ribbon campaign. Please avoid HTML emails & proprietary
/\  file formats. (Why? See e.g. https://v.gd/jrmGbS ). Thank you.


