How to pretend that some lines aren't versioned in git


Have you ever been in a situation where you want to track files in git, but keep getting some changes that you would actually like to ignore?

Well, I have, otherwise I wouldn’t be writing this post.

Bad setup calls for desperate needs

The first time this happened to me was years ago, as a developer in a company that had a large PHP codebase. We had some declarative database schema definition files, and a local command that would take these definition files and generate PHP classes offering ORM-like access to the database. Something like this:

<?php

// Please do not edit this file directly
// Generated by this invented-here script that you aren't allowed to fix
// Generation time: 2022-12-01T12:32:43.000Z just FYI

class TheCustomer {
  public $email;

  public function getEmail() {
      return $this->email;
  }
}

Whenever the schema changed, we would re-run the command to generate all ORM PHP classes. Except that this command had the brilliant idea to insert the generation time in a comment at the top of each file. Not just the one that needed an update. ALL OF THEM. So if at any point two people happened to run this command locally and push their changes, it would result in dozens of conflicting files that all appeared to have edited the same line with different values. And there were more than 20 people working on this codebase daily.

This setup was clearly suboptimal, and there are ways around it. The obvious one is making any “make-like” file generation deterministic, especially if the output is versioned. But in this work context things had quite some momentum and I quickly understood that I’d have to live with this situation for a while. That’s when I started to see if I could ignore changes on some files, even though they were versioned. What I wanted was basically a kind of gitignore but just for lines of files 😇. There was of course no such thing, but I remember starting to play locally with git update-index --assume-unchanged. This tells git to ignore the worktree version of the file. It actually avoided conflicts when pulling, but then I would also not get any change synced to disk. And committing something myself would have been another challenge. Long story short: fail.

There was no success story for me there. I assume someone eventually fixed the generation script. Sorry for the dramatic intro.

2022: Out of nowhere, the problem comes back

One could think that this kind of weird need for partially git-ignored files was a one-time thing, due to the specific context in which I was at the time. Yet, I am again cornered by something similar right now when trying to manage some dotfiles of my Linux laptop using git, which seems to be what the entire world is doing these days.

Earlier this year I went bold and basically versioned my home directory in git. It might sound like madness, but rest assured that most of the folders are actually ignored. My goal is mostly to capture the settings files in the root (~/) and config (~/.config/) folders of my home. This way I can add files that I deem important to track, and I also see directly when they get updated by something else, for instance when changing the settings from a software UI. This proves to work quite well, except for 3-4 pieces of software that put both state data and configuration values into the same file.

Let’s take the configuration file for Nomacs image viewer in ~/.config/nomacs/Image Lounge.conf:

[General]
firstTime=false
geometry=@ByteArray(\x1\xd9\xd0\xcb\0\x3\0\0\0\0\x5\0) #...
geometryNomacs=@Rect(1282 24 636 1003)

[AppSettings]
appMode=0
defaultJpgQuality=97

[CustomShortcuts]
Exit=Ctrl+Q

[GlobalSettings]
lastDir=/home/simon/Pictures
recentFiles=/home/simon/Pictures/pic1.jpg, /home/simon/Pictures/pic2.jpg
recentFolders=/home/simon/Pictures, /home/simon/Pictures/subfolder

[SynchronizeSettings]
checkForUpdates=false
disableUpdateInteraction=false

# Many more things down the line

First of all yes, the config file name has a space in it. And also yes, I had to add a custom shortcut for [ctrl+q] to quit the app. But most importantly, some state related to the window geometry is stored there, as well as the last opened files. Since this file is versioned, pretty much each use of Nomacs results in a local change from git’s point of view. This is super annoying because I never know whether to just keep the repo dirty, to keep endlessly committing changes, or to keep resetting to HEAD in a losing fight against Nomacs. The last option is to give up and not version it, but it also contains some settings that I’d like to keep track of, like custom shortcuts, preferred theme, slideshow behavior etc.

The situation looks just like the one with my PHP classes years before.

I was having similar problems with a few other apps like pcmanfm, Transmission and Flameshot, each with their specificities. This is not a shame list: I am well aware that I decided myself to put the settings of those apps into my dotfiles repo. But the vast majority of other tools whose settings I tracked were perfectly compatible with my git-based dotfile management. So those few exceptions would not justify a completely different strategy. Instead I started to think about ways to achieve this kind of partial git ignore. We’re talking about a 1-person dotfile management setup, so I was fine with things getting a bit hacky.

git update-index, second try

After considering some options, I figured that git update-index --skip-worktree [...paths] is a pretty good direction to take. It is very similar to --assume-unchanged but has subtle differences that make it better suited for this use case.

To see it in action we can check this little shell example:

  1. Let’s init a git repo
    $ mkdir test && cd test
    $ git init
    Initialized empty Git repository in /test/.git/
    
  2. Now we’re saving the current date & time into a file, before committing
    $ date > now.txt
    $ git add now.txt
    $ git commit -m 'first commit'
    [main (root-commit) 985db33] first commit
    1 file changed, 1 insertion(+)
    create mode 100644 now.txt
    
  3. Updating now.txt obviously makes it dirty:
    $ date > now.txt
    $ git status
    On branch main
    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git restore <file>..." to discard changes in working directory)
      modified:   now.txt
    
  4. But setting the skip-worktree flag prevents this!
    $ git update-index --skip-worktree now.txt
    $ git status
    On branch main
    nothing to commit, working tree clean
    
  5. Note that it did NOT reset the file like git checkout would. It just ignores it
    $ date > now.txt
    $ git status
    On branch main
    nothing to commit, working tree clean
    
  6. Unsetting the flag makes local changes visible to git again
    $ git update-index --no-skip-worktree now.txt
    $ git status
    On branch main
    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git restore <file>..." to discard changes in working directory)
      modified:   now.txt
    

It is pretty cool to see this in action, as I always thought that once a file is tracked in git, git would never let it change without noticing. But this is obviously too much: instead of just ignoring the uninteresting, forever-changing parts of the files, I am now making myself blind to any change that can happen.
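One practical side effect of this blindness is that flagged files vanish from git status, so it is easy to forget which ones carry the flag. Here is a minimal sketch of a helper that lists them. It assumes the documented behavior of git ls-files -v, which prefixes skip-worktree entries with S (lowercased when the entry is also marked assume-unchanged), combined with -z for NUL-separated output. The function names are made up for illustration.

```python
from __future__ import annotations

import subprocess


def parse_skip_worktree(ls_files_output: str) -> list[str]:
    """Extract skip-worktree paths from `git ls-files -v -z` output.

    Each NUL-terminated record looks like "<tag> <path>". The tag "S" marks
    an entry with the skip-worktree flag; "s" means it is additionally
    marked assume-unchanged.
    """
    paths = []
    for record in ls_files_output.split("\0"):
        # Split on the first space only, so paths containing spaces survive
        tag, _, path = record.partition(" ")
        if tag in ("S", "s") and path:
            paths.append(path)
    return paths


def list_skip_worktree(repo_dir: str = ".") -> list[str]:
    """Ask git which tracked files currently carry the skip-worktree flag."""
    result = subprocess.run(
        ["git", "ls-files", "-v", "-z"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_skip_worktree(result.stdout)
```

Calling list_skip_worktree() inside the test repository from the walkthrough, right after step 4, should report now.txt.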

That’s pretty much where I stopped a few years back, but what if I could instead smartly toggle this skip-worktree flag on and off for each file, depending on whether an interesting change occurred in it? For this we would need:

  1. An explicit list of non-cooperative files to go through
  2. Some way to define which lines to ignore in each of those
  3. Some way to identify that something else has changed.

For 1, this could just be a file that we’ll call index-file and can be a csv, yml, json or anything.

For 2, since we have the file for 1 already, we can just add some more data to each item of the list. That would weigh in favor of the index-file being in a structured format like json. Instead of saving line numbers, some regular expressions that indicate the lines we don’t care about seem better suited to my use case.

For 3, the idea would be to generate a checksum of the file content without all the lines that are flagged as ignored. This means that any non-ignored change would update this checksum. The checksum itself could also be stored as additional data for each item of the index file.
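The idea for 3 can be sketched in a few lines of Python. This is only an illustration under a few assumptions: SHA-1 is picked as the checksum, the patterns are applied line by line with re.search, and the function name fingerprint is invented here.

```python
from __future__ import annotations

import hashlib
import re


def fingerprint(content: str, ignored_patterns: list[str]) -> str:
    """Return the SHA-1 hex digest of all lines that match no ignored pattern."""
    compiled = [re.compile(pattern) for pattern in ignored_patterns]
    kept = [
        line
        for line in content.splitlines()
        if not any(rx.search(line) for rx in compiled)
    ]
    # Only the kept lines feed the hash, so ignored lines cannot influence it
    return hashlib.sha1("\n".join(kept).encode("utf-8")).hexdigest()
```

With the pattern ^geometry=, moving the window (which rewrites the geometry= line) leaves the fingerprint untouched, whereas changing appMode=0 to appMode=1 produces a different one.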

I therefore came up with an index-file format that looks like this:

[
  {
    "path": ".config/nomacs/Image Lounge.conf",
    "ignoredPatterns": [
      "^geometry=",
      "^geometryNomacs=",
      "^lastDir=",
      "^recentFiles=",
      "^recentFolders="
    ],
    "fingerprint": "6f32b9ed799a33cba1df381ae4e2c61f9d31bb0a"
  },
  {
    "path": ".config/pcmanfm/default/pcmanfm.conf",
    "ignoredPatterns": [
      "^win_width",
      "^win_height"
    ],
    "fingerprint": "cc682e34b3407d113b04b9dd7f9409bcdd7ad88a"
  }
]

This is pretty simple given the use case described above. Now what’s left is to have a program that, when invoked, goes through the list of files described by this index-file. For each, it would compute the sha1 sum (I just decided that sha1 would do the job just fine) of all the lines that don’t match one of the ignored patterns. Then it would compare the obtained sha1 with the stored version. If they are identical, only ignored lines can have changed, so the skip-worktree flag gets set on the file. If they differ, something interesting has changed, so the flag gets removed and the file becomes visible to git again.

Note that in this latter case, git diff will then surface both the ignored lines and the “interesting” changes.
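The comparison step boils down to a small decision per file: matching fingerprints mean the file can be hidden from git, a mismatch means it has to become visible again. Here is a rough sketch of that decision. The function name is illustrative and not necessarily what the actual script uses; the git update-index flags are the ones shown in the walkthrough above.

```python
from __future__ import annotations


def skip_worktree_command(path: str, stored: str, current: str) -> list[str]:
    """Build the git command that reconciles the flag for one file.

    Matching fingerprints mean only ignored lines changed, so the file gets
    hidden from git; a mismatch makes it visible again.
    """
    flag = "--skip-worktree" if current == stored else "--no-skip-worktree"
    # "--" keeps git from misreading a path that starts with a dash
    return ["git", "update-index", flag, "--", path]
```

The returned list can be handed directly to subprocess.run(..., check=True). Passing the path as its own list item also means that the space in Image Lounge.conf needs no quoting.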

A last problem to solve is a way to update the index-file with the new fingerprint if some non-ignored changes occurred. We could do it automatically when the fingerprint changes, but it would render the thing horribly non-idempotent: a first call would show a difference, remove the skip-worktree flag and set the new fingerprint. Another call right away would this time consider it identical and ignore the file again. Instead, the program would accept an explicit --write flag that I could pass after having checked myself that the new state is the new normal.
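To make the idempotency argument concrete, here is a hedged sketch of how such a flag could be wired up with argparse and json from the standard library. Only --write comes from the description above. The positional index_file argument, the function names and the compute callback (which stands for whatever computes a fresh fingerprint for an entry) are assumptions for the sake of the example.

```python
from __future__ import annotations

import argparse
import json
from pathlib import Path
from typing import Callable


def refresh_index(index_path: Path, compute: Callable[[dict], str], write: bool) -> list[str]:
    """Return the paths whose fingerprint differs from the stored one.

    Without `write` the index file is never touched, so repeated calls
    keep reporting the same result.
    """
    entries = json.loads(index_path.read_text(encoding="utf-8"))
    changed = []
    for entry in entries:
        current = compute(entry)
        if current != entry.get("fingerprint"):
            changed.append(entry["path"])
            if write:
                entry["fingerprint"] = current
    if write and changed:
        index_path.write_text(json.dumps(entries, indent=2) + "\n", encoding="utf-8")
    return changed


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Partially ignore lines of tracked files")
    # Hypothetical positional argument: where the index-file lives
    parser.add_argument("index_file", type=Path)
    parser.add_argument("--write", action="store_true",
                        help="store the current fingerprints as the new normal")
    return parser.parse_args(argv)
```

Running without --write any number of times reports the same changed files. Only a deliberate run with --write moves the baseline.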

When it comes to the program that would do all that, wait no longer: find it in this gist.

I wrote this in Python because I consider it present on any machine on which I would want to do such things, and the standard library provides more than enough to do everything it needs without extra requirements.

When to invoke?

The first few spins of this script were super promising. This new setup basically achieved exactly what I wanted. But the skip-worktree flags are stateful. After each invocation of the script, all further changes on those non-cooperative dotfiles will be ignored (if the flag is set) or not (if it isn’t) until the next run. Only invoking it again reconciles the result of git status with what I expect given the state of the files. It somehow needs to be called at the right time.

As it turns out, running this as part of the pre-commit hook is quite ideal for me. It basically makes sure that it is run as often as I commit to my dotfile repo. And the script is set up to exit with a non-zero code in case one of the files shows some non-ignored changes, which blocks my commit and forces me to notice. Plus we are only talking about the rare case where I, intentionally or not, change a valuable setting in one of the 3-4 apps concerned.
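For illustration, a pre-commit hook along these lines could itself be a tiny Python file saved as .git/hooks/pre-commit and made executable. This is a sketch, not the actual hook: the checker command passed in is a placeholder for wherever the script and its index-file live, and the injectable run parameter only exists to keep the function easy to test.

```python
from __future__ import annotations

import subprocess
import sys


def run_hook(checker_cmd: list[str], run=subprocess.run) -> int:
    """Run the checker and translate its result into the hook's exit code.

    Git aborts the commit whenever the pre-commit hook exits non-zero.
    """
    result = run(checker_cmd)
    if result.returncode != 0:
        print("Non-ignored changes found in partially ignored files.", file=sys.stderr)
        print("Review them, then re-run the checker with --write.", file=sys.stderr)
        return 1
    return 0
```

Ending the hook file with something like sys.exit(run_hook(["python3", "path/to/script.py"])) turns it into the actual hook, where the path is whatever location the script was saved to.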

In some other setups, adding this to the post-merge hook would also make sure to check again after pulling some remote changes.

Conclusion

This project is meant to stay a little hack, because ignoring some parts of some files is not a road one should willingly want to follow. But it is always rewarding to solve a personal problem by hooking together some great open source tools.


Thanks for reading. Comments are welcome on the Hacker News thread