+++
date = 2021-08-11T16:41:49+02:00
title = "Better hunk headers with gitattributes(5)"

[taxonomies]
tags = ["git", "TIL"]
+++

Yesterday whilst catching up on the Git mailing list I stumbled upon this
[patch](https://public-inbox.org/git/20210810190937.305765-1-tsdh@gnu.org/)
proposing to improve the hunk header regex for Java. I had never paid much
attention to how [`git-diff(1)`](https://git-scm.com/docs/git-diff) finds the
right method signature to show in the headers, though I was vaguely aware of a
bunch of regexes for different languages.

Turns out that by default, as explained in the manual for
[`gitattributes(5)`](https://git-scm.com/docs/gitattributes#_defining_a_custom_hunk_header),
`git-diff(1)` emulates the behaviour of GNU `diff -p` and does **not** consult
any of the language-specific regular expressions. This came as a bit of a
surprise to me, as Git usually has relatively sane and extensive defaults. Why
define all these regexes and then not use them by default?

Perhaps one reason is that it is hard to tell when to use which. Git can only
look at the filename, and not all shell scripts share the `.sh` ending, for
example. Surely it would not be too invasive, however, to define sensible
defaults for, say, files ending in `.py` or `.rs`.
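
To make the difference concrete, here is a minimal Python sketch of the idea as I understand it: walk backwards from the start of a hunk and take the nearest line that matches a "function name" pattern. The helper `hunk_header`, the sample `SOURCE`, and both regexes are my own illustration. The default pattern mimics what GNU `diff -p` documents (a line starting with a letter, `$` or `_`), and the Python pattern is only modelled on the kind of regex Git ships, so do not treat either as Git's exact implementation.

```python
import re

# GNU `diff -p` style default: a line starting with a letter, `$`, or `_`.
DEFAULT_FUNCNAME = re.compile(r"^[A-Za-z$_]")

# Assumed, simplified stand-in for a Python-aware pattern
# (not guaranteed to be identical to Git's built-in regex).
PYTHON_FUNCNAME = re.compile(r"^[ \t]*((class|(async[ \t]+)?def)[ \t].*)$")


def hunk_header(lines, hunk_start, pattern):
    """Return the nearest line before index `hunk_start` matching `pattern`.

    Hypothetical helper: an empty string means no line matched.
    """
    for line in reversed(lines[:hunk_start]):
        if pattern.search(line):
            return line.strip()
    return ""


SOURCE = [
    "class Greeter:",
    "    def hello(self):",
    "        name = 'world'",
    "        return f'hello {name}'",
]

# Suppose a hunk starts at the `return` line (index 3).
print(hunk_header(SOURCE, 3, DEFAULT_FUNCNAME))  # class Greeter:
print(hunk_header(SOURCE, 3, PYTHON_FUNCNAME))   # def hello(self):
```

With the default rule the indented `def` never matches, so the header falls back to the enclosing class, whereas the language-aware pattern picks the method itself.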

In any case I updated my `~/.config/git/attributes` with the following, and am
now enjoying better hunk headers across the board:

```
*.c	diff=cpp
*.cpp	diff=cpp
*.go	diff=go
*.md	diff=markdown
*.pl	diff=perl
*.py	diff=python
*.rs	diff=rust
*.sh	diff=bash
*.tex	diff=tex
```
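
As a rough mental model of how such a file maps paths to diff drivers, here is a small Python sketch. It assumes that patterns without a slash are matched against the file's basename and that later lines override earlier ones. The function `diff_driver` and the shortened `ATTRIBUTES` sample are hypothetical, and `fnmatchcase` only approximates Git's own pattern matching. For the real answer, `git check-attr diff -- <path>` asks Git directly.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Shortened sample of the attributes file above (illustration only).
ATTRIBUTES = """\
*.c   diff=cpp
*.md  diff=markdown
*.py  diff=python
"""


def diff_driver(path, attributes=ATTRIBUTES):
    """Return the diff driver assigned to `path`, or None if nothing matches."""
    name = PurePosixPath(path).name  # assumption: match on the basename only
    driver = None
    for line in attributes.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        for attr in attrs:
            if attr.startswith("diff=") and fnmatchcase(name, pattern):
                driver = attr[len("diff="):]  # later lines override earlier ones
    return driver


print(diff_driver("posts/weltschmerz.md"))  # markdown
print(diff_driver("Makefile"))              # None
```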

The markdown setting is especially neat since it will now display the nearest
section right in the diff, like so:

```diff
--- a/posts/weltschmerz.md
+++ b/posts/weltschmerz.md
@@ -24,6 +24,10 @@ ## Download
```