Lots of people are interested in converting their git logs into beautiful JSON or JSON Lines for archiving and analytics. It seems like it should be easy enough, but it turns out to be more complicated than it needs to be.
When I got a feature request for jc to support git log output, my first instinct was to look into the robust git --format options. At first glance, it seemed like a simple format string like this should work:
git log --format='{"hash": "%H", "author": "%an", "subject": "%s", "body": "%b", "date": %at}'
The problem is that git does not escape the values it substitutes for those format variables. The result is invalid JSON whenever the data contains newline characters or other special characters like quotation marks.
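To make the failure concrete, here is a minimal Python sketch. It does not call git at all: the commit values are made up for illustration, and the template only imitates what the format string above does, which is to paste each value into the JSON text verbatim.

```python
import json

# Hypothetical commit fields (not from a real repository). The subject
# contains double quotes and the body contains a newline, both of which
# must be escaped inside a JSON string.
commit = {
    "hash": "abc123",
    "author": "Jane Doe",
    "subject": 'Fix "off by one" error',
    "body": "First line\nSecond line",
    "date": 1652678892,
}

# Imitates the unescaped format string: values are inserted as-is.
template = (
    '{{"hash": "{hash}", "author": "{author}", "subject": "{subject}", '
    '"body": "{body}", "date": {date}}}'
)
naive = template.format(**commit)


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# A real JSON encoder escapes the quotes and the newline for us.
escaped = json.dumps(commit)

print(is_valid_json(naive))    # False
print(is_valid_json(escaped))  # True
```

The pasted-together string breaks at the first inner quotation mark, while the encoded version round-trips back to the original values.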
I found several other solutions to the problem using custom scripts, but unfortunately some required installing interpreters like Node.js, some required specific git log --format options, and some didn’t fully support options like --stat or --shortstat. Some solutions did not even fully solve the string escaping issue.
A Better git log Parser
I decided jc would be a great git log parser. jc already supports around 100 other commands, so this is right in jc’s wheelhouse. I wanted to make the jc parser for git log as easy to use as the other parsers (e.g. jc git log), but also support more advanced git format and statistics options.
In addition, I wanted to support both JSON and JSON Lines conversion. git logs can become huge over time, so emitting JSON Lines avoids the memory overhead of building one huge JSON array of log entries.
Finally, I wanted to add calculated timestamps (naive and time zone aware) to make the output more useful in scripts.
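As a rough illustration of the naive versus time zone aware distinction, here is a short Python sketch using only the standard library. It is not jc’s implementation, just one way to read a date string in the default git log format; the %a and %b directives assume an English (C) locale.

```python
from datetime import datetime

# Date string in the format git log prints by default.
raw = "Sun May 15 22:28:12 2022 -0700"

# Time zone aware: %z keeps the -0700 offset attached to the datetime,
# so the resulting epoch does not depend on the machine running this.
aware = datetime.strptime(raw, "%a %b %d %H:%M:%S %Y %z")
aware_epoch = int(aware.timestamp())

# Naive: the same wall-clock fields with the offset dropped. Turning a
# naive datetime into an epoch requires assuming some time zone.
naive = aware.replace(tzinfo=None)

print(aware_epoch)        # 1652678892
print(naive.isoformat())  # 2022-05-15T22:28:12
```

An aware timestamp is unambiguous, while a naive one is only meaningful if the script knows which time zone the wall-clock time refers to.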
The new git log standard and streaming parsers are now bundled with jc. They work just like any other jc parser and support several git log --format options as well as --stat and --shortstat. No need to worry about escaping special characters or using a specific format string. It just works out of the box!
Here’s an example using the fuller format option along with full stats using --stat.
$ git log --format=fuller --stat | jc --git-log -p
[
  {
    "commit": "af2c06cd284352eb47c44f2387d4600b1b322cbd",
    "author": "Kelly Brazil",
    "author_email": "kellyjonbrazil@gmail.com",
    "date": "Sun May 15 22:28:12 2022 -0700",
    "commit_by": "Kelly Brazil",
    "commit_by_email": "kellyjonbrazil@gmail.com",
    "commit_by_date": "Sun May 15 22:28:12 2022 -0700",
    "stats": {
      "files_changed": 1,
      "insertions": 2,
      "deletions": 2,
      "files": [
        "docs/parsers/pip_show.md"
      ]
    },
    "message": "doc update",
    "epoch": 1652678892,
    "epoch_utc": null
  },
  {
    "commit": "67a4c6f797dfeaba2ba50222e879bf4fb58678f4",
    "author": "Kelly Brazil",
    "author_email": "kellyjonbrazil@gmail.com",
    "date": "Sun May 15 22:23:00 2022 -0700",
    "commit_by": "Kelly Brazil",
    "commit_by_email": "kellyjonbrazil@gmail.com",
    "commit_by_date": "Sun May 15 22:23:00 2022 -0700",
    "stats": {
      "files_changed": 2,
      "insertions": 4,
      "deletions": 4,
      "files": [
        "jc/parsers/pip_show.py",
        "tests/fixtures/generic/pip-show-multiline-license.json"
      ]
    },
    "message": "add initial \\n to first line of multiline fields",
    "epoch": 1652678580,
    "epoch_utc": null
  },
  ...
]
You could also use the magic syntax for the above example:
$ jc -p git log --format=fuller --stat
Or, to output JSON Lines, use the streaming parser:
$ git log --format=fuller --stat | jc --git-log-s
{"commit":"af2c06cd284352eb47c44f2387d4600b1b322cbd","author":"Kelly Brazil","author_email":"kellyjonbrazil@gmail.com","date":"Sun May 15 22:28:12 2022 -0700","commit_by":"Kelly Brazil","commit_by_email":"kellyjonbrazil@gmail.com","commit_by_date":"Sun May 15 22:28:12 2022 -0700","stats":{"files_changed":1,"insertions":2,"deletions":2,"files":["docs/parsers/pip_show.md"]},"message":"doc update","epoch":1652678892,"epoch_utc":null}
{"commit":"67a4c6f797dfeaba2ba50222e879bf4fb58678f4","author":"Kelly Brazil","author_email":"kellyjonbrazil@gmail.com","date":"Sun May 15 22:23:00 2022 -0700","commit_by":"Kelly Brazil","commit_by_email":"kellyjonbrazil@gmail.com","commit_by_date":"Sun May 15 22:23:00 2022 -0700","stats":{"files_changed":2,"insertions":4,"deletions":4,"files":["jc/parsers/pip_show.py","tests/fixtures/generic/pip-show-multiline-license.json"]},"message":"add initial \\n to first line of multiline fields","epoch":1652678580,"epoch_utc":null}
...
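Because each line of the streaming output is a complete JSON document, a script can process commits one at a time instead of loading the whole history into memory. The following Python sketch shows the idea. The two sample records are abbreviated copies of the output above (only the commit and stats fields are kept), and total_insertions is a hypothetical helper name, not part of jc.

```python
import io
import json

# Two abbreviated records in the shape of the streaming output above.
# In real use, the lines would come from sys.stdin instead.
sample = io.StringIO(
    '{"commit":"af2c06cd284352eb47c44f2387d4600b1b322cbd",'
    '"stats":{"files_changed":1,"insertions":2,"deletions":2}}\n'
    '{"commit":"67a4c6f797dfeaba2ba50222e879bf4fb58678f4",'
    '"stats":{"files_changed":2,"insertions":4,"deletions":4}}\n'
)


def total_insertions(lines):
    """Sum stats.insertions, decoding one JSON Lines record at a time."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Commits logged without --stat may have no "stats" object.
        total += record.get("stats", {}).get("insertions", 0)
    return total


print(total_insertions(sample))  # 6
```

To use it in a pipeline, pass sys.stdin to the function and pipe the jc --git-log-s output into the script. Only one record is held in memory at any moment, regardless of how long the log is.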
Of course, other format options, like oneline, short, medium, and full are supported, as well as --shortstat. Check out the docs for all of the supported options. (standard and streaming)
In the end, I believe it would be better for there to be a JSON output option built into git, but until then, there is jc.
Happy parsing!