Easily Convert git log Output to JSON

Many people want to convert their git logs into clean JSON or JSON Lines for archiving and analytics. It seems like it should be easy, but it is deceptively complicated.

When I got a feature request for jc to support git log output, my first instinct was to look into git's robust --format options. At first glance, it seemed that a simple format string like this should work:

git log --format='{"hash": "%H", "author": "%an", "subject": "%s", "body": "%b", "date": %at}'

The problem is that git does not do any escaping when using those format variables. This will generate invalid JSON if there are newline characters or other special characters like quotation marks inside the data.
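To make the failure concrete, here is a minimal Python sketch that mimics what the format string does: it pastes raw field values between hand-written quotes. The commit data and the `naive_format` and `is_valid_json` helpers are hypothetical and exist only for illustration; they are not part of git or jc.

```python
import json

# Hypothetical commit fields. The subject contains double quotes and the
# body contains a newline, both of which are legal in a commit message.
commit = {
    "hash": "abc123",
    "author": "Jane Doe",
    "subject": 'fix "quoted" option parsing',
    "body": "first line\nsecond line",
    "date": 1652678892,
}


def naive_format(c):
    """Paste raw values into a JSON-looking template with no escaping,
    which is effectively what the --format string above does."""
    template = (
        '{"hash": "%s", "author": "%s", "subject": "%s", '
        '"body": "%s", "date": %d}'
    )
    return template % (
        c["hash"], c["author"], c["subject"], c["body"], c["date"]
    )


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


naive = naive_format(commit)
escaped = json.dumps(commit)  # a real JSON encoder escapes quotes and newlines

print(is_valid_json(naive))    # False
print(is_valid_json(escaped))  # True
```

The unescaped quotes end the "subject" string early, and the raw newline in "body" is not allowed inside a JSON string, so the template output cannot be parsed.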

I found several other solutions to the problem using custom scripts, but unfortunately some require installing interpreters like Node.js, some require specific git log --format options, and some do not fully support options like --stat or --shortstat. Some solutions still do not fully solve the string escaping issue.

A Better git log Parser

I decided jc would be a great git log parser. jc already supports around 100 other commands, so this is right in jc's wheelhouse. I wanted to make the jc parser for git log as easy to use as the other parsers (e.g. jc git log), but also support more advanced git format and statistics options.

In addition, I wanted to support both JSON and JSON Lines conversion. git logs can become huge over time, so being able to emit JSON Lines can reduce the memory overhead that would be incurred by generating a huge JSON array of logs.
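As a rough illustration of why JSON Lines helps on the consuming side, the sketch below decodes one record at a time instead of loading a whole array. The `iter_json_lines` helper and the shortened sample records are assumptions made for this example; they are not part of jc.

```python
import io
import json


def iter_json_lines(stream):
    """Yield one decoded record per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for a pipe or file; in practice this could be sys.stdin.
# The hashes are abbreviated, hypothetical sample values.
sample = io.StringIO(
    '{"commit": "af2c06c", "message": "doc update"}\n'
    '{"commit": "67a4c6f", "message": "second commit"}\n'
)

records = iter_json_lines(sample)
first = next(records)    # only the first record has been decoded so far
print(first["message"])  # doc update
```

Because the generator holds a single record at a time, memory use stays roughly constant no matter how long the log is. A JSON array, by contrast, normally has to be read to the closing bracket before a standard parser returns anything.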

Finally, I wanted to add calculated timestamps (naive and time zone aware) to make the output more useful in scripts.
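For readers curious about the difference between the two kinds of timestamp, here is a small sketch using only the Python standard library. It is not jc's implementation; the `parse_git_date` helper and the format constant are assumptions for illustration, and the `%a` and `%b` directives assume the default C/English locale.

```python
from datetime import datetime, timezone

# Layout of git's default date output, e.g. "Sun May 15 22:28:12 2022 -0700".
GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"


def parse_git_date(text):
    """Parse a git-style date string into a time zone aware datetime."""
    return datetime.strptime(text, GIT_DATE_FORMAT)


aware = parse_git_date("Sun May 15 22:28:12 2022 -0700")

# Aware epoch: the UTC offset in the string is honored, so the result
# is the same on every machine.
aware_epoch = int(aware.timestamp())

# Naive datetime: the offset is dropped and only the wall-clock time is
# kept, so any epoch derived from it depends on the system time zone.
naive = aware.replace(tzinfo=None)

print(aware_epoch)                                 # 1652678892
print(aware.astimezone(timezone.utc).isoformat())  # 2022-05-16T05:28:12+00:00
print(naive.isoformat())                           # 2022-05-15T22:28:12
```

A naive value is convenient when every log comes from the same time zone, while an aware value is the safer choice when comparing commits across machines.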

The new git log standard and streaming parsers are now bundled with jc. They work just like any other jc parser and support several git log --format options as well as --stat and --shortstat. No need to worry about escaping special characters or using a specific format string. It just works out of the box!

Here's an example that combines the fuller format option with full stats using --stat.

$ git log --format=fuller --stat | jc --git-log -p
[
  {
    "commit": "af2c06cd284352eb47c44f2387d4600b1b322cbd",
    "author": "Kelly Brazil",
    "author_email": "kellyjonbrazil@gmail.com",
    "date": "Sun May 15 22:28:12 2022 -0700",
    "commit_by": "Kelly Brazil",
    "commit_by_email": "kellyjonbrazil@gmail.com",
    "commit_by_date": "Sun May 15 22:28:12 2022 -0700",
    "stats": {
      "files_changed": 1,
      "insertions": 2,
      "deletions": 2,
      "files": [
        "docs/parsers/pip_show.md"
      ]
    },
    "message": "doc update",
    "epoch": 1652678892,
    "epoch_utc": null
  },
  {
    "commit": "67a4c6f797dfeaba2ba50222e879bf4fb58678f4",
    "author": "Kelly Brazil",
    "author_email": "kellyjonbrazil@gmail.com",
    "date": "Sun May 15 22:23:00 2022 -0700",
    "commit_by": "Kelly Brazil",
    "commit_by_email": "kellyjonbrazil@gmail.com",
    "commit_by_date": "Sun May 15 22:23:00 2022 -0700",
    "stats": {
      "files_changed": 2,
      "insertions": 4,
      "deletions": 4,
      "files": [
        "jc/parsers/pip_show.py",
        "tests/fixtures/generic/pip-show-multiline-license.json"
      ]
    },
    "message": "add initial \\n to first line of multiline fields",
    "epoch": 1652678580,
    "epoch_utc": null
  },
  ...
]

You could also use the magic syntax for the above example: jc -p git log --format=fuller --stat

Or, to output JSON Lines, use the streaming parser:

$ git log --format=fuller --stat | jc --git-log-s
{"commit":"af2c06cd284352eb47c44f2387d4600b1b322cbd","author":"Kelly Brazil","author_email":"kellyjonbrazil@gmail.com","date":"Sun May 15 22:28:12 2022 -0700","commit_by":"Kelly Brazil","commit_by_email":"kellyjonbrazil@gmail.com","commit_by_date":"Sun May 15 22:28:12 2022 -0700","stats":{"files_changed":1,"insertions":2,"deletions":2,"files":["docs/parsers/pip_show.md"]},"message":"doc update","epoch":1652678892,"epoch_utc":null}
{"commit":"67a4c6f797dfeaba2ba50222e879bf4fb58678f4","author":"Kelly Brazil","author_email":"kellyjonbrazil@gmail.com","date":"Sun May 15 22:23:00 2022 -0700","commit_by":"Kelly Brazil","commit_by_email":"kellyjonbrazil@gmail.com","commit_by_date":"Sun May 15 22:23:00 2022 -0700","stats":{"files_changed":2,"insertions":4,"deletions":4,"files":["jc/parsers/pip_show.py","tests/fixtures/generic/pip-show-multiline-license.json"]},"message":"add initial \\n to first line of multiline fields","epoch":1652678580,"epoch_utc":null}
...
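Once the log is in JSON Lines form, simple analytics need only a few lines of code. The sketch below totals insertions per author from two records shaped like the output above, trimmed to the fields it uses. Treating a missing "stats" object as zero is an assumption of this example, not documented jc behavior.

```python
import json
from collections import Counter

# Two abbreviated records shaped like the streaming output above
# (only the fields used here are kept).
lines = [
    '{"author": "Kelly Brazil", "stats": {"files_changed": 1, "insertions": 2, "deletions": 2}}',
    '{"author": "Kelly Brazil", "stats": {"files_changed": 2, "insertions": 4, "deletions": 4}}',
]

insertions_by_author = Counter()
for line in lines:
    record = json.loads(line)
    # Assumption: "stats" may be absent when no stat option was passed
    # to git log, so fall back to an empty dict.
    stats = record.get("stats") or {}
    insertions_by_author[record["author"]] += stats.get("insertions", 0)

print(dict(insertions_by_author))  # {'Kelly Brazil': 6}
```

In a real pipeline, the `lines` list could be replaced with `sys.stdin` so that each record is processed as it arrives.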

Of course, other format options, like oneline, short, medium, and full, are supported, as well as --shortstat. Check out the docs for the standard and streaming parsers for all of the supported options.

In the end, I believe it would be better for git to have a built-in JSON output option, but until then, there is jc.

Happy parsing!

Published by kellyjonbrazil
