
Tutorial: Rapid Script Development with Bash, JC, and JQ

I thought it would be fun to show the power of using JSON for rapid development of Bash scripts. When you don’t need to manually parse unstructured command output and common string types, you are free to focus on the code logic, which can really reduce development and troubleshooting cycles.

To that end, I created a toy CLI application that scans the local subnet to see which hosts respond to ICMP requests. This application took less than a half-hour to write (plus some fine-tuning) and I think it demonstrates some cool concepts using the JSON output of several commands, including ifconfig, ping, arp, date, and wc. We use jc to convert the command output to JSON and jq to grab the values we want from that output.

We will also be utilizing the ip-address parser that comes with jc. It acts like a subnet calculator but provides all of the values in a nice JSON schema for easy access with jq.

Yes, this is not as efficient as pinging the broadcast address and checking the local ARP table, but this is arguably more fun!

Here is the Bash script of around 80 lines. Because there is no low-level command output parsing via grep, cut, sed, awk, etc., it is pretty simple to understand, but we’ll go through some of the interesting parts in more detail.

Here is some sample output when using the script:

% ./scansubnet.sh    
Please enter the interface name as command argument.

Options:
lo0
en0

% ./scansubnet.sh wrong-interface                     
ifconfig: interface wrong-interface does not exist

% ./scansubnet.sh lo0
Subnet is too large (16777214 IPs). Exiting.

% ./scansubnet.sh en0
My IP: 192.168.1.221/24
Sending ICMP requests to 254 IPs: 192.168.1.1 - 192.168.1.254
Start Time: 2022-08-29T07:05:40
   13.623 ms   192.168.1.249   f0:ef:86:f6:21:84   camera1.local
  350.634 ms   192.168.1.72    f0:18:98:3:f8:39    laptop1.local
   10.645 ms   192.168.1.243   fc:ae:34:a1:35:82   
  561.997 ms   192.168.1.188   18:3e:ef:d3:3f:82   laptop2.local
   19.775 ms   192.168.1.254   fc:ae:34:a1:3a:80   router.local
                          <snip>
   27.917 ms   192.168.1.197   cc:a7:c1:5e:c3:f1   camera2.local
   28.582 ms   192.168.1.235   56:c8:36:64:2a:8d   camera3.local
   38.199 ms   192.168.1.246   d8:30:62:2e:a6:cf   extender.local
   44.617 ms   192.168.1.242   50:14:79:1e:42:3e   vacuum.local
    5.350 ms   192.168.1.88    c8:d0:83:cd:f4:2d   tv.local
    0.087 ms   192.168.1.221   a4:83:e7:2d:62:4e   laptop3.local
Scanned 192.168.1.0/24 subnet in 27 seconds.
30 alive hosts found.
End Time: 2022-08-29T07:06:07

Note: For best results, use jc version 1.21.1 or higher in this script.
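
Tip: you can quickly check which version of jc is installed, since jc describes itself in JSON (the version value shown here is just illustrative):

% jc -a | jq -r '.version'
1.21.1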

Notice there are some basic checks and features:

  • If you don’t specify an interface it will suggest any interfaces on the system with a configured IPv4 address
  • If you select an interface attached to too large a subnet (say, the 127.0.0.1/8 address of a loopback interface) it will exit
  • It will show the subnet information, including the number of hosts it will scan and the range of IP addresses to be scanned
  • It will log the start and end time and calculate how long it took to complete the scan
  • It will run the pings in the background for parallel processing
  • It will report the round-trip time, IP, MAC address, and name (if known) of the hosts that respond

Let’s take a closer look!

Displaying valid interface options

We could make the user figure out the valid interface names that can be used as the only command argument. Instead, we can use the ifconfig output passed through jc and a simple jq query to print valid options (that is, interfaces with an IPv4 address configured):

if [[ $1 == "" ]]; then
    echo "Please enter the interface name as command argument."
    echo
    echo "Options:"
    # Only show interfaces with an assigned IP address
    jc ifconfig | jq -r '.[] | select(.ipv4_addr != null) | .name'
    exit 1
fi

Once jc converts the ifconfig output to JSON we pipe it to jq so it will filter only items with an ipv4_addr field that is not set to null. Pretty straightforward and easy to read.

Note: We could have gotten the same result by using the ip command with the JSON output option instead of parsing the ifconfig output to JSON via jc, but this allows the script to be more cross-platform with macOS, BSD, etc.

Grab the selected interface IP and subnet mask

Once the user selects a valid interface, let’s grab the IP and Subnet:

interfaceInfo=$(jc ifconfig "$1") || exit 1
ip=$(jq -r '.[0].ipv4_addr' <<<"$interfaceInfo")
mask=$(jq -r '.[0].ipv4_mask' <<<"$interfaceInfo")

Here you can see we parse the ifconfig output to JSON with jc in the first line. If the interface name from the user is invalid, ifconfig prints a helpful message and returns a non-zero exit code, which causes the script to exit via || exit 1. Then we do two variable assignments to pull the IP address and subnet mask via jq queries.

Grab detailed subnet information for the IP/Mask

Next we take the interface IP and mask values and use the IP Address string parser in jc to gather more detailed subnet information that we will use later in the script.

ipInfo=$(jc --ip-address <<<"$ip/$mask")
network=$(jq -r '.network' <<<"$ipInfo")
numHosts=$(jq -r '.hosts' <<<"$ipInfo")
cidrMask=$(jq -r '.cidr_netmask' <<<"$ipInfo")
firstHostIp=$(jq -r '.first_host' <<<"$ipInfo")
lastHostIp=$(jq -r '.last_host' <<<"$ipInfo")
firstHost=$(jq -r '.int.first_host' <<<"$ipInfo")
lastHost=$(jq -r '.int.last_host' <<<"$ipInfo")

The IP Address parser in jc is nice because it acts like a subnet calculator and gives us a lot of data, including the subnet, number of hosts in the subnet, first host, and last host in different formats (decimal, hex, binary, and standard format). This will help us build a simple for loop that will do most of the work. All we need to do is pick the fields we want with jq. Here is an example of all of the information available with the jc IP Address string parser (IPv6 is also supported):

% echo 192.168.1.10/255.255.255.0 | jc --ip-address -p
{
  "version": 4,
  "max_prefix_length": 32,
  "ip": "192.168.1.10",
  "ip_compressed": "192.168.1.10",
  "ip_exploded": "192.168.1.10",
  "scope_id": null,
  "ipv4_mapped": null,
  "six_to_four": null,
  "teredo_client": null,
  "teredo_server": null,
  "dns_ptr": "10.1.168.192.in-addr.arpa",
  "network": "192.168.1.0",
  "broadcast": "192.168.1.255",
  "hostmask": "0.0.0.255",
  "netmask": "255.255.255.0",
  "cidr_netmask": 24,
  "hosts": 254,
  "first_host": "192.168.1.1",
  "last_host": "192.168.1.254",
  "is_multicast": false,
  "is_private": true,
  "is_global": false,
  "is_link_local": false,
  "is_loopback": false,
  "is_reserved": false,
  "is_unspecified": false,
  "int": {
    "ip": 3232235786,
    "network": 3232235776,
    "broadcast": 3232236031,
    "first_host": 3232235777,
    "last_host": 3232236030
  },
  "hex": {
    "ip": "c0:a8:01:0a",
    "network": "c0:a8:01:00",
    "broadcast": "c0:a8:01:ff",
    "hostmask": "00:00:00:ff",
    "netmask": "ff:ff:ff:00",
    "first_host": "c0:a8:01:01",
    "last_host": "c0:a8:01:fe"
  },
  "bin": {
    "ip": "11000000101010000000000100001010",
    "network": "11000000101010000000000100000000",
    "broadcast": "11000000101010000000000111111111",
    "hostmask": "00000000000000000000000011111111",
    "netmask": "11111111111111111111111100000000",
    "first_host": "11000000101010000000000100000001",
    "last_host": "11000000101010000000000111111110"
  }
}

In addition to accepting a CIDR subnet mask, the jc IP Address string parser also accepts a dotted-quad subnet mask (that’s how ifconfig gives it to us) and provides us the CIDR notation for it in the cidr_netmask field. The IP Address string parser is fairly liberal in the IP formats it will accept.
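
For example, we can hand the parser a dotted-quad mask and pull out just the CIDR prefix (the values match the sample output above):

% echo 192.168.1.10/255.255.255.0 | jc --ip-address | jq '.cidr_netmask'
24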

We’ll see how having the first_host and last_host values in decimal makes for easy looping later.

Sanity check the subnet size

We can use the hosts value from the jc IP Address string parser to see if the subnet is a suitable size to scan. If the subnet supports more than 1022 hosts (that is, anything larger than a /22, which has 2^(32-22) - 2 = 1022 usable addresses), we don’t want to bother spinning up that many ping processes in the background for the scan. The following code does that sanity check for us:

if [[ $numHosts -gt 1022 ]]; then
    echo "Subnet is too large ($numHosts IPs). Exiting."
    exit 1
fi

Grab the start time in ISO and Unix Epoch format

Next we want to grab the start time. The date command parser in jc gives us the current time in ISO and Epoch formats that we can easily pull with jq. This allows us to display the time in a nice, standard human readable format and also have the date-time information in an easy-to-use format for calculating the duration later.

startTime=$(jc date)
startTimeIso=$(jq -r '.iso' <<<"$startTime")
startTimeEpoch=$(jq -r '.epoch' <<<"$startTime")

Here are all of the fields available when running the date command through jc:

% jc -p date
{
  "year": 2022,
  "month": "Aug",
  "month_num": 8,
  "day": 28,
  "weekday": "Sun",
  "weekday_num": 7,
  "hour": 1,
  "hour_24": 13,
  "minute": 39,
  "second": 20,
  "period": "PM",
  "timezone": "PDT",
  "utc_offset": null,
  "day_of_year": 240,
  "week_of_year": 34,
  "iso": "2022-08-28T13:39:20",
  "epoch": 1661719160,
  "epoch_utc": null,
  "timezone_aware": false
}

Show the user what is going to happen

Next we use a series of simple echo commands to provide the subnet and time information back to the user before the scan:

echo "My IP: $ip/$cidrMask"
echo "Sending ICMP requests to $numHosts IPs: $firstHostIp - $lastHostIp"
echo "Start Time: $startTimeIso"

The main loop

Now comes the fun part – here is the main loop where we ping every host in the subnet and record the round-trip time, IP, MAC address, and name of each host that responds:

for (( ipDecimal=firstHost; ipDecimal<=lastHost; ipDecimal++ )); do
    # Do each ping in the background for parallel processing
    {
        # grab the packets received and rtt values from the ping output
        thisIp=$(jc --ip-address <<<"$ipDecimal" | jq -r '.ip')
        pingResult=$(ping -c1 "$thisIp" | jc --ping)
        packetsReceived=$(jq -r '.packets_received' <<<"$pingResult")
        rtTime=$(jq -r '.round_trip_ms_max' <<<"$pingResult")

        if [[ $packetsReceived -gt 0 ]]; then
            # Grab the MAC address and name for each alive host from the arp command
            thisIpArpInfo=$(arp -a | jc --arp | jq --arg myip "$thisIp" '.[] | select(.address == $myip)')
            thisIpMac=$(jq -r '.hwaddress // empty' <<<"$thisIpArpInfo")
            thisIpName=$(jq -r '.name // empty' <<<"$thisIpArpInfo")

            printf "%9.3f ms   %-16s%-20s%s\n" "$rtTime" "$thisIp" "$thisIpMac" "$thisIpName" | tee -a "$tempFile"
        fi
    } &
done
wait

Let’s break this down a little bit:

for (( ipDecimal=firstHost; ipDecimal<=lastHost; ipDecimal++ )); do

We use a C-style for loop which allows us to use those decimal versions of the first and last host IP addresses. I told you those decimal values would come in handy!

        thisIp=$(jc --ip-address <<<"$ipDecimal" | jq -r '.ip')
        pingResult=$(ping -c1 "$thisIp" | jc --ping)
        packetsReceived=$(jq -r '.packets_received' <<<"$pingResult")
        rtTime=$(jq -r '.round_trip_ms_max' <<<"$pingResult")

The decimal IP format is nice to loop over, but unfortunately the ping and arp commands do not seem to accept IP addresses in decimal format (at least not on all platforms). Not to worry – we simply send the decimal IP address to the jc IP Address string parser and it will tell us what the IP address is in standard dotted-quad notation.
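
For example, feeding the parser the decimal first-host value from the sample output earlier gives us back the dotted-quad address:

% echo 3232235777 | jc --ip-address | jq -r '.ip'
192.168.1.1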

Then we give ping that IP address and parse its output with jc. We only care about the packets_received and round_trip_ms_max fields, so we assign them to Bash variables.

Next, let’s take a look at the if block:

        if [[ $packetsReceived -gt 0 ]]; then
            # Grab the MAC address and name for each alive host from the arp command
            thisIpArpInfo=$(arp -a | jc --arp | jq --arg myip "$thisIp" '.[] | select(.address == $myip)')
            thisIpMac=$(jq -r '.hwaddress // empty' <<<"$thisIpArpInfo")
            thisIpName=$(jq -r '.name // empty' <<<"$thisIpArpInfo")

            printf "%9.3f ms   %-16s%-20s%s\n" "$rtTime" "$thisIp" "$thisIpMac" "$thisIpName" | tee -a "$tempFile"
        fi

There’s a bit going on in this if block:

  • We only run the commands below if there was an ICMP reply in the ping output
  • Since we got an ICMP reply, we check the ARP table via the arp -a command and filter for the current IP address’ MAC address and name. Having jc parse the arp -a output into JSON allows us to use a simple jq query to accomplish this.
  • Notice the use of the --arg option in jq that allows us to use the $thisIp value in the query.
  • Notice the jq -r '.name // empty' section. Without // empty, jq -r would print the literal string null whenever the name field is null; // empty tells jq to output nothing instead, which leaves the Bash variable empty. (See the short demonstration after this list.)
  • We use the printf command with string format specifications to print our output in nice, even columns.
  • The tee command copies what is printed to the screen and appends it to a temporary file that we will use later.
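
Here is the // empty behavior in isolation, with some toy JSON for illustration:

% echo '{"name": null}' | jq -r '.name // empty'        # prints nothing
% echo '{"name": "router.local"}' | jq -r '.name // empty'
router.local

Finally, let’s look at the tail end of the loop:
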
    } &
done
wait

The & at the end of the Bash command grouping tells Bash to run all of the commands enclosed in the {} braces in the background, so we get parallel processing. The wait command tells Bash to pause until all of the background processes complete.
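
If this pattern is new to you, here is a minimal standalone sketch of the same idea, with a made-up host list:

# run each ping as a background job, then wait for all of them to finish
for host in 192.168.1.1 192.168.1.2 192.168.1.3; do
    { ping -c1 "$host" >/dev/null 2>&1 && echo "$host responded"; } &
done
wait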

Grab the end time

After all of the background ping and arp processes return, we can grab the end time by parsing the date command with jc and returning the iso and epoch values:

endTime=$(jc date)
endTimeIso=$(jq -r '.iso' <<<"$endTime")
endTimeEpoch=$(jq -r '.epoch' <<<"$endTime")
totalTime=$((endTimeEpoch-startTimeEpoch))

Then we subtract the epoch values to get the total run time.

Grab the number of alive hosts

We can run the temporary file through wc to get the number of lines. The wc parser in jc makes it easy to pull the number of lines with a quick jq query. Then we delete the temporary file.

totalAlive=$(jc wc "$tempFile" | jq '.[0].lines')
rm "$tempFile"

Print the summary message

Finally, we print a summary message with the total run-time, subnet information, number of alive hosts, and the human-readable end time:

echo "Scanned $network/$cidrMask subnet in $totalTime seconds."
echo "$totalAlive alive hosts found."
echo "End Time: $endTimeIso"

Conclusion

There you go – that was a pretty fun exercise demonstrating how you can rapidly develop a prototype in Bash using the output of existing commands on the system without needing to manually parse them. By using jc to convert the command output to JSON and jq to query it, the script becomes very easy to understand. It’s nearly self-documenting!

Let me know if you have built any cool scripts or programs with jc and jq!


JC Version 1.21.0 Released

I’m excited to announce the release of jc version 1.21.0, available on GitHub and PyPI. jc now supports over 100 standard and streaming parsers. Thank you to the Open Source community for making this possible!

jc can be installed via pip or through several official OS package repositories, including Debian, Ubuntu, Fedora, openSUSE, Arch Linux, NixOS Linux, Guix System Linux, FreeBSD, and macOS. For more information on how to get jc, see the project README.

To upgrade with pip:

$ pip3 install --upgrade jc

    What’s New

    • New --meta-out or -M option to add metadata to the JSON output, including a UTC timestamp, parser name, magic command, and magic command exit code
    • IP Address string parser
    • Syslog standard and streaming string parsers (RFC 3164 and RFC 5424)
    • CEF standard and streaming string parsers
    • PLIST file parser (XML and binary support)
    • mdadm command parser
    • Add -n support to the traceroute parser
    • Other minor parser fixes

    New Features

    The new --meta-out command option adds a _jc_meta key to the output objects that contains the parser name, a UTC timestamp, and the magic command and exit code information if the magic syntax is used.

    Standard parser output can either be an array of objects (list of dictionaries) or a single object (dictionary). If the output is an array of objects, then each object in the array will have the _jc_meta field added. If the output is a single object, then the _jc_meta field will be added to that single object. In the case of streaming parsers, discrete objects are emitted for each item. Each object will have a _jc_meta field added.

    Here is an example with the ping parser and the magic syntax.

    $ jc --meta-out -p ping -c3 192.168.1.252
    {
      "destination_ip": "192.168.1.252",
      "data_bytes": 56,
      "pattern": null,
      "destination": "192.168.1.252",
      "packets_transmitted": 3,
      "packets_received": 0,
      "packet_loss_percent": 100.0,
      "duplicates": 0,
      "responses": [
        {
          "type": "timeout",
          "icmp_seq": 0,
          "duplicate": false
        },
        {
          "type": "timeout",
          "icmp_seq": 1,
          "duplicate": false
        }
      ],
      "_jc_meta": {
        "parser": "ping",
        "timestamp": 1661128157.294033,
        "magic_command": [
          "ping",
          "-c3",
          "192.168.1.252"
        ],
        "magic_command_exit": 2
      }
    }

    New Parsers

    IP Address string parser

    Support for IPv4 and IPv6 CIDR strings. (Documentation)

    Standard and decimal IP notation is supported. The output includes subnet information in standard, decimal, hex, and binary notation.

    $ echo 192.168.2.10/24 | jc --ip-address -p
    {
      "version": 4,
      "max_prefix_length": 32,
      "ip": "192.168.2.10",
      "ip_compressed": "192.168.2.10",
      "ip_exploded": "192.168.2.10",
      "scope_id": null,
      "ipv4_mapped": null,
      "six_to_four": null,
      "teredo_client": null,
      "teredo_server": null,
      "dns_ptr": "10.2.168.192.in-addr.arpa",
      "network": "192.168.2.0",
      "broadcast": "192.168.2.255",
      "hostmask": "0.0.0.255",
      "netmask": "255.255.255.0",
      "cidr_netmask": 24,
      "hosts": 254,
      "first_host": "192.168.2.1",
      "last_host": "192.168.2.254",
      "is_multicast": false,
      "is_private": true,
      "is_global": false,
      "is_link_local": false,
      "is_loopback": false,
      "is_reserved": false,
      "is_unspecified": false,
      "int": {
        "ip": 3232236042,
        "network": 3232236032,
        "broadcast": 3232236287,
        "first_host": 3232236033,
        "last_host": 3232236286
      },
      "hex": {
        "ip": "c0:a8:02:0a",
        "network": "c0:a8:02:00",
        "broadcast": "c0:a8:02:ff",
        "hostmask": "00:00:00:ff",
        "netmask": "ff:ff:ff:00",
        "first_host": "c0:a8:02:01",
        "last_host": "c0:a8:02:fe"
      },
      "bin": {
        "ip": "11000000101010000000001000001010",
        "network": "11000000101010000000001000000000",
        "broadcast": "11000000101010000000001011111111",
        "hostmask": "00000000000000000000000011111111",
        "netmask": "11111111111111111111111100000000",
        "first_host": "11000000101010000000001000000001",
        "last_host": "11000000101010000000001011111110"
      }
    }
    
    $ echo 127:0:de::1%128/96 | jc --ip-address -p
    {
      "version": 6,
      "max_prefix_length": 128,
      "ip": "127:0:de::1",
      "ip_compressed": "127:0:de::1%128",
      "ip_exploded": "0127:0000:00de:0000:0000:0000:0000:0001",
      "scope_id": "128",
      "ipv4_mapped": null,
      "six_to_four": null,
      "teredo_client": null,
      "teredo_server": null,
      "dns_ptr": "1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.....0.7.2.1.0.ip6.arpa",
      "network": "127:0:de::",
      "broadcast": "127:0:de::ffff:ffff",
      "hostmask": "::ffff:ffff",
      "netmask": "ffff:ffff:ffff:ffff:ffff:ffff::",
      "cidr_netmask": 96,
      "hosts": 4294967294,
      "first_host": "127:0:de::1",
      "last_host": "127:0:de::ffff:fffe",
      "is_multicast": false,
      "is_private": false,
      "is_global": true,
      "is_link_local": false,
      "is_loopback": false,
      "is_reserved": true,
      "is_unspecified": false,
      "int": {
        "ip": 1531727573536155682370944093904699393,
        "network": 1531727573536155682370944093904699392,
        "broadcast": 1531727573536155682370944098199666687,
        "first_host": 1531727573536155682370944093904699393,
        "last_host": 1531727573536155682370944098199666686
      },
      "hex": {
        "ip": "01:27:00:00:00:de:00:00:00:00:00:00:00:00:00:01",
        "network": "01:27:00:00:00:de:00:00:00:00:00:00:00:00:00:00",
        "broadcast": "01:27:00:00:00:de:00:00:00:00:00:00:ff:ff:ff:ff",
        "hostmask": "00:00:00:00:00:00:00:00:00:00:00:00:ff:ff:ff:ff",
        "netmask": "ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:00:00:00:00",
        "first_host": "01:27:00:00:00:de:00:00:00:00:00:00:00:00:00:01",
        "last_host": "01:27:00:00:00:de:00:00:00:00:00:00:ff:ff:ff:fe"
      },
      "bin": {
        "ip": "000000010010011100000000000000000000000011011110000000...",
        "network": "0000000100100111000000000000000000000000110111100...",
        "broadcast": "00000001001001110000000000000000000000001101111...",
        "hostmask": "000000000000000000000000000000000000000000000000...",
        "netmask": "1111111111111111111111111111111111111111111111111...",
        "first_host": "0000000100100111000000000000000000000000110111...",
        "last_host": "00000001001001110000000000000000000000001101111..."
      }
    }

    Syslog string parser (RFC 5424)

    Support for RFC 5424 Syslog strings. Multiple syslog strings separated by newline characters are supported. (Documentation)

    $ echo "<165>1 2003-08-24T05:14:15.000003-07:00 192.0.2.1 myproc 8710 - - %% It's time to make the do-nuts." | jc --syslog -p
    [
      {
        "priority": 165,
        "version": 1,
        "timestamp": "2003-08-24T05:14:15.000003-07:00",
        "hostname": "192.0.2.1",
        "appname": "myproc",
        "proc_id": 8710,
        "msg_id": null,
        "structured_data": null,
        "message": "%% It's time to make the do-nuts.",
        "timestamp_epoch": 1061727255,
        "timestamp_epoch_utc": null
      }
    ]

    Syslog string streaming parser (RFC 5424)

    Support for RFC 5424 Syslog strings. Multiple syslog strings separated by newline characters are supported. This is a streaming parser and it outputs JSON Lines. (Documentation)

    $ cat syslog.txt | jc --syslog-s -p
    {"priority":165,"version":1,"timestamp":"2003-08-24T05:14:15.000003-...}
    {"priority":165,"version":1,"timestamp":"2003-08-24T05:14:16.000003-...}
    ...

    Syslog string parser (BSD-style RFC 3164)

    Support for RFC 3164 Syslog strings. Multiple syslog strings separated by newline characters are supported. (Documentation)

    $ echo "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8" | jc --syslog-bsd -p
    [
      {
        "priority": 34,
        "date": "Oct 11 22:14:15",
        "hostname": "mymachine",
        "tag": "su",
        "content": "'su root' failed for lonvick on /dev/pts/8"
      }
    ]

    Syslog string streaming parser (BSD-style RFC 3164)

    Support for RFC 3164 Syslog strings. Multiple syslog strings separated by newline characters are supported. This is a streaming parser and it outputs JSON Lines. (Documentation)

    $ cat syslog.txt | jc --syslog-bsd-s -p
    {"priority":34,"date":"Oct 11 22:14:15","hostname":"mymachine","t...}
    {"priority":34,"date":"Oct 11 22:14:16","hostname":"mymachine","t...}
    ...

    CEF string parser

    Support for standard CEF log lines as documented in the Microfocus Arcsight CEF specification. (Documentation)

    $ cat cef.log | jc --cef -p
    [
      {
        "deviceVendor": "Trend Micro",
        "deviceProduct": "Deep Security Agent",
        "deviceVersion": "<DSA version>",
        "deviceEventClassId": "4000000",
        "name": "Eicar_test_file",
        "agentSeverity": 6,
        "CEFVersion": 0,
        "dvchost": "hostname",
        "string": "hello \"world\"!",
        "start": "Nov 08 2020 12:30:00.111 UTC",
        "start_epoch": 1604867400,
        "start_epoch_utc": 1604838600,
        "Host_ID": 1,
        "Quarantine": 205,
        "myDate": "Nov 08 2022 12:30:00.111",
        "myDate_epoch": 1667939400,
        "myDate_epoch_utc": null,
        "myFloat": 3.14,
        "deviceEventClassIdNum": 4000000,
        "agentSeverityString": "Medium",
        "agentSeverityNum": 6
      }
    ]

    CEF string streaming parser

    Support for standard CEF log lines as documented in the Microfocus Arcsight CEF specification. This is a streaming parser and it outputs JSON Lines. (Documentation)

    $ cat cef.log | jc --cef-s
    {"deviceVendor":"Fortinet","deviceProduct":"FortiDeceptor","deviceV...}
    {"deviceVendor":"Trend Micro","deviceProduct":"Deep Security Agent"...}
    ...

    PLIST file parser

    Support for binary and XML PLIST files. (Documentation)

    $ cat info.plist | jc --plist -p
    {
      "NSAppleScriptEnabled": true,
      "LSMultipleInstancesProhibited": true,
      "CFBundleInfoDictionaryVersion": "6.0",
      "DTPlatformVersion": "GM",
      "CFBundleIconFile": "GarageBand.icns",
      "CFBundleName": "GarageBand",
      "DTSDKName": "macosx10.13internal",
      "NSSupportsAutomaticGraphicsSwitching": true,
      "RevisionDate": "2018-12-03_14:10:56",
      "UTImportedTypeDeclarations": [
        {
          "UTTypeConformsTo": [
            "public.data",
            "public.content"
      ...
    }

    mdadm command parser

    Linux support for mdadm command output. The --examine and --query options are supported. (Documentation)

    $ mdadm --query --detail /dev/md0 | jc --mdadm -p
    {
      "device": "/dev/md0",
      "version": "1.1",
      "creation_time": "Tue Apr 13 23:22:16 2010",
      "raid_level": "raid1",
      "array_size": "5860520828 (5.46 TiB 6.00 TB)",
      "used_dev_size": "5860520828 (5.46 TiB 6.00 TB)",
      "raid_devices": 2,
      "total_devices": 2,
      "persistence": "Superblock is persistent",
      "intent_bitmap": "Internal",
      "update_time": "Tue Jul 26 20:16:31 2022",
      "state": "clean",
      "active_devices": 2,
      "working_devices": 2,
      "failed_devices": 0,
      "spare_devices": 0,
      "consistency_policy": "bitmap",
      "name": "virttest:0",
      "uuid": "85c5b164:d58a5ada:14f5fe07:d642e843",
      "events": 2193679,
      "device_table": [
        {
          "number": 3,
          "major": 8,
          "minor": 17,
          "state": [
            "active",
            "sync"
          ],
          "device": "/dev/sdb1",
          "raid_device": 0
        },
        {
          "number": 2,
          "major": 8,
          "minor": 33,
          "state": [
            "active",
            "sync"
          ],
          "device": "/dev/sdc1",
          "raid_device": 1
        }
      ],
      "array_size_num": 5860520828,
      "used_dev_size_num": 5860520828,
      "name_val": "virttest:0",
      "uuid_val": "85c5b164:d58a5ada:14f5fe07:d642e843",
      "state_list": [
        "clean"
      ],
      "creation_time_epoch": 1271226136,
      "update_time_epoch": 1658891791
    }

    Happy parsing!


    Convert X.509 Certificates to JSON with JC

    There are some cool hacks out there that will help you extract X.509 certificate metadata to JSON values. Since jc converts so many other things to JSON, I figured it would make sense to add this functionality. I wanted to make sure jc could handle both binary and text-encoded certificates of most any type, well-known and user-defined extensions, and also ensure the output was convenient for use in scripts.

    At first, I considered parsing -text output from openssl. It would not have been too hard to do – except for finding a way to reliably parse unknown certificate extensions. Ultimately I wanted to not only support openssl output, but also native certificate file formats so you could pipe the certificate file directly to jc like: cat certificate.crt | jc --x509-cert.

    If I had gone the original route I would have needed two parsers: one openssl parser and another X.509 certificate parser – maybe even multiple parsers for different certificate formats.

    I started building an X.509 certificate file parser that supports DER and PEM-encoded certificates first. Serendipitously, I found that this method provides all of the desired functionality in a single parser! This method supports:

    • Most any binary certificate format (DER, PKCS #7, PKCS #12, etc.)
    • PEM-encoded certificates
    • openssl command output (and any other command that can output DER and PEM)
      • Allows conversion of password-protected certificate files to JSON
      • Allows conversion of most any certificate format to JSON
    • Well-known and user-defined X.509 certificate extensions
    • Certificate files with multiple certificates bundled
    • Convenience fields (e.g. dates in timestamp format as well as ISO format)

    Converting DER Certificate Files to JSON

    The most basic (but not necessarily the most popular) X.509 certificate file is simply a binary DER-format certificate. Certificate file extensions are arbitrary, so there is no guarantee you have a simple DER-encoded certificate just by looking at the extension. But typical file extensions for DER-encoded certificates are .der, .cer, .crt, etc.

    jc can natively convert DER-encoded certificate files to JSON:

    $ cat certificate.crt | jc --x509-cert

    Converting PEM Certificate Files to JSON

    PEM-encoded certificate files are essentially just base64-encoded DER certificates and often have a .pem file extension. But, again, certificate file extensions are arbitrary, so a valid PEM file could also have a .crt, .cer, or any other extension.

    PEM files can also contain more than one certificate. For instance, there might be a certificate chain with the web server certificate and one or more intermediate certificates encoded in the file. Also, a PEM file can include other objects like private keys. Don’t worry – jc can handle multiple certificates and will ignore anything but certificates in the JSON output.

    jc can natively convert PEM-encoded certificates to JSON:

    $ cat certificate.pem | jc --x509-cert

    Converting PKCS #7 Certificate Files to JSON

    PKCS #7 certificates will typically have a .p7b or .p7c file extension and can be either binary DER-encoded or text PEM-encoded.

    jc will not natively convert PKCS #7 certificate files to JSON, but don’t worry! You can easily convert the PKCS #7 file to vanilla X.509 DER or PEM with openssl so jc can convert it to JSON:

    $ openssl pkcs7 \
              -in certificate.p7b \
              -inform der \
              -print_certs | jc --x509-cert

    Note that the -inform argument is not needed if the PKCS #7 file is PEM encoded.
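
    In that case the command is simply the same as above, minus the -inform argument:

    $ openssl pkcs7 \
              -in certificate.p7b \
              -print_certs | jc --x509-cert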

    Converting PKCS #12 Certificate Files to JSON

    PKCS #12 files are a password-protected binary format that can contain certificates, private keys, and other objects. You will typically see a .pfx or .p12 extension on these files.

    jc will not natively convert binary PKCS #12 certificate files to JSON, but don’t worry! You can easily convert the PKCS #12 file to vanilla X.509 DER or PEM with openssl so jc can convert it to JSON:

    $ openssl pkcs12 \
              -info \
              -in certificate.pfx \
              -passin pass:abc123 \
              -passout pass: | jc --x509-cert

    Note that you need to specify the certificate file password with the -passin parameter. You can set any password with the -passout parameter so you won’t be prompted for one when the command is run. In this example we set it to blank.

    Using in a Script

    Let’s put all of the pieces together and show how you can use JSON output in a script.

    No matter the certificate type, the JSON output will be consistent. The schema can be found in the jc documentation for the X.509 certificate parser. Here is an example of a Certificate Authority certificate converted from a PKCS #12 file:

    [
      {
        "tbs_certificate": {
          "version": "v3",
          "serial_number": "e1:3f:bc:97:7c:10:1d:b8",
          "signature": {
            "algorithm": "sha1_rsa",
            "parameters": null
          },
          "issuer": {
            "country_name": "FR",
            "state_or_province_name": "Alsace",
            "locality_name": "Strasbourg",
            "organization_name": "www.freelan.org",
            "organizational_unit_name": "freelan",
            "common_name": "Freelan Sample Certificate Authority",
            "email_address": "contact@freelan.org"
          },
          "validity": {
            "not_before": 1335521864,
            "not_after": 1338113864,
            "not_before_iso": "2012-04-27T10:17:44+00:00",
            "not_after_iso": "2012-05-27T10:17:44+00:00"
          },
          "subject": {
            "country_name": "FR",
            "state_or_province_name": "Alsace",
            "locality_name": "Strasbourg",
            "organization_name": "www.freelan.org",
            "organizational_unit_name": "freelan",
            "common_name": "Freelan Sample Certificate Authority",
            "email_address": "contact@freelan.org"
          },
          "subject_public_key_info": {
            "algorithm": {
              "algorithm": "rsa",
              "parameters": null
            },
            "public_key": {
              "modulus": "e0:e9:fb:ca:10:70:af:8c:4e:e5:8f:65:5c:49:65:1e:f9:a5:a2:b8:cd:c5:27:82:ea:58:5d:64:86:58:55:cf:4d:5e:ef:b2:c1:64:ea:f2:27:78:f0:2b:4c:bf:93:...",
              "public_exponent": 65537
            }
          },
          "issuer_unique_id": null,
          "subject_unique_id": null,
          "extensions": [
            {
              "extn_id": "key_identifier",
              "critical": false,
              "extn_value": "23:6c:2d:3d:3e:29:5d:78:b8:6c:3e:aa:e2:bb:2e:1e:6c:87:f2:53"
            },
            {
              "extn_id": "authority_key_identifier",
              "critical": false,
              "extn_value": {
                "key_identifier": "23:6c:2d:3d:3e:29:5d:78:b8:6c:3e:aa:e2:bb:2e:1e:6c:87:f2:53",
                "authority_cert_issuer": null,
                "authority_cert_serial_number": null
              }
            },
            {
              "extn_id": "basic_constraints",
              "critical": false,
              "extn_value": {
                "ca": true,
                "path_len_constraint": null
              }
            }
          ]
        },
        "signature_algorithm": {
          "algorithm": "sha1_rsa",
          "parameters": null
        },
        "signature_value": "b0:44:9a:49:0a:0a:7b:4b:e9:3d:05:3e:97:de:40:5e:7e:89:c4:10:e6:2d:c9:65:c1:3e:9b:b2:1b:74:25:9b:5a:dd:85:ce:ba:0c:21:85:a2:b0:e6:4f:18:cc:98:..."
      }
    ]

    Note: jc does not verify the integrity of the certificate, which requires calculating the hash of the certificate body and comparing it to the hash in the certificate’s signature after it (the hash) is decrypted with the issuer certificate’s public key.

    Notice the first (and only) certificate in this JSON array has a tbs_certificate.validity object that contains not_before and not_after values in both epoch timestamp and ISO formats. This should make it easy for us to check whether the certificate is valid in a Bash script using a JSON parser like jq:

    #!/bin/bash
    
    # grab the validity information from the first certificate in the pkcs12 file
    cert_json=$(
        openssl pkcs12 \
            -info \
            -in certificate.pfx \
            -passin pass:abc123 \
            -passout pass: | jc --x509-cert
    )
    
    not_before=$(
        echo "$cert_json" | jq .[0].tbs_certificate.validity.not_before
    )
    
    not_after=$(
        echo "$cert_json" | jq .[0].tbs_certificate.validity.not_after
    )
    
    # compare the timestamps to the current time
    current_time=$(date '+%s')
    
    if [[ "$not_before" -lt "$current_time" ]] && [[ "$not_after" -gt "$current_time" ]]; then
        echo "Certificate is valid"
    else
        echo "Certificate is invalid"
    fi

    And here is the output for an expired certificate (the STDERR and STDOUT lines have been labeled):

    $ ./checkcert.sh
    MAC Iteration 2048                                                    # STDERR
    MAC verified OK                                                       # STDERR
    PKCS7 Encrypted data: pbeWithSHA1And40BitRC2-CBC, Iteration 2048      # STDERR
    Certificate bag                                                       # STDERR
    PKCS7 Data                                                            # STDERR
    Shrouded Keybag: pbeWithSHA1And3-KeyTripleDES-CBC, Iteration 2048     # STDERR
    Certificate is invalid                                                # STDOUT
    

    There are a lot more things to check than just the not_before and not_after fields for a true certificate validation, so this should be considered a toy example to get you started. I hope this new jc X.509 certificate parser helps you in your automation scripts!


    JC Version 1.20.0 Released

    I’m excited to announce the release of jc version 1.20.0, available on GitHub and PyPI. jc now supports over 100 standard and streaming parsers. Thank you to the Open Source community for making this possible!

    jc can be installed via pip or through several official OS package repositories, including Debian, Ubuntu, Fedora, openSUSE, Arch Linux, NixOS Linux, Guix System Linux, FreeBSD, and macOS. For more information on how to get jc, see the project README.

    To upgrade with pip:

    $ pip3 install --upgrade jc

      What’s New

      • Add YAML output option with the -y option
      • Add top -b standard and streaming parsers tested on linux
      • Add plugin_parser_count, standard_parser_count, and streaming_parser_count keys to jc -a output
      • Add is_compatible function to the utils module
      • Fix pip-show parser for packages with a multi-line license field
      • Fix ASCII Table parser for cases where centered headers cause mis-aligned fields

      New Parsers

      top -b command parser

      Support for the top -b command. (Documentation)

      $ top -b -n 3 | jc --top -p
      [
        {
          "time": "11:20:43",
          "uptime": 118,
          "users": 2,
          "load_1m": 0.0,
          "load_5m": 0.01,
          "load_15m": 0.05,
          "tasks_total": 108,
          "tasks_running": 2,
          "tasks_sleeping": 106,
          "tasks_stopped": 0,
          "tasks_zombie": 0,
          "cpu_user": 5.6,
          "cpu_sys": 11.1,
          "cpu_nice": 0.0,
          "cpu_idle": 83.3,
          "cpu_wait": 0.0,
          "cpu_hardware": 0.0,
          "cpu_software": 0.0,
          "cpu_steal": 0.0,
          "mem_total": 3.7,
          "mem_free": 3.3,
          "mem_used": 0.2,
          "mem_buff_cache": 0.2,
          "swap_total": 2.0,
          "swap_free": 2.0,
          "swap_used": 0.0,
          "mem_available": 3.3,
          "processes": [
            {
              "pid": 2225,
              "user": "kbrazil",
              "priority": 20,
              "nice": 0,
              "virtual_mem": 158.1,
              "resident_mem": 2.2,
              "shared_mem": 1.6,
              "status": "running",
              "percent_cpu": 12.5,
              "percent_mem": 0.1,
              "time_hundredths": "0:00.02",
              "command": "top",
              "parent_pid": 1884,
              "uid": 1000,
              "real_uid": 1000,
              "real_user": "kbrazil",
              "saved_uid": 1000,
              "saved_user": "kbrazil",
              "gid": 1000,
              "group": "kbrazil",
              "pgrp": 2225,
              "tty": "pts/0",
              "tty_process_gid": 2225,
              "session_id": 1884,
              "thread_count": 1,
              "last_used_processor": 0,
              "time": "0:00",
              "swap": 0.0,
              "code": 0.1,
              "data": 1.0,
              "major_page_fault_count": 0,
              "minor_page_fault_count": 736,
              "dirty_pages_count": 0,
              "sleeping_in_function": null,
              "flags": "..4.2...",
              "cgroups": "1:name=systemd:/user.slice/user-1000.+",
              "supplementary_gids": [
                10,
                1000
              ],
              "supplementary_groups": [
                "wheel",
                "kbrazil"
              ],
              "thread_gid": 2225,
              "environment_variables": [
                "XDG_SESSION_ID=2",
                "HOSTNAME=localhost"
              ],
              "major_page_fault_count_delta": 0,
              "minor_page_fault_count_delta": 4,
              "used": 2.2,
              "ipc_namespace_inode": 4026531839,
              "mount_namespace_inode": 4026531840,
              "net_namespace_inode": 4026531956,
              "pid_namespace_inode": 4026531836,
              "user_namespace_inode": 4026531837,
              "nts_namespace_inode": 4026531838
            },
            ...
          ]
        }
      ]

      top -b command streaming parser

      Support for the top -b command. This is a streaming parser and it outputs JSON Lines. (Documentation):

      $ top -b | jc --top-s
      {"time":"11:24:50","uptime":2,"users":2,"load_1m":0.23,"load_5m":...}
      ...

      v1.20.1 Updates

      • Add postconf -M parser tested on linux
      • Update asciitable and asciitable-m parsers to preserve case in key names when using the -r or raw=True options.
      • Add long options (e.g. --help, --about, --pretty, etc.)
      • Add shell completions for Bash and Zsh
      • Fix id parser for cases where the user or group name is not present

      postconf -M command parser

      Linux support for the postconf -M command. (Documentation):

      $ postconf -M | jc --postconf -p          # or jc -p postconf -M
      [
        {
          "service_name": "smtp",
          "service_type": "inet",
          "private": false,
          "unprivileged": null,
          "chroot": true,
          "wake_up_time": null,
          "process_limit": null,
          "command": "smtpd",
          "no_wake_up_before_first_use": null
        },
        {
          "service_name": "pickup",
          "service_type": "unix",
          "private": false,
          "unprivileged": null,
          "chroot": true,
          "wake_up_time": 60,
          "process_limit": 1,
          "command": "pickup",
          "no_wake_up_before_first_use": false
        }
      ]

      Long Options

      jc now supports long CLI options:

      Options:
          -a,  --about        about jc
          -C,  --force-color  force color output even when using pipes (overrides -m)
          -d,  --debug        debug (double for verbose debug)
          -h,  --help         help (--help --parser_name for parser documentation)
          -m,  --monochrome   monochrome output
          -p,  --pretty       pretty print output
          -q,  --quiet        suppress warnings (double to ignore streaming errors)
          -r,  --raw          raw output
          -u,  --unbuffer     unbuffer output
          -v,  --version      version info
          -y,  --yaml-out     YAML output
          -B,  --bash-comp    gen Bash completion: jc -B > /etc/bash_completion.d/jc
          -Z,  --zsh-comp     gen Zsh completion: jc -Z > "${fpath[1]}/_jc"
      

      Shell Completions

      Bash and Zsh completions are now available for jc! If your system is already set up for completions you can run the following to enable completions:

      Bash

      Linux
      $ jc -B > /etc/bash_completion.d/jc
      
      macOS
      $ jc -B > /usr/local/etc/bash_completion.d/jc

      Zsh

      Linux and macOS
      $ jc -Z > "${fpath[1]}/_jc"

      v1.20.2 Updates

      • Add gpg --with-colons command parser tested on linux
      • Add DER and PEM encoded X.509 Certificate parser
      • Add Bash and Zsh completion scripts to DEB and RPM packages

      gpg --with-colons command parser

      Linux support for the gpg --with-colons command. (Documentation):

      $ gpg --with-colons --show-keys file.gpg | jc --gpg -p
      [
        {
          "type": "pub",
          "validity": "f",
          "key_length": "1024",
          "pub_key_alg": "17",
          "key_id": "6C7EE1B8621CC013",
          "creation_date": "899817715",
          "expiration_date": "1055898235",
          "certsn_uidhash_trustinfo": null,
          "owner_trust": "m",
          "user_id": null,
          "signature_class": null,
          "key_capabilities": "scESC",
      ...

      X.509 DER/PEM Certificate Files

      Support for DER and PEM encoded certificate files (Documentation):

      $ cat alice.crt | jc --x509-cert -p
      [
        {
          "tbs_certificate": {
            "version": "v3",
            "serial_number": "01",
            "signature": {
              "algorithm": "sha1_rsa",
              "parameters": null
            },
            "issuer": {
              "country_name": "FR",
              "state_or_province_name": "Alsace",
              "locality_name": "Strasbourg",
              "organization_name": "www.freelan.org",
              "organizational_unit_name": "freelan",
              "common_name": "Freelan Sample Certificate Authority",
              "email_address": "contact@freelan.org"
            },
            "validity": {
              "not_before": 1335522678,
              "not_after": 1650882678,
              "not_before_iso": "2012-04-27T10:31:18+00:00",
              "not_after_iso": "2022-04-25T10:31:18+00:00"
            },
            "subject": {
              "country_name": "FR",
              "state_or_province_name": "Alsace",
              "organization_name": "www.freelan.org",
              "organizational_unit_name": "freelan",
              "common_name": "alice",
              "email_address": "contact@freelan.org"
            },
            "subject_public_key_info": {
              "algorithm": {
                "algorithm": "rsa",
                "parameters": null
              },
              "public_key": {
                "modulus": "dd:6d:bd:f8:80:fa:d7:de:1b:1f:a7:a3:2e:b2:02:e2:16:f6:52:0a:3c:bf:a6:42:f8:ca:dc:93:67:4d:60:c3:4f:8d:c3:8a:00:1b:f1:c4:4b:41:6a:69:d2:69:e5:3f:21:8e:c5:0b:f8:22:37:ad:b6:2c:4b:55:ff:7a:03:72:bb:9a:d3:ec:96:b9:56:9f:cb:19:99:c9:32:94:6f:8f:c6:52:06:9f:45:03:df:fd:e8:97:f6:ea:d6:ba:bb:48:2b:b5:e0:34:61:4d:52:36:0f:ab:87:52:25:03:cf:87:00:87:13:f2:ca:03:29:16:9d:90:57:46:b5:f4:0e:ae:17:c8:0a:4d:92:ed:08:a6:32:23:11:71:fe:f2:2c:44:d7:6c:07:f3:0b:7b:0c:4b:dd:3b:b4:f7:37:70:9f:51:b6:88:4e:5d:6a:05:7f:8d:9b:66:7a:ab:80:20:fe:ee:6b:97:c3:49:7d:78:3b:d5:99:97:03:75:ce:8f:bc:c5:be:9c:9a:a5:12:19:70:f9:a4:bd:96:27:ed:23:02:a7:c7:57:c9:71:cf:76:94:a2:21:62:f6:b8:1d:ca:88:ee:09:ad:46:2f:b7:61:b3:2c:15:13:86:9f:a5:35:26:5a:67:f4:37:c8:e6:80:01:49:0e:c7:ed:61:d3:cd:bc:e4:f8:be:3f:c9:4e:f8:7d:97:89:ce:12:bc:ca:b5:c6:d2:e0:d9:b3:68:3c:2e:4a:9d:b4:5f:b8:53:ee:50:3d:bf:dd:d4:a2:8a:b6:a0:27:ab:98:0c:b3:b2:58:90:e2:bc:a1:ad:ff:bd:8e:55:31:0f:00:bf:68:e9:3d:a9:19:9a:f0:6d:0b:a2:14:6a:c6:4c:c6:4e:bd:63:12:a5:0b:4d:97:eb:42:09:79:53:e2:65:aa:24:34:70:b8:c1:ab:23:80:e7:9c:6c:ed:dc:82:aa:37:04:b8:43:2a:3d:2a:a8:cc:20:fc:27:5d:90:26:58:f9:b7:14:e2:9e:e2:c1:70:73:97:e9:6b:02:8e:d3:52:59:7b:00:ec:61:30:f1:56:3f:9c:c1:7c:05:c5:b1:36:c8:18:85:cf:61:40:1f:07:e8:a7:06:87:df:9a:77:0b:a9:64:72:03:f6:93:fc:e0:02:59:c1:96:ec:c0:09:42:3e:30:a2:7f:1b:48:2f:fe:e0:21:8f:53:87:25:0d:cb:ea:49:f5:4a:9b:d0:e3:5f:ee:78:18:e5:ba:71:31:a9:04:98:0f:b1:ad:67:52:a0:f2:e3:9c:ab:6a:fe:58:84:84:dd:07:3d:32:94:05:16:45:15:96:59:a0:58:6c:18:0e:e3:77:66:c7:b3:f7:99",
                "public_exponent": 65537
              }
            },
            "issuer_unique_id": null,
            "subject_unique_id": null,
            "extensions": [
              {
                "extn_id": "basic_constraints",
                "critical": false,
                "extn_value": {
                  "ca": false,
                  "path_len_constraint": null
                }
              },
              {
                "extn_id": "2.16.840.1.113730.1.13",
                "critical": false,
                "extn_value": "16:1d:4f:70:65:6e:53:53:4c:20:47:65:6e:65:72:61:74:65:64:20:43:65:72:74:69:66:69:63:61:74:65"
              },
              {
                "extn_id": "key_identifier",
                "critical": false,
                "extn_value": "59:5f:c9:13:ba:1b:cc:b9:a8:41:4a:8a:49:79:6a:36:f6:7d:3e:d7"
              },
              {
                "extn_id": "authority_key_identifier",
                "critical": false,
                "extn_value": {
                  "key_identifier": "23:6c:2d:3d:3e:29:5d:78:b8:6c:3e:aa:e2:bb:2e:1e:6c:87:f2:53",
                  "authority_cert_issuer": null,
                  "authority_cert_serial_number": null
                }
              }
            ]
          },
          "signature_algorithm": {
            "algorithm": "sha1_rsa",
            "parameters": null
          },
          "signature_value": "13:e7:02:45:3e:a7:ab:bd:b8:da:e7:ef:74:88:ac:62:d5:dd:10:56:d5:46:07:ec:fa:6a:80:0c:b9:62:be:aa:08:b4:be:0b:eb:9a:ef:68:b7:69:6f:4d:20:92:9d:18:63:7a:23:f4:48:87:6a:14:c3:91:98:1b:4e:08:59:3f:91:80:e9:f4:cf:fd:d5:bf:af:4b:e4:bd:78:09:71:ac:d0:81:e5:53:9f:3e:ac:44:3e:9f:f0:bf:5a:c1:70:4e:06:04:ef:dc:e8:77:05:a2:7d:c5:fa:80:58:0a:c5:10:6d:90:ca:49:26:71:84:39:b7:9a:3e:e9:6f:ae:c5:35:b6:5b:24:8c:c9:ef:41:c3:b1:17:b6:3b:4e:28:89:3c:7e:87:a8:3a:a5:6d:dc:39:03:20:20:0b:c5:80:a3:79:13:1e:f6:ec:ae:36:df:40:74:34:87:46:93:3b:a3:e0:a4:8c:2f:43:4c:b2:54:80:71:76:78:d4:ea:12:28:d8:f2:e3:80:55:11:9b:f4:65:dc:53:0e:b4:4c:e0:4c:09:b4:dc:a0:80:5c:e6:b5:3b:95:d3:69:e4:52:3d:5b:61:86:02:e5:fd:0b:00:3a:fa:b3:45:cc:c9:a3:64:f2:dc:25:59:89:58:0d:9e:6e:28:3a:55:45:50:5f:88:67:2a:d2:e2:48:cc:8b:de:9a:1b:93:ae:87:e1:f2:90:50:40:d9:0f:44:31:53:46:ad:62:4e:8d:48:86:19:77:fc:59:75:91:79:35:59:1d:e3:4e:33:5b:e2:31:d7:ee:52:28:5f:0a:70:a7:be:bb:1c:03:ca:1a:18:d0:f5:c1:5b:9c:73:04:b6:4a:e8:46:52:58:76:d4:6a:e6:67:1c:0e:dc:13:d0:61:72:a0:92:cb:05:97:47:1c:c1:c9:cf:41:7d:1f:b1:4d:93:6b:53:41:03:21:2b:93:15:63:08:3e:2c:86:9e:7b:9f:3a:09:05:6a:7d:bb:1c:a7:b7:af:96:08:cb:5b:df:07:fb:9c:f2:95:11:c0:82:81:f6:1b:bf:5a:1e:58:cd:28:ca:7d:04:eb:aa:e9:29:c4:82:51:2c:89:61:95:b6:ed:a5:86:7c:7c:48:1d:ec:54:96:47:79:ea:fc:7f:f5:10:43:0a:9b:00:ef:8a:77:2e:f4:36:66:d2:6a:a6:95:b6:9f:23:3b:12:e2:89:d5:a4:c1:2c:91:4e:cb:94:e8:3f:22:0e:21:f9:b8:4a:81:5c:4c:63:ae:3d:05:b2:5c:5c:54:a7:55:8f:98:25:55:c4:a6:90:bc:19:29:b1:14:d4:e2:b0:95:e4:ff:89:71:61:be:8a:16:85"
        }
      ]

      v1.20.4 Updates

      • Add URL string parser
      • Add Email Address string parser
      • Add JWT string parser
      • Add ISO 8601 Datetime string parser
      • Add UNIX Epoch Timestamp string parser
      • Add M3U/M3U8 file parser
      • Add pager functionality to help (parser documentation only)
      • Minor parser performance optimizations

      jc version 1.20.4 includes a few new string parsers that can be very useful in scripts.

      The url string parser not only allows you to pull out the specific parts of the URL you are interested in (e.g. path, query, hostname, etc.) but it also provides encoded and decoded versions of all of those values.

      Similarly, the Email Address string parser allows you to quickly parse out the username and domain, even if Gmail “plus” addressing is used. The parser also allows you to separate out the username from the “plus” suffix.

      JWT strings can now be parsed into their constituent Header, Payload, and Signature parts. The Payload is presented as a standard object.

      And parsing time strings, including ISO 8601 Datetimes and Unix timestamps, just got easier. Both new parsers provide you detailed date information that you can use in your scripts. These parsers are a nice complement to the existing date command parser.

      Finally an M3U/M3U8 parser is included for media playlists. It includes the ability to parse extended information and, since these files are not usually well maintained, the parser fails gracefully for unparsable lines.

      Other minor improvements include more/less paging when accessing parser documentation at the command line via jc --help --parser_name.

      For more details on each of the new parsers, see below.

      URL string parser

      This parser outputs Normalized, Encoded, and Decoded versions of the URL and all of the URL parts. (Documentation)

      This allows you to pull specific information from the URL, including the scheme, netloc, user, password, hostname, port, path, path list, query, and fragment all three ways. For example, the following URL could be decoded:

      $ echo 'http://%D0%BE%D0%B1%D0%BD%D0%BE%D0%B2%D0%BB%D0%B5%D0%BD%D0%B8%D0%B5%D0%BF%D0%BE%D0%B3%D0%BE%D0%B4%D1%8B.%72%75:%38%30' | jc --url | jq .decoded.hostname
      "обновлениепогоды.ru"

      You can easily grab the path string and a path list:

      $ echo 'https://example.com/this/is/a/path' | jc --url | jq .path
      "/this/is/a/path"
      
      $ echo 'https://example.com/this/is/a/path' | jc --url | jq .path_list
      [
        "this",
        "is",
        "a",
        "path"
      ]

      Or even the query string and object:

      $ echo 'https://example.com?user=joe&selections=gardening&selections=plumbing' | jc --url | jq .query
      "user=joe&selections=gardening&selections=plumbing"
      
      $ echo 'https://example.com?user=joe&selections=gardening&selections=plumbing' | jc --url | jq .query_obj
      {
        "user": [
          "joe"
        ],
        "selections": [
          "gardening",
          "plumbing"
        ]
      }

      There are many other use cases that the url parser can help with. Here is a full example of the output:

      $ echo 'https://www.example.com:443/mypath?q1=foo&q2=bar#heading-1' | jc --url -p
      {
        "url": "https://www.example.com:443/mypath?q1=foo&q2=bar#heading-1",
        "scheme": "https",
        "netloc": "www.example.com:443",
        "path": "/mypath",
        "path_list": [
          "mypath"
        ],
        "query": "q1=foo&q2=bar",
        "query_obj": {
          "q1": [
            "foo"
          ],
          "q2": [
            "bar"
          ]
        },
        "fragment": "heading-1",
        "username": null,
        "password": null,
        "hostname": "www.example.com",
        "port": 443,
        "encoded": {
          "url": "https://www.example.com:443/mypath?q1=foo&q2=bar#heading-1",
          "scheme": "https",
          "netloc": "www.example.com:443",
          "path": "/mypath",
          "path_list": [
            "mypath"
          ],
          "query": "q1=foo&q2=bar",
          "fragment": "heading-1",
          "username": null,
          "password": null,
          "hostname": "www.example.com",
          "port": 443
        },
        "decoded": {
          "url": "https://www.example.com:443/mypath?q1=foo&q2=bar#heading-1",
          "scheme": "https",
          "netloc": "www.example.com:443",
          "path": "/mypath",
          "path_list": [
            "mypath"
          ],
          "query": "q1=foo&q2=bar",
          "fragment": "heading-1",
          "username": null,
          "password": null,
          "hostname": "www.example.com",
          "port": 443
        }
      }

      Email Address string parser

      The Email Address string parser allows you to easily pull the username and domain from an email address, even if it is using Gmail’s “plus” addressing. In those cases you can even pull the “plus” suffix. (Documentation)

      $ echo 'joe.user+spam@example.com' | jc --email-address -p
      {
        "username": "joe.user",
        "domain": "example.com",
        "local": "joe.user+spam",
        "local_plus_suffix": "spam"
      }

      JWT string parser

      jc can easily parse JWT strings into their constituent Header, Payload, and Signature parts. Note, the JWT parser does not check the integrity of the token. (Documentation)

      $ echo 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c' | jc --jwt -p
      {
        "header": {
          "alg": "HS256",
          "typ": "JWT"
        },
        "payload": {
          "sub": "1234567890",
          "name": "John Doe",
          "iat": 1516239022
        },
        "signature": "49:f9:4a:c7:04:49:48:c7:8a:28:5d:90:4f:87:f0:a4:c7:89:7f:7e:8f:3a:4e:b2:25:5f:da:75:0b:2c:c3:97"
      }

      ISO 8601 Datetime string parser

This parser explodes an ISO 8601 datetime string into all of the relevant date and time fields. It will also provide a Unix timestamp and a normalized version of the ISO string. (Documentation)

      $ echo "2022-07-20T14:52:45Z" | jc --iso-datetime -p
      {
        "year": 2022,
        "month": "Jul",
        "month_num": 7,
        "day": 20,
        "weekday": "Wed",
        "weekday_num": 3,
        "hour": 2,
        "hour_24": 14,
        "minute": 52,
        "second": 45,
        "microsecond": 0,
        "period": "PM",
        "utc_offset": "+0000",
        "day_of_year": 201,
        "week_of_year": 29,
        "iso": "2022-07-20T14:52:45+00:00",
        "timestamp": 1658328765
      }

      UNIX Epoch Timestamp string parser

In addition to the ISO 8601 Datetime string parser, the Timestamp parser takes in a 10+ digit epoch timestamp string and explodes it into all of the relevant date and time parts you might want to use, including a normalized ISO 8601 format string. Both naive and timezone-aware UTC versions of the output are provided. (Documentation)

      $ echo '1658599410' | jc --timestamp -p
      {
        "naive": {
          "year": 2022,
          "month": "Jul",
          "month_num": 7,
          "day": 23,
          "weekday": "Sat",
          "weekday_num": 6,
          "hour": 11,
          "hour_24": 11,
          "minute": 3,
          "second": 30,
          "period": "AM",
          "day_of_year": 204,
          "week_of_year": 29,
          "iso": "2022-07-23T11:03:30"
        },
        "utc": {
          "year": 2022,
          "month": "Jul",
          "month_num": 7,
          "day": 23,
          "weekday": "Sat",
          "weekday_num": 6,
          "hour": 6,
          "hour_24": 18,
          "minute": 3,
          "second": 30,
          "period": "PM",
          "utc_offset": "+0000",
          "day_of_year": 204,
          "week_of_year": 29,
          "iso": "2022-07-23T18:03:30+00:00"
        }
      }

      M3U/M3U8 file parser

      jc can now parse M3U files, including extended information. Unparsable lines are noted with a warning message unless the --quiet flag is enabled. (Documentation)

      $ cat playlist.m3u | jc --m3u -p
      [
        {
          "runtime": 105,
          "display": "Example artist - Example title",
          "path": "C:\Files\My Music\Example.mp3"
        },
        {
          "runtime": 321,
          "display": "Example Artist2 - Example title2",
          "path": "C:\Files\My Music\Favorites\Example2.ogg"
        }
      ]

      Happy parsing!

      Featured

      Working with JSON in Various Shells

      I recently went through the exercise of testing jc on several traditional and next-gen shells to document the integrations. jc is a utility that converts the output of many commands and file-types to JSON for easier parsing in scripts. I have typically highlighted the use of JSON with Bash in concert with jq, but this is 2022 and there are so many more shells to choose from!

      In this article I’d like to give a quick snapshot of what it’s like to work with JSON in various traditional and next generation shells. Traditional shells like Bash and Windows Command Prompt (cmd.exe) don’t have built-in JSON support and require 3rd party utilities. Newer shells like NGS, Nushell, Oil, Elvish, Murex, and PowerShell have JSON serialization/deserialization and filtering capabilities built-in for a cleaner experience.

      Bash is still the automation workhorse of the Unix ecosystem and it’s not going away any time soon, but it’s good to see what capabilities are out there in more modern shells. Perhaps this will inspire you to try them out for yourself!


      Bash

Bash is old. Bash is solid. Bash is ubiquitous. Bash isn’t going anywhere. I’ve done some crazy things with Bash in my career… Bash and I go way back. That being said, using JSON in Bash is not always very ergonomic. Tools like jq, jello, jp, etc. help bridge the gap between 1970s-2000s POSIX line-based text manipulation and the modern-day JSON API reality.

      Here’s a simple example of how to pull a value from JSON and assign it to a variable in Bash:

      $ myvar=$(dig www.google.com | jc --dig | jq -r '.[0].answer[0].data')
      $ echo $myvar
      64.233.185.104

If you would like to see more complex examples of assigning multiple JSON values to Bash arrays, see my other posts on using JSON in Bash scripts.
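In the meantime, here’s a minimal sketch of one approach: readarray with process substitution loads every answer record from the dig output into a Bash array in one shot.

$ readarray -t ips < <(dig www.google.com | jc --dig | jq -r '.[0].answer[].data')
$ echo "${ips[0]}"
64.233.185.104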


      Elvish

Elvish is a next-gen shell that uses structured data in pipelines. It has JSON deserialization built in, so you don’t need jq et al. to convert it into an Elvish data structure. You can explore structured data in a similar way to jq or Python.

      Here’s an example of loading a JSON object into a variable and displaying one of the JSON values using the from-json built-in:

      ~> var myvar = (dig www.google.com | jc --dig | from-json)
      ~> put $myvar[0]['answer'][0]['data']
      ▶ 64.233.185.104

      See the Elvish documentation for more details.


      Fish

      Fish is similar to Bash in that it does not have built-in support for JSON, but it’s a more modern take on the shell that provides nice autosuggestions, tab completion, syntax-highlighting, and a clean syntax that is optimized for interactive use.

      When working with JSON in Fish, you will typically use tools like jq, jello, jp, etc. to filter and query the data. Here are some examples showing how to assign filtered JSON data to a variable so it can be used elsewhere in the script:

      $ set myvar (dig www.google.com | jc --dig | jq -r '.[0].answer[0].data')
      $ echo $myvar
      64.233.185.104
      
      $ set myvar (jc dig www.google.com | jello -r '_[0].answer[0].data')
      $ echo $myvar
      64.233.185.104
      
      $ set myvar (jc dig www.google.com | jp -u '[0].answer[0].data')
      $ echo $myvar
      64.233.185.104

      Murex

The Murex next generation shell is designed for DevOps productivity and includes native JSON capabilities. There are a couple of ways to set JSON variables: you can use the cast json builtin to convert a string to a JSON variable, or you can define the JSON type when setting the variable (e.g. set json myvar).

There are also a couple of different ways to access nested attributes within the JSON: you can use Index syntax (single bracket []) or Element syntax (double bracket [[]]).

      Here’s an example of setting a JSON variable and accessing a nested value using the Element syntax:

      ~ » jc dig www.google.com -> set json myvar
      ~ » $myvar[[.0.answer.0.data]] -> set mydata
      ~ » out $mydata
      64.233.185.104

      Check out the documentation for more information.


      NGS

      Next Generation Shell (NGS) is a modern shell that aims to be DevOps-friendly. To that end, it is no surprise that it has great JSON support out of the box. If you have Python experience, you will find yourself at home with many of the concepts.

      Here is a quick example of how to pull a value from JSON into a variable and output a specific value to STDOUT:

      myvar = ``jc dig www.google.com``[0].answer[0].data
      echo(myvar)
      
      # returns 64.233.185.104

      The double-backtick syntax runs the command and parses the JSON output. Then you can use bracket and dot notation to access the key you would like.

      There are many other ways to filter the objects, including map(), filter(), reject(), the_one(), etc. No jq required!


      Nushell

Nushell describes itself this way on its website:

      “Nu pipelines use structured data so you can safely select, filter, and sort the same way every time. Stop parsing strings and start solving problems.”

      This is definitely a new take on the shell which works nicely with JSON data. In fact, Nushell has a from json builtin function that deserializes JSON into a native structured object. Here’s a quick example of how to assign a JSON object to a variable and filter it down to a desired value:

      > let myvar = (dig www.google.com | jc --dig | from json)
      > echo $myvar | get 0.answer.0.data
      64.233.185.104

      Check out the Nushell documentation for more filtering options.


      Oil

If you are at home in JavaScript or Python then you should check out Oil. Oil started out being compatible with Bash, but has since advanced into its own shell and scripting language that supports more robust structured objects.

      Oil comes with the json read builtin that deserializes JSON into a native Oil object. You can use standard bracket notation or a unique -> notation to access attributes within objects. Here’s an example:

      $ dig www.google.com | jc --dig | json read myvar
      $ var mydata = myvar[0]['answer'][0]['data']
      $ echo $mydata
      64.233.185.104

      For more details on working with JSON in Oil, see the documentation.


      PowerShell

      They say you either love or hate PowerShell. I have to admit, coming from a Bash background, I wasn’t too hot on PowerShell the first time I needed to create a script for it. It seemed needlessly verbose. And what were these objects? Why can’t I just pipe text between processes!?

But I have to say it has grown on me because of its concept of passing structured objects between processes via pipes. Well, I neither love nor hate PowerShell… I like the concept, but I’m still not a huge fan of some of the execution. It does have pretty good native JSON support, though.

      Here’s an example of loading JSON data from jc into an object using the ConvertFrom-Json utility and printing a specific property within the resulting object using bracket and dot notation:

      PS C:\> $myvar = dig www.google.com | jc --dig | ConvertFrom-Json
      PS C:\> Write-Output $myvar[0].answer[0].data
      64.233.185.104

      Here’s a good article with more detail on how to work with JSON in PowerShell.


      Windows Command Prompt (cmd.exe)

      Wow, this is a blast from the past! I don’t think I’ve written a batch file since the ’90s. Back then there was no such thing as JSON. I do remember doing some crazy login scripts with batch files back in the day, and I’m sure there are many (not mine) still in use today.

      At first I wasn’t sure if it was even practical to use JSON at the Windows Command Prompt, but I thought it would be fun to take on the challenge. Turns out, it wasn’t too terribly difficult, though I’m still not sure of the practicality.

      When at the Command Prompt, you can use tools like jq, jello, jp, etc. to filter and query the JSON:

      C:\> dig www.google.com | jc --dig | jq -r .[0].answer[0].data
      64.233.185.104
      
      C:\> jc dig www.google.com | jello -r _[0].answer[0].data
      64.233.185.104
      
      C:\> jc dig www.google.com | jp -u [0].answer[0].data
      64.233.185.104

      That’s fine and all, but can you actually load JSON values into variables? Yes you can – with the trusty FOR /F command!

      C:\> FOR /F "tokens=* USEBACKQ" %i IN (`dig www.google.com ^| jc --dig ^| jq -r .[0].answer[0].data`) DO SET myvar=%i
      C:\> ECHO %myvar%
      64.233.185.104

Well, that’s a mouthful. But it does work. Batch files require doubled %% prefixes on the FOR loop variables, so this is how you would do it in a batch file:

      FOR /F "tokens=* USEBACKQ" %%i IN (`dig www.google.com ^| jc --dig ^| jq -r .[0].answer[0].data`) DO SET myvar=%%i
      ECHO %myvar%
      
      :: returns 64.233.185.104

      I needed to make a visit to Stack Overflow to learn how to get this working. Was it worth it? I don’t know – maybe this will help some poor unfortunate soul someday searching “how to use json in batch file”. 🙂

      Conclusion

That was fun! I’ve always enjoyed the command line, and playing with different shells can spark inspiration for new ways of solving problems. There are lots of next-gen alternatives that are looking to bring the shell experience into the 21st century. Did I leave out your favorite new shell?

      Happy JSON parsing!

      Featured

      Easily Convert git log Output to JSON

There are lots of people interested in converting their git logs into beautiful JSON or JSON Lines for archiving and analytics. It seems like it should be easy, but it is more complicated than it needs to be.

      When I got a feature request for jc to support git log output, my first instinct was to look into the robust git --format options. At first glance it seemed like a simple format string like this should work:

      git log --format='{"hash": "%H", "author": "%an", "subject": "%s", "body": "%b", "date": %at}'

      The problem is that git does not do any escaping when using those format variables. This will generate invalid JSON if there are newline characters or other special characters like quotation marks inside the data.
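For example, a commit subject that contains quotation marks (the message below is hypothetical) lands in the output unescaped, producing invalid JSON:

$ git log -1 --format='{"subject": "%s"}'
{"subject": "fix the "broken" parser"}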

I found several other solutions to the problem using custom scripts, but unfortunately some require installing interpreters like Node.js, some require specific git log --format options, and some don’t fully support options like --stat or --shortstat. Some solutions don’t even fully solve the string escaping issue.

      A Better git log Parser

I decided jc would be a great git log parser. jc already supports around 100 other commands, so this is right in jc’s wheelhouse. I wanted to make the jc parser for git log as easy to use as the other parsers (e.g. jc git log), but also support more advanced git format and statistics options.

      In addition, I wanted to support both JSON and JSON Lines conversion. git logs can become huge over time, so being able to emit JSON Lines can reduce the memory overhead that would be incurred by generating a huge JSON array of logs.

      Finally, I wanted to add calculated timestamps (naive and time zone aware) to make the output more useful in scripts.

The new git log standard and streaming parsers are now bundled with jc. They work just like any other jc parser and support several git log --format options as well as --stat and --shortstat. No need to worry about escaping special characters or using a specific format string. It just works out of the box!

Here’s an example using the fuller format option along with full stats via --stat:

      $ git log --format=fuller --stat | jc --git-log -p
      [
        {
          "commit": "af2c06cd284352eb47c44f2387d4600b1b322cbd",
          "author": "Kelly Brazil",
          "author_email": "kellyjonbrazil@gmail.com",
          "date": "Sun May 15 22:28:12 2022 -0700",
          "commit_by": "Kelly Brazil",
          "commit_by_email": "kellyjonbrazil@gmail.com",
          "commit_by_date": "Sun May 15 22:28:12 2022 -0700",
          "stats": {
            "files_changed": 1,
            "insertions": 2,
            "deletions": 2,
            "files": [
              "docs/parsers/pip_show.md"
            ]
          },
          "message": "doc update",
          "epoch": 1652678892,
          "epoch_utc": null
        },
        {
          "commit": "67a4c6f797dfeaba2ba50222e879bf4fb58678f4",
          "author": "Kelly Brazil",
          "author_email": "kellyjonbrazil@gmail.com",
          "date": "Sun May 15 22:23:00 2022 -0700",
          "commit_by": "Kelly Brazil",
          "commit_by_email": "kellyjonbrazil@gmail.com",
          "commit_by_date": "Sun May 15 22:23:00 2022 -0700",
          "stats": {
            "files_changed": 2,
            "insertions": 4,
            "deletions": 4,
            "files": [
              "jc/parsers/pip_show.py",
              "tests/fixtures/generic/pip-show-multiline-license.json"
            ]
          },
          "message": "add initial \\n to first line of multiline fields",
          "epoch": 1652678580,
          "epoch_utc": null
        },
        ...
      ]

You could also use the magic syntax for the above example: jc -p git log --format=fuller --stat

      Or, to output JSON Lines, use the streaming parser:

      $ git log --format=fuller --stat | jc --git-log-s
      {"commit":"af2c06cd284352eb47c44f2387d4600b1b322cbd","author":"Kelly Brazil","author_email":"kellyjonbrazil@gmail.com","date":"Sun May 15 22:28:12 2022 -0700","commit_by":"Kelly Brazil","commit_by_email":"kellyjonbrazil@gmail.com","commit_by_date":"Sun May 15 22:28:12 2022 -0700","stats":{"files_changed":1,"insertions":2,"deletions":2,"files":["docs/parsers/pip_show.md"]},"message":"doc update","epoch":1652678892,"epoch_utc":null}
      {"commit":"67a4c6f797dfeaba2ba50222e879bf4fb58678f4","author":"Kelly Brazil","author_email":"kellyjonbrazil@gmail.com","date":"Sun May 15 22:23:00 2022 -0700","commit_by":"Kelly Brazil","commit_by_email":"kellyjonbrazil@gmail.com","commit_by_date":"Sun May 15 22:23:00 2022 -0700","stats":{"files_changed":2,"insertions":4,"deletions":4,"files":["jc/parsers/pip_show.py","tests/fixtures/generic/pip-show-multiline-license.json"]},"message":"add initial \\n to first line of multiline fields","epoch":1652678580,"epoch_utc":null}
      ...

Of course, other format options, like oneline, short, medium, and full, are supported, as well as --shortstat. Check out the docs for all of the supported options. (standard and streaming)

In the end, I believe it would be better for there to be a JSON output option built into git, but until then, there is jc.

      Happy parsing!

      Featured

      JC Version 1.19.0 Released

I’m excited to announce the release of jc version 1.19.0, available on GitHub and PyPI. jc now supports 100 standard and streaming parsers. Thank you to the Open Source community for making this possible!

      jc can be installed via pip or through several official OS package repositories, including Debian, Ubuntu, Fedora, openSUSE, Arch Linux, NixOS Linux, Guix System Linux, FreeBSD, and macOS. For more information on how to get jc, see the project README.

      To upgrade with pip:

      $ pip3 install --upgrade jc

        What’s New

• Add git log streaming parser that outputs JSON Lines (or a lazy Iterable when used as a Python library). This is great for converting very large git logs to JSON so the entire log does not need to be loaded into RAM (see the sketch after this list).
• Add chage --list command parser tested on Linux
• Fix git log standard parser for corner cases where a commit hash is the only value on a line within a commit message
• Fix df command parser for rare instances when a newline is found at the end of the output
• Allow jc to pip install on unsupported Python version 3.6 since this version is still widely in use. Note that jc is only tested on officially supported Python versions.
• Fix asciitable-m parser to skip some rows that contain detected column separator characters in cell data. A warning message will be printed to STDERR unless -q or quiet=True is used.
• New zip package for Windows. Simply unzip the files anywhere in the execution PATH.
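Since the streaming parser returns a lazy Iterable when used as a Python library, very large logs can be processed commit-by-commit without holding the whole log in memory. Here’s a minimal sketch; the subprocess plumbing is just one way to feed lines to the parser:

>>> import jc
>>> import subprocess
>>> # run `git log` and hand its stdout (an iterable of lines) to the streaming parser
>>> proc = subprocess.Popen(['git', 'log'], stdout=subprocess.PIPE, text=True)
>>> for commit in jc.parse('git_log_s', proc.stdout):
...     print(commit['commit'])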

        New Parsers

        git log command streaming parser

        Support for the git log command. This is a streaming parser and it outputs JSON Lines. (Documentation):

        $ git log | jc --git-log-s
        {"commit":"a730ae18c8e81c5261db132df73cd74f272a0a26","author":"Kelly...}
        {"commit":"930bf439c06c48a952baec05a9896c8d92b7693e","author":"Kelly...}

        chage --list command parser

        Linux support for the chage --list command. (Documentation)

        $ chage --list joeuser | jc --chage -p
        {
          "password_last_changed": "never",
          "password_expires": "never",
          "password_inactive": "never",
          "account_expires": "never",
          "min_days_between_password_change": 0,
          "max_days_between_password_change": 99999,
          "warning_days_before_password_expires": 7
        }

        Happy parsing!

        Featured

        A New Way to Parse Plain Text Tables

Every so often there are questions on sysadmin forums about how to parse and filter data from plain text tables. For example:

        +----+-----------------------+--------------------------------+---------+
        | id | name                  | url                            | version |
        +----+-----------------------+--------------------------------+---------+
        | 25 | example.com           | http://www.example.com/        | 3.8     |
        | 34 | anotherexample.com    | https://anotherexample.com/    | 3.2     |
        | 62 | yetanotherexample.com | https://yetanotherexample.com/ | 3.9     |
        +----+-----------------------+--------------------------------+---------+

Traditionally you would use tools like grep, sed, and/or awk to grab the data you want from a table like this. Now there is a new way, with jc! As of version 1.18.6, jc can convert single-line and multi-line ASCII and Unicode tables to JSON with the asciitable and asciitable-m parsers. This allows you to use JSON filters like jq or jello to filter the data and use it in your Bash scripts or other applications.

        Here’s how to use the new parsers:

        $ echo '
        > +----+-----------------------+--------------------------------+---------+
        > | id | name                  | url                            | version |
        > +----+-----------------------+--------------------------------+---------+
        > | 25 | example.com           | http://www.example.com/        | 3.8     |
        > | 34 | anotherexample.com    | https://anotherexample.com/    | 3.2     |
        > | 62 | yetanotherexample.com | https://yetanotherexample.com/ | 3.9     |
        > +----+-----------------------+--------------------------------+---------+
        > ' | jc --asciitable -p
        [
          {
            "id": "25",
            "name": "example.com",
            "url": "http://www.example.com/",
            "version": "3.8"
          },
          {
            "id": "34",
            "name": "anotherexample.com",
            "url": "https://anotherexample.com/",
            "version": "3.2"
          },
          {
            "id": "62",
            "name": "yetanotherexample.com",
            "url": "https://yetanotherexample.com/",
            "version": "3.9"
          }
        ]
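Once the table is JSON, filtering becomes a one-liner. Here’s a quick sketch (assuming the table above is saved as table.txt) that pulls the names of the sites running a version higher than 3.5:

$ cat table.txt | jc --asciitable | jq -r '.[] | select((.version | tonumber) > 3.5) | .name'
example.com
yetanotherexample.com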

        If there are multi-line rows, then be sure to use the asciitable-m parser:

        $ echo '
        > ╒══════════╤═════════╤════════╕
        > │ foo      │ bar     │ baz    │
        > │          │         │ buz    │
        > ╞══════════╪═════════╪════════╡
        > │ good day │ 12345   │        │
        > │ mate     │         │        │
        > ├──────────┼─────────┼────────┤
        > │ hi there │ abc def │ 3.14   │
        > │          │         │        │
        > ╘══════════╧═════════╧════════╛' | jc --asciitable-m -p
        [
          {
            "foo": "good day\nmate",
            "bar": "12345",
            "baz_buz": null
          },
          {
            "foo": "hi there",
            "bar": "abc def",
            "baz_buz": "3.14"
          }
        ]

        Many different table styles are supported, as long as there is a header row at the top of the table.

Of course, you can also use the parsers via the jc Python library:

        >>> import jc
        >>> table = '''
        ... Protocol  Address     Age (min)  Hardware Addr   Type   Interface
        ... Internet  10.12.13.1        98   0950.5785.5cd1  ARPA   FastEthernet2.13
        ... Internet  10.12.13.3       131   0150.7685.14d5  ARPA   GigabitEthernet2.13
        ... Internet  10.12.13.4       198   0950.5C8A.5c41  ARPA   GigabitEthernet2.17
        ... '''
        >>> jc.parse('asciitable', table)
        [{'protocol': 'Internet', 'address': '10.12.13.1', 'age_min': '98', 'hardware_addr': '0950.5785.5cd1', 'type': 'ARPA', 'interface': 'FastEthernet2.13'}, {'protocol': 'Internet', 'address': '10.12.13.3', 'age_min': '131', 'hardware_addr': '0150.7685.14d5', 'type': 'ARPA', 'interface': 'GigabitEthernet2.13'}, {'protocol': 'Internet', 'address': '10.12.13.4', 'age_min': '198', 'hardware_addr': '0950.5C8A.5c41', 'type': 'ARPA', 'interface': 'GigabitEthernet2.17'}]
        

        This can be used to parse the output of some commands that output plaintext tables. For example, the virsh command:

        # virsh list --all
         Id   Name          State
        ------------------------------
         3    rh8-vm01      running
         -    crc           shut off
         -    rh8-tower01   shut off
        #
# virsh list --all | jc --asciitable -p
        [
          {
            "id": "3",
            "name": "rh8-vm01",
            "state": "running"
          },
          {
            "id": "-",
            "name": "crc",
            "state": "shut off"
          },
          {
            "id": "-",
            "name": "rh8-tower01",
            "state": "shut off"
          }
        ]

        Here’s how you can do the above in an Ansible playbook using the jc community.general plugin:

        - name: Get virsh state
          hosts: ubuntu
          tasks:
          - shell: virsh list --all
            register: result
          - set_fact:
              virsh_data: "{{ result.stdout | community.general.jc('asciitable') }}"
          - debug:
              msg: "The virsh state is: {{ virsh_data[0].state }}"

        For more information on jc, check out my post on Bringing the UNIX Philosophy to the 21st Century. See these posts for tips on how to use JSON in your Bash scripts.

        Happy parsing!

        Featured

        JC Version 1.18.1 Released

I’m excited to announce the release of jc version 1.18.1, available on GitHub and PyPI. This release includes several enhancements when using jc as a Python library. Enhancements include some higher-level APIs and improved documentation to simplify the use of jc in Python programs and scripts. Error message improvements have been made on the CLI as well.

        jc can be installed via pip or through several official OS package repositories, including Debian, Ubuntu, Fedora, openSUSE, Arch Linux, NixOS Linux, Guix System Linux, FreeBSD, and macOS. For more information on how to get jc, see the project README.

        To upgrade with pip:

        $ pip3 install --upgrade jc

          New Features

          • New high-level parse API that works for both builtin and custom plugin parsers
          >>> import jc
          >>> jc.parse('date', 'Thu Jan 27 11:40:00 PST 2022')
          {'year': 2022, 'month': 'Jan', 'month_num': 1, 'day': 27, 'weekday': 'Thu', 'weekday_num': 4, 'hour': 11, 'hour_24': 11, 'minute': 40, 'second': 0, 'period': 'AM', 'timezone': 'PST', 'utc_offset': None, 'day_of_year': 27, 'week_of_year': 4, 'iso': '2022-01-27T11:40:00', 'epoch': 1643312400, 'epoch_utc': None, 'timezone_aware': False}
          • Several other high-level functions in jc.lib that allow you to gather detailed parser information:
            • parser_mod_list() -> list
            • plugin_parser_mod_list() -> list
            • get_help(parser_module_name: str) -> None
          • Enhanced CLI error messages for certain OS errors that can happen when using the “magic syntax” (file permission errors, etc.)

          v1.18.2 Updates

          • Enhanced documentation for public functions, including type annotations
          • Additional high-level convenience functions:
            • parser_info(parser_module_name: str) -> dict
            • all_parser_info() -> list[dict]
          • Enhanced CLI error message to suggest setting locale to C when parsing errors occur
          • Bug fix for plugin parsers with underscore(s) in the name

          v1.18.3 Updates

• Add rsync command and log file parser tested on Linux and macOS
• Add rsync command and log file streaming parser tested on Linux and macOS
• Add xrandr command parser tested on Linux
          • Enhance timestamp performance with caching and format hints
          • Refactor ignore_exceptions functionality in streaming parsers
          • Fix man page in packages

          rsync command parser

          Linux and macOS support for the rsync command. (Documentation):

          $ rsync -i -a source/ dest | jc --rsync -p          # or  jc -p rsync -i -a source/ dest
          [
            {
              "summary": {
                "sent": 1708,
                "received": 8209,
                "bytes_sec": 19834.0,
                "total_size": 235,
                "speedup": 0.02
              },
              "files": [
                {
                  "filename": "./",
                  "metadata": ".d..t......",
                  "update_type": "not updated",
                  "file_type": "directory",
                  "checksum_or_value_different": false,
                  "size_different": false,
                  "modification_time_different": true,
                  "permissions_different": false,
                  "owner_different": false,
                  "group_different": false,
                  "acl_different": false,
                  "extended_attribute_different": false
                },
                ...
              ]
            }
          ]

          rsync command streaming parser

          Linux support for the rsync command. This is a streaming parser and it outputs JSON Lines. (Documentation):

          $ rsync -i -a source/ dest | jc --rsync-s
          {"type":"file","filename":"./","metadata":".d..t......","update_...}
          ...

          xrandr command parser

Linux support for the xrandr command. (Documentation):

          $ xrandr | jc --xrandr -p          # or  jc -p xrandr
          {
            "screens": [
              {
                "screen_number": 0,
                "minimum_width": 8,
                "minimum_height": 8,
                "current_width": 1920,
                "current_height": 1080,
                "maximum_width": 32767,
                "maximum_height": 32767,
                "associated_device": {
                  "associated_modes": [
                    {
                      "resolution_width": 1920,
                      "resolution_height": 1080,
                      "is_high_resolution": false,
                      "frequencies": [
                        {
                          "frequency": 60.03,
                          "is_current": true,
                          "is_preferred": true
                        },
                        {
                          "frequency": 59.93,
                          "is_current": false,
                          "is_preferred": false
                        }
                      ]
                    },
                    {
                      "resolution_width": 1680,
                      "resolution_height": 1050,
                      "is_high_resolution": false,
                      "frequencies": [
                        {
                          "frequency": 59.88,
                          "is_current": false,
                          "is_preferred": false
                        }
                      ]
                    }
                  ],
                  "is_connected": true,
                  "is_primary": true,
                  "device_name": "eDP1",
                  "resolution_width": 1920,
                  "resolution_height": 1080,
                  "offset_width": 0,
                  "offset_height": 0,
                  "dimension_width": 310,
                  "dimension_height": 170
                }
              }
            ],
            "unassociated_devices": []
          }

          v1.18.4 Updates

• Add nmcli command parser tested on Linux
• Enhance parse error messages at the CLI
• Add standard and streaming parser list functions to the public API
• Enhance Python developer documentation formatting

          nmcli command parser

          Linux support for the nmcli command. (Documentation):

          $ nmcli connection show ens33 | jc --nmcli -p          # or  jc -p nmcli connection show ens33
          [
            {
              "connection_id": "ens33",
              "connection_uuid": "d92ece08-9e02-47d5-b2d2-92c80e155744",
              "connection_stable_id": null,
              "connection_type": "802-3-ethernet",
              "connection_interface_name": "ens33",
              "connection_autoconnect": "yes",
              ...
              "ip4_address_1": "192.168.71.180/24",
              "ip4_gateway": "192.168.71.2",
              "ip4_route_1": {
                "dst": "0.0.0.0/0",
                "nh": "192.168.71.2",
                "mt": 100
              },
              "ip4_route_2": {
                "dst": "192.168.71.0/24",
                "nh": "0.0.0.0",
                "mt": 100
              },
              "ip4_dns_1": "192.168.71.2",
              "ip4_domain_1": "localdomain",
              "dhcp4_option_1": {
                "name": "broadcast_address",
                "value": "192.168.71.255"
              },
              ...
              "ip6_address_1": "fe80::c1cb:715d:bc3e:b8a0/64",
              "ip6_gateway": null,
              "ip6_route_1": {
                "dst": "fe80::/64",
                "nh": "::",
                "mt": 100
              }
            }
          ]

          v1.18.5 Updates

          • Fix date parser to ensure AM/PM period string is always uppercase. Fixes broken tests in some locales

          v1.18.6 Updates

• Add pidstat command parser tested on Linux
• Add pidstat command streaming parser tested on Linux
• Add mpstat command parser tested on Linux
• Add mpstat command streaming parser tested on Linux
          • Add single-line ASCII and Unicode table parser
          • Add multi-line ASCII and Unicode table parser
          • Add a documentation option to parser_info() and all_parser_info()

          pidstat command parser

          Linux support for the pidstat command. (Documentation):

          $ pidstat -hl | jc --pidstat -p          # or  jc -p pidstat -hl
          [
            {
              "time": 1646859134,
              "uid": 0,
              "pid": 1,
              "percent_usr": 0.0,
              "percent_system": 0.03,
              "percent_guest": 0.0,
              "percent_cpu": 0.03,
              "cpu": 0,
              "command": "/usr/lib/systemd/systemd --switched-root --system..."
            },
            {
              "time": 1646859134,
              "uid": 0,
              "pid": 6,
              "percent_usr": 0.0,
              "percent_system": 0.0,
              "percent_guest": 0.0,
              "percent_cpu": 0.0,
              "cpu": 0,
              "command": "ksoftirqd/0"
            },
            {
              "time": 1646859134,
              "uid": 0,
              "pid": 2263,
              "percent_usr": 0.0,
              "percent_system": 0.0,
              "percent_guest": 0.0,
              "percent_cpu": 0.0,
              "cpu": 0,
              "command": "kworker/0:0"
            }
          ]

          pidstat command streaming parser

          Linux support for the pidstat command. This is a streaming parser and it outputs JSON Lines. (Documentation):

          $ pidstat -hl | jc --pidstat-s
          {"time":1646859134,"uid":0,"pid":1,"percent_usr":0.0,"percent_syste...}
          {"time":1646859134,"uid":0,"pid":6,"percent_usr":0.0,"percent_syste...}
          {"time":1646859134,"uid":0,"pid":9,"percent_usr":0.0,"percent_syste...}
          ...

          asciitable ASCII and Unicode table parser

          Supports parsing various styles of plain text tables. (Documentation):

          $ echo '
          >     ╒══════════╤═════════╤════════╕
          >     │ foo      │ bar     │ baz    │
          >     ╞══════════╪═════════╪════════╡
          >     │ good day │         │ 12345  │
          >     ├──────────┼─────────┼────────┤
          >     │ hi there │ abc def │ 3.14   │
          >     ╘══════════╧═════════╧════════╛' | jc --asciitable -p
          [
            {
              "foo": "good day",
              "bar": null,
              "baz": "12345"
            },
            {
              "foo": "hi there",
              "bar": "abc def",
              "baz": "3.14"
            }
          ]

          asciitable-m multi-line ASCII and Unicode table parser

          Supports parsing various styles of plain text tables with multi-line rows. (Documentation):

          $ echo '
          > +----------+---------+--------+
          > | foo      | bar     | baz    |
          > |          |         | buz    |
          > +==========+=========+========+
          > | good day | 12345   |        |
          > | mate     |         |        |
          > +----------+---------+--------+
          > | hi there | abc def | 3.14   |
          > |          |         |        |
          > +==========+=========+========+' | jc --asciitable-m -p
          [
            {
              "foo": "good day\nmate",
              "bar": "12345",
              "baz_buz": null
            },
            {
              "foo": "hi there",
              "bar": "abc def",
              "baz_buz": "3.14"
            }
          ]

          v1.18.7 Updates

• Add git-log command parser tested on Linux and macOS
• Add update-alternatives --query command parser tested on Linux
• Add update-alternatives --get-selections command parser tested on Linux
• Fix key/value and INI parsers to allow duplicate keys
• Fix YAML file parser for files including timestamp objects
• Update xrandr parser: add a rotation field
• Fix failing tests by moving template files
• Add Python interpreter version and path to -v and -a output

          git-log command parser

          Linux support for the git log command. (Documentation):

$ git log --stat | jc --git-log -p          # or:  jc -p git log --stat
          [
            {
              "commit": "728d882ed007b3c8b785018874a0eb06e1143b66",
              "author": "Kelly Brazil",
              "author_email": "kellyjonbrazil@gmail.com",
              "date": "Wed Apr 20 09:50:19 2022 -0400",
              "stats": {
                "files_changed": 2,
                "insertions": 90,
                "deletions": 12,
                "files": [
                  "docs/parsers/git_log.md",
                  "jc/parsers/git_log.py"
                ]
              },
              "message": "add timestamp docs and examples",
              "epoch": 1650462619,
              "epoch_utc": null
            },
            {
              "commit": "b53e42aca623181aa9bc72194e6eeef1e9a3a237",
              "author": "Kelly Brazil",
              "author_email": "kellyjonbrazil@gmail.com",
              "date": "Wed Apr 20 09:44:42 2022 -0400",
              "stats": {
                "files_changed": 5,
                "insertions": 29,
                "deletions": 6,
                "files": [
                  "docs/parsers/git_log.md",
                  "docs/utils.md",
                  "jc/parsers/git_log.py",
                  "jc/utils.py",
                  "man/jc.1"
                ]
              },
              "message": "add calculated timestamp",
              "epoch": 1650462282,
              "epoch_utc": null
            }
          ]

          update-alternatives --query command parser

          Linux support for the update-alternatives --query command. (Documentation):

          $ update-alternatives --query editor | jc --update-alt-q -p          # or:  jc -p update-alternatives --query editor
          {
            "name": "editor",
            "link": "/usr/bin/editor",
            "slaves": [
              {
                "name": "editor.1.gz",
                "path": "/usr/share/man/man1/editor.1.gz"
              },
              {
                "name": "editor.da.1.gz",
                "path": "/usr/share/man/da/man1/editor.1.gz"
              }
            ],
            "status": "auto",
            "best": "/bin/nano",
            "value": "/bin/nano",
            "alternatives": [
              {
                "name": "/bin/ed",
                "priority": -100,
                "slaves": [
                  {
                    "name": "editor.1.gz",
                    "path": "/usr/share/man/man1/ed.1.gz"
                  }
                ]
              },
              {
                "name": "/bin/nano",
                "priority": 40,
                "slaves": [
                  {
                    "name": "editor.1.gz",
                    "path": "/usr/share/man/man1/nano.1.gz"
                  }
                ]
              }
            ]
          }

          update-alternatives --get-selections command parser

          Linux support for the update-alternatives --get-selections command. (Documentation):

          $ update-alternatives --get-selections | jc --update-alt-gs -p          # or:  jc -p update-alternatives --get-selections
          [
            {
              "name": "arptables",
              "status": "auto",
              "current": "/usr/sbin/arptables-nft"
            },
            {
              "name": "awk",
              "status": "auto",
              "current": "/usr/bin/gawk"
            }
          ]

          v1.18.8 Updates

          • Fix update-alternatives --query parser for cases where slaves are not present
          • Fix UnicodeEncodeError on some systems where LANG=C is set and Unicode characters are in the output
          • Update history command parser: do not drop non-ASCII characters if the system is configured for UTF-8 encoding
          • Enhance “magic syntax” to always use UTF-8 encoding

Featured

          Tips on Adding JSON Output to Your CLI App

A couple of years ago I wrote a somewhat controversial article on the topic of Bringing the Unix Philosophy to the 21st Century by adding a JSON output option to CLI tools. This allows easier parsing in scripts by using JSON parsing tools like jq, jello, jp, etc. without arcane awk, sed, cut, tr, rev, etc. incantations.

          It was controversial because there seem to be a lot of folks who don’t think writing bespoke parsers for each task is a big deal. Others think JSON is evil. There are strong feelings as can be seen in response to the article in the comments and also on Hacker News and Reddit.

          I’ll let the next generation of DevOps practitioners and developers come to their own conclusions on the basis of our arguments, but the tide is already turning. Something that was just wishful thinking a couple years ago is now becoming a reality! Now, more and more command line applications are offering JSON output as an option. And with jc, JSON output can even be coaxed out of older command line applications.

          Structured Output Support is Increasing

Many new command line applications now offer structured output as an option, and even some older ones are adding it. I find that, more and more often, when a parser is requested for jc there is already a JSON or XML output option in the application’s man page. Some examples include nvidia-smi, ffprobe, the docker CLI, and tree. Even ip now supports JSON output with ip route, which wasn’t supported when I originally wrote about it in the article.
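For example, modern versions of iproute2 accept a -j flag, so the routing table can go straight to jq (the device name below is just illustrative):

$ ip -j route | jq -r '.[0].dev'
ens33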

          I recently developed standard and streaming parsers for the iostat command and found that versions 11 and above now have a JSON output option. Way to go!

          But when looking at the JSON options for some of these commands, I found some things that could be improved.

          JSON Output Do’s and Don’ts

          While developing over 80 parsers for the jc project, I stumbled upon some best practices. My first goal was to make getting the data easy when using jq, as that was the only CLI JSON processing tool I really used at the time. With that initial goal, and input from scores of users, this is how I try to make the highest quality JSON output:

          Note: Many of these are completely subjective and are just my humble opinion. I’m willing to keep an open mind about these choices.

          • Do Make a Schema
          • Do Flatten the Structure
          • Do Output JSON Lines for Streaming Output
          • Do Use Predictable Key Names
          • Do Pretty Print with Two Spaces or Don’t Format at All
          • Don’t Use Special Characters in Key Names
          • Don’t Allow Duplicate Keys
          • Don’t Use Very Large Numbers
          • BONUS

          Let’s take a look at these in more detail.

          Do

          Here are some good practices when generating JSON output:

          Make a Schema

When possible, which is almost always the case, I document a schema for the JSON output. This lets the user know where they can always find an attribute and which type to expect (string, integer, float, boolean, null, object, or array). It also allows you to test the output to make sure it conforms to the schema and there are no bugs.

          A schema doesn’t have to be complicated. It can be specified in the documentation, including the man page. I use this simple structure for jc documentation:

          [
            {
              "foo":      string,
              "bar":      float,   # null if blank
              "baz": [
                          integer
              ]
            }
          ]

          Flatten the Structure

The best case is to output an object or an array of objects (most common) with no further nesting. Sometimes you can prefix an attribute name if nesting is not absolutely necessary. The idea is to make it as easy as possible for the user to grab the value so they don’t need to traverse the data structure to get what they want.

          Sometimes this:

          [
            {
              "cpu": {
                "speed": 5,
                "temp": 33.2
              },
              "ram": {
                "speed": 11,
                "mb": 1024
              }
            }
          ]

          Can be turned into this:

          [
            {
              "cpu_speed": 5,
              "cpu_temp": 33.2,
              "ram_speed": 11,
              "ram_mb": 1024
            }
          ]

          This way I can easily filter the data in jq or other tools without having to traverse levels. Of course, this is not always possible or desirable, but keeping a flat structure should be considered for user convenience.
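To see the difference, here are the equivalent jq lookups against the two structures above (assuming they are saved as nested.json and flat.json):

$ jq '.[0].cpu.speed' nested.json
5

$ jq '.[0].cpu_speed' flat.json
5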

This approach is also great for output that contains a long list of items. I’ll pick on iostat a bit here to make a point, but don’t take this the wrong way: I’m thrilled that the author of iostat has included a JSON output option and in no way want to discount the work put into that.

          The iostat JSON output option deeply nests the output like so:

          {
            "sysstat": {
              "hosts": [
                {
                  "nodename": "ubuntu",
                  "sysname": "Linux",
                  "release": "5.8.0-53-generic",
                  "machine": "x86_64",
                  "number-of-cpus": 2,
                  "date": "12/03/2021",
                  "statistics": [
                    {
                      "avg-cpu": {
                        "user": 0.6,
                        "nice": 0.01,
                        "system": 1.68,
                        "iowait": 0.14,
                        "steal": 0,
                        "idle": 97.58
                      },
                      "disk": [
                        {
                          "disk_device": "dm-0",
                          "tps": 29.07,
                          "kB_read/s": 502.25,
                          "kB_wrtn/s": 54.94,
                          "kB_dscd/s": 0,
                          "kB_read": 251601,
                          "kB_wrtn": 27524,
                          "kB_dscd": 0
                        },
          ...

This makes sense and is very logical when you look at the output as an entire JSON document. But commands like iostat, vmstat, ping, ls, etc. can produce huge (even unlimited) amounts of output, so it can make more sense to build the JSON into a structure that is more easily consumed in a pipeline by tools like jq.

With this structure, the whole document needs to be loaded before the JSON is considered valid and searchable, even though iostat output can go on indefinitely depending on how the command is run.

          I took a different approach with the jc iostat parser by using a flat array of objects and simply using a type attribute to denote which type of object it is. This allows very easy filtering in jq or other tools and also allows consistency with the streaming parser, which I’ll get to in another section.

          Here’s the jc version:

          [
            {
              "percent_user": 0.31,
              "percent_nice": 0.23,
              "percent_system": 0.48,
              "percent_iowait": 0.04,
              "percent_steal": 0.0,
              "percent_idle": 98.95,
              "type": "cpu"
            },
            {
              "device": "dm-0",
              "tps": 8.16,
              "kb_read_s": 137.26,
              "kb_wrtn_s": 129.0,
              "kb_dscd_s": 0.0,
              "kb_read": 395021,
              "kb_wrtn": 371240,
              "kb_dscd": 0,
              "type": "device"
            },
            {
              "device": "loop0",
              "tps": 0.01,
              "kb_read_s": 0.12,
              "kb_wrtn_s": 0.0,
              "kb_dscd_s": 0.0,
              "kb_read": 344,
              "kb_wrtn": 0,
              "kb_dscd": 0,
              "type": "device"
            },
          ...
          ]

You’ll notice that jc doesn’t bother with metadata about the source that generated the output, or even the host statistics. Including the source just makes the object nesting deeper without adding value, and the header information is available from other tools like uname and date. (I could add these fields in a future parser version as an object with its own type if users want that data.)

          Getting to the data in this structure is pretty easy: just loop over the array, filter by type (if needed), and pull attributes from the top-level of each object.
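For example, here’s a sketch that selects only the device objects and grabs their throughput stats with jq:

$ iostat | jc --iostat | jq '.[] | select(.type == "device") | {device, tps}'
{
  "device": "dm-0",
  "tps": 8.16
}
{
  "device": "loop0",
  "tps": 0.01
}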

          Output JSON Lines for Streaming Output

          There’s another advantage to the array of flat objects structure discussed above, and that’s for programs like iostat and others that can stream output forever until the user hits <ctrl-c>. In this case, it would be difficult to pipe the output to a JSON filter, like jq, since the output would not be valid JSON until the program ends.

          For these cases, outputting JSON Lines (aka NDJSON) is a good choice.

          Unfortunately, this is what the iostat output looks like when running it indefinitely:

          $ iostat 1 -o JSON
          {"sysstat": {
            "hosts": [
              {
                "nodename": "ubuntu",
                "sysname": "Linux",
                "release": "5.8.0-53-generic",
                "machine": "x86_64",
                "number-of-cpus": 2,
                "date": "12/03/2021",
                "statistics": [
                  {
                    "avg-cpu":  {"user": 1.23, "nice": 0.86, "system": 1.23, "iowait": 0.06, "steal": 0.00, "idle": 96.62},
                    "disk": [
                      {"disk_device": "dm-0", "tps": 30.16, "kB_read/s": 138.78, "kB_wrtn/s": 476.19, "kB_dscd/s": 0.00, "kB_read": 654975, "kB_wrtn": 2247452, "kB_dscd": 0},
                      {"disk_device": "sr0", "tps": 0.13, "kB_read/s": 4.89, "kB_wrtn/s": 0.00, "kB_dscd/s": 0.00, "kB_read": 23067, "kB_wrtn": 0, "kB_dscd": 0}
                    ]
                  },
                  {
                    "avg-cpu":  {"user": 0.00, "nice": 0.00, "system": 0.00, "iowait": 0.00, "steal": 0.00, "idle": 100.00},
                    "disk": [
                      {"disk_device": "dm-0", "tps": 0.00, "kB_read/s": 0.00, "kB_wrtn/s": 0.00, "kB_dscd/s": 0.00, "kB_read": 0, "kB_wrtn": 0, "kB_dscd": 0},
                      {"disk_device": "sr0", "tps": 0.00, "kB_read/s": 0.00, "kB_wrtn/s": 0.00, "kB_dscd/s": 0.00, "kB_read": 0, "kB_wrtn": 0, "kB_dscd": 0}
                    ]
                  },
                  {
                    "avg-cpu":  {"user": 0.00, "nice": 0.00, "system": 0.50, "iowait": 0.00, "steal": 0.00, "idle": 99.50},
                    "disk": [
                      {"disk_device": "dm-0", "tps": 5.00, "kB_read/s": 0.00, "kB_wrtn/s": 20.00, "kB_dscd/s": 0.00, "kB_read": 0, "kB_wrtn": 20, "kB_dscd": 0},
                      {"disk_device": "sr0", "tps": 0.00, "kB_read/s": 0.00, "kB_wrtn/s": 0.00, "kB_dscd/s": 0.00, "kB_read": 0, "kB_wrtn": 0, "kB_dscd": 0}
                    ]
                  }
          ...

          This is not easily parsable downstream when used in a pipeline:

          $ iostat 1 -o JSON | jq
          ^C     # hangs forever until <ctrl-c> is entered and no JSON is filtered

The author of iostat did do a cool thing, though: the output is correctly wrapped with the closing brackets when the <ctrl-c> sequence is caught. So it does eventually become a valid JSON document, but I’m not sure all developers will have the forethought to do this. And it still does not solve the pipelining problem.

Instead, the streaming iostat parser in jc outputs JSON Lines with the same schema as the standard parser. Basically, the only difference is that there are no beginning and ending array brackets and each object is compact-printed on its own line. This allows JSON processors like jq to work on each line immediately as it is emitted:

          $ iostat 1 | jc --iostat-s -u | jq -c
          {"percent_user":1.11,"percent_nice":0.78,"percent_system":1.12,"percent_iowait":0.05,"percent_steal":0.0,"percent_idle":96.94,"type":"cpu"}
          {"device":"dm-0","tps":27.4,"kb_read_s":125.07,"kb_wrtn_s":430.11,"kb_dscd_s":0.0,"kb_read":654987,"kb_wrtn":2252376,"kb_dscd":0,"type":"device"}
          {"device":"loop0","tps":0.02,"kb_read_s":0.16,"kb_wrtn_s":0.0,"kb_dscd_s":0.0,"kb_read":862,"kb_wrtn":0,"kb_dscd":0,"type":"device"}
          {"percent_user":2.53,"percent_nice":0.0,"percent_system":1.52,"percent_iowait":0.0,"percent_steal":0.0,"percent_idle":95.96,"type":"cpu"}
          {"device":"dm-0","tps":19.0,"kb_read_s":0.0,"kb_wrtn_s":76.0,"kb_dscd_s":0.0,"kb_read":0,"kb_wrtn":76,"kb_dscd":0,"type":"device"}
          {"device":"loop0","tps":0.0,"kb_read_s":0.0,"kb_wrtn_s":0.0,"kb_dscd_s":0.0,"kb_read":0,"kb_wrtn":0,"kb_dscd":0,"type":"device"}
          {"percent_user":1.01,"percent_nice":0.0,"percent_system":0.0,"percent_iowait":0.0,"percent_steal":0.0,"percent_idle":98.99,"type":"cpu"}
          {"device":"dm-0","tps":0.0,"kb_read_s":0.0,"kb_wrtn_s":0.0,"kb_dscd_s":0.0,"kb_read":0,"kb_wrtn":0,"kb_dscd":0,"type":"device"}
          {"device":"loop0","tps":0.0,"kb_read_s":0.0,"kb_wrtn_s":0.0,"kb_dscd_s":0.0,"kb_read":0,"kb_wrtn":0,"kb_dscd":0,"type":"device"}
          ...

          Tip: If you include a JSON Lines output option, you might also want to include an ‘unbuffer’ option.

When printing directly to the terminal, output is typically line-buffered, but when piping to other programs it is block-buffered, usually with a buffer of around 4KB. If the emitted JSON is small, it will look like the terminal is hung. This is why jc offers the -u, or ‘unbuffer’, option like many other filtering tools do.

Note that there may be a performance impact to disabling the buffer, so it should only be disabled while troubleshooting the pipeline in the terminal.
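If you are writing your own CLI app in Python, honoring an unbuffer option can be as simple as flushing after each emitted line. A minimal sketch (the emit helper and unbuffer flag are hypothetical names):

import json
import sys

def emit(obj, unbuffer=False):
    # compact JSON Lines output; flush per line when the user requests unbuffered output
    sys.stdout.write(json.dumps(obj, separators=(',', ':')) + '\n')
    if unbuffer:
        sys.stdout.flush()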

          Use Predictable Key Names

This one basically comes down to “don’t dynamically generate key names”. If key names aren’t static and predictable, it is difficult to document a good schema and difficult for users to find the data.

          Instead of doing something like this:

          {
            "Interface 1": [
              "192.168.1.1",
              "172.16.1.1"
            ],
            "Wifi Interface 1": [
              "10.1.1.1"
            ]
          }

          Do this:

          [
            {
              "interface": "Interface 1",
              "ip_addresses": [
                "192.168.1.1",
                "172.16.1.1"
              ]
            },
            {
              "interface": "Wifi Interface 1",
              "ip_addresses": [
                "10.1.1.1"
              ]
            }
          ]

          This is a self-documenting structure and the user can simply iterate over all of the objects to get the interface names and IP addresses they want. They can still look up a specific interface by name, but it’s not as straightforward, and dynamic keys don’t allow you to have a nicely documented Schema.
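
          For example, assuming the structure above is saved as interfaces.json (a hypothetical filename), a one-line jq query pulls the addresses for a given interface with no bracket gymnastics:

          $ jq -r '.[] | select(.interface == "Wifi Interface 1") | .ip_addresses[]' interfaces.json
          10.1.1.1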

          Pretty Print with Two Spaces or Don’t Format at All

          Higher-level languages like Python make it very easy to format JSON output, so I typically see the problem of ugly JSON formatting in programs written in C:

          (Screenshot: iostat JSON output formatting is not optimized for terminal line wrapping.)

          What is going on here? Actually, I can see what the developer was doing: it looks quite nice outside of the terminal when pasted into a text editor, but inside the terminal the line wrapping makes it nearly unreadable.

          I like the look of two-space indentation with JSON – maybe because that’s the way jq formats it and I’m just used to it.

          There’s really no need to format JSON output at all. If it makes your code simpler, just generate the JSON with no newlines or spaces. This is more compact and the user can just as easily pipe the output through jq or other tools to format it.
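
          For example, even fully compacted JSON is only one jq invocation away from a readable view:

          $ echo '{"interface":"eth0","ip_addresses":["192.168.1.1"]}' | jq
          {
            "interface": "eth0",
            "ip_addresses": [
              "192.168.1.1"
            ]
          }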

          If you do choose to format the JSON, then take a cue from jq and use two spaces of indent and don’t coalesce brackets. Like so:

          $ iostat -o JSON | jq
          {
            "sysstat": {
              "hosts": [
                {
                  "nodename": "ubuntu",
                  "sysname": "Linux",
                  "release": "5.8.0-53-generic",
                  "machine": "x86_64",
                  "number-of-cpus": 2,
                  "date": "12/03/2021",
                  "statistics": [
                    {
                      "avg-cpu": {
                        "user": 0.6,
                        "nice": 0.01,
                        "system": 1.68,
                        "iowait": 0.14,
                        "steal": 0,
                        "idle": 97.58
                      },
                      "disk": [
                        {
                          "disk_device": "dm-0",
                          "tps": 29.07,
                          "kB_read/s": 502.25,
                          "kB_wrtn/s": 54.94,
                          "kB_dscd/s": 0,
                          "kB_read": 251601,
                          "kB_wrtn": 27524,
                          "kB_dscd": 0
                        },
                        <SNIP>
                        {
                          "disk_device": "sr0",
                          "tps": 0.19,
                          "kB_read/s": 6.27,
                          "kB_wrtn/s": 0,
                          "kB_dscd/s": 0,
                          "kB_read": 3139,
                          "kB_wrtn": 0,
                          "kB_dscd": 0
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          }

          Beggars can’t be choosers, so I’ll take ugly JSON over no JSON any day. But again, compact JSON with no spaces and newlines is perfectly fine. Anyone working with JSON knows to use jq or other tools to make it easy to read in the terminal.

          Don’t

          Try to avoid these JSON smells:

          Don’t Use Special Characters in Key Names

          There’s nothing more annoying than having to encapsulate an attribute name in brackets because it has special characters or spaces in it.

          $ echo '{"Foo/ foo": "bar"}' | jq '.Foo/ foo'
          jq: error: foo/0 is not defined at <top-level>, line 1:
          .Foo/ foo      
          jq: 1 compile error
          
          $ echo '{"Foo/ foo": "bar"}' | jq '.["Foo/ foo"]'
          "bar"

          Don’t make your users do that! This can also be a consequence of dynamically generating your keys, as discussed in a section above. Instead, keep key names lower-case and convert special characters and spaces to underscores (‘_’) so keys contain only alphanumeric characters and underscores.

          Underscores are better than dashes because they allow you to select the entire key with a double-click in most IDEs and text editors. Dashes will typically only select a section of the key name.
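
          And if you’re stuck consuming JSON that already has unfriendly key names, you can normalize them downstream. Here is a rough jq sketch that lower-cases keys and converts runs of other characters to underscores:

          $ echo '{"Foo/ foo": "bar"}' | jq 'with_entries(.key |= (ascii_downcase | gsub("[^a-z0-9]+"; "_")))'
          {
            "foo_foo": "bar"
          }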

          Don’t Allow Duplicate Keys

          If you are dynamically generating key names, it may be possible for duplicates to appear in an object. If there is a possibility of this, wrap those items in individual objects. The behavior of duplicate keys is undefined in JSON and may differ depending on the client.
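
          For example, jq happens to keep the last occurrence and silently drops the rest; other clients may keep the first, or error out:

          $ echo '{"port": 80, "port": 443}' | jq
          {
            "port": 443
          }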

          Don’t Use Extremely Large Numbers

          JSON has nice typing, but unfortunately the numeric data type is underspecified in the standard and may behave differently in different clients. Many parsers convert numbers to IEEE-754 double-precision floats, which cannot exactly represent every large integer. This can bite you if you output a long UUID as a number: the UUID may not turn out to be the same on all clients! If you have a very large number, it’s probably best to just wrap it in a string so it doesn’t get mangled downstream.
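
          Here’s a quick demonstration. The mangled output below is what jq 1.6 produces, since it converts numbers to doubles; other versions and clients may behave differently, but the string-wrapped value survives everywhere:

          $ echo '{"uuid": 12345678901234567890}' | jq
          {
            "uuid": 12345678901234567000
          }

          $ echo '{"uuid": "12345678901234567890"}' | jq
          {
            "uuid": "12345678901234567890"
          }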

          Don’t Use XML

          Just joking! Any standard structured output is better than plain text in many cases, and sometimes (but not often) XML is a better choice than JSON. I prefer JSON for its readability, its support ecosystem, and its support for maps, arrays, and a small set of useful types. After developing JSON schemas for over 80 CLI parsers I’ve found that there’s not much JSON can’t do for this type of output.

          In Conclusion

          Always think of the end-user and how they will interact with the data. By following these steps, you can keep users from having to jump through extra hoops to get to the data they want:

          • Make a Schema
          • Flatten the Structure
          • Output JSON Lines for Streaming Output
          • Use Predictable Key Names
          • Pretty Print with Two Spaces or Don’t Format at All
          • Don’t Use Special Characters in Key Names
          • Don’t Allow Duplicate Keys
          • Don’t Use Extremely Large Numbers

          This is clearly not an exhaustive list. Did I miss any of your pet peeves? Let me know in the comments!

          Featured

          JC Version 1.17.0 Released

          See below for v1.17.x updates

          I’m excited to announce the release of jc version 1.17.0 available on github and pypi. This release includes streaming parser support, including three new streaming parsers (ls-s, ping-s, and vmstat-s) and one new standard parser (vmstat), bringing the total number of parsers to 78.

          The streaming parsers output JSON Lines (aka NDJSON), which can be ingested by streaming processors like jq, elastic, Splunk, etc. These parsers use significantly less memory while converting large amounts of output (e.g. ls -lR /), and in some cases can be faster than standard parsers. Just like standard parsers, streaming parsers can be used both at the CLI and as Python libraries. When used as Python libraries, parse() is a generator function and returns an iterator which can be used in a loop for lazy processing of the stream.
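
          For example, at the CLI you can stream a huge recursive listing through the ls-s parser and filter it as it is produced, without holding the whole listing in memory. This is just a sketch using the ls-s schema shown later in this post, with jq reading each JSON line as it is emitted:

          $ ls -lR / | jc --ls-s | jq -r 'select(.size > 100000) | .filename'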

          The -u CLI option has been added to unbuffer the output. This is useful when piping jc output to another process like jq and the input stream is very slow (e.g. ping output). With the unbuffer option enabled you will be able to see the JSON output immediately when using streaming parsers in this scenario instead of waiting for the buffer to be filled.

          Streaming parsers also have an ignore_exceptions option (-qq on the CLI) to allow uninterrupted processing in case any unexpected parsing errors occur. This can be used for long-lived streams so the pipe will not be broken if there is a hiccup in the stream. When this option is used, a _jc_meta object with a success attribute is added to each emitted JSON object. This allows the downstream application to decide whether to ignore the unparsable lines or further process those lines.

          jc can be installed via pip or through several official OS package repositories, including Debian, Ubuntu, Fedora, openSUSE, Arch Linux, NixOS Linux, Guix System Linux, FreeBSD, and macOS. For more information on how to get jc, click here.

          To upgrade with pip:

          $ pip3 install --upgrade jc

            New Features

            • Warning and Error messages now wrap to the terminal width.
            • Support for streaming parsers for much lower memory consumption when converting large amounts of command output.
            • -u CLI option unbuffers jc output. This is useful when converting slow output like ping through the ping-s streaming parser.
            • -qq CLI option makes jc “extra quiet” for streaming parsers. This equates to the ignore_exceptions argument in the streaming parser’s parse() function when using jc as a Python library.

              When using this option, the streaming parser will not stop when parsing errors are encountered. Instead, a _jc_meta object included in the JSON output will have success set to false and the error and line attributes will be set to the error message and original line contents, respectively. Here are examples of the additional _jc_meta object:

            Successfully parsed line with -qq option:

            {
              "foo": "data1",
              "bar": "data2",
              "baz": "data3",
              "_jc_meta": {
                "success": true
              }
            }

            Unsuccessfully parsed line with -qq option:

            {
              "_jc_meta": {
                "success": false,
                "error": "error message",
                "line": "original line data"
              }
            }

            New Parsers

            jc now supports 78 parsers. New parsers include vmstat and three streaming parsers: ls-s, ping-s, and vmstat-s.

            Streaming parsers are considered Beta quality. Even though the streaming parsers have gone through extensive testing, I would like to get more feedback from users before considering them 1.0. Please try them out and provide any feedback as a github issue.

            Also, feel free to open a github issue if you have recommendations for other streaming parsers. Currently I’m thinking about adding streaming parsers for CSV, YAML, and XML documents in a future release.

            Documentation and schemas for all parsers can be found here.

            vmstat command parser

            Linux support for the vmstat command. (Documentation):

            $ vmstat | jc --vmstat -p          # or jc -p vmstat
            [
              {
                "runnable_procs": 2,
                "uninterruptible_sleeping_procs": 0,
                "virtual_mem_used": 0,
                "free_mem": 2794468,
                "buffer_mem": 2108,
                "cache_mem": 741208,
                "inactive_mem": null,
                "active_mem": null,
                "swap_in": 0,
                "swap_out": 0,
                "blocks_in": 1,
                "blocks_out": 3,
                "interrupts": 29,
                "context_switches": 57,
                "user_time": 0,
                "system_time": 0,
                "idle_time": 99,
                "io_wait_time": 0,
                "stolen_time": 0,
                "timestamp": null,
                "timezone": null
              }
            ]

            vmstat-s streaming command parser

            Linux support for the vmstat command. This is a streaming parser and it outputs JSON Lines. (Documentation):

            $ vmstat | jc --vmstat-s
            {"runnable_procs":2,"uninterruptible_sleeping_procs":...timestamp":null,"timezone":null}

            ls-s streaming command parser

            Linux, macOS, and BSD support for the ls command. This is a streaming parser and it outputs JSON Lines. (Documentation):

            $ ls -l /usr/bin | jc --ls-s
            {"filename":"2to3-","flags":"-rwxr-xr-x","links":4,"owner":"root","group":"wheel","size":925,"date":"Feb 22 2019"}
            {"filename":"2to3-2.7","link_to":"../../System/Library/Frameworks/Python.framework/Versions/2.7/bin/2to3-2.7","flags":"lrwxr-xr-x","links":1,"owner":"root","group":"wheel","size":74,"date":"May 4 2019"}
            {"filename":"AssetCacheLocatorUtil","flags":"-rwxr-xr-x","links":1,"owner":"root","group":"wheel","size":55152,"date":"May 3 2019"}
            ...

            ping-s streaming command parser

            Linux, macOS, and BSD support for the ping and ping6 commands. This is a streaming parser and it outputs JSON Lines. (Documentation):

            $ ping 1.1.1.1 | jc --ping-s
            {"type":"reply","destination_ip":"1.1.1.1","sent_bytes":56,"pattern":null,"response_bytes":64,"response_ip":"1.1.1.1","icmp_seq":0,"ttl":56,"time_ms":23.703}
            {"type":"reply","destination_ip":"1.1.1.1","sent_bytes":56,"pattern":null,"response_bytes":64,"response_ip":"1.1.1.1","icmp_seq":1,"ttl":56,"time_ms":22.862}
            {"type":"reply","destination_ip":"1.1.1.1","sent_bytes":56,"pattern":null,"response_bytes":64,"response_ip":"1.1.1.1","icmp_seq":2,"ttl":56,"time_ms":22.82}
            ...

            Updated Parsers

            • No updated parsers in this release

            Schema Changes

            • No schema changes in this release

            Happy parsing!

            For more information on the motivations for creating jc, see my blog post.

            v1.17.1 Updates

            • Fix file parser for gzip files
            • Fix uname parser for cases where the ‘processor’ and/or ‘hardware_platform’ fields are missing on linux
            • Fix uname parser on FreeBSD
            • Add lsusb parser tested on linux
            • Add CSV file streaming parser
            • Add testing for Python 3.10.0

            lsusb command parser

            Linux support for the lsusb command. (Documentation):

            $ lsusb -v | jc --lsusb -p          # or: jc -p lsusb -v
            [
              {
                "bus": "002",
                "device": "001",
                "id": "1d6b:0001",
                "description": "Linux Foundation 1.1 root hub",
                "device_descriptor": {
                  "bLength": {
                    "value": "18"
                  },
                  "bDescriptorType": {
                    "value": "1"
                  },
                  "bcdUSB": {
                    "value": "1.10"
                  },
                  ...
                  "bNumConfigurations": {
                    "value": "1"
                  },
                  "configuration_descriptor": {
                    "bLength": {
                      "value": "9"
                    },
                    ...
                    "iConfiguration": {
                      "value": "0"
                    },
                    "bmAttributes": {
                      "value": "0xe0",
                      "attributes": [
                        "Self Powered",
                        "Remote Wakeup"
                      ]
                    },
                    "MaxPower": {
                      "description": "0mA"
                    },
                    "interface_descriptors": [
                      {
                        "bLength": {
                          "value": "9"
                        },
                        ...
                        "bInterfaceProtocol": {
                          "value": "0",
                          "description": "Full speed (or root) hub"
                        },
                        "iInterface": {
                          "value": "0"
                        },
                        "endpoint_descriptors": [
                          {
                            "bLength": {
                              "value": "7"
                            },
                            ...
                            "bmAttributes": {
                              "value": "3",
                              "attributes": [
                                "Transfer Type  Interrupt",
                                "Synch Type  None",
                                "Usage Type  Data"
                              ]
                            },
                            "wMaxPacketSize": {
                              "value": "0x0002",
                              "description": "1x 2 bytes"
                            },
                            "bInterval": {
                              "value": "255"
                            }
                          }
                        ]
                      }
                    ]
                  }
                },
                "hub_descriptor": {
                  "bLength": {
                    "value": "9"
                  },
                  ...
                  "wHubCharacteristic": {
                    "value": "0x000a",
                    "attributes": [
                      "No power switching (usb 1.0)",
                      "Per-port overcurrent protection"
                    ]
                  },
                  ...
                  "hub_port_status": {
                    "Port 1": {
                      "value": "0000.0103",
                      "attributes": [
                        "power",
                        "enable",
                        "connect"
                      ]
                    },
                    "Port 2": {
                      "value": "0000.0103",
                      "attributes": [
                        "power",
                        "enable",
                        "connect"
                      ]
                    }
                  }
                },
                "device_status": {
                  "value": "0x0001",
                  "description": "Self Powered"
                }
              }
            ]

            csv-s streaming command parser

            Support for CSV files. This is a streaming parser and it outputs JSON Lines. (Documentation):

            $ cat homes.csv
            "Sell", "List", "Living", "Rooms", "Beds", "Baths", "Age", "Acres", "Taxes"
            142, 160, 28, 10, 5, 3,  60, 0.28,  3167
            175, 180, 18,  8, 4, 1,  12, 0.43,  4033
            129, 132, 13,  6, 3, 1,  41, 0.33,  1471
            ...
            
            $ cat homes.csv | jc --csv-s
            {"Sell":"142","List":"160","Living":"28","Rooms":"10","Beds":"5","Baths":"3","Age":"60","Acres":"0.28","Taxes":"3167"}
            {"Sell":"175","List":"180","Living":"18","Rooms":"8","Beds":"4","Baths":"1","Age":"12","Acres":"0.43","Taxes":"4033"}
            {"Sell":"129","List":"132","Living":"13","Rooms":"6","Beds":"3","Baths":"1","Age":"41","Acres":"0.33","Taxes":"1471"}
            ...

            v1.17.2 Updates

            • Fix ping parser to add Alpine linux support
            • Fix netstat parser for older versions of netstat on linux
            • Fix df parser for cases where the ‘filesystem’ field overflows the column length

            v1.17.3 Updates

            • Update parsers to exit with error if non-string input is detected (raise TypeError)
            • Update streaming parsers to exit with error if non-iterable input is detected (raise TypeError)
            • Simplify quiet-checking in parsers
            • Add iostat parser tested on linux
            • Add iostat streaming parser tested on linux

            iostat command parser

            Linux support for the iostat command. (Documentation):

            $ iostat | jc --iostat          # or: jc -p iostat
            [
              {
                  "percent_user": 0.15,
                  "percent_nice": 0.0,
                  "percent_system": 0.18,
                  "percent_iowait": 0.0,
                  "percent_steal": 0.0,
                  "percent_idle": 99.67,
                  "type": "cpu"
              },
              {
                  "device": "sda",
                  "tps": 0.29,
                  "kb_read_s": 7.22,
                  "kb_wrtn_s": 1.25,
                  "kb_read": 194341,
                  "kb_wrtn": 33590,
                  "type": "device"
              },
              {
                  "device": "dm-0",
                  "tps": 0.29,
                  "kb_read_s": 5.99,
                  "kb_wrtn_s": 1.17,
                  "kb_read": 161361,
                  "kb_wrtn": 31522,
                  "type": "device"
              },
              {
                  "device": "dm-1",
                  "tps": 0.0,
                  "kb_read_s": 0.08,
                  "kb_wrtn_s": 0.0,
                  "kb_read": 2204,
                  "kb_wrtn": 0,
                  "type": "device"
              }
            ]

            iostat-s streaming command parser

            Linux support for the iostat command. This is a streaming parser and it outputs JSON Lines. (Documentation):

            $ iostat | jc --iostat-s
            {"percent_user":0.14,"percent_nice":0.0,"percent_system":0.16,"percent_iowait":0.0,"percent_steal":0.0,"percent_idle":99.7,"type":"cpu"}
            {"device":"sda","tps":0.24,"kb_read_s":5.28,"kb_wrtn_s":1.1,"kb_read":203305,"kb_wrtn":42368,"type":"device"}
            ...

            v1.17.4 Updates

            • Add support for the NO_COLOR environment variable to set mono (http://no-color.org/)
            • Add -C option to force color output even when using pipes (overrides -m and NO_COLOR)
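
            For example (a quick sketch; the id parser is just an arbitrary choice here):

            $ NO_COLOR=1 jc -p id          # monochrome output
            $ jc -C -p id | less -R        # keep colors when piping through a pager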

            v1.17.5 Updates

            • Add zipinfo parser tested on linux and macOS

            zipinfo command parser

            Linux and macOS support for the zipinfo command. (Documentation):

            $ zipinfo file.zip | jc --zipinfo -p
            [
              {
                "archive": "file.zip",
                "size": 4116,
                "size_unit": "bytes",
                "number_entries": 1,
                "number_files": 1,
                "bytes_uncompressed": 11837,
                "bytes_compressed": 3966,
                "percent_compressed": 66.5,
                "files": [
                  {
                    "flags": "-rw-r--r--",
                    "zipversion": "2.1",
                    "zipunder": "unx",
                    "filesize": 11837,
                    "type": "bX",
                    "method": "defN",
                    "date": "21-Dec-08",
                    "time": "20:50",
                    "filename": "compressed_file"
                  }
                ]
              }
            ]

            v1.17.6 Updates

            • Add jar-manifest file parser for MANIFEST.MF files.
            • Fix CSV parsers for some files that include double-quotes

            jar-manifest file parser

            Support for Java JAR Manifest files. (Documentation):

            $ cat MANIFEST.MF | jc --jar-manifest -p
            [
              {
                "Import_Package": "com.conversantmedia.util.concurrent;resolution:=optional,com.fasterxml.jackson.annotation;version=\"[2.12,3)\";resolution:=optional,com.fasterxml.jackson.core;version=\"[2.12,3)\";resolution:=optional,com.fasterxml.jackson.core.type;version=\"[2.12,3)\";resolution:=optional,com.fasterxml.jackson.cor...",
                "Export_Package": "org.apache.logging.log4j.core;uses:=\"org.apache.logging.log4j,org.apache.logging.log4j.core.config,org.apache.logging.log4j.core.impl,org.apache.logging.log4j.core.layout,org.apache.logging.log4j.core.time,org.apache.logging.log4j.message,org.apache.logging.log4j.spi,org.apache.logging.log4j.status...",
                "Manifest_Version": "1.0",
                "Bundle_License": "https://www.apache.org/licenses/LICENSE-2.0.txt",
                "Bundle_SymbolicName": "org.apache.logging.log4j.core",
                "Built_By": "matt",
                "Bnd_LastModified": "1639373735804",
                "Implementation_Vendor_Id": "org.apache.logging.log4j",
                "Specification_Title": "Apache Log4j Core",
                "Log4jReleaseManager": "Matt Sicker",
                ...
              }
            ]

            v1.17.7 Updates

            • Add stat-s streaming parser for the stat command.

            stat-s streaming command parser

            Linux, macOS, and FreeBSD support for the stat command. This is a streaming parser and it outputs JSON Lines. (Documentation):

            $ stat | jc --stat-s
            {"file":"(stdin)","unix_device":1027739696,"inode":1155,"flags":"crw--w----","links":1,"user":"kbrazil","group":"tty","rdev":268435456,"size":0,"access_time":"Jan  4 15:27:44 2022","modify_time":"Jan  4 15:27:44 2022","change_time":"Jan  4 15:27:44 2022","birth_time":"Dec 31 16:00:00 1969","block_size":131072,"blocks":0,"unix_flags":"0","access_time_epoch":1641338864,"access_time_epoch_utc":null,"modify_time_epoch":1641338864,"modify_time_epoch_utc":null,"change_time_epoch":1641338864,"change_time_epoch_utc":null,"birth_time_epoch":null,"birth_time_epoch_utc":null}

            Featured

            Practical JSON at the Command Line (using Jello)

            This is a new version of my existing article: Practical JSON at the Command Line. In this version I have substituted jello where jq was used in the previous article.

            I’m a big fan of using JSON at the command line instead of filtering and piping unstructured text between processes. My article on Bringing the Unix Philosophy to the 21st Century explains many of the benefits of using JSON instead of plain text. I also created jc, which converts the output of dozens of commands and file-types to JSON, opening up many new possibilities for automation at the command line.

            There are many blog posts on how to use tools like jq to filter JSON at the command line. But I would like to write about how you can actually use that JSON to make your life easier in Bash using jello, a JSON filtering tool I wrote that uses pure Python syntax.

            How do you get that beautifully filtered JSON data into a usable form, such as a list or array, in Bash? What are some best practices when working with JSON data in Bash? Let’s start simple and work our way up.

            In this article we will be processing the output of rpm -qia so we can get a nice list of RPM package metadata objects to play around with. We’ll use jc to convert the rpm command output to JSON so we can process it in jello and then use it in our script.

            We’ll look at three scenarios:

            • Assigning a Bash variable from a single JSON attribute
            • Assigning a simple list Bash variable from a JSON array
            • Assigning a Bash array from a JSON array of objects

            Assigning a Variable from a Single Attribute

            The simplest scenario is to pull a single value from the JSON data we are interested in. If we run rpm -qia | jc --rpm-qi we will get a JSON array of rpm metadata objects to work with. I’ll use the -p option in jc to pretty-print the JSON:

            $ rpm -qia | jc --rpm-qi -p
            [
              {
                "name": "make",
                "epoch": 1,
                "version": "3.82",
                "release": "24.el7",
                "architecture": "x86_64",
                "install_date": "Wed 16 Oct 2019 09:21:42 AM PDT",
                "group": "Development/Tools",
                "size": 1160660,
                "license": "GPLv2+",
                "signature": "RSA/SHA256, Thu 22 Aug 2019 02:34:59 PM PDT, Key ID 24c6a8a7f4a80eb5",
                "source_rpm": "make-3.82-24.el7.src.rpm",
                "build_date": "Thu 08 Aug 2019 05:47:25 PM PDT",
                "build_host": "x86-01.bsys.centos.org",
                "relocations": "(not relocatable)",
                "packager": "CentOS BuildSystem <http://bugs.centos.org>",
                "vendor": "CentOS",
                "url": "http://www.gnu.org/software/make/",
                "summary": "A GNU tool which simplifies the build process for users",
                "description": "A GNU tool for controlling the generation of executables and other non-source files of a program from the program's source files. Make allows users to build and install packages without any significant knowledge about the details of the build process. The details about how the program should be built are provided for make in the program's makefile.",
                "build_epoch": 1565311645,
                "build_epoch_utc": null
              },
              {
                "name": "kbd-legacy",
                "version": "1.15.5",
                "release": "15.el7",
                "architecture": "noarch",
                "install_date": "Thu 15 Aug 2019 10:53:08 AM PDT",
                "group": "System Environment/Base",
                "size": 503608,
                "license": "GPLv2+",
                "signature": "RSA/SHA256, Mon 12 Nov 2018 07:17:49 AM PST, Key ID 24c6a8a7f4a80eb5",
                "source_rpm": "kbd-1.15.5-15.el7.src.rpm",
                "build_date": "Tue 30 Oct 2018 03:40:00 PM PDT",
                "build_host": "x86-01.bsys.centos.org",
                "relocations": "(not relocatable)",
                "packager": "CentOS BuildSystem <http://bugs.centos.org>",
                "vendor": "CentOS",
                "url": "http://ftp.altlinux.org/pub/people/legion/kbd",
                "summary": "Legacy data for kbd package",
                "description": "The kbd-legacy package contains original keymaps for kbd package. Please note that kbd-legacy is not helpful without kbd.",
                "build_epoch": 1540939200,
                "build_epoch_utc": null
              },
              ...
            ]

            Ok, that is a long JSON array of objects. Let’s narrow it down to only packages that use the MIT license with jello:

            $ rpm -qia | jc --rpm-qi | jello '[p for p in _ if p.license == "MIT"]'
            [
              {
                "name": "ncurses-base",
                "version": "5.9",
                "release": "14.20130511.el7_4",
                "architecture": "noarch",
                "install_date": "Thu 15 Aug 2019 10:53:08 AM PDT",
                "group": "System Environment/Base",
                "size": 223432,
                "license": "MIT",
                "signature": "RSA/SHA256, Thu 07 Sep 2017 05:43:15 AM PDT, Key ID 24c6a8a7f4a80eb5",
                "source_rpm": "ncurses-5.9-14.20130511.el7_4.src.rpm",
                "build_date": "Wed 06 Sep 2017 03:08:29 PM PDT",
                "build_host": "c1bm.rdu2.centos.org",
                "relocations": "(not relocatable)",
                "packager": "CentOS BuildSystem <http://bugs.centos.org>",
                "vendor": "CentOS",
                "url": "http://invisible-island.net/ncurses/ncurses.html",
                "summary": "Descriptions of common terminals",
                "description": "This package contains descriptions of common terminals. Other terminal descriptions are included in the ncurses-term package.",
                "build_epoch": 1504735709,
                "build_epoch_utc": null
              },
              {
                "name": "ncurses-libs",
                "version": "5.9",
                "release": "14.20130511.el7_4",
                "architecture": "x86_64",
                "install_date": "Thu 15 Aug 2019 10:53:16 AM PDT",
                "group": "System Environment/Libraries",
                "size": 1028216,
                "license": "MIT",
                "signature": "RSA/SHA256, Thu 07 Sep 2017 05:43:31 AM PDT, Key ID 24c6a8a7f4a80eb5",
                "source_rpm": "ncurses-5.9-14.20130511.el7_4.src.rpm",
                "build_date": "Wed 06 Sep 2017 03:08:29 PM PDT",
                "build_host": "c1bm.rdu2.centos.org",
                "relocations": "(not relocatable)",
                "packager": "CentOS BuildSystem <http://bugs.centos.org>",
                "vendor": "CentOS",
                "url": "http://invisible-island.net/ncurses/ncurses.html",
                "summary": "Ncurses libraries",
                "description": "The curses library routines are a terminal-independent method of updating character screens with reasonable optimization.  The ncurses (new curses) library is a freely distributable replacement for the discontinued 4.4 BSD classic curses library. This package contains the ncurses libraries.",
                "build_epoch": 1504735709,
                "build_epoch_utc": null
              },
            ...
            ]

            Tip: You can use jellex to help you rapidly create your jello queries

            Now the list is much smaller. In this form, this is not exactly usable in a Bash script. We’ll need to get this data into a format that Bash can use.

            In this first, simple example, we just want a single attribute from a single object. So let’s do that by filtering on the newest build_epoch date and selecting the name field:

            $ rpm -qia | jc --rpm-qi | jello -r 'sorted([p for p in _ if p.license == "MIT"], key=lambda x: x.build_epoch)[-1]["name"]'
            jc

            Well, isn’t that convenient? jc was the last package built on the system. Notice that we use the -r option in jello to strip the quotation marks from the string result. Since that jello query spit out a single word, it’s pretty straightforward to assign it to a Bash variable:

            $ package_name=$(rpm -qia | jc --rpm-qi | jello -r 'sorted([p for p in _ if p.license == "MIT"], key=lambda x: x.build_epoch)[-1]["name"]')
            $ echo $package_name
            jc
            

            This is a good start if we just need a single attribute, but many times in our scripts we have multiple items we need to deal with. Assigning a single Bash variable to a JSON attribute can get tedious and slow if we need to iterate over a large dataset.

            Now, let’s look at assigning more than one item to a Bash variable to use it as a list in a for loop.

            Assigning a List from a JSON Array

            In our next example, we’ll get a list of MIT-licensed packages from our rpm -qia query and do something with the output. In this case, we’ll just create a text file for each package, using the name attribute as the filename; the contents will have some text, including the package name. First, let’s see the output of the jello filter:

            $ rpm -qia | jc --rpm-qi | jello -lr '[p.name for p in _ if p.license == "MIT"]'
            curl
            dbus-python
            expat
            jansson
            ...

            And now, let’s use that filter in a script by assigning it to a Bash variable that will act as a word list:

            #!/bin/bash
            
            packages=$(rpm -qia | jc --rpm-qi | jello -lr '[p.name for p in _ if p.license == "MIT"]')
            
            for package in $packages; do
                echo "Package name is ${package}" > "${package}".txt
            done
            

            After running this script, we get a list of files named after the package names. Inside of the files is a bit of text:

            $ ls
            create_files.sh  jc.txt                libcom_err.txt   libpciaccess.txt    libyaml.txt       popt.txt
            curl.txt         json-c.txt            libcurl.txt      libss.txt           lua.txt           python-iniparse.txt
            dbus-python.txt  krb5-devel.txt        libdrm.txt       libverto-devel.txt  ncurses-base.txt  python-pytoml.txt
            expat.txt        krb5-libs.txt         libfastjson.txt  libverto.txt        ncurses-libs.txt  PyYAML.txt
            jansson.txt      libcom_err-devel.txt  libkadm5.txt     libxml2.txt         ncurses.txt       rubygem-psych.txt
            $ cat jc.txt 
            Package name is jc
            

            That was easy enough, but remember this only works when each item is a single word (Bash splits the unquoted variable on whitespace) and you just want to iterate over the same JSON attribute over and over again in a Bash for loop.
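
            To see why, here’s a quick illustration of how Bash word-splits an unquoted variable; any value containing spaces gets broken apart, and embedded quotes are not re-parsed:

            $ items='one two "three four"'
            $ for item in $items; do echo "[${item}]"; done
            [one]
            [two]
            ["three]
            [four"]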

            What if I want to include other metadata, like the description, in the text file? One way would be to create another list Bash variable from another jello query and then iterate over the list again. Or, inside the for loop, we could do another rpm -qi query and grab the attribute we want just-in-time:

            #!/bin/bash
            
            packages=$(rpm -qia | jc --rpm-qi | jello -lr '[p.name for p in _ if p.license == "MIT"]')
            
            for package in $packages; do
                description=$(rpm -qi "${package}" | jc --rpm-qi | jello -r _[0].description)
                echo "Package name is ${package}" > "${package}".txt
                echo "The description is:  ${description}" >> "${package}".txt
            done
            

            This works:

            $ ./create_files.sh 
            $ ls
            create_files.sh  jc.txt                libcom_err.txt   libpciaccess.txt    libyaml.txt       popt.txt
            curl.txt         json-c.txt            libcurl.txt      libss.txt           lua.txt           python-iniparse.txt
            dbus-python.txt  krb5-devel.txt        libdrm.txt       libverto-devel.txt  ncurses-base.txt  python-pytoml.txt
            expat.txt        krb5-libs.txt         libfastjson.txt  libverto.txt        ncurses-libs.txt  PyYAML.txt
            jansson.txt      libcom_err-devel.txt  libkadm5.txt     libxml2.txt         ncurses.txt       rubygem-psych.txt
            $ cat jc.txt 
            Package name is jc
            The description is:  This tool serializes the output of popular gnu linux command line tools and file types to structured JSON output

            But it is a little inefficient since we need to run the rpm -qi [package] query many times during the script. A better method would be to do the rpm -qia query one time, which will give us all of the package data at once and then just select the attributes we want in our script. We’ll do that next!

            Assigning a Bash Array from a JSON Array of Objects

            In other programming languages, like Python, it is pretty straightforward to load a JSON string of any depth and complexity and use it as a dictionary or list. Unfortunately, Bash does not have the same native capability, but we can do some useful things by assigning JSON objects to a Bash array.

            At first glance, this seems like it should be pretty easy with a single variable assignment statement, but in fact, we’ll need to use a while loop and read lines from jello so Bash can ingest the JSON Lines data into the Bash array. This way we can easily iterate through the data, similar to how we would in Python.

            In this example, we’ll take the filtered JSON output of the rpm -qia command, iterate over all of the objects (each object is a package) and pull the attributes we want to use in a for loop. This should be a more efficient version of the last script we created since we are only running the rpm -qia command once. First, let’s just iterate and print the raw Bash array elements so we can see what it looks like:

            #!/bin/bash
            
            # pull the rpm package objects into a bash array from jello
            packages=()
            while read -r value; do
                packages+=("$value")
            done < <(rpm -qia | jc --rpm-qi | jello -l '[p for p in _ if p.license == "MIT"]')
            
            # iterate over the bash array
            for package in "${packages[@]}"; do
                echo "${package}"
                echo
            done

            There are a few interesting things going on in this script:

            • A Bash array variable named packages is created with packages=()
            • A while loop reads in all of the JSON objects created by jello into the packages Bash array.
              • Note: mapfile -t packages < <( ... ) can be substituted for the while loop when using Bash 4.0 and higher. (See the one-liner after this list.)
            • The jello command uses the -l option, which prints each JSON object on a single line (a.k.a. JSON Lines). This is the magic that allows each object to be read in as a Bash array element.
            • Then we use a standard for loop to iterate over each package object, which contains all of the attributes we want to extract into variables.
            • Finally, we do something with those variables.
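
            For reference, here is the mapfile one-liner mentioned in the note above; it’s a drop-in replacement for the while loop when Bash 4.0+ is available:

            mapfile -t packages < <(rpm -qia | jc --rpm-qi | jello -l '[p for p in _ if p.license == "MIT"]')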

            When we run this script, we see the following output:

            $ ./print_array.sh 
            {"name":"ncurses-base","version":"5.9","release":"14.20130511.el7_4","architecture":"noarch","install_date":"Thu 15 Aug 2019 10:53:08 AM PDT","group":"System Environment/Base","size":223432,"license":"MIT","signature":"RSA/SHA256, Thu 07 Sep 2017 05:43:15 AM PDT, Key ID 24c6a8a7f4a80eb5","source_rpm":"ncurses-5.9-14.20130511.el7_4.src.rpm","build_date":"Wed 06 Sep 2017 03:08:29 PM PDT","build_host":"c1bm.rdu2.centos.org","relocations":"(not relocatable)","packager":"CentOS BuildSystem <http://bugs.centos.org>","vendor":"CentOS","url":"http://invisible-island.net/ncurses/ncurses.html","summary":"Descriptions of common terminals","description":"This package contains descriptions of common terminals. Other terminal descriptions are included in the ncurses-term package.","build_epoch":1504735709,"build_epoch_utc":null}
            
            {"name":"ncurses-libs","version":"5.9","release":"14.20130511.el7_4","architecture":"x86_64","install_date":"Thu 15 Aug 2019 10:53:16 AM PDT","group":"System Environment/Libraries","size":1028216,"license":"MIT","signature":"RSA/SHA256, Thu 07 Sep 2017 05:43:31 AM PDT, Key ID 24c6a8a7f4a80eb5","source_rpm":"ncurses-5.9-14.20130511.el7_4.src.rpm","build_date":"Wed 06 Sep 2017 03:08:29 PM PDT","build_host":"c1bm.rdu2.centos.org","relocations":"(not relocatable)","packager":"CentOS BuildSystem <http://bugs.centos.org>","vendor":"CentOS","url":"http://invisible-island.net/ncurses/ncurses.html","summary":"Ncurses libraries","description":"The curses library routines are a terminal-independent method of updating character screens with reasonable optimization.  The ncurses (new curses) library is a freely distributable replacement for the discontinued 4.4 BSD classic curses library. This package contains the ncurses libraries.","build_epoch":1504735709,"build_epoch_utc":null}
            ...

            Very cool! Now we can use jello to pull any attribute we want into a variable within the for loop:

            #!/bin/bash
            
            # pull the rpm package objects into a bash array from jello
            packages=()
            while read -r value; do
                packages+=("$value")
            done < <(rpm -qia | jc --rpm-qi | jello -l '[p for p in _ if p.license == "MIT"]')
            
            # iterate over the bash array
            for package in "${packages[@]}"; do
                name=$(jello -r '_.name' <<< "${package}")
                description=$(jello -r '_.description' <<< "${package}")
                version=$(jello -r '_.version' <<< "${package}")
                
                echo "Package name is ${name}" > "${name}".txt
                echo "The description is:  ${description}" >> "${name}".txt
                echo "The version is:  ${version}" >> "${name}".txt
            done

            And here’s what it does:

            $ ./create_files.sh 
            $ ls
            create_files.sh  jc.txt                libcom_err.txt   libpciaccess.txt    libyaml.txt       popt.txt
            curl.txt         json-c.txt            libcurl.txt      libss.txt           lua.txt           python-iniparse.txt
            dbus-python.txt  krb5-devel.txt        libdrm.txt       libverto-devel.txt  ncurses-base.txt  python-pytoml.txt
            expat.txt        krb5-libs.txt         libfastjson.txt  libverto.txt        ncurses-libs.txt  PyYAML.txt
            jansson.txt      libcom_err-devel.txt  libkadm5.txt     libxml2.txt         ncurses.txt       rubygem-psych.txt
            $ cat jc.txt 
            Package name is jc
            The description is:  This tool serializes the output of popular gnu linux command line tools and file types to structured JSON output
            The version is:  1.15.0
            

            As you can see, this is more efficient and allows you to pull in any attribute you would like from each Bash array element. Each element is acting like a JSON object that jello can query.

            Conclusion

            We went through a few scenarios of how to assign JSON data to Bash variables and arrays with jc and jello. Using JSON instead of plain text allows you to be more expressive in your queries. Also, JSON has the advantage of allowing new fields to be added at any time without breaking your existing query.

            JSON can be used by simply assigning a string word to a Bash variable, a string list of words to a variable and looping over the list, or by assigning entire JSON objects to Bash array elements, which can be further queried by jello within a loop. These are powerful ways JSON data can help you write better scripts.

            If you like jello, you should check out Jello Explorer (jellex). Jello Explorer is an interactive TUI JSON filter built on jello that can help you create queries more quickly and easily.

            Featured

            Practical JSON at the Command Line

            Prefer Python syntax over jq? Please see a new version of this article that uses jello instead.

            I’m a big fan of using JSON at the command line instead of filtering and piping unstructured text between processes. My article on Bringing the Unix Philosophy to the 21st Century explains many of the benefits of using JSON instead of plain text. I also created jc, which converts the output of dozens of commands and file-types to JSON, opening up many new possibilities for automation at the command line.

            There are many blog posts on how to use tools like jq to filter JSON at the command line. But I would like to write about how you can actually use that JSON to make your life easier in Bash.

            How do you get that beautifully filtered JSON data into a usable form, such as a list or array, in Bash? What are some best practices when working with JSON data in Bash? Let’s start simple and work our way up.

            In this article we will be processing the output of rpm -qia so we can get a nice list of RPM package metadata objects to play around with. We’ll use jc to convert the rpm command output to JSON so we can process it in jq and then use it in our script.

            We’ll look at three scenarios:

            • Assigning a Bash variable from a single JSON attribute
            • Assigning a simple list Bash variable from a JSON array
            • Assigning a Bash array from a JSON array of objects

            Assigning a Variable from a Single Attribute

            The simplest scenario is to pull a single value from the JSON data we are interested in. If we run rpm -qia | jc --rpm-qi we will get a JSON array of rpm metadata objects to work with. I’ll use the -p option in jc to pretty-print the JSON:

            $ rpm -qia | jc --rpm-qi -p
            [
              {
                "name": "make",
                "epoch": 1,
                "version": "3.82",
                "release": "24.el7",
                "architecture": "x86_64",
                "install_date": "Wed 16 Oct 2019 09:21:42 AM PDT",
                "group": "Development/Tools",
                "size": 1160660,
                "license": "GPLv2+",
                "signature": "RSA/SHA256, Thu 22 Aug 2019 02:34:59 PM PDT, Key ID 24c6a8a7f4a80eb5",
                "source_rpm": "make-3.82-24.el7.src.rpm",
                "build_date": "Thu 08 Aug 2019 05:47:25 PM PDT",
                "build_host": "x86-01.bsys.centos.org",
                "relocations": "(not relocatable)",
                "packager": "CentOS BuildSystem <http://bugs.centos.org>",
                "vendor": "CentOS",
                "url": "http://www.gnu.org/software/make/",
                "summary": "A GNU tool which simplifies the build process for users",
                "description": "A GNU tool for controlling the generation of executables and other non-source files of a program from the program's source files. Make allows users to build and install packages without any significant knowledge about the details of the build process. The details about how the program should be built are provided for make in the program's makefile.",
                "build_epoch": 1565311645,
                "build_epoch_utc": null
              },
              {
                "name": "kbd-legacy",
                "version": "1.15.5",
                "release": "15.el7",
                "architecture": "noarch",
                "install_date": "Thu 15 Aug 2019 10:53:08 AM PDT",
                "group": "System Environment/Base",
                "size": 503608,
                "license": "GPLv2+",
                "signature": "RSA/SHA256, Mon 12 Nov 2018 07:17:49 AM PST, Key ID 24c6a8a7f4a80eb5",
                "source_rpm": "kbd-1.15.5-15.el7.src.rpm",
                "build_date": "Tue 30 Oct 2018 03:40:00 PM PDT",
                "build_host": "x86-01.bsys.centos.org",
                "relocations": "(not relocatable)",
                "packager": "CentOS BuildSystem <http://bugs.centos.org>",
                "vendor": "CentOS",
                "url": "http://ftp.altlinux.org/pub/people/legion/kbd",
                "summary": "Legacy data for kbd package",
                "description": "The kbd-legacy package contains original keymaps for kbd package. Please note that kbd-legacy is not helpful without kbd.",
                "build_epoch": 1540939200,
                "build_epoch_utc": null
              },
              ...
            ]

            Ok, that is a long JSON array of objects. Let’s narrow it down to only packages that use the MIT license with jq:

            $ rpm -qia | jc --rpm-qi | jq '.[] | select(.license == "MIT")'
            {
              "name": "ncurses-base",
              "version": "5.9",
              "release": "14.20130511.el7_4",
              "architecture": "noarch",
              "install_date": "Thu 15 Aug 2019 10:53:08 AM PDT",
              "group": "System Environment/Base",
              "size": 223432,
              "license": "MIT",
              "signature": "RSA/SHA256, Thu 07 Sep 2017 05:43:15 AM PDT, Key ID 24c6a8a7f4a80eb5",
              "source_rpm": "ncurses-5.9-14.20130511.el7_4.src.rpm",
              "build_date": "Wed 06 Sep 2017 03:08:29 PM PDT",
              "build_host": "c1bm.rdu2.centos.org",
              "relocations": "(not relocatable)",
              "packager": "CentOS BuildSystem <http://bugs.centos.org>",
              "vendor": "CentOS",
              "url": "http://invisible-island.net/ncurses/ncurses.html",
              "summary": "Descriptions of common terminals",
              "description": "This package contains descriptions of common terminals. Other terminal descriptions are included in the ncurses-term package.",
              "build_epoch": 1504735709,
              "build_epoch_utc": null
            }
            {
              "name": "ncurses-libs",
              "version": "5.9",
              "release": "14.20130511.el7_4",
              "architecture": "x86_64",
              "install_date": "Thu 15 Aug 2019 10:53:16 AM PDT",
              "group": "System Environment/Libraries",
              "size": 1028216,
              "license": "MIT",
              "signature": "RSA/SHA256, Thu 07 Sep 2017 05:43:31 AM PDT, Key ID 24c6a8a7f4a80eb5",
              "source_rpm": "ncurses-5.9-14.20130511.el7_4.src.rpm",
              "build_date": "Wed 06 Sep 2017 03:08:29 PM PDT",
              "build_host": "c1bm.rdu2.centos.org",
              "relocations": "(not relocatable)",
              "packager": "CentOS BuildSystem <http://bugs.centos.org>",
              "vendor": "CentOS",
              "url": "http://invisible-island.net/ncurses/ncurses.html",
              "summary": "Ncurses libraries",
              "description": "The curses library routines are a terminal-independent method of updating character screens with reasonable optimization.  The ncurses (new curses) library is a freely distributable replacement for the discontinued 4.4 BSD classic curses library. This package contains the ncurses libraries.",
              "build_epoch": 1504735709,
              "build_epoch_utc": null
            }
            ...

            Now the list is much smaller. Also, notice that jq unpacked the JSON objects from the array for us. (There is no longer a set of square brackets around the output.) In this form, this is not exactly usable in a Bash script. In fact, this is no longer even a single valid JSON object, but a series of smaller JSON objects. We’ll need to get this data into a format that Bash can use.

            In this first, simple example, we just want a single attribute from a single object. So let’s do that by filtering on the newest build_epoch date and selecting the name field:

            $ rpm -qia | jc --rpm-qi | jq 'sort_by(.build_epoch)[] | select(.license == "MIT")' | jq -sr '.[-1].name'
            jc

            The particulars of the jq query itself are outside the scope of this article. For more information on how to properly structure a jq query, see here, here, and here.

            Not a fan of jq syntax? Already know how to work with JSON in Python? Try out jello, which works just like jq, but uses Python syntax!

            Well, isn’t that convenient? jc was the last package built on the system. Notice that we use the -r option in jq to strip the quotation marks from the string result. Since that jq query spit out a single word, it’s pretty straightforward to assign it to a Bash variable:

            $ package_name=$(rpm -qia | jc --rpm-qi | jq 'sort_by(.build_epoch)[] | select(.license == "MIT")' | jq -sr '.[-1].name')
            $ echo $package_name
            jc
            

            This is a good start if we just need a single attribute, but many times in our scripts we have multiple items we need to deal with. Assigning a single Bash variable to a JSON attribute can get tedious and slow if we need to iterate over a large dataset.

            Now, let’s look at assigning more than one item to a Bash variable to use it as a list in a for loop.

            Assigning a List from a JSON Array

            In our next example, we’ll get a list of MIT-licensed packages from our rpm -qia query and do something with the output. In this case, we’ll just create a text file for each package, using the name attribute as the filename; the contents will have some text, including the package name. First, let’s see the output of the jq filter:

            $ rpm -qia | jc --rpm-qi | jq -r '.[] | select(.license == "MIT") | .name'
            curl
            dbus-python
            expat
            jansson
            ...

            And now, let’s use that filter in a script by assigning it to a Bash variable that will act as a word list:

            #!/bin/bash
            
            packages=$(rpm -qia | jc --rpm-qi | jq -r '.[] | select(.license == "MIT") | .name')
            
            for package in $packages; do
                echo "Package name is ${package}" > "${package}".txt
            done
            

            After running this script, we get a list of files named after the package names. Inside of the files is a bit of text:

            $ ls
            create_files.sh  jc.txt                libcom_err.txt   libpciaccess.txt    libyaml.txt       popt.txt
            curl.txt         json-c.txt            libcurl.txt      libss.txt           lua.txt           python-iniparse.txt
            dbus-python.txt  krb5-devel.txt        libdrm.txt       libverto-devel.txt  ncurses-base.txt  python-pytoml.txt
            expat.txt        krb5-libs.txt         libfastjson.txt  libverto.txt        ncurses-libs.txt  PyYAML.txt
            jansson.txt      libcom_err-devel.txt  libkadm5.txt     libxml2.txt         ncurses.txt       rubygem-psych.txt
            $ cat jc.txt 
            Package name is jc
            

            That was easy enough, but remember this only works when each item is a single word and you only need one JSON attribute per iteration of a Bash for loop. Values that contain whitespace will be split into separate words, as the sketch below illustrates.
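
            Here is a hypothetical example of the pitfall, using the multi-word summary attribute instead of name:

            #!/bin/bash

            # each summary contains spaces, so the unquoted expansion splits it into words
            summaries=$(rpm -qia | jc --rpm-qi | jq -r '.[] | select(.license == "MIT") | .summary')

            for summary in $summaries; do
                echo "${summary}"    # prints one word per line, not one summary per line
            done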

            What if I want to include other metadata, like the description, in the text file? One way would be to create another Bash list variable from a second jq query and then iterate over that list as well. Or, inside the for loop, we could run another rpm -qi query and grab the attribute we want just-in-time:

            #!/bin/bash
            
            packages=$(rpm -qia | jc --rpm-qi | jq -r '.[] | select(.license == "MIT") | .name')
            
            for package in $packages; do
                description=$(rpm -qi "${package}" | jc --rpm-qi | jq -r '.[0].description')
                echo "Package name is ${package}" > "${package}".txt
                echo "The description is:  ${description}" >> "${package}".txt
            done
            

            This works:

            $ ./create_files.sh 
            $ ls
            create_files.sh  jc.txt                libcom_err.txt   libpciaccess.txt    libyaml.txt       popt.txt
            curl.txt         json-c.txt            libcurl.txt      libss.txt           lua.txt           python-iniparse.txt
            dbus-python.txt  krb5-devel.txt        libdrm.txt       libverto-devel.txt  ncurses-base.txt  python-pytoml.txt
            expat.txt        krb5-libs.txt         libfastjson.txt  libverto.txt        ncurses-libs.txt  PyYAML.txt
            jansson.txt      libcom_err-devel.txt  libkadm5.txt     libxml2.txt         ncurses.txt       rubygem-psych.txt
            $ cat jc.txt 
            Package name is jc
            The description is:  This tool serializes the output of popular gnu linux command line tools and file types to structured JSON output

            But it is a little inefficient since we need to run the rpm -qi [package] query many times during the script. A better method would be to run the rpm -qia query one time, which gives us all of the package data at once, and then just select the attributes we want in our script. We’ll do that next!

            Assigning a Bash Array from a JSON Array of Objects

            In other programming languages, like Python, it is pretty straightforward to load a JSON string of any depth and complexity and use it as a dictionary or list. Unfortunately, Bash has no such native capability, but we can do some useful things by assigning JSON objects to a Bash array.

            At first glance, this seems like it should be easy with a single variable assignment statement, but in fact we’ll need a while loop with the read builtin so Bash can ingest the JSON Lines data from jq into the Bash array. This way we can easily iterate through the data much as we would in Python.

            In this example, we’ll take the filtered JSON output of the rpm -qia command, iterate over all of the objects (each object is a package), and pull out the attributes we want to use in a for loop. This should be more efficient than the last script we created since we are only running the rpm -qia command once. First, let’s just iterate and print the raw Bash array elements so we can see what the data looks like:

            #!/bin/bash
            
            # pull the rpm package objects into a bash array from jq
            packages=()
            while read -r value; do
                packages+=("$value")
            done < <(rpm -qia | jc --rpm-qi | jq -c '.[] | select(.license == "MIT")')
            
            # iterate over the bash array
            for package in "${packages[@]}"; do
                echo "${package}"
                echo
            done

            There are a few interesting things going on in this script:

            • A Bash array variable named packages is created with packages=()
            • A while loop reads in all of the JSON objects created by jq into the packages Bash array.
              • Note: mapfile -t packages < <( ... ) can be substituted for the while loop when using Bash 4.0 and higher. (See the sketch after this list.)
            • The jq command uses the -c option which prints each JSON object on a single line. This is the magic that allows the object to be read in as a Bash array element.
            • Then we use a standard for loop to iterate over each package object, which contains all of the attributes we want to extract into variables.
            • Finally, we do something with those variables.
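
            Here is a minimal sketch of that mapfile variant (Bash 4.0+):

            #!/bin/bash

            # mapfile (a.k.a. readarray) reads each line from jq into a Bash array element
            mapfile -t packages < <(rpm -qia | jc --rpm-qi | jq -c '.[] | select(.license == "MIT")')

            echo "Read ${#packages[@]} package objects"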

            When we run this script, we see the following output:

            $ ./print_array.sh 
            {"name":"ncurses-base","version":"5.9","release":"14.20130511.el7_4","architecture":"noarch","install_date":"Thu 15 Aug 2019 10:53:08 AM PDT","group":"System Environment/Base","size":223432,"license":"MIT","signature":"RSA/SHA256, Thu 07 Sep 2017 05:43:15 AM PDT, Key ID 24c6a8a7f4a80eb5","source_rpm":"ncurses-5.9-14.20130511.el7_4.src.rpm","build_date":"Wed 06 Sep 2017 03:08:29 PM PDT","build_host":"c1bm.rdu2.centos.org","relocations":"(not relocatable)","packager":"CentOS BuildSystem <http://bugs.centos.org>","vendor":"CentOS","url":"http://invisible-island.net/ncurses/ncurses.html","summary":"Descriptions of common terminals","description":"This package contains descriptions of common terminals. Other terminal descriptions are included in the ncurses-term package.","build_epoch":1504735709,"build_epoch_utc":null}
            
            {"name":"ncurses-libs","version":"5.9","release":"14.20130511.el7_4","architecture":"x86_64","install_date":"Thu 15 Aug 2019 10:53:16 AM PDT","group":"System Environment/Libraries","size":1028216,"license":"MIT","signature":"RSA/SHA256, Thu 07 Sep 2017 05:43:31 AM PDT, Key ID 24c6a8a7f4a80eb5","source_rpm":"ncurses-5.9-14.20130511.el7_4.src.rpm","build_date":"Wed 06 Sep 2017 03:08:29 PM PDT","build_host":"c1bm.rdu2.centos.org","relocations":"(not relocatable)","packager":"CentOS BuildSystem <http://bugs.centos.org>","vendor":"CentOS","url":"http://invisible-island.net/ncurses/ncurses.html","summary":"Ncurses libraries","description":"The curses library routines are a terminal-independent method of updating character screens with reasonable optimization.  The ncurses (new curses) library is a freely distributable replacement for the discontinued 4.4 BSD classic curses library. This package contains the ncurses libraries.","build_epoch":1504735709,"build_epoch_utc":null}
            ...

            Very cool! Now we can use jq to pull any attribute we want into a variable within the for loop:

            #!/bin/bash
            
            # pull the rpm package objects into a bash array from jq
            packages=()
            while read -r value; do
                packages+=("$value")
            done < <(rpm -qia | jc --rpm-qi | jq -c '.[] | select(.license == "MIT")')
            
            # iterate over the bash array
            for package in "${packages[@]}"; do
                name=$(jq -r '.name' <<< "${package}")
                description=$(jq -r '.description' <<< "${package}")
                version=$(jq -r '.version' <<< "${package}")
                
                echo "Package name is ${name}" > "${name}".txt
                echo "The description is:  ${description}" >> "${name}".txt
                echo "The version is:  ${version}" >> "${name}".txt
            done

            And here’s what it does:

            $ ./create_files.sh 
            $ ls
            create_files.sh  jc.txt                libcom_err.txt   libpciaccess.txt    libyaml.txt       popt.txt
            curl.txt         json-c.txt            libcurl.txt      libss.txt           lua.txt           python-iniparse.txt
            dbus-python.txt  krb5-devel.txt        libdrm.txt       libverto-devel.txt  ncurses-base.txt  python-pytoml.txt
            expat.txt        krb5-libs.txt         libfastjson.txt  libverto.txt        ncurses-libs.txt  PyYAML.txt
            jansson.txt      libcom_err-devel.txt  libkadm5.txt     libxml2.txt         ncurses.txt       rubygem-psych.txt
            $ cat jc.txt 
            Package name is jc
            The description is:  This tool serializes the output of popular gnu linux command line tools and file types to structured JSON output
            The version is:  1.15.0
            

            As you can see, this is more efficient and allows you to pull in any attribute you would like from each Bash array element. Each element acts like a JSON object that jq can query. (If the three jq calls per iteration bother you, see the single-call sketch below.)
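
            As a final tweak (a sketch, not part of the script above), several fields can be extracted in a single jq invocation with the @tsv filter and split apart with read:

            for package in "${packages[@]}"; do
                IFS=$'\t' read -r name version <<< "$(jq -r '[.name, .version] | @tsv' <<< "${package}")"
                echo "${name} is at version ${version}"
            done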

            Conclusion

            We went through a few scenarios of how to assign JSON data to Bash variables and arrays with jc and jq. Using JSON instead of plain text allows you to be more expressive in your queries. Also, JSON has the advantage of allowing new fields to be added at any time without breaking your existing query.

            We saw how JSON can be used by assigning a single string to a Bash variable, by assigning a list of words to a variable and looping over the list, and by assigning entire JSON objects to Bash array elements, which can be further queried by jq within a loop. These are powerful ways JSON data can help you write better scripts.

            Featured

            JC Version 1.15.0 Released

            Try the jc web demo!

            jc is now available as an MSI install package for Windows.

            I’m excited to announce the release of jc version 1.15.0 available on github and pypi. This is a significant release that includes dozens of new features and parsers.

            jc now supports over 70 commands and file-types, including the new acpi, upower, /usr/bin/time, dpkg -l, rpm -qi, finger, and dir command parsers. Several existing parsers have been updated to include calculated time fields for convenience. These include date, uptime, stat, timedatectl, who, dig, and ls.

            The CLI experience has been enhanced with new -h help and -v version options. External library dependencies are now optional, so jc will work just fine without them, albeit with limited functionality. JSON output is now more compact, so less data is piped between programs, and unencoded unicode characters are now supported in JSON strings.

            jc can be installed via pip or through several official OS package repositories, including Debian, Ubuntu, Fedora, openSUSE, Arch Linux, NixOS Linux, Guix System Linux, FreeBSD, and macOS. For more information on how to get jc, click here.

            To upgrade with pip:

            $ pip3 install --upgrade jc

              New Features

              • -h option displays help and the parser list. jc no longer displays the help text on error. Now that there are so many parsers, the -h option prints to STDOUT so the output can be piped to more or less for paging. (See the examples after this list.)
              • -v option displays the version, github site, and copyright information
              • New calculated epoch timestamp fields have been added to several parsers, including date, stat, timedatectl, who, dig, and ls. These fields are also available in many of the new parsers, including upower, rpm -qi, and dir.

                All timestamps are naive (i.e. based on the timezone of the machine the parser is running on) unless the UTC timezone can be detected within the text of the command output. If the UTC timezone is detected, a timezone-aware timestamp is created. All aware timestamps have the suffix ‘_utc‘. No other timezones are supported for aware timestamps.
              • Several calculated time fields have been added to the uptime parser.
              • All external library dependencies, including pygments, ruamel.yaml, and xmltodict are now optional. If a dependency is missing, jc will still run, but will have limited functionality. For example, if the pygments library is not installed, then all JSON output will be monochrome. If the ruamel.yaml or xmltodict libraries are not installed, then the --yaml or --xml parsers, respectively, will not run.
              • JSON output is more compact, with all spaces between delimiters removed, unless the -p option is used to pretty-print the JSON output. This reduces the amount of data that needs to be piped between programs and can save some disk space if JSON output is being stored to disk.
              • Unencoded unicode characters are now printed in JSON strings. These types of characters include the Copyright ‘©’ symbol and many others.
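
              A couple of quick examples of the CLI behavior described above:

              $ jc -h | less          # page through the help text and full parser list
              $ jc -p date            # pretty-print the output; omit -p for compact, single-line JSON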

              New Parsers

              jc now supports 70 parsers. New parsers include acpi, upower, /usr/bin/time, dpkg -l, rpm -qi, finger, and dir. The dir parser is the first Windows command parser to be included in jc!

              Documentation and schemas for all parsers can be found here.

              acpi command parser

              Linux support for the acpi command. (Documentation):

              $ acpi -V | jc --acpi -p          # or:  jc -p acpi -V
              [
                {
                  "type": "Battery",
                  "id": 0,
                  "state": "Charging",
                  "charge_percent": 71,
                  "until_charged": "00:29:20",
                  "design_capacity_mah": 2110,
                  "last_full_capacity": 2271,
                  "last_full_capacity_percent": 100,
                  "until_charged_hours": 0,
                  "until_charged_minutes": 29,
                  "until_charged_seconds": 20,
                  "until_charged_total_seconds": 1760
                },
                {
                  "type": "Adapter",
                  "id": 0,
                  "on-line": true
                },
                {
                  "type": "Thermal",
                  "id": 0,
                  "mode": "ok",
                  "temperature": 46.0,
                  "temperature_unit": "C",
                  "trip_points": [
                    {
                      "id": 0,
                      "switches_to_mode": "critical",
                      "temperature": 127.0,
                      "temperature_unit": "C"
                    },
                    {
                      "id": 1,
                      "switches_to_mode": "hot",
                      "temperature": 127.0,
                      "temperature_unit": "C"
                    }
                  ]
                },
                {
                  "type": "Cooling",
                  "id": 0,
                  "messages": [
                    "Processor 0 of 10"
                  ]
                },
                {
                  "type": "Cooling",
                  "id": 1,
                  "messages": [
                    "Processor 0 of 10"
                  ]
                },
                {
                  "type": "Cooling",
                  "id": 2,
                  "messages": [
                    "x86_pkg_temp no state information available"
                  ]
                },
                {
                  "type": "Cooling",
                  "id": 3,
                  "messages": [
                    "Processor 0 of 10"
                  ]
                },
                {
                  "type": "Cooling",
                  "id": 4,
                  "messages": [
                    "intel_powerclamp no state information available"
                  ]
                },
                {
                  "type": "Cooling",
                  "id": 5,
                  "messages": [
                    "Processor 0 of 10"
                  ]
                }
              ]

              upower command parser

              Linux support for the upower command. (Documentation):

              $ upower -i /org/freedesktop/UPower/devices/battery | jc --upower -p          # or jc -p upower -i /org/freedesktop/UPower/devices/battery
              [
                {
                  "native_path": "/sys/devices/LNXSYSTM:00/device:00/PNP0C0A:00/power_supply/BAT0",
                  "vendor": "NOTEBOOK",
                  "model": "BAT",
                  "serial": "0001",
                  "power_supply": true,
                  "updated": "Thu 11 Mar 2021 06:28:08 PM UTC",
                  "has_history": true,
                  "has_statistics": true,
                  "detail": {
                    "type": "battery",
                    "present": true,
                    "rechargeable": true,
                    "state": "charging",
                    "energy": 22.3998,
                    "energy_empty": 0.0,
                    "energy_full": 52.6473,
                    "energy_full_design": 62.16,
                    "energy_rate": 31.6905,
                    "voltage": 12.191,
                    "time_to_full": 57.3,
                    "percentage": 42.5469,
                    "capacity": 84.6964,
                    "technology": "lithium-ion",
                    "energy_unit": "Wh",
                    "energy_empty_unit": "Wh",
                    "energy_full_unit": "Wh",
                    "energy_full_design_unit": "Wh",
                    "energy_rate_unit": "W",
                    "voltage_unit": "V",
                    "time_to_full_unit": "minutes"
                  },
                  "history_charge": [
                    {
                      "time": 1328809335,
                      "percent_charged": 42.547,
                      "status": "charging"
                    },
                    {
                      "time": 1328809305,
                      "percent_charged": 42.02,
                      "status": "charging"
                    }
                  ],
                  "history_rate": [
                    {
                      "time": 1328809335,
                      "percent_charged": 31.691,
                      "status": "charging"
                    }
                  ],
                  "updated_seconds_ago": 441975,
                  "updated_epoch": 1615516088,
                  "updated_epoch_utc": 1615487288
                }
              ]

              /usr/bin/time command parser

              Linux, macOS, and BSD support for the /usr/bin/time command. (Documentation):

              $ /usr/bin/time --verbose -o timefile.out sleep 2.5; cat timefile.out | jc --time -p
              {
                "command_being_timed": "sleep 2.5",
                "user_time": 0.0,
                "system_time": 0.0,
                "cpu_percent": 0,
                "elapsed_time": "0:02.50",
                "average_shared_text_size": 0,
                "average_unshared_data_size": 0,
                "average_stack_size": 0,
                "average_total_size": 0,
                "maximum_resident_set_size": 2084,
                "average_resident_set_size": 0,
                "major_pagefaults": 0,
                "minor_pagefaults": 72,
                "voluntary_context_switches": 2,
                "involuntary_context_switches": 1,
                "swaps": 0,
                "block_input_operations": 0,
                "block_output_operations": 0,
                "messages_sent": 0,
                "messages_received": 0,
                "signals_delivered": 0,
                "page_size": 4096,
                "exit_status": 0,
                "elapsed_time_hours": 0,
                "elapsed_time_minutes": 0,
                "elapsed_time_seconds": 2,
                "elapsed_time_centiseconds": 50,
                "elapsed_time_total_seconds": 2.5
              }

              dpkg -l command parser

              Linux support for the dpkg -l command. (Documentation):

              $ dpkg -l | jc --dpkg-l -p          # or:  jc -p dpkg -l
              [
                {
                  "codes": "ii",
                  "name": "accountsservice",
                  "version": "0.6.45-1ubuntu1.3",
                  "architecture": "amd64",
                  "description": "query and manipulate user account information",
                  "desired": "install",
                  "status": "installed"
                },
                {
                  "codes": "rc",
                  "name": "acl",
                  "version": "2.2.52-3build1",
                  "architecture": "amd64",
                  "description": "Access control list utilities",
                  "desired": "remove",
                  "status": "config-files"
                },
                {
                  "codes": "uWR",
                  "name": "acpi",
                  "version": "1.7-1.1",
                  "architecture": "amd64",
                  "description": "displays information on ACPI devices",
                  "desired": "unknown",
                  "status": "trigger await",
                  "error": "reinstall required"
                },
                {
                  "codes": "rh",
                  "name": "acpid",
                  "version": "1:2.0.28-1ubuntu1",
                  "architecture": "amd64",
                  "description": "Advanced Configuration and Power Interface event daemon",
                  "desired": "remove",
                  "status": "half installed"
                },
                {
                  "codes": "pn",
                  "name": "adduser",
                  "version": "3.116ubuntu1",
                  "architecture": "all",
                  "description": "add and remove users and groups",
                  "desired": "purge",
                  "status": "not installed"
                }
              ]

              rpm -qi command parser

              Linux support for the rpm -qi command. (Documentation):

              $ rpm -qia | jc --rpm-qi -p          # or:  jc -p rpm -qia
              [
                {
                  "name": "make",
                  "epoch": 1,
                  "version": "3.82",
                  "release": "24.el7",
                  "architecture": "x86_64",
                  "install_date": "Wed 16 Oct 2019 09:21:42 AM PDT",
                  "group": "Development/Tools",
                  "size": 1160660,
                  "license": "GPLv2+",
                  "signature": "RSA/SHA256, Thu 22 Aug 2019 02:34:59 PM PDT, Key ID 24c6a8a7f4a80eb5",
                  "source_rpm": "make-3.82-24.el7.src.rpm",
                  "build_date": "Thu 08 Aug 2019 05:47:25 PM PDT",
                  "build_host": "x86-01.bsys.centos.org",
                  "relocations": "(not relocatable)",
                  "packager": "CentOS BuildSystem <http://bugs.centos.org>",
                  "vendor": "CentOS",
                  "url": "http://www.gnu.org/software/make/",
                  "summary": "A GNU tool which simplifies the build process for users",
                  "description": "A GNU tool for controlling the generation of executables and other non-source...",
                  "build_epoch": 1565311645,
                  "build_epoch_utc": null
                },
                {
                  "name": "kbd-legacy",
                  "version": "1.15.5",
                  "release": "15.el7",
                  "architecture": "noarch",
                  "install_date": "Thu 15 Aug 2019 10:53:08 AM PDT",
                  "group": "System Environment/Base",
                  "size": 503608,
                  "license": "GPLv2+",
                  "signature": "RSA/SHA256, Mon 12 Nov 2018 07:17:49 AM PST, Key ID 24c6a8a7f4a80eb5",
                  "source_rpm": "kbd-1.15.5-15.el7.src.rpm",
                  "build_date": "Tue 30 Oct 2018 03:40:00 PM PDT",
                  "build_host": "x86-01.bsys.centos.org",
                  "relocations": "(not relocatable)",
                  "packager": "CentOS BuildSystem <http://bugs.centos.org>",
                  "vendor": "CentOS",
                  "url": "http://ftp.altlinux.org/pub/people/legion/kbd",
                  "summary": "Legacy data for kbd package",
                  "description": "The kbd-legacy package contains original keymaps for kbd package. Please note...",
                  "build_epoch": 1540939200,
                  "build_epoch_utc": null
                }
              ]

              finger command parser

              Linux, macOS, and BSD support for the finger command. (Documentation):

              $ finger | jc --finger -p          # or:  jc -p finger
              [
                {
                  "login": "jdoe",
                  "name": "John Doe",
                  "tty": "tty1",
                  "idle": "14d",
                  "login_time": "Mar 22 21:14",
                  "tty_writeable": false,
                  "idle_minutes": 0,
                  "idle_hours": 0,
                  "idle_days": 14,
                  "total_idle_minutes": 20160
                },
                {
                  "login": "jdoe",
                  "name": "John Doe",
                  "tty": "pts/0",
                  "idle": null,
                  "login_time": "Apr  5 15:33",
                  "details": "(192.168.1.22)",
                  "tty_writeable": true,
                  "idle_minutes": 0,
                  "idle_hours": 0,
                  "idle_days": 0,
                  "total_idle_minutes": 0
                }
              ]

              dir command parser

              Windows support for the dir command – written by Rasheed Elsaleh. (Documentation):

              C:> dir | jc --dir -p          # or:  jc -p dir
              [
                {
                  "date": "03/24/2021",
                  "time": "03:15 PM",
                  "dir": true,
                  "size": null,
                  "filename": ".",
                  "parent": "C:\\Program Files\\Internet Explorer",
                  "epoch": 1616624100
                },
                {
                  "date": "03/24/2021",
                  "time": "03:15 PM",
                  "dir": true,
                  "size": null,
                  "filename": "..",
                  "parent": "C:\\Program Files\\Internet Explorer",
                  "epoch": 1616624100
                },
                {
                  "date": "12/07/2019",
                  "time": "02:49 AM",
                  "dir": true,
                  "size": null,
                  "filename": "en-US",
                  "parent": "C:\\Program Files\\Internet Explorer",
                  "epoch": 1575715740
                },
                {
                  "date": "12/07/2019",
                  "time": "02:09 AM",
                  "dir": false,
                  "size": 54784,
                  "filename": "ExtExport.exe",
                  "parent": "C:\\Program Files\\Internet Explorer",
                  "epoch": 1575713340
                }
              ]

              Updated Parsers

              • Several parsers have been updated to include calculated epoch timestamp fields, including: date, stat, timedatectl, who, dig, and ls. See the Schema Changes section for more details.
              • The uptime parser has been enhanced with additional calculated time fields. See the Schema Changes section for more details.

              Schema Changes

              date command parser

              The date command parser has been completely rewritten and enhanced with several new fields, including: epoch, epoch_utc, hour_24, utc_offset, day_of_year, week_of_year, iso, and timezone_aware. The weekday_num field has also been updated to conform to ISO 8601 compliant numbering. (Documentation)

              $ date | jc --date -p          # or:  jc -p date
              {
                "year": 2021,
                "month": "Mar",
                "month_num": 3,
                "day": 25,
                "weekday": "Thu",
                "weekday_num": 4,
                "hour": 2,
                "hour_24": 2,
                "minute": 2,
                "second": 26,
                "period": "AM",
                "timezone": "UTC",
                "utc_offset": "+0000",
                "day_of_year": 84,
                "week_of_year": 12,
                "iso": "2021-03-25T02:02:26+00:00",
                "epoch": 1616662946,
                "epoch_utc": 1616637746,
                "timezone_aware": true
              }

              stat command parser

              The stat parser has been updated to add the following fields: access_time_epoch, access_time_epoch_utc, modify_time_epoch, modify_time_epoch_utc, change_time_epoch, change_time_epoch_utc, birth_time_epoch, birth_time_epoch_utc. (Documentation)

              $ stat /bin/* | jc --stat -p          # or:  jc -p stat /bin/*
              [
                {
                  "file": "/bin/bash",
                  "size": 1113504,
                  "blocks": 2176,
                  "io_blocks": 4096,
                  "type": "regular file",
                  "device": "802h/2050d",
                  "inode": 131099,
                  "links": 1,
                  "access": "0755",
                  "flags": "-rwxr-xr-x",
                  "uid": 0,
                  "user": "root",
                  "gid": 0,
                  "group": "root",
                  "access_time": "2019-11-14 08:18:03.509681766 +0000",
                  "modify_time": "2019-06-06 22:28:15.000000000 +0000",
                  "change_time": "2019-08-12 17:21:29.521945390 +0000",
                  "birth_time": null,
                  "access_time_epoch": 1573748283,
                  "access_time_epoch_utc": 1573719483,
                  "modify_time_epoch": 1559885295,
                  "modify_time_epoch_utc": 1559860095,
                  "change_time_epoch": 1565655689,
                  "change_time_epoch_utc": 1565630489,
                  "birth_time_epoch": null,
                  "birth_time_epoch_utc": null
                },
                {
                  "file": "/bin/btrfs",
                  "size": 716464,
                  "blocks": 1400,
                  "io_blocks": 4096,
                  "type": "regular file",
                  "device": "802h/2050d",
                  "inode": 131100,
                  "links": 1,
                  "access": "0755",
                  "flags": "-rwxr-xr-x",
                  "uid": 0,
                  "user": "root",
                  "gid": 0,
                  "group": "root",
                  "access_time": "2019-11-14 08:18:28.990834276 +0000",
                  "modify_time": "2018-03-12 23:04:27.000000000 +0000",
                  "change_time": "2019-08-12 17:21:29.545944399 +0000",
                  "birth_time": null,
                  "access_time_epoch": 1573748308,
                  "access_time_epoch_utc": 1573719508,
                  "modify_time_epoch": 1520921067,
                  "modify_time_epoch_utc": 1520895867,
                  "change_time_epoch": 1565655689,
                  "change_time_epoch_utc": 1565630489,
                  "birth_time_epoch": null,
                  "birth_time_epoch_utc": null
                }
              ]

              timedatectl command parser

              The epoch_utc field has been added to the timedatectl command parser. (Documentation)

              $ timedatectl | jc --timedatectl -p          # or:  jc -p timedatectl
              {
                "local_time": "Tue 2020-03-10 17:53:21 PDT",
                "universal_time": "Wed 2020-03-11 00:53:21 UTC",
                "rtc_time": "Wed 2020-03-11 00:53:21",
                "time_zone": "America/Los_Angeles (PDT, -0700)",
                "ntp_enabled": true,
                "ntp_synchronized": true,
                "rtc_in_local_tz": false,
                "dst_active": true,
                "epoch_utc": 1583888001
              }

              who command parser

              The epoch field has been added to the who command parser. (Documentation)

              $ who | jc --who -p          # or:  jc -p who
              [
                {
                  "user": "joeuser",
                  "tty": "ttyS0",
                  "time": "2020-03-02 02:52",
                  "epoch": 1583146320
                },
                {
                  "user": "joeuser",
                  "tty": "pts/0",
                  "time": "2020-03-02 05:15",
                  "from": "192.168.71.1",
                  "epoch": 1583154900
                }
              ]

              dig command parser

              The when_epoch and when_epoch_utc fields have been added to the dig command parser. (Documentation)

              $ dig cnn.com www.cnn.com @205.251.194.64 | jc --dig -p          # or:  jc -p dig cnn.com www.cnn.com @205.251.194.64
              [
                {
                  "id": 52172,
                  "opcode": "QUERY",
                  "status": "NOERROR",
                  "flags": [
                    "qr",
                    "rd",
                    "ra"
                  ],
                  "query_num": 1,
                  "answer_num": 4,
                  "authority_num": 0,
                  "additional_num": 1,
                  "question": {
                    "name": "cnn.com.",
                    "class": "IN",
                    "type": "A"
                  },
                  "answer": [
                    {
                      "name": "cnn.com.",
                      "class": "IN",
                      "type": "A",
                      "ttl": 27,
                      "data": "151.101.65.67"
                    },
                    {
                      "name": "cnn.com.",
                      "class": "IN",
                      "type": "A",
                      "ttl": 27,
                      "data": "151.101.129.67"
                    },
                    {
                      "name": "cnn.com.",
                      "class": "IN",
                      "type": "A",
                      "ttl": 27,
                      "data": "151.101.1.67"
                    },
                    {
                      "name": "cnn.com.",
                      "class": "IN",
                      "type": "A",
                      "ttl": 27,
                      "data": "151.101.193.67"
                    }
                  ],
                  "query_time": 38,
                  "server": "2600",
                  "when": "Tue Mar 30 20:07:59 PDT 2021",
                  "rcvd": 100,
                  "when_epoch": 1617160079,
                  "when_epoch_utc": null
                },
                {
                  "id": 36292,
                  "opcode": "QUERY",
                  "status": "NOERROR",
                  "flags": [
                    "qr",
                    "aa",
                    "rd"
                  ],
                  "query_num": 1,
                  "answer_num": 1,
                  "authority_num": 4,
                  "additional_num": 1,
                  "question": {
                    "name": "www.cnn.com.",
                    "class": "IN",
                    "type": "A"
                  },
                  "answer": [
                    {
                      "name": "www.cnn.com.",
                      "class": "IN",
                      "type": "CNAME",
                      "ttl": 300,
                      "data": "turner-tls.map.fastly.net."
                    }
                  ],
                  "authority": [
                    {
                      "name": "cnn.com.",
                      "class": "IN",
                      "type": "NS",
                      "ttl": 3600,
                      "data": "ns-1086.awsdns-07.org."
                    },
                    {
                      "name": "cnn.com.",
                      "class": "IN",
                      "type": "NS",
                      "ttl": 3600,
                      "data": "ns-1630.awsdns-11.co.uk."
                    },
                    {
                      "name": "cnn.com.",
                      "class": "IN",
                      "type": "NS",
                      "ttl": 3600,
                      "data": "ns-47.awsdns-05.com."
                    },
                    {
                      "name": "cnn.com.",
                      "class": "IN",
                      "type": "NS",
                      "ttl": 3600,
                      "data": "ns-576.awsdns-08.net."
                    }
                  ],
                  "query_time": 27,
                  "server": "205.251.194.64#53(205.251.194.64)",
                  "when": "Tue Mar 30 20:07:59 PDT 2021",
                  "rcvd": 212,
                  "when_epoch": 1617160079,
                  "when_epoch_utc": null
                }
              ]

              ls command parser

              The epoch and epoch_utc fields have been added to the ls command parser. Note that these fields are only available if the --full-time or -l --time-style=full-iso options are used when running ls. (Documentation)

              $ ls --full-time /usr/bin | jc --ls -p          # or:  jc -p ls --full-time /usr/bin
              [
                {
                  "filename": "acpi",
                  "flags": "-rwxr-xr-x",
                  "links": 1,
                  "owner": "root",
                  "group": "root",
                  "size": 23656,
                  "date": "2018-01-14 19:20:21.000000000 -0800",
                  "epoch": 1515986421,
                  "epoch_utc": null
                },
                {
                  "filename": "acpi_listen",
                  "flags": "-rwxr-xr-x",
                  "links": 1,
                  "owner": "root",
                  "group": "root",
                  "size": 14608,
                  "date": "2017-04-27 21:28:10.000000000 -0700",
                  "epoch": 1493353690,
                  "epoch_utc": null
                }
              ]

              uptime command parser

              Several calculated time fields have been added to the uptime command parser, including: uptime_days, uptime_hours, uptime_minutes, uptime_total_seconds, time_hour, time_minute, and time_second. (Documentation)

              $ uptime | jc --uptime -p          # or:  jc -p uptime
              {
                "time": "11:35",
                "uptime": "3 days, 4:03",
                "users": 5,
                "load_1m": 1.88,
                "load_5m": 2.0,
                "load_15m": 1.94,
                "time_hour": 11,
                "time_minute": 35,
                "time_second": null,
                "uptime_days": 3,
                "uptime_hours": 4,
                "uptime_minutes": 3,
                "uptime_total_seconds": 273780
              }

              Full Parser List

              • acpi
              • airport -I
              • airport -s
              • arp
              • blkid
              • cksum
              • crontab
              • crontab (with user info)
              • csv
              • date
              • df
              • dig
              • dir
              • dmidecode
              • dpkg -l
              • du
              • env
              • file
              • finger
              • free
              • fstab
              • group
              • gshadow
              • hash
              • hashsum (various hash sum programs: md5, md5sum, shasum, etc.)
              • hciconfig
              • history
              • hosts
              • id
              • ifconfig
              • ini
              • iptables
              • iw_scan
              • jobs
              • kv
              • last
              • ls
              • lsblk
              • lsmod
              • lsof
              • mount
              • netstat
              • ntpq
              • passwd
              • ping
              • pip list
              • pip show
              • ps
              • route
              • rpm -qi
              • shadow
              • ss
              • stat
              • sysctl
              • systemctl
              • systemctl list-jobs
              • systemctl list-sockets
              • systemctl list-unit-files
              • time (/usr/bin/time)
              • timedatectl
              • tracepath
              • traceroute
              • uname -a
              • upower
              • uptime
              • w
              • wc
              • who
              • xml
              • yaml

              Version 1.15.1 Updates

              • New feature to show parser documentation interactively with -h --parser_name. For example: $ jc -h --arp
              • Man page added to pypi package for easier packaging in homebrew
              • Update rpm-qi parser to add two calculated timestamp fields: install_date_epoch and install_date_epoch_utc
              • Clean up documentation and autogenerate the Parser Information section from metadata

              Schema Changes

              The rpm-qi parser has been updated to add two calculated timestamp fields: install_date_epoch (naive) and install_date_epoch_utc (timezone-aware).

              $ rpm -qia | jc --rpm-qi -p
                  [
                    {
                      "name": "make",
                      "epoch": 1,
                      "version": "3.82",
                      "release": "24.el7",
                      "architecture": "x86_64",
                      "install_date": "Wed 16 Oct 2019 09:21:42 AM PDT",
                      "group": "Development/Tools",
                      "size": 1160660,
                      "license": "GPLv2+",
                      "signature": "RSA/SHA256, Thu 22 Aug 2019 02:34:59 PM PDT, Key ID 24c6a8a7f4a80eb5",
                      "source_rpm": "make-3.82-24.el7.src.rpm",
                      "build_date": "Thu 08 Aug 2019 05:47:25 PM PDT",
                      "build_host": "x86-01.bsys.centos.org",
                      "relocations": "(not relocatable)",
                      "packager": "CentOS BuildSystem <http://bugs.centos.org>",
                      "vendor": "CentOS",
                      "url": "http://www.gnu.org/software/make/",
                      "summary": "A GNU tool which simplifies the build process for users",
                      "description": "A GNU tool for controlling the generation of executables and other...",
                      "build_epoch": 1565311645,
                      "build_epoch_utc": null,
                      "install_date_epoch": 1571242902,
                      "install_date_epoch_utc": null
                    }
                  ]

              Version 1.15.2 Updates

              • Add systeminfo parser tested on Windows
              • Update dig parser to fix an issue with IPv6 addresses in the server field
              • Update dig parser to fix an issue when axfr entries contain a semicolon
              • Update dig parser to add support for “Additional Section” and “Opt Pseudosection”
              • Update dig parser to add query_size field
              • Use dig parser as the main example in readme, documentation, and man page
              • Standardize int, float, and boolean conversion rules with functions in jc.utils

              New Parsers

              systeminfo command parser (Windows)

              Windows support for the systeminfo command – written by Jon Smith. (Documentation):

              $ systeminfo | jc --systeminfo -p
                  {
                    "host_name": "TESTLAPTOP",
                    "os_name": "Microsoft Windows 10 Enterprise",
                    "os_version": "10.0.17134 N/A Build 17134",
                    "os_manufacturer": "Microsoft Corporation",
                    "os_configuration": "Member Workstation",
                    "os_build_type": "Multiprocessor Free",
                    "registered_owner": "Test, Inc.",
                    "registered_organization": "Test, Inc.",
                    "product_id": "11111-11111-11111-AA111",
                    "original_install_date": "3/26/2019, 3:51:30 PM",
                    "system_boot_time": "3/30/2021, 6:13:59 AM",
                    "system_manufacturer": "Dell Inc.",
                    "system_model": "Precision 5530",
                    "system_type": "x64-based PC",
                    "processors": [
                      "Intel64 Family 6 Model 158 Stepping 10 GenuineIntel ~2592 Mhz"
                    ],
                    "bios_version": "Dell Inc. 1.16.2, 4/21/2020",
                    "windows_directory": "C:\\WINDOWS",
                    "system_directory": "C:\\WINDOWS\\system32",
                    "boot_device": "\\Device\\HarddiskVolume2",
                    "system_locale": "en-us;English (United States)",
                    "input_locale": "en-us;English (United States)",
                    "time_zone": "(UTC+00:00) UTC",
                    "total_physical_memory_mb": 32503,
                    "available_physical_memory_mb": 19743,
                    "virtual_memory_max_size_mb": 37367,
                    "virtual_memory_available_mb": 22266,
                    "virtual_memory_in_use_mb": 15101,
                    "page_file_locations": "C:\\pagefile.sys",
                    "domain": "test.com",
                    "logon_server": "\\\\TESTDC01",
                    "hotfixs": [
                      "KB2693643",
                      "KB4601054"
                    ],
                    "network_cards": [
                      {
                        "name": "Intel(R) Wireless-AC 9260 160MHz",
                        "connection_name": "Wi-Fi",
                        "status": null,
                        "dhcp_enabled": true,
                        "dhcp_server": "192.168.2.1",
                        "ip_addresses": [
                          "192.168.2.219"
                        ]
                      }
                    ],
                    "hyperv_requirements": {
                      "vm_monitor_mode_extensions": true,
                      "virtualization_enabled_in_firmware": true,
                      "second_level_address_translation": false,
                      "data_execution_prevention_available": true
                    },
                    "original_install_date_epoch": 1553640690,
                    "original_install_date_epoch_utc": 1553615490,
                    "system_boot_time_epoch": 1617110039,
                    "system_boot_time_epoch_utc": 1617084839
                  }

              Schema Changes

              dig Command Parser

              Support for the Opt Pseudosection and Additional Section has been added. The query_size field has also been added.

              $ dig example.com | jc --dig -p
                  [
                    {
                      "id": 2951,
                      "opcode": "QUERY",
                      "status": "NOERROR",
                      "flags": [
                        "qr",
                        "rd",
                        "ra"
                      ],
                      "query_num": 1,
                      "answer_num": 1,
                      "authority_num": 0,
                      "additional_num": 3,
                      "opt_pseudosection": {
                        "edns": {
                          "version": 0,
                          "flags": [],
                          "udp": 4096
                        }
                      },
                      "question": {
                        "name": "example.com.",
                        "class": "IN",
                        "type": "A"
                      },
                      "answer": [
                        {
                          "name": "example.com.",
                          "class": "IN",
                          "type": "A",
                          "ttl": 39302,
                          "data": "93.184.216.34"
                        }
                      ],
                      "additional": [
                        {
                          "name": "pdns196.ultradns.com.",
                          "class": "IN",
                          "type": "A",
                          "ttl": 172800,
                          "data": "156.154.64.196"
                        },
                        {
                          "name": "pdns196.ultradns.com.",
                          "class": "IN",
                          "type": "AAAA",
                          "ttl": 172800,
                          "data": "2001:502:f3ff::e8"
                        }
                      ],
                      "query_size": 57,
                      "query_time": 49,
                      "server": "2600:1700:bab0:d40::1#53(2600:1700:bab0:d40::1)",
                      "when": "Fri Apr 16 16:05:10 PDT 2021",
                      "rcvd": 56,
                      "when_epoch": 1618614310,
                      "when_epoch_utc": null
                    }
                  ]
              
              

              Version 1.15.3 Updates

              • Add ufw status command parser tested on Linux
              • Add ufw-appinfo command parser tested on Linux
              • Fix deb package name to conform to standard
              • Add Caveats section to readme and manpage

              New Parsers

              ufw command parser

              Linux support for the ufw status command. (Documentation):

              # ufw status verbose | jc --ufw -p          # or:  jc -p ufw status verbose
              {
                "status": "active",
                "logging": "on",
                "logging_level": "low",
                "default": "deny (incoming), allow (outgoing), disabled (routed)",
                "new_profiles": "skip",
                "rules": [
                  {
                    "action": "ALLOW",
                    "action_direction": "IN",
                    "index": null,
                    "network_protocol": "ipv4",
                    "to_interface": "any",
                    "to_transport": "any",
                    "to_service": null,
                    "to_ports": [
                      22
                    ],
                    "to_ip": "0.0.0.0",
                    "to_ip_prefix": 0,
                    "comment": null,
                    "from_ip": "0.0.0.0",
                    "from_ip_prefix": 0,
                    "from_interface": "any",
                    "from_transport": "any",
                    "from_port_ranges": [
                      {
                        "start": 0,
                        "end": 65535
                      }
                    ],
                    "from_service": null
                  },
                  {
                    "action": "ALLOW",
                    "action_direction": "IN",
                    "index": null,
                    "network_protocol": "ipv4",
                    "to_interface": "any",
                    "to_transport": "tcp",
                    "to_service": null,
                    "to_ports": [
                      80,
                      443
                    ],
                    "to_ip": "0.0.0.0",
                    "to_ip_prefix": 0,
                    "comment": null,
                    "from_ip": "0.0.0.0",
                    "from_ip_prefix": 0,
                    "from_interface": "any",
                    "from_transport": "any",
                    "from_port_ranges": [
                      {
                        "start": 0,
                        "end": 65535
                      }
                    ],
                    "from_service": null
                  }
                ]
              }

              ufw-appinfo command parser

              Linux support for the ufw app info [application] and ufw app info all commands. (Documentation):

              # ufw app info MSN | jc --ufw-appinfo -p          # or:  jc -p ufw app info MSN
              [
                {
                  "profile": "MSN",
                  "title": "MSN Chat",
                  "description": "MSN chat protocol (with file transfer and voice)",
                  "tcp_list": [
                    1863,
                    6901
                  ],
                  "udp_list": [
                    1863,
                    6901
                  ],
                  "tcp_ranges": [
                    {
                      "start": 6891,
                      "end": 6900
                    }
                  ],
                  "normalized_tcp_list": [
                    1863,
                    6901
                  ],
                  "normalized_tcp_ranges": [
                    {
                      "start": 6891,
                      "end": 6900
                    }
                  ],
                  "normalized_udp_list": [
                    1863,
                    6901
                  ]
                }
              ]

              Version 1.15.4 Updates

              • Update ping parser to support error responses in OSX and BSD
              • Update ping parser to be more resilient against parsing errors for unknown error types
              • Update dig parser to support +noall +answer use case
              • Update dig parser compatibility to all platforms
              • Fix colors in Windows terminals (cmd.exe and PowerShell)
              • Fix epoch calculations when UTC is referenced as “Coordinated Universal Time”
              • Add Windows time format for systeminfo output
              • Add exceptions module to standardize parser exceptions
              • jc no longer swallows exit codes when using the “Magic” syntax. See the Exit Codes section of the README and man page for details

              Version 1.15.5 Updates

              • Fix issue where help and about information would not display if a 3rd party parser library was missing. (e.g. xmltodict)
              • Add more error message detail when encountering ParseError and LibraryNotFound exceptions

              Happy parsing!

              For more information on the motivations for creating jc, see my blog post.

              Featured

              JC Version 1.14.0 Released

              Try the jc web demo!

              Happy New Year! I’m happy to announce the release of jc version 1.14.0 available on github and pypi.

              jc now supports over 60 commands and file-types, including the new hash, hashsum (md5, md5sum, shasum, sha1sum, sha224sum, sha256sum, sha384sum, sha512sum), cksum, and wc command parsers. The ls parser has been enhanced to work with vdir output and the env parser has been enhanced to work with printenv output. jc is now fully tested on Python 3.9.
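
              For example, the enhanced parsers can now be fed those alternate commands directly (a quick sketch):

              $ printenv | jc --env -p          # the env parser now understands printenv output
              $ vdir | jc --ls -p               # the ls parser now understands vdir output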

              jc can be installed via pip or through several official OS package repositories, including Debian, Ubuntu, Fedora, openSUSE, Arch Linux, NixOS Linux, Guix System Linux, FreeBSD, and macOS. For more information on how to get jc, click here.

              To upgrade with pip:

              $ pip3 install --upgrade jc

              New Features

              • jc is now available on the official Debian and Ubuntu repository (apt-get install jc)
              • Tested on Python 3.9

              New Parsers

              jc now supports 61 parsers. New parsers include kv, date, hash, hashsum, cksum, and wc.

              Documentation and schemas for all parsers can be found here.

              kv key/value pair parser (added in v1.13.2)

              Parses key/value pair files. Files can include comments prepended with # or ; and keys and values can be delimited by = or : with or without spaces. Quotation marks are stripped from quoted values, though they can be kept with the -r (raw output) jc argument.

              These types of files can be found in many places, including configuration files in /etc. (e.g. /etc/sysconfig/network-scripts).

              $ cat keyvalue.txt
              # this file contains key/value pairs
              name = John Doe
              address=555 California Drive
              age: 34
              ; comments can include # or ;
              # delimiter can be = or :
              # quoted values have quotation marks stripped by default
              # but can be preserved with the -r argument
              occupation:"Engineer"
              
              $ cat keyvalue.txt | jc --kv -p
              {
                "name": "John Doe",
                "address": "555 California Drive",
                "age": "34",
                "occupation": "Engineer"
              }

              date command parser (added in v1.13.2)

              Linux, macOS, and FreeBSD support for the date command:

              $ date | jc --date -p          # or:  jc -p date
              {
                "year": 2020,
                "month_num": 7,
                "day": 31,
                "hour": 16,
                "minute": 48,
                "second": 11,
                "month": "Jul",
                "weekday": "Fri",
                "weekday_num": 6,
                "timezone": "PDT"
              }

              hash command parser

Linux, macOS, and FreeBSD support for the hash Bash shell builtin:

              $ hash | jc --hash -p
              [
                {
                  "hits": 2,
                  "command": "/bin/cat"
                },
                {
                  "hits": 1,
                  "command": "/bin/ls"
                }
              ]

              hashsum command parser

              Linux, macOS, and FreeBSD support for various MD5 and SHA hash commands, including md5, md5sum, shasum, sha1sum, sha224sum, sha256sum, sha384sum, sha512sum:

              $ md5sum * | jc --hashsum -p          # or jc -p md5sum *
              [
                {
                  "filename": "devtoolset-3-gcc-4.9.2-6.el7.x86_64.rpm",
                  "hash": "65fc958c1add637ec23c4b137aecf3d3"   
                },
                {
                  "filename": "digout",
                  "hash": "5b9312ee5aff080927753c63a347707d"
                },
                {
                  "filename": "dmidecode.out",
                  "hash": "716fd11c2ac00db109281f7110b8fb9d"
                },
                {
                  "filename": "file with spaces in the name",
                  "hash": "d41d8cd98f00b204e9800998ecf8427e"
                },
                {
                  "filename": "id-centos.out",
                  "hash": "4295be239a14ad77ef3253103de976d2"
                },
                {
                  "filename": "ifcfg.json",
                  "hash": "01fda0d9ba9a75618b072e64ff512b43"
                }
              ]

              cksum command parser

              Linux, macOS, and FreeBSD support for the cksum and sum commands:

              $ cksum * | jc --cksum -p          # or jc -p cksum *
              [
                {
                  "filename": "__init__.py",
                  "checksum": 4294967295,
                  "blocks": 0
                },
                {
                  "filename": "airport.py",
                  "checksum": 2208551092,
                  "blocks": 3745
                },
                {
                  "filename": "airport_s.py",
                  "checksum": 1113817598,
                  "blocks": 4572
                }
              ]

              wc command parser

              Linux, macOS, and FreeBSD support for the wc command:

              $ wc * | jc --wc -p          # or jc -p wc *
              [
                {
                  "filename": "airport-I.json",
                  "lines": 1,
                  "words": 30,
                  "characters": 307
                },
                {
                  "filename": "airport-I.out",
                  "lines": 15,
                  "words": 33,
                  "characters": 348
                },
                {
                  "filename": "airport-s.json",
                  "lines": 1,
                  "words": 202,
                  "characters": 2152
                }
              ]

              Updated Parsers

              The env parser has been enhanced to work with printenv command output using the “magic” syntax. (e.g. jc printenv)

              The ls parser has been enhanced to work with vdir command output using the “magic” syntax. (e.g. jc vdir)

              Schema Changes

              There are no schema changes in this release.

              Full Parser List

              • airport -I
              • airport -s
              • arp
              • blkid
              • cksum
              • crontab
              • crontab-u
              • CSV
              • date
              • df
              • dig
              • dmidecode
              • du
              • env
              • file
              • free
              • fstab
              • /etc/group
              • /etc/gshadow
              • hash
              • hashsum
              • history
              • /etc/hosts
              • id
              • ifconfig
              • INI
              • iptables
              • jobs
              • kv
              • last and lastb
              • ls
              • lsblk
              • lsmod
              • lsof
              • mount
              • netstat
              • ntpq
              • /etc/passwd
              • ping
              • pip list
              • pip show
              • ps
              • route
              • /etc/shadow
              • ss
              • stat
              • sysctl
              • systemctl
              • systemctl list-jobs
              • systemctl list-sockets
              • systemctl list-unit-files
              • timedatectl
              • tracepath
              • traceroute
              • uname -a
              • uptime
              • w
              • wc
              • who
              • XML
              • YAML

              For more information on the motivations for creating jc, see my blog post.

              Happy parsing!

              v1.14.1 Release Changes

              • Add iw-scan parser tested on linux (beta)
              • Update date parser for Ubuntu 20.04 support
              • Update last parser for last -F support
              • Update last parser to add convenience fields and augment data for easier parsing
              • Update man page
              • Minor documentation updates

Schema Changes

              date command parser

A new period field has been added to the schema to represent AM or PM, which may appear depending on the locale configuration of the host. If the locale does not print AM or PM, then the value will be null.

              {
                "year":         integer,
                "month_num":    integer,
                "day":          integer,
                "hour":         integer,
                "minute":       integer,
                "second":       integer,
                "period":       string,
                "month":        string,
                "weekday":      string,
                "weekday_num":  integer,
                "timezone":     string
               }
              

              last command parser

The duration field calculation has changed to be more easily parsed and will display as total HOURS:MINUTES. Also, a few convenience calculated fields have been added and will display when the last -F option is used: login_epoch, logout_epoch, and duration_seconds.

              [
                {
                  "user":             string,
                  "tty":              string,
                  "hostname":         string,
                  "login":            string,
                  "logout":           string,
                  "duration":         string,
                  "login_epoch":      integer,   # available with last -F option
                  "logout_epoch":     integer,   # available with last -F option
                  "duration_seconds": integer    # available with last -F option
                }
              ]

Featured

              Parsing Command Output in Nornir with JC

              In my last couple of posts we learned how to parse linux command output in Ansible and Saltstack using jc. In this post we’ll do something similar with Nornir.

              Nornir is a popular automation framework that allows you to use native python to control hosts and network devices. Many times it would be nice to be able to parse the output of remotely-run commands and use that information elsewhere in your scripts. jc allows you to do this automatically – no regex/looping/slicing/etc. required to get to the data you want!

              Since jc is both a command line tool and a python library, it is easy to use inside a Nornir script to automate the boring work of command output parsing.

              For more information on the motivations for creating jc, see my blog post.

              Installation

To use jc in a Nornir script, simply install it and import the library.

              Installing jc:

              $ pip3 install jc

              Import the jc library:

              import jc

Now we are ready to use jc in our Nornir script!

              Syntax

              To use the jc parser, call the parse function with the parser name and command output arguments. For example, to automatically parse a uname -a output string:

              uname_obj = jc.parse('uname', uname_command_output_string)

              Now you can use whatever uname field you would like in the rest of your code:

              print(uname_obj['node_name'])

              A Simple Example

Below we have a small Nornir script using Netmiko to call a few commands (uname, date, ifconfig, and uptime) on a linux host. I used the nornir-netmiko package to simplify the connection to the linux host:

              from nornir import InitNornir
              from nornir_netmiko.tasks import netmiko_send_command
              import jc
              
              nr = InitNornir(config_file='config.yaml')
              
              def run_commands(task, command_list):
                  for cmd in command_list:
                      task.run(
                          task=netmiko_send_command,
                          command_string=cmd,
                          name=cmd
                      )
              
              commands = ['uname -a', 'date', 'ifconfig', 'uptime']
              
              result = nr.run(
                  task=run_commands,
                  command_list=commands
              )
              
              uname_result_string = result['host1'][1].result
              uname_result_obj = jc.parse('uname', uname_result_string)
              hostname = uname_result_obj['node_name']
              kernel_version = uname_result_obj['kernel_version']
              
              date_result_string = result['host1'][2].result
              date_result_obj = jc.parse('date', date_result_string)
              timezone = date_result_obj['timezone']
              
              ifconfig_result_string = result['host1'][3].result
              ifconfig_result_obj = jc.parse('ifconfig', ifconfig_result_string)
              ipv4_addr = ifconfig_result_obj[1]['ipv4_addr']
              
              uptime_result_string = result['host1'][4].result
              uptime_result_obj = jc.parse('uptime', uptime_result_string)
              uptime = uptime_result_obj['uptime']
              
              print(f'hostname: {hostname}')
              print(f'kernel version: {kernel_version}')
              print(f'timezone: {timezone}')
              print(f'ip address: {ipv4_addr}')
              print(f'uptime: {uptime}')

              Script output:

              $ python3 nornir_with_jc.py 
              hostname: my-ubuntu
              kernel version: #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020
              timezone: UTC
              ip address: 192.168.1.239
              uptime: 47 min
              

              Here you can see we have run a few tasks and assigned the results to some variables. Let’s go over the uname -a output:

              uname_result_string = result['host1'][1].result

              Above, we are grabbing the string result output attribute from the uname -a command (the first command in the commands list) and are assigning it to uname_result_string. There are cleaner ways of getting the result info from Nornir, but this way we can see the structure of the result object.
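
As an aside, since each command task was registered with name=cmd, one cleaner pattern is to key the results by command name instead of by position. A small sketch building on the script above:

# skip index 0 (the parent run_commands result) and map each task
# name to its raw output string
outputs = {r.name: r.result for r in result['host1'][1:]}
uname_result_string = outputs['uname -a']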

              uname_result_obj = jc.parse('uname', uname_result_string)

              Next, we have run uname_result_string through the jc uname parser and assigned the resulting dictionary object to the uname_result_obj variable.

              hostname = uname_result_obj['node_name']
              kernel_version = uname_result_obj['kernel_version']

Then we created a couple of variables that we can use in our script, hostname and kernel_version, so we can grab just the object attributes we are interested in. jc returns standard dictionary objects, so they are easy to use.

              print(f'hostname: {hostname}')
              print(f'kernel version: {kernel_version}')

              Finally, we use our variables in a print function, but we could have used these objects anywhere else in the script.
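
And because jc hands back plain dictionaries and lists, standard-library tools work on them as-is. For example, a trivial sketch to eyeball every field of the parsed uname object from the script above:

import json

# pretty-print the full parsed object to see all available fields
print(json.dumps(uname_result_obj, indent=2))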

              Nice! Instead of parsing the STDOUT text manually, we used jc to automatically parse the command output, providing us a convenient object to use elsewhere in our script. No more need to regex or loop and slice your way through the output to get what you are looking for!

              For a complete list of jc parsers available and their associated schemas, see the parser documentation.

              Happy parsing!

              Featured

              Parsing Command Output in Saltstack with JC

              In my last blog post I demonstrated how we can easily parse remote command output in Ansible. Since then it was requested that I demonstrate something similar using Saltstack.

Saltstack (or Salt, as it is known) is a little different from Ansible in that it primarily uses a pub/sub architecture vs. SSH and requires an agent, or a Minion, to be installed on the remote hosts you are managing.

              It turns out it is fairly straightforward to add jc functionality to Saltstack via a custom Output Module and/or Serializer Module. We’ll go over both methods, plus a bonus method in this post.

              For more information on the motivations for creating jc, see my blog post.

              Output Module

              With a Salt Output Module you can restructure the output of the command results that are written to STDOUT on the Master. The default output is typically YAML, but you can change it to JSON or other formats with builtin output modules.

              Here is the default YAML output:

              # salt '*' cmd.run 'uptime'
              minion1:
                   16:31:16 up 2 days,  3:04,  1 user,  load average: 0.03, 0.03, 0.00
              minion2:
                   16:31:16 up 2 days,  3:04,  1 user,  load average: 0.00, 0.00, 0.00

And here is the output using the builtin JSON outputter:

              # salt '*' cmd.run 'uptime' --out=json
              {
                  "minion2": " 16:33:02 up 2 days,  3:06,  1 user,  load average: 0.00, 0.00, 0.00"
              }
              {
                  "minion1": " 16:33:02 up 2 days,  3:06,  1 user,  load average: 0.00, 0.02, 0.00"
              }

              But we can do better with jc by turning the uptime output into a JSON object:

              # JC_PARSER=uptime salt '*' cmd.run 'uptime' --out=jc --out-indent=2
              {
                "minion1": {
                  "time": "16:36:04",
                  "uptime": "2 days, 3:09",
                  "users": 1,
                  "load_1m": 0.07,
                  "load_5m": 0.02,
                  "load_15m": 0.0
                }
              }
              {
                "minion2": {
                  "time": "16:36:04",
                  "uptime": "2 days, 3:09",
                  "users": 1,
                  "load_1m": 0.0,
                  "load_5m": 0.0,
                  "load_15m": 0.0
                }
              }

              Now we can pipe this output to jq, jello, or any other JSON filter to more easily consume this data.
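
For example, here is a hedged Python sketch (call it users.py; jq or jello can do the same in a one-liner) that consumes that stream of per-minion objects and pulls out the users field from the uptime schema above:

import json
import sys

# the jc outputter prints one JSON object per batch of minion returns,
# so decode the concatenated objects from stdin one at a time
decoder = json.JSONDecoder()
buf = sys.stdin.read()
pos = 0
while pos < len(buf):
    if buf[pos].isspace():
        pos += 1
        continue
    obj, pos = decoder.raw_decode(buf, pos)
    for minion, uptime in obj.items():
        print(f'{minion}: {uptime["users"]} user(s) logged in')

Piping the salt command above into it (… --out=jc | python3 users.py) would print one line per minion.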

              We’ll go over the Output Module installation and usage later in this post.

              Serializer Module

With a Salt Serializer Module you can restructure the output of the command results during runtime on each Minion so they can be used as objects/variables within a Salt state. For example, if we only cared about the number of users currently logged into each minion and wanted to set that number as a variable for use elsewhere, we could do that with a jc Serializer Module.

              Here is a simple, contrived example Salt state file to show how it works:

              {% set uptime_out = salt.cmd.shell('uptime') %}
              {% set uptime_jc = salt.slsutil.deserialize('jc', uptime_out, parser='uptime') %}
              
              run_uptime:
                cmd.run:
                  - name: >
                      echo 'The number of users logged in is {{ uptime_jc.users }}'

              And here is the output after applying this state file:

              # salt '*' state.apply uptime-users
              minion1:
              ----------
                        ID: run_uptime
                  Function: cmd.run
                      Name: echo 'The number of users logged in is 1'
              
                    Result: True
                   Comment: Command "echo 'The number of users logged in is 1'
                            " run
                   Started: 17:01:43.992058
                  Duration: 6.107 ms
                   Changes:   
                            ----------
                            pid:
                                23208
                            retcode:
                                0
                            stderr:
                            stdout:
                                The number of users logged in is 1
              
              Summary for minion1
              ------------
              Succeeded: 1 (changed=1)
              Failed:    0
              ------------
              Total states run:     1
              Total run time:   6.107 ms
              minion2:
              ----------
                        ID: run_uptime
                  Function: cmd.run
                      Name: echo 'The number of users logged in is 2'
              
                    Result: True
                   Comment: Command "echo 'The number of users logged in is 2'
                            " run
                   Started: 17:01:44.005482
                  Duration: 6.55 ms
                   Changes:   
                            ----------
                            pid:
                                23371
                            retcode:
                                0
                            stderr:
                            stdout:
                                The number of users logged in is 2
              
              Summary for minion2
              ------------
              Succeeded: 1 (changed=1)
              Failed:    0
              ------------
              Total states run:     1
              Total run time:   6.550 ms

              Since jc deserialized the command output into an object, we can simply reference the object attributes in our Salt states. We’ll go over installation and usage of the jc Serializer Module later in this post.

              Installation and Usage

              To use the jc Output Module, you will need to install jc on the Master. To use the jc Serializer Module, you will need to install jc on the Minions. Depending on your use case you may decide to install one or the other or both modules.

              Installing jc

              You can install jc on the Master and Minions with the following command. Of course, this can also be automated via Salt!

              $ pip3 install jc

              Installing the Output Module

              To install the Output Module on the Master, you need to place the Python module in a directory where the Master is configured to look for it.

              First, edit the /etc/salt/master configuration file to configure a custom Module directory. In this example we will use /srv/modules by adding this line to the configuration file:

              module_dirs: ["/srv/modules"]

              Next we need to create the /srv/modules/output directory, if it doesn’t already exist:

              # mkdir -p /srv/modules/output

Next, copy the Python module into the directory. I have uploaded the code to GitHub as a Gist:

              # curl https://gist.githubusercontent.com/kellyjonbrazil/24e10f0c3e438ea22fc1e2bfaee22efc/raw/263e4eaf8e51f974b34d44e0483540b163667bdf/jc.py -o /srv/modules/output/jc.py

              Finally, restart the Salt Master:

              # systemctl restart salt-master

              Using the Output Module

              To use the jc Output Module, you need to call it with the --out=jc option of the salt command.

              Additionally, you need to tell the jc Output Module which parser to use. To do this, you can set the JC_PARSER environment variable inline with the command:

              # JC_PARSER=date salt '*' cmd.run 'date' --out=jc
              {"minion2": {"year": 2020, "month_num": 9, "day": 15, "hour": 18, "minute": 27, "second": 11, "month": "Sep", "weekday": "Tue", "weekday_num": 3, "timezone": "UTC"}}
              {"minion1": {"year": 2020, "month_num": 9, "day": 15, "hour": 18, "minute": 27, "second": 11, "month": "Sep", "weekday": "Tue", "weekday_num": 3, "timezone": "UTC"}}

              For a list of jc parsers, see the parser documentation.

              Additionally, you can add the --out-indent option to pretty-print the output:

              # JC_PARSER=date salt '*' cmd.run 'date' --out=jc --out-indent=2
              {
                "minion2": {
                  "year": 2020,
                  "month_num": 9,
                  "day": 15,
                  "hour": 18,
                  "minute": 29,
                  "second": 8,
                  "month": "Sep",
                  "weekday": "Tue",
                  "weekday_num": 3,
                  "timezone": "UTC"
                }
              }
              {
                "minion1": {
                  "year": 2020,
                  "month_num": 9,
                  "day": 15,
                  "hour": 18,
                  "minute": 29,
                  "second": 8,
                  "month": "Sep",
                  "weekday": "Tue",
                  "weekday_num": 3,
                  "timezone": "UTC"
                }
              }

              Installing the Serializer Module

              To install the Serializer Module on the Minions, you can copy the Python module to the _serializers folder within your Salt fileserver directory on the Master (typically /srv/salt) and sync to the Minions.

              First, create the /srv/salt/_serializers directory if it doesn’t already exist:

              # mkdir -p /srv/salt/_serializers

Next, copy the Python module into the _serializers directory on the Master. I have uploaded the code to GitHub as a Gist:

              # curl https://gist.githubusercontent.com/kellyjonbrazil/7d67cfa003735bf80ef43fe5652950dd/raw/1541a7d327aed0366ccfea91bd0533032111d11c/jc.py -o /srv/salt/_serializers/jc.py

              Finally, sync the jc Serializer Module to the Minions:

              # salt '*' saltutil.sync_all

              Using the Serializer Module

              To use the jc Serializer Module, invoke it with the salt.slsutil.deserialize() function within a Salt state file. The function requires three arguments to deserialize with jc:

              • Argument 1: 'jc'
                • This should always be the literal string 'jc' to call the jc Serializer Module
              • Argument 2: String data to be parsed
                • This is the STDOUT string output of the command you want to deserialize
              • Argument 3: parser='<parser>'
                • <parser> is the jc parser you want to use to parse the command output. For example, to use the ifconfig parser, Argument 3 would look like this: parser='ifconfig'. For a list of jc parsers, see the parser documentation.

              For example, via Jinja2 template:

              {% set date = salt.slsutil.deserialize('jc', date_stdout, parser='date') %}

              Then you can reference any attribute of the date object (Python dictionary) in any other part of the Salt state file. Here is a full example:

              {% set date_stdout = salt.cmd.shell('date') %}
              {% set date = salt.slsutil.deserialize('jc', date_stdout, parser='date') %}
              
              run_date:
                cmd.run:
                  - name: >
                      echo 'The timezone is {{ date.timezone }}'

              One More Thing

It is also possible to deserialize command output into objects using jc without using the jc Serializer Module. If jc is installed on the Minion, then you can pipe the command output to jc as you would normally do on the command line, then use the built-in JSON Serializer Module to deserialize the jc JSON output into Python objects:

              {% set date = salt.slsutil.deserialize('json', salt.cmd.shell('date | jc --date')) %}
              
              run_date:
                cmd.run:
                  - name: >
                      echo 'The timezone is {{ date.timezone }}'

              Happy parsing!

              Featured

              Parsing Command Output in Ansible with JC

              Ansible is a popular automation framework that allows you to configure any number of remote hosts in a declarative and idempotent way. A common use-case is to run a shell command on the remote host, return the STDOUT output, loop through it and parse it.

              Starting in Ansible 2.9 with the community.general collection, it is possible to use jc as a filter to automatically parse the command output for you so you can easily use the output as an object. The official filter documentation can be found here. Even more detailed documentation can be found here.

              For more information on the motivations for creating jc, see my blog post.

              Installation

              To use the jc filter plugin, you just need to install jc and the community.general collection on the Ansible controller. Ansible version 2.9 or higher is required to install the community.general collection.

              Installing jc:

              $ pip3 install jc

              Installing the community.general Ansible collection:

              $ ansible-galaxy collection install community.general

              Now we are ready to use the jc filter plugin!

              Syntax

              To use the jc filter plugin you just need to pipe the command output to the plugin and specify the parser as an argument. For example, this is how you would parse the output of ps on the remote host:

                tasks:
                - shell: ps aux
                  register: result
                - set_fact:
                    myvar: "{{ result.stdout | community.general.jc('ps') }}"
              

              Note: Use underscores instead of dashes (if any) in the parser name. e.g. git-log becomes git_log

              This will generate a myvar object that includes the exact same information you would have received by running jc ps aux on the remote host. Now you can use object notation to pull out the information you are interested in.

              A Simple Example

              Let’s put it all together with a very simple example. In this example we will run the date command on the remote host and print the timezone as a debug message:

              - name: Get Timezone
                hosts: ubuntu
                tasks:
                - shell: date
                  register: result
                - set_fact:
                    myvar: "{{ result.stdout | community.general.jc('date') }}"
                - debug:
                    msg: "The timezone is: {{ myvar.timezone }}"
              

              Instead of parsing the STDOUT text manually, we used the timezone attribute of the myvar object that jc gave us. Let’s see this in action:

              $ ansible-playbook get-timezone.yml 
              
              PLAY [Get Timezone] *****************************************************************************
              
              TASK [Gathering Facts] **************************************************************************
              ok: [192.168.1.239]
              
              TASK [shell] ************************************************************************************
              changed: [192.168.1.239]
              
              TASK [set_fact] *********************************************************************************
              ok: [192.168.1.239]
              
              TASK [debug] ************************************************************************************
              ok: [192.168.1.239] => {
                  "msg": "The timezone is: UTC"
              }
              
              PLAY RECAP **************************************************************************************
              192.168.1.239              : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
              

              Simple – no more need to grep/awk/sed your way through the output to get what you are looking for!

              For a complete list of jc parsers available and their associated schemas, see the parser documentation.

              Happy parsing!

              Featured

              JC Version 1.13.1 Released

              Try the jc web demo!

I’m happy to announce the release of jc version 1.13.1, available on GitHub and PyPI.

              jc now supports over 55 commands and file-types, including the new ping, sysctl, traceroute, and tracepath command parsers. The INI file parser has been enhanced to support simple key/value text files and the route command parser now supports IPv6 tables.

              Custom local parser plugins are now supported. This allows overriding existing parsers and rapid development of new parsers.
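
As a rough sketch of what a local parser plugin looks like (the directory and fields below are assumptions based on the packaged parsers and the documentation – check the docs for the correct plugin location on your platform):

# saved as myparser.py in the jcparsers plugin folder under jc's
# app data directory (e.g. ~/.local/share/jc/jcparsers on linux)

class info():
    version = '1.0'
    description = 'my custom parser'
    author = 'me'
    author_email = 'me@example.com'
    compatible = ['linux', 'darwin', 'freebsd']

def parse(data, raw=False, quiet=False):
    """Turn each non-blank line of command output into a dictionary."""
    return [{'line': line} for line in data.splitlines() if line.strip()]

Once the file is in place, it can be called like any packaged parser (e.g. some-command | jc --myparser), and a plugin with the same name as a packaged parser overrides it.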

              Other updates include verbose debugging, more consistent handling of empty data, and many parser fixes for FreeBSD.

              jc can be installed via pip or through several new official OS package repositories, including Fedora, openSUSE, Arch Linux, NixOS Linux, Guix System Linux, FreeBSD, and macOS. For more information on how to get jc, click here.

              To upgrade with pip:

              $ pip3 install --upgrade jc

              New Features

              • jc is now available on the official Fedora repository (dnf install jc)
              • jc is now available on the official Arch Linux repository (pacman -S jc)
              • jc is now available on the official NixOS repository (nix-env -iA nixpkgs.jc)
              • jc is now available on the official Guix System Linux repository (guix install jc)
              • jc is now available on the official FreeBSD ports repository (portsnap fetch update && cd /usr/ports/textproc/py-jc && make install clean)
              • jc is in process (Intent To Package) for Debian packaging.
              • Local custom parser plugins allow you to override packaged parsers or rapidly create your own.
              • Verbose debugging is now supported with the -dd command argument.
              • All parsers now correctly return empty objects when sent empty data.
              • Older versions of the pygments library (>=2.3.0) are now supported (for Debian packaging)

              New Parsers

              jc now supports 55 parsers. New parsers include ping, sysctl, tracepath, and traceroute.

              Documentation and schemas for all parsers can be found here.

              ping command parser

              Linux, macOS, and FreeBSD support for the ping command:

              $ ping 8.8.8.8 -c 3 | jc --ping -p          # or:  jc -p ping 8.8.8.8 -c 3
              {
                "destination_ip": "8.8.8.8",
                "data_bytes": 56,
                "pattern": null,
                "destination": "8.8.8.8",
                "packets_transmitted": 3,
                "packets_received": 3,
                "packet_loss_percent": 0.0,
                "duplicates": 0,
                "time_ms": 2005.0,
                "round_trip_ms_min": 23.835,
                "round_trip_ms_avg": 30.46,
                "round_trip_ms_max": 34.838,
                "round_trip_ms_stddev": 4.766,
                "responses": [
                  {
                    "type": "reply",
                    "timestamp": null,
                    "bytes": 64,
                    "response_ip": "8.8.8.8",
                    "icmp_seq": 1,
                    "ttl": 118,
                    "time_ms": 23.8,
                    "duplicate": false
                  },
                  {
                    "type": "reply",
                    "timestamp": null,
                    "bytes": 64,
                    "response_ip": "8.8.8.8",
                    "icmp_seq": 2,
                    "ttl": 118,
                    "time_ms": 34.8,
                    "duplicate": false
                  },
                  {
                    "type": "reply",
                    "timestamp": null,
                    "bytes": 64,
                    "response_ip": "8.8.8.8",
                    "icmp_seq": 3,
                    "ttl": 118,
                    "time_ms": 32.7,
                    "duplicate": false
                  }
                ]
              }

              sysctl command parser

              Linux, macOS, and FreeBSD support for the sysctl -a command:

              $ sysctl -a | jc --sysctl -p          # or:  jc -p sysctl -a
              {
                "user.cs_path": "/usr/bin:/bin:/usr/sbin:/sbin",
                "user.bc_base_max": 99,
                "user.bc_dim_max": 2048,
                "user.bc_scale_max": 99,
                "user.bc_string_max": 1000,
                "user.coll_weights_max": 2,
                "user.expr_nest_max": 32
                ...
              }

              tracepath command parser

              Linux support for the tracepath command:

              $ tracepath6 3ffe:2400:0:109::2 | jc --tracepath -p
              {
                "pmtu": 1480,
                "forward_hops": 2,
                "return_hops": 2,
                "hops": [
                  {
                    "ttl": 1,
                    "guess": true,
                    "host": "[LOCALHOST]",
                    "reply_ms": null,
                    "pmtu": 1500,
                    "asymmetric_difference": null,
                    "reached": false
                  },
                  {
                    "ttl": 1,
                    "guess": false,
                    "host": "dust.inr.ac.ru",
                    "reply_ms": 0.411,
                    "pmtu": null,
                    "asymmetric_difference": null,
                    "reached": false
                  },
                  {
                    "ttl": 2,
                    "guess": false,
                    "host": "dust.inr.ac.ru",
                    "reply_ms": 0.39,
                    "pmtu": 1480,
                    "asymmetric_difference": 1,
                    "reached": false
                  },
                  {
                    "ttl": 2,
                    "guess": false,
                    "host": "3ffe:2400:0:109::2",
                    "reply_ms": 463.514,
                    "pmtu": null,
                    "asymmetric_difference": null,
                    "reached": true
                  }
                ]
              }

              traceroute command parser

              Linux, macOS, and FreeBSD support for the traceroute command:

              $ traceroute -m 3 8.8.8.8 | jc --traceroute -p          # or:  jc -p traceroute -m 3 8.8.8.8
              {
                "destination_ip": "8.8.8.8",
                "destination_name": "8.8.8.8",
                "hops": [
                  {
                    "hop": 1,
                    "probes": [
                      {
                        "annotation": null,
                        "asn": null,
                        "ip": "192.168.1.254",
                        "name": "dsldevice.local.net",
                        "rtt": 6.616
                      },
                      {
                        "annotation": null,
                        "asn": null,
                        "ip": "192.168.1.254",
                        "name": "dsldevice.local.net",
                        "rtt": 6.413
                      },
                      {
                        "annotation": null,
                        "asn": null,
                        "ip": "192.168.1.254",
                        "name": "dsldevice.local.net",
                        "rtt": 6.308
                      }
                    ]
                  },
                  {
                    "hop": 2,
                    "probes": [
                      {
                        "annotation": null,
                        "asn": null,
                        "ip": "76.220.24.1",
                        "name": "76-220-24-1.lightspeed.sntcca.sbcglobal.net",
                        "rtt": 29.367
                      },
                      {
                        "annotation": null,
                        "asn": null,
                        "ip": "76.220.24.1",
                        "name": "76-220-24-1.lightspeed.sntcca.sbcglobal.net",
                        "rtt": 40.197
                      },
                      {
                        "annotation": null,
                        "asn": null,
                        "ip": "76.220.24.1",
                        "name": "76-220-24-1.lightspeed.sntcca.sbcglobal.net",
                        "rtt": 29.162
                      }
                    ]
                  },
                  {
                    "hop": 3,
                    "probes": [
                      {
                        "annotation": null,
                        "asn": null,
                        "ip": null,
                        "name": null,
                        "rtt": null
                      }
                    ]
                  }
                ]
              }

              Updated Parsers

              There have been many parser updates since v1.11.0. The INI file parser has been enhanced to support files and output that contains simple key/value pairs. The route command parser has been enhanced to add support for IPv6 routing tables. The uname parser provides more intuitive debug messages and an issue in the iptables command parser was fixed, allowing it to convert the last row of a table. Many other parser enhancements including the consistent handling of blank input, FreeBSD support, and minor field additions and fixes are included.

              Key/Value Pair Files with the INI File Parser

              The INI file parser has been enhanced to now support files containing simple key/value pairs. Files can include comments prepended with # or ; and keys and values can be delimited by = or : with or without spaces. Quotation marks are stripped from quoted values, though they can be kept with the -r (raw output) jc argument.

These types of files can be found in many places, including configuration files in /etc (e.g. /etc/sysconfig/network-scripts).

              $ cat keyvalue.txt
              # this file contains key/value pairs
              name = John Doe
              address=555 California Drive
              age: 34
              ; comments can include # or ;
              # delimiter can be = or :
              # quoted values have quotation marks stripped by default
              # but can be preserved with the -r argument
              occupation:"Engineer"
              
              $ cat keyvalue.txt | jc --ini -p
              {
                "name": "John Doe",
                "address": "555 California Drive",
                "age": "34",
                "occupation": "Engineer"
              }

              route Command Parser

              The route command parser has been enhanced to support IPv6 tables.

              $ route -6 | jc --route -p          # or: jc -p route -6
              [
                {
                  "destination": "[::]/96",
                  "next_hop": "[::]",
                  "flags": "!n",
                  "metric": 1024,
                  "ref": 0,
                  "use": 0,
                  "iface": "lo",
                  "flags_pretty": [
                    "REJECT"
                  ]
                },
                {
                  "destination": "0.0.0.0/96",
                  "next_hop": "[::]",
                  "flags": "!n",
                  "metric": 1024,
                  "ref": 0,
                  "use": 0,
                  "iface": "lo",
                  "flags_pretty": [
                    "REJECT"
                  ]
                },
                {
                  "destination": "2002:a00::/24",
                  "next_hop": "[::]",
                  "flags": "!n",
                  "metric": 1024,
                  "ref": 0,
                  "use": 0,
                  "iface": "lo",
                  "flags_pretty": [
                    "REJECT"
                  ]
                },
                ...
              ]

              Schema Changes

              There are no schema changes in this release.

              Full Parser List

              • airport -I
              • airport -s
              • arp
              • blkid
              • crontab
              • crontab-u
              • CSV
              • df
              • dig
              • dmidecode
              • du
              • env
              • file
              • free
              • fstab
              • /etc/group
              • /etc/gshadow
              • history
              • /etc/hosts
              • id
              • ifconfig
              • INI
              • iptables
              • jobs
              • last and lastb
              • ls
              • lsblk
              • lsmod
              • lsof
              • mount
              • netstat
              • ntpq
              • /etc/passwd
              • ping
              • pip list
              • pip show
              • ps
              • route
              • /etc/shadow
              • ss
              • stat
              • sysctl
              • systemctl
              • systemctl list-jobs
              • systemctl list-sockets
              • systemctl list-unit-files
              • timedatectl
              • tracepath
              • traceroute
              • uname -a
              • uptime
              • w
              • who
              • XML
              • YAML

              For more information on the motivations for creating jc, see my blog post.

              Happy parsing!

              Featured

              More Comprehensive Tracebacks in Python

              Python has great exception-handling with nice traceback messages that can help debug issues with your code. Here’s an example of a typical traceback message:

              Traceback (most recent call last):
                File "/Users/kbrazil/Library/Python/3.7/bin/jc", line 11, in <module>
                  load_entry_point('jc', 'console_scripts', 'jc')()
                File "/Users/kbrazil/git/jc/jc/cli.py", line 396, in main
                  result = parser.parse(data, raw=raw, quiet=quiet)
                File "/Users/kbrazil/git/jc/jc/parsers/uname.py", line 108, in parse
                  raw_output['kernel_release'] = parsed_line.pop(0)
              IndexError: pop from empty list
              

I usually read these from the bottom up to zero in on the issue. Here I can see that my program is trying to pop the first item off a list called parsed_line, but the list is empty and Python doesn’t know what to do, so it quits with an IndexError exception.

The traceback conveniently includes the line number and a snippet of the offending code. This is usually correct, but the line numbering can be off depending on the type of error or exception. This might be enough information for me to dig into the code and figure out why parsed_line is empty. But what about a more complex example?

              Traceback (most recent call last):
                File "/Users/kbrazil/Library/Python/3.7/bin/jc", line 11, in <module>
                  load_entry_point('jc', 'console_scripts', 'jc')()
                File "/Users/kbrazil/git/jc/jc/cli.py", line 396, in main
                  result = parser.parse(data, raw=raw, quiet=quiet)
                File "/Users/kbrazil/git/jc/jc/parsers/arp.py", line 226, in parse
                  'hwtype': line[4].lstrip('[').rstrip(']'),
              IndexError: list index out of range

              In this traceback I can see that the program is trying to pull the fifth item from the line list but Python can’t grab it – probably because the list doesn’t have that many items. This traceback doesn’t show me the state of the variables, so I can’t tell what input the function took or what the line variable looks like when causing this issue.

              What I’d really like is to see more context (the code lines before and after the error) along with the variable state when the error occurred. Many times this is done with a debugger or with print() statements. But there is another way!

              cgitb (deprecated)

Back in 1995 when CGI scripts were all the rage, Python added the cgi library along with its helper module, cgitb. This module would print more verbose traceback messages to the browser to help with troubleshooting. Conveniently, its traceback messages would include surrounding code context and variable state! cgitb is poorly named, since it can act as a drop-in replacement for standard tracebacks in any type of program. The name might be why it never really gained traction. Unfortunately, cgitb is set to be deprecated in Python 3.10, but let’s see how it works and then check out how to replace it.
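
Enabling it is a one-liner. Here is a minimal sketch, assuming we want plain-text output (rather than cgitb’s default HTML) and the 11 lines of context used in the example below:

import cgitb

# replace the default traceback handler with cgitb's verbose version
cgitb.enable(format='text', context=11)

With that in place, the arp parser failure from the second traceback above produces a much richer report: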

              IndexError
              Python 3.7.6: /usr/local/opt/python/bin/python3.7
              Mon Jul  6 12:09:08 2020
              
              A problem occurred in a Python script.  Here is the sequence of
              function calls leading up to the error, in the order they occurred.
              
               /Users/kbrazil/Library/Python/3.7/bin/jc in <module>()
                  2 # EASY-INSTALL-ENTRY-SCRIPT: 'jc','console_scripts','jc'
                  3 __requires__ = 'jc'
                  4 import re
                  5 import sys
                  6 from pkg_resources import load_entry_point
                  7 
                  8 if __name__ == '__main__':
                  9     sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
                 10     sys.exit(
                 11         load_entry_point('jc', 'console_scripts', 'jc')()
                 12     )
              load_entry_point = <function load_entry_point>
              
               /Users/kbrazil/git/jc/jc/cli.py in main()
                391 
                392         if parser_name in parsers:
                393             # load parser module just in time so we don't need to load all modules
                394             parser = parser_module(arg)
                395             try:
                396                 result = parser.parse(data, raw=raw, quiet=quiet)
                397                 found = True
                398                 break
                399 
                400             except Exception:
                401                 if debug:
              result undefined
              parser = <module 'jc.parsers.arp' from '/Users/kbrazil/git/jc/jc/parsers/arp.py'>
              parser.parse = <function parse>
              data = "#!/usr/bin/env python3\n\nimport jc.parsers.ls\nimp...print(tabulate.tabulate(parsed, headers='keys'))\n"
              raw = False
              quiet = False
              
               /Users/kbrazil/git/jc/jc/parsers/arp.py in parse(data="#!/usr/bin/env python3\n\nimport jc.parsers.ls\nimp...print(tabulate.tabulate(parsed, headers='keys'))\n", raw=False, quiet=False)
                221             for line in cleandata:
                222                 line = line.split()
                223                 output_line = {
                224                     'name': line[0],
                225                     'address': line[1].lstrip('(').rstrip(')'),
                226                     'hwtype': line[4].lstrip('[').rstrip(']'),
                227                     'hwaddress': line[3],
                228                     'iface': line[6],
                229                 }
                230                 raw_output.append(output_line)
                231 
              line = ['#!/usr/bin/env', 'python3']
              ].lstrip undefined
              IndexError: list index out of range
                  __cause__ = None
                  __class__ = <class 'IndexError'>
                  __context__ = None
                  __delattr__ = <method-wrapper '__delattr__' of IndexError object>
                  __dict__ = {}
                  __dir__ = <built-in method __dir__ of IndexError object>
                  __doc__ = 'Sequence index out of range.'
                  __eq__ = <method-wrapper '__eq__' of IndexError object>
                  __format__ = <built-in method __format__ of IndexError object>
                  __ge__ = <method-wrapper '__ge__' of IndexError object>
                  __getattribute__ = <method-wrapper '__getattribute__' of IndexError object>
                  __gt__ = <method-wrapper '__gt__' of IndexError object>
                  __hash__ = <method-wrapper '__hash__' of IndexError object>
                  __init__ = <method-wrapper '__init__' of IndexError object>
                  __init_subclass__ = <built-in method __init_subclass__ of type object>
                  __le__ = <method-wrapper '__le__' of IndexError object>
                  __lt__ = <method-wrapper '__lt__' of IndexError object>
                  __ne__ = <method-wrapper '__ne__' of IndexError object>
                  __new__ = <built-in method __new__ of type object>
                  __reduce__ = <built-in method __reduce__ of IndexError object>
                  __reduce_ex__ = <built-in method __reduce_ex__ of IndexError object>
                  __repr__ = <method-wrapper '__repr__' of IndexError object>
                  __setattr__ = <method-wrapper '__setattr__' of IndexError object>
                  __setstate__ = <built-in method __setstate__ of IndexError object>
                  __sizeof__ = <built-in method __sizeof__ of IndexError object>
                  __str__ = <method-wrapper '__str__' of IndexError object>
                  __subclasshook__ = <built-in method __subclasshook__ of type object>
                  __suppress_context__ = False
                  __traceback__ = <traceback object>
                  args = ('list index out of range',)
                  with_traceback = <built-in method with_traceback of IndexError object>
              
              The above is a description of an error in a Python program.  Here is
              the original traceback:
              
              Traceback (most recent call last):
                File "/Users/kbrazil/Library/Python/3.7/bin/jc", line 11, in <module>
                  load_entry_point('jc', 'console_scripts', 'jc')()
                File "/Users/kbrazil/git/jc/jc/cli.py", line 396, in main
                  result = parser.parse(data, raw=raw, quiet=quiet)
                File "/Users/kbrazil/git/jc/jc/parsers/arp.py", line 226, in parse
                  'hwtype': line[4].lstrip('[').rstrip(']'),
              IndexError: list index out of range

              This verbose traceback gives me just what I’m looking for! Though the default is 5, I told cgitb to print out 11 lines of context. Now I can see the two variables I’m particularly interested in to troubleshoot this issue: data and line.

              data = "#!/usr/bin/env python3\n\nimport jc.parsers.ls\nimp...print(tabulate.tabulate(parsed, headers='keys'))\n"

              (Notice how it snips the value if it’s too long. Pretty cool!)

              line = ['#!/usr/bin/env', 'python3']

Now I can easily see that the data that was input into the function does not look like the type of data expected at all (it is expecting text output from the arp command, but instead it was fed another Python script file). I can also see that the line list only has two items.

I included cgitb in jc to provide a verbose debug command option (-dd) to help speed up troubleshooting of parsing issues – typically during development of a new parser or to quickly identify an issue a user is having over email. It seemed perfect for my needs, and aside from the weird name it worked well.

              Then I noticed that cgitb was to be deprecated along with the cgi module with no replacement.

              tracebackplus

              I decided to vendorize the builtin cgitb library so it wouldn’t be orphaned in later versions of Python. After looking at the code I found it would be pretty easy to simplify the module by taking out all of the HTML rendering cruft. And why not rename it to something more descriptive while we’re at it? After not too much thought, I settled on tracebackplus.

              Like cgitb, tracebackplus doesn’t require any external libraries and can easily replace standard tracebacks with the following code:

              import tracebackplus
              tracebackplus.enable(context=11)

              Here is the code for tracebackplus along with the permissive MIT license. Feel free to use this code in your projects.

              Here’s an example of how it is being used in jc to provide different levels of debugging using the -d (standard traceback) or -dd (tracebackplus) command line arguments:

              try:
                  result = parser.parse(data, raw=raw, quiet=quiet)
                  found = True
                  break
              
              except Exception:
                  if debug:
                      if verbose_debug:
                          import jc.tracebackplus
                          jc.tracebackplus.enable(context=11)
              
                      raise
              
                  else:
                      import jc.utils
                      jc.utils.error_message(
                          f'{parser_name} parser could not parse the input data. Did you use the correct parser?\n'
                          '                 For details use the -d or -dd option.')
                      sys.exit(1)

              Happy debugging!

              Featured

              JC Version 1.11.1 Released

              Try the jc web demo!

I’m happy to announce the release of jc version 1.11.1, available on GitHub and PyPI.

jc now supports over 50 commands and file-types and can now be installed via Homebrew (macOS) and zypper (openSUSE). In addition, jc can be installed via DEB and RPM packages or run as a single binary on linux or macOS. You can set your own custom colors for jc to display, and more command parsers are supported on macOS. See below for more information on the new features.

              To upgrade, run:

              $ pip3 install --upgrade jc

              RPM/DEB packages and Binaries can also be found here.

              OS package repositories (e.g. brew, zypper, etc.) will be updated with the latest version of jc on their own future release schedules.

              New Features

• jc now supports custom colors. You can customize the colors by setting the JC_COLORS environment variable (a quick sketch follows this list)
              • jc is now available on macOS via Homebrew (brew install jc)
• jc is now available on openSUSE via zypper
              • DEB, RPM, and Binary packages are now available for linux and macOS
              • Several back-end updates to support packaging on standard linux distribution package repositories in the future (e.g. Fedora)
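
Here is a quick sketch of the custom colors feature. The JC_COLORS variable takes four comma-separated values that set the colors for key names, keywords, numbers, and strings, in that order (check the jc documentation for the full list of supported color names):

$ JC_COLORS=blue,brightblack,magenta,green jc -p ifconfig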

              New Parsers

              jc now supports 51 parsers. The dmidecode command is now supported for linux platforms.

              Documentation and schemas for all parsers can be found here.

              dmidecode command parser

              Linux support for the dmidecode command:

              # jc -p dmidecode
              [
                {
                  "handle": "0x0000",
                  "type": 0,
                  "bytes": 24,
                  "description": "BIOS Information",
                  "values": {
                    "vendor": "Phoenix Technologies LTD",
                    "version": "6.00",
                    "release_date": "04/13/2018",
                    "address": "0xEA490",
                    "runtime_size": "88944 bytes",
                    "rom_size": "64 kB",
                    "characteristics": [
                      "ISA is supported",
                      "PCI is supported",
                      "PC Card (PCMCIA) is supported",
                      "PNP is supported",
                      "APM is supported",
                      "BIOS is upgradeable",
                      "BIOS shadowing is allowed",
                      "ESCD support is available",
                      "Boot from CD is supported",
                      "Selectable boot is supported",
                      "EDD is supported",
                      "Print screen service is supported (int 5h)",
                      "8042 keyboard services are supported (int 9h)",
                      "Serial services are supported (int 14h)",
                      "Printer services are supported (int 17h)",
                      "CGA/mono video services are supported (int 10h)",
                      "ACPI is supported",
                      "Smart battery is supported",
                      "BIOS boot specification is supported",
                      "Function key-initiated network boot is supported",
                      "Targeted content distribution is supported"
                    ],
                    "bios_revision": "4.6",
                    "firmware_revision": "0.0"
                  }
                },
                ...
              ]

              Updated Parsers

              The netstat command is now supported on macOS:

              $ jc -p netstat
              [
                {
                  "proto": "tcp4",
                  "recv_q": 0,
                  "send_q": 0,
                  "local_address": "mylaptop.local",
                  "foreign_address": "173.199.15.254",
                  "state": "SYN_SENT   ",
                  "kind": "network",
                  "local_port": "57561",
                  "foreign_port": "https",
                  "transport_protocol": "tcp",
                  "network_protocol": "ipv4",
                  "local_port_num": 57561
                },
                {
                  "proto": "tcp4",
                  "recv_q": 0,
                  "send_q": 0,
                  "local_address": "mylaptop.local",
                  "foreign_address": "192.0.71.3",
                  "state": "ESTABLISHED",
                  "kind": "network",
                  "local_port": "57525",
                  "foreign_port": "https",
                  "transport_protocol": "tcp",
                  "network_protocol": "ipv4",
                  "local_port_num": 57525
                },
                ...
              ]

              The netstat parser has been enhanced to support the -r (routes) and -i (interfaces) options on both linux and macOS.

              $ jc -p netstat -r
              [
                {
                  "destination": "default",
                  "gateway": "router.local",
                  "route_flags": "UGSc",
                  "route_refs": 102,
                  "use": 24,
                  "iface": "en0",
                  "kind": "route"
                },
                {
                  "destination": "127",
                  "gateway": "localhost",
                  "route_flags": "UCS",
                  "route_refs": 0,
                  "use": 0,
                  "iface": "lo0",
                  "kind": "route"
                },
                ...
              ]
              $ jc -p netstat -i
              [
                {
                  "iface": "lo0",
                  "mtu": 16384,
                  "network": "<Link#1>",
                  "address": null,
                  "ipkts": 1777797,
                  "ierrs": 0,
                  "opkts": 1777797,
                  "oerrs": 0,
                  "coll": 0,
                  "kind": "interface"
                },
                {
                  "iface": "lo0",
                  "mtu": 16384,
                  "network": "127",
                  "address": "localhost",
                  "ipkts": 1777797,
                  "ierrs": null,
                  "opkts": 1777797,
                  "oerrs": null,
                  "coll": null,
                  "kind": "interface"
                },
                {
                  "iface": "lo0",
                  "mtu": 16384,
                  "network": "localhost",
                  "address": "::1",
                  "ipkts": 1777797,
                  "ierrs": null,
                  "opkts": 1777797,
                  "oerrs": null,
                  "coll": null,
                  "kind": "interface"
                },
                ...
              ]

              The stat command is now supported on macOS.

              $ jc -p stat jc*
              [
                {
                  "file": "jc-1.11.1-linux.sha256",
                  "device": "16778221",
                  "inode": 82163627,
                  "flags": "-rw-r--r--",
                  "links": 1,
                  "user": "joeuser",
                  "group": "staff",
                  "rdev": 0,
                  "size": 69,
                  "access_time": "May 26 08:27:44 2020",
                  "modify_time": "May 24 18:47:25 2020",
                  "change_time": "May 24 18:51:21 2020",
                  "birth_time": "May 24 18:47:25 2020",
                  "block_size": 4096,
                  "blocks": 8,
                  "osx_flags": "0"
                },
                {
                  "file": "jc-1.11.1-linux.tar.gz",
                  "device": "16778221",
                  "inode": 82163628,
                  "flags": "-rw-r--r--",
                  "links": 1,
                  "user": "joeuser",
                  "group": "staff",
                  "rdev": 0,
                  "size": 20226936,
                  "access_time": "May 26 08:27:44 2020",
                  "modify_time": "May 24 18:47:25 2020",
                  "change_time": "May 24 18:47:25 2020",
                  "birth_time": "May 24 18:47:25 2020",
                  "block_size": 4096,
                  "blocks": 39512,
                  "osx_flags": "0"
                },
                ...
              ]

              Schema Changes

              There are no schema changes in this release.

              Full Parser List

              • airport -I
              • airport -s
              • arp
              • blkid
              • crontab
              • crontab-u
              • CSV
              • df
              • dig
              • dmidecode
              • du
              • env
              • file
              • free
              • fstab
              • /etc/group
              • /etc/gshadow
              • history
              • /etc/hosts
              • id
              • ifconfig
              • INI
              • iptables
              • jobs
              • last and lastb
              • ls
              • lsblk
              • lsmod
              • lsof
              • mount
              • netstat
              • ntpq
              • /etc/passwd
              • pip list
              • pip show
              • ps
              • route
              • /etc/shadow
              • ss
              • stat
              • systemctl
              • systemctl list-jobs
              • systemctl list-sockets
              • systemctl list-unit-files
              • timedatectl
              • uname -a
              • uptime
              • w
              • who
              • XML
              • YAML

              For more information on the motivations for creating jc, see my blog post.

              Happy parsing!

              Featured

              JC Version 1.10.2 Released

              Try the jc web demo!

              I’m happy to announce the release of jc version 1.10.2 available on github and pypi. See below for more information on the new features.

              To upgrade, run:

              $ pip3 install --upgrade jc

              New Features

              jc now supports color output by default when printing to the terminal. Color is automatically disabled when piping to another program. The -m (monochrome) option can be used to disable color output to the terminal.
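
For example, here is a quick illustration of that behavior (df is just an arbitrary command):

$ jc -p df             # colorized output when printing to the terminal
$ jc -p df | cat       # color is automatically disabled when piped
$ jc -p -m df          # -m forces monochrome output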

              New Parsers

              No new parsers in this release.

              Updated Parsers

              • file command parser: minor fix for some edge cases
              • arp command parser: fix macOS detection for some edge cases
              • dig command parser: add axfr support

              Schema Changes

              The dig command parser now supports the axfr option. The schema has been updated to add this section:

              $ jc -p dig @81.4.108.41 axfr zonetransfer.me
              [
                {
                  "axfr": [
                    {
                      "name": "zonetransfer.me.",
                      "ttl": 7200,
                      "class": "IN",
                      "type": "SOA",
                      "data": "nsztm1.digi.ninja. robin.digi.ninja. 2019100801 172800 900 1209600 3600"
                    },
                    {
                      "name": "zonetransfer.me.",
                      "ttl": 300,
                      "class": "IN",
                      "type": "HINFO",
                      "data": "\"Casio fx-700G\" \"Windows XP\""
                    },
                    {
                      "name": "zonetransfer.me.",
                      "ttl": 301,
                      "class": "IN",
                      "type": "TXT",
                      "data": "\"google-site-verification=tyP28J7JAUHA9fw2sHXMgcCC0I6XBmmoVi04VlMewxA\""
                    },
                    ...
                  ],
                  "query_time": 805,
                  "server": "81.4.108.41#53(81.4.108.41)",
                  "when": "Thu Apr 09 08:05:31 PDT 2020",
                  "size": "50 records (messages 1, bytes 1994)"
                }
              ]

              Full Parser List

              • airport -I
              • airport -s
              • arp
              • blkid
              • crontab
              • crontab-u
              • CSV
              • df
              • dig
              • du
              • env
              • file
              • free
              • fstab
              • /etc/group
              • /etc/gshadow
              • history
              • /etc/hosts
              • id
              • ifconfig
              • INI
              • iptables
              • jobs
              • last and lastb
              • ls
              • lsblk
              • lsmod
              • lsof
              • mount
              • netstat
              • ntpq
              • /etc/passwd
              • pip list
              • pip show
              • ps
              • route
              • /etc/shadow
              • ss
              • stat
              • systemctl
              • systemctl list-jobs
              • systemctl list-sockets
              • systemctl list-unit-files
              • timedatectl
              • uname -a
              • uptime
              • w
              • who
              • XML
              • YAML

              For more information on the motivations for creating jc, see my blog post.

              Happy parsing!

              Featured

              Jello: The JQ Alternative for Pythonistas

              Built on jello:

              Jello Explorer (jellex): TUI interactive JSON filter using Python syntax

              jello web demo

I’m a big fan of using structured data at the command line. So much so that I’ve written a couple of utilities to promote JSON in the CLI.

              Typically I use jq to filter and process the JSON output into submission until I get what I want. But if you’re anything like me, you spend a lot of time googling how to do what you want in jq because the syntax can get a little out of hand. In fact, I keep notes with example jq queries I’ve used before in case I need those techniques again.

              jq is great for simple things, but sometimes when I want to iterate through a deeply nested structure with arrays of objects I find python’s list and dictionary syntax easier to comprehend.

              Hello jello

              That’s why I created jello. jello works similarly to jq but uses the python interpreter, so you can iterate with loops, comprehensions, variables, expressions, etc. just like you would in a full-fledged python script.

              The nice thing about jello is that it removes a lot of the boilerplate code you would need to ingest and output the JSON or JSON Lines data so you can focus on the logic.

              Let’s take the following output from jc -ap:

              $ jc -ap
              {
                "name": "jc",
                "version": "1.9.2",
                "description": "jc cli output JSON conversion tool",
                "author": "Kelly Brazil",
                "author_email": "kellyjonbrazil@gmail.com",
                "parser_count": 50,
                "parsers": [
                  {
                    "name": "airport",
                    "argument": "--airport",
                    "version": "1.0",
                    "description": "airport -I command parser",
                    "author": "Kelly Brazil",
                    "author_email": "kellyjonbrazil@gmail.com",
                    "compatible": [
                      "darwin"
                    ],
                    "magic_commands": [
                      "airport -I"
                    ]
                  },
                  {
                    "name": "airport_s",
                    "argument": "--airport-s",
                    "version": "1.0",
                    "description": "airport -s command parser",
                    "author": "Kelly Brazil",
                    "author_email": "kellyjonbrazil@gmail.com",
                    "compatible": [
                      "darwin"
                    ],
                    "magic_commands": [
                      "airport -s"
                    ]
                  },
                  ...
              ]

              Let’s say I want a list of the parser names that are compatible with macOS. Here is a jq query that will get down to that level:

              $ jc -a | jq '[.parsers[] | select(.compatible[] | contains("darwin")) | .name]' 
              [
                "airport",
                "airport_s",
                "arp",
                "crontab",
                "crontab_u",
                "csv",
                ...
              ]

              This is not too terribly bad, but you need to be careful about bracket and parenthesis placements. Here’s the same query in jello:

              $ jc -a | jello '[parser.name for parser in _.parsers if "darwin" in parser.compatible]'
              [
                "airport",
                "airport_s",
                "arp",
                "crontab",
                "crontab_u",
                "csv",
                ...
              ]

As you can see, jello gives you the JSON or JSON Lines input as a dictionary or list of dictionaries assigned to ‘_’. Then you process it as you’d like using standard python syntax, with the convenience of dot notation. jello automatically takes care of slurping the input and printing valid JSON or JSON Lines depending on the value of the last expression.
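
Here is a toy example of the dot notation (any JSON input works the same way):

$ echo '{"foo": {"bar": [1, 2, 3]}}' | jello '_.foo.bar[1]'
2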

The example above is not quite as terse as using jq, but it’s more readable to someone who is familiar with python list comprehensions. As with any programming language, there are multiple ways to skin a cat. We can also do a similar query with a for loop:

              $ jc -a | jello '\
              result = []
              for parser in _.parsers:
                if "darwin" in parser.compatible:
                  result.append(parser.name)
              result'
              [
                "airport",
                "airport_s",
                "arp",
                "crontab",
                "crontab_u",
                "csv",
                ...
              ]
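
For contrast, here is roughly the same query in plain Python without jello. Note the stdin and JSON boilerplate that jello handles for you (a sketch; the field names come from the jc -a output above):

$ jc -a | python3 -c '
import sys, json
data = json.load(sys.stdin)
result = [p["name"] for p in data["parsers"] if "darwin" in p["compatible"]]
print(json.dumps(result, indent=2))'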

              Advanced JSON Processing

              These are very simple examples and jq syntax might be ok here (though I prefer python syntax). But what if we try to do something more complex? Let’s take one of the advanced examples from the excellent jq tutorial by Matthew Lincoln.

Under Grouping and Counting, Matthew describes an advanced jq filter against a sample Twitter dataset in JSON Lines format. There he sets out the following task:

              “We can now create a table of users. Let’s create a table with columns for the user id, user name, followers count, and a column of their tweet ids separated by a semicolon.”

              https://programminghistorian.org/en/lessons/json-and-jq

              Here is the final jq query:

              $ cat twitterdata.jlines | jq -s 'group_by(.user) | 
                                               .[] | 
                                               {
                                                 user_id: .[0].user.id, 
                                                 user_name: .[0].user.screen_name, 
                                                 user_followers: .[0].user.followers_count, 
                                                 tweet_ids: [.[].id | tostring] | join(";")
                                               }'
              ...
              {
                "user_id": 47073035,
                "user_name": "msoltanm",
                "user_followers": 63,
                "tweet_ids": "619172275741298700"
              }
              {
                "user_id": 2569107372,
                "user_name": "SlavinOleg",
                "user_followers": 35,
                "tweet_ids": "501064198973960200;501064202794971140;501064214467731460;501064215759568900;501064220121632800"
              }
              {
                "user_id": 2369225023,
                "user_name": "SkogCarla",
                "user_followers": 10816,
                "tweet_ids": "501064217667960800"
              }
              {
                "user_id": 2477475030,
                "user_name": "bennharr",
                "user_followers": 151,
                "tweet_ids": "501064201503113200"
              }
              {
                "user_id": 42226593,
                "user_name": "shirleycolleen",
                "user_followers": 2114,
                "tweet_ids": "619172281294655500;619172179960328200"
              }
              ...

              This is a fantastic query! It’s actually deceptively simple looking – it takes quite a few paragraphs for Matthew to describe how it works and there are some tricky brackets, braces, and parentheses in there that need to be set just right. Let’s see how we could tackle this task with jello using standard python syntax:

              $ cat twitterdata.jlines | jello -l '\
              user_ids = set()
              for tweet in _:
                  user_ids.add(tweet.user.id)
              result = []
              for user in user_ids:
                  user_profile = {}
                  tweet_ids = []
                  for tweet in _:
                      if tweet.user.id == user:
                          user_profile.update({
                              "user_id": user,
                              "user_name": tweet.user.screen_name,
                              "user_followers": tweet.user.followers_count})
                          tweet_ids.append(str(tweet.id))
                  user_profile["tweet_ids"] = ";".join(tweet_ids)
                  result.append(user_profile)
              result'
              ...
              {"user_id": 2696111005, "user_name": "EGEVER142", "user_followers": 1433, "tweet_ids": "619172303654518784"}
              {"user_id": 42226593, "user_name": "shirleycolleen", "user_followers": 2114, "tweet_ids": "619172281294655488;619172179960328192"}
              {"user_id": 106948003, "user_name": "MrKneeGrow", "user_followers": 172, "tweet_ids": "501064228627705857"}
              {"user_id": 18270633, "user_name": "ahhthatswhy", "user_followers": 559, "tweet_ids": "501064204661850113"}
              {"user_id": 14331818, "user_name": "edsu", "user_followers": 4220, "tweet_ids": "615973042443956225;618602288781860864"}
              {"user_id": 2569107372, "user_name": "SlavinOleg", "user_followers": 35, "tweet_ids": "501064198973960192;501064202794971136;501064214467731457;501064215759568897;501064220121632768"}
              {"user_id": 22668719, "user_name": "nodehyena", "user_followers": 294, "tweet_ids": "501064222772445187"}
              ...

So that’s 17 lines of python… again, not as terse as jq, but for pythonistas it’s probably a lot easier to understand what is going on. This is a pretty simple and naive implementation – there are probably better approaches that are shorter, simpler, and faster (one is sketched below), but the point is that I can come back six months from now and still understand it if I need to debug or tweak it.
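
For example, a single-pass version that groups tweets into a dictionary keyed on user id avoids rescanning the whole dataset for every user. This is just a sketch under the same assumptions as the script above and has not been run against the original dataset:

$ cat twitterdata.jlines | jello -l '\
users = {}
for tweet in _:
    entry = users.setdefault(tweet.user.id, {
        "user_id": tweet.user.id,
        "user_name": tweet.user.screen_name,
        "user_followers": tweet.user.followers_count,
        "tweet_ids": []})
    entry["tweet_ids"].append(str(tweet.id))
[dict(u, tweet_ids=";".join(u["tweet_ids"])) for u in users.values()]'

Either version produces the same user table.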

              Just for fun, let’s pipe this result through jtbl to see what it looks like:

                 user_id  user_name          user_followers  tweet_ids
              ----------  ---------------  ----------------  ----------------------------------------------------------------------------------------------
              ...
              2481812382  SadieODoyle                    42  501064200035516416
              2696111005  EGEVER142                    1433  619172303654518784
                42226593  shirleycolleen               2114  619172281294655488;619172179960328192
               106948003  MrKneeGrow                    172  501064228627705857
                18270633  ahhthatswhy                   559  501064204661850113
                14331818  edsu                         4220  615973042443956225;618602288781860864
              2569107372  SlavinOleg                     35  501064198973960192;501064202794971136;501064214467731457;501064215759568897;501064220121632768
                22668719  nodehyena                     294  501064222772445187
                23598003  victoriasview                1163  501064228288364546
               851336634  20mUsa                      15643  50106414
              ...

              Very cool! Find more examples at https://github.com/kellyjonbrazil/jello. I hope you find jello useful in your command line pipelines.

              Try Jello Explorer and the jello web demo!

              Featured

              JC Version 1.9.0 Released

              Try the jc web demo!

              I’m happy to announce the release of jc version 1.9.0 available on github and pypi. See below for more information on the new features and parsers.

              To upgrade, run:

              $ pip3 install --upgrade jc

              jc In The News!

              The Linux Unplugged podcast gave a shoutout to jc on their February 18, 2020 episode for their App Pick segment. The discussion starts at 45:47. Go check out the podcast!

              New Parsers

              jc now includes 50 parsers! New parsers (tested on linux and OSX) include airport -I, airport -s, file, ntpq -p, and timedatectl commands.

              Documentation and schemas for all parsers can be found here.

              airport -I command parser

              OSX support for the airport -I command:

              $ airport -I | jc --airport -p          # or:  jc -p airport -I
              {
                "agrctlrssi": -66,
                "agrextrssi": 0,
                "agrctlnoise": -90,
                "agrextnoise": 0,
                "state": "running",
                "op_mode": "station",
                "lasttxrate": 195,
                "maxrate": 867,
                "lastassocstatus": 0,
                "802_11_auth": "open",
                "link_auth": "wpa2-psk",
                "bssid": "3c:37:86:15:ad:f9",
                "ssid": "SnazzleDazzle",
                "mcs": 0,
                "channel": "48,80"
              }

              airport -s command parser

OSX support for the airport -s command:

$ airport -s | jc --airport-s -p          # or:  jc -p airport -s
              [
                {
                  "ssid": "DIRECT-4A-HP OfficeJet 3830",
                  "bssid": "00:67:eb:2a:a7:3b",
                  "rssi": -90,
                  "channel": "6",
                  "ht": true,
                  "cc": "--",
                  "security": [
                    "WPA2(PSK/AES/AES)"
                  ]
                },
                {
                  "ssid": "Latitude38",
                  "bssid": "c0:ff:d5:d2:7a:f3",
                  "rssi": -85,
                  "channel": "11",
                  "ht": true,
                  "cc": "US",
                  "security": [
                    "WPA2(PSK/AES/AES)"
                  ]
                },
                {
                  "ssid": "xfinitywifi",
                  "bssid": "6e:e3:0e:b8:45:99",
                  "rssi": -83,
                  "channel": "11",
                  "ht": true,
                  "cc": "US",
                  "security": [
                    "NONE"
                  ]
                },
                ...
              ]

              file command parser

              Linux and OSX support for the file command:

$ file * | jc --file -p          # or:  jc -p file *
              [
                {
                  "filename": "Applications",
                  "type": "directory"
                },
                {
                  "filename": "another file with spaces",
                  "type": "empty"
                },
                {
                  "filename": "argstest.py",
                  "type": "Python script text executable, ASCII text"
                },
                {
                  "filename": "blkid-p.out",
                  "type": "ASCII text"
                },
                {
                  "filename": "blkid-pi.out",
                  "type": "ASCII text, with very long lines"
                },
                {
                  "filename": "cd_catalog.xml",
                  "type": "XML 1.0 document text, ASCII text, with CRLF line terminators"
                },
                {
                  "filename": "centosserial.sh",
                  "type": "Bourne-Again shell script text executable, UTF-8 Unicode text"
                },
                ...
              ]

              ntpq command parser

Linux support for the ntpq -p command:

              $ ntpq -p | jc --ntpq -p          # or:  jc -p ntpq -p
              [
                {
                  "remote": "44.190.6.254",
                  "refid": "127.67.113.92",
                  "st": 2,
                  "t": "u",
                  "when": 1,
                  "poll": 64,
                  "reach": 1,
                  "delay": 23.399,
                  "offset": -2.805,
                  "jitter": 2.131,
                  "state": null
                },
                {
                  "remote": "mirror1.sjc02.s",
                  "refid": "216.218.254.202",
                  "st": 2,
                  "t": "u",
                  "when": 2,
                  "poll": 64,
                  "reach": 1,
                  "delay": 29.325,
                  "offset": 1.044,
                  "jitter": 4.069,
                  "state": null
                }
              ]

              timedatectl command parser

              Linux support for the timedatectl command:

              $ timedatectl | jc --timedatectl -p          # or:  jc -p timedatectl
              {
                "local_time": "Tue 2020-03-10 17:53:21 PDT",
                "universal_time": "Wed 2020-03-11 00:53:21 UTC",
                "rtc_time": "Wed 2020-03-11 00:53:21",
                "time_zone": "America/Los_Angeles (PDT, -0700)",
                "ntp_enabled": true,
                "ntp_synchronized": true,
                "rtc_in_local_tz": false,
                "dst_active": true
              }

              Updated Parsers

              No updated parsers in this release.

              Schema Changes

              There are no schema changes in this release.

              Full Parser List

              • airport -I
              • airport -s
              • arp
              • blkid
              • crontab
              • crontab-u
              • CSV
              • df
              • dig
              • du
              • env
              • file
              • free
              • fstab
              • /etc/group
              • /etc/gshadow
              • history
              • /etc/hosts
              • id
              • ifconfig
              • INI
              • iptables
              • jobs
              • last and lastb
              • ls
              • lsblk
              • lsmod
              • lsof
              • mount
              • netstat
              • ntpq
              • /etc/passwd
              • pip list
              • pip show
              • ps
              • route
              • /etc/shadow
              • ss
              • stat
              • systemctl
              • systemctl list-jobs
              • systemctl list-sockets
              • systemctl list-unit-files
              • timedatectl
              • uname -a
              • uptime
              • w
              • who
              • XML
              • YAML

              For more information on the motivations for creating jc, see my blog post.

              Happy parsing!

              Featured

              JSON Tables in the Terminal

              The other day I was looking around for a simple command-line tool to print JSON and JSON Lines data to a table in the terminal. I found a few programs that can do it with some massaging of the data, like visidata, jt, and json-table, but these really didn’t meet my requirements.

              I wanted to pipe JSON or JSON Lines data into a program and get a nicely formatted table with correct headers without any additional configuration or arguments. I also wanted it to automatically fit the terminal width and wrap or truncate the columns to fit the data with no complicated configuration. Basically, I just wanted it to “do the right thing” so I can view JSON data in a tabular format without any fuss.

              I ended up creating a little command-line utility called jtbl that does exactly that:

              $ cat cities.json | jtbl 
                LatD    LatM    LatS  NS      LonD    LonM    LonS  EW    City               State
              ------  ------  ------  ----  ------  ------  ------  ----  -----------------  -------
                  41       5      59  N         80      39       0  W     Youngstown         OH
                  42      52      48  N         97      23      23  W     Yankton            SD
                  46      35      59  N        120      30      36  W     Yakima             WA
                  42      16      12  N         71      48       0  W     Worcester          MA
                  43      37      48  N         89      46      11  W     Wisconsin Dells    WI
                  36       5      59  N         80      15       0  W     Winston-Salem      NC
                  49      52      48  N         97       9       0  W     Winnipeg           MB

jtbl is simple and elegant. It just takes in piped JSON or JSON Lines data and prints a table. There’s only one option, which truncates columns instead of wrapping them when the terminal is too narrow to display the complete table.

              $ jtbl -h
              jtbl:   Converts JSON and JSON Lines to a table
              
              Usage:  <JSON Data> | jtbl [OPTIONS]
              
                      -t  truncate data instead of wrapping if too long for the terminal width
                      -v  version info
                      -h  help

              Here’s an example using a relatively slim terminal width of 75:

              $ jc dig www.cnn.com | jq '.[].answer' | jtbl 
              ╒═════════════════╤═════════╤════════╤═══════╤═════════════════╕
              │ name            │ class   │ type   │   ttl │ data            │
              ╞═════════════════╪═════════╪════════╪═══════╪═════════════════╡
              │ www.cnn.com.    │ IN      │ CNAME  │   201 │ turner-tls.map. │
              │                 │         │        │       │ fastly.net.     │
              ├─────────────────┼─────────┼────────┼───────┼─────────────────┤
              │ turner-tls.map. │ IN      │ A      │    22 │ 151.101.189.67  │
              │ fastly.net.     │         │        │       │                 │
              ╘═════════════════╧═════════╧════════╧═══════╧═════════════════╛

              or with truncation enabled:

              $ jc dig www.cnn.com | jq '.[].answer' | jtbl -t 
              name                  class    type      ttl  data
              --------------------  -------  ------  -----  --------------------
              www.cnn.com.          IN       CNAME     219  turner-tls.map.fastl
              turner-tls.map.fastl  IN       A          10  151.101.189.67

              Here’s an example using it to print the result of an XML API query response, converted to JSON with jc, and filtered with jq:

              $ curl -X GET --basic -u "testuser:testpassword" https://reststop.randomhouse.com/resources/works/19306 | jc --xml | jq '.work' | jtbl
              ╒═════════════╤══════════╤══════════╤════════════╤══════════════╤════════════╤═════════════╤══════════════╤══════════════╤════════════╕
              │ authorweb   │ titles   │   workid │ @uri       │ onsaledate   │ series     │ titleAuth   │ titleSubti   │ titleshort   │ titleweb   │
              │             │          │          │            │              │            │             │ tleAuth      │              │            │
              ╞═════════════╪══════════╪══════════╪════════════╪══════════════╪════════════╪═════════════╪══════════════╪══════════════╪════════════╡
              │ BROWN, DAN  │          │    19306 │ https://re │ 2003-09-02   │ Robert Lan │ Angels & D  │ Angels & D   │ ANGELS & D   │ Angels & D │
              │             │          │          │ ststop.ran │ T00:00:00-   │ gdon       │ emons : Da  │ emons :  :   │ EMON(LPTP)   │ emons      │
              │             │          │          │ domhouse.c │ 04:00        │            │ n Brown     │  Dan Brown   │ (REI)(MTI)   │            │
              │             │          │          │ om/resourc │              │            │             │              │              │            │
              │             │          │          │ es/works/1 │              │            │             │              │              │            │
              │             │          │          │ 9306       │              │            │             │              │              │            │
              ╘═════════════╧══════════╧══════════╧════════════╧══════════════╧════════════╧═════════════╧══════════════╧══════════════╧════════════╛

              Again, with truncation enabled:

              $ curl -X GET --basic -u "testuser:testpassword" https://reststop.randomhouse.com/resources/works/19306 | jc --xml | jq '.work' | jtbl -t
              authorweb    titles      workid  @uri        onsaledate    series      titleAuth    titleSubti    titleshort    titleweb
              -----------  --------  --------  ----------  ------------  ----------  -----------  ------------  ------------  ----------
              BROWN, DAN                19306  https://re  2003-09-02    ROBERT LAN  Angels & D   Angels & D    ANGELS & D    Angels & D

I found that being able to quickly see JSON data in a tabular, horizontal format can sometimes help me visualize ‘where I am’ in the data more easily than scanning long vertical lists of JSON.

              I hope you enjoy it!

              Featured

              JC Version 1.8.0 Released

              Try the jc web demo!

              I’m excited to announce the release of jc version 1.8.0 available on github and pypi. See below for more information on the new features and parsers.

              To upgrade, run:

              $ pip3 install --upgrade jc

              New Parsers

              jc now includes 45 parsers! New parsers (tested on linux and OSX) include blkid, last, lastb, who, /etc/passwd files, /etc/shadow files, /etc/group files, /etc/gshadow files, and CSV files.

              Documentation and schemas for all parsers can be found here.

              blkid command parser

              Linux support for the blkid command:

              $ blkid | jc --blkid -p          # or:  jc -p blkid
              [
                {
                  "device": "/dev/sda1",
                  "uuid": "05d927ab-5875-49e4-ada1-7f46cb32c932",
                  "type": "xfs"
                },
                {
                  "device": "/dev/sda2",
                  "uuid": "3klkIj-w1kk-DkJi-0XBJ-y3i7-i2Ac-vHqWBM",
                  "type": "LVM2_member"
                },
                {
                  "device": "/dev/mapper/centos-root",
                  "uuid": "07d718ff-950c-4e5b-98f0-42a1147c77d9",
                  "type": "xfs"
                },
                {
                  "device": "/dev/mapper/centos-swap",
                  "uuid": "615eb89a-bcbf-46fd-80e3-c483ff5c931f",
                  "type": "swap"
                }
              ]
              
              $ sudo blkid -o udev -ip /dev/sda2 | jc --blkid -p          # or:  sudo jc -p blkid -o udev -ip /dev/sda2
              [
                {
                  "id_fs_uuid": "3klkIj-w1kk-DkJi-0XBJ-y3i7-i2Ac-vHqWBM",
                  "id_fs_uuid_enc": "3klkIj-w1kk-DkJi-0XBJ-y3i7-i2Ac-vHqWBM",
                  "id_fs_version": "LVM2\x20001",
                  "id_fs_type": "LVM2_member",
                  "id_fs_usage": "raid",
                  "id_iolimit_minimum_io_size": 512,
                  "id_iolimit_physical_sector_size": 512,
                  "id_iolimit_logical_sector_size": 512,
                  "id_part_entry_scheme": "dos",
                  "id_part_entry_type": "0x8e",
                  "id_part_entry_number": 2,
                  "id_part_entry_offset": 2099200,
                  "id_part_entry_size": 39843840,
                  "id_part_entry_disk": "8:0"
                }
              ]

              last and lastb command parsers

              Linux and OSX support for the last command. Linux support for the lastb command.

              $ last | jc --last -p          # or:  jc -p last
              [
                {
                  "user": "joeuser",
                  "tty": "ttys002",
                  "hostname": null,
                  "login": "Thu Feb 27 14:31",
                  "logout": "still logged in"
                },
                {
                  "user": "joeuser",
                  "tty": "ttys003",
                  "hostname": null,
                  "login": "Thu Feb 27 10:38",
                  "logout": "10:38",
                  "duration": "00:00"
                },
                {
                  "user": "joeuser",
                  "tty": "ttys003",
                  "hostname": null,
                  "login": "Thu Feb 27 10:18",
                  "logout": "10:18",
                  "duration": "00:00"
                },
                ...
              ]
              
              $ sudo lastb | jc --last -p          # or:  sudo jc -p lastb
              [
                {
                  "user": "joeuser",
                  "tty": "ssh:notty",
                  "hostname": "127.0.0.1",
                  "login": "Tue Mar 3 00:48",
                  "logout": "00:48",
                  "duration": "00:00"
                },
                {
                  "user": "joeuser",
                  "tty": "ssh:notty",
                  "hostname": "127.0.0.1",
                  "login": "Tue Mar 3 00:48",
                  "logout": "00:48",
                  "duration": "00:00"
                },
                {
                  "user": "jouser",
                  "tty": "ssh:notty",
                  "hostname": "127.0.0.1",
                  "login": "Tue Mar 3 00:48",
                  "logout": "00:48",
                  "duration": "00:00"
                }
              ]

              who command parser

              Linux and OSX support for the who command:

              $ who | jc --who -p          # or:  jc -p who
              [
                {
                  "user": "joeuser",
                  "tty": "ttyS0",
                  "time": "2020-03-02 02:52"
                },
                {
                  "user": "joeuser",
                  "tty": "pts/0",
                  "time": "2020-03-02 05:15",
                  "from": "192.168.71.1"
                }
              ]
              
              $ who -a | jc --who -p          # or:  jc -p who -a
              [
                {
                  "event": "reboot",
                  "time": "Feb 7 23:31",
                  "pid": 1
                },
                {
                  "user": "joeuser",
                  "writeable_tty": "-",
                  "tty": "console",
                  "time": "Feb 7 23:32",
                  "idle": "old",
                  "pid": 105
                },
                {
                  "user": "joeuser",
                  "writeable_tty": "+",
                  "tty": "ttys000",
                  "time": "Feb 13 16:44",
                  "idle": ".",
                  "pid": 51217,
                  "comment": "term=0 exit=0"
                },
                {
                  "user": "joeuser",
                  "writeable_tty": "?",
                  "tty": "ttys003",
                  "time": "Feb 28 08:59",
                  "idle": "01:36",
                  "pid": 41402
                },
                {
                  "user": "joeuser",
                  "writeable_tty": "+",
                  "tty": "ttys004",
                  "time": "Mar 1 16:35",
                  "idle": ".",
                  "pid": 15679,
                  "from": "192.168.1.5"
                }
              ]

              CSV File Parser

Convert generic CSV files to JSON. The parser will attempt to automatically detect the delimiter character; if it cannot, it will fall back to the comma (‘,’). The file must contain a header row as the first line:

              $ cat homes.csv 
              "Sell", "List", "Living", "Rooms", "Beds", "Baths", "Age", "Acres", "Taxes"
              142, 160, 28, 10, 5, 3,  60, 0.28,  3167
              175, 180, 18,  8, 4, 1,  12, 0.43,  4033
              129, 132, 13,  6, 3, 1,  41, 0.33,  1471
              ...
              
              $ cat homes.csv | jc --csv -p
              [
                {
                  "Sell": "142",
                  "List": "160",
                  "Living": "28",
                  "Rooms": "10",
                  "Beds": "5",
                  "Baths": "3",
                  "Age": "60",
                  "Acres": "0.28",
                  "Taxes": "3167"
                },
                {
                  "Sell": "175",
                  "List": "180",
                  "Living": "18",
                  "Rooms": "8",
                  "Beds": "4",
                  "Baths": "1",
                  "Age": "12",
                  "Acres": "0.43",
                  "Taxes": "4033"
                },
                {
                  "Sell": "129",
                  "List": "132",
                  "Living": "13",
                  "Rooms": "6",
                  "Beds": "3",
                  "Baths": "1",
                  "Age": "41",
                  "Acres": "0.33",
                  "Taxes": "1471"
                },
                ...
              ]
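
For the curious, delimiter auto-detection like this can be done with the csv.Sniffer class in Python’s standard library. Here is a sketch of the general technique (not necessarily jc’s exact implementation):

import csv

# sniff the dialect from a sample of the file, then parse the whole file with it
with open('homes.csv', newline='') as f:
    sample = f.read(1024)
    try:
        dialect = csv.Sniffer().sniff(sample)
    except csv.Error:
        dialect = 'excel'    # the default comma-delimited dialect
    f.seek(0)
    rows = list(csv.DictReader(f, dialect=dialect))

print(rows[0])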

              /etc/passwd, /etc/shadow, /etc/group, and /etc/gshadow file parsers

              Convert /etc/passwd, /etc/shadow, /etc/group, and /etc/gshadow files to JSON format:

              $ cat /etc/passwd | jc --passwd -p
              [
                {
                  "username": "nobody",
                  "password": "*",
                  "uid": -2,
                  "gid": -2,
                  "comment": "Unprivileged User",
                  "home": "/var/empty",
                  "shell": "/usr/bin/false"
                },
                {
                  "username": "root",
                  "password": "*",
                  "uid": 0,
                  "gid": 0,
                  "comment": "System Administrator",
                  "home": "/var/root",
                  "shell": "/bin/sh"
                },
                {
                  "username": "daemon",
                  "password": "*",
                  "uid": 1,
                  "gid": 1,
                  "comment": "System Services",
                  "home": "/var/root",
                  "shell": "/usr/bin/false"
                },
                ...
              ]
              
              $ sudo cat /etc/shadow | jc --shadow -p
              [
                {
                  "username": "root",
                  "password": "*",
                  "last_changed": 18113,
                  "minimum": 0,
                  "maximum": 99999,
                  "warn": 7,
                  "inactive": null,
                  "expire": null
                },
                {
                  "username": "daemon",
                  "password": "*",
                  "last_changed": 18113,
                  "minimum": 0,
                  "maximum": 99999,
                  "warn": 7,
                  "inactive": null,
                  "expire": null
                },
                {
                  "username": "bin",
                  "password": "*",
                  "last_changed": 18113,
                  "minimum": 0,
                  "maximum": 99999,
                  "warn": 7,
                  "inactive": null,
                  "expire": null
                },
                ...
              ]
              
              $ cat /etc/group | jc --group -p
              [
                {
                  "group_name": "nobody",
                  "password": "*",
                  "gid": -2,
                  "members": []
                },
                {
                  "group_name": "nogroup",
                  "password": "*",
                  "gid": -1,
                  "members": []
                },
                {
                  "group_name": "wheel",
                  "password": "*",
                  "gid": 0,
                  "members": [
                    "root"
                  ]
                },
                {
                  "group_name": "certusers",
                  "password": "*",
                  "gid": 29,
                  "members": [
                    "root",
                    "_jabber",
                    "_postfix",
                    "_cyrus",
                    "_calendar",
                    "_dovecot"
                  ]
                },
                ...
              ]
              
              $ cat /etc/gshadow | jc --gshadow -p
              [
                {
                  "group_name": "root",
                  "password": "*",
                  "administrators": [],
                  "members": []
                },
                {
                  "group_name": "adm",
                  "password": "*",
                  "administrators": [],
                  "members": [
                    "syslog",
                    "joeuser"
                  ]
                },
                ...
              ]

              Updated Parsers

              • The ls parser now supports filenames that contain newline characters when using ls -l or ls -b. A warning message will be sent to stderr if newlines are detected and ls -l or ls -b are not used:
              $ ls | jc --ls
              
              jc:  Warning - Newline characters detected. Filenames probably corrupted. Use ls -l or -b instead.
              
              [{"filename": "this file has"}, {"filename": "a newline inside"}, {"filename": "this file has"}, {"filename": "four contiguous newlines inside"}, ...]
              • The ls parser now supports multiple directory listings, globbing, and recursive listings.
              $ ls -R | jc --ls
              [{"filename": "centos-7.7"}, {"filename": "create_fixtures.sh"}, {"filename": "generic"}, {"filename": "osx-10.11.6"}, {"filename": "osx-10.14.6"}, ...]

              Alternative “Magic” Syntax

              jc now accepts a simplified syntax for most command parsers. Instead of piping the data into jc you can now also prepend “jc” to the command you would like to convert. Note that command aliases are not supported:

              $ jc dig www.example.com
              [{"id": 31113, "opcode": "QUERY", "status": "NOERROR", "flags": ["qr", "rd", "ra"], "query_num": 1, "answer_num": 1, "authority_num": 0, "additional_num": 1, "question": {"name": "www.example.com.", "class": "IN", "type": "A"}, "answer": [{"name": "www.example.com.", "class": "IN", "type": "A", "ttl": 35366, "data": "93.184.216.34"}], "query_time": 37, "server": "2600", "when": "Mon Mar 02 16:13:31 PST 2020", "rcvd": 60}]

              You can also insert jc options before the command:

              $ jc -pqd dig www.example.com
              [
                {
                  "id": 7495,
                  "opcode": "QUERY",
                  "status": "NOERROR",
                  "flags": [
                    "qr",
                    "rd",
                    "ra"
                  ],
                  "query_num": 1,
                  "answer_num": 1,
                  "authority_num": 0,
                  "additional_num": 1,
                  "question": {
                    "name": "www.example.com.",
                    "class": "IN",
                    "type": "A"
                  },
                  "answer": [
                    {
                      "name": "www.example.com.",
                      "class": "IN",
                      "type": "A",
                      "ttl": 36160,
                      "data": "93.184.216.34"
                    }
                  ],
                  "query_time": 40,
                  "server": "2600",
                  "when": "Mon Mar 02 16:15:21 PST 2020",
                  "rcvd": 60
                }
              ]

              Schema Changes

              There are no schema changes in this release.

              Full Parser List

              • arp
              • blkid
              • crontab
              • crontab-u
              • CSV
              • df
              • dig
              • du
              • env
              • free
              • fstab
              • /etc/group
              • /etc/gshadow
              • history
              • /etc/hosts
              • id
              • ifconfig
              • INI
              • iptables
              • jobs
              • last and lastb
              • ls
              • lsblk
              • lsmod
              • lsof
              • mount
              • netstat
              • /etc/passwd
              • pip list
              • pip show
              • ps
              • route
              • /etc/shadow
              • ss
              • stat
              • systemctl
              • systemctl list-jobs
              • systemctl list-sockets
              • systemctl list-unit-files
              • uname -a
              • uptime
              • w
              • who
              • XML
              • YAML

              For more information on the motivations for creating jc, see my blog post.

              Happy parsing!

              Featured

              Applying Orchestration and Choreography to Cybersecurity Automation

Imagine a world where the components of your security stack seamlessly integrate with each other, have access to the latest threat intelligence from internal and external sources, and automatically mitigate the most severe incidents. Suspicious files found in emails get sent to the closest sandbox for detonation; the hash and other IOCs are sent to endpoints, NGFWs, proxies, etc. to inoculate the organization; and all of the relevant information is sent to the SOC as an incident ticket.

Many organizations can at least do the above with a Security Orchestration Automation and Response (SOAR) platform implementation. Several vendors offer this type of Orchestration platform, including Splunk (Phantom), Palo Alto Networks (Demisto), Fortinet (Cybersponse), and IBM (Resilient). These platforms have become mainstream within the past few years, and with more and more cybersecurity professionals learning the python programming language, it has become easier to implement and customize them. In fact, no programming experience is needed at all for many use cases, since playbooks can be created and maintained with a graphical builder.

              Cybersponse Graphical Playbook Editor

I’m a big fan of using Orchestration to automate workflows with playbooks – in fact, I’ve written integrations for Phantom, Demisto, and FortiSOAR. But there is another automation paradigm that doesn’t get talked about as much in the cybersecurity realm: Choreography.

              Orchestration

So we already have an idea of what Orchestration is: a central repository of vendor integrations and associated actions that can be connected together in clever and novel ways to create playbooks. Playbooks are like scripts that run in response to incoming events or on a schedule, or can even be run manually, to automate repetitive tasks. This automation removes the human-error factor and can reduce the workload of the Security team.

              Centralized Automation with Orchestration

              The key piece about Orchestration is that it is centralized. There is typically a central server that has all of the vendor integration information and playbooks. Alarms, logs, alerts, etc. get sent to this server so it can act as the conductor and tell each security device in the stack what to do and when to do it.

              This approach has pros and cons:

              Pros:

              • Very flexible – you can make a playbook do almost anything you can think of
              • Can version control the playbooks in a central repository like git
              • Large libraries of vendor apps
• Typically have good user communities

              Cons:

              • Can be brittle if APIs change, unsupported vendors are introduced, or if there are connectivity issues to the central Orchestrator
              • Vendor lock-in to a SOAR platform / not open source
              • Can require python programming experience to onboard an unsupported security service or to create a complex playbook

              Let’s compare this to Choreography – the other, lesser-known automation paradigm available to us.

              Choreography

              Choreography? Where did that come from? Well, the concepts of Orchestration and Choreography come from the world of Service Oriented Architecture (SOA). SOA had some good ideas, but it didn’t really take off until it recently morphed and rebranded as Microservice Architecture. (Yes, this is an over-simplification for the scope of this post.)

              We almost take microservice architectures for granted now. Cloud application delivery and containerization of services are not as bleeding-edge as they were just a couple of years ago. We intuitively understand that microservices act independently yet are connected to other microservices to make up an application. The way these microservices are connected can be described as Orchestration or Choreography.

              Now we are just extending the metaphor and considering each piece of our security stack as a ‘microservice’. For example, your NGFW, sandbox, email security gateway, NAC, Intel feed, etc. are all cybersecurity microservices that need to be configured to talk to one another to enable your cybersecurity ‘application’.

              Distributed Automation with Choreography

              In the case of Choreography, each of these security ‘microservices’ (or security appliances) knows what they are supposed to do by subscribing to one or more channels on a message bus. This bus allows the service to receive alerts and IOC information in near-real-time and then publish their results on one or more channels on that same bus. It’s almost like layer 7 multicast for you router geeks out there.

              In this paradigm, there is no need for a central repository of rules or playbooks for many standard use-cases because the ‘fabric’ gets smarter as more and more different types of security services join. Unlike an orchestra, which follows the lead of the conductor, each service works independently based on its own configuration. Each service knows its own dance moves and works harmoniously in relation to the other services.
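
              To make the publish/subscribe mechanics more concrete, here is a minimal sketch of what a choreographed ‘sandbox’ service could look like, using the mosquitto MQTT command-line clients as a stand-in for the message bus. The bus hostname, channel names, and the analyze_sample step are all hypothetical – real implementations have their own client libraries and topic ontologies:

              #!/bin/bash
              # Hypothetical choreographed service: subscribe to a channel of
              # suspicious file hashes, analyze each one, and publish the verdict
              # back to the bus for other services to consume.
              mosquitto_sub -h bus.example.local -t security/suspicious-files |
              while read -r file_hash; do
                  verdict="$(analyze_sample "$file_hash")"   # hypothetical analysis step
                  mosquitto_pub -h bus.example.local -t security/ioc-results -m "$verdict"
              done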

              The Message Bus

              How does this work in the real world?

              There are a couple of examples of the Choreography approach being used in the cybersecurity realm. A proprietary implementation by Fortinet (disclaimer: I am a Fortinet employee) is called the “Security Fabric”.

              Fortinet Security Fabric

              The Fortinet Security Fabric

              Fortinet’s Security Fabric is a proprietary implementation that behaves like a message bus, learning about new Fortinet and Fabric-Ready ecosystem partner appliances and services as soon as they connect to the fabric. These services are configured to connect to the Security Fabric and take appropriate action when a security incident is identified.

              For example, after installing a FortiSandbox appliance and adding it to the Security Fabric, other Fortinet or “Fabric-Ready” partner appliances, such as the NGFW and Secure Email Gateway can send suspicious files they detect to the Security Fabric where the sandbox service is listening. The FortiSandbox, in turn, can publish the IOC results of the scans it performs to the Security Fabric so other Fortinet or Fabric-Ready partner appliances (e.g. NGFW, FortiGuard, FortiEDR) can ingest them and take appropriate action.

              This is very powerful. As more services are connected to the Security Fabric, it gets smarter, more capable, and scales – automatically.

              OpenDXL

              Another open-source, multi-vendor example of a message bus being used for cybersecurity Choreography is OpenDXL. OpenDXL was originally developed by McAfee as a security-specific message bus, but it was open-sourced under the Organization for the Advancement of Structured Information Standards (OASIS) Open Cybersecurity Alliance (OCA) project. (Disclaimer: Fortinet is a sponsor of OCA.) The project uses the message bus concept to integrate multiple cybersecurity services, drawing on well-known formats like STIX2 for its ontology.

              OpenDXL Architecture

              Some of the pros and cons of the Choreography approach:

              Pros:

              • The ‘fabric’ automatically gets smarter and more capable as more security services are connected
              • No need for dozens of boilerplate playbooks
              • Open-source and proprietary options available
              • No reliance on a central conductor – less brittle to Orchestrator outages or misconfigurations
              • Integrations “just work” together if they are part of the ecosystem

              Cons:

              • Less granular control over automation workflows
              • Open-source options are still maturing
              • Typically, no central repository for service configurations

              Which Way is the Best?

              We know that automation will improve our security operations, but which approach is best? Since Orchestration and Choreography have their own pros and cons that don’t overlap much, it probably makes sense to use both.

              Choreography can reduce the amount of boilerplate playbooks you need to bootstrap your automation initiative, while Orchestration can be used to automate higher-level business or incident response workflows.

              By applying the application architecture concepts of SOA and microservices to cybersecurity we can take security automation to the next level.

              Featured

              JC Version 1.7.1 Released

              Try the jc web demo!

              I’m happy to announce that jc version 1.7.1 has been released and is available on github and pypi. In addition to the new and updated parsers and features outlined below, some back-end code cleanup to improve performance along with minor bug fixes were completed.

              To upgrade, run:

              $ pip3 install --upgrade jc

              New Parsers

              jc now includes 37 parsers! New parsers (tested on Linux and OSX) include id, crontab-u, INI, XML, and YAML:

              id parser

              Linux and OSX support for the id command:

              $ id | jc --id -p
              {
                "uid": {
                  "id": 1000,
                  "name": "joeuser"
                },
                "gid": {
                  "id": 1000,
                  "name": "joeuser"
                },
                "groups": [
                  {
                    "id": 1000,
                    "name": "joeuser"
                  },
                  {
                    "id": 10,
                    "name": "wheel"
                  }
                ],
                "context": {
                  "user": "unconfined_u",
                  "role": "unconfined_r",
                  "type": "unconfined_t",
                  "level": "s0-s0:c0.c1023"
                }
              }
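
              Since the output is JSON, grabbing a value becomes a simple jq query. For example, something like this should list just the group names from the schema above:

              $ id | jc --id | jq -r '.groups[].name'
              joeuser
              wheel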

              crontab-u parser

              Some crontab files contain the user field. In this case, use the new crontab-u parser:

              $ cat /etc/crontab | jc --crontab-u -p
              {
                "variables": [
                  {
                    "name": "MAILTO",
                    "value": "root"
                  },
                  {
                    "name": "PATH",
                    "value": "/sbin:/bin:/usr/sbin:/usr/bin"
                  },
                  {
                    "name": "SHELL",
                    "value": "/bin/bash"
                  }
                ],
                "schedule": [
                  {
                    "minute": [
                      "5"
                    ],
                    "hour": [
                      "10-11",
                      "22"
                    ],
                    "day_of_month": [
                      "*"
                    ],
                    "month": [
                      "*"
                    ],
                    "day_of_week": [
                      "*"
                    ],
                    "user": "root",
                    "command": "/var/www/devdaily.com/bin/mk-new-links.php"
                  },
                  {
                    "minute": [
                      "30"
                    ],
                    "hour": [
                      "4/2"
                    ],
                    "day_of_month": [
                      "*"
                    ],
                    "month": [
                      "*"
                    ],
                    "day_of_week": [
                      "*"
                    ],
                    "user": "root",
                    "command": "/var/www/devdaily.com/bin/create-all-backups.sh"
                  },
                  {
                    "occurrence": "yearly",
                    "user": "root",
                    "command": "/home/maverick/bin/annual-maintenance"
                  },
                  {
                    "occurrence": "reboot",
                    "user": "root",
                    "command": "/home/cleanup"
                  },
                  {
                    "occurrence": "monthly",
                    "user": "root",
                    "command": "/home/maverick/bin/tape-backup"
                  }
                ]
              }
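
              With the schedule as structured data, a jq query sketched against the schema above can answer questions that would otherwise require fragile text munging – for example, listing all commands that run as root:

              $ cat /etc/crontab | jc --crontab-u | jq -r '.schedule[] | select(.user == "root") | .command'
              /var/www/devdaily.com/bin/mk-new-links.php
              /var/www/devdaily.com/bin/create-all-backups.sh
              /home/maverick/bin/annual-maintenance
              /home/cleanup
              /home/maverick/bin/tape-backup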

              INI file parser

              Convert generic INI files to JSON:

              $ cat example.ini
              [DEFAULT]
              ServerAliveInterval = 45
              Compression = yes
              CompressionLevel = 9
              ForwardX11 = yes
              
              [bitbucket.org]
              User = hg
              
              [topsecret.server.com]
              Port = 50022
              ForwardX11 = no
              
              $ cat example.ini | jc --ini -p
              {
                "bitbucket.org": {
                  "serveraliveinterval": "45",
                  "compression": "yes",
                  "compressionlevel": "9",
                  "forwardx11": "yes",
                  "user": "hg"
                },
                "topsecret.server.com": {
                  "serveraliveinterval": "45",
                  "compression": "yes",
                  "compressionlevel": "9",
                  "forwardx11": "no",
                  "port": "50022"
                }
              }
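
              Note that the keys are normalized to lowercase, as seen above, so a jq query against this output would look something like:

              $ cat example.ini | jc --ini | jq -r '."bitbucket.org".user'
              hg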

              XML file parser

              Convert generic XML files to JSON:

              $ cat cd_catalog.xml 
              <?xml version="1.0" encoding="UTF-8"?>
              <CATALOG>
                <CD>
                  <TITLE>Empire Burlesque</TITLE>
                  <ARTIST>Bob Dylan</ARTIST>
                  <COUNTRY>USA</COUNTRY>
                  <COMPANY>Columbia</COMPANY>
                  <PRICE>10.90</PRICE>
                  <YEAR>1985</YEAR>
                </CD>
                <CD>
                  <TITLE>Hide your heart</TITLE>
                  <ARTIST>Bonnie Tyler</ARTIST>
                  <COUNTRY>UK</COUNTRY>
                  <COMPANY>CBS Records</COMPANY>
                  <PRICE>9.90</PRICE>
                  <YEAR>1988</YEAR>
                </CD>
                ...
              
              $ cat cd_catalog.xml | jc --xml -p
              {
                "CATALOG": {
                  "CD": [
                    {
                      "TITLE": "Empire Burlesque",
                      "ARTIST": "Bob Dylan",
                      "COUNTRY": "USA",
                      "COMPANY": "Columbia",
                      "PRICE": "10.90",
                      "YEAR": "1985"
                    },
                    {
                      "TITLE": "Hide your heart",
                      "ARTIST": "Bonnie Tyler",
                      "COUNTRY": "UK",
                      "COMPANY": "CBS Records",
                      "PRICE": "9.90",
                      "YEAR": "1988"
                    },
                ...
              }
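
              And again, jq makes quick work of the converted document. For example, to pull all of the album titles from the output above:

              $ cat cd_catalog.xml | jc --xml | jq -r '.CATALOG.CD[].TITLE'
              Empire Burlesque
              Hide your heart
              ...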

              YAML file parser

              Convert YAML files to JSON – even files that contain multiple YAML documents:

              $ cat istio-mtls-permissive.yaml 
              apiVersion: "authentication.istio.io/v1alpha1"
              kind: "Policy"
              metadata:
                name: "default"
                namespace: "default"
              spec:
                peers:
                - mtls: {}
              ---
              apiVersion: "networking.istio.io/v1alpha3"
              kind: "DestinationRule"
              metadata:
                name: "default"
                namespace: "default"
              spec:
                host: "*.default.svc.cluster.local"
                trafficPolicy:
                  tls:
                    mode: ISTIO_MUTUAL
              
              $ cat istio-mtls-permissive.yaml | jc --yaml -p
              [
                {
                  "apiVersion": "authentication.istio.io/v1alpha1",
                  "kind": "Policy",
                  "metadata": {
                    "name": "default",
                    "namespace": "default"
                  },
                  "spec": {
                    "peers": [
                      {
                        "mtls": {}
                      }
                    ]
                  }
                },
                {
                  "apiVersion": "networking.istio.io/v1alpha3",
                  "kind": "DestinationRule",
                  "metadata": {
                    "name": "default",
                    "namespace": "default"
                  },
                  "spec": {
                    "host": "*.default.svc.cluster.local",
                    "trafficPolicy": {
                      "tls": {
                        "mode": "ISTIO_MUTUAL"
                      }
                    }
                  }
                }
              ]
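
              Since multi-document YAML files are converted to a JSON array, you can iterate over the documents with jq. For example, to list the kind of each document from the output above:

              $ cat istio-mtls-permissive.yaml | jc --yaml | jq -r '.[].kind'
              Policy
              DestinationRule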

              Updated Parsers

              • history parser now outputs line fields as integers
              • crontab parser bug fix for an issue that sometimes lost a row of data
              • Updated the compatibility information for du and history parsers

              __version__ Attribute Added

              Python programmers can now access the __version__ attribute on all parsers when using them as modules:

              >>> import jc.parsers.arp
              >>> print(jc.parsers.arp.__version__)
              1.1

              Added Exit Codes

              jc now exits with an error code (1) if it did not complete successfully, which makes failures easy to detect in scripts.
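
              As a quick sketch (the exact failure mode depends on the parser and input), a script can now bail out when parsing fails:

              if ! parsed=$(arp -a | jc --arp); then
                  echo "jc failed to parse the arp output" >&2
                  exit 1
              fi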

              Schema Changes

              The history parser now outputs line fields as integers:

              $ history | jc --history -p
              [
                {
                  "line": 118,
                  "command": "sleep 100"
                },
                ...
              ]
               

              Full Parser List

              • arp
              • crontab
              • crontab-u
              • df
              • dig
              • du
              • env
              • free
              • fstab
              • history
              • hosts
              • id
              • ifconfig
              • INI
              • iptables
              • jobs
              • ls
              • lsblk
              • lsmod
              • lsof
              • mount
              • netstat
              • pip list
              • pip show
              • ps
              • route
              • ss
              • stat
              • systemctl
              • systemctl list-jobs
              • systemctl list-sockets
              • systemctl list-unit-files
              • uname -a
              • uptime
              • w
              • XML
              • YAML

              For more information on the motivations for creating jc, see my blog post.

              Happy parsing!

              Featured

              Microservice Security Design Patterns for Kubernetes (Part 5)

              The Service Mesh Sidecar-on-Sidecar Pattern

              In Part 4 of my series on Microservice Security Patterns for Kubernetes we dove into the Security Sidecar Pattern and configured a working application with micro-segmentation enforcement and deep inspection for application-layer protection. The Security Sidecar Pattern is nice and clean, but what if you are running a Service Mesh like Istio with Envoy?

              For a great overview of the state of the art in Service Mesh, see this article by Guillaume Dury. He provides a nice comparison between modern Service Mesh options.

              In this post we will take the Security Sidecar Pattern from Part 4 and apply it in an Istio Service Mesh using Envoy sidecars. This is essentially a Sidecar-on-Sidecar Pattern that allows us not only to use the native encryption and segmentation capabilities of the Service Mesh, but also to layer on L7 application security against OWASP Top 10 style attacks on the microservices.

              How does the Service Mesh Sidecar-on-Sidecar Pattern work?

              It’s Sidecars All The Way Down

              As we discussed in Part 4, you can have multiple containers in a Pod. We used the modsecurity container as a sidecar to intercept HTTP requests and inspect them before forwarding them on to the microsimserver container in the same pod. But with an Istio Service Mesh, there will also be an Envoy container injected into the Pod and it will do the egress and ingress traffic interception. Can we have two sidecars in a Pod?

              The answer is yes. When Envoy is added via the sidecar injection functionality, it configures itself based on the existing Pod spec in the Deployment manifest. This means that we can use a manifest nearly identical to what we used in Part 4, and Envoy will correctly configure itself to send intercepted traffic on to the modsecurity container, which will then send the traffic to the microsimserver container.

              In this post we will be demonstrating this in action. There are surprisingly few changes that need to be made to the Security Sidecar Pattern deployment file to make this work. Also, we’ll be able to easily see how this works using the Kiali dashboard which provides visualization for the Istio Service Mesh.

              The Sidecar-on-Sidecar Pattern

              We’ll be using this deployment manifest that is nearly identical to the Security Sidecar Pattern manifest from Part 4. Here is what the design looks like:

              First we’ll enable service-to-service encryption, then strict mutual TLS (mTLS) with RBAC to provide micro-segmentation. Finally, we’ll configure the Istio Ingress Gateway so we can access the app from the public internet.

              But first, let’s just deploy the modified Sidecar Pattern manifest with a vanilla Istio configuration.

              Spinning up the Cluster in GKE

              We’ll spin up a Kubernetes cluster in GKE similar to how we did previously in Part 2, except this time we’ll use 4 nodes of the n1-standard-2 machine type instead of 3. Since we’ll be using Istio to control service-to-service traffic (East/West flows), we no longer need to check the Enable Network Policy box. Instead, we will need to check the Enable Istio (beta) box under Additional Features.

              We’ll start with setting Enable mTLS (beta) to Permissive. We will change this later via configuration files as we try out some scenarios.

              I’m not going to give a complete tutorial on setting up Istio on GKE, but I basically used the instructions documented in the following links to enable Prometheus and Grafana. I used the same approach to enable the Kiali dashboard to visualize the Service Mesh. We’ll be using the Kiali service graphs to verify the status of the application.

              Once you have Kiali enabled, you can configure port forwarding on the Service so you can browse to the dashboard using your laptop.

              Click the https://ssh.cloud.google.com/devshell/proxy?port=8080 link and then append /kiali at the end of the translated link in your browser. You should see a login screen. Use the default credentials or the ones you specified with a Kubernetes secret during setup. You should see a blank service graph:

              Make sure to check the Security checkbox under the Display menu:

              Finally, we want to enable automatic sidecar injection for the Envoy proxy by running this command within Cloud Shell:

              $ kubectl label namespace default istio-injection=enabled
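
              You can verify that the label took effect with kubectl’s -L (label columns) option; the default namespace should show istio-injection=enabled:

              $ kubectl get namespace -L istio-injection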

              Alright! Now let’s deploy the app.

              Deploying the Sidecar-on-Sidecar Manifest

              There are only a few minor differences between the sidecar.yaml manifest used in Part 4 and the istio-sidecar.yaml that we will be using for the following examples. Let’s take a look:

              Service Accounts

              apiVersion: v1
              kind: ServiceAccount
              metadata:
                name: www
              ---
              apiVersion: v1
              kind: ServiceAccount
              metadata:
                name: db
              ---
              apiVersion: v1
              kind: ServiceAccount
              metadata:
                name: auth

              First, we have added these ServiceAccount objects. This is what Istio uses to differentiate services within the mesh and affects how the certificates used in mTLS are generated. You’ll see how we bind these ServiceAccount objects to the Pods next.

              Deployments

              We’ll just take a look at the www Deployment since the same changes are required for all of the Deployments.

              apiVersion: apps/v1
              kind: Deployment
              metadata:
                name: www
              spec:
                replicas: 3
                selector:
                  matchLabels:
                    app: www
                template:
                  metadata:
                    labels:
                      app: www
                      version: v1.0       # add version
                  spec:
                    serviceAccountName: www      # add serviceAccountName
                    containers:
                    - name: modsecurity
                      image: owasp/modsecurity-crs:v3.2-modsec2-apache
                      ports:
                      - containerPort: 80
                      env:
                      - name: SETPROXY
                        value: "True"
                      - name: PROXYLOCATION
                        value: "http://127.0.0.1:8080/"
                    - name: microsimserver
                      image: kellybrazil/microsimserver
                      ports:
                      - containerPort: 8080       # add microsimserver port
                      env:
                      - name: STATS_PORT
                        value: "5000"
                    - name: microsimclient
                      image: kellybrazil/microsimclient
                      env:
                      - name: STATS_PORT
                        value: "5001"
                      - name: REQUEST_URLS
                        value: "http://auth.default.svc.cluster.local:8080/,http://db.default.svc.cluster.local:8080/"
                      - name: SEND_SQLI
                        value: "True"

              The only differences from the original sidecar.yaml are:

              • We have added a version label. Istio requires this label to be included.
              • We have associated the Pods with the appropriate serviceAccountName. This will be important for micro-segmentation later on.
              • We have added the containerPort configuration for the microsimserver containers. This is important so the Envoy proxy sidecar can configure itself properly.

              Services

              Now let’s see the minor changes to the Services. Since they are all very similar, we will just take a look at the www Service:

              apiVersion: v1
              kind: Service
              metadata:
                labels:
                  app: www
                name: www
              spec:
                # externalTrafficPolicy: Local      # remove externalTrafficPolicy
                ports:
                - port: 8080
                  targetPort: 80
                  name: http         # add port name
                selector:
                  app: www
                sessionAffinity: None
                # type: LoadBalancer          # remove LoadBalancer type

              We have removed a couple of items from the www service: externalTrafficPolicy and type. This is because the www service is no longer directly exposed to the public internet. We’ll expose it later using an Istio Ingress Gateway.

              Also, we have added the port name field. This is required so Istio can correctly configure Envoy to listen for the correct protocol and produce the correct telemetry for the inter-service traffic.

              Deploy the App

              Now let’s deploy the application using kubectl. Copy/paste the manifest to a file called istio-sidecar.yaml within Cloud Shell using vi. Then run:

              $ kubectl apply -f istio-sidecar.yaml
              serviceaccount/www created
              serviceaccount/db created
              serviceaccount/auth created
              deployment.apps/www created
              deployment.apps/auth created
              deployment.apps/db created
              service/www created
              service/auth created
              service/db created
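
              Before checking the Kiali dashboard, a quick sanity check (the Pod name below is a placeholder – use one from kubectl get pods) can confirm the Envoy sidecar was injected. Each www Pod should report four containers: modsecurity, microsimserver, microsimclient, and the injected istio-proxy:

              $ kubectl get pod <www-pod-name> -o jsonpath='{.spec.containers[*].name}'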

              After a couple of minutes you should see this within the Kiali dashboard:

              Excellent! You’ll notice the services will alternate between green and orange. This is because the www service is sending SQLi attacks to the db and auth services every so often, and those requests are being blocked by the modsecurity WAF container, which returns HTTP 403 errors.

              Voila! We have application layer security in Istio!

              But you may have noticed that there is no encryption between services enabled yet. Also, all services can talk to each other, so we don’t have proper micro-segmentation. We can illustrate that with a curl from auth to db:

              $ kubectl exec auth-cf6f45fb-9k678 -c microsimserver curl http://db:8080
              <snip>
              sufH1FhoMgvXvbPOkE3O0H3MwNAN
              Tue Jan 28 01:16:48 2020   hostname: db-55747d84d8-jlz7z   ip: 10.8.0.13   remote: 127.0.0.1   hostheader: 127.0.0.1:8080   path: /

              Let’s fix these issues.

              Encrypting the East/West Traffic

              It is fairly easy to encrypt East/West traffic using Istio. First we’ll demonstrate permissive mTLS and then we’ll advance to strict mTLS with RBAC to enforce micro-segmentation.

              Here’s what the manifest for this configuration looks like:

              apiVersion: "authentication.istio.io/v1alpha1"
              kind: "Policy"
              metadata:
                name: "default"
                namespace: "default"
              spec:
                peers:
                - mtls: {}
              ---
              apiVersion: "networking.istio.io/v1alpha3"
              kind: "DestinationRule"
              metadata:
                name: "default"
                namespace: "default"
              spec:
                host: "*.default.svc.cluster.local"
                trafficPolicy:
                  tls:
                    mode: ISTIO_MUTUAL

              The Policy manifest specifies that all Pods in the default namespace will only accept encrypted requests using TLS. The DestinationRule manifest specifies how the client-side outbound connections are handled. Here we see that connections to any services in the default namespace will use TLS (*.default.svc.cluster.local). This effectively disables plaintext traffic between services in the namespace.

              Copy/paste the manifest text to a file called istio-mtls-permissive.yaml. Then apply it with kubectl:

              $ kubectl apply -f istio-mtls-permissive.yaml
              policy.authentication.istio.io/default created
              destinationrule.networking.istio.io/default created
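
              If you would like to verify the mTLS status from the command line as well, Istio 1.1-era releases shipped an istioctl subcommand for this (it was removed in later versions, so treat the exact syntax as version-specific):

              $ istioctl authn tls-check <pod-name>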

              After 30 seconds or so you should start to see the padlocks between the services in the Kiali Dashboard indicating that the communications are encrypted. (Ensure you checked the Security checkbox under the Display drop-down)

              Nice! We have successfully encrypted traffic between our services.

              Enforcing micro-segmentation

              Even though the communications between services are now encrypted, we still don’t have effective micro-segmentation between Pods running the Envoy sidecar. We can test this again with a curl from an auth pod to a db pod:

              $ kubectl exec auth-cf6f45fb-9k678 -c microsimserver curl http://db:8080
              <snip>
              2S76Q83lFt3eplRkAHoHkqUl1PhX
              Tue Jan 28 03:47:03 2020   hostname: db-55747d84d8-9bhwx   ip: 10.8.1.5   remote: 127.0.0.1   hostheader: 127.0.0.1:8080   path: /

              And here is the connection displayed in Kiali:

              So the good news is that the connection is encrypted. The bad news is that auth shouldn’t be able to communicate with db. Let’s implement micro-segmentation.

              The first step is to enforce strict mTLS and enable Role Based Access Control (RBAC) for the default namespace. First copy/paste the manifest to a file called istio-mtls-strict.yaml with vi. Let’s take a look at the configuration:

              apiVersion: "authentication.istio.io/v1alpha1"
              kind: "Policy"
              metadata:
                name: "default"
                namespace: "default"
              spec:
                peers:
                - mtls:
                    mode: STRICT
              ---
              apiVersion: "networking.istio.io/v1alpha3"
              kind: "DestinationRule"
              metadata:
                name: "default"
                namespace: "default"
              spec:
                host: "*.default.svc.cluster.local"
                trafficPolicy:
                  tls:
                    mode: ISTIO_MUTUAL
              ---
              apiVersion: "rbac.istio.io/v1alpha1"
              kind: ClusterRbacConfig
              metadata:
                name: default
              spec:
                mode: 'ON_WITH_INCLUSION'
                inclusion:
                  namespaces: ["default"]

              The important bits here are:

              • Line 9: mode: STRICT in the Policy, which disallows any plaintext communications
              • Line 27: mode: 'ON_WITH_INCLUSION', which requires RBAC policies to be satisfied before allowing connections between services for the namespaces defined in line 29
              • Line 29: namespaces: ["default"], which are the namespaces that have the RBAC policies applied

              Let’s apply this by deleting the old config and applying the new one:

              $ kubectl delete -f istio-mtls-permissive.yaml
              policy.authentication.istio.io "default" deleted
              destinationrule.networking.istio.io "default" deleted
              
              $ kubectl apply -f istio-mtls-strict.yaml
              policy.authentication.istio.io/default created
              destinationrule.networking.istio.io/default created
              clusterrbacconfig.rbac.istio.io/default created

              Hmm… the entire application is broken now. No worries – this is expected! We did this to illustrate that policies need to be explicitly defined to allow any service-to-service (East/West) communications.

              Let’s add one service at a time to see these policies in action. Copy/paste this manifest to a file called istio-rbac-policy-test.yaml with vi:

              apiVersion: "rbac.istio.io/v1alpha1"
              kind: ServiceRole
              metadata:
                name: www-access-role
                namespace: default
              spec:
                rules:
                - services: ["db.default.svc.cluster.local"]
                  methods: ["GET", "POST"]
                  paths: ["*"]
              ---
              apiVersion: "rbac.istio.io/v1alpha1"
              kind: ServiceRoleBinding
              metadata:
                name: www-to-db
                namespace: default
              spec:
                subjects:
                - user: "cluster.local/ns/default/sa/www"
                roleRef:
                  kind: ServiceRole
                  name: "www-access-role"

              Remember those ServiceAccounts we created in the beginning? Now we are tying them to an RBAC policy. In this case we are allowing GET and POST requests to db.default.svc.cluster.local from Pods that present client certificates identifying themselves as www.

              The user field takes an entry in the form of cluster.local/ns/<namespace>/sa/<serviceAccountName>. In this case cluster.local/ns/default/sa/www refers to the www ServiceAccount we created earlier.

              Let’s apply this:

              $ kubectl apply -f istio-rbac-policy-test.yaml
              servicerole.rbac.istio.io/www-access-role created
              servicerolebinding.rbac.istio.io/www-to-db created

              It worked! www can now talk to db. Now we can fix auth by updating the policy to look like this:

              spec:
                rules:
                - services: ["db.default.svc.cluster.local", "auth.default.svc.cluster.local"]

              Let’s do that, plus allow the Istio Ingress Gateway service istio-ingressgateway-service-account to access www. This will allow public access to the service when we configure the Ingress Gateway later. Copy/paste this manifest to a file called istio-rbac-policy-final.yaml and apply it:

              $ kubectl delete -f istio-rbac-policy-test.yaml
              servicerole.rbac.istio.io "www-access-role" deleted
              servicerolebinding.rbac.istio.io "www-to-db" deleted
              
              $ kubectl apply -f istio-rbac-policy-final.yaml
              servicerole.rbac.istio.io/www-access-role created
              servicerolebinding.rbac.istio.io/www-to-db created
              servicerole.rbac.istio.io/pub-access-role created
              servicerolebinding.rbac.istio.io/pub-to-www created

              Very good! We’re back up and running. Let’s verify that micro-segmentation is in place and that requests cannot get through even by using IP addresses instead of Service names. We’ll try connecting from an auth Pod to a db Pod:

              $ kubectl exec auth-cf6f45fb-9k678 -c microsimserver curl http://db:8080
              RBAC: access denied
              
              $ kubectl exec auth-cf6f45fb-9k678 -c microsimserver curl 10.4.3.10:8080
              upstream connect error or disconnect/reset before headers. reset reason: connection termination

              Success!

              Exposing the App to the Internet

              Now that we have secured the app internally, we can expose it to the internet. If you try to visit the site now it will fail since the Istio Ingress has not been configured to forward traffic to the www service.

              In Cloud Shell, copy/paste this manifest to a file called istio-ingress.yaml with vi:

              apiVersion: networking.istio.io/v1alpha3
              kind: Gateway
              metadata:
                name: www-gateway
              spec:
                selector:
                  app: istio-ingressgateway
                  istio: ingressgateway
                  release: istio
                servers:
                - port:
                    number: 80
                    name: http2
                    protocol: HTTP2
                  hosts:
                  - "*"
              ---
              apiVersion: networking.istio.io/v1alpha3
              kind: VirtualService
              metadata:
                name: www-vservice
              spec:
                hosts:
                - "*"
                gateways:
                - www-gateway
                http:
                - match:
                  - uri:
                      prefix: "/"
                  route:
                  - destination:
                      port:
                        number: 8080
                      host: www.default.svc.cluster.local

              Here we’re telling the Istio Ingress Gateway to listen on port 80 using the HTTP2 protocol, and then we attach our www service to that gateway. We allowed the Ingress Gateway to communicate with the www service earlier via RBAC policy, so we should be good to apply this:

              $ kubectl apply -f istio-ingress.yaml
              gateway.networking.istio.io/www-gateway created
              virtualservice.networking.istio.io/www-vservice created

              Now we should be able to reach the application from the internet:

              $ kubectl get services -n istio-system
              NAME                     TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)                                                                                                                                      AGE
              grafana                  ClusterIP      10.70.12.231   <none>          3000/TCP                                                                                                                                     83m
              istio-citadel            ClusterIP      10.70.2.197    <none>          8060/TCP,15014/TCP                                                                                                                           87m
              istio-galley             ClusterIP      10.70.11.184   <none>          443/TCP,15014/TCP,9901/TCP                                                                                                                   87m
              istio-ingressgateway     LoadBalancer   10.70.10.196   34.68.212.250   15020:30100/TCP,80:31596/TCP,443:32314/TCP,31400:31500/TCP,15029:32208/TCP,15030:31368/TCP,15031:31242/TCP,15032:31373/TCP,15443:30451/TCP   87m
              istio-pilot              ClusterIP      10.70.3.210    <none>          15010/TCP,15011/TCP,8080/TCP,15014/TCP                                                                                                       87m
              istio-policy             ClusterIP      10.70.4.74     <none>          9091/TCP,15004/TCP,15014/TCP                                                                                                                 87m
              istio-sidecar-injector   ClusterIP      10.70.3.147    <none>          443/TCP                                                                                                                                      87m
              istio-telemetry          ClusterIP      10.70.10.55    <none>          9091/TCP,15004/TCP,15014/TCP,42422/TCP                                                                                                       87m
              kiali                    ClusterIP      10.70.15.2     <none>          20001/TCP                                                                                                                                    86m
              prometheus               ClusterIP      10.70.7.187    <none>          9090/TCP                                                                                                                                     84m
              promsd                   ClusterIP      10.70.8.70     <none>          9090/TCP     
              
              $ curl 34.68.212.250
              <snip>
              ja1IO2Hm2GJAqKBPao2YyccDAVrd
              Wed Jan 29 01:24:46 2020   hostname: www-74f9dc9df8-j54k4   ip: 10.4.3.9   remote: 127.0.0.1   hostheader: 127.0.0.1:8080   path: /

              Excellent! Our simple App is secured internally and exposed to the Internet.

              Conclusion

              I really enjoyed this challenge and I see great potential in using a Service Mesh along with a security sidecar proxy like modsecurity. Though, I have to say that things are changing quickly, including the best practices and configuration syntax.

              For example, in this proof of concept I used the default version of Istio that was installed on my GKE cluster (1.1.16), which already seems old since version 1.4 has deprecated the RBAC configuration I used in favor of a new style called AuthorizationPolicy. Unfortunately, this option was not available in my version of Istio, but it does look more straightforward than RBAC.

              There is a great deal more complexity in a Service Mesh deployment and troubleshooting connectivity issues can be difficult.

              One thing that would probably need to be addressed in a production environment is the Envoy proxy sidecar configuration. In my simple scenario I was getting very strange connectivity results until I exposed port 8080 on the microsimserver container in the Deployment. Without that configuration (which worked fine without Istio), Envoy didn’t properly capture all of the ports, so it was possible to bypass Envoy altogether, which meant broken micro-segmentation and a WAF bypass when connecting directly to the Pod IP address.

              There is a traffic management configuration called sidecar which allows you to fine-tune how the Envoy sidecar configures itself. Fortunately, I ended up not needing to do this in this example, though I did go through some iterations of experimenting with it to get micro-segmentation working without exposing port 8080 on the Pod.

              So in the end, the Service Mesh Sidecar-on-Sidecar Pattern may work for you, but you might end up tearing out a fair bit of your hair getting it to work in your environment.

              I’m looking forward to doing a proof of concept of the Service Mesh Security Plugin Pattern in the future, which will require compiling a custom version of Envoy that automatically filters traffic through modsecurity. I may let the versions of Istio and Envoy mature a bit before attempting that, though.

              What do you think about the Sidecar-on-Sidecar Pattern?

              Featured

              Explaining Kubernetes to a Five Year Old

              A friend of mine pointed me to a Twitter thread on how to explain Kubernetes to a five year old. Since I have a two-year-old, this immediately popped into my head.

              I’ve seen the Lonely Goatherd scene from The Sound of Music many a time – my daughter absolutely loves it. And it seems to be a fairly good explanation for Kubernetes. Hear me out:

              Stage = Kubernetes Cluster

              The stage is the Kubernetes cluster where the application is deployed. This includes the Nodes, environment, config maps, secrets, etc.

              Puppets = Containers/Pods/Microservices

              The puppets are the actual microservices made up of Pods and Containers.

              Julie Andrews = DevOps

              Julie Andrews (Maria) is the poor DevOps soul who is staving off disaster with kubectl, helm charts, APIs, etc.

              Kids = Kubernetes Scheduler

              The Kids are (mostly) doing what Julie (DevOps) is telling them to do. They are adding and removing the puppets (containers) as she has directed.

              Audience = End Users

              The Audience is the end users of the application… but let’s not kid ourselves – this app is not in production, so the audience is really QA. 🙂

              Featured

              Silly Terminal Plotting with jc, jq, and jp

              I ran across a cool little utility called jp that takes JSON input and plots bar, line, histogram, and scatterplot graphs right in the terminal. I’m always looking for ways to hone my jq skills so I found some time to play around with it.

              I figured it would be fun to plot some system stats like CPU and memory utilization in the terminal, so I started with some simple plots piping jc and jp together. For example, here’s a bar graph of the output of df:

              df | jc --df | jp -type bar -canvas full-escape -x ..filesystem -y ..used

              Not super useful, but we’re just having fun here! How about graphing the relative sizes of files in a directory using ls?

              ls -l Documents/lab\ license/ | jc --ls | jp -type bar -canvas full-escape -x ..filename -y ..size

              Not bad! Let’s get a little fancier by filtering results through jq. We’ll plot the output of ps to see the CPU utilization of processes with more than .5% CPU utilization:

              ps axu | jc --ps | jq '[.[] | select (.cpu_percent > 0.5)]' | jp -type bar -canvas full-escape -x ..pid -y ..cpu_percent

              That’s a nice static bar chart of the most active PIDs on the system. But we can do better. Let’s make the graph dynamic by enclosing the above in a while true loop:

              while true; do ps axu | jc --ps | jq '[.[] | select (.cpu_percent > 0.5)]' | jp -type bar -canvas full-escape -x ..pid -y ..cpu_percent; sleep 3; done

              Fancy! Of course we could have plotted mem_percent instead to plot memory utilization by PID. By the way, I made the animated GIF above using ttyrec and ttygif.

              Ok, one last dynamic graph. This time, let’s track system load over time using the output of uptime. To pull this off we’ll need to keep a history of load values over time, so we’ll move from a one-liner to a small bash script:

              #!/bin/bash
              
              rm -f /tmp/load.json    # -f avoids an error on the first run when the file doesn't exist
              SECONDS=0               # reset the bash builtin that counts elapsed seconds
              
              while true; do
              
                  # append one {seconds, load} JSON object per iteration
                  uptime | jc --uptime | jq --arg sec "$SECONDS" '{"seconds": $sec | tonumber, "load": .load_1m}' >> /tmp/load.json
                  # slurp the JSON lines into an array and plot the full history
                  cat /tmp/load.json | jq -s . | jp -canvas full-escape -x ..seconds -y ..load
                  sleep 2
              
              done

              Fun! We got to do a couple of neat things with jq here.

              We pulled in the uptime output converted to JSON with jc and rebuilt the JSON to use only the load_1m value and the SECONDS shell variable. We used tonumber to convert the SECONDS value into a number that could be plotted by jp. We redirected the output to a temporary file called /tmp/load.json so jp can read it later and build out the line graph.

              I know, I know – I’m piping cat output into jq but I just wanted to make the script readable. The interesting thing here is that we are using the -s or “slurp” option of jq, which essentially reformats the JSON lines output in /tmp/load.json into a proper JSON array so jp can consume it.

              By the way, the graphs animate a little nicer in real life since you don’t get the artificial delay between frames you see in the animated GIF.

              I thought that was pretty fun and I got to try a couple different things in jq I haven’t tried before. Happy JSON plotting!

              Featured

              Microservice Security Design Patterns for Kubernetes (Part 4)

              The Security Sidecar Pattern

              In Part 3 of my series on Microservice Security Patterns for Kubernetes we dove into the Security Service Layer Pattern and configured a working application with micro-segmentation enforcement and deep inspection for application-layer protection. We were able to secure the application with that configuration, but, as we saw, the micro-segmentation configuration can get a bit unwieldy when you have more than a couple services.

              In this post we’ll configure a Security Sidecar Pattern which will provide the same level of security but with a simpler configuration. I really like the Security Sidecar Pattern because it tightly couples the application security layer with the application without requiring any changes to the application.

              This also means you can scale the application and your security together, so you don’t have to worry about scaling the security layer separately as your application needs grow. The only downside to this is that the application security layer (we’ll be using the Modsecurity WAF) may be overprovisioned and could waste cluster resources if not kept in check.

              Let’s find out how the Security Sidecar Pattern works.

              Sidecar where art thou?

              One of the really cool things about Kubernetes is that the smallest workload unit is a Pod, and a Pod can be made up of multiple containers. Even better, these containers share the loopback network interface (127.0.0.1). This means you can communicate between containers using normal network protocols without needing to expose these ports to the rest of the cluster.

              In practice, what this means is that you can deploy a reverse proxy, such as the one we have been using in Part 3, but instead of setting the origin server to the Kubernetes cluster DNS name of the service, you can just use localhost or 127.0.0.1. Pretty neat!
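
              You can see the shared loopback in action once the Pods are up. For example (the Pod name is a placeholder, and this assumes curl is available in the modsecurity image), a request to 127.0.0.1:8080 from inside the modsecurity container reaches the microsimserver container in the same Pod:

              $ kubectl exec <www-pod-name> -c modsecurity curl http://127.0.0.1:8080/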

              Sidecar Injection

              Another cool thing about Pods is that there are multiple ways to define the containers that make up the Pod. In the most basic scenario (and the one we will be deploying in this post) you can simply define the application and the WAF container manually in the Deployment YAML.

              But there are fancier ways to automatically inject a sidecar container, like the WAF, by using Mutating Webhooks. Some examples of how this can be done can be found here and here. The nice thing about automatic sidecar injection is that the developers or DevOps team can define their Deployment YAML per usual and the sidecar will be injected without them needing to change their process. Automatic application layer protection!

              One more thing about automatic sidecar injection – this is how the Envoy dataplane proxy sidecar is typically injected in an Istio Service Mesh deployment. Istio has its own sidecar injection service, but you can also manually configure the Envoy sidecar if you would like.

              The Security Sidecar Pattern

              Let’s dive in and see how to configure the Security Sidecar Pattern. We will be using the same application that we set up in Part 2, so go ahead and take a look there to refresh your memory on how things are set up. Here is the diagram:

              Figure 1: Insecure Application

              As demonstrated before, all microsim services can communicate with each other and there is no deep inspection implemented to block application layer attacks like SQLi. In this post, we will be implementing this sidecar.yaml deployment that adds modsecurity reverse proxy WAF containers with the Core Rule Set as sidecars in front of the microsim services. modsecurity will perform deep inspection on the JSON/HTTP traffic and block application layer attacks.

              Then we will add on a Kubernetes Network Policy to enforce segmentation between the services.

              Security Sidecar Pattern Deployment Spec

              We’ll immediately notice how much smaller and simpler the Security Sidecar Pattern configuration is compared to the Security Service Layer Pattern. We went from 238 lines of configuration down to 142!

              Instead of creating separate security deployments and services to secure the application like we did in the Security Service Layer Pattern, we will simply add the WAF container to the same Pod as the application. We will need to make sure the WAF and the application listen on different TCP ports, since they share the loopback interface and two services cannot bind to the same port on it.

              In this case, the WAF becomes the front end: it listens on behalf of the application and forwards the clean, inspected traffic to the application via the loopback interface. We will only need to expose the WAF listening port to the cluster. Since we don’t want to allow bypassing the WAF, we no longer expose the application port directly.

              Note: Container TCP and UDP ports are still accessible via IP within the Kubernetes cluster even if they are not explicitly configured in the deployment YAML via containerPort configuration. To completely lock down direct access to the application TCP port so the WAF cannot be bypassed we will need to configure Network Policy.

              Figure 2: Security Sidecar Pattern

              Let’s take a closer look at the spec.

              www Deployment

              apiVersion: apps/v1
              kind: Deployment
              metadata:
                name: www
              spec:
                replicas: 3
                selector:
                  matchLabels:
                    app: www
                template:
                  metadata:
                    labels:
                      app: www
                  spec:
                    containers:
                    - name: modsecurity
                      image: owasp/modsecurity-crs:v3.2-modsec2-apache
                      ports:
                      - containerPort: 80
                      env:
                      - name: SETPROXY
                        value: "True"
                      - name: PROXYLOCATION
                        value: "http://127.0.0.1:8080/"
                    - name: microsimserver
                      image: kellybrazil/microsimserver
                      env:
                      - name: STATS_PORT
                        value: "5000"
                    - name: microsimclient
                      image: kellybrazil/microsimclient
                      env:
                      - name: STATS_PORT
                        value: "5001"
                      - name: REQUEST_URLS
                        value: "http://auth.default.svc.cluster.local:8080/,http://db.default.svc.cluster.local:8080/"
                      - name: SEND_SQLI
                        value: "True"

              Here we have three replicas of the www Pod. Each Pod includes the official OWASP modsecurity container available on Docker Hub, configured as a reverse proxy WAF listening on TCP port 80. The microsimserver application container listening on TCP port 8080 remains unchanged. Note that it is important that the services listen on different ports since they share the same loopback interface in the Pod.

              All requests that go to the WAF containers will be inspected and proxied to the microsimserver application container within the same Pod at http://127.0.0.1:8080/.

              These WAF containers are effectively impersonating the original service so the user or application does not need to modify its configuration. One nice thing about this design is that it allows you to scale the security layer along with the application, so as you scale up the application, security scales along with it automatically.

              The microsimclient container configuration remains unchanged from the original, which is nice. This shows that you can implement the Security Sidecar Pattern with little to no application logic changes if you are careful about how you set up the ports.

              Now, let’s take a look at the www Service that points to this deployment.

              www Service

              apiVersion: v1
              kind: Service
              metadata:
                labels:
                  app: www
                name: www
              spec:
                externalTrafficPolicy: Local
                ports:
                - port: 8080
                  targetPort: 80
                selector:
                  app: www
                sessionAffinity: None
                type: LoadBalancer

              Here we are just forwarding TCP port 8080 application traffic to TCP port 80 on the www Pods since that is the port the modsecurity reverse proxy containers listen on. Since this is an externally facing service we are using type: LoadBalancer and externalTrafficPolicy: Local just like the original Service did.

              Next we’ll take a look at the internal microservices. Since the auth and db deployments and services are configured identically we’ll just go over the db configuration.

              db Deployment

              apiVersion: apps/v1
              kind: Deployment
              metadata:
                name: db
              spec:
                replicas: 3
                selector:
                  matchLabels:
                    app: db
                template:
                  metadata:
                    labels:
                      app: db
                  spec:
                    containers:
                    - name: modsecurity
                      image: owasp/modsecurity-crs:v3.2-modsec2-apache
                      ports:
                      - containerPort: 80
                      env:
                      - name: SETPROXY
                        value: "True"
                      - name: PROXYLOCATION
                        value: "http://127.0.0.1:8080/"
                    - name: microsimserver
                      image: kellybrazil/microsimserver
                      env:
                      - name: STATS_PORT
                        value: "5000"

              Again, we have just added the modsecurity WAF container to the Pod listening on TCP Port 80. Since this is different from the listening port of the microsimserver container, we are good to go without any changes to the app. Just like in the www Deployment, we have configured the modsecurity reverse proxy to send inspected traffic locally within the Pod to http://127.0.0.1:8080/.

              Note that even though we aren’t explicitly configuring the microsimserver TCP port 8080 via containerPort in the Deployment spec, this port is still technically available on the cluster via direct IP access. To fully lock down connectivity, we will be using Network Policy later on.
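
              You can see this for yourself with something like the following (the Pod name and IP are placeholders; grab real values with kubectl get pods -o wide):

              # Find a db Pod IP, then curl it directly from another Pod,
              # bypassing the db Service (and therefore the WAF):
              $ kubectl get pods -o wide -l app=db
              $ kubectl exec <a-www-pod> -c microsimclient -- curl -s http://<db-pod-ip>:8080/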

              db Service

              apiVersion: v1
              kind: Service
              metadata:
                labels:
                  app: db
                name: db
              spec:
                ports:
                - port: 8080
                  targetPort: 80
                selector:
                  app: db
                sessionAffinity: None

              Nothing fancy here – just listening on TCP port 8080 and forwarding to port 80, which is what the modsecurity WAF containers listen on. This is an internal service so no need for type: LoadBalancer or externalTrafficPolicy: Local.

              Now that we understand how the Deployment and Service specs work, let’s apply them on our Kubernetes cluster.

              See Part 2 for more information on setting up the cluster.

              Applying the Deployments and Services

              First, let’s delete the original insecure deployment in Cloud Shell if it is still running:

              $ kubectl delete -f simple.yaml

              Your Pods, Deployments, and Services should be empty before you proceed:

              $ kubectl get pods
              No resources found.
              $ kubectl get deploy
              No resources found.
              $ kubectl get services
              NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
              kubernetes   ClusterIP   10.12.0.1    <none>        443/TCP   3m46s

              Next, copy/paste the deployment text into a file called sidecar.yaml using vi. Then apply the deployment with kubectl:

              $ kubectl create -f sidecar.yaml
              deployment.apps/www created
              deployment.apps/auth created
              deployment.apps/db created
              service/www created
              service/auth created
              service/db created

              Testing the Deployment

              Once the www service has an external IP, you can send an HTTP GET or POST request to it from Cloud Shell or your laptop:

              $ kubectl get services
              NAME         TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)          AGE
              auth         ClusterIP      10.12.7.96    <none>          8080/TCP         90m
              db           ClusterIP      10.12.8.118   <none>          8080/TCP         90m
              kubernetes   ClusterIP      10.12.0.1     <none>          443/TCP          93m
              www          LoadBalancer   10.12.14.67   35.238.35.208   8080:32032/TCP   90m
              $ curl 35.238.35.208:8080
              ...vME2NtSGaTBnt2zsprKdes5KKXCCAG9pk0yUr4K
              Thu Jan  9 22:09:27 2020   hostname: www-5bfc744996-tdzsk   ip: 10.8.2.3   remote: 127.0.0.1   hostheader: 127.0.0.1:8080   path: /

              The originating IP address is now the IP address of the local WAF in the Pod that handled the request (always 127.0.0.1, since it is a sidecar). Since the WAF is deployed as a reverse proxy, the only way to get the originating IP information will be via HTTP headers, such as X-Forwarded-For (XFF). Also, the host header has now changed, so keep this in mind if the application is expecting certain values in the headers.

              We can do a quick check to see if the modsecurity WAF is inspecting traffic by sending an HTTP POST request with no data or size information to the external IP. This will be seen as an anomalous request and blocked:

              $ curl -X POST 35.238.35.208:8080
              <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
              <html><head>
              <title>403 Forbidden</title>
              </head><body>
              <h1>Forbidden</h1>
              <p>You don't have permission to access /
              on this server.<br />
              </p>
              </body></html>

              Excellent! Now let’s take a look at the microsim stats to see if the WAF layers are blocking the East/West SQLi attacks. Let’s open two tabs in Cloud Shell: one for shell access to a www microsimclient container and another for shell access to a db microsimserver container.

              In the first tab, use kubectl to find the name of one of the www pods and shell into the microsimclient container running in it:

              $ kubectl get pods
              NAME                    READY   STATUS    RESTARTS   AGE
              auth-7559599f89-d8tnw   2/2     Running   0          102m
              auth-7559599f89-k8qht   2/2     Running   0          102m
              auth-7559599f89-wfbp4   2/2     Running   0          102m
              db-59f8d84df-4kbvg      2/2     Running   0          102m
              db-59f8d84df-5csh8      2/2     Running   0          102m
              db-59f8d84df-ncksp      2/2     Running   0          102m
              www-5bfc744996-6jbr7    3/3     Running   0          102m
              www-5bfc744996-bgh9h    3/3     Running   0          102m
              www-5bfc744996-tdzsk    3/3     Running   0          102m
              $ kubectl exec www-5bfc744996-6jbr7 -c microsimclient -it sh
              /app #

              Then curl to the microsimclient stats server on localhost:5001:

              /app # curl localhost:5001
              {
                "time": "Thu Jan  9 22:23:25 2020",
                "runtime": 6349,
                "hostname": "www-5bfc744996-6jbr7",
                "ip": "10.8.0.4",
                "stats": {
                  "Requests": 6320,
                  "Sent Bytes": 6547520,
                  "Received Bytes": 112275897,
                  "Internet Requests": 0,
                  "Attacks": 64,
                  "SQLi": 64,
                  "XSS": 0,
                  "Directory Traversal": 0,
                  "DGA": 0,
                  "Malware": 0,
                  "Error": 0
                },
                "config": {
                  "STATS_PORT": 5001,
                  "STATSD_HOST": null,
                  "STATSD_PORT": 8125,
                  "REQUEST_URLS": "http://auth.default.svc.cluster.local:8080/,http://db.default.svc.cluster.local:8080/",
                  "REQUEST_INTERNET": false,
                  "REQUEST_MALWARE": false,
                  "SEND_SQLI": true,
                  "SEND_DIR_TRAVERSAL": false,
                  "SEND_XSS": false,
                  "SEND_DGA": false,
                  "REQUEST_WAIT_SECONDS": 1.0,
                  "REQUEST_BYTES": 1024,
                  "STOP_SECONDS": 0,
                  "STOP_PADDING": false,
                  "TOTAL_STOP_SECONDS": 0,
                  "REQUEST_PROBABILITY": 1.0,
                  "EGRESS_PROBABILITY": 0.1,
                  "ATTACK_PROBABILITY": 0.01
                }
              }

              Here we see 64 SQLi attacks have been sent to the auth and db services in the last 6349 seconds.
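
              Since the stats are just JSON, jq makes quick work of them. As a side note, something like this from Cloud Shell (which should have jq available) gives the average attack rate, reusing the Pod name we found above:

              # Average SQLi attacks per second since the container started:
              $ kubectl exec www-5bfc744996-6jbr7 -c microsimclient -- curl -s localhost:5001 \
                  | jq '.stats.Attacks / .runtime'
              # => roughly 0.01 (64 attacks / 6349 seconds)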

              Now, let’s see if the attacks are getting through like they did in the insecure deployment. In the other tab, find the name of one of the db pods and shell into the microsimserver container running in it:

              $ kubectl exec db-59f8d84df-4kbvg -c microsimserver -it sh
              /app #
              /app # curl localhost:5000
              {
                "time": "Thu Jan  9 22:39:30 2020",
                "runtime": 7316,
                "hostname": "db-59f8d84df-4kbvg",
                "ip": "10.8.0.5",
                "stats": {
                  "Requests": 3659,
                  "Sent Bytes": 60563768,
                  "Received Bytes": 3790724,
                  "Attacks": 0,
                  "SQLi": 0,
                  "XSS": 0,
                  "Directory Traversal": 0
                },
                "config": {
                  "LISTEN_PORT": 8080,
                  "STATS_PORT": 5000,
                  "STATSD_HOST": null,
                  "STATSD_PORT": 8125,
                  "RESPOND_BYTES": 16384,
                  "STOP_SECONDS": 0,
                  "STOP_PADDING": false,
                  "TOTAL_STOP_SECONDS": 0
                }
              }

              In the insecure deployment we saw the SQLi value incrementing. Now that the modsecurity WAF is inspecting the East/West traffic, the SQLi attacks are no longer getting through, though we still see normal Requests, Sent Bytes, and Received Bytes incrementing.

              modsecurity Logs

              Now, let’s check the modsecurity logs to see how the East/West application attacks are being identified. To see the modsecurity audit log we’ll need to shell into one of the WAF containers and look at the /var/log/modsec_audit.log file:

              $ kubectl exec db-59f8d84df-4kbvg -c modsecurity -it sh
              # grep -C 60 sql /var/log/modsec_audit.log
              <snip>
              --a05a312e-A--
              [09/Jan/2020:23:41:46 +0000] Xhe6OmUpgBRl4hgX8QIcmAAAAIE 10.8.0.4 50990 10.8.0.5 80
              --a05a312e-B--
              GET /?username=joe%40example.com&password=%3BUNION+SELECT+1%2C+version%28%29+limit+1%2C1-- HTTP/1.1
              Host: db.default.svc.cluster.local:8080
              User-Agent: python-requests/2.22.0
              Accept-Encoding: gzip, deflate
              Accept: */*
              Connection: keep-alive
              
              --a05a312e-F--
              HTTP/1.1 403 Forbidden
              Content-Length: 209
              Keep-Alive: timeout=5, max=100
              Connection: Keep-Alive
              Content-Type: text/html; charset=iso-8859-1
              
              --a05a312e-E--
              <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
              <html><head>
              <title>403 Forbidden</title>
              </head><body>
              <h1>Forbidden</h1>
              <p>You don't have permission to access /
              on this server.<br />
              </p>
              </body></html>
              
              --a05a312e-H--
              Message: Warning. Pattern match "(?i:(?:[\"'`](?:;?\\s*?(?:having|select|union)\\b\\s*?[^\\s]|\\s*?!\\s*?[\"'`\\w])|(?:c(?:onnection_id|urrent_user)|database)\\s*?\\([^\\)]*?|u(?:nion(?:[\\w(\\s]*?select| select @)|ser\\s*?\\([^\\)]*?)|s(?:chema\\s*?\\([^\\)]*?|elect.*?\\w?user\\()|in ..." at ARGS:password. [file "/etc/modsecurity.d/owasp-crs/rules/REQUEST-942-APPLICATION-ATTACK-SQLI.conf"] [line "190"] [id "942190"] [msg "Detects MSSQL code execution and information gathering attempts"] [data "Matched Data: UNION SELECT found within ARGS:password: ;UNION SELECT 1, version() limit 1,1--"] [severity "CRITICAL"] [ver "OWASP_CRS/3.2.0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-sqli"] [tag "OWASP_CRS"] [tag "OWASP_CRS/WEB_ATTACK/SQL_INJECTION"] [tag "WASCTC/WASC-19"] [tag "OWASP_TOP_10/A1"] [tag "OWASP_AppSensor/CIE1"] [tag "PCI/6.5.2"]
              Message: Warning. Pattern match "(?i:(?:^[\\W\\d]+\\s*?(?:alter\\s*(?:a(?:(?:pplication\\s*rol|ggregat)e|s(?:ymmetric\\s*ke|sembl)y|u(?:thorization|dit)|vailability\\s*group)|c(?:r(?:yptographic\\s*provider|edential)|o(?:l(?:latio|um)|nversio)n|ertificate|luster)|s(?:e(?:rv(?:ice|er)| ..." at ARGS:password. [file "/etc/modsecurity.d/owasp-crs/rules/REQUEST-942-APPLICATION-ATTACK-SQLI.conf"] [line "471"] [id "942360"] [msg "Detects concatenated basic SQL injection and SQLLFI attempts"] [data "Matched Data: ;UNION SELECT found within ARGS:password: ;UNION SELECT 1, version() limit 1,1--"] [severity "CRITICAL"] [ver "OWASP_CRS/3.2.0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-sqli"] [tag "OWASP_CRS"] [tag "OWASP_CRS/WEB_ATTACK/SQL_INJECTION"] [tag "WASCTC/WASC-19"] [tag "OWASP_TOP_10/A1"] [tag "OWASP_AppSensor/CIE1"] [tag "PCI/6.5.2"]
              Message: Access denied with code 403 (phase 2). Operator GE matched 5 at TX:anomaly_score. [file "/etc/modsecurity.d/owasp-crs/rules/REQUEST-949-BLOCKING-EVALUATION.conf"] [line "91"] [id "949110"] [msg "Inbound Anomaly Score Exceeded (Total Score: 10)"] [severity "CRITICAL"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-generic"]
              Message: Warning. Operator GE matched 5 at TX:inbound_anomaly_score. [file "/etc/modsecurity.d/owasp-crs/rules/RESPONSE-980-CORRELATION.conf"] [line "86"] [id "980130"] [msg "Inbound Anomaly Score Exceeded (Total Inbound Score: 10 - SQLI=10,XSS=0,RFI=0,LFI=0,RCE=0,PHPI=0,HTTP=0,SESS=0): individual paranoia level scores: 10, 0, 0, 0"] [tag "event-correlation"]
              Apache-Error: [file "apache2_util.c"] [line 273] [level 3] [client 10.8.0.4] ModSecurity: Warning. Pattern match "(?i:(?:[\\\\"'`](?:;?\\\\\\\\s*?(?:having|select|union)\\\\\\\\b\\\\\\\\s*?[^\\\\\\\\s]|\\\\\\\\s*?!\\\\\\\\s*?[\\\\"'`\\\\\\\\w])|(?:c(?:onnection_id|urrent_user)|database)\\\\\\\\s*?\\\\\\\\([^\\\\\\\\)]*?|u(?:nion(?:[\\\\\\\\w(\\\\\\\\s]*?select| select @)|ser\\\\\\\\s*?\\\\\\\\([^\\\\\\\\)]*?)|s(?:chema\\\\\\\\s*?\\\\\\\\([^\\\\\\\\)]*?|elect.*?\\\\\\\\w?user\\\\\\\\()|in ..." at ARGS:password. [file "/etc/modsecurity.d/owasp-crs/rules/REQUEST-942-APPLICATION-ATTACK-SQLI.conf"] [line "190"] [id "942190"] [msg "Detects MSSQL code execution and information gathering attempts"] [data "Matched Data: UNION SELECT found within ARGS:password: ;UNION SELECT 1, version() limit 1,1--"] [severity "CRITICAL"] [ver "OWASP_CRS/3.2.0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-sqli"] [tag "OWASP_CRS"] [tag "OWASP_CRS/WEB_ATTACK/SQL_INJECTION"] [tag "WASCTC/WASC-19"] [tag "OWASP_TOP_10/A1"] [tag "OWASP_AppSensor/CIE1"] [tag "PCI/6.5.2"] [hostname "db.default.svc.cluster.local"] [uri "/"] [unique_id "Xhe6OmUpgBRl4hgX8QIcmAAAAIE"]
              Apache-Error: [file "apache2_util.c"] [line 273] [level 3] [client 10.8.0.4] ModSecurity: Warning. Pattern match "(?i:(?:^[\\\\\\\\W\\\\\\\\d]+\\\\\\\\s*?(?:alter\\\\\\\\s*(?:a(?:(?:pplication\\\\\\\\s*rol|ggregat)e|s(?:ymmetric\\\\\\\\s*ke|sembl)y|u(?:thorization|dit)|vailability\\\\\\\\s*group)|c(?:r(?:yptographic\\\\\\\\s*provider|edential)|o(?:l(?:latio|um)|nversio)n|ertificate|luster)|s(?:e(?:rv(?:ice|er)| ..." at ARGS:password. [file "/etc/modsecurity.d/owasp-crs/rules/REQUEST-942-APPLICATION-ATTACK-SQLI.conf"] [line "471"] [id "942360"] [msg "Detects concatenated basic SQL injection and SQLLFI attempts"] [data "Matched Data: ;UNION SELECT found within ARGS:password: ;UNION SELECT 1, version() limit 1,1--"] [severity "CRITICAL"] [ver "OWASP_CRS/3.2.0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-sqli"] [tag "OWASP_CRS"] [tag "OWASP_CRS/WEB_ATTACK/SQL_INJECTION"] [tag "WASCTC/WASC-19"] [tag "OWASP_TOP_10/A1"] [tag "OWASP_AppSensor/CIE1"] [tag "PCI/6.5.2"] [hostname "db.default.svc.cluster.local"] [uri "/"] [unique_id "Xhe6OmUpgBRl4hgX8QIcmAAAAIE"]
              Apache-Error: [file "apache2_util.c"] [line 273] [level 3] [client 10.8.0.4] ModSecurity: Access denied with code 403 (phase 2). Operator GE matched 5 at TX:anomaly_score. [file "/etc/modsecurity.d/owasp-crs/rules/REQUEST-949-BLOCKING-EVALUATION.conf"] [line "91"] [id "949110"] [msg "Inbound Anomaly Score Exceeded (Total Score: 10)"] [severity "CRITICAL"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-generic"] [hostname "db.default.svc.cluster.local"] [uri "/"] [unique_id "Xhe6OmUpgBRl4hgX8QIcmAAAAIE"]
              Apache-Error: [file "apache2_util.c"] [line 273] [level 3] [client 10.8.0.4] ModSecurity: Warning. Operator GE matched 5 at TX:inbound_anomaly_score. [file "/etc/modsecurity.d/owasp-crs/rules/RESPONSE-980-CORRELATION.conf"] [line "86"] [id "980130"] [msg "Inbound Anomaly Score Exceeded (Total Inbound Score: 10 - SQLI=10,XSS=0,RFI=0,LFI=0,RCE=0,PHPI=0,HTTP=0,SESS=0): individual paranoia level scores: 10, 0, 0, 0"] [tag "event-correlation"] [hostname "db.default.svc.cluster.local"] [uri "/"] [unique_id "Xhe6OmUpgBRl4hgX8QIcmAAAAIE"]
              Action: Intercepted (phase 2)
              Apache-Handler: proxy-server
              Stopwatch: 1578613306195047 3522 (- - -)
              Stopwatch2: 1578613306195047 3522; combined=2944, p1=904, p2=1734, p3=0, p4=0, p5=306, sr=353, sw=0, l=0, gc=0
              Response-Body-Transformed: Dechunked
              Producer: ModSecurity for Apache/2.9.3 (http://www.modsecurity.org/); OWASP_CRS/3.2.0.
              Server: Apache
              Engine-Mode: "ENABLED"
              
              --a05a312e-Z--

              Here we see modsecurity has blocked and logged the East/West SQLi attack from one of the www Pods to a db Pod. Sweet!

              Yet, we’re still not done. Even though we are now inspecting and protecting traffic at the application layer, we are not yet enforcing micro-segmentation between the services. That means that, even with the WAFs in place, any auth Pod can communicate with any db Pod. We can demonstrate this by opening a shell on any auth microsimserver container and attempting to send a request to a db Pod from it:

              /app # curl 'http://db:8080'
              ...JsHT4A8GK8H0Am47jSG7MppM3o7BOlTrRZl4EEA9bNzsjND
              Thu Jan  9 23:57:54 2020   hostname: db-59f8d84df-5csh8   ip: 10.8.2.5   remote: 127.0.0.1   hostheader: 127.0.0.1:8080   path: /

              Even worse, if I know the IP address of the db Pod, I can bypass the WAF entirely and send a successful SQLi attack:

              /app # curl 'http://10.8.2.5:8080/?username=joe%40example.com&password=%3BUNION+SELECT+1%2C+version%28%29+limit+1%2C1--'
              ...7Z7Kw2JxEgXipBnDZyyoZI4TK3RswBuZ509y2WY1wJTsERJFoRW6ZYY1QiA
              Fri Jan 10 00:01:37 2020   hostname: db-59f8d84df-5csh8   ip: 10.8.2.5   remote: 10.8.2.4   hostheader: 10.8.2.5:8080   path: /?username=joe%40example.com&password=%3BUNION+SELECT+1%2C+version%28%29+limit+1%2C1--

              Not good! Now, let’s add Network Policy to provide micro-segmentation and button this thing up.

              Adding Micro-segmentation

              Here is a simple Network Policy spec that will control the ingress to each internal service. I tried to keep the rules simple, but in a production deployment a tighter policy would likely be desired. For example, you would probably also want to include Egress policies (there is a quick sketch of one after the policy explanation below).

              apiVersion: networking.k8s.io/v1
              kind: NetworkPolicy
              metadata:
                name: auth-ingress
                namespace: default
              spec:
                podSelector:
                  matchLabels:
                    app: auth
                policyTypes:
                - Ingress
                ingress:
                - from:
                  - podSelector:
                      matchLabels:
                        app: www
                  ports:
                  - protocol: TCP
                    port: 80
              ---
              apiVersion: networking.k8s.io/v1
              kind: NetworkPolicy
              metadata:
                name: db-ingress
                namespace: default
              spec:
                podSelector:
                  matchLabels:
                    app: db
                policyTypes:
                - Ingress
                ingress:
                - from:
                  - podSelector:
                      matchLabels:
                        app: www
                  ports:
                  - protocol: TCP
                    port: 80

              Another big difference here is the simplicity of the Network Policy when compared to the Security Service Layer Pattern. We went from 104 lines of configuration down to 39.

              This policy says:

              • On the auth Pods, only accept traffic from the www Pods that is destined to TCP port 80
              • On the db Pods, only accept traffic from the www Pods that is destined to TCP port 80
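
              As mentioned above, a production deployment would probably add Egress rules as well. Here is a minimal sketch of what a default-deny egress policy for the db Pods might look like (my illustration, not part of the deployment; anything that needs to resolve service names would also need an egress rule allowing DNS). Applying it inline keeps it a one-liner:

              $ cat <<'EOF' | kubectl apply -f -
              apiVersion: networking.k8s.io/v1
              kind: NetworkPolicy
              metadata:
                name: db-egress-deny
                namespace: default
              spec:
                podSelector:
                  matchLabels:
                    app: db
                policyTypes:
                - Egress
                egress: []    # no egress rules defined = deny all egress from the db Pods
              EOF

              You can confirm what is applied at any time with kubectl get networkpolicy.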

              Let’s try it out. Copy the Network Policy text to a file named sidecar-network-policy.yaml in vi and apply the Network Policy to the cluster with kubectl:

              $ kubectl create -f sidecar-network-policy.yaml
              networkpolicy.networking.k8s.io/auth-ingress created
              networkpolicy.networking.k8s.io/db-ingress created

              Next, let’s try that simulated SQLi attack again from auth to db:

              $ kubectl exec auth-7559599f89-d8tnw -c microsimserver -it sh
              /app #
              /app # curl 'http://10.8.2.5:8080/?username=joe%40example.com&password=%3BUNION+SELECT+1%2C+version%28%29+limit+1%2C1--'
              curl: (7) Failed to connect to 10.8.2.5 port 8080: Operation timed out

              Good stuff – no matter how you try to connect from auth to db it will now fail.
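
              To be thorough, we can also confirm that the allowed path still works. From one of the www Pods (reusing a Pod name from earlier), the db service should still respond:

              # www -> db on the service port is still permitted by the policy:
              $ kubectl exec www-5bfc744996-6jbr7 -c microsimclient -- curl -s http://db:8080/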

              Finally, let’s ensure that the rest of the application is still working correctly by checking the db logs. If we are still getting legitimate requests then we should be good to go:

              $ kubectl logs -f db-59f8d84df-4kbvg microsimserver
              <snip>
              127.0.0.1 - - [10/Jan/2020 00:27:57] "POST / HTTP/1.1" 200 -
              127.0.0.1 - - [10/Jan/2020 00:27:58] "POST / HTTP/1.1" 200 -
              127.0.0.1 - - [10/Jan/2020 00:27:59] "POST / HTTP/1.1" 200 -
              127.0.0.1 - - [10/Jan/2020 00:28:02] "POST / HTTP/1.1" 200 -
              127.0.0.1 - - [10/Jan/2020 00:28:04] "POST / HTTP/1.1" 200 -
              {"Total": {"Requests": 6987, "Sent Bytes": 115648879, "Received Bytes": 7235424, "Attacks": 1, "SQLi": 1, "XSS": 0, "Directory Traversal": 0}, "Last 30 Seconds": {"Requests": 15, "Sent Bytes": 248280, "Received Bytes": 15540, "Attacks": 0, "SQLi": 0, "XSS": 0, "Directory Traversal": 0}}
              127.0.0.1 - - [10/Jan/2020 00:28:04] "POST / HTTP/1.1" 200 -

              The service is still getting requests with the Network Policy in place. We can even see the test SQLi request we sent earlier when we bypassed the WAF, but no SQLi attacks have been seen since the Network Policy was applied.

              Conclusion

              We have successfully secured the intra-cluster service communication (East/West communications) via micro-segmentation and WAF utilizing the Security Sidecar Pattern. This pattern is great for quickly and easily adding security to your cluster without creating a lot of overhead for the developers or DevOps teams, and the configuration is smaller and simpler than the Security Service Layer Pattern. It is also possible to automate the injection of the security sidecar with Mutating Webhooks. The nice thing about this pattern is that the security layer scales alongside the application automatically, though one downside is that you could waste cluster resources if the WAF containers are not being fully utilized.

              What’s next?

              My goal is to demonstrate the Service Mesh Security Plugin Pattern in a future post. There are a couple of commercial and open source projects that provide this option, but it’s still early days in this space. In my opinion this pattern makes the most sense since it tightly integrates security with the cluster and cleanly provides both micro-segmentation and application layer security as code, which is the direction everything is moving.

              I’m also looking at implementing a Security Sidecar Pattern in conjunction with Istio Service Mesh. This is effectively a Sidecar-on-Sidecar Pattern (the Envoy container and WAF container are both added to the application Pod). We’ll see how that goes, and if successful I’ll write that one up as well.

              I hope this series has been helpful and if you have suggestions for future topics, please feel free to let me know!

              Next in the series: Part 5

              Featured

              Tools of the Trade for Security Systems Engineers in 2020

              Happy New Year, everyone! As we begin a new decade and I reflect on the last quarter century of networking and security, I thought it would be cool to see how the tools of the trade for pre-sales Systems Engineers in the network security field have changed, and which tools the SE’s SE will need to be proficient with in 2020.

              As an SE in the 90’s and early 2000’s I remember carrying a heavy laptop bag filled with now obsolete dongles, serial converters, null-modem cables, Ethernet patch cables and crossover cables, screwdrivers, and papers and excerpts of manuals. I probably couldn’t get through TSA with that bag these days!

              Networking and security have changed so much since those years. My early days were spent learning the opaque details of Windows NT and the black art of IPv4 subnetting (and CIDR!). I was obsessed with Linux, OSPF, and BGP and made sure I understood the details of how encryption and key exchanges work for IPsec VPNs.

              Obviously all of those foundational skills have served all of us well, but in the past few years we’ve seen the security industry change quite dramatically. Stateful inspection firewalls have given way to Defense in Depth and Zero Trust, which encompass so much more. (EDR, NDR, IPS, VM/Cloud/Micro Services, UEBA, Deception, SOAR… whew!) To that end, here are a few tools that I have added to my toolbox in the past few years, and that I look for SEs on my high-performing teams to have at least some familiarity with.

              Cloud Providers

              Every SE should have accounts in all of the major cloud providers. Each has its own flavor, advantages, and APIs. Cloud accounts are perfect for setting up temporary labs to test out a configuration or a quick POC. You never know which combination of providers your customers will be using these days, so you really need to be familiar with at least these:

              • Amazon Web Services (AWS)
              • Microsoft Azure
              • Google Cloud Platform (GCP)

              The good news is that all of the providers have free signups and the monthly bill is usually very low for lab usage.

              Integrations and Automation

              A lot of SEs have at least some background in scripting and programming, and those skills are becoming more important now that everything is becoming more connected and integrated. Integrations are the name of the game, and if you can make a POC successful by building one yourself in a pinch, it will make you that much more valuable to the customer and your company.

              Python has become so popular in the past few years that it’s definitely something I look for in SE candidates, but Bash and PowerShell skills are still very relevant. Extra credit for learning Go! Here are some of the more important tools to help in this area:

              • Proper IDE or text editor (I like Sublime, but there are many options, including old-school vi!)
                • UPDATE: I now tend to use VSCode for most of my Python work, but I still use Sublime for smaller code snippets and as a scratch pad/staging area
              • git (open some sort of git account, like GitHub, and share your code)
                • I’m not a git expert, by any means, so I use Sourcetree to keep me sane. (UPDATE: I now tend to use the git source control features built-in to VSCode)
              • SOAR Platforms (Phantom, Demisto, FortiSOAR)
                • These typically have free community editions
              • SIEM (Elasticsearch, Splunk, etc.)
                • Again, set up the free community editions in your lab

              APIs

              In line with Integrations and Automation, one of the lower-level skills you will need is understanding the different flavors of APIs. You’ll find that RESTful or REST-like APIs are very common these days, which makes things easy, but you’ll definitely need to understand the JSON format.

              Here are some helpful tools for navigating APIs:

              • Online JSON pretty printer and validator
              • Online encoder/decoder (CyberChef)
              • Postman – I love using this tool to learn a new API or to share quick Python/Bash snippets with a customer.
              • jq – one of my favorite command line tools. It’s like sed or awk for JSON. Also, a quick and dirty JSON pretty printer/validator at the command line.
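
              For example, validating, pretty-printing, and plucking a value out of a JSON response is a one-liner (toy data, obviously):

              $ echo '{"user": {"name": "kelly", "id": 42}}' | jq '.user.name'
              "kelly"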

              Containers and Microservices

              Don’t worry, all of your legacy networking skills (OSI layers 1-7) aren’t obsolete, but a lot of the lower layers are becoming abstracted away and more emphasis is being placed on layer 7 for security.

              I think it’s a good exercise to write a small, simple app in Python and package it up as a Docker container running standalone or in a Kubernetes cluster. Extra credit for learning Service Mesh technologies like Istio/Envoy and CI/CD Pipelines and tools like Jenkins.
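
              If you want a concrete starting point, here is a rough sketch (the file names and image tag are placeholders, and it assumes you have a small app.py that listens on port 8080):

              # Package a tiny Python app as a container and run it locally:
              $ cat > Dockerfile <<'EOF'
              FROM python:3.8-slim
              WORKDIR /app
              COPY app.py .
              CMD ["python", "app.py"]
              EOF
              $ docker build -t my-micro-app .
              $ docker run --rm -p 8080:8080 my-micro-app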

              It’s a big topic and a lot of things are changing rapidly, so this is an opportunity to learn something that is a bit bleeding edge but quickly becoming mainstream. The SEs who understand these technologies will be the most relevant in 2020 and beyond as their customers transition to them.

              To get started, make sure these tools are in your tool belt:

              • Docker Desktop
              • Kubernetes Cluster (I use Google GKE, but you can also use something like Amazon EKS or Azure AKS)

              Penetration Testing/Hacking

              Of course, we can’t forget the basics of security, including pen testing and hacking tools that will enable you to test and demonstrate your technologies and solutions.

              • netcat (aka ncat or nc) – this is one of the first command line tools I install on my laptop. It’s a Swiss army knife for network testing.
              • nmap – another must have at the command line – tried and true for many years.
              • Kali Linux – here is a nice summary.
              • Application security test tools available from the OWASP site.
              • Virus Total – just be careful you don’t upload sensitive files or compromise an ongoing investigation by uploading a file the incident responders are still reversing.
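
              A couple of quick examples of the first two in action (the host name is a placeholder):

              # Check whether a single TCP port is open (zero-I/O mode, verbose):
              $ nc -zv www.example.com 443
              # Scan common ports and fingerprint service versions:
              $ nmap -sV www.example.com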

              There are so many more tools for this section but they will typically be dependent on the type of security products you support.

              2020 and Beyond!

              There’s no shortage of things to learn and tools in the toolbox, though I have noticed that my laptop bag is a lot lighter these days! What are your favorite tools that I have missed?

              Featured

              Microservice Security Design Patterns for Kubernetes (Part 3)

              The Security Service Layer Pattern

              In Part 1 of this series on microservices security patterns for Kubernetes we went over three design patterns that enable micro-segmentation and deep inspection of the application and API traffic between microservices:

              1. Security Service Layer Pattern
              2. Security Sidecar Pattern
              3. Service Mesh Security Plugin Pattern

              In Part 2 we set up a simple, insecure deployment and demonstrated application layer attacks and the lack of micro-segmentation. In this post we will take that insecure deployment and implement a Security Service Layer Pattern to block application layer attacks and enforce strict segmentation between services.

              The Insecure Deployment

              Let’s take a quick look at the insecure deployment from Part 2:

              Figure 1: Insecure Deployment


              As demonstrated before, all microsim services can communicate with each other and there is no deep inspection implemented to block application layer attacks like SQLi. In this post, we will be implementing this servicelayer.yaml deployment that adds modsecurity reverse proxy WAF Pods with the Core Rule Set in front of the microsim services. modsecurity will perform deep inspection on the JSON/HTTP traffic and block application layer attacks.

              Then we will add on a Kubernetes Network Policy to enforce segmentation between the services. In the end, the deployment will look like this:

              Figure 2: Security Service Layer Pattern

              Security Service Layer Deployment Spec

              You’ll notice that each original service has been split into two services: a modsecurity WAF service (in orange) and the original service (in blue). Let’s take a look at the deployment YAML file to understand how this pattern works.

              The Security Service Layer Pattern does add quite a few lines to our deployment file, but they are simple additions. We’ll just need to keep our port numbers and service names straight as we add the WAF layers into the deployment.

              Let’s take a closer look at the components that have changed from the insecure deployment.

              www Deployment

              apiVersion: apps/v1
              kind: Deployment
              metadata:
                name: www
              spec:
                replicas: 3
                selector:
                  matchLabels:
                    app: www
                template:
                  metadata:
                    labels:
                      app: www
                  spec:
                    containers:
                    - name: modsecurity
                      image: owasp/modsecurity-crs:v3.2-modsec2-apache
                      ports:
                      - containerPort: 80
                      env:
                      - name: SETPROXY
                        value: "True"
                      - name: PROXYLOCATION
                        value: "http://wwworigin.default.svc.cluster.local:8080/"

              We see three replicas of the official OWASP modsecurity container available on Docker Hub configured as a reverse proxy WAF listening on TCP port 80. All requests that go to any of these WAF instances will be inspected and proxied to the origin service, wwworigin, on TCP port 8080. wwworigin is the original Service and Deployment from the insecure deployment.

              These WAF containers are effectively impersonating the original service so the user or application does not need to modify its configuration. One nice thing about this design is that it allows you to scale the security layer independent from the application. For instance, you might only require two modsecurity Pods to secure 10 of your application Pods.
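
              For instance (a sketch, not from the original walkthrough), you could run the application at ten replicas while keeping the WAF layer at two:

              # The WAF layer and the application now scale independently:
              $ kubectl scale deployment wwworigin --replicas=10
              $ kubectl scale deployment www --replicas=2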

              Now, let’s take a look at the www Service that points to this WAF deployment.

              www Service

              apiVersion: v1
              kind: Service
              metadata:
                labels:
                  app: www
                name: www
              spec:
                externalTrafficPolicy: Local
                ports:
                - port: 80
                  targetPort: 80
                selector:
                  app: www
                sessionAffinity: None
                type: LoadBalancer

              Nothing too fancy here – just forwarding TCP port 80 application traffic to TCP port 80 on the modsecurity WAF Pods since that is the port they listen on. Since this is an externally facing service we are using type: LoadBalancer and externalTrafficPolicy: Local just like the original Service did.

              Next, let’s check out the wwworigin Deployment spec where the original application Pods are defined.

              wwworigin Deployment

              apiVersion: apps/v1
              kind: Deployment
              metadata:
                name: wwworigin
              spec:
                replicas: 3
                selector:
                  matchLabels:
                    app: wwworigin
                template:
                  metadata:
                    labels:
                      app: wwworigin
                  spec:
                    containers:
                    - name: microsimserver
                      image: kellybrazil/microsimserver
                      env:
                      - name: STATS_PORT
                        value: "5000"
                      ports:
                      - containerPort: 8080
                    - name: microsimclient
                      image: kellybrazil/microsimclient
                      env:
                      - name: REQUEST_URLS
                        value: "http://auth.default.svc.cluster.local:80,http://db.default.svc.cluster.local:80"
                      - name: SEND_SQLI
                        value: "True"
                      - name: STATS_PORT
                        value: "5001"

              There’s a lot going on here, but basically it’s nearly identical to what we had in the insecure deployment. The only things that have changed are the name of the Deployment (from www to wwworigin) and the REQUEST_URLS destination ports (from 8080 to 80). This is because the modsecurity WAF containers listen on port 80 and they are the true front end to the auth and db services.

              Next, let’s take a look at the wwworigin Service spec.

              wwworigin Service

              apiVersion: v1
              kind: Service
              metadata:
                labels:
                  app: wwworigin
                name: wwworigin
              spec:
                ports:
                - port: 8080
                  targetPort: 8080
                selector:
                  app: wwworigin
                sessionAffinity: None

              The only change to the original deployment here is that we changed the name from www to wwworigin and the port from 80 to 8080 since the origin Pods are now internal and not directly exposed to the internet.

              Now we need to repeat this process for the auth and db services. Since they are configured the same way, we will only go over the db Deployment and Service. Remember, there is now a db (WAF) and dborigin (application) Deployment and Service that we need to define.

              db Deployment

              apiVersion: apps/v1
              kind: Deployment
              metadata:
                name: db
              spec:
                replicas: 3
                selector:
                  matchLabels:
                    app: db
                template:
                  metadata:
                    labels:
                      app: db
                  spec:
                    containers:
                    - name: modsecurity
                      image: owasp/modsecurity-crs:v3.2-modsec2-apache
                      ports:
                      - containerPort: 80
                      env:
                      - name: SETPROXY
                        value: "True"
                      - name: PROXYLOCATION
                        value: "http://dborigin.default.svc.cluster.local:8080/"

              This is essentially the same as the www Deployment except we are proxying to dborigin. The WAF containers listen on port 80 and then they proxy the traffic to port 8080 on the origin application service.

              db Service

              apiVersion: v1
              kind: Service
              metadata:
                labels:
                  app: db
                name: db
              spec:
                ports:
                - port: 80
                  targetPort: 80
                selector:
                  app: db
                sessionAffinity: None

              Again, nothing fancy here – just listening on TCP port 80, which is what the modsecurity WAF containers listen on. This is an internal service so no need for type: LoadBalancer or externalTrafficPolicy: Local.

              Finally, let’s take a look at the dborigin Deployment and Service.

              dborigin Deployment

              apiVersion: apps/v1
              kind: Deployment
              metadata:
                name: dborigin
              spec:
                replicas: 3
                selector:
                  matchLabels:
                    app: dborigin
                template:
                  metadata:
                    labels:
                      app: dborigin
                  spec:
                    containers:
                    - name: microsimserver
                      image: kellybrazil/microsimserver
                      ports:
                      - containerPort: 8080
                      env:
                      - name: STATS_PORT
                        value: "5000"

              This Deployment is essentially the same as the original, except the name has been changed from db to dborigin.

              dborigin Service

              apiVersion: v1
              kind: Service
              metadata:
                labels:
                  app: dborigin
                name: dborigin
              spec:
                ports:
                - port: 8080
                  targetPort: 8080
                selector:
                  app: dborigin
                sessionAffinity: None

              Again, the only change from the original here is the name from db to dborigin.

              Now that we understand how the Deployment and Service specs work, let’s apply them on our Kubernetes cluster.

              See Part 2 for more information on setting up the cluster.

              Applying the Deployments and Services

              First, let’s delete the original insecure deployment in Cloud Shell if it is still running:

              $ kubectl delete -f simple.yaml

              Your Pods, Deployments, and Services should be empty before you proceed:

              $ kubectl get pods
              No resources found.
              $ kubectl get deploy
              No resources found.
              $ kubectl get services
              NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
              kubernetes   ClusterIP   10.12.0.1    <none>        443/TCP   3m46s

              Next, copy/paste the deployment text into a file called servicelayer.yaml using vi. Then apply the deployment with kubectl:

              $ kubectl apply -f servicelayer.yaml
              deployment.apps/www created
              deployment.apps/wwworigin created
              deployment.apps/auth created
              deployment.apps/authorigin created
              deployment.apps/db created
              deployment.apps/dborigin created
              service/www created
              service/auth created
              service/db created
              service/wwworigin created
              service/authorigin created
              service/dborigin created

              Testing the Deployment

              Once the www service has an external IP, you can send an HTTP GET or POST request to it from Cloud Shell or your laptop:

              $ kubectl get services
              NAME         TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
              auth         ClusterIP      10.12.14.41    <none>        80/TCP         52s
              authorigin   ClusterIP      10.12.5.222    <none>        8080/TCP       52s
              db           ClusterIP      10.12.9.224    <none>        80/TCP         52s
              dborigin     ClusterIP      10.12.13.80    <none>        8080/TCP       51s
              kubernetes   ClusterIP      10.12.0.1      <none>        443/TCP        7m43s
              www          LoadBalancer   10.12.13.193   34.66.99.16   80:30394/TCP   52s
              wwworigin    ClusterIP      10.12.6.122    <none>        8080/TCP       52s
              $ curl 34.66.99.16
              ...o7yXXg70Olfu2MvVsm9kos8ksEXyzX4oYnZ7wQh29FaqSF
              Thu Dec 19 00:58:15 2019   hostname: wwworigin-6c8fb48f79-frmk9   ip: 10.8.1.9   remote: 10.8.0.7   hostheader: wwworigin.default.svc.cluster.local:8080   path: /

              You can probably already see some interesting side effects of this deployment. The originating IP address is now the IP address of the WAF that handled the request (10.8.0.7 in this case). Since the WAF is deployed as a reverse proxy, the only way to get the originating IP information will be via HTTP headers, such as X-Forwarded-For (XFF). Also, the host header has now changed, so keep this in mind if the application is expecting certain values in the headers.

              We can do a quick check to see if the modsecurity WAF is inspecting traffic by sending an HTTP POST request with no data or size information. This will be seen as an anomalous request and blocked:

              $ curl -X POST http://34.66.99.16
              <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
              <html><head>
              <title>403 Forbidden</title>
              </head><body>
              <h1>Forbidden</h1>
              <p>You don't have permission to access /
              on this server.<br />
              </p>
              </body></html>

              That looks good! Now let’s take a look at the microsim stats to see if the WAF layers are blocking the East/West SQLi attacks. Let’s open two tabs in Cloud Shell: one for shell access to a wwworigin container and another for shell access to a dborigin container.

              In the first tab, use kubectl to find the name of one of the wwworigin pods and shell into the microsimclient container running in it:

              $ kubectl get pods
              NAME                          READY   STATUS    RESTARTS   AGE
              auth-865675dd7f-4nld7         1/1     Running   0          23m
              auth-865675dd7f-7xsks         1/1     Running   0          23m
              auth-865675dd7f-lzdzg         1/1     Running   0          23m
              authorigin-5f6b795dcd-47gwn   1/1     Running   0          23m
              authorigin-5f6b795dcd-r5lr2   1/1     Running   0          23m
              authorigin-5f6b795dcd-xb68n   1/1     Running   0          23m
              db-dc6f6f5f9-b2j2f            1/1     Running   0          23m
              db-dc6f6f5f9-kb5q9            1/1     Running   0          23m
              db-dc6f6f5f9-wmj4n            1/1     Running   0          23m
              dborigin-7dc8d69f86-6mj2d     1/1     Running   0          23m
              dborigin-7dc8d69f86-bvpdn     1/1     Running   0          23m
              dborigin-7dc8d69f86-n42vg     1/1     Running   0          23m
              www-7cdc675f9-bhrhp           1/1     Running   0          23m
              www-7cdc675f9-dldhq           1/1     Running   0          23m
              www-7cdc675f9-rlqwv           1/1     Running   0          23m
              wwworigin-6c8fb48f79-9tq5t    2/2     Running   0          23m
              wwworigin-6c8fb48f79-frmk9    2/2     Running   0          23m
              wwworigin-6c8fb48f79-tltzd    2/2     Running   0          23m
              $ kubectl exec wwworigin-6c8fb48f79-9tq5t -c microsimclient -it sh
              /app #

              Then curl to the microsimclient stats server on localhost:5001:

              /app # curl localhost:5001
              {
                "time": "Thu Dec 19 01:26:24 2019",
                "runtime": 1855,
                "hostname": "wwworigin-6c8fb48f79-9tq5t",
                "ip": "10.8.0.10",
                "stats": {
                  "Requests": 1848,
                  "Sent Bytes": 1914528,
                  "Received Bytes": 30650517,
                  "Internet Requests": 0,
                  "Attacks": 18,
                  "SQLi": 18,
                  "XSS": 0,
                  "Directory Traversal": 0,
                  "DGA": 0,
                  "Malware": 0,
                  "Error": 0
                },
                "config": {
                  "STATS_PORT": 5001,
                  "STATSD_HOST": null,
                  "STATSD_PORT": 8125,
                  "REQUEST_URLS": "http://auth.default.svc.cluster.local:80,http://db.default.svc.cluster.local:80",
                  "REQUEST_INTERNET": false,
                  "REQUEST_MALWARE": false,
                  "SEND_SQLI": true,
                  "SEND_DIR_TRAVERSAL": false,
                  "SEND_XSS": false,
                  "SEND_DGA": false,
                  "REQUEST_WAIT_SECONDS": 1.0,
                  "REQUEST_BYTES": 1024,
                  "STOP_SECONDS": 0,
                  "STOP_PADDING": false,
                  "TOTAL_STOP_SECONDS": 0,
                  "REQUEST_PROBABILITY": 1.0,
                  "EGRESS_PROBABILITY": 0.1,
                  "ATTACK_PROBABILITY": 0.01
                }
              }

              Here we see 18 SQLi attacks have been sent to the auth and db services in the last 1855 seconds.

              Now, let’s see if the attacks are getting through like they did in the insecure deployment. In the other tab, find the name of one of the dborigin pods and shell into the microsimserver container running in it:

              $ kubectl exec dborigin-7dc8d69f86-6mj2d -c microsimserver -it sh
              /app #

              Then curl to the microsimserver stats server on localhost:5000:

              /app # curl localhost:5000
              {
                "time": "Thu Dec 19 01:29:00 2019",
                "runtime": 2013,
                "hostname": "dborigin-7dc8d69f86-6mj2d",
                "ip": "10.8.2.10",
                "stats": {
                  "Requests": 1009,
                  "Sent Bytes": 16733599,
                  "Received Bytes": 1045324,
                  "Attacks": 0,
                  "SQLi": 0,
                  "XSS": 0,
                  "Directory Traversal": 0
                },
                "config": {
                  "LISTEN_PORT": 8080,
                  "STATS_PORT": 5000,
                  "STATSD_HOST": null,
                  "STATSD_PORT": 8125,
                  "RESPOND_BYTES": 16384,
                  "STOP_SECONDS": 0,
                  "STOP_PADDING": false,
                  "TOTAL_STOP_SECONDS": 0
                }
              }

              Remember, in the insecure deployment we saw the SQLi value incrementing. Now that the modsecurity WAF is inspecting the East/West traffic, the SQLi attacks are no longer getting through, though we still see normal Requests, Sent Bytes, and Received Bytes incrementing.

              modsecurity Logs

              Let’s check the modsecurity logs to see how the East/West application attacks are being identified. To see the modsecurity audit log we’ll need to shell into one of the WAF containers and look at the /var/log/modsec_audit.log file:

              $ kubectl exec db-dc6f6f5f9-b2j2f -it sh
              /app # grep -C 60 sql /var/log/modsec_audit.log
              <snip>
              --fa628b64-A--
              [19/Dec/2019:03:06:44 +0000] XfrpRArFgedF@mTDKh9QvAAAAI4 10.8.1.9 60612 10.8.2.9 80
              --fa628b64-B--
              GET /?username=joe%40example.com&password=%3BUNION+SELECT+1%2C+version%28%29+limit+1%2C1-- HTTP/1.1
              Host: db.default.svc.cluster.local
              User-Agent: python-requests/2.22.0
              Accept-Encoding: gzip, deflate
              Accept: */*
              Connection: keep-alive
              
              --fa628b64-F--
              HTTP/1.1 403 Forbidden
              Content-Length: 209
              Keep-Alive: timeout=5, max=100
              Connection: Keep-Alive
              Content-Type: text/html; charset=iso-8859-1
              
              --fa628b64-E--
              <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
              <html><head>
              <title>403 Forbidden</title>
              </head><body>
              <h1>Forbidden</h1>
              <p>You don't have permission to access /
              on this server.<br />
              </p>
              </body></html>
              
              --fa628b64-H--
              Message: Warning. Pattern match "(?i:(?:[\"'`](?:;?\\s*?(?:having|select|union)\\b\\s*?[^\\s]|\\s*?!\\s*?[\"'`\\w])|(?:c(?:onnection_id|urrent_user)|database)\\s*?\\([^\\)]*?|u(?:nion(?:[\\w(\\s]*?select| select @)|ser\\s*?\\([^\\)]*?)|s(?:chema\\s*?\\([^\\)]*?|elect.*?\\w?user\\()|in ..." at ARGS:password. [file "/etc/modsecurity.d/owasp-crs/rules/REQUEST-942-APPLICATION-ATTACK-SQLI.conf"] [line "190"] [id "942190"] [msg "Detects MSSQL code execution and information gathering attempts"] [data "Matched Data: UNION SELECT found within ARGS:password: ;UNION SELECT 1, version() limit 1,1--"] [severity "CRITICAL"] [ver "OWASP_CRS/3.2.0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-sqli"] [tag "OWASP_CRS"] [tag "OWASP_CRS/WEB_ATTACK/SQL_INJECTION"] [tag "WASCTC/WASC-19"] [tag "OWASP_TOP_10/A1"] [tag "OWASP_AppSensor/CIE1"] [tag "PCI/6.5.2"]
              Message: Warning. Pattern match "(?i:(?:^[\\W\\d]+\\s*?(?:alter\\s*(?:a(?:(?:pplication\\s*rol|ggregat)e|s(?:ymmetric\\s*ke|sembl)y|u(?:thorization|dit)|vailability\\s*group)|c(?:r(?:yptographic\\s*provider|edential)|o(?:l(?:latio|um)|nversio)n|ertificate|luster)|s(?:e(?:rv(?:ice|er)| ..." at ARGS:password. [file "/etc/modsecurity.d/owasp-crs/rules/REQUEST-942-APPLICATION-ATTACK-SQLI.conf"] [line "471"] [id "942360"] [msg "Detects concatenated basic SQL injection and SQLLFI attempts"] [data "Matched Data: ;UNION SELECT found within ARGS:password: ;UNION SELECT 1, version() limit 1,1--"] [severity "CRITICAL"] [ver "OWASP_CRS/3.2.0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-sqli"] [tag "OWASP_CRS"] [tag "OWASP_CRS/WEB_ATTACK/SQL_INJECTION"] [tag "WASCTC/WASC-19"] [tag "OWASP_TOP_10/A1"] [tag "OWASP_AppSensor/CIE1"] [tag "PCI/6.5.2"]
              Message: Access denied with code 403 (phase 2). Operator GE matched 5 at TX:anomaly_score. [file "/etc/modsecurity.d/owasp-crs/rules/REQUEST-949-BLOCKING-EVALUATION.conf"] [line "91"] [id "949110"] [msg "Inbound Anomaly Score Exceeded (Total Score: 10)"] [severity "CRITICAL"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-generic"]
              Message: Warning. Operator GE matched 5 at TX:inbound_anomaly_score. [file "/etc/modsecurity.d/owasp-crs/rules/RESPONSE-980-CORRELATION.conf"] [line "86"] [id "980130"] [msg "Inbound Anomaly Score Exceeded (Total Inbound Score: 10 - SQLI=10,XSS=0,RFI=0,LFI=0,RCE=0,PHPI=0,HTTP=0,SESS=0): individual paranoia level scores: 10, 0, 0, 0"] [tag "event-correlation"]
              Apache-Error: [file "apache2_util.c"] [line 273] [level 3] [client 10.8.1.9] ModSecurity: Warning. Pattern match "(?i:(?:[\\\\"'`](?:;?\\\\\\\\s*?(?:having|select|union)\\\\\\\\b\\\\\\\\s*?[^\\\\\\\\s]|\\\\\\\\s*?!\\\\\\\\s*?[\\\\"'`\\\\\\\\w])|(?:c(?:onnection_id|urrent_user)|database)\\\\\\\\s*?\\\\\\\\([^\\\\\\\\)]*?|u(?:nion(?:[\\\\\\\\w(\\\\\\\\s]*?select| select @)|ser\\\\\\\\s*?\\\\\\\\([^\\\\\\\\)]*?)|s(?:chema\\\\\\\\s*?\\\\\\\\([^\\\\\\\\)]*?|elect.*?\\\\\\\\w?user\\\\\\\\()|in ..." at ARGS:password. [file "/etc/modsecurity.d/owasp-crs/rules/REQUEST-942-APPLICATION-ATTACK-SQLI.conf"] [line "190"] [id "942190"] [msg "Detects MSSQL code execution and information gathering attempts"] [data "Matched Data: UNION SELECT found within ARGS:password: ;UNION SELECT 1, version() limit 1,1--"] [severity "CRITICAL"] [ver "OWASP_CRS/3.2.0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-sqli"] [tag "OWASP_CRS"] [tag "OWASP_CRS/WEB_ATTACK/SQL_INJECTION"] [tag "WASCTC/WASC-19"] [tag "OWASP_TOP_10/A1"] [tag "OWASP_AppSensor/CIE1"] [tag "PCI/6.5.2"] [hostname "db.default.svc.cluster.local"] [uri "/"] [unique_id "XfrpRArFgedF@mTDKh9QvAAAAI4"]
              Apache-Error: [file "apache2_util.c"] [line 273] [level 3] [client 10.8.1.9] ModSecurity: Warning. Pattern match "(?i:(?:^[\\\\\\\\W\\\\\\\\d]+\\\\\\\\s*?(?:alter\\\\\\\\s*(?:a(?:(?:pplication\\\\\\\\s*rol|ggregat)e|s(?:ymmetric\\\\\\\\s*ke|sembl)y|u(?:thorization|dit)|vailability\\\\\\\\s*group)|c(?:r(?:yptographic\\\\\\\\s*provider|edential)|o(?:l(?:latio|um)|nversio)n|ertificate|luster)|s(?:e(?:rv(?:ice|er)| ..." at ARGS:password. [file "/etc/modsecurity.d/owasp-crs/rules/REQUEST-942-APPLICATION-ATTACK-SQLI.conf"] [line "471"] [id "942360"] [msg "Detects concatenated basic SQL injection and SQLLFI attempts"] [data "Matched Data: ;UNION SELECT found within ARGS:password: ;UNION SELECT 1, version() limit 1,1--"] [severity "CRITICAL"] [ver "OWASP_CRS/3.2.0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-sqli"] [tag "OWASP_CRS"] [tag "OWASP_CRS/WEB_ATTACK/SQL_INJECTION"] [tag "WASCTC/WASC-19"] [tag "OWASP_TOP_10/A1"] [tag "OWASP_AppSensor/CIE1"] [tag "PCI/6.5.2"] [hostname "db.default.svc.cluster.local"] [uri "/"] [unique_id "XfrpRArFgedF@mTDKh9QvAAAAI4"]
              Apache-Error: [file "apache2_util.c"] [line 273] [level 3] [client 10.8.1.9] ModSecurity: Access denied with code 403 (phase 2). Operator GE matched 5 at TX:anomaly_score. [file "/etc/modsecurity.d/owasp-crs/rules/REQUEST-949-BLOCKING-EVALUATION.conf"] [line "91"] [id "949110"] [msg "Inbound Anomaly Score Exceeded (Total Score: 10)"] [severity "CRITICAL"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-generic"] [hostname "db.default.svc.cluster.local"] [uri "/"] [unique_id "XfrpRArFgedF@mTDKh9QvAAAAI4"]
              Apache-Error: [file "apache2_util.c"] [line 273] [level 3] [client 10.8.1.9] ModSecurity: Warning. Operator GE matched 5 at TX:inbound_anomaly_score. [file "/etc/modsecurity.d/owasp-crs/rules/RESPONSE-980-CORRELATION.conf"] [line "86"] [id "980130"] [msg "Inbound Anomaly Score Exceeded (Total Inbound Score: 10 - SQLI=10,XSS=0,RFI=0,LFI=0,RCE=0,PHPI=0,HTTP=0,SESS=0): individual paranoia level scores: 10, 0, 0, 0"] [tag "event-correlation"] [hostname "db.default.svc.cluster.local"] [uri "/"] [unique_id "XfrpRArFgedF@mTDKh9QvAAAAI4"]
              Action: Intercepted (phase 2)
              Apache-Handler: proxy-server
              Stopwatch: 1576724804853810 2752 (- - -)
              Stopwatch2: 1576724804853810 2752; combined=2296, p1=669, p2=1340, p3=0, p4=0, p5=287, sr=173, sw=0, l=0, gc=0
              Response-Body-Transformed: Dechunked
              Producer: ModSecurity for Apache/2.9.3 (http://www.modsecurity.org/); OWASP_CRS/3.2.0.
              Server: Apache
              Engine-Mode: "ENABLED"
              
              --fa628b64-Z--

Here we see ModSecurity has blocked and logged the East/West SQLi attack from one of the wwworigin containers to a dborigin container. Excellent!

But there’s still a bit more to do. Even though we are now inspecting and protecting traffic at the application layer, we are not yet enforcing micro-segmentation between the services. That means that, even with the WAFs in place, any authorigin container can communicate with any dborigin container. We can demonstrate this by opening a shell on an authorigin container and attempting to send a simulated SQLi to a dborigin container from it:

              # curl 'http://dborigin:8080/?username=joe%40example.com&password=%3BUNION+SELECT+1%2C+version%28%29+limit+1%2C1--'
              X7fJ4MnlHo5gzJFQ1...
              Thu Dec 19 04:54:25 2019   hostname: dborigin-7dc8d69f86-6mj2d   ip: 10.8.2.10   remote: 10.8.2.5   hostheader: dborigin:8080   path: /?username=joe%40example.com&password=%3BUNION+SELECT+1%2C+version%28%29+limit+1%2C1--

              Not only can they communicate – we have completely bypassed the WAF! Let’s fix this with Network Policy.

              Network Policy

              Here is a Network Policy spec that will control the ingress to each internal pod. I tried to keep the rules simple, but in a production deployment a tighter policy would likely be desired. For example, you would probably also want to include Egress policies.

              apiVersion: networking.k8s.io/v1
              kind: NetworkPolicy
              metadata:
                name: wwworigin-ingress
                namespace: default
              spec:
                podSelector:
                  matchLabels:
                    app: wwworigin
                policyTypes:
                - Ingress
                ingress:
                - from:
                  - podSelector:
                      matchLabels:
                        app: www
                  ports:
                  - protocol: TCP
                    port: 8080
              ---
              apiVersion: networking.k8s.io/v1
              kind: NetworkPolicy
              metadata:
                name: auth-ingress
                namespace: default
              spec:
                podSelector:
                  matchLabels:
                    app: auth
                policyTypes:
                - Ingress
                ingress:
                - from:
                  - podSelector:
                      matchLabels:
                        app: wwworigin
                  ports:
                  - protocol: TCP
                    port: 80
              ---
              apiVersion: networking.k8s.io/v1
              kind: NetworkPolicy
              metadata:
                name: db-ingress
                namespace: default
              spec:
                podSelector:
                  matchLabels:
                    app: db
                policyTypes:
                - Ingress
                ingress:
                - from:
                  - podSelector:
                      matchLabels:
                        app: wwworigin
                  ports:
                  - protocol: TCP
                    port: 80
              ---
              apiVersion: networking.k8s.io/v1
              kind: NetworkPolicy
              metadata:
                name: authorigin-ingress
                namespace: default
              spec:
                podSelector:
                  matchLabels:
                    app: authorigin
                policyTypes:
                - Ingress
                ingress:
                - from:
                  - podSelector:
                      matchLabels:
                        app: auth
                  ports:
                  - protocol: TCP
                    port: 8080
              ---
              apiVersion: networking.k8s.io/v1
              kind: NetworkPolicy
              metadata:
                name: dborigin-ingress
                namespace: default
              spec:
                podSelector:
                  matchLabels:
                    app: dborigin
                policyTypes:
                - Ingress
                ingress:
                - from:
                  - podSelector:
                      matchLabels:
                        app: db
                  ports:
                  - protocol: TCP
                    port: 8080

Even with a simple Network Policy you can see one of the downsides of the Security Services Layer Pattern: it can be tedious to define the proper micro-segmentation policy without making errors.

In plain language, these policies say:

• On the wwworigin containers, only accept traffic from the www containers destined for TCP port 8080
• On the auth containers, only accept traffic from the wwworigin containers destined for TCP port 80
• On the db containers, only accept traffic from the wwworigin containers destined for TCP port 80
• On the authorigin containers, only accept traffic from the auth containers destined for TCP port 8080
• On the dborigin containers, only accept traffic from the db containers destined for TCP port 8080

Not fun! In a large deployment with many services this can quickly get out of hand, and errors are easy to make as you trace the traffic flow between each service. That’s why a Service Mesh is probably a better choice for an application with more than a few services.

So let’s see if this works. Copy the Network Policy text into a file named servicelayer-network-policy.yaml with vi and apply it to the cluster with kubectl:

              $ kubectl create -f servicelayer-network-policy.yaml
              networkpolicy.networking.k8s.io/wwworigin-ingress created
              networkpolicy.networking.k8s.io/auth-ingress created
              networkpolicy.networking.k8s.io/db-ingress created
              networkpolicy.networking.k8s.io/authorigin-ingress created
              networkpolicy.networking.k8s.io/dborigin-ingress created
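
Before re-testing, you can confirm the policies are in place with kubectl:

$ kubectl get networkpolicy
$ kubectl describe networkpolicy dborigin-ingress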

              And now let’s try that simulated SQLi attack again from authorigin to dborigin:

              /var/log # curl 'http://dborigin:8080/?username=joe%40example.com&password=%3BUNION+SELECT+1%2C+version%28%29+limit+1%2C1--'
              curl: (7) Failed to connect to dborigin port 8080: Operation timed out

              Success!

Finally, let’s double-check that the rest of the application is still working by checking the dborigin logs. If we are still seeing legitimate requests then we should be good to go:

              $ kubectl logs -f dborigin-7dc8d69f86-6mj2d
              <snip>
              10.8.2.6 - - [19/Dec/2019 05:23:26] "POST / HTTP/1.1" 200 -
              10.8.2.6 - - [19/Dec/2019 05:23:28] "POST / HTTP/1.1" 200 -
              10.8.2.9 - - [19/Dec/2019 05:23:31] "POST / HTTP/1.1" 200 -
              10.8.2.6 - - [19/Dec/2019 05:23:33] "POST / HTTP/1.1" 200 -
              10.8.0.11 - - [19/Dec/2019 05:23:34] "POST / HTTP/1.1" 200 -
              10.8.2.9 - - [19/Dec/2019 05:23:34] "POST / HTTP/1.1" 200 -
              10.8.2.6 - - [19/Dec/2019 05:23:35] "POST / HTTP/1.1" 200 -
              10.8.2.9 - - [19/Dec/2019 05:23:39] "POST / HTTP/1.1" 200 -
              10.8.2.9 - - [19/Dec/2019 05:23:40] "POST / HTTP/1.1" 200 -
              10.8.2.9 - - [19/Dec/2019 05:23:41] "POST / HTTP/1.1" 200 -
              {"Total": {"Requests": 8056, "Sent Bytes": 133603375, "Received Bytes": 8342908, "Attacks": 1, "SQLi": 1, "XSS": 0, "Directory Traversal": 0}, "Last 30 Seconds": {"Requests": 17, "Sent Bytes": 281932, "Received Bytes": 17612, "Attacks": 0, "SQLi": 0, "XSS": 0, "Directory Traversal": 0}}
              10.8.2.6 - - [19/Dec/2019 05:23:43] "POST / HTTP/1.1" 200 -
              10.8.2.6 - - [19/Dec/2019 05:23:43] "POST / HTTP/1.1" 200 -

Nice! We see the service is still getting requests with the Network Policy in place. We can even see the test SQLi request we sent earlier when we bypassed the WAF, but no SQLi attacks have been seen since the Network Policy was applied.
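
Since those 30-second summary lines are JSON, we can even let jq pull the attack counters out for us instead of eyeballing the log. A quick sketch, using the pod name from above:

$ kubectl logs dborigin-7dc8d69f86-6mj2d | grep '^{' | tail -1 | \
    jq '{total_attacks: .Total.Attacks, last_30s_attacks: .["Last 30 Seconds"].Attacks}'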

              Conclusion

Whew – that was fun! As you can see, the Security Services Layer Pattern can lock down an application made up of a few microservices that need to communicate with each other, but with more than a handful of services things get complicated quickly. It does have the advantage, however, of allowing you to scale the security layers and the application layers independently.

              Stay tuned for the next post where we’ll go over the Security Sidecar Pattern and we’ll see the advantages and disadvantages of that approach.

              Next in the series: Part 4

              Featured

              JC Version 1.6.1 Released

              Try the jc web demo!

              I’m happy to announce that jc version 1.6.1 has been released and is available on github and pypi.

              To upgrade, run:

              $ pip install --upgrade jc

              New Parsers

jc now includes 32 parsers! New parsers (tested on Linux and OSX) include:

              • du
              • crontab files
              • pip list
              • pip show

              Updated Parsers

The ifconfig parser now outputs rx_bytes and tx_bytes as integers.
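
Since these fields are now real JSON integers, jq can do math on them directly. For example, a quick one-liner that totals received bytes across all interfaces (interfaces that don’t report counters show up as null, which jq’s add simply ignores):

$ ifconfig | jc --ifconfig | jq 'map(.rx_bytes) | add'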

              More OSX Support

              Version 1.6.1 provides more OSX support and testing for several existing parsers, including:

              • ifconfig
              • arp
              • df
              • mount
              • uname -a
              • ls
              • dig
              • ps
              • w
              • uptime

              About JC Information

jc now has an about option that shows the version of jc and all of the included parsers. Other information, including parser compatibility and authorship, is also shown in JSON format.

              $ jc -a -p
              {
                "name": "jc",
                "version": "1.6.1",
                "description": "jc cli output JSON conversion tool",
                "author": "Kelly Brazil",
                "author_email": "kellyjonbrazil@gmail.com",
                "parser_count": 32,
                "parsers": [
                  {
                    "name": "arp",
                    "argument": "--arp",
                    "version": "1.1",
                    "description": "arp parser",
                    "author": "Kelly Brazil",
                    "author_email": "kellyjonbrazil@gmail.com",
                    "compatible": [
                      "linux",
                      "aix",
                      "freebsd",
                      "darwin"
                    ]
                  },
                  {
                    "name": "crontab",
                    "argument": "--crontab",
                    "version": "1.0",
                    "description": "crontab file parser",
                    "author": "Kelly Brazil",
                    "author_email": "kellyjonbrazil@gmail.com",
                    "compatible": [
                      "linux",
                      "darwin",
                      "aix",
                      "freebsd"
                    ]
                  },
                  ...
                ]
              }
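
And since this output is itself JSON, you can query it with jq like anything else. For example, a sketch that lists the names of all parsers flagged as compatible with darwin:

$ jc -a | jq -r '.parsers[] | select(.compatible | index("darwin")) | .name'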

              Schema Changes

The ifconfig parser now prints the state value as a JSON array instead of a string. Also, as mentioned above, rx_bytes and tx_bytes are now output as integers.

              $ ifconfig lo | jc --ifconfig -p
              [
                {
                  "name": "lo",
                  "flags": 73,
                  "state": [
                    "UP",
                    "LOOPBACK",
                    "RUNNING"
                  ],
                  "mtu": 65536,
                  "ipv4_addr": "127.0.0.1",
                  "ipv4_mask": "255.0.0.0",
                  "ipv4_bcast": null,
                  "ipv6_addr": "::1",
                  "ipv6_mask": 128,
                  "ipv6_scope": "0x10",
                  "mac_addr": null,
                  "type": "Local Loopback",
                  "rx_packets": 0,
                  "rx_bytes": 0,
                  "rx_errors": 0,
                  "rx_dropped": 0,
                  "rx_overruns": 0,
                  "rx_frame": 0,
                  "tx_packets": 0,
                  "tx_bytes": 0,
                  "tx_errors": 0,
                  "tx_dropped": 0,
                  "tx_overruns": 0,
                  "tx_carrier": 0,
                  "tx_collisions": 0,
                  "metric": null
                }
              ]
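
The array format makes interface state checks much cleaner in jq. For example, a sketch that prints only the names of interfaces that are UP and RUNNING:

$ ifconfig | jc --ifconfig | jq -r '.[] | select(.state | contains(["UP", "RUNNING"])) | .name'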

              The df parser now uses an underscore instead of a dash in the “blocks” field name:

              $ df | jc --df -p
              [
                {
                  "filesystem": "devtmpfs",
                  "1k_blocks": 1918816,
                  "used": 0,
                  "available": 1918816,
                  "mounted_on": "/dev",
                  "use_percent": 0
                },
                ...
              ]

              Full Parser List

              • arp
              • crontab
              • df
              • dig
              • du
              • env
              • free
              • fstab
              • history
              • hosts
              • ifconfig
              • iptables
              • jobs
              • ls
              • lsblk
              • lsmod
              • lsof
              • mount
              • netstat
              • pip list
              • pip show
              • ps
              • route
              • ss
              • stat
              • systemctl
              • systemctl list-jobs
              • systemctl list-sockets
              • systemctl list-unit-files
              • uname -a
              • uptime
              • w

              For more information on the motivations for creating jc, see my blog post.

              Happy parsing!

              Featured

              Microservice Security Design Patterns for Kubernetes (Part 2)

              Setting Up the Insecure Deployment

              In Part 1 of this series on microservices security patterns for Kubernetes we went over three design patterns that enable micro-segmentation and deep inspection of the application and API traffic between microservices:

              1. Security Service Layer Pattern
              2. Security Sidecar Pattern
              3. Service Mesh Security Plugin Pattern

In this post we will lay the groundwork for a deep dive into the Security Service Layer Pattern with a live insecure deployment on Google Kubernetes Engine (GKE). By the end of this post you will be able to bring up an insecure deployment and demonstrate layer 7 attacks and unrestricted access between internal services. In the next post we will layer on a Security Service Layer Pattern to secure the application.

              The Base Deployment

Let’s first get our cluster up and running with a simple deployment that has no security and show what is possible in a nearly default state. We’ll use this simple.yaml deployment I created using my microsim app. microsim is a microservice simulator that can send simulated JSON/HTTP and application attack traffic between services. It also has logging and statistics reporting functionality that will allow us to see attacks being sent by the client and received or blocked by the server.
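
If you’d like to poke at microsim before involving Kubernetes, the images should also run standalone with Docker. A minimal sketch, using the same image and environment variable as the deployment below:

$ docker run -d -p 8080:8080 -e STATS_PORT=5000 kellybrazil/microsimserver
$ curl -X POST http://localhost:8080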

              Here is a diagram of the deployment.

              Figure 1: Simple Deployment

              insecure deployment

              In this microservice architecture we see three simulated services:

              1. Public Web interface service
              2. Internal Authentication service
              3. Internal Database service

              In the default state, all services are able to communicate with one another and there are no protections from application layer attacks. Let’s take a quick look at the Pod Deployments and Services in this application.

              www Deployment

              apiVersion: apps/v1
              kind: Deployment
              metadata:
                name: www
              spec:
                replicas: 3
                selector:
                  matchLabels:
                    app: www
                template:
                  metadata:
                    labels:
                      app: www
                  spec:
                    containers:
                    - name: microsimserver
                      image: kellybrazil/microsimserver
                      env:
                      - name: STATS_PORT
                        value: "5000"
                      ports:
                      - containerPort: 8080
                    - name: microsimclient
                      image: kellybrazil/microsimclient
                      env:
                      - name: REQUEST_URLS
                        value: "http://auth.default.svc.cluster.local:8080,http://db.default.svc.cluster.local:8080"
                      - name: SEND_SQLI
                        value: "True"
                      - name: STATS_PORT
                        value: "5001"

In the www deployment above we see three Pod replicas, each running two containers (microsimserver and microsimclient).

              The microsimserver container is configured to expose port 8080, which is the default port the service listens on. By default, the server will respond with 16KB of data and some diagnostic information in either plain HTTP or JSON/HTTP, depending on whether the request is an HTTP GET or POST.

The microsimclient container is configured to send a single 1KB JSON/HTTP POST request every second to http://auth.default.svc.cluster.local:8080 or http://db.default.svc.cluster.local:8080, which resolve to the internal auth and db Services via the default Kubernetes DNS resolver.

              We also see that microsimclient is configured to occasionally send SQLi attack traffic to the auth and db Services. There are many other behaviors that can be configured, but we’ll keep things simple.

              The stats server for microsimserver is configured to run on port 5000 and the stats server for microsimclient is configured to run on port 5001. These ports are not exposed to the cluster, so we will need to get shell access to the containers to see the stats.
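
Alternatively, kubectl port-forward can temporarily map a stats port to your local machine without exposing it to the cluster. A sketch, using one of the www Pod names shown later in this post:

$ kubectl port-forward www-5d89bcb54f-bcjm9 5001:5001
$ curl http://localhost:5001     # from a second terminal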

              Now, let’s look at the www service.

              www Service

              apiVersion: v1
              kind: Service
              metadata:
                labels:
                  app: www
                name: www
              spec:
                externalTrafficPolicy: Local
                ports:
                - port: 80
                  targetPort: 8080
                selector:
                  app: www
                sessionAffinity: None
                type: LoadBalancer

              The service is configured to publicly expose the www service via port 80 with a LoadBalancer type. The externalTrafficPolicy: Local option allows the originating IP address to be preserved within the cluster.

Now let’s take a look at the db deployment and service. The auth service is exactly the same as the db service, so we’ll skip going over that one.

              db Deployment

              apiVersion: apps/v1
              kind: Deployment
              metadata:
                name: db
              spec:
                replicas: 3
                selector:
                  matchLabels:
                    app: db
                template:
                  metadata:
                    labels:
                      app: db
                  spec:
                    containers:
                    - name: microsimserver
                      image: kellybrazil/microsimserver
                      env:
                      - name: STATS_PORT
                        value: "5000"
                      ports:
                      - containerPort: 8080

Just like the www service, there are three Pod replicas, but only one container (microsimserver) runs in each Pod. The default microsimserver listening port of 8080 is exposed, while the stats server listens on port 5000, which is not exposed, so we’ll need to shell into the container to view the stats.

              And here is the db Service:

              db Service

              apiVersion: v1
              kind: Service
              metadata:
                labels:
                  app: db
                name: db
              spec:
                ports:
                - port: 8080
                  targetPort: 8080
                selector:
                  app: db
                sessionAffinity: None

Since this is an internal service, we do not use the LoadBalancer type, which means the Service will be created with the default ClusterIP type; we also do not need to define externalTrafficPolicy.

              Firing up the Cluster

              Let’s bring up the cluster from within the GKE console. Create a standard cluster using the n1-standard-2 machine type with the Enable network policy option checked under the advanced Network security options:

              Figure 2: Enable network policy in GKE

              enable network policy

              Note: you can also create a cluster with network policy enabled at the command line with the --enable-network-policy argument:

              $ gcloud container clusters create test --machine-type=n1-standard-2 --enable-network-policy

Once the cluster is up and running, we can spin up the deployment using kubectl locally (after configuring it with the gcloud command) or use the Google Cloud Shell terminal. For simplicity, let’s use Cloud Shell and connect to the cluster:

              Figure 3: Connect to the Cluster via Cloud Shell

              run in Cloud Shell

Within Cloud Shell, copy and paste the deployment text into a new file called simple.yaml with vi.

              Then create the deployment:

              $ kubectl create -f simple.yaml
              deployment.apps/www created
              deployment.apps/auth created
              deployment.apps/db created
              service/www created
              service/auth created
              service/db created

              You will see the deployments and services start up. You can verify the application is running successfully with the following commands:

              $ kubectl get pods
              NAME                    READY   STATUS    RESTARTS   AGE
              auth-5f964774bd-mvtcl   1/1     Running   0          67s
              auth-5f964774bd-sn4cw   1/1     Running   0          66s
              auth-5f964774bd-xtt54   1/1     Running   0          66s
              db-578757bf68-dzjdq     1/1     Running   0          66s
              db-578757bf68-kkwzr     1/1     Running   0          66s
              db-578757bf68-mlf5t     1/1     Running   0          66s
              www-5d89bcb54f-bcjm9    2/2     Running   0          67s
              www-5d89bcb54f-bzpwl    2/2     Running   0          67s
              www-5d89bcb54f-vbdf6    2/2     Running   0          67s
              $ kubectl get deploy
              NAME   READY   UP-TO-DATE   AVAILABLE   AGE
              auth   3/3     3            3           92s
              db     3/3     3            3           92s
              www    3/3     3            3           92s
              $ kubectl get service
              NAME         TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)        AGE
              auth         ClusterIP      10.0.13.227   <none>          8080/TCP       2m1s
              db           ClusterIP      10.0.3.1      <none>          8080/TCP       2m1s
              kubernetes   ClusterIP      10.0.0.1      <none>          443/TCP        10m
              www          LoadBalancer   10.0.6.39     35.188.221.11   80:32596/TCP   2m1s

              Find the external address assigned to the www service and send an HTTP GET request to it to verify the service is responding. You can do this from Cloud Shell or your laptop:

              $ curl http://35.188.221.11
              FPGpqiVZivddHQvkvDHFErFiW2WK8Kl3ky9cEeI7TA6vH8PYmA1obaZGd1AR3avz3SqPZlcrbXFOn3hVlFQdFm9S07ca
              <snip>
              jYbD5jNA62JEQbUSqk9V0JGgYLATbYe2rv3XeFQIEayJD4qeGnPp7UbEESPBmxrw
              Wed Dec 11 20:07:08 2019   hostname: www-5d89bcb54f-vbdf6   ip: 10.56.0.4   remote: 35.197.46.124   hostheader: 35.188.221.11   path: /

You should see a long block of random text and some client and server information on the last line. Notice that if you send the request as an HTTP POST, the response comes back as JSON. Here I have run the response through jq to pretty-print it:

              $ curl -X POST http://35.188.221.11 | jq .
              {
                "data": "hhV9jogGrM7FMxsQCUAcjdsLQRgjgpCoO...",
                "time": "Wed Dec 11 20:14:20 2019",
                "hostname": "www-5d89bcb54f-vbdf6",
                "ip": "10.56.0.4",
                "remote": "46.18.117.38",
                "hostheader": "35.188.221.11",
                "path": "/"
              }

              Testing the Deployment

              Now, let’s prove that any Pod can communicate with any other Pod and that the SQLi attacks are being received by the internal services. We can do this by opening a shell to one of the www pods and one of the db pods.

              Open two new tabs in Cloud Shell and find the Pod names from the kubectl get pods command output above.

              In one tab, run the following to get a shell on the microsimclient container in the www Pod:

              $ kubectl exec www-5d89bcb54f-bcjm9 -c microsimclient -it sh
              /app #

              In the other tab, run the following to get a shell on the microsimserver container in the db Pod:

              $ kubectl exec db-578757bf68-dzjdq -c microsimserver -it sh
              /app #

              From the microsimclient shell, run the following curl command to see the application stats. This will show us how many normal and attack requests have been sent:

              /app # curl http://localhost:5001
              {
                "time": "Wed Dec 11 20:21:30 2019",
                "runtime": 1031,
                "hostname": "www-5d89bcb54f-bcjm9",
                "ip": "10.56.1.3",
                "stats": {
                  "Requests": 1026,
                  "Sent Bytes": 1062936,
                  "Received Bytes": 17006053,
                  "Internet Requests": 0,
                  "Attacks": 9,
                  "SQLi": 9,
                  "XSS": 0,
                  "Directory Traversal": 0,
                  "DGA": 0,
                  "Malware": 0,
                  "Error": 1
                },
                "config": {
                  "STATS_PORT": 5001,
                  "STATSD_HOST": null,
                  "STATSD_PORT": 8125,
                  "REQUEST_URLS": "http://auth.default.svc.cluster.local:8080,http://db.default.svc.cluster.local:8080",
                  "REQUEST_INTERNET": false,
                  "REQUEST_MALWARE": false,
                  "SEND_SQLI": true,
                  "SEND_DIR_TRAVERSAL": false,
                  "SEND_XSS": false,
                  "SEND_DGA": false,
                  "REQUEST_WAIT_SECONDS": 1.0,
                  "REQUEST_BYTES": 1024,
                  "STOP_SECONDS": 0,
                  "STOP_PADDING": false,
                  "TOTAL_STOP_SECONDS": 0,
                  "REQUEST_PROBABILITY": 1.0,
                  "EGRESS_PROBABILITY": 0.1,
                  "ATTACK_PROBABILITY": 0.01
                }
              }

Run the command a few times until you see that a number of SQLi attacks have been sent. Here we see that this microsimclient instance has sent 9 SQLi attacks over its 1031 seconds of runtime.
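
Since the stats endpoint returns JSON, jq can also do the math for us. For example, the average number of seconds between SQLi attacks (once at least one attack has been sent):

/app # curl -s http://localhost:5001 | jq '.runtime / .stats.SQLi'

With the numbers above that works out to roughly one SQLi every 115 seconds, which is in the neighborhood you would expect from the configured ATTACK_PROBABILITY of 0.01 at one request per second.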

              From the microsimserver shell, curl the server stats to see if any SQLi attacks have been detected:

              /app # curl http://localhost:5000
              {
                "time": "Wed Dec 11 20:23:52 2019",
                "runtime": 1177,
                "hostname": "db-578757bf68-dzjdq",
                "ip": "10.56.2.11",
                "stats": {
                  "Requests": 610,
                  "Sent Bytes": 10110236,
                  "Received Bytes": 629888,
                  "Attacks": 2,
                  "SQLi": 2,
                  "XSS": 0,
                  "Directory Traversal": 0
                },
                "config": {
                  "LISTEN_PORT": 8080,
                  "STATS_PORT": 5000,
                  "STATSD_HOST": null,
                  "STATSD_PORT": 8125,
                  "RESPOND_BYTES": 16384,
                  "STOP_SECONDS": 0,
                  "STOP_PADDING": false,
                  "TOTAL_STOP_SECONDS": 0
                }
              }

Here we see that this particular server has detected two SQLi attacks coming from the clients within the cluster (East/West traffic). Remember, there are also five other db and auth Pods receiving attacks, so you will see the attack load shared amongst them.
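
To see that distribution, a small Bash loop can collect the counter from every db Pod. A sketch, assuming jq is available in your shell (Cloud Shell includes it):

$ for pod in $(kubectl get pods -l app=db -o jsonpath='{.items[*].metadata.name}'); do
    kubectl exec "$pod" -c microsimserver -- curl -s http://localhost:5000 | \
      jq -r '"\(.hostname)  SQLi: \(.stats.SQLi)"'
  done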

              Let’s also demonstrate that the db server can directly communicate with the auth service:

              /app # curl http://auth:8080
              firOXAY4hktZLjHvbs41JhReCWHqs... <snip>
              Wed Dec 11 20:26:38 2019   hostname: auth-5f964774bd-mvtcl   ip: 10.56.1.4   remote: 10.56.2.11   hostheader: auth:8080   path: /

              Since we get a response it is clear that there is no micro-segmentation in place between the db and auth Services and Pods.

              Microservice logging

As with most services in Kubernetes, both microsimclient and microsimserver log each request and response to stdout, which means the logs can be viewed with the kubectl logs command. Every 30 seconds a JSON summary is also logged:

              microsimclient logs

              $ kubectl logs www-5d89bcb54f-bcjm9 microsimclient
              2019-12-11T20:04:19   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:20   Request to http://db.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16573
              2019-12-11T20:04:21   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:22   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:23   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:23   SQLi sent: http://auth.default.svc.cluster.local:8080/?username=joe%40example.com&password=%3BUNION+SELECT+1%2C+version%28%29+limit+1%2C1--
              2019-12-11T20:04:24   Request to http://db.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16574
              2019-12-11T20:04:25   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:26   Request to http://db.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16573
              2019-12-11T20:04:27   Request to http://db.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16573
              2019-12-11T20:04:28   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:29   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:30   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:31   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:32   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:33   Request to http://db.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16573
              2019-12-11T20:04:34   Request to http://db.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16573
              2019-12-11T20:04:35   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:36   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:37   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:38   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:39   Request to http://db.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16573
              2019-12-11T20:04:40   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:41   Request to http://db.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16573
              2019-12-11T20:04:42   Request to http://db.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16573
              2019-12-11T20:04:43   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:44   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:45   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              2019-12-11T20:04:46   Request to http://db.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16573
              2019-12-11T20:04:47   Request to http://db.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16573
              2019-12-11T20:04:48   Request to http://auth.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16577
              {"Total": {"Requests": 30, "Sent Bytes": 31080, "Received Bytes": 497267, "Internet Requests": 0, "Attacks": 1, "SQLi": 1, "XSS": 0, "Directory Traversal": 0, "DGA": 0, "Malware": 0, "Error": 0}, "Last 30 Seconds": {"Requests": 30, "Sent Bytes": 31080, "Received Bytes": 497267, "Internet Requests": 0, "Attacks": 1, "SQLi": 1, "XSS": 0, "Directory Traversal": 0, "DGA": 0, "Malware": 0, "Error": 0}}
              2019-12-11T20:04:49   Request to http://db.default.svc.cluster.local:8080/   Request size: 1036   Response size: 16573
              ...

              microsimserver logs

              $ kubectl logs db-578757bf68-dzjdq microsimserver
              10.56.1.5 - - [11/Dec/2019 20:04:22] "POST / HTTP/1.1" 200 -
              10.56.0.4 - - [11/Dec/2019 20:04:22] "POST / HTTP/1.1" 200 -
              10.56.1.3 - - [11/Dec/2019 20:04:24] "POST / HTTP/1.1" 200 -
              10.56.1.5 - - [11/Dec/2019 20:04:25] "POST / HTTP/1.1" 200 -
              10.56.0.4 - - [11/Dec/2019 20:04:26] "POST / HTTP/1.1" 200 -
              10.56.1.5 - - [11/Dec/2019 20:04:27] "POST / HTTP/1.1" 200 -
              10.56.0.4 - - [11/Dec/2019 20:04:33] "POST / HTTP/1.1" 200 -
              10.56.0.4 - - [11/Dec/2019 20:04:35] "POST / HTTP/1.1" 200 -
              10.56.0.4 - - [11/Dec/2019 20:04:41] "POST / HTTP/1.1" 200 -
              10.56.0.4 - - [11/Dec/2019 20:04:43] "POST / HTTP/1.1" 200 -
              {"Total": {"Requests": 10, "Sent Bytes": 165740, "Received Bytes": 10360, "Attacks": 0, "SQLi": 0, "XSS": 0, "Directory Traversal": 0}, "Last 30 Seconds": {"Requests": 10, "Sent Bytes": 165740, "Received Bytes": 10360, "Attacks": 0, "SQLi": 0, "XSS": 0, "Directory Traversal": 0}}
              10.56.1.5 - - [11/Dec/2019 20:04:47] "POST / HTTP/1.1" 200 -
              ...

              You can see how the traffic is automatically being load balanced by the Kubernetes cluster by inspecting the request sources in the microsimserver logs.
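
For example, counting the request sources (and skipping the JSON summary lines) shows how the requests are spread across the client Pods:

$ kubectl logs db-578757bf68-dzjdq microsimserver | grep -v '^{' | cut -d' ' -f1 | sort | uniq -c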

              Adding Micro-segmentation and Application Layer Protection

Stay tuned for the next post, where we will take this simple, insecure deployment and implement a Security Services Layer Pattern. Then we’ll show how the internal application layer attacks are blocked with this approach. Finally, we will demonstrate micro-segmentation, which restricts access between microservices (for example, traffic between the auth and db services).

              Note: Depending on your Google Cloud account status you may incur charges for the cluster, so remember to delete it from the GKE console when you are done. You may also need to delete any load balancer objects that were created by the deployment within GCP to avoid residual charges to your account.

              Next in the series: Part 3