Bringing the Unix Philosophy to the 21st Century

Try the jc web demo!

Do One Thing Well

The Unix philosophy of using compact expert tools that do one thing well and pipelining them together to manipulate data is a great idea and has worked well for the past few decades. This philosophy was outlined in the 1978 Foreword to the Bell System Technical Journal describing the UNIX Time-Sharing System:

Foreword to the Bell System Technical Journal

Items i and ii are oft repeated, and for good reason. But it is time to take this philosophy to the 21st century by further defining a standard output format for non-interactive use.

Unfortunately, this is the state of things today if you want to grab the IP address of one of the ethernet interfaces on your linux system:

$ ifconfig ens33 | grep inet | awk '{print $2}' | cut -d/ -f1 | head -n 1

This is not beautiful.

Until about 2013 it made as much sense as anything to assume unstructured text was a good way to output data at the command line. Unix/linux has many text processing tools like sed, awk, grep, tr, cut, rev, etc. that can be pipelined together to reformat the desired data before sending it to the next program. Of course, this has always been a pain and is the source of countless questions all over the web about how to parse the output of this or that program. The requirement to manually parse unstructured (and in some cases only human-readable) data has made life much more difficult than it needs to be for the average linux administrator.

But in 2013 a certain data format called JSON was standardized as ECMA-404 and later in 2017 as RFC 8259 and ISO/IEC 21778:2017. JSON is ubiquitous these days in REST APIs and is used to serialize everything from data between web applications, to Indicators of Compromise in the STIX2 specification, to configuration files. There are JSON parsing libraries in all modern programming languages and even JSON parsing tools for the command line, like jq. JSON is everywhere, it’s easy to use, and it’s a standard.
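To illustrate how little machinery JSON needs, here is a trivial Python sketch using only the standard library. The record is a made-up sample in the shape of the `ip -j addr` output shown later in this post, not real command output:

```python
import json

# A sample record shaped like the objects `ip -j addr` emits
record = json.loads('{"ifname": "ens33", "mtu": 1500, "operstate": "UP"}')

# Fields are addressed by name -- no awk/cut column gymnastics
print(record["ifname"], record["mtu"])
```

Every modern language ships an equivalent of this parser, which is exactly why JSON makes such a good interchange format between command-line tools.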

Had JSON been around when I was born in the 1970s, Ken Thompson and Dennis Ritchie might very well have embraced it as a recommended output format to help programs “do one thing well” in a pipeline.

To that end, I argue that linux and all of its supporting GNU and non-GNU utilities should offer JSON output options. We already see some limited support of this in systemctl and the iproute2 utilities like ip where you can output in JSON format with the -j option. The problem is that many linux distros do not include a version that offers JSON output (e.g. centos, currently). And even then, not all functions support JSON output as shown below:

Here is ip addr with JSON output:

$ ip -j addr show dev ens33
 [{
         "addr_info": [{},{}]
     },{
         "ifindex": 2,
         "ifname": "ens33",
         "flags": ["BROADCAST","MULTICAST","UP","LOWER_UP"],
         "mtu": 1500,
         "qdisc": "fq_codel",
         "operstate": "UP",
         "group": "default",
         "txqlen": 1000,
         "link_type": "ether",
         "address": "00:0c:29:99:45:17",
         "broadcast": "ff:ff:ff:ff:ff:ff",
         "addr_info": [{
                 "family": "inet",
                 "local": "192.168.71.131",
                 "prefixlen": 24,
                 "broadcast": "192.168.71.255",
                 "scope": "global",
                 "dynamic": true,
                 "label": "ens33",
                 "valid_life_time": 1732,
                 "preferred_life_time": 1732
             },{
                 "family": "inet6",
                 "local": "fe80::20c:29ff:fe99:4517",
                 "prefixlen": 64,
                 "scope": "link",
                 "valid_life_time": 4294967295,
                 "preferred_life_time": 4294967295
             }]
     }
 ]

And here is ip route not outputting JSON, even with the -j flag:

$ ip -j route
 default via 192.168.71.2 dev ens33 proto dhcp src 192.168.71.131 metric 100 
 192.168.71.0/24 dev ens33 proto kernel scope link src 192.168.71.131 
 192.168.71.2 dev ens33 proto dhcp scope link src 192.168.71.131 metric 100

Some other, more modern tools like kubectl and the aws-cli tool offer more consistent JSON output options, which allow much easier parsing and pipelining of the output. But many older tools still output nearly unparsable text (e.g. netstat, lsblk, ifconfig, iptables, etc.). Interestingly, Windows PowerShell has embraced structured data, and that’s something the linux community can learn from.

How do we move forward?

The solution is to start an effort to go back to all of these legacy GNU and non-GNU command-line utilities that output text data and add a JSON output option to them. Operating system APIs, like the /proc and /sys filesystems, should serialize their files in JSON or provide the data through an alternative API that outputs JSON.
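As a sketch of what serializing a /proc file could look like, here is a hypothetical converter for /proc/meminfo-style text. The field names follow the real file, but the function and the JSON shape are just one possible design, not an existing API:

```python
import json

def meminfo_to_json(text):
    """Convert /proc/meminfo-style 'Key:  value kB' lines
    into a JSON object with integer values."""
    result = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue  # skip blank or malformed lines
        result[key.strip()] = int(parts[0])
    return json.dumps(result)

sample = "MemTotal: 4030220 kB\nMemFree: 2247744 kB"
print(meminfo_to_json(sample))  # {"MemTotal": 4030220, "MemFree": 2247744}
```

Downstream tools could then query fields by name instead of scraping column positions that differ between kernel versions.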


In the meantime, I have created a tool called jc (https://github.com/kellyjonbrazil/jc) that converts the output of dozens of GNU and non-GNU commands and configuration files to JSON. Instead of everyone needing to create their own custom parsers for these common utilities and files, jc acts as a central clearinghouse of parsing libraries that just need to be written once and can be used by everyone.


jc is now available as an Ansible filter plugin!

JC In Action

Here’s how jc can be used to make your life easier today, until GNU/linux brings the Unix philosophy into the 21st century. Let’s take the same example of grabbing an ethernet IP address from above:

$ ifconfig ens33 | grep inet | awk '{print $2}' | cut -d/ -f1 | head -n 1
192.168.71.138

And here’s how you do the same thing with jc and a CLI JSON parsing tool like jq:

$ ifconfig ens33 | jc --ifconfig | jq -r '.[].ipv4_addr'
192.168.71.138

or

$ jc ifconfig ens33 | jq -r '.[].ipv4_addr'
192.168.71.138

Here’s another example of listing the listening TCP ports on the system:

$ netstat -tln | tr -s ' ' | cut -d ' ' -f 4 | rev | cut -d : -f 1 | rev | tail -n +3
25
22

That’s a lot of text manipulation just to get a simple list of port numbers! Here’s the same thing using jc and jq:

$ netstat -tln | jc --netstat | jq '.[].local_port_num'
25
22

or

$ jc netstat -tln | jq '.[].local_port_num'
25
22

Notice how much more intuitive it is to search and compare semantically enhanced structured data vs. awkwardly parsing low-level text? Also, the JSON output can be preserved to be used by any higher-level programming language like Python or JavaScript without line parsing. This is the future, my friends!
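Because jc emits plain JSON, a higher-level language can consume it with no line parsing at all. A minimal Python sketch: the embedded string stands in for captured `jc ifconfig` output (in practice you would capture it via subprocess), and the `ipv4_addr` field name comes from the jq examples above:

```python
import json

# Stand-in for stdout captured from: jc ifconfig ens33
jc_output = '[{"name": "ens33", "ipv4_addr": "192.168.71.138", "mtu": 1500}]'

interfaces = json.loads(jc_output)
addrs = [iface["ipv4_addr"] for iface in interfaces]
print(addrs)  # ['192.168.71.138']
```

The same data structure works unchanged whether it came from a pipe, a file, or a REST API.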

jc currently supports the following parsers: arp, df, dig, env, free, /etc/fstab, history, /etc/hosts, ifconfig, iptables, jobs, ls, lsblk, lsmod, lsof, mount, netstat, ps, route, ss, stat, systemctl, systemctl list-jobs, systemctl list-sockets, systemctl list-unit-files, uname -a, uptime, and w.

If you have a recommendation for a command or file type that is not currently supported by jc, add it to the comments and I’ll see if I can figure out how to parse and serialize it. If you would like to contribute a parser, please feel free!

With jc, we can make the linux world a better place until the OS and GNU tools join us in the 21st century!

Published by kellyjonbrazil

I'm a cybersecurity and cloud computing nerd.

15 thoughts on “Bringing the Unix Philosophy to the 21st Century”

  1. I like the jc tool very much, and it makes a lot of sense for JSON usage. Remember, though, UNIX tools have been tested time and time again. JSON is great for REST API calls, but in a shell environment it can present challenges of its own. Text is fundamental to any file system, and reading and writing text requires very little: no additional tooling or programs beyond the standards across the system. I agree that an additional JSON output option would increase the flexibility of the shell environment and the interaction with your system. But /proc and /sys are pseudo-filesystems and deliberately minimal; adding, say, JSON serialization would introduce complexity to the technology. Isn’t all data in the form of text anyway? Wouldn’t JSON make editing more complex? Is reading more critical than writing when editing data? Don’t get me wrong, I do believe there should be improvements, but the questions of how to move forward are much bigger than converting data to JSON format.

    P.S. Great article for provoking critical thinking.

    1. JSON is great for REST API calls, but in a shell environment, it can provide some challenges on the same level

      I disagree. The complexity of manipulating unstructured text eventually just gets shoved elsewhere. We have tools like sort instead of each command having a --sort flag because it composes better. Taking the task of filtering/sorting data away from commands like ls and moving it to a separate tool that can ‘do one thing and do it well’ is a step in the same direction.

      Example: what do foo -ag5 -k or bar -sd -1 do? Hard to say. But with a different version:

      foo -j | jq 'select(.amount > 5) | map(.bytes / 1024)'
      bar -j | jq 'sort_by(.date) | del(.[] | select(.level > 1))'

      Suddenly it’s easier to tell.

      The programmer doesn’t have to write those extra flags, and the user doesn’t have to learn and remember them. That’s why the JSON versions of the ifconfig and netstat commands in the article are cleaner, too.

      Here’s some other things of note:

      Compatibility becomes easier to handle. Take a look at this failure of unstructured text:

      https://bugzilla.kernel.org/show_bug.cgi?id=197353

      A structured version could have had a ‘version’ field so that other tools could handle legacy behavior.

      It doesn’t have to be JSON:

      https://github.com/json-next/awesome-json-next
      https://github.com/dbohdan/structured-text-tools

      Recutils is cool by the way 😀

      As another example, imagine you wanted to back up your ~’s metadata (an ongoing side project that I’ll probably eventually turn into a blog thingy). You could do find ~ > files.txt, but what if you want to add more fields that can be queried and modified? The tool I use (github.com/leahneukirchen/lr) has string formatting, a great alternative to piles of filter flags:

      lr -Uf '{"path":"%P","type":"%y", "sz":%s, "entries": %e, "depth":%d, "mtime":%As, "target":"%l"} \n' $1

      I then ‘refine’ it with this bit of jq:

      (.path, .target) |= split("/") | if .type != "l" then del(.target) else . end | if .type != "d" then del(.entries) else . end

      The results look like this:

      {"path":["foo"],"type":"d","sz":4096,"entries":2,"depth":1,"mtime":1589584272}
      {"path":["foo","bar.symlink"],"type":"l","sz":11,"depth":2,"mtime":1589673357,"target":["foo","bar.txt"]}

      And I can do queries like:

      select(.depth > 1 and .type == "f")

      This becomes much more useful with larger dirs, of course. And if I wanted to, say, add a "backed_up": true/false field, I could do that easily.
