Bringing the Unix Philosophy to the 21st Century

Try the jc web demo!

Do One Thing Well

The Unix philosophy of using compact expert tools that do one thing well and pipelining them together to manipulate data is a great idea and has worked well for the past few decades. This philosophy was outlined in the 1978 Foreword to the Bell System Technical Journal describing the UNIX Time-Sharing System:

Foreword to the Bell System Technical Journal

Items i and ii are oft repeated, and for good reason. But it is time to bring this philosophy into the 21st century by further defining a standard output format for non-interactive use.

Unfortunately, this is the state of things today if you want to grab the IP address of one of the ethernet interfaces on your linux system:

$ ifconfig ens33 | grep inet | awk '{print $2}' | cut -d/ -f1 | head -n 1

This is not beautiful.

Until about 2013 it was as reasonable as anything to assume that unstructured text was a good way to output data at the command line. Unix/linux has many text parsing tools, like sed, awk, grep, tr, cut, and rev, that can be pipelined together to reformat the desired data before sending it to the next program. Of course, this has always been a pain, and it is the source of countless questions all over the web about how to parse the output of this or that program. The need to manually parse unstructured (and in some cases only human-readable) data has made life much harder than it needs to be for the average linux administrator.

But in 2013 a certain data format called JSON was standardized as ECMA-404 and later in 2017 as RFC 8259 and ISO/IEC 21778:2017. JSON is ubiquitous these days in REST APIs and is used to serialize everything from data between web applications, to Indicators of Compromise in the STIX2 specification, to configuration files. There are JSON parsing libraries in all modern programming languages and even JSON parsing tools for the command line, like jq. JSON is everywhere, it’s easy to use, and it’s a standard.

Had JSON been around when I was born in the 1970s, Ken Thompson and Dennis Ritchie might very well have embraced it as a recommended output format to help programs “do one thing well” in a pipeline.

To that end, I argue that linux and all of its supporting GNU and non-GNU utilities should offer JSON output options. We already see some limited support for this in systemctl and in the iproute2 utilities like ip, where you can output in JSON format with the -j option. The problem is that many linux distros do not include a version that offers JSON output (e.g. CentOS, currently). And even then, not all functions support JSON output, as shown below:

Here is ip addr with JSON output:

$ ip -j addr show dev ens33
 [{
         "addr_info": [{},{}]
     },{
         "ifindex": 2,
         "ifname": "ens33",
         "flags": ["BROADCAST","MULTICAST","UP","LOWER_UP"],
         "mtu": 1500,
         "qdisc": "fq_codel",
         "operstate": "UP",
         "group": "default",
         "txqlen": 1000,
         "link_type": "ether",
         "address": "00:0c:29:99:45:17",
         "broadcast": "ff:ff:ff:ff:ff:ff",
         "addr_info": [{
                 "family": "inet",
                 "local": "192.168.71.131",
                 "prefixlen": 24,
                 "broadcast": "192.168.71.255",
                 "scope": "global",
                 "dynamic": true,
                 "label": "ens33",
                 "valid_life_time": 1732,
                 "preferred_life_time": 1732
             },{
                 "family": "inet6",
                 "local": "fe80::20c:29ff:fe99:4517",
                 "prefixlen": 64,
                 "scope": "link",
                 "valid_life_time": 4294967295,
                 "preferred_life_time": 4294967295
             }]
     }
 ]
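Structured output like this can be consumed directly from a higher-level language with no text munging. Here is a minimal Python sketch (stdlib json only) that pulls the IPv4 address out of a sample shaped like the `ip -j` output above; the sample is abbreviated, and the empty leading object mirrors the quirk in the real output:

```python
import json

# Abbreviated sample shaped like the `ip -j addr show dev ens33` output above
ip_json = '''
[{"addr_info": [{}, {}]},
 {"ifindex": 2, "ifname": "ens33",
  "addr_info": [
    {"family": "inet",  "local": "192.168.71.131", "prefixlen": 24},
    {"family": "inet6", "local": "fe80::20c:29ff:fe99:4517", "prefixlen": 64}]}]
'''

interfaces = json.loads(ip_json)

# Walk every interface's addr_info and keep only the IPv4 ("inet") entries;
# the empty objects in the first element simply fail the filter and drop out.
ipv4_addrs = [
    addr["local"]
    for iface in interfaces
    for addr in iface.get("addr_info", [])
    if addr.get("family") == "inet"
]

print(ipv4_addrs)  # ['192.168.71.131']
```

No grep, cut, or head required, and the filter reads as a statement of intent rather than a position in a column.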

And here is ip route not outputting JSON, even with the -j flag:

$ ip -j route
 default via 192.168.71.2 dev ens33 proto dhcp src 192.168.71.131 metric 100 
 192.168.71.0/24 dev ens33 proto kernel scope link src 192.168.71.131 
 192.168.71.2 dev ens33 proto dhcp scope link src 192.168.71.131 metric 100

Some more modern tools, like kubectl and the aws-cli, offer more consistent JSON output options, which allows much easier parsing and pipelining of their output. But many older tools still output nearly unparsable text (e.g. netstat, lsblk, ifconfig, iptables). Interestingly, Windows PowerShell has embraced structured data, and that’s a good thing the linux community can learn from.

How do we move forward?

The solution is to start an effort to go back to all of these legacy GNU and non-GNU command line utilities that output text data and add a JSON output option to them. All operating system APIs, like the /proc and /sys filesystems, should serialize their files in JSON or provide the data in an alternative API that outputs JSON.
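To make the /proc idea concrete, here is a hypothetical Python sketch of such a shim: it parses a few /proc/meminfo-style "Key: value kB" lines and serializes them to JSON. The sample data is hard-coded for illustration, not read from a live system:

```python
import json

# Sample lines in the /proc/meminfo "Key:   value kB" layout (illustrative data)
meminfo_text = """\
MemTotal:        16384256 kB
MemFree:          8204312 kB
SwapTotal:        2097148 kB
"""

def meminfo_to_dict(text):
    """Turn 'Key:  value kB' lines into a dict of integer kB values."""
    result = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields = value.split()          # e.g. ['16384256', 'kB']
        result[key] = int(fields[0])    # keep the numeric value, in kB
    return result

print(json.dumps(meminfo_to_dict(meminfo_text)))
```

A shim like this pushes the parsing burden to one well-tested place instead of every consumer's shell history.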


In the meantime, I have created a tool called jc (https://github.com/kellyjonbrazil/jc) that converts the output of dozens of GNU and non-GNU commands and configuration files to JSON. Instead of everyone needing to create their own custom parsers for these common utilities and files, jc acts as a central clearinghouse of parsing libraries that just need to be written once and can be used by everyone.


jc is now available as an Ansible filter plugin!

JC In Action

Here’s how jc can be used to make your life easier today, until GNU/linux brings the Unix philosophy into the 21st century. Let’s take that same example of grabbing an ethernet IP address from above:

$ ifconfig ens33 | grep inet | awk '{print $2}' | cut -d/ -f1 | head -n 1
192.168.71.138

And here’s how you do the same thing with jc and a CLI JSON parsing tool like jq:

$ ifconfig ens33 | jc --ifconfig | jq -r '.[].ipv4_addr'
192.168.71.138

or

$ jc ifconfig ens33 | jq -r '.[].ipv4_addr'
192.168.71.138

Here’s another example of listing the listening TCP ports on the system:

$ netstat -tln | tr -s ' ' | cut -d ' ' -f 4 | rev | cut -d : -f 1 | rev | tail -n +3
25
22

That’s a lot of text manipulation just to get a simple list of port numbers! Here’s the same thing using jc and jq:

$ netstat -tln | jc --netstat | jq '.[].local_port_num'
25
22

or

$ jc netstat -tln | jq '.[].local_port_num'
25
22

Notice how much more intuitive it is to search and compare semantically enhanced structured data versus awkwardly parsing low-level text. Also, the JSON output can be preserved and used by any higher-level programming language, like Python or JavaScript, without line parsing. This is the future, my friends!
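To illustrate that last point, here is a minimal Python sketch (stdlib json only) consuming records shaped like the `jc --netstat` output. The sample data is invented for illustration; the `local_port_num` and `state` field names are assumptions based on the jq queries in this article:

```python
import json

# Invented sample shaped like `jc --netstat` output (field names assumed
# from the jq queries above, not taken from a live system)
netstat_json = '''
[{"proto": "tcp", "local_port_num": 25, "state": "LISTEN"},
 {"proto": "tcp", "local_port_num": 22, "state": "LISTEN"}]
'''

# Collect the listening port numbers -- no tr/cut/rev/tail gymnastics needed
ports = [
    entry["local_port_num"]
    for entry in json.loads(netstat_json)
    if entry.get("state") == "LISTEN"
]

print(ports)  # [25, 22]
```

The same records could be fed to a monitoring script, a database loader, or a web dashboard without ever reverse-engineering column positions.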

jc currently supports the following parsers: arp, df, dig, env, free, /etc/fstab, history, /etc/hosts, ifconfig, iptables, jobs, ls, lsblk, lsmod, lsof, mount, netstat, ps, route, ss, stat, systemctl, systemctl list-jobs, systemctl list-sockets, systemctl list-unit-files, uname -a, uptime, and w.

Note: jc now supports scores of programs and file types.

If you have a recommendation for a command or file type that is not currently supported by jc, add it to the comments and I’ll see if I can figure out how to parse and serialize it. If you would like to contribute a parser, please feel free!

With jc, we can make the linux world a better place until the OS and GNU tools join us in the 21st century!

Published by kellyjonbrazil

I'm a cybersecurity and cloud computing nerd.

11 thoughts on “Bringing the Unix Philosophy to the 21st Century”

  1. I like the jc tool very much, and JSON output makes a lot of sense. Remember, though, UNIX tools have been tested through time and time again. JSON is great for REST API calls, but in a shell environment it can present challenges of its own. Text format is fundamental to any file system, and the ability to read and write text requires very little: no additional tooling or programs beyond the standards across the system. I agree that an additional JSON output option would increase flexibility in the shell environment and in how you interact with your system. But /proc and /sys are pseudo-filesystems and are deliberately minimal; adding, let’s say, JSON serialization would introduce complexity to the technology. Isn’t all data in the form of text anyway? Wouldn’t JSON make editing more complex? Is reading more critical than writing when editing data? Don’t get me wrong, I do believe there should be improvements, but the questions about moving forward are much bigger than converting data to JSON format.

    P.S. Great article for sparking critical thinking.

    1. JSON is great for REST API calls, but in a shell environment, it can provide some challenges on the same level

      I disagree. The complexity of manipulating unstructured text eventually just gets shoved elsewhere. We have tools like sort instead of each command having a --sort flag because it composes better. Taking the task of filtering/sorting data away from commands like ls and moving it to a separate tool that can ‘do one thing and do it well’ is a step in the same direction.

      Example: what do foo -ag5 -k or bar -sd -1 do? Hard to say. But with a different version:

      foo -j | jq 'select(.amount > 5) | map(.bytes / 1024)'
      bar -j | jq 'sort_by(.date) | map(select(.level <= 1))'

      Suddenly it’s easier to tell.

      The programmer doesn’t have to write those extra flags, and the user doesn’t have to learn and remember them. That’s why the JSON versions of the ifconfig and netstat commands in the article are cleaner, too.

      Here’s some other things of note:

      Compatibility becomes easier to handle. Take a look at this failure of unstructured text:

      https://bugzilla.kernel.org/show_bug.cgi?id=197353

      A structured version could have had a ‘version’ field so that other tools could handle legacy behavior.

      It doesn’t have to be json:

      https://github.com/json-next/awesome-json-next
      https://github.com/dbohdan/structured-text-tools

      Recutils is cool by the way 😀

      As another example, imagine you wanted to back up your ~’s metadata (an ongoing side project that I’ll probably eventually turn into a blog thingy). You could do find ~ > files.txt, but what if you want to add more fields that can be queried and modified? The tool I use (github.com/leahneukirchen/lr) has string formatting, a great alternative to piles of filter flags:

      lr -Uf '{"path":"%P","type":"%y", "sz":%s, "entries": %e, "depth":%d, "mtime":%As, "target":"%l"} \n' $1

      I then ‘refine’ it with this bit of jq:

      (.path, .target) |= split("/") | if .type != "l" then del(.target) else . end | if .type != "d" then del(.entries) else . end

      The results look like this:

      {"path":["foo"],"type":"d","sz":4096,"entries":2,"depth":1,"mtime":1589584272}
      {"path":["foo","bar.symlink"],"type":"l","sz":11,"depth":2,"mtime":1589673357,"target":["foo","bar.txt"]}

      And I can do queries like:

      select(.depth > 1 and .type == "f")

      This becomes much more useful with larger dirs, of course. And if I wanted to, say, add a "backed_up": true/false field, I could do that easily.

  2. Wouldn’t it be better to use YAML instead of JSON? With JSON, when you add or remove a line at the end of a list, you always need to change another line to get the commas right. You don’t have that problem with YAML.

  3. I agree with your point that YAML is often easier for humans to edit, but I think in this case JSON is probably the better option. The reason is that in this use case the end user is not editing the output. It is only being consumed, most probably by a program or script, not a human. Since YAML is a superset of JSON, the parsing libraries need to be a bit heftier, with no real benefit over JSON.

    Plus, jc has a YAML parser, so you can actually create a YAML file and easily convert it to pure JSON with something like cat file.yaml | jc --yaml

  4. “Had JSON been around when I was born in the 1970’s”
    Univac called this kind of output “Self Describing Files”, so it’s been around a while.

  5. Your argument isn’t completely without merit, I would love for various applications including pacmd, ip, systemd, etc., to have json or yaml output, but your initial example is flawed. It is possible to pass the output to awk only, with | awk '/inet/{gsub("/.*", ""); print $2; exit}' and obtain the same result without the use of grep, cut, and head. Awk is a full language, but a simple one, meant for text manipulation.

    1. Fair point, but I still believe the argument stands: the awk method is “write-once”. That is, it may not be too hard to write, but it is difficult to read because there is no context unless you look at the underlying data. For example, I just tried that on my macOS machine and got an IPv6 address instead of IPv4.

      1. On my Mac, the command ifconfig en0 | grep inet | awk '{print $2}' | cut -d/ -f1 | head -n 1 also returns the IPv6 address. Using awk only, on macOS you can shorten it to ifconfig en0 | awk '$1 == "inet" {print $2}'

  6. I find it kind of surprising that this line of thinking resurfaces periodically. I found the idea interesting enough that I took the step of writing a similar set of tools, just to see what it would look like. I based it off of e-n-f’s junix, which is very much in line with this article (and that was, again, one of the times something like this was discussed before).

    Was it useful? Yes. Was it “better”? Not for everything (i.e. I would like it as an option but not as a complete replacement).

    https://github.com/aanastasiou/pyjunix
    https://github.com/e-n-f/junix

