Bringing the Unix Philosophy to the 21st Century

Try the jc web demo!

Do One Thing Well

The Unix philosophy of using compact expert tools that do one thing well and pipelining them together to manipulate data is a great idea and has worked well for the past few decades. This philosophy was outlined in the 1978 Foreword to the Bell System Technical Journal describing the UNIX Time-Sharing System:

Foreword to the Bell System Technical Journal

Items i and ii are oft repeated, and for good reason. But it is time to bring this philosophy into the 21st century by further defining a standard output format for non-interactive use.

Unfortunately, this is the state of things today if you want to grab the IP address of one of the ethernet interfaces on your linux system:

$ ifconfig ens33 | grep inet | awk '{print $2}' | cut -d/ -f1 | head -n 1

This is not beautiful.

Until about 2013 it was as reasonable as anything to assume that unstructured text was a good way to output data at the command line. Unix/linux has many text parsing tools, like sed, awk, grep, tr, cut, and rev, that can be pipelined together to reformat the desired data before sending it to the next program. Of course, this has always been a pain, and it is the source of countless questions all over the web about how to parse the output of this or that program. The need to manually parse unstructured (and in some cases only human-readable) data has made life much harder than it needs to be for the average linux administrator.

But in 2013 a certain data format called JSON was standardized as ECMA-404 and later in 2017 as RFC 8259 and ISO/IEC 21778:2017. JSON is ubiquitous these days in REST APIs and is used to serialize everything from data between web applications, to Indicators of Compromise in the STIX2 specification, to configuration files. There are JSON parsing libraries in all modern programming languages and even JSON parsing tools for the command line, like jq. JSON is everywhere, it’s easy to use, and it’s a standard.

Had JSON been around when I was born in the 1970s, Ken Thompson and Dennis Ritchie might very well have embraced it as a recommended output format to help programs “do one thing well” in a pipeline.

To that end, I argue that linux and all of its supporting GNU and non-GNU utilities should offer JSON output options. We already see some limited support for this in systemctl and in the iproute2 utilities like ip, where you can output in JSON format with the -j option. The problem is that many linux distros do not include a version that offers JSON output (e.g. CentOS, currently). And even then, not all functions support JSON output, as shown below:

Here is ip addr with JSON output:

$ ip -j addr show dev ens33
 [{
         "addr_info": [{},{}]
     },{
         "ifindex": 2,
         "ifname": "ens33",
         "flags": ["BROADCAST","MULTICAST","UP","LOWER_UP"],
         "mtu": 1500,
         "qdisc": "fq_codel",
         "operstate": "UP",
         "group": "default",
         "txqlen": 1000,
         "link_type": "ether",
         "address": "00:0c:29:99:45:17",
         "broadcast": "ff:ff:ff:ff:ff:ff",
         "addr_info": [{
                 "family": "inet",
                 "local": "192.168.71.131",
                 "prefixlen": 24,
                 "broadcast": "192.168.71.255",
                 "scope": "global",
                 "dynamic": true,
                 "label": "ens33",
                 "valid_life_time": 1732,
                 "preferred_life_time": 1732
             },{
                 "family": "inet6",
                 "local": "fe80::20c:29ff:fe99:4517",
                 "prefixlen": 64,
                 "scope": "link",
                 "valid_life_time": 4294967295,
                 "preferred_life_time": 4294967295
             }]
     }
 ]
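Structured output like this can be consumed directly from a higher-level language with no text munging. Here is a minimal Python sketch (stdlib json only) that pulls the IPv4 address out of a sample shaped like the `ip -j` output above; the sample is abbreviated, and the empty leading object mirrors the quirk in the real output:

```python
import json

# Abbreviated sample shaped like the `ip -j addr show dev ens33` output above
ip_json = '''
[{"addr_info": [{}, {}]},
 {"ifindex": 2, "ifname": "ens33",
  "addr_info": [
    {"family": "inet",  "local": "192.168.71.131", "prefixlen": 24},
    {"family": "inet6", "local": "fe80::20c:29ff:fe99:4517", "prefixlen": 64}]}]
'''

interfaces = json.loads(ip_json)

# Walk every interface's addr_info and keep only the IPv4 ("inet") entries;
# the empty objects in the first element simply fail the filter and drop out.
ipv4_addrs = [
    addr["local"]
    for iface in interfaces
    for addr in iface.get("addr_info", [])
    if addr.get("family") == "inet"
]

print(ipv4_addrs)  # ['192.168.71.131']
```

No grep, cut, or head required, and the filter reads as a statement of intent rather than a position in a column.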

And here is ip route not outputting JSON, even with the -j flag:

$ ip -j route
 default via 192.168.71.2 dev ens33 proto dhcp src 192.168.71.131 metric 100 
 192.168.71.0/24 dev ens33 proto kernel scope link src 192.168.71.131 
 192.168.71.2 dev ens33 proto dhcp scope link src 192.168.71.131 metric 100

Some more modern tools, like kubectl and the aws-cli, offer more consistent JSON output options, which allows much easier parsing and pipelining of their output. But many older tools still output nearly unparsable text (e.g. netstat, lsblk, ifconfig, iptables). Interestingly, Windows PowerShell has embraced structured data, and that’s a good thing the linux community can learn from.

How do we move forward?

The solution is to start an effort to go back to all of these legacy GNU and non-GNU command line utilities that output text data and add a JSON output option to them. All operating system APIs, like the /proc and /sys filesystems, should serialize their files in JSON or provide the data in an alternative API that outputs JSON.
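To make the /proc idea concrete, here is a hypothetical Python sketch of such a shim: it parses a few /proc/meminfo-style "Key: value kB" lines and serializes them to JSON. The sample data is hard-coded for illustration, not read from a live system:

```python
import json

# Sample lines in the /proc/meminfo "Key:   value kB" layout (illustrative data)
meminfo_text = """\
MemTotal:        16384256 kB
MemFree:          8204312 kB
SwapTotal:        2097148 kB
"""

def meminfo_to_dict(text):
    """Turn 'Key:  value kB' lines into a dict of integer kB values."""
    result = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields = value.split()          # e.g. ['16384256', 'kB']
        result[key] = int(fields[0])    # keep the numeric value, in kB
    return result

print(json.dumps(meminfo_to_dict(meminfo_text)))
```

A shim like this pushes the parsing burden to one well-tested place instead of every consumer's shell history.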


In the meantime, I have created a tool called jc (https://github.com/kellyjonbrazil/jc) that converts the output of dozens of GNU and non-GNU commands and configuration files to JSON. Instead of everyone needing to create their own custom parsers for these common utilities and files, jc acts as a central clearinghouse of parsing libraries that just need to be written once and can be used by everyone.


jc is now available as an Ansible filter plugin!

JC In Action

Here’s how jc can be used to make your life easier today, until GNU/linux brings the Unix philosophy into the 21st century. Let’s take that same example of grabbing an ethernet IP address from above:

$ ifconfig ens33 | grep inet | awk '{print $2}' | cut -d/ -f1 | head -n 1
192.168.71.138

And here’s how you do the same thing with jc and a CLI JSON parsing tool like jq:

$ ifconfig ens33 | jc --ifconfig | jq -r '.[].ipv4_addr'
192.168.71.138

or

$ jc ifconfig ens33 | jq -r '.[].ipv4_addr'
192.168.71.138

Here’s another example of listing the listening TCP ports on the system:

$ netstat -tln | tr -s ' ' | cut -d ' ' -f 4 | rev | cut -d : -f 1 | rev | tail -n +3
25
22

That’s a lot of text manipulation just to get a simple list of port numbers! Here’s the same thing using jc and jq:

$ netstat -tln | jc --netstat | jq '.[].local_port_num'
25
22

or

$ jc netstat -tln | jq '.[].local_port_num'
25
22

Notice how much more intuitive it is to search and compare semantically enhanced structured data versus awkwardly parsing low-level text. Also, the JSON output can be preserved and used by any higher-level programming language, like Python or JavaScript, without line parsing. This is the future, my friends!
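To illustrate that last point, here is a minimal Python sketch (stdlib json only) consuming records shaped like the `jc --netstat` output. The sample data is invented for illustration; the `local_port_num` and `state` field names are assumptions based on the jq queries in this article:

```python
import json

# Invented sample shaped like `jc --netstat` output (field names assumed
# from the jq queries above, not taken from a live system)
netstat_json = '''
[{"proto": "tcp", "local_port_num": 25, "state": "LISTEN"},
 {"proto": "tcp", "local_port_num": 22, "state": "LISTEN"}]
'''

# Collect the listening port numbers -- no tr/cut/rev/tail gymnastics needed
ports = [
    entry["local_port_num"]
    for entry in json.loads(netstat_json)
    if entry.get("state") == "LISTEN"
]

print(ports)  # [25, 22]
```

The same records could be fed to a monitoring script, a database loader, or a web dashboard without ever reverse-engineering column positions.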

jc currently supports the following parsers: arp, df, dig, env, free, /etc/fstab, history, /etc/hosts, ifconfig, iptables, jobs, ls, lsblk, lsmod, lsof, mount, netstat, ps, route, ss, stat, systemctl, systemctl list-jobs, systemctl list-sockets, systemctl list-unit-files, uname -a, uptime, and w.

Note: jc now supports scores of programs and file types.

If you have a recommendation for a command or file type that is not currently supported by jc, add it to the comments and I’ll see if I can figure out how to parse and serialize it. If you would like to contribute a parser, please feel free!

With jc, we can make the linux world a better place until the OS and GNU tools join us in the 21st century!

Published by kellyjonbrazil

I'm a cybersecurity and cloud computing nerd.

11 thoughts on “Bringing the Unix Philosophy to the 21st Century”

  1. I like the jc tool very much, and JSON output makes a lot of sense. Remember, though, UNIX tools have been tested through time and time again. JSON is great for REST API calls, but in a shell environment it can present challenges of its own. Text format is fundamental to any file system, and the ability to read and write text requires very little: no additional tooling or programs beyond the standards across the system. I agree that an additional JSON output option would increase flexibility in the shell environment and in how you interact with your system. But /proc and /sys are pseudo-filesystems and are deliberately minimal; adding, let’s say, JSON serialization would introduce complexity to the technology. Isn’t all data in the form of text anyway? Wouldn’t JSON make editing more complex? Is reading more critical than writing when editing data? Don’t get me wrong, I do believe there should be improvements, but the questions about moving forward are much bigger than converting data to JSON format.

    P.S. Great article for sparking critical thinking.

    1. JSON is great for REST API calls, but in a shell environment, it can provide some challenges on the same level

      I disagree. The complexity of manipulating unstructured text eventually just gets shoved elsewhere. We have tools like sort instead of each command having a --sort flag because it composes better. Taking the task of filtering/sorting data away from commands like ls and moving it to a separate tool that can ‘do one thing and do it well’ is a step in the same direction.

      Example: what do foo -ag5 -k or bar -sd -1 do? Hard to say. But with a different version:

      foo -j | jq 'select(.amount > 5) | map(.bytes / 1024)'
      bar -j | jq 'sort_by(.date) | map(select(.level <= 1))'

      Suddenly it’s easier to tell.

      The programmer doesn’t have to write those extra flags, and the user doesn’t have to learn and remember them. That’s why the JSON versions of the ifconfig and netstat commands in the article are cleaner, too.

      Here’s some other things of note:

      Compatibility becomes easier to handle. Take a look at this failure of unstructured text:

      https://bugzilla.kernel.org/show_bug.cgi?id=197353

      A structured version could have had a ‘version’ field so that other tools could handle legacy behavior.

      It doesn’t have to be json:

      https://github.com/json-next/awesome-json-next
      https://github.com/dbohdan/structured-text-tools

      Recutils is cool by the way 😀

      As another example, imagine you wanted to back up your ~’s metadata (an ongoing side project that I’ll probably eventually turn into a blog thingy). You could do find ~ > files.txt, but what if you want to add more fields that can be queried and modified? The tool I use (github.com/leahneukirchen/lr) has string formatting, a great alternative to piles of filter flags:

      lr -Uf '{"path":"%P","type":"%y", "sz":%s, "entries": %e, "depth":%d, "mtime":%As, "target":"%l"} \n' $1

      I then ‘refine’ it with this bit of jq:

      (.path, .target) |= split("/") | if .type != "l" then del(.target) else . end | if .type != "d" then del(.entries) else . end

      The results look like this:

      {"path":["foo"],"type":"d","sz":4096,"entries":2,"depth":1,"mtime":1589584272}
      {"path":["foo","bar.symlink"],"type":"l","sz":11,"depth":2,"mtime":1589673357,"target":["foo","bar.txt"]}

      And I can do queries like:

      select(.depth > 1 and .type == "f")

      This becomes much more useful with larger dirs, of course. And if I wanted to, say, add a "backed_up": true/false field, I could do that easily.

  2. Wouldn’t it be better to use YAML instead of JSON? With JSON, when you add or remove a line at the end of a list, you always need to change another line to get the commas right. You don’t have that problem with YAML.

  3. I agree with your point that YAML is often easier for humans to edit, but I think in this case JSON is probably the better option. The reason is that in this use case the end user is not editing the output. It is only being consumed, most probably by a program or script, not a human. Since YAML is a superset of JSON, the parsing libraries need to be a bit heftier, with no real benefit over JSON.

    Plus, jc has a YAML parser, so you can actually create a YAML file and easily convert it to pure JSON with something like cat file.yaml | jc --yaml

  4. “Had JSON been around when I was born in the 1970’s”
    Univac called this kind of output “Self Describing Files”, so it’s been around a while.

  5. Your argument isn’t completely without merit, I would love for various applications including pacmd, ip, systemd, etc., to have json or yaml output, but your initial example is flawed. It is possible to pass the output to awk only, with | awk '/inet/{gsub("/.*", ""); print $2; exit}' and obtain the same result without the use of grep, cut, and head. Awk is a full language, but a simple one, meant for text manipulation.

    1. Fair point, but I still believe the argument stands: the awk method is “write-once”. That is, it may not be too hard to write, but it is difficult to read because there is no context unless you look at the underlying data. For example, I just tried that on my macOS machine and got an IPv6 address instead of IPv4.

      1. On my Mac, the command ifconfig en0 | grep inet | awk '{print $2}' | cut -d/ -f1 | head -n 1 also returns the IPv6 address. Using awk only, on macOS you can shorten it to ifconfig en0 | awk '$1 == "inet" {print $2}'

  6. I find it kind of surprising that this line of thinking resurfaces periodically. I found the idea interesting enough that I took the step of writing a similar set of tools, just to see what it would look like. I based it off of e-n-f’s junix, which is very much in line with this article (and that was, again, one of the times something like this was discussed before).

    Was it useful? Yes. Was it “better”? Not for everything (i.e. I would like it as an option but not as a complete replacement).

    https://github.com/aanastasiou/pyjunix
    https://github.com/e-n-f/junix

