Jello: The JQ Alternative for Pythonistas

Try the jello web demo!

I’m a big fan of using structured data at the command line, so much so that I’ve written a couple of utilities to promote JSON in the CLI, including jc and jtbl, both of which show up later in this post.

Typically I use jq to filter and process the JSON output until I get what I want. But if you’re anything like me, you spend a lot of time googling how to do what you want in jq because the syntax can get a little out of hand. In fact, I keep notes with example jq queries I’ve used before in case I need those techniques again.

jq is great for simple things, but when I want to iterate through a deeply nested structure with arrays of objects, I find Python’s list and dictionary syntax easier to comprehend.

Hello jello

That’s why I created jello. jello works similarly to jq but uses the Python interpreter, so you can iterate with loops, comprehensions, variables, expressions, etc. just like you would in a full-fledged Python script.

The nice thing about jello is that it removes a lot of the boilerplate code you would need to ingest and output the JSON or JSON Lines data so you can focus on the logic.
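To make that boilerplate concrete, here is a minimal sketch of what a standalone script would need just to read and write the data. The helper names load_json_or_lines and dump_json_or_lines are made up for this illustration and are not part of jello; the sample strings stand in for what a real script would read from standard input.

```python
import json


def load_json_or_lines(text):
    """Parse text as one JSON document, falling back to JSON Lines."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # JSON Lines: one JSON value per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def dump_json_or_lines(result, lines=False):
    """Serialize a result; with lines=True, emit one list item per line."""
    if lines and isinstance(result, list):
        return "\n".join(json.dumps(item) for item in result)
    return json.dumps(result, indent=2)


# In a real script the text would come from sys.stdin.read().
single = load_json_or_lines('{"name": "jc", "parser_count": 50}')
multi = load_json_or_lines('{"id": 1}\n{"id": 2}\n')

print(dump_json_or_lines(single))
print(dump_json_or_lines(multi, lines=True))
```

All of this is plumbing that has nothing to do with the query itself, which is the part you actually care about.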

Let’s take the following output from jc -ap:

$ jc -ap
{
  "name": "jc",
  "version": "1.9.2",
  "description": "jc cli output JSON conversion tool",
  "author": "Kelly Brazil",
  "author_email": "kellyjonbrazil@gmail.com",
  "parser_count": 50,
  "parsers": [
    {
      "name": "airport",
      "argument": "--airport",
      "version": "1.0",
      "description": "airport -I command parser",
      "author": "Kelly Brazil",
      "author_email": "kellyjonbrazil@gmail.com",
      "compatible": [
        "darwin"
      ],
      "magic_commands": [
        "airport -I"
      ]
    },
    {
      "name": "airport_s",
      "argument": "--airport-s",
      "version": "1.0",
      "description": "airport -s command parser",
      "author": "Kelly Brazil",
      "author_email": "kellyjonbrazil@gmail.com",
      "compatible": [
        "darwin"
      ],
      "magic_commands": [
        "airport -s"
      ]
    },
    ...
  ]
}

Let’s say I want a list of the parser names that are compatible with macOS. Here is a jq query that will get down to that level:

$ jc -a | jq '[.parsers[] | select(.compatible[] | contains("darwin")) | .name]' 
[
  "airport",
  "airport_s",
  "arp",
  "crontab",
  "crontab_u",
  "csv",
  ...
]

This is not too bad, but you need to be careful about bracket and parenthesis placement. Here’s the same query in jello:

$ jc -a | jello '[parser["name"] for parser in _["parsers"] if "darwin" in parser["compatible"]]'
[
  "airport",
  "airport_s",
  "arp",
  "crontab",
  "crontab_u",
  "csv",
  ...
]

As you can see, jello gives you the JSON or JSON Lines input as a dictionary or list of dictionaries assigned to ‘_’. Then you process it as you’d like using standard Python syntax. jello automatically takes care of slurping input and printing valid JSON or JSON Lines depending on the value of the last expression.

The example above is not quite as terse as using jq, but it’s more readable to someone who is familiar with Python list comprehensions. As with any programming language, there are multiple ways to skin a cat. We can also do a similar query with a for loop:
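For the curious, the "value of the last expression" idea can be sketched in plain Python with the standard ast module. This is only an illustration of the technique under my own assumptions, not jello's actual source code; run_query and the sample data (including the parser name example_linux_only) are hypothetical.

```python
import ast


def run_query(query, data):
    """Run query with data bound to `_`; return the last expression's value."""
    tree = ast.parse(query)
    if not tree.body or not isinstance(tree.body[-1], ast.Expr):
        raise ValueError("query must end with an expression")
    # Split off the final expression so it can be evaluated separately.
    last = ast.Expression(body=tree.body.pop().value)
    scope = {"_": data}
    # Execute every statement except the last one...
    exec(compile(tree, "<query>", "exec"), scope)
    # ...then evaluate the final expression to get the result.
    return eval(compile(last, "<query>", "eval"), scope)


# Hand-built sample shaped like the jc output; the second entry is made up.
sample = {
    "parsers": [
        {"name": "airport", "compatible": ["darwin"]},
        {"name": "example_linux_only", "compatible": ["linux"]},
    ]
}

query = '[p["name"] for p in _["parsers"] if "darwin" in p["compatible"]]'
print(run_query(query, sample))
```

The same function handles multi-line queries: any assignments and loops run first, and whatever the final line evaluates to becomes the output.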

$ jc -a | jello '\
result = []
for parser in _["parsers"]:
  if "darwin" in parser["compatible"]:
    result.append(parser["name"])
result'
[
  "airport",
  "airport_s",
  "arp",
  "crontab",
  "crontab_u",
  "csv",
  ...
]

Advanced JSON Processing

These are very simple examples and jq syntax might be OK here (though I prefer Python syntax). But what if we try to do something more complex? Let’s take one of the advanced examples from the excellent jq tutorial by Matthew Lincoln.

Under Grouping and Counting, Matthew describes an advanced jq filter against a sample Twitter dataset that includes JSON Lines data. There he describes the following query:

“We can now create a table of users. Let’s create a table with columns for the user id, user name, followers count, and a column of their tweet ids separated by a semicolon.”

https://programminghistorian.org/en/lessons/json-and-jq

Here is the final jq query:

$ cat twitterdata.jlines | jq -s 'group_by(.user) | 
                                 .[] | 
                                 {
                                   user_id: .[0].user.id, 
                                   user_name: .[0].user.screen_name, 
                                   user_followers: .[0].user.followers_count, 
                                   tweet_ids: [.[].id | tostring] | join(";")
                                 }'
...
{
  "user_id": 47073035,
  "user_name": "msoltanm",
  "user_followers": 63,
  "tweet_ids": "619172275741298700"
}
{
  "user_id": 2569107372,
  "user_name": "SlavinOleg",
  "user_followers": 35,
  "tweet_ids": "501064198973960200;501064202794971140;501064214467731460;501064215759568900;501064220121632800"
}
{
  "user_id": 2369225023,
  "user_name": "SkogCarla",
  "user_followers": 10816,
  "tweet_ids": "501064217667960800"
}
{
  "user_id": 2477475030,
  "user_name": "bennharr",
  "user_followers": 151,
  "tweet_ids": "501064201503113200"
}
{
  "user_id": 42226593,
  "user_name": "shirleycolleen",
  "user_followers": 2114,
  "tweet_ids": "619172281294655500;619172179960328200"
}
...

This is a fantastic query! It’s deceptively simple looking: it takes Matthew quite a few paragraphs to describe how it works, and there are some tricky brackets, braces, and parentheses in there that need to be set just right. Let’s see how we could tackle this task with jello using standard Python syntax:

$ cat twitterdata.jlines | jello -l '\
user_ids = set()
for tweet in _:
    user_ids.add(tweet["user"]["id"])
result = []
for user in user_ids:
    user_profile = {}
    tweet_ids = []
    for tweet in _:
        if tweet["user"]["id"] == user:
            user_profile.update({
                "user_id": user,
                "user_name": tweet["user"]["screen_name"],
                "user_followers": tweet["user"]["followers_count"]})
            tweet_ids.append(str(tweet["id"]))
    user_profile["tweet_ids"] = ";".join(tweet_ids)
    result.append(user_profile)
result'
...
{"user_id": 2696111005, "user_name": "EGEVER142", "user_followers": 1433, "tweet_ids": "619172303654518784"}
{"user_id": 42226593, "user_name": "shirleycolleen", "user_followers": 2114, "tweet_ids": "619172281294655488;619172179960328192"}
{"user_id": 106948003, "user_name": "MrKneeGrow", "user_followers": 172, "tweet_ids": "501064228627705857"}
{"user_id": 18270633, "user_name": "ahhthatswhy", "user_followers": 559, "tweet_ids": "501064204661850113"}
{"user_id": 14331818, "user_name": "edsu", "user_followers": 4220, "tweet_ids": "615973042443956225;618602288781860864"}
{"user_id": 2569107372, "user_name": "SlavinOleg", "user_followers": 35, "tweet_ids": "501064198973960192;501064202794971136;501064214467731457;501064215759568897;501064220121632768"}
{"user_id": 22668719, "user_name": "nodehyena", "user_followers": 294, "tweet_ids": "501064222772445187"}
...

So that’s 17 lines of Python. Again, it’s not as terse as jq, but for Pythonistas it is probably a lot easier to see what is going on. This is a pretty simple and naive implementation (there are probably better approaches that are shorter, simpler, or faster), but the point is that I can come back six months from now and understand what is going on if I need to debug or tweak it.
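As one example of a faster approach, the grouping can be done in a single pass with a dictionary keyed on user id, instead of rescanning every tweet for every user. The sketch below is plain Python rather than a jello query; group_tweets_by_user is a name chosen for this illustration, and the sample list is hand-built from values in the output above. One behavioral difference: it keeps the screen name and follower count from the first tweet seen for each user, where the loop above keeps the last.

```python
def group_tweets_by_user(tweets):
    """Build one profile per user in a single pass over the tweets."""
    profiles = {}  # user id -> profile dict, in first-seen order
    for tweet in tweets:
        user = tweet["user"]
        profile = profiles.setdefault(user["id"], {
            "user_id": user["id"],
            "user_name": user["screen_name"],
            "user_followers": user["followers_count"],
            "tweet_ids": [],
        })
        profile["tweet_ids"].append(str(tweet["id"]))
    # Join the collected ids last to get the semicolon-separated column.
    return [
        {**profile, "tweet_ids": ";".join(profile["tweet_ids"])}
        for profile in profiles.values()
    ]


# Hand-built sample reusing a few values from the output above.
edsu = {"id": 14331818, "screen_name": "edsu", "followers_count": 4220}
nodehyena = {"id": 22668719, "screen_name": "nodehyena", "followers_count": 294}
sample_tweets = [
    {"id": 615973042443956225, "user": edsu},
    {"id": 501064222772445187, "user": nodehyena},
    {"id": 618602288781860864, "user": edsu},
]

for row in group_tweets_by_user(sample_tweets):
    print(row)
```

The dictionary does the work of both the set of user ids and the inner loop, so each tweet is only touched once.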

Just for fun, let’s pipe this result through jtbl to see what it looks like:

   user_id  user_name          user_followers  tweet_ids
----------  ---------------  ----------------  ----------------------------------------------------------------------------------------------
...
2481812382  SadieODoyle                    42  501064200035516416
2696111005  EGEVER142                    1433  619172303654518784
  42226593  shirleycolleen               2114  619172281294655488;619172179960328192
 106948003  MrKneeGrow                    172  501064228627705857
  18270633  ahhthatswhy                   559  501064204661850113
  14331818  edsu                         4220  615973042443956225;618602288781860864
2569107372  SlavinOleg                     35  501064198973960192;501064202794971136;501064214467731457;501064215759568897;501064220121632768
  22668719  nodehyena                     294  501064222772445187
  23598003  victoriasview                1163  501064228288364546
 851336634  20mUsa                      15643  50106414
...

Very cool! Find more examples at https://github.com/kellyjonbrazil/jello. I hope you find jello useful in your command line pipelines.

Published by kellyjonbrazil

I'm a cybersecurity and cloud computing nerd.
