Processing JSON Data to CSV with JQ
The tool jq
is a popular and flexible tool for working with JSON data. Haiku Depot Server is able to provide a large data payload containing all the package data. We’ll set an environment variable with the URL to this package data;
export HDS_URL="https://depot.haiku-os.org/__pkg/all-haikuports_x86_64-en.json.gz"
To look at the data, one can use curl
together with jq
;
curl -L "${HDS_URL}" | gzip -d | jq . | less
The data is more complex and voluminous, but the interesting parts of the data for this article are shown in the structure below;
{
"items": [
{
"name": "burnitnow",
"derivedRating": 3.5
},
{
"name": "sanity",
"derivedRating": 3.6600000858306885
},
{
"name": "gish",
"derivedRating": 0
},
{
"name": "zziplib"
}
]
}
The challenge is to use jq
to produce a CSV output that contains all packages that have a value for derivedRating
ordered by the highest rating first and then the name of the package. Here is how this is possible.
curl -L "${HDS_URL}" \
| gzip -d \
| jq -r '["package-name", "rating"], (.items|sort_by(-(.derivedRating//0),.name)|.[]|select(.derivedRating)|[.name, .derivedRating]) | @csv'
Using the test data above the output is;
"package-name","rating"
"sanity",3.6600000858306885
"burnitnow",3.5
"gish",0
The jq
expression is composed of a pair of headers coupled with an expression to extract the data rows. The rows are then “piped” into @csv
to produce the CSV. Abridged this expression has this form;
[ ...headers... ], ( ... ) | @csv
Broken up the jq
expression extracting the data rows is;
.items
- Extracts the array underitems
in the JSON data.sort_by(-(.derivedRating//0),.name)
- Sorts the rows of data by thederivedRating
value, or should there be none,0
. A second sorting is on the name..[]
- I think this means to then process each item in the array individually.select(.derivedRating)
- Include only those tuples with a value forderivedRating
.[.name, .derivedRating]
- Map each tuple to an array with thename
and thederivedRating
value.
The need to summarise and work with JSON data comes up quite frequently and jq
is a useful tool to be able to work with raw data in this format.