Processing JSON Data to CSV with JQ
The tool jq is a popular and flexible tool for working with JSON data. Haiku Depot Server is able to provide a large data payload containing all the package data. We’ll set an environment variable with the URL to this package data;
export HDS_URL="https://depot.haiku-os.org/__pkg/all-haikuports_x86_64-en.json.gz"
To look at the data, one can use curl together with jq;
curl -L "${HDS_URL}" | gzip -d | jq . | less
The data is more complex and voluminous, but the interesting parts of the data for this article are shown in the structure below;
{
"items": [
{
"name": "burnitnow",
"derivedRating": 3.5
},
{
"name": "sanity",
"derivedRating": 3.6600000858306885
},
{
"name": "gish",
"derivedRating": 0
},
{
"name": "zziplib"
}
]
}
The challenge is to use jq to produce a CSV output that contains all packages that have a value for derivedRating ordered by the highest rating first and then the name of the package. Here is how this is possible.
curl -L "${HDS_URL}" \
| gzip -d \
| jq -r '["package-name", "rating"], (.items|sort_by(-(.derivedRating//0),.name)|.[]|select(.derivedRating)|[.name, .derivedRating]) | @csv'
Using the test data above the output is;
"package-name","rating"
"sanity",3.6600000858306885
"burnitnow",3.5
"gish",0
The jq expression is composed of a pair of headers coupled with an expression to extract the data rows. The rows are then “piped” into @csv to produce the CSV. Abridged this expression has this form;
[ ...headers... ], ( ... ) | @csv
Broken up the jq expression extracting the data rows is;
.items- Extracts the array underitemsin the JSON data.sort_by(-(.derivedRating//0),.name)- Sorts the rows of data by thederivedRatingvalue, or should there be none,0. A second sorting is on the name..[]- I think this means to then process each item in the array individually.select(.derivedRating)- Include only those tuples with a value forderivedRating.[.name, .derivedRating]- Map each tuple to an array with thenameand thederivedRatingvalue.
The need to summarise and work with JSON data comes up quite frequently and jq is a useful tool to be able to work with raw data in this format.