2019-01-28

Jam and HaikuDepot Code Generation

HaikuDepot, a desktop application, downloads bulk data from the Haiku Depot Server (HDS) application-server as compressed JSON payloads. The schema for the payloads is defined by a JSON document in the HDS source code. A third-party Maven plugin generates server-side Java stubs from the JSON schema, and Python scripts generate C++ stubs for the HaikuDepot side. Those C++ stubs are then copied into the Haiku source tree.

Some time ago, it was suggested that it would be desirable if the C++ sources were not generated-and-copied into the Haiku source tree, but instead the Haiku build-chain actually generated the C++ code as part of the build. Making this happen would require learning a bit more about Jam, the tool that orchestrates the build-chain of the Haiku operating system. Because Jam is not so frequently used any more, there is not a great deal of searchable information about how to “get stuff done” with it.

Basics of Build Dependencies

Jam works a bit like Make. Jam is invoked with a target and that target is the product of a number of input file-objects. If any of the input file-objects are missing or are stale then Jam will try to generate them using a set of rules. Here is a simple scenario illustrated in a diagram;

[Diagram: Dependencies]

The prog executable is composed, via linking, from a number of object files that are themselves generated by compiling C-source files. Note that one of the C-source files is itself generated by running the lex tool on a lex input file. Jam uses files’ modified timestamps in order to determine whether an intermediate file is stale and needs to be re-generated. For example, if bronze.c is newer than bronze.o then the build system knows that bronze.o needs to be re-generated through compilation.

A non-generating dependency also exists with header-files. The bronze.h file does not generate bronze.c, but bronze.c depends on bronze.h, and any change in the .h file should cause the .c files to be re-compiled.
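Expressed as a Jamfile, the scenario above might look roughly like this. This is only a sketch: the file names other than bronze.* are invented, and real Jam object targets carry “grist” (a namespace prefix) which is omitted here for clarity.

```jam
# Build "prog" from three C sources; Jambase's Main rule sets up the
# compile and link dependencies automatically.
Main prog : gold.c silver.c bronze.c ;

# Jam's header scanning (HDRSCAN / HDRRULE) normally discovers header
# dependencies itself, but a dependency can also be stated explicitly;
# touching bronze.h then makes bronze.o stale.
Depends bronze.o : bronze.h ;
```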

Requirements

Solution Files

Transcribing the JSON schema into C++ header and implementation files needs to happen for more than one schema (Packages, Repositories, …), and the build generates parsers as well, but to keep this blog-entry from getting out of hand, I only consider the generation of the model stubs for the Repository bulk download. Here is a diagram showing the interplay between the build files;

[Diagram: Dependencies]

The actual generated sources consist of a number of .h and .cpp files that are produced by a single execution of the Python script. In order to represent this bunch of files as a unit of dependency, the dummy.dat file is introduced. The dummy.dat is written to the build directory, and its absence, or a modified timestamp older than that of the Python script or JSON schema file, triggers the Python script to execute, generating the sources afresh and updating the dummy file. Once the generated sources are written, the Jam build system already knows that it needs to turn those into .o files, and any of the other sources that depend on the generated .h files will also be re-compiled.

Jam supports the notion of a NOTFILE for the purposes of creating an “object” in the build process. A NOTFILE is a build target or intermediate target that is not actually a file and just exists to connect other objects in the build chain. In this case it turns out to be better having the dummy object as an actual file because its absence as well as its last modified timestamp come into play in triggering re-generation of dependent file objects.
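For comparison, a NOTFILE pseudo-target could be declared like this (a sketch; the target and file names are invented, and the built-in rule names follow the classic Jam documentation):

```jam
# "generated" is only a label, not a file on disk; it exists purely to
# tie other targets together in the dependency graph.
NOTFILE generated ;
Depends generated : DumpExportRepository.h DumpExportRepository.cpp ;
```

Because a NOTFILE has no timestamp of its own, Jam cannot compare modification dates against it, which is precisely why an actual dummy.dat file works better in this situation.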

Solution Rules and Actions

To facilitate build-orchestration, Jam provides Rules and Actions. A Rule establishes the relation between the inputs and outputs, and the Actions provide the commands that should be executed as part of applying a Rule. Jam ships with a number of built-in Rules and Actions, and it is also possible to run operating-system command-line tools from within custom Actions.
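A minimal illustration of the Rule/Action pairing (all names invented): when a rule and an actions block share a name, invoking the rule also queues the associated shell commands for the target.

```jam
rule CopyFile
{
    # $(1) = destination target, $(2) = source file
    Depends $(1) : $(2) ;
}

actions CopyFile
{
    cp "$(2)" "$(1)"
}

# out.txt will be (re-)created whenever in.txt is newer than it.
CopyFile out.txt : in.txt ;
Depends all : out.txt ;
```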

The changes to the Jam build system are carried in the following resources;

Described in natural language, the following happens;

  1. Rule HdsSchemaGenModel links the dummy file, the Python script and the input schema.
    1. The dummy file depends on the Python script and the input schema.
    2. Locate the Python script and input schema so the build system can find them.
    3. Using Action HdsSchemaGenModel1, run the Python script to generate the output C++ files.
    4. Using Action HdsSchemaGenTouch, update the dummy file’s last modified timestamp.
  2. Rule HdsSchemaGenAppSrcDependsOnGeneration links the generated source files to the dummy file.
    1. The generated C++ source files depend on the dummy file.
    2. Locate the generated C++ source files alongside the dummy file so the build system can find them.
  3. Make all of the other source files for the application depend on all of the generated C++ header files. This way, if any of the generated header files changes, the dependent application-logic classes will be re-compiled.
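Sketched as Jam code, the rules described above might be structured roughly as follows. This is a simplified sketch, not the actual Haiku sources; in particular, the Python script’s command-line interface is assumed.

```jam
rule HdsSchemaGenModel
{
    # $(1) = dummy file, $(2) = JSON schema, $(3) = Python script
    SEARCH on $(2) $(3) = $(SEARCH_SOURCE) ;
    MakeLocate $(1) : $(LOCATE_TARGET) ;
    Depends $(1) : $(2) $(3) ;
    HdsSchemaGenModel1 $(1) : $(2) $(3) $(1:D) ;
    HdsSchemaGenTouch $(1) ;
}

actions HdsSchemaGenModel1
{
    # assumed interface: script, schema file, output directory
    python $(2[2]) $(2[1]) $(2[3])
}

actions HdsSchemaGenTouch
{
    touch $(1)
}

rule HdsSchemaGenAppSrcDependsOnGeneration
{
    # $(1) = generated C++ sources, $(2) = dummy file
    MakeLocate $(1) : $(2:D) ;
    Depends $(1) : $(2) ;
}
```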

Curious Things to Know About Jam

Rule arguments and Action arguments seem to work a bit differently from each other. Arguments to a Rule are indexed $(1), $(2) and so on, each referring to the corresponding argument list passed at the invocation site. Consider the following invocation of a Rule;

HdsSchemaGenModel $(dumpExportRepositoryModelTargetFile) : dumpexportrepository.json : jsonschema2cppmodel.py ;

Here the following arguments would be assigned;

Argument   Value
$(1)       $(dumpExportRepositoryModelTargetFile)
$(2)       dumpexportrepository.json
$(3)       jsonschema2cppmodel.py
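The indexing can be demonstrated with a throw-away rule (hypothetical; ECHO is a Jam built-in that prints its arguments):

```jam
rule ShowArguments
{
    # within a rule, $(1) and $(2) are the first and second argument lists
    ECHO "first:" $(1) ;
    ECHO "second:" $(2) ;
}

# here $(1) = alpha and $(2) = beta gamma
ShowArguments alpha : beta gamma ;
```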

Actions, however, have targets and sources that are separated by a colon, and individual values are then indexed within either the targets or the sources. Consider the following Action invocation;

HdsSchemaGenModel1 $(1) : $(2) $(3) $(1:D) ;

Suppose this Action is invoked with the Rule’s variables expanded as follows;

HdsSchemaGenModel1 /tmp/dummy.dat : dumpexportrepository.json jsonschema2cppmodel.py /tmp ;

The targets are on the left and are referenced as $(1[<index>]); the sources are on the right and are referenced as $(2[<index>]). In this example, the following arguments could be referenced;

Argument   Value
$(1[1])    /tmp/dummy.dat
$(2[1])    dumpexportrepository.json
$(2[2])    jsonschema2cppmodel.py
$(2[3])    /tmp

You may have noticed the use of the :D modifier on the argument above. This modifier yields the directory portion of the value. There are many other modifiers that can swap file-extensions and perform further transformations of the value.
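A few of the commonly used modifiers, shown on a hypothetical value (:D, :B and :S are documented in the Jam manual):

```jam
local f = generated/DumpExportRepository.cpp ;

ECHO $(f:D) ;   # directory portion: generated
ECHO $(f:B) ;   # base name: DumpExportRepository
ECHO $(f:S) ;   # suffix: .cpp
```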

When you run the Jam tool, there are a number of -d switch-variants that turn on various levels of debug trace. These are very helpful for understanding what is happening at build time.

Summary

This arrangement seems to work well. Hopefully this blog-entry will help out anybody else wanting to orchestrate some code-generation logic with the Jam tooling.