About
Side Projects
Blog
2024-03-07

The Value in an Artifical Primary Key

The task of database design is not an uncommon one in the career of a software engineer. This process involves understanding and modelling the entities in a “domain” and the relationship between them. For example, a set of recipies collected into books might have ingredients, a difficulty level, an author and some steps. Here we can see the entities; book, author, recipie, ingredient and step. Understanding these entities, their attributes and the relationships between them can be a “make or break” task for a software engineering project.

An instance of a database table needs to be identified by an attribute called the “primary key”. The primary key could be an “artificial identifier” such as a unique number in a sequence or a string of random characters such as ba10a67d-e00b-4bfb-af78-8685102b1cd9 (typically a UUID) or could be some “natural identifier” such as a driver’s license number or a currency code.

Typically other tables then reference the primary key of a table using an attribute in their own table that is termed the “foreign key”.

In general, designing with an artificial identifier instead of a natural identifier is the best choice. Although there are always exceptions to the rule, I have always employed artificial keys in my designs.

In a recent situation with Haiku Depot Server (HDS) the advantages of an artificial identifier were made clear and, I’m capturing this here in case I need to refer back to it.

HDS keeps a list of languages in a Postgres table called natural_language. This has an artificial primary key of an bigint. Originally the table had a column code which was used to store an ISO-3166-1 language code such as en, zh or jp. More recently this list has had to be extended with two additional properties; the country code and script code. An example country code would be NZ and an example script code would be Latn.

The original code no longer makes sense and can no longer uniquely identify a language. It is the three components together that uniquely identify the logical language.

If the original design had used the code, the natural identifier, as the primary key then many other tables would have been related to this table using the code as well. However now the code is to be dropped out and the three new attributes will be added in its place (language_code, country_code and script_code), any relationships based on the code alone would no longer work.

Luckily, an artificial key was used and so the foreign keys in other tables will continue to work as was originally intended. The use of the artificial key has allowed the logical attributes of the table to evolve over time without breaking the overall database model.

It is very tempting for an inexperienced database designer to see a natural identifier for an entity and opt to employ it as the primary key because it seems convenient, less work to implement and easier to read. In general (and there will always be exceptions) this is best avoided because of falling into the trap outlined above of making some aspects of database evolution more difficult later.