foreign_keys

Masking Foreign Keys

TLDR; Masking a foreign key is hard. There is no silver buller but this chapter describes multiple strategies you can apply depending on your context.

First of all : Avoid Natural Keys at all costs

This is the best advice you will find here.

A primary or a foreign key should not have any meaning outside of the database itself. Therefore you should not have to anonymize it.

If you really need to anonymize a primary/foreign key in a table, this means that it is a natural key (as opposed to a surrogate key).

Natural keys are problematic for many reasons:

  • they can change over time (like email addresses or product codes), forcing cascading updates throughout related tables
  • they’re often not truly unique in practice, even seemingly unique values like SSNs can have duplicates or exceptions
  • they tend to be longer and more complex than simple integers
  • they make joins slower and indexes larger
  • they can contain sensitive information that you might not want exposed in URLs or logs.
  • they may change whenever business rules evolve, requiring database restructuring.

Surrogate keys (i.e. auto-incrementing integers) avoid these issues by providing stable, meaningless identifiers that never need to change.

If you have implemented surrogate keys, you don’t need to read this chapter any further ;-)

Example

Let’s consider this (dumb) example

CREATE TABLE author (
  id SERIAL UNIQUE,
  name TEXT
);

CREATE TABLE book (
  id SERIAL UNIQUE,
  name TEXT,
  fk_author_id INTEGER,
  FOREIGN KEY (fk_author_id) REFERENCES author(id)
);

ON UPDATE CASCADE

For static masking, the best solution is add the ON UPDATE CASCADE action to the foreign key :

ALTER TABLE book DROP CONSTRAINT book_fk_author_name_fkey;

ALTER TABLE book
ADD CONSTRAINT book_fk_author_name_fkey
    FOREIGN KEY (fk_author_name)
    REFERENCES author(name)
    ON UPDATE CASCADE
;

And then define a masking rule only for the referenced column

SECURITY LABEL FOR anon ON COLUMN author.name
  IS 'MASKED WITH FUNCTION anon.dummy_name()';

Now whenever the author table will be anonymized, the book.fk_author_name column will be altered automatically. Therefore you should not declare a masking rule on book.fk_author_name !

Pseudonymization

However the ON UPDATE CASCADE action is only triggered when the real data is actually altered, so it will not work for dynamic masking, backup masking and replica masking.

In these situations, the best approach is to use a pseudonymization function

First of all declared that the foerign key is deferrable:

ALTER TABLE book DROP CONSTRAINT book_fk_author_name_fkey;

ALTER TABLE book
ADD CONSTRAINT book_fk_author_name_fkey
    FOREIGN KEY (fk_author_name)
    REFERENCES author(name)
    DEFERRABLE
;

Then declare a similar masking rule for each column:

SECURITY LABEL FOR anon ON COLUMN author.name
  IS 'MASKED WITH FUNCTION pg_catalog.md5(name)';

SECURITY LABEL FOR anon ON COLUMN book.fk_author_name
  IS 'MASKED WITH FUNCTION pg_catalog.md5(fk_author_name)';

However keep in mind that that Pseudonymization Is Not Anonymization !