how-to/0-masking_data_with_postgresql_anonymizer


title: “Data Masking with PostgreSQL Anonymizer” subtitle: “A practical guide” author: “DALIBO”

date: “Feb. 2023”

Welcome to Paul’s Boutique !

This is a 4 hours workshop that demonstrates various anonymization techniques using the PostgreSQL Anonymizer extension.

The Story

Paul's boutique

Paul’s boutique has a lot of customers. Paul asks his friend Pierre, a Data Scientist, to make some statistics about his clients : average age, etc…

Pierre wants a direct access to the database in order to write SQL queries.


Jack is an employee of Paul. He’s in charge of relationship with the various suppliers of the shop.

Paul respects his suppliers privacy. He needs to hide the personnal information to Pierre, but Jack needs read and write access the real data.

Objectives

Using the simple example above, we will learn:

  • How to write masking rules
  • The difference between static and dynamic masking
  • Implementing advanced masking techniques

About PostgreSQL Anonymizer


postgresql_anonymizer is an extension to mask or replace personally identifiable information (PII) or commercially sensitive data from a PostgreSQL database.


The project has a declarative approach of anonymization. This means you can declare the masking rules using the PostgreSQL Data Definition Language (DDL) and specify your anonymization strategy inside the table definition itself.


Once the maskings rules are defined, you can access the anonymized data in 4 different ways:

About GDPR

This presentation does not go into the details of the GPDR act and the general concepts of anonymization.

For more information about it, please refer to the talk below:

Requirements

In order to make this workshop, you will need:

  • A Linux VM ( preferably Debian 11 bullseye or Ubuntu 22.04)
  • A PostgreSQL instance ( preferably PostgreSQL 14 )
  • The PostgreSQL Anonymizer (anon) extension, installed and initialized by a superuser
  • A database named “boutique” owned by a superuser called “paul”
  • A role “pierre” and a role “jack”, both allowed to connect to the database “boutique”

::: tip

and download the image below:
and download the image below:
and download the image below:
image below:

:::

docker pull registry.gitlab.com/dalibo/postgresql_anonymizer:stable

::: tip

to learn how to install the extension in your PostgreSQL instance.
to learn how to install the extension in your PostgreSQL instance.
to learn how to install the extension in your PostgreSQL instance.
extension in your PostgreSQL instance.

:::


The Roles

We will with 3 differents users:

CREATE ROLE paul LOGIN SUPERUSER PASSWORD 'CHANGEME';
CREATE ROLE pierre LOGIN PASSWORD 'CHANGEME';
CREATE ROLE jack LOGIN PASSWORD 'CHANGEME';

::: tip

paul.
paul.
paul.
paul.
paul.
paul.
paul.
paul.
ul`.

:::


::: Tip Setup a .pgpass file to simplify the connections ! :::

cat > ~/.pgpass << EOL
*:*:boutique:paul:CHANGEME
*:*:boutique:pierre:CHANGEME
*:*:boutique:jack:CHANGEME
EOL
chmod 0600 ~/.pgpass

The Sample database

We will work on a database called “boutique”:

CREATE DATABASE boutique OWNER paul;

We need to activate the anon library inside that database:

ALTER DATABASE boutique
  SET session_preload_libraries = 'anon';

Authors

This workshop is a collective work from Damien Clochard, Be Hai Tran, Florent Jardin, Frédéric Yhuel.

License

This document is distributed under the PostgreSQL license.

The source is available at

https://gitlab.com/dalibo/postgresql_anonymizer/-/tree/master/docs/how-to

Credits

  • Cover photo by Alex Conchillos from Pexels (CC Zero)
  • “Paul’s Boutique” is the second studio album by American hip hop group Beastie Boys, released on July 25, 1989 by Capitol Records