privacy_by_default
Privacy By Default
Disclaimer
This feature is considered in beta and not ready for production until version 2.O is published.
Use with care.
Principle
The GDPR regulation (and other privacy laws) introduces the concept of data protection by default. In a nutshell, it means that by default, organisations should ensure that data is processed with the highest privacy protection so that by default personal data isn’t made accessible to an indefinite number of persons.
By applying this principle to anonymization, we end up with the idea of privacy by default which basically means that all columns of all tables should be masked by default, without having to declare a masking rule for each of them.
To enable this feature, simply set the option anon.privacy_by_default
to on
.
Example
Imagine a database named foo
with a basic table containing HTTP logs:
# SELECT * FROM access_logs LIMIT 1;
date_open | ip_addr | url | browser_agent
---------------------+-----------------+------------+------------------------------
2009-01-08 00:00:00 | 192.168.100.128 | /home.html | Mozilla/5.0 (Windows; en_US)
(1 row)
Now let’s activate privacy by default:
ALTER DATABASE foo SET anon.privacy_by_default = True;
The setting will be applied for the next sessions and we can now anonymize the table without writing any masking rule.
# SELECT anon.anonymize_database();
anonymize_database
--------------------
t
# SELECT * FROM access_logs LIMIT 1;
date_open | ip_addr | url | browser_agent
-----------+---------+-----+---------------
| | | unkown
Unmasking columns
As we can see, when the anon.privacy_by_default
is defined all the values will
be replaced by the column’s default value or NULL. The entire dataset is
destroyed.
Now instead of writing rules to mask the sensible columns, we will write rules to unmask the ones we want to allow.
For instance, let’s say that we want to keep the authentic value of the url
field, we can simply “unmask” the column like this:
SECURITY LABEL FOR anon ON COLUMN access_logs.url
IS 'NOT MASKED';
This can also be achieved by a masking rule that will replace the value with itself:
SECURITY LABEL FOR anon ON COLUMN access_logs.url
IS 'MASKED WITH VALUE url';
Now we’d like to unmask the date_open
field in the anonymized dataset but
we need to generalize the dates to keep only the year:
SECURITY LABEL FOR anon ON COLUMN access_logs.date_open
IS 'MASKED WITH FUNCTION make_date(EXTRACT(year FROM date_open)::INT,1,1)';
Caveat: Add a DEFAULT to the NOT NULL columns
It is a bit ironic that the anon.privacy_by_default
parameter is not
enabled by default. This reason is simple: activating this option may or may
not lead to contraint violations depending on the columns constraints placed
in the database model.
Let’s say we want to add a NOT NULL
constraint on the date_open
column:
ALTER TABLE public.access_logs
ALTER COLUMN date_open
SET NOT NULL;
Now if we try to anonymize the table, we get the following violation:
SELECT anon.anonymize_table('public.access_logs') as test4;
ERROR: Cannot mask a "NOT NULL" column with a NULL value
HINT: If privacy_by_design is enabled, add a default value to the column
The solution here is simply to define a default value and this value will be
used for the privacy_by_default
mechanism.
ALTER TABLE public.access_logs
ALTER COLUMN date_open
SET DEFAULT now();
Other constraints (foreign keys, UNIQUE, CHECK, etc.) should work fine without a DEFAULT value.