custom_fake_data

Custom Fake Data

This extension is delivered with a small set of fake data by default. For each fake function ( fake_email(), fake_first_name()) we provide only 1000 unique values, and they are only in English.

Here’s how you can create your own set of fake data!

Alternative fake data packages

The projet is offering alternative fake datasets (currently only French). You can download the zip file containing the dataset and load it into the extension like this:

  1. Go to https://gitlab.com/dalibo/postgresql_anonymizer/-/packages

  2. Click on “data”

  3. Choose your prefered zip file and download it on your server

  4. Unzip the file into a folder (for example /path/to/custom_csv_files/)

  5. Run SELECT anon.init('/path/to/custom_csv_files/')

Generate your own fake dataset

As an example, here’s a python script that will generate fake data for you:

https://gitlab.com/dalibo/postgresql_anonymizer/-/blob/master/python/populate.py

To produce 5000 emails in French & German, you’d call the scripts like this:

populate.py --table email --locales fr,de --lines 5000

This will output the fake data in CSV format.

Use populate.py --help for more details about the script parameters.

You can load the fake data directly into the extension like this:

TRUNCATE anon.email;

COPY anon.email
FROM
PROGRAM 'populate.py --table email --locales fr,de --lines 5000';

SELECT setval('anon.email_oid_seq', max(oid))
FROM anon.email;

CLUSTER anon.email;

IMPORTANT : This script is provided as an example, it is not officially supported.

Load your own fake data

If you want to use your own dataset, you can import custom CSV files with :

SELECT anon.init('/path/to/custom_csv_files/')

Look at the data folder to find the format of the CSV files.

Using the PostgreSQL Faker extension

If you need more specialized fake data sets, please read the Advanced Faking section.