Locale and Encoding Settings

By default, CPK clusters are created with the locale en_US and UTF-8 encoding. This is set when the database is initialized and cannot be changed. However, it is possible to create a new database with different locale and encoding settings, provided they are available in the container.

Our containers are built with ICU support which uses the external ICU library. This offers many different locale and language options and should meet most of your needs. CPK also offers LIBC support, but because LIBC uses the locales provided by the operating system, the locales available in the database container may differ from the ones on your operating system. You can use the following command to check the available locales in the database container, replacing $POD-NAME with the pod name in your environment:

kubectl -n postgres-operator exec -c database $POD-NAME -- locale -a

Hint

For a full list of locale options available in the database, use the query SELECT * FROM pg_collation, or the command \dOS+ in psql.

Configuration Methods

There are two methods you can use to create a new database in your CPK cluster: the SQL command CREATE DATABASE or the createdb utility. These methods are effectively the same, except that the createdb utility will call psql for us, which some users find more convenient. If you have ever created a new database in PostgreSQL before, then you are already familiar with at least one of these methods.

For both methods, you'll first need to identify the primary pod so you can execute your commands against the database. To make things easier, you can store this information in an environment variable. For example, using a cluster named hippo:

Bash:

PRIMARY_POD=$(kubectl -n postgres-operator get pods --selector='postgres-operator.crunchydata.com/cluster=hippo,postgres-operator.crunchydata.com/role=master' -o jsonpath='{.items[*].metadata.labels.statefulset\.kubernetes\.io/pod-name}')

Powershell:

$env:PRIMARY_POD=(kubectl -n postgres-operator get pods --selector='postgres-operator.crunchydata.com/cluster=hippo,postgres-operator.crunchydata.com/role=master' -o jsonpath='{.items[*].metadata.labels.statefulset\.kubernetes\.io/pod-name}')

Now, you can inspect the environment variable to see which Pod is the current primary:

Bash:

echo $PRIMARY_POD

Powershell:

echo $env:PRIMARY_POD

This should yield something similar to:

hippo-instance1-hltn-0

Now that your environment variable is set, let's create some databases!

Method #1: CREATE DATABASE

The first method you can use is the CREATE DATABASE command. For this example, you'll create a database named rhino using the ICU locale "Japanese" (ja) with UTF8 encoding.

Info

For ICU locales, it is recommended to use Unicode encodings like UTF-8 whenever possible.

First, exec into the primary pod using your environment variable and connect to the database via psql:

Bash:

kubectl -n postgres-operator exec -it "$PRIMARY_POD" -- psql

Powershell:

kubectl -n postgres-operator exec -it "$env:PRIMARY_POD" -- psql

Next, run the CREATE DATABASE command to create the database rhino with your desired settings:

postgres=# CREATE DATABASE rhino LOCALE_PROVIDER 'icu' ICU_LOCALE 'ja' ENCODING 'UTF8' TEMPLATE 'template0' ;
CREATE DATABASE

Info

Notice that you are using template0 to create the database instead of template1. This is because copying from template0 allows you to choose different locale and encoding settings, whereas copying from template1 will use the same parameters that were set when the database was initialized.

Once the database has been created, make sure the locale and encoding settings are correct. This information is stored in the system catalog pg_database which can be queried using the \l command:

postgres=# \l

                                                       List of databases
   Name    |  Owner   | Encoding | Locale Provider |   Collate   |    Ctype    | ICU Locale | ICU Rules |   Access privileges
-----------+----------+----------+-----------------+-------------+-------------+------------+-----------+-----------------------
 hippo     | postgres | UTF8     | libc            | en_US.utf-8 | en_US.utf-8 |            |           | =Tc/postgres         +
           |          |          |                 |             |             |            |           | postgres=CTc/postgres+
           |          |          |                 |             |             |            |           | hippo=CTc/postgres
 postgres  | postgres | UTF8     | libc            | en_US.utf-8 | en_US.utf-8 |            |           |
 rhino     | postgres | UTF8     | icu             | en_US.utf-8 | en_US.utf-8 | ja         |           |
 template0 | postgres | UTF8     | libc            | en_US.utf-8 | en_US.utf-8 |            |           | =c/postgres          +
           |          |          |                 |             |             |            |           | postgres=CTc/postgres
 template1 | postgres | UTF8     | libc            | en_US.utf-8 | en_US.utf-8 |            |           | =c/postgres          +
           |          |          |                 |             |             |            |           | postgres=CTc/postgres
(5 rows)

Success! From this list, you can see that database rhino was created with the ICU locale ja and UTF8 encoding. You can also see your other databases, postgres and hippo, which were created with the default settings when the cluster was initialized.

Method #2: createdb

With the second method, you'll use the createdb utility to create a new database elephant using a locale provided by LIBC. For this database, you'll set your locale to British English (en_GB) and change the encoding to LATIN1.

Using your environment variable from before, run the createdb command in the primary pod, setting the --locale and --encoding flags to reflect your choices. Remember, since you're changing the locale and encoding settings, you will use template0 instead of template1 to create the database:

Bash:

kubectl -n postgres-operator exec -it "$PRIMARY_POD" -- createdb -T template0 --locale 'en_GB' --encoding 'LATIN1' elephant

Powershell:

kubectl -n postgres-operator exec -it "$env:PRIMARY_POD" -- createdb -T template0 --locale 'en_GB' --encoding 'LATIN1' elephant

Now, check the system catalog pg_database to make sure the database was created with the correct settings:

Bash:

kubectl -n postgres-operator exec -it "$PRIMARY_POD" -- psql -c '\l'

Powershell:

kubectl -n postgres-operator exec -it "$env:PRIMARY_POD" -- psql -c '\l'

This will yield something similar to:

Defaulted container "database" out of: database, replication-cert-copy, pgbackrest, pgbackrest-config, postgres-startup (init), nss-wrapper-init (init)

                                                       List of databases
   Name    |  Owner   | Encoding | Locale Provider |   Collate   |    Ctype    | ICU Locale | ICU Rules |   Access privileges
-----------+----------+----------+-----------------+-------------+-------------+------------+-----------+-----------------------
 elephant  | postgres | LATIN1   | libc            | en_GB       | en_GB       |            |           |
 hippo     | postgres | UTF8     | libc            | en_US.utf-8 | en_US.utf-8 |            |           | =Tc/postgres         +
           |          |          |                 |             |             |            |           | postgres=CTc/postgres+
           |          |          |                 |             |             |            |           | hippo=CTc/postgres
 postgres  | postgres | UTF8     | libc            | en_US.utf-8 | en_US.utf-8 |            |           |
 rhino     | postgres | UTF8     | icu             | en_US.utf-8 | en_US.utf-8 | ja         |           |
 template0 | postgres | UTF8     | libc            | en_US.utf-8 | en_US.utf-8 |            |           | =c/postgres          +
           |          |          |                 |             |             |            |           | postgres=CTc/postgres
 template1 | postgres | UTF8     | libc            | en_US.utf-8 | en_US.utf-8 |            |           | =c/postgres          +
           |          |          |                 |             |             |            |           | postgres=CTc/postgres
(6 rows)

Here, you can see that database elephant was created with the locale en_GB and LATIN1 encoding.

Troubleshooting

If the locale and encoding settings you have chosen do not match, you will see an error message like the following:

$ kubectl -n postgres-operator exec -it "$PRIMARY_POD" -- createdb -T template0 --locale 'en_HK' lion

Defaulted container "database" out of: database, replication-cert-copy, pgbackrest, pgbackrest-config, postgres-startup (init), nss-wrapper-init (init)
createdb: error: database creation failed: ERROR:  encoding "UTF8" does not match locale "en_HK"
DETAIL:  The chosen LC_CTYPE setting requires encoding "LATIN1".
command terminated with exit code 1

Based on the error message, you can see that the locale en_HK requires LATIN1 encoding instead of the default UTF8 encoding. To resolve this error, add the appropriate encoding option to your command:

Bash:

kubectl -n postgres-operator exec -it "$PRIMARY_POD" -- createdb -T template0 --locale 'en_HK' --encoding 'LATIN1' lion

Powershell:

kubectl -n postgres-operator exec -it "$env:PRIMARY_POD" -- createdb -T template0 --locale 'en_HK' --encoding 'LATIN1' lion

This time, you do not see an error. Check the system catalog pg_database and make sure your database lion was created with the correct locale and encoding settings:

postgres=# \l

                                                       List of databases
   Name    |  Owner   | Encoding | Locale Provider |   Collate   |    Ctype    | ICU Locale | ICU Rules |   Access privileges
-----------+----------+----------+-----------------+-------------+-------------+------------+-----------+-----------------------
 elephant  | postgres | LATIN1   | libc            | en_GB       | en_GB       |            |           |
 hippo     | postgres | UTF8     | libc            | en_US.utf-8 | en_US.utf-8 |            |           | =Tc/postgres         +
           |          |          |                 |             |             |            |           | postgres=CTc/postgres+
           |          |          |                 |             |             |            |           | hippo=CTc/postgres
 lion      | postgres | LATIN1   | libc            | en_HK       | en_HK       |            |           |
 postgres  | postgres | UTF8     | libc            | en_US.utf-8 | en_US.utf-8 |            |           |
 rhino     | postgres | UTF8     | icu             | en_US.utf-8 | en_US.utf-8 | ja         |           |
 template0 | postgres | UTF8     | libc            | en_US.utf-8 | en_US.utf-8 |            |           | =c/postgres          +
           |          |          |                 |             |             |            |           | postgres=CTc/postgres
 template1 | postgres | UTF8     | libc            | en_US.utf-8 | en_US.utf-8 |            |           | =c/postgres          +
           |          |          |                 |             |             |            |           | postgres=CTc/postgres
(7 rows)

Success! The database lion was created with your desired settings.

Considerations

Setting --locale is equivalent to specifying --lc-collate, --lc-ctype, and --icu-locale to the same value. Some locales are only valid for ICU and must be set with --icu-locale . This table in the Postgres documentation shows which character sets are only valid for ICU and must be set with --icu-locale.

The other locale settings lc_messages, lc_monetary, lc_numeric, and lc_time are not fixed per database and are not set by this command. If you want to make them the default for a specific database, you can use ALTER DATABASE ... SET .