Locale and Encoding Settings
By default, CPK clusters are created with the locale en_US
and UTF-8
encoding. This is set when the database is initialized and cannot be changed. However, it is possible to create a new database with different locale and encoding settings, provided they are available in the container.
Our containers are built with ICU support which uses the external ICU library. This offers many different locale and language options and should meet most of your needs. CPK also offers LIBC support, but because LIBC uses the locales provided by the operating system, the locales available in the database container may differ from the ones on your operating system. You can use the following command to check the available locales in the database container, replacing $POD-NAME
with the pod name in your environment:
kubectl -n postgres-operator exec -c database $POD-NAME -- locale -a
Hint
For a full list of locale options available in the database, use the query SELECT * FROM pg_collation
, or the command \dOS+
in psql.
Configuration Methods
There are two methods you can use to create a new database in your CPK cluster: the SQL command CREATE DATABASE
or the createdb
utility. These methods are effectively the same, except that the createdb
utility will call psql
for us, which some users find more convenient. If you have ever created a new database in PostgreSQL before, then you are already familiar with at least one of these methods.
For both methods, you'll first need to identify the primary pod so you can execute your commands against the database. To make things easier, you can store this information in an environment variable. For example, using a cluster named hippo
:
PRIMARY_POD=$(kubectl -n postgres-operator get pods \
--selector='postgres-operator.crunchydata.com/cluster=hippo,postgres-operator.crunchydata.com/role=master' \
-o jsonpath='{.items[*].metadata.labels.statefulset\.kubernetes\.io/pod-name}')
Now, you can inspect the environment variable to see which Pod is the current primary:
echo $PRIMARY_POD
This should yield something similar to:
hippo-instance1-hltn-0
Now that your environment variable is set, let's create some databases!
Method #1: CREATE DATABASE
The first method you can use is the CREATE DATABASE
command. For this example, you'll create a database named rhino
using the ICU locale "Japanese" (ja
) with UTF8
encoding.
Info
For ICU locales, it is recommended to use Unicode encodings like UTF-8 whenever possible.
First, exec into the primary pod using your environment variable and connect to the database via psql
:
kubectl -n postgres-operator exec -it "$PRIMARY_POD" -- psql
Next, run the CREATE DATABASE
command to create the database rhino
with your desired settings:
postgres=# CREATE DATABASE rhino LOCALE_PROVIDER 'icu' ICU_LOCALE 'ja' ENCODING 'UTF8' TEMPLATE 'template0' ;
CREATE DATABASE
Info
Notice that you are using template0
to create the database instead of template1
. This is because copying from template0
allows you to choose different locale and encoding settings, whereas copying from template1
will use the same parameters that were set when the database was initialized.
Once the database has been created, make sure the locale and encoding settings are correct. This information is stored in the system catalog pg_database
which can be queried using the \l
command:
postgres=# \l
List of databases
Name | Owner | Encoding | Locale Provider | Collate | Ctype | ICU Locale | ICU Rules | Access privileges
-----------+----------+----------+-----------------+-------------+-------------+------------+-----------+-----------------------
hippo | postgres | UTF8 | libc | en_US.utf-8 | en_US.utf-8 | | | =Tc/postgres +
| | | | | | | | postgres=CTc/postgres+
| | | | | | | | hippo=CTc/postgres
postgres | postgres | UTF8 | libc | en_US.utf-8 | en_US.utf-8 | | |
rhino | postgres | UTF8 | icu | en_US.utf-8 | en_US.utf-8 | ja | |
template0 | postgres | UTF8 | libc | en_US.utf-8 | en_US.utf-8 | | | =c/postgres +
| | | | | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | libc | en_US.utf-8 | en_US.utf-8 | | | =c/postgres +
| | | | | | | | postgres=CTc/postgres
(5 rows)
Success! From this list, you can see that database rhino
was created with the ICU locale ja
and UTF8
encoding. You can also see your other databases, postgres
and hippo
, which were created with the default settings when the cluster was initialized.
Method #2: createdb
With the second method, you'll use the createdb
utility to create a new database elephant
using a locale provided by LIBC. For this database, you'll set your locale to British English (en_GB
) and change the encoding to LATIN1
.
Using your environment variable from before, run the createdb
command in the primary pod, setting the --locale
and --encoding
flags to reflect your choices. Remember, since you're changing the locale and encoding settings, you will use template0
instead of template1
to create the database:
kubectl -n postgres-operator exec -it "$PRIMARY_POD" -- createdb -T template0 \
--locale 'en_GB' --encoding 'LATIN1' elephant
Now, check the system catalog pg_database
to make sure the database was created with the correct settings:
$ kubectl -n postgres-operator exec -it "$PRIMARY_POD" -- psql -c '\l'
Defaulted container "database" out of: database, replication-cert-copy, pgbackrest, pgbackrest-config, postgres-startup (init), nss-wrapper-init (init)
List of databases
Name | Owner | Encoding | Locale Provider | Collate | Ctype | ICU Locale | ICU Rules | Access privileges
-----------+----------+----------+-----------------+-------------+-------------+------------+-----------+-----------------------
elephant | postgres | LATIN1 | libc | en_GB | en_GB | | |
hippo | postgres | UTF8 | libc | en_US.utf-8 | en_US.utf-8 | | | =Tc/postgres +
| | | | | | | | postgres=CTc/postgres+
| | | | | | | | hippo=CTc/postgres
postgres | postgres | UTF8 | libc | en_US.utf-8 | en_US.utf-8 | | |
rhino | postgres | UTF8 | icu | en_US.utf-8 | en_US.utf-8 | ja | |
template0 | postgres | UTF8 | libc | en_US.utf-8 | en_US.utf-8 | | | =c/postgres +
| | | | | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | libc | en_US.utf-8 | en_US.utf-8 | | | =c/postgres +
| | | | | | | | postgres=CTc/postgres
(6 rows)
Here, you can see that database elephant
was created with the locale en_GB
and LATIN1
encoding.
Troubleshooting
If the locale and encoding settings you have chosen do not match, you will see an error message like the following:
$ kubectl -n postgres-operator exec -it "$PRIMARY_POD" -- createdb -T template0 --locale 'en_HK' lion
Defaulted container "database" out of: database, replication-cert-copy, pgbackrest, pgbackrest-config, postgres-startup (init), nss-wrapper-init (init)
createdb: error: database creation failed: ERROR: encoding "UTF8" does not match locale "en_HK"
DETAIL: The chosen LC_CTYPE setting requires encoding "LATIN1".
command terminated with exit code 1
Based on the error message, you can see that the locale en_HK
requires LATIN1
encoding instead of the default UTF8
encoding. To resolve this error, add the appropriate encoding option to your command:
kubectl -n postgres-operator exec -it "$PRIMARY_POD" -- createdb -T template0 \
--locale 'en_HK' --encoding 'latin1' lion
This time, you do not see an error. Check the system catalog pg_database
and make sure your database lion
was created with the correct locale and encoding settings:
postgres=# \l
List of databases
Name | Owner | Encoding | Locale Provider | Collate | Ctype | ICU Locale | ICU Rules | Access privileges
-----------+----------+----------+-----------------+-------------+-------------+------------+-----------+-----------------------
elephant | postgres | LATIN1 | libc | en_GB | en_GB | | |
hippo | postgres | UTF8 | libc | en_US.utf-8 | en_US.utf-8 | | | =Tc/postgres +
| | | | | | | | postgres=CTc/postgres+
| | | | | | | | hippo=CTc/postgres
lion | postgres | LATIN1 | libc | en_HK | en_HK | | |
postgres | postgres | UTF8 | libc | en_US.utf-8 | en_US.utf-8 | | |
rhino | postgres | UTF8 | icu | en_US.utf-8 | en_US.utf-8 | ja | |
template0 | postgres | UTF8 | libc | en_US.utf-8 | en_US.utf-8 | | | =c/postgres +
| | | | | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | libc | en_US.utf-8 | en_US.utf-8 | | | =c/postgres +
| | | | | | | | postgres=CTc/postgres
(7 rows)
Success! The database lion
was created with your desired settings.
Considerations
Setting --locale
is equivalent to specifying --lc-collate
, --lc-ctype
, and --icu-locale
to the same value. Some locales are only valid for ICU and must be set with --icu-locale
. This table in the Postgres documentation shows which character sets are only valid for ICU and must be set with --icu-locale
.
The other locale settings lc_messages
, lc_monetary
, lc_numeric
, and lc_time
are not fixed per database and are not set by this command. If you want to make them the default for a specific database, you can use ALTER DATABASE ... SET
.