datatypes

Supporting additional data types

Communication with the database mostly happens in a text format. When you include an integer value in a query, you use to_string to convert it to that text format. When you get a query result field “as a float,” it converts from the text format to a floating-point type. These conversions are everywhere in libpqxx.

The conversion sydstem supports many built-in types, but it is also extensible. You can “teach” libpqxx (in the scope of your own application) to convert additional types of values to and from PostgreSQL’s string format.

This is massively useful, but it’s not for the faint of heart. You’ll need to specialise some templates. And, the API for doing this can change with any major libpqxx release.

Converting types

In your application, a conversion is driven entirely by a C++ type you specify. The value’s SQL type has nothing to do with it, nor is there anything in the string that would identify its type.

So, if you’ve SELECTed a 64-bit integer from the database, and you try to convert it to a C++ “short,” one of two things will happen: either the number is small enough to fit in your short (and it just works), or else it throws a conversion exception.

Or, your database table might have a text column, but a given field may contain a string that looks just like a number. You can convert that value to an integer type just fine. Or to a floating-point type. All that matters to the conversion is the actual value, and the type.

In some cases the templates for these conversions can tell the type from the arguments you pass them:

    auto x = to_string(99);

In other cases you may need to instantiate template explicitly:

    auto y = from_string<int>("99");

Supporting a new type

Let’s say you have some other SQL type which you want to be able to store in, or retrieve from, the database. What would it take to support that?

Sometimes you do not need complete support. You might need a conversion to a string but not from a string, for example. The conversion is defined at compile time, so don’t be too afraid to be incomplete. If you leave out one of these steps, it’s not going to crash at run time or mess up your data. The worst that can happen is that your code won’t build.

So what do you need for a complete conversion?

First off, of course, you need a C++ type. It may be your own, but it doesn’t have to be. It could be a type from a third-party library, or even one from the standard library that libpqxx does not yet support.

You also specialise the pqxx::type_name variable to specify the type’s name. This is important for all code which mentions your type in human-readable text, such as error messages.

Then, does your type have a built-in null value? You specialise the pqxx::nullness template to specify the details.

Finally, you specialise the pqxx::string_traits template. This is where you define the actual conversions.

Let’s go through these steps one by one.

Your type

You’ll need a type for which the conversions are not yet defined, because the C++ type is what determines the right conversion. One type, one set of conversions.

The type doesn’t have to be one that you create. The conversion logic was designed such that you can build it around any type. So you can just as easily build a conversion for a type that’s defined somewhere else. There’s no need to include any special methods or other members inside it. That’s also how libpqxx can support converting built-in types like int.

By the way, if the type is an enum, you don’t need to do any of this. Just invoke the preprocessor macro PQXX_DECLARE_ENUM_CONVERSION, from the global namespace near the top of your translation unit, and pass the type as an argument.

The library also provides specialisations for std::optional<T>, std::shared_ptr<T>, and std::unique_ptr<T>. If you have conversions for T, you’ll also have conversions for those.

Specialise type_name

When errors happen during conversion, libpqxx will compose error messages for the user. Sometimes these will include the name of the type that’s being converted.

To tell libpqxx the name of each type, there’s a template variable called pqxx::type_name. For any given type T, it should have a specialisation that provides that T’s human-readable name:

    namespace pqxx
    {
    template<> std::string const type_name<T>{"T"};
    }

(Yes, this means that you need to define something inside the pqxx namespace. Future versions of libpqxx may move this into a separate namespace.)

Define this early on in your translation unit, before any code that might cause libpqxx to need the name. That way, the libpqxx code which needs to know the type’s name can see your definition.

Specialise nullness

A struct template pqxx::nullness defines whether your type has a natural “null value” built in. If so, it also provides member functions for producing and recognising null values.

The simplest scenario is also the most common: most types don’t have a null value built in. In that case, derive your nullness traits from pqxx::no_null:

    namespace pqxx
    {
    template<> struct nullness<T> : pqxx::no_null<T> {};
    }

(Here again you’re defining this in the pqxx namespace.)

If your type does have a natural null value, the definition gets a little more complex:

    namespace pqxx
    {
    template<> struct nullness<T>
    {
      static constexpr bool has_null{true};
      static constexpr bool always_null{false};

      static bool is_null(T const &value)
      {
        // Return whether "value" is null.
        return ...;
      }

      [[nodiscard]] static T null()
      {
        // Return a null value.
        return ...;
      }
    };
    }

You may be wondering why there’s a function to produce a null value, but also a function to check whether a value is null. Why not just compare the value to the result of null()? Because two null values may not be equal. T may have several different null values. Or it may override the comparison operator, similar to SQL where NULL is not equal to NULL.

As a third case, your type may be one that always represents a null value. This is the case for std::nullptr_t and std::nullopt_t. In that case, you set nullness<TYPE>::always_null to true (as well as has_null of course), and you won’t need to define any actual conversions.

Specialise string_traits

This part is more work. (You can skip it for types that are always null, but those will be rare.) Specialise the pqxx::string_traits template:

    namespace pqxx
    {
    template<> struct string_traits<T>
    {
      static T from_string(std::string_view text);
      static zview to_buf(char *begin, char *end, T const &value);
      static char *into_buf(char *begin, char *end, T const &value);
      static std::size_t size_buffer(T const &value) noexcept;
    };
    }

You’ll also need to write those member functions, or as many of them as needed to get your code to build.

from_string

We start off simple: from_string parses a string as a value of T, and returns that value.

The string may not be zero-terminated; it’s just the string_view from beginning to end (exclusive). In your tests, cover cases where the string does not end in a zero byte.

It’s perfectly possible that the string isn’t actually a T value. Mistakes happen. In that case, throw a pqxx::conversion_error.

(Of course it’s also possible that you run into some other error, so it’s fine to throw different exceptions as well. But when it’s definitely “this is not the right format for a T,” throw conversion_error.)

to_buf

In this function, you convert a value of T into a string that the postgres server will understand.

The caller will provide you with a buffer where you can write the string, if you need it: from begin to end exclusive. It’s a half-open interval, so don’t access *end.

If the buffer is insufficient for you to do the conversion, throw a pqxx::conversion_overrun. It doesn’t have to be exact: you can be a little pessimistic and demand a bit more space than you need. Just be sure to throw the exception if there’s any risk of overrunning the buffer.

You don’t have to use the buffer for this function though. For example, pqxx::string_traits<bool>::to_buf returns a compile-time constant string and ignores the buffer.

Even if you do use the buffer, your string does not have to start at the beginning of the buffer. For example, the integer conversions start by writing the least significant digit to the end of the buffer, and then writes the more significant digits before it. It was just more convenient.

Return a pqxx::zview. This is basically a std::string_view, but with one difference: a zview guarantees that there will be a valid zero byte right after the string_view. The zero byte is not counted as part of its size, but it will be there.

Expressed in code, this rule must hold:

    void invariant(zview z)
    {
      assert(z[std::size(z)] == 0);
    }

Make sure you write your trailing zero before the end. If the trailing zero doesn’t fit in the buffer, then there’s just not enough room to perform the conversion.

Beware of locales when converting. If you use standard library features like sprintf, they may obey whatever locale is currently set on the system. That means that a simple integer like 1000000 may come out as “1000000” on your system, but as “1,000,000” on mine, or as “1.000.000” for somebody else, and on an Indian system it may be “1,00,000”. Values coming from or going to the database should be in non-localised formats. You can use libpqxx functions for those conversions: pqxx::from_string, pqxx::to_string, pqxx::to_buf.

into_buf

This is a stricter version of to_buf. All the same requirements apply, but in addition you must write your string into the buffer provided, starting exactly at begin.

That’s why this function returns just a simple pointer: the address right behind the trailing zero. If the caller wants to use the string, they can find it at begin. If they want to write a different value into the rest of the buffer, they can start at the location you returned.

size_buffer

Here you estimate how much buffer space you need for converting a T to a string. Be precise if you can, but pessimistic if you must. It’s usually better to waste a few unnecessary bytes than to spend a lot of time computing the exact buffer space you need. And failing the conversion because you under-budgeted the buffer is worst of all.

Include the trailing zero in the buffer size. If your to_buf takes more space than just what’s needed to store the result, include that too.

Make size_buffer a constexpr function if you can. It can allow the caller to allocate the buffer on the stack, with a size known at compile time.

Optional: Specialise is_unquoted_safe

When converting arrays or composite values to strings, libpqxx may need to quote values and escape any special characters. This takes time.

Some types though, such as integral or floating-point types, can never have any special characters such as quotes, commas, or backslashes in their string representations. In such cases, there’s no need to quote or escape such values in arrays or composite types.

If your type is like that, you can tell libpqxx about this by defining:

    namespace pqxx
    {
    template<> inline constexpr bool is_unquoted_safe<MY_TYPE>{true};
    }

The code that converts this type of field to strings in an array or a composite type can then use a simpler, more efficient variant of the code. It’s always safe to leave this out; it’s just an optimisation for when you’re completely sure that it’s safe.

Do not do this if a string representation of your type may contain a comma; semicolon; parenthesis; brace; quote; backslash; newline; or any other character that might need escaping.

Optional: Specialise param_format

This one you don’t generally need to worry about. Read on if you’re writing a type which represents raw binary data, or if you’re writing a template where some specialisations may contain raw binary data.

When you call parameterised statements, or prepared statements with parameters, libpqxx needs to your parameters on to libpq, the underlying C-level PostgreSQL client library.

There are two formats for doing that: text and binary. In the first, we represent all values as strings, and the server then converts them into its own internal binary representation. That’s what the string conversions are all about, and it’s what we do for almost all types of parameters.

But we do it differently when the parameter is a contiguous series of raw bytes and the corresponding SQL type is BYTEA. There is a text format for those, but we bypass it for efficiency. The server can use the binary data in the exact same form, without any conversion or extra processing. The binary data is also twice as compact during transport.

(People sometimes ask why we can’t just treat all types as binary. However the general case isn’t so clear-cut. The binary formats are not documented, there are no guarantees that they will be platform-independent or that they will remain stable, and there’s no really solid way to detect when we might get the format wrong. But also, the conversions aren’t necessarily as straightforward and efficient as they sound. So, for the general case, libpqxx sticks with the text formats. Raw binary data alone stands out as a clear win.)

Long story short, the machinery for passing parameters needs to know: is this parameter a binary string, or not? In the normal case it can assume “no,” and that’s what it does. The text format is always a safe choice; we just try to use the binary format where it’s faster.

The param_format function template is what makes the decision. We specialise it for types which may be binary strings, and use the default for all other types.

“Types which may be binary”? You might think we know whether a type is a binary type or not. But there are some complications with generic types.

Templates like std::shared_ptr, std::optional, and so on act like “wrappers” for another type. A std::optional<T> is binary if T is binary. Otherwise, it’s not. If you’re building support for a template of this nature, you’ll probably want to implement param_format for it.

The decision to use binary format is made based on a given object, not necessarily based on the type in general. Look at std::variant. If you have a std::variant type which can hold an int or a binary string, is that a binary parameter? We can’t decide without knowing the individual object.

Containers are another hard case. Should we pass std::vector<T> in binary? Even when T is a binary type, we don’t currently have any way to pass an array in binary format, so we always pass it as text.