F.41. tablefunc - functions that return tables (crosstab and others)
  The
  
   tablefunc
  
  module includes various functions that return
  tables (that is, multiple rows).  These functions are useful both in their
  own right and as examples of how to write C functions that return
  multiple rows.
 
  This module is considered
  
   "
   
    trusted
   
   "
  
  , that is, it can be
  installed by non-superusers who have
  
   CREATE
  
  privilege
  on the current database.
 
F.41.1. Functions Provided #
   
    Table F.31
   
   summarizes the functions provided
   by the
   
    tablefunc
   
   module.
  
    
     Table F.31. 
     
      tablefunc
     
     Functions
    
   
       F.41.1.1. 
       
        normal_rand
       
       
        #
       
      
     normal_rand(int numvals, float8 mean, float8 stddev) returns setof float8
    
     normal_rand
    
    produces a set of normally distributed random
     values (Gaussian distribution).
   
    
     
      numvals
     
    
    is the number of values to be returned
     from the function.
    
     
      mean
     
    
    is the mean of the normal
     distribution of values and
    
     
      stddev
     
    
    is the standard
     deviation of the normal distribution of values.
   
For example, this call requests 1000 values with a mean of 5 and a standard deviation of 3:
test=# SELECT * FROM normal_rand(1000, 5, 3);
     normal_rand
----------------------
     1.56556322244898
     9.10040991424657
     5.36957140345079
   -0.369151492880995
    0.283600703686639
       .
       .
       .
     4.82992125404908
     9.71308014517282
     2.49639286969028
(1000 rows)
  
       F.41.1.2. 
       
        crosstab(text)
       
       
        #
       
      
     crosstab(text sql) crosstab(text sql, int N)
    The
    
     crosstab
    
    function is used to produce
    
     "
     
      pivot
     
     "
    
    displays, wherein data is listed across the page rather than down.
    For example, we might have data like
   
row1 val11 row1 val12 row1 val13 ... row2 val21 row2 val22 row2 val23 ...
which we wish to display like
row1 val11 val12 val13 ... row2 val21 val22 val23 ... ...
    The
    
     crosstab
    
    function takes a text parameter that is an SQL
    query producing raw data formatted in the first way, and produces a table
    formatted in the second way.
   
    The
    
     
      sql
     
    
    parameter is an SQL statement that produces
    the source set of data. This statement must return one
    
     row_name
    
    column, one
    
     category
    
    column, and one
    
     value
    
    column.
    
     
      N
     
    
    is an
    obsolete parameter, ignored if supplied (formerly this had to match the
    number of output value columns, but now that is determined by the
    calling query).
   
For example, the provided query might produce a set something like:
row_name cat value ----------+-------+------- row1 cat1 val1 row1 cat2 val2 row1 cat3 val3 row1 cat4 val4 row2 cat1 val5 row2 cat2 val6 row2 cat3 val7 row2 cat4 val8
    The
    
     crosstab
    
    function is declared to return
    
     setof
    record
    
    , so the actual names and types of the output columns must be
    defined in the
    
     FROM
    
    clause of the calling
    
     SELECT
    
    statement, for example:
   
SELECT * FROM crosstab('...') AS ct(row_name text, category_1 text, category_2 text);
   This example produces a set something like:
           <== value  columns  ==>
 row_name   category_1   category_2
----------+------------+------------
  row1        val1         val2
  row2        val5         val6
   
    The
    
     FROM
    
    clause must define the output as one
    
     row_name
    
    column (of the same data type as the first result
    column of the SQL query) followed by N
    
     value
    
    columns
    (all of the same data type as the third result column of the SQL query).
    You can set up as many output value columns as you wish.  The names of the
    output columns are up to you.
   
    The
    
     crosstab
    
    function produces one output row for each
    consecutive group of input rows with the same
    
     row_name
    
    value.  It fills the output
    
     value
    
    columns, left to right, with the
    
     value
    
    fields from these rows.  If there
    are fewer rows in a group than there are output
    
     value
    
    columns, the extra output columns are filled with nulls; if there are
    more rows, the extra input rows are skipped.
   
    In practice the SQL query should always specify
    
     ORDER BY 1,2
    
    to ensure that the input rows are properly ordered, that is, values with
    the same
    
     row_name
    
    are brought together and
    correctly ordered within the row.  Notice that
    
     crosstab
    
    itself does not pay any attention to the second column of the query
    result; it's just there to be ordered by, to control the order in which
    the third-column values appear across the page.
   
Here is a complete example:
CREATE TABLE ct(id SERIAL, rowid TEXT, attribute TEXT, value TEXT);
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att1','val1');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att2','val2');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att3','val3');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att4','val4');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att1','val5');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att2','val6');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att3','val7');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att4','val8');
SELECT *
FROM crosstab(
  'select rowid, attribute, value
   from ct
   where attribute = ''att2'' or attribute = ''att3''
   order by 1,2')
AS ct(row_name text, category_1 text, category_2 text, category_3 text);
 row_name | category_1 | category_2 | category_3
----------+------------+------------+------------
 test1    | val2       | val3       |
 test2    | val6       | val7       |
(2 rows)
   
    You can avoid always having to write out a
    
     FROM
    
    clause to
    define the output columns, by setting up a custom crosstab function that
    has the desired output row type wired into its definition.  This is
    described in the next section.  Another possibility is to embed the
    required
    
     FROM
    
    clause in a view definition.
   
Note
     See also the
     
      
       \crosstabview
      
     
     command in
     
      psql
     
     , which provides functionality similar
     to
     
      crosstab()
     
     .
    
       F.41.1.3. 
       
        crosstab
        
         
          N
         
        
        (text)
       
       
        #
       
      
     
          N
         
        
        (text)
       
crosstabN(text sql)
   
    The
    
     crosstab
     
      
    functions are examples of how
     to set up custom wrappers for the general
    
       N
      
     
    
     crosstab
    
    function,
     so that you need not write out column names and types in the calling
    
     SELECT
    
    query.  The
    
     tablefunc
    
    module includes
    
     crosstab2
    
    ,
    
     crosstab3
    
    , and
    
     crosstab4
    
    , whose output row types are defined as
   
CREATE TYPE tablefunc_crosstab_N AS (
    row_name TEXT,
    category_1 TEXT,
    category_2 TEXT,
        .
        .
        .
    category_N TEXT
);
   
    Thus, these functions can be used directly when the input query produces
    
     row_name
    
    and
    
     value
    
    columns of type
    
     text
    
    , and you want 2, 3, or 4 output values columns.
     In all other ways they behave exactly as described above for the
     general
    
     crosstab
    
    function.
   
For instance, the example given in the previous section would also work as
SELECT * FROM crosstab3( 'select rowid, attribute, value from ct where attribute = ''att2'' or attribute = ''att3'' order by 1,2');
    These functions are provided mostly for illustration purposes. You
     can create your own return types and functions based on the
     underlying
    
     crosstab()
    
    function.  There are two ways
     to do it:
   
- 
      Create a composite type describing the desired output columns, similar to the examples in contrib/tablefunc/tablefunc--1.0.sql. Then define a unique function name accepting onetextparameter and returningsetof your_type_name, but linking to the same underlyingcrosstabC function. For example, if your source data produces row names that aretext, and values that arefloat8, and you want 5 value columns:CREATE TYPE my_crosstab_float8_5_cols AS ( my_row_name text, my_category_1 float8, my_category_2 float8, my_category_3 float8, my_category_4 float8, my_category_5 float8 ); CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(text) RETURNS setof my_crosstab_float8_5_cols AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;
- 
      Use OUTparameters to define the return type implicitly. The same example could also be done this way:CREATE OR REPLACE FUNCTION crosstab_float8_5_cols( IN text, OUT my_row_name text, OUT my_category_1 float8, OUT my_category_2 float8, OUT my_category_3 float8, OUT my_category_4 float8, OUT my_category_5 float8) RETURNS setof record AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;
       F.41.1.4. 
       
        crosstab(text, text)
       
       
        #
       
      
     crosstab(text source_sql, text category_sql)
    The main limitation of the single-parameter form of
    
     crosstab
    
    is that it treats all values in a group alike, inserting each value into
    the first available column.  If you want the value
    columns to correspond to specific categories of data, and some groups
    might not have data for some of the categories, that doesn't work well.
    The two-parameter form of
    
     crosstab
    
    handles this case by
    providing an explicit list of the categories corresponding to the
    output columns.
   
    
     
      source_sql
     
    
    is an SQL statement that produces the
    source set of data.  This statement must return one
    
     row_name
    
    column, one
    
     category
    
    column, and one
    
     value
    
    column. It may also have one or more
    
     "
     
      extra
     
     "
    
    columns.
    The
    
     row_name
    
    column must be first. The
    
     category
    
    and
    
     value
    
    columns must be the last two columns, in that order.  Any columns between
    
     row_name
    
    and
    
     category
    
    are treated as
    
     "
     
      extra
     
     "
    
    .
    The
    
     "
     
      extra
     
     "
    
    columns are expected to be the same for all rows
    with the same
    
     row_name
    
    value.
   
    For example,
    
     
      source_sql
     
    
    might produce a set
    something like:
   
SELECT row_name, extra_col, cat, value FROM foo ORDER BY 1; row_name extra_col cat value ----------+------------+-----+--------- row1 extra1 cat1 val1 row1 extra1 cat2 val2 row1 extra1 cat4 val4 row2 extra2 cat1 val5 row2 extra2 cat2 val6 row2 extra2 cat3 val7 row2 extra2 cat4 val8
    
     
      category_sql
     
    
    is an SQL statement that produces
    the set of categories. This statement must return only one column.
    It must produce at least one row, or an error will be generated.
    Also, it must not produce duplicate values, or an error will be
    generated.
    
     
      category_sql
     
    
    might be something like:
   
SELECT DISTINCT cat FROM foo ORDER BY 1;
    cat
  -------
    cat1
    cat2
    cat3
    cat4
   
    The
    
     crosstab
    
    function is declared to return
    
     setof
    record
    
    , so the actual names and types of the output columns must be
    defined in the
    
     FROM
    
    clause of the calling
    
     SELECT
    
    statement, for example:
   
SELECT * FROM crosstab('...', '...')
    AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text);
   
This will produce a result something like:
                  <==  value  columns   ==>
row_name   extra   cat1   cat2   cat3   cat4
---------+-------+------+------+------+------
  row1     extra1  val1   val2          val4
  row2     extra2  val5   val6   val7   val8
   
    The
    
     FROM
    
    clause must define the proper number of output
    columns of the proper data types.  If there are
    
     
      N
     
    
    columns in the
    
     
      source_sql
     
    
    query's result, the first
    
     
      N
     
    
    -2 of them must match up with the first
    
     
      N
     
    
    -2 output columns.  The remaining output columns
    must have the type of the last column of the
    
     
      source_sql
     
    
    query's result, and there must be exactly as many of them as there
    are rows in the
    
     
      category_sql
     
    
    query's result.
   
    The
    
     crosstab
    
    function produces one output row for each
    consecutive group of input rows with the same
    
     row_name
    
    value.  The output
    
     row_name
    
    column, plus any
    
     "
     
      extra
     
     "
    
    columns, are copied from the first row of the group.  The output
    
     value
    
    columns are filled with the
    
     value
    
    fields from rows having matching
    
     category
    
    values.  If a row's
    
     category
    
    does not match any output of the
    
     
      category_sql
     
    
    query, its
    
     value
    
    is ignored.  Output
    columns whose matching category is not present in any input row
    of the group are filled with nulls.
   
    In practice the
    
     
      source_sql
     
    
    query should always
    specify
    
     ORDER BY 1
    
    to ensure that values with the same
    
     row_name
    
    are brought together.  However,
    ordering of the categories within a group is not important.
    Also, it is essential to be sure that the order of the
    
     
      category_sql
     
    
    query's output matches the specified
    output column order.
   
Here are two complete examples:
create table sales(year int, month int, qty int); insert into sales values(2007, 1, 1000); insert into sales values(2007, 2, 1500); insert into sales values(2007, 7, 500); insert into sales values(2007, 11, 1500); insert into sales values(2007, 12, 2000); insert into sales values(2008, 1, 1000); select * from crosstab( 'select year, month, qty from sales order by 1', 'select m from generate_series(1,12) m' ) as ( year int, "Jan" int, "Feb" int, "Mar" int, "Apr" int, "May" int, "Jun" int, "Jul" int, "Aug" int, "Sep" int, "Oct" int, "Nov" int, "Dec" int ); year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec ------+------+------+-----+-----+-----+-----+-----+-----+-----+-----+------+------ 2007 | 1000 | 1500 | | | | | 500 | | | | 1500 | 2000 2008 | 1000 | | | | | | | | | | | (2 rows)
CREATE TABLE cth(rowid text, rowdt timestamp, attribute text, val text);
INSERT INTO cth VALUES('test1','01 March 2003','temperature','42');
INSERT INTO cth VALUES('test1','01 March 2003','test_result','PASS');
INSERT INTO cth VALUES('test1','01 March 2003','volts','2.6987');
INSERT INTO cth VALUES('test2','02 March 2003','temperature','53');
INSERT INTO cth VALUES('test2','02 March 2003','test_result','FAIL');
INSERT INTO cth VALUES('test2','02 March 2003','test_startdate','01 March 2003');
INSERT INTO cth VALUES('test2','02 March 2003','volts','3.1234');
SELECT * FROM crosstab
(
  'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
  'SELECT DISTINCT attribute FROM cth ORDER BY 1'
)
AS
(
       rowid text,
       rowdt timestamp,
       temperature int4,
       test_result text,
       test_startdate timestamp,
       volts float8
);
 rowid |          rowdt           | temperature | test_result |      test_startdate      | volts
-------+--------------------------+-------------+-------------+--------------------------+--------
 test1 | Sat Mar 01 00:00:00 2003 |          42 | PASS        |                          | 2.6987
 test2 | Sun Mar 02 00:00:00 2003 |          53 | FAIL        | Sat Mar 01 00:00:00 2003 | 3.1234
(2 rows)
   
    You can create predefined functions to avoid having to write out
    the result column names and types in each query.  See the examples
    in the previous section.  The underlying C function for this form
    of
    
     crosstab
    
    is named
    
     crosstab_hash
    
    .
   
       F.41.1.5. 
       
        connectby
       
       
        #
       
      
     
connectby(text relname, text keyid_fld, text parent_keyid_fld
          [, text orderby_fld ], text start_with, int max_depth
          [, text branch_delim ])
   
    The
    
     connectby
    
    function produces a display of hierarchical
    data that is stored in a table.  The table must have a key field that
    uniquely identifies rows, and a parent-key field that references the
    parent (if any) of each row.
    
     connectby
    
    can display the
    sub-tree descending from any row.
   
Table F.32 explains the parameters.
     
      Table F.32. 
      
       connectby
      
      Parameters
     
    
| Parameter | Description | 
|---|---|
| 
           relname
           | Name of the source relation | 
| 
           keyid_fld
           | Name of the key field | 
| 
           parent_keyid_fld
           | Name of the parent-key field | 
| 
           orderby_fld
           | Name of the field to order siblings by (optional) | 
| 
           start_with
           | Key value of the row to start at | 
| 
           max_depth
           | Maximum depth to descend to, or zero for unlimited depth | 
| 
           branch_delim
           | String to separate keys with in branch output (optional) | 
    The key and parent-key fields can be any data type, but they must be
     the same type.  Note that the
    
     
      start_with
     
    
    value must be
     entered as a text string, regardless of the type of the key field.
   
    The
    
     connectby
    
    function is declared to return
    
     setof
     record
    
    , so the actual names and types of the output columns must be
     defined in the
    
     FROM
    
    clause of the calling
    
     SELECT
    
    statement, for example:
   
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
    AS t(keyid text, parent_keyid text, level int, branch text, pos int);
   
    The first two output columns are used for the current row's key and
     its parent row's key; they must match the type of the table's key field.
     The third output column is the depth in the tree and must be of type
    
     integer
    
    .  If a
    
     
      branch_delim
     
    
    parameter was
     given, the next output column is the branch display and must be of type
    
     text
    
    .  Finally, if an
    
     
      orderby_fld
     
    
    parameter was given, the last output column is a serial number, and must
     be of type
    
     integer
    
    .
   
    The
    
     "
     
      branch
     
     "
    
    output column shows the path of keys taken to
     reach the current row.  The keys are separated by the specified
    
     
      branch_delim
     
    
    string.  If no branch display is
     wanted, omit both the
    
     
      branch_delim
     
    
    parameter
     and the branch column in the output column list.
   
    If the ordering of siblings of the same parent is important,
     include the
    
     
      orderby_fld
     
    
    parameter to
     specify which field to order siblings by.  This field can be of any
     sortable data type.  The output column list must include a final
     integer serial-number column, if and only if
    
     
      orderby_fld
     
    
    is specified.
   
    The parameters representing table and field names are copied as-is
     into the SQL queries that
    
     connectby
    
    generates internally.
     Therefore, include double quotes if the names are mixed-case or contain
     special characters.  You may also need to schema-qualify the table name.
   
In large tables, performance will be poor unless there is an index on the parent-key field.
    It is important that the
    
     
      branch_delim
     
    
    string
     not appear in any key values, else
    
     connectby
    
    may incorrectly
     report an infinite-recursion error.  Note that if
    
     
      branch_delim
     
    
    is not provided, a default value
     of
    
     ~
    
    is used for recursion detection purposes.
   
Here is an example:
CREATE TABLE connectby_tree(keyid text, parent_keyid text, pos int);
INSERT INTO connectby_tree VALUES('row1',NULL, 0);
INSERT INTO connectby_tree VALUES('row2','row1', 0);
INSERT INTO connectby_tree VALUES('row3','row1', 0);
INSERT INTO connectby_tree VALUES('row4','row2', 1);
INSERT INTO connectby_tree VALUES('row5','row2', 0);
INSERT INTO connectby_tree VALUES('row6','row4', 0);
INSERT INTO connectby_tree VALUES('row7','row3', 0);
INSERT INTO connectby_tree VALUES('row8','row6', 0);
INSERT INTO connectby_tree VALUES('row9','row5', 0);
-- with branch, without orderby_fld (order of results is not guaranteed)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
 AS t(keyid text, parent_keyid text, level int, branch text);
 keyid | parent_keyid | level |       branch
-------+--------------+-------+---------------------
 row2  |              |     0 | row2
 row4  | row2         |     1 | row2~row4
 row6  | row4         |     2 | row2~row4~row6
 row8  | row6         |     3 | row2~row4~row6~row8
 row5  | row2         |     1 | row2~row5
 row9  | row5         |     2 | row2~row5~row9
(6 rows)
-- without branch, without orderby_fld (order of results is not guaranteed)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
 AS t(keyid text, parent_keyid text, level int);
 keyid | parent_keyid | level
-------+--------------+-------
 row2  |              |     0
 row4  | row2         |     1
 row6  | row4         |     2
 row8  | row6         |     3
 row5  | row2         |     1
 row9  | row5         |     2
(6 rows)
-- with branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
 AS t(keyid text, parent_keyid text, level int, branch text, pos int);
 keyid | parent_keyid | level |       branch        | pos
-------+--------------+-------+---------------------+-----
 row2  |              |     0 | row2                |   1
 row5  | row2         |     1 | row2~row5           |   2
 row9  | row5         |     2 | row2~row5~row9      |   3
 row4  | row2         |     1 | row2~row4           |   4
 row6  | row4         |     2 | row2~row4~row6      |   5
 row8  | row6         |     3 | row2~row4~row6~row8 |   6
(6 rows)
-- without branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
 AS t(keyid text, parent_keyid text, level int, pos int);
 keyid | parent_keyid | level | pos
-------+--------------+-------+-----
 row2  |              |     0 |   1
 row5  | row2         |     1 |   2
 row9  | row5         |     2 |   3
 row4  | row2         |     1 |   4
 row6  | row4         |     2 |   5
 row8  | row6         |     3 |   6
(6 rows)
   
F.41.2. Author #
Joe Conway