ltree
PostgreSQL 9.4.10 Documentation | |||
---|---|---|---|
Prev | Up | Appendix F. Additional Supplied Modules | Next |
This module implements a data type ltree for representing labels of data stored in a hierarchical tree-like structure. Extensive facilities for searching through label trees are provided.
F.21.1. Definitions
A label is a sequence of alphanumeric characters and underscores (for example, in C locale the characters A-Za-z0-9_ are allowed). Labels must be less than 256 bytes long.
Examples: 42 , Personal_Services
A label path is a sequence of zero or more labels separated by dots, for example L1.L2.L3 , representing a path from the root of a hierarchical tree to a particular node. The length of a label path must be less than 65Kb, but keeping it under 2Kb is preferable. In practice this is not a major limitation; for example, the longest label path in the DMOZ catalog ( http://www.dmoz.org ) is about 240 bytes.
Example: Top.Countries.Europe.Russia
The ltree module provides several data types:
-
ltree stores a label path.
-
lquery represents a regular-expression-like pattern for matching ltree values. A simple word matches that label within a path. A star symbol ( * ) matches zero or more labels. For example:
foo Match the exact label path foo *.foo.* Match any label path containing the label foo *.foo Match any label path whose last label is foo
Star symbols can also be quantified to restrict how many labels they can match:
*{n} Match exactly n labels *{n,} Match at least n labels *{n,m} Match at least n but not more than m labels *{,m} Match at most m labels - same as *{0,m}
There are several modifiers that can be put at the end of a non-star label in lquery to make it match more than just the exact match:
@ Match case-insensitively, for example a@ matches A * Match any label with this prefix, for example foo* matches foobar % Match initial underscore-separated words
The behavior of % is a bit complicated. It tries to match words rather than the entire label. For example foo_bar% matches foo_bar_baz but not foo_barbaz . If combined with * , prefix matching applies to each word separately, for example foo_bar%* matches foo1_bar2_baz but not foo1_br2_baz .
Also, you can write several possibly-modified labels separated with | (OR) to match any of those labels, and you can put ! (NOT) at the start to match any label that doesn't match any of the alternatives.
Here's an annotated example of lquery :
Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain a. b. c. d. e.
This query will match any label path that:
-
begins with the label Top
-
and next has zero to two labels before
-
a label beginning with the case-insensitive prefix sport
-
then a label not matching football nor tennis
-
and then ends with a label beginning with Russ or exactly matching Spain .
-
-
ltxtquery represents a full-text-search-like pattern for matching ltree values. An ltxtquery value contains words, possibly with the modifiers @ , * , % at the end; the modifiers have the same meanings as in lquery . Words can be combined with & (AND), | (OR), ! (NOT), and parentheses. The key difference from lquery is that ltxtquery matches words without regard to their position in the label path.
Here's an example ltxtquery :
Europe & Russia*@ & !Transportation
This will match paths that contain the label Europe and any label beginning with Russia (case-insensitive), but not paths containing the label Transportation . The location of these words within the path is not important. Also, when % is used, the word can be matched to any underscore-separated word within a label, regardless of position.
Note: ltxtquery allows whitespace between symbols, but ltree and lquery do not.
F.21.2. Operators and Functions
Type ltree has the usual comparison operators = , <> , < , > , <= , >= . Comparison sorts in the order of a tree traversal, with the children of a node sorted by label text. In addition, the specialized operators shown in Table F-14 are available.
Table F-14. ltree Operators
Operator | Returns | Description |
---|---|---|
ltree @> ltree | boolean | is left argument an ancestor of right (or equal)? |
ltree <@ ltree | boolean | is left argument a descendant of right (or equal)? |
ltree ~ lquery | boolean | does ltree match lquery ? |
lquery ~ ltree | boolean | does ltree match lquery ? |
ltree ? lquery[] | boolean | does ltree match any lquery in array? |
lquery[] ? ltree | boolean | does ltree match any lquery in array? |
ltree @ ltxtquery | boolean | does ltree match ltxtquery ? |
ltxtquery @ ltree | boolean | does ltree match ltxtquery ? |
ltree || ltree | ltree | concatenate ltree paths |
ltree || text | ltree | convert text to ltree and concatenate |
text || ltree | ltree | convert text to ltree and concatenate |
ltree[] @> ltree | boolean | does array contain an ancestor of ltree ? |
ltree <@ ltree[] | boolean | does array contain an ancestor of ltree ? |
ltree[] <@ ltree | boolean | does array contain a descendant of ltree ? |
ltree @> ltree[] | boolean | does array contain a descendant of ltree ? |
ltree[] ~ lquery | boolean | does array contain any path matching lquery ? |
lquery ~ ltree[] | boolean | does array contain any path matching lquery ? |
ltree[] ? lquery[] | boolean | does ltree array contain any path matching any lquery ? |
lquery[] ? ltree[] | boolean | does ltree array contain any path matching any lquery ? |
ltree[] @ ltxtquery | boolean | does array contain any path matching ltxtquery ? |
ltxtquery @ ltree[] | boolean | does array contain any path matching ltxtquery ? |
ltree[] ?@> ltree | ltree | first array entry that is an ancestor of ltree ; NULL if none |
ltree[] ?<@ ltree | ltree | first array entry that is a descendant of ltree ; NULL if none |
ltree[] ?~ lquery | ltree | first array entry that matches lquery ; NULL if none |
ltree[] ?@ ltxtquery | ltree | first array entry that matches ltxtquery ; NULL if none |
The operators <@ , @> , @ and ~ have analogues ^<@ , ^@> , ^@ , ^~ , which are the same except they do not use indexes. These are useful only for testing purposes.
The available functions are shown in Table F-15 .
Table F-15. ltree Functions
Function | Return Type | Description | Example | Result |
---|---|---|---|---|
subltree(ltree, int start, int end)
|
ltree | subpath of ltree from position start to position end -1 (counting from 0) | subltree('Top.Child1.Child2',1,2) | Child1 |
subpath(ltree, int offset, int len)
|
ltree | subpath of ltree starting at position offset , length len . If offset is negative, subpath starts that far from the end of the path. If len is negative, leaves that many labels off the end of the path. | subpath('Top.Child1.Child2',0,2) | Top.Child1 |
subpath(ltree, int offset)
|
ltree | subpath of ltree starting at position offset , extending to end of path. If offset is negative, subpath starts that far from the end of the path. | subpath('Top.Child1.Child2',1) | Child1.Child2 |
nlevel(ltree)
|
integer | number of labels in path | nlevel('Top.Child1.Child2') | 3 |
index(ltree a, ltree b)
|
integer | position of first occurrence of b in a ; -1 if not found | index('0.1.2.3.5.4.5.6.8.5.6.8','5.6') | 6 |
index(ltree a, ltree b, int offset)
|
integer | position of first occurrence of b in a , searching starting at offset ; negative offset means start -offset labels from the end of the path | index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',-4) | 9 |
text2ltree(text)
|
ltree | cast text to ltree | ||
ltree2text(ltree)
|
text | cast ltree to text | ||
lca(ltree, ltree, ...)
|
ltree | lowest common ancestor, i.e., longest common prefix of paths (up to 8 arguments supported) | lca('1.2.2.3','1.2.3.4.5.6') | 1.2 |
lca(ltree[])
|
ltree | lowest common ancestor, i.e., longest common prefix of paths | lca(array['1.2.2.3'::ltree,'1.2.3']) | 1.2 |
F.21.3. Indexes
ltree supports several types of indexes that can speed up the indicated operators:
-
B-tree index over ltree : < , <= , = , >= , >
-
GiST index over ltree : < , <= , = , >= , > , @> , <@ , @ , ~ , ?
Example of creating such an index:
CREATE INDEX path_gist_idx ON test USING GIST (path);
-
GiST index over ltree[] : ltree[] <@ ltree , ltree @> ltree[] , @ , ~ , ?
Example of creating such an index:
CREATE INDEX path_gist_idx ON test USING GIST (array_path);
Note: This index type is lossy.
F.21.4. Example
This example uses the following data (also available in file contrib/ltree/ltreetest.sql in the source distribution):
CREATE TABLE test (path ltree); INSERT INTO test VALUES ('Top'); INSERT INTO test VALUES ('Top.Science'); INSERT INTO test VALUES ('Top.Science.Astronomy'); INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics'); INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology'); INSERT INTO test VALUES ('Top.Hobbies'); INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy'); INSERT INTO test VALUES ('Top.Collections'); INSERT INTO test VALUES ('Top.Collections.Pictures'); INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy'); INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars'); INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies'); INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts'); CREATE INDEX path_gist_idx ON test USING gist(path); CREATE INDEX path_idx ON test USING btree(path);
Now, we have a table test populated with data describing the hierarchy shown below:
Top / | \ Science Hobbies Collections / | \ Astronomy Amateurs_Astronomy Pictures / \ | Astrophysics Cosmology Astronomy / | \ Galaxies Stars Astronauts
We can do inheritance:
ltreetest=> SELECT path FROM test WHERE path <@ 'Top.Science'; path ------------------------------------ Top.Science Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology (4 rows)
Here are some examples of path matching:
ltreetest=> SELECT path FROM test WHERE path ~ '*.Astronomy.*'; path ----------------------------------------------- Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology Top.Collections.Pictures.Astronomy Top.Collections.Pictures.Astronomy.Stars Top.Collections.Pictures.Astronomy.Galaxies Top.Collections.Pictures.Astronomy.Astronauts (7 rows) ltreetest=> SELECT path FROM test WHERE path ~ '*.!pictures@.*.Astronomy.*'; path ------------------------------------ Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology (3 rows)
Here are some examples of full text search:
ltreetest=> SELECT path FROM test WHERE path @ 'Astro*% & !pictures@'; path ------------------------------------ Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology Top.Hobbies.Amateurs_Astronomy (4 rows) ltreetest=> SELECT path FROM test WHERE path @ 'Astro* & !pictures@'; path ------------------------------------ Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology (3 rows)
Path construction using functions:
ltreetest=> SELECT subpath(path,0,2)||'Space'||subpath(path,2) FROM test WHERE path <@ 'Top.Science.Astronomy'; ?column? ------------------------------------------ Top.Science.Space.Astronomy Top.Science.Space.Astronomy.Astrophysics Top.Science.Space.Astronomy.Cosmology (3 rows)
We could simplify this by creating a SQL function that inserts a label at a specified position in a path:
CREATE FUNCTION ins_label(ltree, int, text) RETURNS ltree AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);' LANGUAGE SQL IMMUTABLE; ltreetest=> SELECT ins_label(path,2,'Space') FROM test WHERE path <@ 'Top.Science.Astronomy'; ins_label ------------------------------------------ Top.Science.Space.Astronomy Top.Science.Space.Astronomy.Astrophysics Top.Science.Space.Astronomy.Cosmology (3 rows)
F.21.5. Authors
All work was done by Teodor Sigaev (
<
teodor@stack.net
>
) and
Oleg Bartunov (
<
oleg@sai.msu.su
>
). See
http://www.sai.msu.su/~megera/postgres/gist/
for
additional information. Authors would like to thank Eugeny Rodichev for
helpful discussions. Comments and bug reports are welcome.