ST_ClusterDBSCAN

Name

ST_ClusterDBSCAN — Windowing function that returns integer id for the cluster each input geometry is in based on 2D implementation of Density-based spatial clustering of applications with noise (DBSCAN) algorithm.

Synopsis

integer ST_ClusterDBSCAN ( geometry winset geom , float8 eps , integer minpoints ) ;

Description

Returns cluster number for each input geometry, based on a 2D implementation of the Density-based spatial clustering of applications with noise (DBSCAN) algorithm. Unlike ST_ClusterKMeans , it does not require the number of clusters to be specified, but instead uses the desired distance ( eps ) and density( minpoints ) parameters to construct each cluster.

An input geometry will be added to a cluster if it is either:

  • A "core" geometry, that is within eps distance (Cartesian) of at least minpoints input geometries (including itself) or

  • A "border" geometry, that is within eps distance of a core geometry.

Note that border geometries may be within eps distance of core geometries in more than one cluster; in this case, either assignment would be correct, and the border geometry will be arbitrarily asssigned to one of the available clusters. In these cases, it is possible for a correct cluster to be generated with fewer than minpoints geometries. When assignment of a border geometry is ambiguous, repeated calls to ST_ClusterDBSCAN will produce identical results if an ORDER BY clause is included in the window definition, but cluster assignments may differ from other implementations of the same algorithm.

[Note]

Input geometries that do not meet the criteria to join any other cluster will be assigned a cluster number of NULL.

Availability: 2.3.0 - requires GEOS

Examples

Assigning a cluster number to each polygon within 50 meters of each other. Require at least 2 polygons per cluster

within 50 meters at least 2 per cluster. singletons have NULL for cid

SELECT name, ST_ClusterDBSCAN(geom, eps := 50, minpoints := 2) over () AS cid
FROM boston_polys
WHERE name > '' AND building > ''
	AND ST_DWithin(geom,
        ST_Transform(
            ST_GeomFromText('POINT(-71.04054 42.35141)', 4326), 26986),
           500);

                name                 | bucket
-------------------------------------+--------
 Manulife Tower                      |      0
 Park Lane Seaport I                 |      0
 Park Lane Seaport II                |      0
 Renaissance Boston Waterfront Hotel |      0
 Seaport Boston Hotel                |      0
 Seaport Hotel & World Trade Center  |      0
 Waterside Place                     |      0
 World Trade Center East             |      0
 100 Northern Avenue                 |      1
 100 Pier 4                          |      1
 The Institute of Contemporary Art   |      1
 101 Seaport                         |      2
 District Hall                       |      2
 One Marina Park Drive               |      2
 Twenty Two Liberty                  |      2
 Vertex                              |      2
 Vertex                              |      2
 Watermark Seaport                   |      2
 Blue Hills Bank Pavilion            |   NULL
 World Trade Center West             |   NULL
(20 rows)

Combining parcels with the same cluster number into a single geometry. This uses named argument calling

SELECT cid, ST_Collect(geom) AS cluster_geom, array_agg(parcel_id) AS ids_in_cluster FROM (
    SELECT parcel_id, ST_ClusterDBSCAN(geom, eps := 0.5, minpoints := 5) over () AS cid, geom
    FROM parcels) sq
GROUP BY cid;