ST_ClusterDBSCAN

Name

ST_ClusterDBSCAN — Windowing function that returns integer id for the cluster each input geometry is in based on 2D implementation of Density-based spatial clustering of applications with noise (DBSCAN) algorithm.

Synopsis

integer ST_ClusterDBSCAN ( geometry winset geom , float8 eps , integer minpoints ) ;

Description

Returns cluster number for each input geometry, based on a 2D implementation of the Density-based spatial clustering of applications with noise (DBSCAN) algorithm. Unlike ST_ClusterKMeans , it does not require the number of clusters to be specified, but instead uses the desired distance ( eps ) and density( minpoints ) parameters to construct each cluster.

An input geometry will be added to a cluster if it is either:

  • A "core" geometry, that is within eps distance (Cartesian) of at least minpoints input geometries (including itself) or

  • A "border" geometry, that is within eps distance of a core geometry.

Note that border geometries may be within eps distance of core geometries in more than one cluster; in this case, either assignment would be correct, and the border geometry will be arbitrarily asssigned to one of the available clusters. In these cases, it is possible for a correct cluster to be generated with fewer than minpoints geometries. When assignment of a border geometry is ambiguous, repeated calls to ST_ClusterDBSCAN will produce identical results if an ORDER BY clause is included in the window definition, but cluster assignments may differ from other implementations of the same algorithm.

[Note]

Input geometries that do not meet the criteria to join any other cluster will be assigned a cluster number of NULL.

Availability: 2.3.0 - requires GEOS

Examples

Assigning a cluster number to each parcel point:

SELECT parcel_id, ST_ClusterDBSCAN(geom, eps := 0.5, minpoints := 5) over () AS cid
FROM parcels;

Combining parcels with the same cluster number into a single geometry. This uses named argument calling

SELECT cid, ST_Collect(geom) AS cluster_geom, array_agg(parcel_id) AS ids_in_cluster FROM (
    SELECT parcel_id, ST_ClusterDBSCAN(geom, eps := 0.5, minpoints := 5) over () AS cid, geom
    FROM parcels) sq
GROUP BY cid;