ST_ClusterKMeans

Name

ST_ClusterKMeans — Window function that returns a cluster id for each input geometry using the K-means algorithm.

Synopsis

integer ST_ClusterKMeans ( geometry winset geom , integer number_of_clusters ) ;

Description

Returns 2D distance based K-means cluster number for each input geometry. The distance used for clustering is the distance between the centroids for 2D geometries, and distance between bounding box centers for 3D geometries. For POINT inputs, M coordinate will be treated as weight of input and has to be larger than 0.

Enhanced: 3.1.0 Support for 3D geometries and weights

Availability: 2.3.0

Examples

Generate dummy set of parcels for examples

CREATE TABLE parcels AS
SELECT lpad((row_number() over())::text,3,'0') As parcel_id, geom,
('{residential, commercial}'::text[])[1 + mod(row_number()OVER(),2)] As type
FROM
    ST_Subdivide(ST_Buffer('SRID=3857;LINESTRING(40 100, 98 100, 100 150, 60 90)'::geometry,
    40, 'endcap=square'),12) As geom;

Original Parcels

Parcels color-coded by cluster number (cid)

SELECT ST_ClusterKMeans(geom, 5) OVER() AS cid, parcel_id, geom
FROM parcels;
-- result
 cid | parcel_id |   geom
-----+-----------+---------------
   0 | 001       | 0103000000...
   0 | 002       | 0103000000...
   1 | 003       | 0103000000...
   0 | 004       | 0103000000...
   1 | 005       | 0103000000...
   2 | 006       | 0103000000...
   2 | 007       | 0103000000...
(7 rows)

 -- Partitioning parcel clusters by type
SELECT ST_ClusterKMeans(geom,3) over (PARTITION BY type) AS cid, parcel_id, type
FROM parcels;
-- result
 cid | parcel_id |    type
-----+-----------+-------------
   1 | 005       | commercial
   1 | 003       | commercial
   2 | 007       | commercial
   0 | 001       | commercial
   1 | 004       | residential
   0 | 002       | residential
   2 | 006       | residential
(7 rows)
 -- Clustering points around antimeridian can be done in 3D XYZ CRS, EPSG:4978:

SELECT ST_ClusterKMeans(ST_Transform(ST_Force3D(geom), 4978), 3) over () AS cid, parcel_id, type
FROM parcels;
-- result
┌─────┬───────────┬─────────────┐
│ cid │ parcel_id │    type     │
├─────┼───────────┼─────────────┤
│   1 │ 001       │ commercial  │
│   2 │ 002       │ residential │
│   0 │ 003       │ commercial  │
│   1 │ 004       │ residential │
│   0 │ 005       │ commercial  │
│   2 │ 006       │ residential │
│   0 │ 007       │ commercial  │
└─────┴───────────┴─────────────┘
(7 rows)