BigQuery SQL: Using ST_CLUSTERDBSCAN on an array of ST_GEOGPOINT values


I want to use ST_CLUSTERDBSCAN to cluster geographic points. The example on the BigQuery documentation page is as follows:

WITH Geos as
  (SELECT 1 as row_id, st_geogfromtext('point empty') as geo UNION ALL
    SELECT 2, st_geogfromtext('multipoint(1 1, 2 2, 4 4, 5 2)') UNION ALL
    SELECT 3, st_geogfromtext('point(14 15)') UNION ALL
    SELECT 4, st_geogfromtext('linestring(40 1, 42 34, 44 39)') UNION ALL
    SELECT 5, st_geogfromtext('polygon((40 2, 40 1, 41 2, 40 2))'))
SELECT row_id, geo, ST_CLUSTERDBSCAN(geo, 1e5, 1) OVER () AS cluster_num FROM
Geos ORDER BY row_id
+--------+-----------------------------------+-------------+
| row_id |                geo                | cluster_num |
+--------+-----------------------------------+-------------+
|      1 |          GEOMETRYCOLLECTION EMPTY |        NULL |
|      2 |    MULTIPOINT(1 1, 2 2, 5 2, 4 4) |           0 |
|      3 |                      POINT(14 15) |           1 |
|      4 |    LINESTRING(40 1, 42 34, 44 39) |           2 |
|      5 | POLYGON((40 2, 40 1, 41 2, 40 2)) |           2 |
+--------+-----------------------------------+-------------+
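In my code, I have an array of points that are aggregated together. However, this does not seem to work, even though I do see multipoints in the result.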

My code:

    ST_CLUSTERDBSCAN(ST_UNION_AGG(buyer_geo_point), 1e4, 2) OVER () AS cluster_num ,
    ST_UNION_AGG(buyer_geo_point)
The result is either null or completely wrong (cluster_num first, then the aggregated geography):

null
POINT(-41.5320687976469 -20.3600487114797)

null
MULTIPOINT(-39.0833794 -5.9597183, -39.00682744 -5.73228798)

null
POINT(-40.224447061747 -17.3677128083793)

null
POINT(-40.10711168 -18.08920528)
32  
null
POINT(-41.10854564 -21.47675214)

null
POINT(-51.11207578 -20.64520046)

117
MULTIPOINT(-38.08106136 -11.94490164, -38.06814822 -11.94196154)
    
117
MULTIPOINT(-38.07860266 -11.94484066, -38.0786308 -11.9448231, -38.0787098 -11.9447567, -38.0786912 -11.9447861, -38.0676091 -11.9453678)

null
MULTIPOINT(-39.98731268 -14.8426174, -39.98782804 -14.84623434)
Update: I found a solution that labels each point with the cluster it belongs to.

WITH merchant_cluster AS (
    SELECT
        geo.merchant_id,  -- was pl_gl.merchant_id; pl_gl is not an alias defined in this query
        -- Cluster each merchant's points separately: one input row per point
        ST_CLUSTERDBSCAN(buyer_geo_point, 1e3, 1) OVER (PARTITION BY geo.merchant_id) AS clusters,
        buyer_geo_point
    FROM `geo-info-table` AS geo
    LEFT JOIN `merchants-table` AS m ON geo.merchant_id = m.user_id
    LEFT JOIN `adresses-table` AS add ON m.user_id = add.user_id
)
-- Points that end up in no cluster are labeled -1 instead of NULL
SELECT merchant_id, STRUCT(ARRAY_AGG(IFNULL(clusters, -1)) AS cluster_id, ARRAY_AGG(buyer_geo_point) AS point)
FROM merchant_cluster
GROUP BY merchant_id
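A possible variation on the query above, shown only as a sketch: instead of two parallel arrays, each label can be aggregated together with its point as one array of structs, so the pairing between a cluster id and its point is explicit. The inline sample rows, the merchant ids 1 and 2, and the column name labeled_points are hypothetical and stand in for the real tables.

```sql
-- Hypothetical inline data: two merchants with a few points each
WITH sample_points AS (
  SELECT 1 AS merchant_id, ST_GEOGPOINT(-38.0786, -11.9448) AS buyer_geo_point UNION ALL
  SELECT 1, ST_GEOGPOINT(-38.0787, -11.9447) UNION ALL
  SELECT 1, ST_GEOGPOINT(-39.9873, -14.8426) UNION ALL
  SELECT 2, ST_GEOGPOINT(-40.1071, -18.0892)
),
merchant_cluster AS (
  SELECT
    merchant_id,
    buyer_geo_point,
    -- One row per point, clustered independently for each merchant
    ST_CLUSTERDBSCAN(buyer_geo_point, 1e3, 1)
      OVER (PARTITION BY merchant_id) AS clusters
  FROM sample_points
)
SELECT
  merchant_id,
  -- Each array element carries both the cluster label and the point it labels
  ARRAY_AGG(STRUCT(IFNULL(clusters, -1) AS cluster_id,
                   buyer_geo_point AS point)) AS labeled_points
FROM merchant_cluster
GROUP BY merchant_id
ORDER BY merchant_id
```

Whether this shape is preferable depends on how the result is consumed; the two-array form above may be easier to export, while the array of structs can be unnested back to one row per point without relying on the two arrays having the same order.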
Try the following:

select cluster_num, ST_UNION_AGG(buyer_geo_point) geo_cluster 
from (
  select buyer_geo_point,
    ST_CLUSTERDBSCAN(buyer_geo_point, 1e4, 2) OVER () AS cluster_num
  from `project.dataset.table`
)
group by cluster_num
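I tried to simulate your data using the points shown in the question, and with the code above I got the result below (note: I used ST_CLUSTERDBSCAN(buyer_geo_point, 200000, 1) because the set of points is very small).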

Below is a visualization of that result, with each cluster shown in its own color.


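Thank you. I wanted to discover the clusters and label each point with its cluster id. I figured out how to do it with the help of a colleague, and I will update my question with the solution. Your solution works and is similar to ours. The difference is that I need to do this per merchant, so I need OVER (PARTITION BY merchant_id). I would still like to understand why my first approach did not work. PS: which map visualization tool are you using?

That is Goliath, which is not just a visualization tool but an IDE for BigQuery, and part of a suite of tools. Another tool in that suite is Magnus, a workflow automator. It supports all of BigQuery, Cloud Storage and most Google APIs, as well as a number of simple utility-type tasks such as BigQuery Task, Export to Storage Task, Loop Task and so on, along with advanced scheduling, triggering, etc. Disclosure: I am the creator of these tools and the leader of the Potens team (see my SO profile).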
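On the question of why the first approach did not work, a likely explanation is that ST_CLUSTERDBSCAN treats every input row as a single geography and assigns one cluster number per row. It does not look inside a MULTIPOINT and cluster its member points. After ST_UNION_AGG, each row is already a whole group of points, and with a minimum of 2 geographies any row that has no other row within the 1e4 meter radius is treated as noise and gets NULL. The sketch below keeps one row per point, labels the rows first, and only then aggregates. The inline points are a few of the coordinates shown in the question's output; the CTE names points and labeled are assumptions made for this example.

```sql
-- A handful of individual points taken from the question's output, one row each
WITH points AS (
  SELECT ST_GEOGPOINT(-38.08106136, -11.94490164) AS buyer_geo_point UNION ALL
  SELECT ST_GEOGPOINT(-38.06814822, -11.94196154) UNION ALL
  SELECT ST_GEOGPOINT(-38.07860266, -11.94484066) UNION ALL
  SELECT ST_GEOGPOINT(-39.98731268, -14.8426174) UNION ALL
  SELECT ST_GEOGPOINT(-39.98782804, -14.84623434) UNION ALL
  SELECT ST_GEOGPOINT(-51.11207578, -20.64520046)
),
labeled AS (
  SELECT
    buyer_geo_point,
    -- Step 1: cluster the raw points; each row receives its own label,
    -- and rows without enough neighbors within 1e4 meters get NULL
    ST_CLUSTERDBSCAN(buyer_geo_point, 1e4, 2) OVER () AS cluster_num
  FROM points
)
-- Step 2: aggregate afterwards, once every point already has a label
SELECT
  cluster_num,
  COUNT(*) AS n_points,
  ST_UNION_AGG(buyer_geo_point) AS geo_cluster
FROM labeled
GROUP BY cluster_num
ORDER BY cluster_num
```

In this sketch the last point is far from all the others, so it should fall into the NULL group, while nearby points share a cluster number. Adding PARTITION BY merchant_id to the OVER clause, as in the question's update, would run the same clustering separately per merchant.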