How can I filter the clusters produced by DBSCAN by size in Python?

python machine-learning scikit-learn

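I have applied DBSCAN to cluster a dataset consisting of the X, Y and Z coordinates of each point in a point cloud. I only want to plot the clusters that have fewer than 100 points. This is what I have so far: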

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN

# only_xy is assumed to be an (n_points, 2) array of X and Y coordinates
clustering = DBSCAN(eps=0.1, min_samples=20, metric='euclidean').fit(only_xy)
plt.scatter(only_xy[:, 0], only_xy[:, 1],
            c=clustering.labels_, cmap='rainbow')
clusters = clustering.components_

# Store the labels
labels = clustering.labels_

# Then get the frequency count of the non-negative labels
counts = np.bincount(labels[labels >= 0])

print(counts)

Output: 
[1278  564  208   47   36   30  191   54   24   18   40  915   26   20
   24  527   56  677   63   57   61 1544  512   21   45  187   39  132
   48   55  160   46   28   18   55   48   35   92   29   88   53   55
   24   52  114   49   34   34   38   52   38   53   69]

So I have found the number of points in each cluster, but I am not sure how to select only the clusters with fewer than 100 points.

You can find the indices of the points whose cluster has fewer than 100 points:

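# Count how many points carry each label (DBSCAN labels noise as -1)
ls, cs = np.unique(labels, return_counts=True)
dic = dict(zip(ls, cs))
# Indices of the points in non-noise clusters with fewer than 100 points
idx = [i for i, label in enumerate(labels) if dic[label] < 100 and label >= 0]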
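The same selection can also be written as a boolean mask, which is convenient for indexing the coordinate array when plotting. The following is only a sketch: the helper name `small_cluster_mask` and the toy label array are made up for illustration and are not part of the original answer.

```python
import numpy as np

def small_cluster_mask(labels, max_size=100):
    """Boolean mask of points in non-noise clusters with fewer than max_size points."""
    labels = np.asarray(labels)
    mask = np.zeros(labels.shape, dtype=bool)
    clustered = labels >= 0                    # DBSCAN marks noise with -1
    counts = np.bincount(labels[clustered])    # counts[k] = size of cluster k
    # Look up each clustered point's cluster size and compare with the limit
    mask[clustered] = counts[labels[clustered]] < max_size
    return mask

# Toy labels (made up): cluster 0 has 5 points, cluster 1 has 2, one noise point
toy_labels = np.array([0, 0, 1, -1, 0, 1, 0, 0])
mask = small_cluster_mask(toy_labels, max_size=3)
print(np.flatnonzero(mask))  # [2 5]
```

With the arrays from the question, this would be used as `mask = small_cluster_mask(labels)` followed by `plt.scatter(only_xy[mask, 0], only_xy[mask, 1], c=labels[mask], cmap='rainbow')`.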

I think that if you run this code, you will get the labels, and the cluster components, of the clusters with more than 100 points:

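from collections import Counter
# Labels of clusters with more than 100 points (noise, labelled -1, is skipped)
labels_with_morethan100 = [label for (label, count) in Counter(clustering.labels_).items() if count > 100 and label >= 0]
# components_ holds only the core samples, so look up the labels of those samples
core_labels = clustering.labels_[clustering.core_sample_indices_]
clusters_biggerthan100 = clustering.components_[np.isin(core_labels, labels_with_morethan100)]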
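Because `components_` contains only the core samples (one row per entry of `core_sample_indices_`), a mask applied to it has to be built from the labels of those core samples rather than from all non-noise labels. A minimal sketch of that alignment follows. The helper name and the toy arrays are hypothetical stand-ins for `clustering.labels_`, `clustering.core_sample_indices_` and `clustering.components_`.

```python
import numpy as np

def core_samples_of_large_clusters(labels, core_sample_indices, components, min_size=100):
    """Return the core samples (and their labels) from clusters with more than min_size points."""
    labels = np.asarray(labels)
    core_labels = labels[core_sample_indices]    # label of each row of components
    ids, counts = np.unique(labels[labels >= 0], return_counts=True)
    large = ids[counts > min_size]               # labels of the large clusters
    keep = np.isin(core_labels, large)
    return components[keep], core_labels[keep]

# Toy stand-ins (made up) for the attributes of a fitted DBSCAN model
points = np.arange(14, dtype=float).reshape(7, 2)
toy_labels = np.array([0, 0, 0, 0, 1, 1, -1])    # cluster 0: 4 points, cluster 1: 2 points
toy_core_idx = np.array([1, 2, 4])               # plays the role of core_sample_indices_
toy_components = points[toy_core_idx]            # plays the role of components_

kept, kept_labels = core_samples_of_large_clusters(
    toy_labels, toy_core_idx, toy_components, min_size=3)
print(kept_labels)  # [0 0]
```

Note that this returns core samples only. To get every point of the large clusters, border points included, index the original coordinate array with `np.isin(labels, large)` instead.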

Do you still want to continue? Posting here takes longer than solving this problem.