Python: DBSCAN on a radian distance matrix using sklearn?
I want to cluster several timestamps (given in minutes). This is what I have done so far:
1) Convert the points to radians
#points containing time value in minutes
points = [100, 200, 600, 659, 700]
def convert_to_radian(x):
    return((x / (24 * 60)) * 2 * pi)
rad_function = np.vectorize(convert_to_radian)
points_rad = rad_function(points)
2) Generate the distance matrix
#generate distance matrix from each point
dist = points_rad[None,:] - points_rad[:, None]
3) Assign the shortest distance from each point
dist[((dist > pi) & (dist <= (2*pi)))] = dist[((dist > pi) & (dist <= (2*pi)))] -(2*pi)
dist[((dist > (-2*pi)) & (dist <= (-1*pi)))] = dist[((dist > (-2*pi)) & (dist <= (-1*pi)))] + (2*pi)
dist = abs(dist)
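Steps 2 and 3 can also be collapsed into one expression. The following is only a sketch of an equivalent formulation, not the code from the post: the helper name circular_distance_matrix and its period parameter are made up for illustration. It takes the absolute angular difference and keeps whichever way around the circle is shorter.

```python
import numpy as np

def circular_distance_matrix(minutes, period=24 * 60):
    """Pairwise shortest arc (in radians) between times on a `period`-minute circle.

    Hypothetical helper; `period` defaults to the 24-hour day used in the post.
    """
    angles = np.asarray(minutes, dtype=float) / period * 2 * np.pi
    # Absolute angular difference, folded into [0, 2*pi)
    diff = np.abs(angles[None, :] - angles[:, None]) % (2 * np.pi)
    # The shorter way around the circle is never longer than pi
    return np.minimum(diff, 2 * np.pi - diff)

dist = circular_distance_matrix([100, 200, 600, 659, 700])
print(np.round(dist, 4))
```

For the five sample points no pair is more than half a day apart, so the wrap-around branch is never taken and the result should match the step-by-step version.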
OK, after digging a bit deeper, I realized that I can simply set the DBSCAN metric to 'precomputed', call the .fit() method and pass it my distance matrix. For anyone interested, here is the source code:
import numpy as np
from math import pi
from sklearn.cluster import DBSCAN
#points containing time value in minutes
points = [100, 200, 600, 659, 700]
def convert_to_radian(x):
    return((x / (24 * 60)) * 2 * pi)
rad_function = np.vectorize(convert_to_radian)
points_rad = rad_function(points)
#generate distance matrix from each point
dist = points_rad[None,:] - points_rad[:, None]
#Assign shortest distances from each point
dist[((dist > pi) & (dist <= (2*pi)))] = dist[((dist > pi) & (dist <= (2*pi)))] -(2*pi)
dist[((dist > (-2*pi)) & (dist <= (-1*pi)))] = dist[((dist > (-2*pi)) & (dist <= (-1*pi)))] + (2*pi)
dist = abs(dist)
#check dist
print(dist)
#eps corresponds to 100 minutes; metric='precomputed' tells DBSCAN that dist is already a distance matrix
db = DBSCAN(eps=((100 / (24*60)) * 2 * pi ), min_samples = 2, metric='precomputed')
#check db
print(db)
db.fit(dist)
#get labels
labels = db.labels_
#get number of clusters
no_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print('No of clusters:', no_clusters)
print('Cluster 0 : ', np.nonzero(labels == 0)[0])
print('Cluster 1 : ', np.nonzero(labels == 1)[0])
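The last two print calls hard-code cluster ids 0 and 1. A small sketch of a more general way to list members is shown here; the helper members_by_cluster is hypothetical, and the labels array is assumed from the cluster membership reported in the output of this run rather than recomputed.

```python
import numpy as np

def members_by_cluster(labels, values):
    """Group `values` by DBSCAN label; label -1 (noise) gets its own entry."""
    labels = np.asarray(labels)
    values = np.asarray(values)
    return {int(label): values[labels == label].tolist()
            for label in np.unique(labels)}

points = [100, 200, 600, 659, 700]
labels = [0, 0, 1, 1, 1]  # assumed: labels implied by the reported output
groups = members_by_cluster(labels, points)
for label, minutes in groups.items():
    name = 'noise' if label == -1 else 'cluster %d' % label
    print(name, minutes)
```

This keeps working when eps or min_samples change and the number of clusters is no longer two.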
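I think your distance matrix is not a distance matrix.
Why do you want to convert a simple linear measurement into an angle? (By the way, most clocks have 12 hours on the dial, not 24.)
@Anony Mouse Yes! I managed to figure it out!
@TomMorris I want to distinguish between 9 am and 9 pm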
Output:
[[ 0.          0.43633231  2.18166156  2.43909763  2.61799388]
 [ 0.43633231  0.          1.74532925  2.00276532  2.18166156]
 [ 2.18166156  1.74532925  0.          0.25743606  0.43633231]
 [ 2.43909763  2.00276532  0.25743606  0.          0.17889625]
 [ 2.61799388  2.18166156  0.43633231  0.17889625  0.        ]]
DBSCAN(algorithm='auto', eps=0.4363323129985824, leaf_size=30,
    metric='precomputed', min_samples=2, p=None, random_state=None)
No of clusters: 2
Cluster 0 : [0 1]
Cluster 1 : [2 3 4]
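The sample points above never cross midnight, so they do not show what the angular conversion buys. Here is a minimal sketch with made-up timestamps (not data from the post) and an assumed eps of 30 minutes, where the circular distance lets times just before and just after midnight fall into the same cluster.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical timestamps in minutes after midnight: two shortly before
# midnight, two shortly after, and two around noon.
points = np.array([1420, 1435, 5, 20, 700, 720], dtype=float)
angles = points / (24 * 60) * 2 * np.pi

# Shortest arc between every pair of angles
diff = np.abs(angles[None, :] - angles[:, None])
dist = np.minimum(diff, 2 * np.pi - diff)

eps = 30 / (24 * 60) * 2 * np.pi  # assumed neighbourhood radius: 30 minutes
labels = DBSCAN(eps=eps, min_samples=2, metric='precomputed').fit(dist).labels_
print(labels)
```

With a plain linear difference, 1435 and 5 would be 1430 minutes apart; on the circle they are 10 minutes apart, so the four timestamps around midnight should share one label and the two around noon another.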