Clustering the elements of a list based on a metric (Python, Python 3.x, Machine Learning, Cluster Analysis) - Fatal编程技术网

I have a list of dictionaries, each holding a keyword and its vector distance, and I'm trying to apply a clustering technique to group them.

from statistics import mean

# data = [{"key": "str1", "weight": float value}, ...]
# distances = [item['weight'] for item in data]
distances = [0.004906579754566209, 0.008361678408906337, 0.010228429212122636, 0.013671005756098031, 0.013671005756098031, 0.013713535105272179]

# Mean of the gaps between consecutive elements, used as the clustering threshold.
mean_distances_differences = mean([j-i for i, j in zip(distances[:-1], distances[1:])])
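I computed the mean of the differences between consecutive elements of the list. I want two elements to end up in the same cluster if the distance between them is less than that mean, so the result would be: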

[[0.004906579754566209], [0.008361678408906337], [0.010228429212122636], [0.013671005756098031, 0.013671005756098031, 0.013713535105272179]]
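For reference, the grouping rule described above (start a new group whenever the gap to the previous element is not smaller than the mean gap) can be sketched as a single pass in plain Python. This assumes the list is already sorted in ascending order, as in the example; the function name group_by_mean_gap is hypothetical.

```python
from statistics import mean


def group_by_mean_gap(values):
    """Split a sorted list into runs whose consecutive gaps are below the mean gap."""
    if len(values) < 2:
        # Zero or one element: nothing to compare.
        return [list(values)] if values else []
    gaps = [b - a for a, b in zip(values[:-1], values[1:])]
    threshold = mean(gaps)
    groups = [[values[0]]]
    for gap, value in zip(gaps, values[1:]):
        if gap < threshold:
            # Close enough to the previous element: stay in the current group.
            groups[-1].append(value)
        else:
            # Gap is at least the mean: start a new group.
            groups.append([value])
    return groups


distances = [0.004906579754566209, 0.008361678408906337, 0.010228429212122636,
             0.013671005756098031, 0.013671005756098031, 0.013713535105272179]
print(group_by_mean_gap(distances))
```

On the sample list this returns the four groups shown above, because only the first three gaps reach the mean gap.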
I don't think I can use knn here, because I don't know in advance how many clusters there will be. So I tried this:

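from statistics import mean

distances = [item['weight'] for item in data]
mean_distances_differences = mean([j-i for i, j in zip(distances[:-1], distances[1:])])
distances_new = distances
required_list = []
while distances_new:
    temp = []
    if len(distances_new) == 1:
        # The last remaining element forms its own group.
        temp = distances_new
        required_list.append(temp)
        break
    else:
        # Collect leading elements while the gap to the next one is below the mean.
        # If the very first gap is already >= the mean, temp stays empty, nothing
        # is removed below, and the loop never terminates.
        for i,j in zip(distances_new[:-1], distances_new[1:]):
            if j-i < mean_distances_differences:
                temp.append(i)
            else:
                break
        # Drop the collected values (this also drops every duplicate of them).
        distances_new = [_i for _i in distances_new if _i not in temp]
    required_list.append(temp)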

Is there any way to do this?

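You can use np.diff to compute the differences between consecutive distances; I take the absolute value because I'm not sure whether the distances will be sorted: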

import numpy as np
distance_diff = abs(np.diff(distances))
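To see what the threshold comparison operates on, the intermediate values for the sample list from the question can be printed as follows; the name threshold is introduced here only for readability and is not part of the original answer.

```python
import numpy as np

distances = [0.004906579754566209, 0.008361678408906337, 0.010228429212122636,
             0.013671005756098031, 0.013671005756098031, 0.013713535105272179]

# np.diff(a)[i] is a[i + 1] - a[i], so this holds the absolute gap between neighbours.
distance_diff = abs(np.diff(distances))

# The answer's threshold: the mean gap (about 0.00176 for this sample).
threshold = abs(np.mean(distance_diff))

print(distance_diff)
print(threshold)
# True marks a gap large enough to start a new group.
print(distance_diff > threshold)
```

For this sample the comparison is True for the first three gaps and False for the last two, and the cumulative sum in the next step turns exactly that pattern into group labels.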
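Checking whether each difference is greater than a threshold (here the mean difference) and taking the cumulative sum of the result puts consecutive elements whose difference is below the threshold into the same group: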

np.cumsum(distance_diff > abs(np.mean(distance_diff)))

array([1, 2, 3, 3, 3])
All that's left is to prepend the starting group 0 for the first element:

np.hstack([0,np.cumsum(distance_diff > abs(np.mean(distance_diff)))])

array([0, 1, 2, 3, 3, 3])
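To get the nested-list output from the question instead of labels, one option (not part of the original answer) is np.split, cutting the array at every index where a new group starts. Because the gaps are measured in array order, this sketch also sorts the values first with np.sort, so that neighbours in the array are neighbours in value and abs() is no longer needed; that sorting step is an assumption for possibly unsorted input. The function name split_by_mean_gap is hypothetical.

```python
import numpy as np


def split_by_mean_gap(distances):
    """Return the sorted distances as a list of lists, cut wherever a gap exceeds the mean gap."""
    values = np.sort(np.asarray(distances, dtype=float))  # value order, so every gap is >= 0
    if values.size < 2:
        return [values.tolist()] if values.size else []
    gaps = np.diff(values)
    # A new group starts at index i + 1 whenever gaps[i] is above the mean gap.
    split_points = np.flatnonzero(gaps > gaps.mean()) + 1
    return [chunk.tolist() for chunk in np.split(values, split_points)]


distances = [0.004906579754566209, 0.008361678408906337, 0.010228429212122636,
             0.013671005756098031, 0.013671005756098031, 0.013713535105272179]
print(split_by_mean_gap(distances))
```

For the sample data the split points are 1, 2 and 3, which yields the same four groups as the expected output in the question, and a shuffled copy of the list gives the same result because of the initial sort.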
