Fast comparison of a large number of lists in Python

python list data-structures

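Comparing lists of lists has been asked about before, but the Python environment I am using cannot fully integrate all the methods and classes in numpy. I also cannot import pandas.

I am trying to compare the lists inside one big list and come up with roughly 8-10 lists that approximately represent all the other lists in the big list.

My approach works well if I have ...

Essentially, what you are aiming for is a clustering operation (i.e. via K-means). I am not sure what you mean by "cannot fully integrate all the methods and classes in numpy", but if scikit-learn is available, you can use its KMeans. If that is not possible, you could fall back on a simple version of the K-means algorithm.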
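
For the fallback case, here is a minimal sketch of K-means (Lloyd's algorithm) that uses only the standard library, so it needs neither numpy nor scikit-learn. The function names, the random initialisation from input points, and the iterations and seed parameters are assumptions made for illustration; they are not taken from the question or from any library.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    rng = random.Random(seed)
    # Assumption: start from k input points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: index of the nearest center for each point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

# 100 lists of length 10, reduced to 8 representative centers
rng = random.Random(1)
big_list = [[rng.random() for _ in range(10)] for _ in range(100)]
centers, labels = kmeans(big_list, k=8)
print(len(centers), len(centers[0]))  # 8 10
```

This will be much slower than a numpy-based implementation on large inputs, and the result depends on the random starting points, so running it with a few different seeds and keeping the best result may be worthwhile.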

Here is the k-means approach using scikit-learn:

# 100 lists of length 10 = 100 points in 10 dimensions
from random import random
big_list = [[random() for i in range(10)] for j in range(100)]

# compute eight representative points
from sklearn.cluster import KMeans
model = KMeans(n_clusters=8)
model.fit(big_list)
centers = model.cluster_centers_
print(centers.shape)  # (8, 10)

# this is the sum of square distances of your points to the cluster centers
# you can adjust n_clusters until this is small enough for your purposes.
sum_sq_dists = model.inertia_
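
To make that quantity concrete, the same sum of squared distances can be computed by hand in plain Python. This is only an illustrative sketch: the function name inertia and the tiny data set are assumptions, and each point is measured against its nearest center.

```python
def inertia(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(p, center)) for center in centers)
        for p in points
    )

points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0]]
print(inertia(points, [[0.0, 1.0], [10.0, 0.0]]))  # 2.0
print(inertia(points, [[0.0, 1.0]]))               # 103.0
```

Adding the second center drops the value from 103.0 to 2.0, which is the kind of change to watch for when adjusting the number of clusters.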

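From here you could, for example, find the point in each cluster that is closest to its center and treat that as the average. Without more detail about the problem you are trying to solve, it is hard to say for sure. But a clustering approach like this is going to be the most efficient way to solve the problem as you stated it in the question.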
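
Picking the closest member of each cluster might look like the sketch below. It is written in plain Python and assumes you already have one cluster label per point and one center per cluster (with scikit-learn these would come from model.labels_ and model.cluster_centers_). The function names and the sample data, including the centers, are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def closest_member_per_cluster(points, labels, centers):
    """For each cluster, return the member nearest to that cluster's center."""
    result = []
    for c, center in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            result.append(min(members, key=lambda p: squared_distance(p, center)))
        else:
            result.append(None)  # cluster has no members
    return result

# Hypothetical data: two clusters with made-up centers
points = [[0.0, 0.0], [0.2, 0.0], [1.0, 0.0], [9.0, 9.0], [10.0, 10.0]]
labels = [0, 0, 0, 1, 1]
centers = [[0.4, 0.0], [9.2, 9.2]]
print(closest_member_per_cluster(points, labels, centers))
# [[0.2, 0.0], [9.0, 9.0]]
```

The advantage over using the centers directly is that every representative is one of the original lists rather than a computed mean.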

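Can you not use numpy? You say you can't, but it looks like your code does use numpy!

Using "j is not k" is wrong. Use "j != k".

What is a flat array?

As mentioned in the OP, I "cannot fully integrate all the methods and classes in numpy". I can use some numpy methods and classes, but not all of them. In one of the comments, I noted that I cannot use numpy.allclose.

Daniel, thanks for pointing out the error. I have edited the code accordingly.

Have you considered implementing this as a Python extension?

Thanks! I really appreciate your feedback. To clarify the numpy issue: I am using a fairly complex Python virtual environment setup with a specific Python interpreter. Some methods/classes (such as numpy.subtract) work fine, while others (such as numpy.all) do not. The virtual environment and the Python interpreter are what cause these problems, and there is no quick and easy fix. I will see whether I can import scikit-learn (fingers crossed!). Thanks again!

Are you using pypy + numpy? In that case, scikit-learn and other tools will not work, and you will have to write this kind of algorithm from scratch.

Thanks again for all the feedback. Unfortunately, I cannot import scikit-learn, so I guess I will have to write this algorithm from scratch.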
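
On the "is not" versus "!=" point raised in the comments: "is not" tests object identity, while "!=" tests values. For small integers CPython happens to reuse the same objects, so an identity test on loop indices can appear to work, but that is an implementation detail and should not be relied on. The short example below illustrates the difference; the pairwise loop is a hypothetical stand-in, not the code from the question.

```python
a = [1.0, 2.0, 3.0]
b = [1.0, 2.0, 3.0]  # equal contents, but a separate object

print(a != b)      # False: the values are equal
print(a is not b)  # True: they are two distinct objects

# Skipping self-comparisons in a pairwise loop: compare the indices by value.
lists = [a, b, [4.0, 5.0, 6.0]]
pairs = [(j, k)
         for j in range(len(lists))
         for k in range(len(lists))
         if j != k]
print(len(pairs))  # 6
```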
