Counting identical column elements between two rows in Python


python arrays sorting numpy


I am trying to write a basic script that will help me find how many columns are similar between rows. The data is very simple, for example:

array = np.array([0 1 0 0 1 0 0], [0 0 1 0 1 1 0])

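I have to run this for every pairing of rows in the list, so row 1 compared with row 2, row 1 compared with row 3, and so on.

Any help would be greatly appreciated.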

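The question in your title can be solved with basic numpy techniques. Suppose you have a two-dimensional numpy array a and want to compare rows m and n: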

row_m = a[m, :]  # select row index m and all column indices, i.e. row m
row_n = a[n, :]
# compare element by element; the boolean result has the same shape as row_m and row_n
shared = row_m == row_n
# False counts as 0 and True as 1, so the sum is the number of shared elements
overlap = shared.sum()
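
As a quick check, the recipe can be run on the two rows from the question, written here as a valid nested list. The names a, m and n follow the answer; the choice m = 0, n = 1 is only for illustration.

```python
import numpy as np

# The two rows from the question, entered as a valid 2-D array
a = np.array([[0, 1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1, 0]])

m, n = 0, 1                    # illustrative choice of rows
shared = a[m, :] == a[n, :]    # boolean array, one entry per column
overlap = int(shared.sum())    # number of columns where both rows agree

print(shared)   # [ True False False  True  True False  True]
print(overlap)  # 4
```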

The simplest way to apply this recipe to all pairs of rows is broadcasting:

first = a[:, None, :]   # None inserts a new axis, making room for a second row axis
second = a[None, :, :]  # same, but the new axis comes first
# axes 0 and 1 of these two arrays are arranged as for a distance map;
# a binary operation between arrays laid out this way triggers broadcasting,
# i.e. numpy computes all possible pairs in the appropriate positions
full_overlap_map = first == second  # shape nrow x nrow x ncol
similarity_table = full_overlap_map.sum(axis=-1)  # shape nrow x nrow
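
A small sketch of the broadcasting version on concrete data may help. The third row is an assumption added for illustration, since the question only shows two rows. np.triu_indices is used to pull each unordered pair out of the table once.

```python
import numpy as np

# Rows 0 and 1 come from the question; row 2 is invented for illustration
a = np.array([[0, 1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 1, 0]])

# (nrow, 1, ncol) == (1, nrow, ncol) broadcasts to (nrow, nrow, ncol)
similarity_table = (a[:, None, :] == a[None, :, :]).sum(axis=-1)
print(similarity_table)
# [[7 4 6]
#  [4 7 5]
#  [6 5 7]]

# The table is symmetric and its diagonal equals ncol, so the strict
# upper triangle holds each unordered pair of rows exactly once
i, j = np.triu_indices(a.shape[0], k=1)
pair_counts = {(int(p), int(q)): int(similarity_table[p, q]) for p, q in zip(i, j)}
print(pair_counts)  # {(0, 1): 4, (0, 2): 6, (1, 2): 5}
```

Note that the intermediate boolean array has nrow x nrow x ncol entries, so for a large number of rows an explicit loop over pairs may be gentler on memory.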

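If you can rely on all rows being binary-valued, the "similar column" count is simple (note that the product only counts columns where both rows are 1, not columns where both are 0):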

def count_sim_cols(row0, row1):
    return np.sum(row0*row1)

If a wider range of values is possible, you just need to replace the product with a comparison:

def count_sim_cols(row0, row1):
    return np.sum(row0 == row1)

If you want a tolerance for "similarity", say tol, some small value, this is just:

def count_sim_cols(row0, row1):
    return np.sum(np.abs(row0 - row1) < tol)
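
The three variants answer slightly different questions, which a short side-by-side run makes visible. The function names and the default tolerance below are hypothetical, chosen only so the three versions can coexist in one example.

```python
import numpy as np

def count_shared_ones(row0, row1):
    # product variant: counts columns where both rows are 1 (binary data only)
    return int(np.sum(row0 * row1))

def count_equal(row0, row1):
    # comparison variant: counts columns where the values are exactly equal
    return int(np.sum(row0 == row1))

def count_close(row0, row1, tol=1e-6):
    # tolerance variant: counts columns whose values differ by less than tol
    return int(np.sum(np.abs(row0 - row1) < tol))

row0 = np.array([0, 1, 0, 0, 1, 0, 0])
row1 = np.array([0, 0, 1, 0, 1, 1, 0])

print(count_shared_ones(row0, row1))    # 1
print(count_equal(row0, row1))          # 4
print(count_equal(row0 + 1e-9, row1))   # 0, exact equality breaks under tiny noise
print(count_close(row0 + 1e-9, row1))   # 4
```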

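Where is the desired output for your example? What do you mean by the third row? I only see two rows. Your code is invalid. How do you define "similar"?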

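n = len(X)  # assumed: X is the 2-D array of rows, n its number of rows
sim_counts = {}
for i in range(n):  # range replaces Python 2's xrange
    for j in range(i + 1, n):  # j > i, so each unordered pair is visited once
        sim_counts[(i, j)] = count_sim_cols(X[i], X[j])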
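
The same pairwise loop can also be written with itertools.combinations from the standard library, which yields each unordered pair of row indices once. This is a minimal sketch: X and count_sim_cols are defined here only so the example runs on its own, and the third row of X is invented.

```python
from itertools import combinations

import numpy as np

def count_sim_cols(row0, row1):
    # equality-based count, as in the comparison variant above
    return int(np.sum(row0 == row1))

# Rows 0 and 1 come from the question; row 2 is invented for illustration
X = np.array([[0, 1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 1, 0]])

# combinations(range(n), 2) yields (0, 1), (0, 2), (1, 2) for n = 3
sim_counts = {(i, j): count_sim_cols(X[i], X[j])
              for i, j in combinations(range(len(X)), 2)}
print(sim_counts)  # {(0, 1): 4, (0, 2): 6, (1, 2): 5}
```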