Counting the number of identical column elements between two rows in Python


I am trying to write a basic script that will help me find how many columns are similar between rows. The data is very simple, for example:

array = np.array([0 1 0 0 1 0 0], [0 0 1 0 1 1 0])
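I have to run this script over all permutations of the list, so row 1 is compared with row 2, row 1 with row 3, and so on.

Any help would be greatly appreciated.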

Your title question can be solved with basic numpy techniques. Suppose you have a two-dimensional numpy array a and want to compare rows m and n:

row_m = a[m, :] # this selects row index m and all column indices, thus: row m
row_n = a[n, :]
shared = row_m == row_n # this compares row_m and row_n element-by-element storing each individual result (True or False) in a separate cell, the result thus has the same shape as row_m and row_n
overlap = shared.sum() # this sums over all elements in shared, since False is encoded as 0 and True as 1 this returns the number of shared elements.
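As a quick check, here is a minimal sketch that applies the recipe above to the two rows from the question. It assumes the intended input is a 2D array built from a list of lists; the choice m = 0, n = 1 is only for illustration.

```python
import numpy as np

# The two rows from the question, written as a valid 2D array (list of lists).
a = np.array([
    [0, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1, 0],
])

m, n = 0, 1                      # illustrative choice of rows to compare
shared = a[m, :] == a[n, :]      # boolean array, True where the columns match
overlap = int(shared.sum())      # True counts as 1, False as 0

print(shared)   # [ True False False  True  True False  True]
print(overlap)  # 4
```

Columns 0, 3, 4 and 6 hold the same value in both rows, so the count is 4.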
The simplest way to apply this recipe to all pairs of rows is broadcasting:

 first = a[:, None, :] # None creates a new dimension to make space for a second row axis
 second = a[None, :, :] # Same but new dim in first axis
 # observe that axes 0 and 1 in these two arrays are arranged as for a distance map
 # a binary operation between arrays laid out this way will trigger broadcasting, i.e. numpy will compute all possible pairs in the appropriate positions
 full_overlap_map = first == second # has shape nrow x nrow x ncol
 similarity_table = full_overlap_map.sum(axis=-1) # shape nrow x nrow
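A small worked sketch of the broadcasting approach follows. Rows 0 and 1 are taken from the question; row 2 is made up here so that there is more than one pair to compare. The use of np.triu_indices to keep each unordered pair once is an addition for illustration, not part of the recipe above.

```python
import numpy as np

# Rows 0 and 1 come from the question; row 2 is invented for illustration.
a = np.array([
    [0, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 0, 1],
])

# Broadcasting compares every row with every other row: shape (nrow, nrow, ncol).
full_overlap_map = a[:, None, :] == a[None, :, :]
similarity_table = full_overlap_map.sum(axis=-1)  # shape (nrow, nrow)
print(similarity_table)
# [[7 4 5]
#  [4 7 2]
#  [5 2 7]]

# The table is symmetric, so keep only entries strictly above the diagonal.
i, j = np.triu_indices(a.shape[0], k=1)
pairs = {(int(r), int(c)): int(similarity_table[r, c]) for r, c in zip(i, j)}
print(pairs)  # {(0, 1): 4, (0, 2): 5, (1, 2): 2}
```

The diagonal is always the number of columns, since every row matches itself. Note that the intermediate boolean array has nrow x nrow x ncol elements, so for a very large number of rows an explicit loop over pairs may use less memory.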

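If you can rely on all rows being binary-valued, and "similar" means that both rows have a 1 in the same column, the "similar columns" count is simply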

def count_sim_cols(row0, row1):
    return np.sum(row0*row1)
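If a wider range of values is possible, you just need to replace the product with a comparison (note that this also counts columns where both rows are 0)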

def count_sim_cols(row0, row1):
     return np.sum(row0 == row1)
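The two definitions above do not give the same number on binary data, so it is worth deciding which one you mean. The sketch below runs both on the rows from the question; the variable names both_one and equal are chosen here for illustration.

```python
import numpy as np

row0 = np.array([0, 1, 0, 0, 1, 0, 0])
row1 = np.array([0, 0, 1, 0, 1, 1, 0])

# Product: a column contributes only when BOTH rows hold a 1.
both_one = int(np.sum(row0 * row1))

# Comparison: a column contributes whenever the values match, shared zeros included.
equal = int(np.sum(row0 == row1))

print(both_one, equal)  # 1 4
```

Only column 4 has a 1 in both rows, while columns 0, 3, 4 and 6 hold equal values.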
If you want a tolerance for "similarity", say tol, some small value, this is just

def count_sim_cols(row0, row1):
    return np.sum(np.abs(row0 - row1) < tol)
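The tolerance version above reads tol from the enclosing scope. A minimal variant, assuming you would rather pass the tolerance explicitly, could look like this; the function name count_sim_cols_tol and the default value are hypothetical.

```python
import numpy as np

def count_sim_cols_tol(row0, row1, tol=1e-6):
    """Count columns whose values differ by less than tol (illustrative helper)."""
    # Convert to float so the subtraction cannot wrap around for unsigned integer input.
    row0 = np.asarray(row0, dtype=float)
    row1 = np.asarray(row1, dtype=float)
    return int(np.sum(np.abs(row0 - row1) < tol))

x = [0.0, 1.0, 2.0, 3.0]
y = [0.05, 1.5, 2.0, 2.95]
print(count_sim_cols_tol(x, y, tol=0.1))  # 3
```

Here columns 0, 2 and 3 differ by less than 0.1, while column 1 differs by 0.5.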

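Where is the desired output for your example? What do you mean by the third row? I only see two rows. Your code is invalid. How do you define "similar"?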
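# Compare every unordered pair of rows of the 2D array X
n = X.shape[0]  # number of rows
sim_counts = {}
for i in range(n):  # range replaces the Python 2 xrange
    for j in range(i + 1, n):
        sim_counts[(i, j)] = count_sim_cols(X[i], X[j])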
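The nested loop visits each unordered pair of rows once, which is what itertools.combinations produces. The following self-contained sketch shows the same idea; the array X reuses the two rows from the question plus a third row invented for illustration, and count_sim_cols is the equality-based version.

```python
from itertools import combinations

import numpy as np

def count_sim_cols(row0, row1):
    # Equality-based count: columns where both rows hold the same value.
    return int(np.sum(row0 == row1))

# Rows 0 and 1 come from the question; row 2 is invented for illustration.
X = np.array([
    [0, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 0, 1],
])

# combinations(range(n), 2) yields (0, 1), (0, 2), (1, 2): each pair exactly once.
sim_counts = {
    (i, j): count_sim_cols(X[i], X[j])
    for i, j in combinations(range(X.shape[0]), 2)
}
print(sim_counts)  # {(0, 1): 4, (0, 2): 5, (1, 2): 2}
```

Since the count is symmetric, comparing (i, j) also covers (j, i), so combinations rather than permutations is enough.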