Python: pairwise intersection of multiple lists, then summing all the duplicates


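Hi everyone,

I have four lists, a = [1, 2, 3], b = [2, 3, 4], c = [1, 2, 4] and d = [1, 3, 6], and I would like to show the pairwise intersection/overlap (the number of shared integers) of every two lists in the format below. Does anyone know how to do this? (Any method is fine, but I am curious whether iterating/looping is the only way to achieve it.)

   a  b  c  d
a  3  2  2  2
b  2  3  2  1
c  2  2  3  1
d  2  1  1  3

The real goal is harder for me: I need to add up all the numbers shared by every two lists. For example, list a and list b share the numbers 2 and 3, so 5 is needed here. The final goal looks like this:

   a  b  c  d
a  6  5  3  4
b  5  9  6  3
c  3  6  7  1
d  4  3  1  10

Here is an implementation that performs a pairwise operation on the rows of a numpy array X. It assumes the operation is symmetric, so each unordered pair is computed only once, which saves time. We can define a function that returns the size of the overlap between two sets and pass it to the pairwise function: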

from itertools import combinations_with_replacement
import numpy as np

def pairwise(X, operation):        
    # Initialise precomputed matrix
    precomputed = np.zeros((X.shape[0], X.shape[0]), dtype='int')
    # Initialise iterator over objects in X
    iterator    = combinations_with_replacement(range(X.shape[0]), 2)
    # Perform the operation on each pair
    for i, j in iterator:
        precomputed[i, j] = operation(X[i], X[j])           
    # Make symmetric and return
    return precomputed + precomputed.T - np.diag(np.diag(precomputed))

def overlap(x, y):
    return len(set(x) & set(y))
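
The overlap function above counts the shared values, whereas the question ultimately asks for their sum. As a minimal sketch of how the same pattern might cover that goal, the operation can be swapped for one that sums the intersection. The names overlap_sum and pairwise_lists are illustrative and not part of the answer; the helper is restated on plain Python lists only so that the example runs on its own.

```python
from itertools import combinations_with_replacement

import numpy as np


def overlap_sum(x, y):
    """Sum of the distinct values that appear in both x and y."""
    return sum(set(x) & set(y))


def pairwise_lists(lists, operation):
    """Apply a symmetric operation to every pair of lists (hypothetical helper)."""
    n = len(lists)
    result = np.zeros((n, n), dtype=int)
    for i, j in combinations_with_replacement(range(n), 2):
        # Fill both halves at once, since operation(x, y) == operation(y, x)
        result[i, j] = result[j, i] = operation(lists[i], lists[j])
    return result


a, b, c, d = [1, 2, 3], [2, 3, 4], [1, 2, 4], [1, 3, 6]
sums = pairwise_lists([a, b, c, d], overlap_sum)
print(sums)
```

For the four example lists, the printed matrix holds the same values as the second table in the question.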

Note that pairwise expects a numpy array, so for your example the lists need to be converted before they are passed to the function:

X = np.array([a, b, c, d])
print(pairwise(X, overlap))

This produces the result

[[3 2 2 2]
 [2 3 2 1]
 [2 2 3 1]
 [2 1 1 3]]

Alternatively, you can put the four lists in a dict, convert them to sets, and use itertools.combinations_with_replacement to generate every combination of two of the four lists, including each list paired with itself. The results go into a dict keyed by the frozenset of the two list names, whose value is the size of the intersection of the two sets, and the output is printed in a nested loop:

from itertools import combinations_with_replacement
d = {'a': [1,2,3], 'b': [2,3,4], 'c': [1,2,4], 'd': [1,3,6]}
s = {frozenset([a[0], b[0]]): len(a[1] & b[1]) for a, b in combinations_with_replacement([(k, set(v)) for k, v in d.items()], 2)}
print(' ', ' '.join(d))
for i in d:
    print(i, end=' ')
    for j in d:
        print(s[frozenset([i, j])], end=' ')
    print()
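
The dict-based code above reports overlap counts. A hedged sketch of how it might be adapted to the sum-of-duplicates table from the question follows: len is replaced by sum, and the cells are right-aligned so that two-digit values line up. The helper format_table and its width parameter are assumptions made for illustration.

```python
from itertools import combinations_with_replacement

d = {'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [1, 2, 4], 'd': [1, 3, 6]}
sets = {k: set(v) for k, v in d.items()}

# Key each unordered pair of names by a frozenset; the value is the sum
# of the numbers the two lists share
sums = {
    frozenset([i, j]): sum(sets[i] & sets[j])
    for i, j in combinations_with_replacement(sets, 2)
}


def format_table(names, cells, width=3):
    """Return the labelled table as a string (hypothetical helper)."""
    header = ' ' * width + ''.join(f'{n:>{width}}' for n in names)
    rows = [
        f'{i:<{width}}'
        + ''.join(f'{cells[frozenset([i, j])]:>{width}}' for j in names)
        for i in names
    ]
    return '\n'.join([header] + rows)


table = format_table(list(d), sums)
print(table)
```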

Another option is to use np.in1d, collecting the summed overlaps in R:

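import numpy as np

a, b, c, d = [1, 2, 3], [2, 3, 4], [1, 2, 4], [1, 3, 6]
A = np.array([a, b, c, d])
R = []
for x in A:
    # True where an entry of A also occurs in row x
    # (newer NumPy releases recommend np.isin(A, x), which keeps A's shape)
    ind = np.in1d(A, x).reshape(A.shape)
    # Zero the entries not shared with x, then sum each row
    R.append(np.where(ind, A, 0).sum(axis=1))

# Stack the per-row sums into the full pairwise matrix
R = np.vstack(R)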
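
On the question of whether looping over pairs is the only way: one possible alternative, sketched here under the assumption that no list contains repeated values, is to encode each list as a row of a 0/1 membership matrix and let a matrix product do the pairwise work. The names lists, values and M are illustrative. A single pass over the lists is still needed to build the matrix, but there is no explicit loop over pairs.

```python
import numpy as np

lists = [[1, 2, 3], [2, 3, 4], [1, 2, 4], [1, 3, 6]]

# All distinct values across the lists, sorted: one column per value
values = np.unique(np.concatenate(lists))

# M[i, k] is 1 when values[k] occurs in lists[i], else 0
M = np.array([np.isin(values, lst) for lst in lists], dtype=int)

# Shared-value counts: dot product of the membership rows
counts = M @ M.T

# Shared-value sums: weight each column by its value before the product
sums = (M * values) @ M.T

print(counts)
print(sums)
```

For the example lists, counts and sums hold the same values as the two tables in the question.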