Python: finding unique pairs in a list of pairs


I have a (large) list of integer pairs, e.g.,

a = [
    [1, 2],
    [3, 6],
    [2, 1],
    [3, 5],
    [3, 6]
    ]
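Most pairs appear twice, and the order of the integers doesn't matter (i.e., [1, 2] is equivalent to [2, 1]). I'd now like to find the pairs that appear only once and get a Boolean list indicating that. For the example above: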

b = [False, False, False, True, False]
Since a is typically large, I'd like to avoid explicit loops. Mapping to frozensets may be advised, but I'm not sure whether that's overkill.

from collections import Counter

ctr = Counter(frozenset(x) for x in a)
b = [ctr[frozenset(x)] == 1 for x in a]

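We can use a Counter to get the count of each pair (turning each list into a frozenset so that order is ignored), and then check for each pair whether it appears only once.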
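As a quick check of the idea above, here is a minimal sketch that wraps the counting step in a function and runs it on the question's example. The function name `unique_mask` and its `key` parameter are illustrative names chosen here, not part of the original answer; `key` only makes it easy to swap `frozenset` for a sorted tuple.

```python
from collections import Counter


def unique_mask(pairs, key=frozenset):
    """Return True for each pair that occurs exactly once, ignoring order."""
    # Normalize every pair once so [1, 2] and [2, 1] share the same key.
    keys = [key(p) for p in pairs]
    counts = Counter(keys)
    return [counts[k] == 1 for k in keys]


a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]
print(unique_mask(a))                                  # [False, False, False, True, False]
print(unique_mask(a, key=lambda p: tuple(sorted(p))))  # same result with sorted tuples
```

Both keys are hashable and order-independent, which is all the Counter needs.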

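You can scan the list from start to end while maintaining a map from each encountered pair to the position where it was first seen. Whenever you process a pair, check whether it has been encountered before. If it has, both the index of the first encounter and the index of the current encounter must be set to False in b. Otherwise, just add the current index to the map of encountered pairs and leave b unchanged. b starts out as all True. To preserve the equivalence of [1, 2] and [2, 1], I simply sort the pair first to get a stable representation. The code looks like this: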

def proc(a):
    b = [True] * len(a)
    first_seen = {}  # normalized pair -> index of its first occurrence
    for idx, p in enumerate(a):
        pp = (min(p), max(p))  # order-independent representation of the pair
        if pp in first_seen:
            # We've found the element once previously
            # Need to mark both it and the current value as "False"
            # If we encounter pp multiple times, we'll set the initial
            # value to False multiple times, but that's not an issue
            b[first_seen[pp]] = False
            b[idx] = False
        else:
            # This is the first time we encounter pp, so we just add it
            # to the map for possible later encounters, but don't affect
            # b at all.
            first_seen[pp] = idx
    return b
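The time complexity is O(len(a)), which is good, but the space complexity is also O(len(a)) (for the lookup dictionary), so this might not be ideal. Depending on how flexible you are, you could use an approximate filter such as a Bloom filter.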
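The Bloom filter suggestion could be sketched roughly as follows. This is not code from the answer: the `BloomFilter` class, the function `approx_unique_mask`, and the default sizes are all assumptions made for illustration. It uses two fixed-size filters and two passes, so the auxiliary memory no longer grows with len(a). The price is a one-sided error: repeated pairs are always reported as False, but a unique pair may occasionally be reported as False as well.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions from one digest (double hashing).
        digest = hashlib.blake2b(repr(key).encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def approx_unique_mask(pairs, num_bits=1 << 20, num_hashes=4):
    seen = BloomFilter(num_bits, num_hashes)      # pairs met at least once
    repeated = BloomFilter(num_bits, num_hashes)  # pairs met at least twice

    def canon(p):
        return (min(p), max(p))  # order-independent representation

    # First pass: record which pairs show up more than once.
    for p in pairs:
        k = canon(p)
        if k in seen:
            repeated.add(k)
        else:
            seen.add(k)
    # Second pass: a pair counts as unique if it was never marked as repeated.
    return [canon(p) not in repeated for p in pairs]


a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]
print(approx_unique_mask(a))
```

The false-positive rate depends on how full the filters get, so `num_bits` would need to be sized for the expected number of distinct pairs.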

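a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]
# Count each pair together with its reversed form.
# a.count scans the whole list, so this is quadratic in len(a).
# Note: a pair with equal elements, e.g. [1, 1], is counted twice by this expression.
result = list(filter(lambda el: a.count([el[0], el[1]]) + a.count([el[1], el[0]]) == 1, a))
bool_res = [a.count([el[0], el[1]]) + a.count([el[1], el[0]]) == 1 for el in a]
print(result)
print(bool_res)
Which gives: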

[[3, 5]]
[False, False, False, True, False]

Here is a solution using NumPy that is 10 times faster than the suggested frozenset solution:

import numpy

a = numpy.array(a)
a.sort(axis=1)
b = numpy.ascontiguousarray(a).view(
    numpy.dtype((numpy.void, a.dtype.itemsize * a.shape[1]))
)
_, inv, ct = numpy.unique(b, return_inverse=True, return_counts=True)
print(ct[inv] == 1)
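  • Sorting is fast and makes sure that the edges [i, j] and [j, i] in the original array are identified with each other. It is much faster than frozensets or tuples.

  • Row uniquification inspired by [link lost]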

Speed comparison for different array sizes: [plot not preserved]

The plot was created with

from collections import Counter
import numpy
import perfplot


def fs(a):
    ctr = Counter(frozenset(x) for x in a)
    b = [ctr[frozenset(x)] == 1 for x in a]
    return b


def with_numpy(a):
    a = numpy.array(a)
    a.sort(axis=1)
    b = numpy.ascontiguousarray(a).view(
        numpy.dtype((numpy.void, a.dtype.itemsize * a.shape[1]))
    )
    _, inv, ct = numpy.unique(b, return_inverse=True, return_counts=True)
    res = ct[inv] == 1
    return res


perfplot.save(
    "out.png",
    setup=lambda n: numpy.random.randint(0, 10, size=(n, 2)),
    kernels=[fs, with_numpy],
    labels=["frozenset", "numpy"],
    n_range=[2 ** k for k in range(15)],
    xlabel="len(a)",
)
An O(n) solution using a dictionary lookup

a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]

pair_indices = {}  # named to avoid shadowing the built-in dict
boolList = []

# Iterate through a
for i in range(len(a)):

    # Assume that this element is not a duplicate
    # This 'True' is added to the corresponding index i of boolList
    boolList += [True]

    # Set elem to the current pair in the list
    elem = a[i]

    # If elem is in ascending order, it will be entered into the map as is
    if elem[0] <= elem[1]:
        key = repr(elem)
    # If not, change it into ascending order so keys can easily be compared
    else:
        key = repr([elem[1]] + [elem[0]])

    # If this pair has not yet been seen, add it as a key to the dictionary
    # with the value a list containing its index in a.
    if key not in pair_indices:
        pair_indices[key] = [i]
    # If this pair is a duplicate, add the new index to the dictionary. The value
    # of the key will be a list containing the indices of that pair in a.
    else:
        # Change the value to contain the new index
        pair_indices[key] += [i]

        # Change boolList to False for this index
        boolList[i] = False

        # If this is the first duplicate for the pair, mark the first
        # occurrence of the pair as a duplicate as well.
        if len(pair_indices[key]) <= 2:
            boolList[pair_indices[key][0]] = False

print(a)
print(boolList)
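Very clean, but it returns b = [True, False, False, True, False], which differs from the example output.

Instead of tuple(sorted(x)), you could also use frozenset(x).

frozenset performs better than a sorted tuple. Thanks to @tobias_k for the suggestion; fixed.

About twice as fast: ctr = Counter(map(frozenset, a)); b = numpy.array(ctr.values()) == 1.

You may want to adjust your answer. If a = [[1, 1]], bool_res should be [True], but your method yields [False].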
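The edge case raised in the last comment can be reproduced with a short sketch. The two function names below are hypothetical, and the "fixed" variant is only one possible adjustment, not the answer author's own correction: it adds the count of the reversed pair only when the two elements differ, so that [1, 1] is not counted twice.

```python
def count_mask(a):
    # Approach as posted: counts [x, y] and [y, x] separately,
    # which double-counts pairs whose two elements are equal.
    return [a.count([x, y]) + a.count([y, x]) == 1 for x, y in a]


def count_mask_fixed(a):
    # Only add the reversed pair's count when it is a different list.
    return [
        a.count([x, y]) + (a.count([y, x]) if x != y else 0) == 1
        for x, y in a
    ]


print(count_mask([[1, 1]]))        # [False]
print(count_mask_fixed([[1, 1]]))  # [True]

a = [[1, 2], [3, 6], [2, 1], [3, 5], [3, 6]]
print(count_mask_fixed(a))         # [False, False, False, True, False]
```

Both variants still call a.count for every element, so they remain quadratic in len(a).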