Python: find the items that occur only once in an array


python arrays python-2.7


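I have a 2D array. Here, each row vector is treated as the quantity of interest. What I want to do is return all rows that occur exactly once as one array, and all rows that occur more than once as a second array.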

For example, if the array is:

a=[[1,1,1,0], [1,1,1,0], [5,1,6,0], [3,2,1,0], [4,4,1,0], [5,1,6,0]]

I want to return two arrays:

nonsingles=[[1,1,1,0], [1,1,1,0], [5,1,6,0], [5,1,6,0]]
singles= [[3,2,1,0], [4,4,1,0]]

Preserving the order is important. The code I wrote for this is:

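import numpy as np

def singles_nonsingles(rows):
    # returns the elements that occur only once, and the elements
    # that occur more than once in the array
    singles = []
    nonsingles = []
    # tuples are hashable and compare by value; list() keeps this
    # working in Python 3, where map() returns an iterator
    arrayhash = list(map(tuple, rows))

    for x in arrayhash:
        # list.count() scans the whole list on every call: O(n) per row
        if arrayhash.count(x) == 1:
            singles.append(x)
        else:
            nonsingles.append(x)

    # use np.array here; naming the parameter "array" would shadow it
    nonsingles = np.array(nonsingles)
    singles = np.array(singles)

    return {'singles': singles, 'nonsingles': nonsingles}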

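Now, I'm happy to say this works, but unhappy to say it is very slow, because my typical array is 30000 (rows) x 10 elements/row = 300000 elements. Can anyone give me some advice on how to speed it up? Sorry if this is a simple question; I'm new to Python. Also, if it helps, I'm using NumPy/SciPy with Python 2.7.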
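Because the question mentions NumPy, one vectorized alternative is to let np.unique count whole rows. This is a sketch, assuming NumPy 1.13 or later (where np.unique accepts axis); split_by_row_count is a hypothetical name. np.unique sorts the rows internally, so the cost is O(n log n), but the work runs in compiled code instead of a Python loop.

```python
import numpy as np

def split_by_row_count(a):
    """Hypothetical helper: split the rows of a 2-D array into rows that
    occur exactly once and rows that occur more than once, keeping order."""
    arr = np.asarray(a)
    # inverse[i] is the index of row i among the unique rows;
    # counts[j] is how many times unique row j occurs.
    _, inverse, counts = np.unique(arr, axis=0,
                                   return_inverse=True, return_counts=True)
    # ravel() guards against NumPy versions that give inverse an extra dimension
    once = counts[inverse.ravel()] == 1
    # Boolean-mask indexing keeps the selected rows in their original order
    return arr[once], arr[~once]

a = [[1,1,1,0], [1,1,1,0], [5,1,6,0], [3,2,1,0], [4,4,1,0], [5,1,6,0]]
singles, nonsingles = split_by_row_count(a)
# singles    -> rows [3,2,1,0] and [4,4,1,0]
# nonsingles -> rows [1,1,1,0], [1,1,1,0], [5,1,6,0], [5,1,6,0]
```

The results are NumPy arrays, matching what the question's own code returns.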

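I think your problem is that you are calling count() on a list. Each call scans the whole list, which is O(n), so doing it once per row makes the whole function O(n²).

It should be much faster to build a dict first and then use it to decide what to do with each row.

EDIT: The code had an unnecessary enumerate(); I stripped it out.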
from collections import defaultdict

def singles_nonsingles(array):
    #returns the elements that occur only once, and the elements
    #that occur more than once in the array
    singles=[]
    nonsingles=[]
    d = defaultdict(int)

    t = [tuple(row) for row in array]

    for row in t:
        d[row] += 1

    for row in t:
        if d[row] == 1:
            singles.append(row)
        else:
            nonsingles.append(row)

    return {'singles':singles, 'nonsingles':nonsingles}

Here is a version that returns each distinct row only once:
from collections import defaultdict

def singles_nonsingles(array):
    #returns the elements that occur only once, and the elements
    #that occur more than once in the array
    singles=[]
    nonsingles=[]
    d = defaultdict(int)
    already_seen = set()

    t = [tuple(row) for row in array]

    for row in t:
        d[row] += 1

    for row in t:
        if row in already_seen:
            continue
        if d[row] == 1:
            singles.append(row)
        else:
            nonsingles.append(row)
        already_seen.add(row)

    return {'singles':singles, 'nonsingles':nonsingles}


a=[[1,1,1,0], [1,1,1,0], [5,1,6,0], [3,2,1,0], [4,4,1,0], [5,1,6,0]]

x = singles_nonsingles(a)
print("Array: " + str(a))
print(x)

In Python 2.7 or later, you can use collections.Counter to count the occurrences:
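import collections

def unique_items(iterable):
    # A list, not a bare map() result: in Python 3, map() returns an
    # iterator that Counter would exhaust before the second loop
    tuples = [tuple(x) for x in iterable]
    counts = collections.Counter(tuples)
    unique = []
    non_unique = []
    for t in tuples:
        if counts[t] == 1:
            unique.append(t)
        else:
            non_unique.append(t)
    return unique, non_unique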

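The first function returns the single and non-single rows without repetitions; the second returns the non-single rows with their repetitions, grouping the copies of each row together. Both iterate over a dict, so in Python versions before 3.7 the rows do not come back in their original order.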
def comp (multi):
    from collections import defaultdict

    res = defaultdict(int)

    for vect in multi:
        res[tuple(vect)] += 1

    singles = []
    no_singles = []

    for k in res:
        if res[k] > 1:
            no_singles.append(list(k))
        elif res[k] == 1:
            singles.append(list(k))

    return singles, no_singles

def count_w_repetitions(multi):
    from collections import defaultdict

    res = defaultdict(int)

    for vect in multi:
        res[tuple(vect)] += 1

    singles = []
    no_singles = []

    for k in res:
        if res[k] == 1:
            singles.append(list(k))
        else:
            for _ in range(res[k]):
                no_singles.append(list(k))


    return singles, no_singles
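On Python 3.7 or later, where Counter (a dict subclass) remembers insertion order, the deduplicated split from comp() can be written more compactly and comes back in first-occurrence order. This is a sketch under that version assumption; comp_ordered is a hypothetical name.

```python
from collections import Counter

def comp_ordered(multi):
    """Hypothetical variant of comp(): each distinct row once, split by
    whether it repeats. Relies on Counter keeping insertion order (3.7+)."""
    counts = Counter(tuple(row) for row in multi)
    singles = [list(k) for k, n in counts.items() if n == 1]
    no_singles = [list(k) for k, n in counts.items() if n > 1]
    return singles, no_singles

a = [[1,1,1,0], [1,1,1,0], [5,1,6,0], [3,2,1,0], [4,4,1,0], [5,1,6,0]]
singles, no_singles = comp_ordered(a)
# singles    -> [[3, 2, 1, 0], [4, 4, 1, 0]]
# no_singles -> [[1, 1, 1, 0], [5, 1, 6, 0]]
```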

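Do you need to include all the duplicates in nonsingles? Wouldn't returning [[1,1,1,0], [5,1,6,0]] be enough?

You should convert the rows to tuples while looping over the array instead of creating t. That would need less memory, because array and t would not both have to be held in memory at the same time. It would also remove the need for the confusing enumerate in the second loop.

@AaronDufour, I'm not sure I agree. Nothing we do in this function frees array, so the only question is how much memory the function itself uses. We could rewrite it so that it does not build t, which would save some memory. (d only holds references to tuples for the distinct rows, whereas t holds a tuple for every row; by not building t we avoid keeping duplicate row tuples.) Finally, I'm not sure why you find enumerate() confusing, since it is the usual Python idiom for getting both the index and the value. I have to admit, though, that enumerate() was not needed at all. All I used it for was to look up the current row, but the current row is always available in the loop as row.

Thanks Sven! That is much faster. I'm not entirely clear why, though. It looks like we are both running a loop that performs similar tests on the data. Out of curiosity, could you explain why it is so much faster? How is Counter implemented, if not with a loop itself?

This doesn't work in Python 3.2. Since map returns an iterator there, the iterator is used up while counting. You can wrap the call to map() in a call to list(), or use a list comprehension: tuples = [tuple(x) for x in iterable]

@Mike, as I explained in my answer, your problem is that you are searching a list on every iteration. That takes O(n) time, where n is the number of items in the list, so the bigger the list, the slower it gets. A collections.Counter() is basically a dict, and a dict can look things up in O(1) time on average (a lookup takes about the same time no matter how many items the data structure holds).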
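To make the list-versus-dict point concrete, here is a rough benchmark sketch comparing the two strategies on the same random data. The function names are hypothetical and the timings depend on the machine; the point is that the list.count() version grows quadratically with the number of rows while the Counter version grows linearly.

```python
import random
import timeit
from collections import Counter

def split_with_list_count(rows):
    # list.count() rescans the whole list for every row -> O(n^2) overall
    tuples = [tuple(r) for r in rows]
    singles = [t for t in tuples if tuples.count(t) == 1]
    nonsingles = [t for t in tuples if tuples.count(t) > 1]
    return singles, nonsingles

def split_with_counter(rows):
    # One counting pass, then average O(1) dict lookups -> O(n) overall
    tuples = [tuple(r) for r in rows]
    counts = Counter(tuples)
    singles = [t for t in tuples if counts[t] == 1]
    nonsingles = [t for t in tuples if counts[t] > 1]
    return singles, nonsingles

random.seed(0)
rows = [[random.randint(0, 3) for _ in range(4)] for _ in range(1000)]

# Both strategies must agree; only their running time differs
assert split_with_list_count(rows) == split_with_counter(rows)
for fn in (split_with_list_count, split_with_counter):
    print(fn.__name__, timeit.timeit(lambda: fn(rows), number=1))
```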
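from itertools import compress

# a is the 2D list from the question
a = [[1,1,1,0], [1,1,1,0], [5,1,6,0], [3,2,1,0], [4,4,1,0], [5,1,6,0]]

def has_all_unique(row):
    # Caution: this tests whether the elements *within* one row are all
    # distinct; it does not test whether the row occurs only once in a
    return len(row) == len(frozenset(row))

# Lists instead of map()/imap() iterators so uniq can be reused
# (itertools.imap does not exist in Python 3)
uniq = [has_all_unique(row) for row in a]
singles = list(compress(a, uniq))
notuniq = [not x for x in uniq]
nonsingles = list(compress(a, notuniq))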
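Because has_all_unique checks the elements inside each row rather than how often the row appears in a, it does not produce the split asked for (on the example it puts [5,1,6,0] among the singles). A sketch that keeps the itertools.compress approach but builds the selector from whole-row counts with collections.Counter:

```python
from collections import Counter
from itertools import compress

a = [[1,1,1,0], [1,1,1,0], [5,1,6,0], [3,2,1,0], [4,4,1,0], [5,1,6,0]]

# Count whole rows (as hashable tuples), then build one boolean per row
counts = Counter(tuple(row) for row in a)
occurs_once = [counts[tuple(row)] == 1 for row in a]

# compress() keeps the rows whose selector is true, in their original order
singles = list(compress(a, occurs_once))
nonsingles = list(compress(a, (not flag for flag in occurs_once)))
# singles    -> [[3, 2, 1, 0], [4, 4, 1, 0]]
# nonsingles -> [[1, 1, 1, 0], [1, 1, 1, 0], [5, 1, 6, 0], [5, 1, 6, 0]]
```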