Subtracting two lists in Python

python list collections

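In Python, how can one subtract two non-unique, unordered lists? Say we have a = [0, 1, 2, 1, 0] and b = [0, 1, 1]. I would like to do something like c = a - b and have c be [2, 0] or [0, 2]; the order does not matter to me. If a does not contain all the elements in b, an exception should be raised.

Note that this is different from sets! I am not interested in the difference between the sets of elements in a and b; I am interested in the difference between the actual collections of elements in a and b.

I could do this with a for loop: look up the first element of b in a, remove that element from both b and a, and so on. But that does not appeal to me. It would be very inefficient, on the order of O(n^2) time, while it should be no problem to do this in O(n log n) time.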

You can try something like this:

class mylist(list):

    def __sub__(self, b):
        result = self[:]
        b = b[:]
        while b:
            try:
                result.remove(b.pop())
            except ValueError:
                raise Exception("Not all elements found during subtraction")
        return result


a = mylist([0, 1, 2, 1, 0] )
b = mylist([0, 1, 1])

>>> a - b
[2, 0]

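You have to define what [1, 2, 3] - [5, 6] should output. I guessed you wanted [1, 2, 3], which is why I ignored the ValueError.

Edit:

Now I see that you want an exception if a does not contain all the elements, so I added it instead of passing on the ValueError.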

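I am not sure what the objection to a for loop is: there is no multiset in Python, so you cannot use a built-in container to help you out.

It seems to me that anything on one line (if possible at all) will probably be very complex and hard to understand. Go for readability and KISS. Python is not C :)

I know this is not what you want, but it is simple and clear: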

for x in b:
  a.remove(x)

Or, if members of b might not be in a, use:

for x in b:
  if x in a:
    a.remove(x)
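
Both loops above modify a in place, and the second one silently skips elements that a lacks. As a minimal sketch, assuming the caller wants the inputs left untouched and an error when b cannot be fully removed (as the question asks), the loop could be wrapped like this. The name `subtract_strict` is hypothetical and not part of the answer.

```python
def subtract_strict(a, b):
    """Return a copy of a with one occurrence removed per element of b."""
    result = list(a)  # work on a copy so the caller's list is not mutated
    for x in b:
        try:
            result.remove(x)  # linear scan, so O(len(a) * len(b)) overall
        except ValueError:
            raise ValueError(f"{x!r} occurs more often in b than in a") from None
    return result


print(subtract_strict([0, 1, 2, 1, 0], [0, 1, 1]))  # [2, 0]
```

This keeps the simplicity of the loop but is still quadratic, so it only suits short lists.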

To use a list comprehension:

[i for i in a if not i in b or b.remove(i)]

would do the trick. It changes b in the process, though. Still, I agree with jkp and Dyno Fu that using a for loop would be better.
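
A small sketch of the same comprehension that consumes a scratch copy instead of b itself. The variable name `remaining` is an assumption for illustration. Like the comprehension above, it silently ignores elements of b that a does not contain.

```python
a = [0, 1, 2, 1, 0]
b = [0, 1, 1]

remaining = list(b)  # scratch copy; the comprehension consumes this, not b
# Keep i when it is not (or no longer) in `remaining`. Otherwise remove()
# deletes one occurrence and returns None, which is falsy, so i is dropped.
c = [i for i in a if i not in remaining or remaining.remove(i)]

print(c)  # [2, 0]
print(b)  # [0, 1, 1], unchanged
```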

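Perhaps someone can come up with a better example that uses a list comprehension but is still KISS?

To prove jkp's point that "anything on one line will probably be very complex and hard to understand", I created a one-liner. Please do not mod me down, because I know this is not a solution you should actually use. It is for demonstration purposes only.

The idea is to add the values in a one by one, as long as the number of times the value has been added so far is smaller than the number of times the value occurs in a minus the number of times it occurs in b: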

[ value for counter,value in enumerate(a) if a.count(value) >= b.count(value) + a[counter:].count(value) ]

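The horror! But perhaps someone can improve on it? Is it even bug free?

Edit: Seeing Devin Jeanpierre's comment about using a dictionary data structure, I came up with this one-liner: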

sum([ [value]*count for value,count in {value:a.count(value)-b.count(value) for value in set(a)}.items() ], [])

Better, but still unreadable.
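
On the "is it even bug free?" question, one way to gain some confidence is to compare both one-liners against `collections.Counter` subtraction on random inputs. This is only a sketch: the function names are hypothetical wrappers around the two expressions above, and the check assumes that silently ignoring surplus elements of b is acceptable (neither one-liner raises the exception the question asks for).

```python
import random
from collections import Counter


def one_liner_count(a, b):
    # First one-liner: keeps the last (count in a - count in b) occurrences.
    return [value for index, value in enumerate(a)
            if a.count(value) >= b.count(value) + a[index:].count(value)]


def one_liner_dict(a, b):
    # Second one-liner: builds value -> remaining count, then expands it.
    return sum([[value] * count
                for value, count in {v: a.count(v) - b.count(v) for v in set(a)}.items()], [])


random.seed(0)  # fixed seed so the check is repeatable
for _ in range(200):
    a = [random.randrange(5) for _ in range(random.randrange(12))]
    b = [random.randrange(5) for _ in range(random.randrange(12))]
    expected = Counter(a) - Counter(b)  # multiset difference, non-positive counts dropped
    assert Counter(one_liner_count(a, b)) == expected
    assert Counter(one_liner_dict(a, b)) == expected
```

Comparing as `Counter` objects sidesteps ordering, since the two expressions return their elements in different orders.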

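Python 2.7 and 3.1 added the collections.Counter class, a dict subclass that maps elements to their number of occurrences. It can be used as a multiset. You can do something like this: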

from collections import Counter
a = Counter([0, 1, 2, 1, 0])
b = Counter([0, 1, 1])
c = a - b  # ignores items in b missing in a

print(list(c.elements()))  # -> [0, 2]

Likewise, you can check that every element of b is in a by comparing the two counters key by key.
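
A minimal sketch of such a check, combined with the subtraction. The helper name `subtract_counted` and the error message are assumptions, not the answer's original code.

```python
from collections import Counter


def subtract_counted(a, b):
    """Multiset difference a - b as a list; raises if b is not contained in a."""
    count_a, count_b = Counter(a), Counter(b)
    # Counter returns 0 for missing keys, so elements absent from a are caught too.
    missing = [key for key in count_b if count_b[key] > count_a[key]]
    if missing:
        raise ValueError(f"not enough occurrences in a for: {missing}")
    return list((count_a - count_b).elements())


print(sorted(subtract_counted([0, 1, 2, 1, 0], [0, 1, 1])))  # [0, 2]
```

Building both counters and comparing them touches each element a constant number of times, assuming hashing behaves as usual.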

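But since you are stuck with 2.5, you could try importing it and define your own version if that fails. That way you are sure to get the latest version if it is available, and you fall back to a working version if not. You will also benefit from speed improvements if it gets converted to a C implementation in the future.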

try:
    from collections import Counter
except ImportError:
    class Counter(dict):
        ...

You can find it in the current Python source.

I tried to find a more elegant solution, but the best I could do was basically the same thing Dyno Fu said:

from copy import copy

def subtract_lists(a, b):
    """
    >>> a = [0, 1, 2, 1, 0]
    >>> b = [0, 1, 1]
    >>> subtract_lists(a, b)
    [2, 0]

    >>> import random
    >>> size = 10000
    >>> a = [random.randrange(100) for _ in range(size)]
    >>> b = [random.randrange(100) for _ in range(size)]
    >>> c = subtract_lists(a, b)
    >>> assert all((x in a) for x in c)
    """
    a = copy(a)
    for x in b:
        if x in a:
            a.remove(x)
    return a

Python 2.7+ and 3.1+ have collections.Counter (a multiset). The documentation links to a recipe for Python 2.5:

from operator import itemgetter
from heapq import nlargest
from itertools import repeat, ifilter

class Counter(dict):
    '''Dict subclass for counting hashable objects.  Sometimes called a bag
    or multiset.  Elements are stored as dictionary keys and their counts
    are stored as dictionary values.

    >>> Counter('zyzygy')
    Counter({'y': 3, 'z': 2, 'g': 1})

    '''

    def __init__(self, iterable=None, **kwds):
        '''Create a new, empty Counter object.  And if given, count elements
        from an input iterable.  Or, initialize the count from another mapping
        of elements to their counts.

        >>> c = Counter()                           # a new, empty counter
        >>> c = Counter('gallahad')                 # a new counter from an iterable
        >>> c = Counter({'a': 4, 'b': 2})           # a new counter from a mapping
        >>> c = Counter(a=4, b=2)                   # a new counter from keyword args

        '''        
        self.update(iterable, **kwds)

    def __missing__(self, key):
        return 0

    def most_common(self, n=None):
        '''List the n most common elements and their counts from the most
        common to the least.  If n is None, then list all element counts.

        >>> Counter('abracadabra').most_common(3)
        [('a', 5), ('r', 2), ('b', 2)]

        '''        
        if n is None:
            return sorted(self.iteritems(), key=itemgetter(1), reverse=True)
        return nlargest(n, self.iteritems(), key=itemgetter(1))

    def elements(self):
        '''Iterator over elements repeating each as many times as its count.

        >>> c = Counter('ABCABC')
        >>> sorted(c.elements())
        ['A', 'A', 'B', 'B', 'C', 'C']

        If an element's count has been set to zero or is a negative number,
        elements() will ignore it.

        '''
        for elem, count in self.iteritems():
            for _ in repeat(None, count):
                yield elem

    # Override dict methods where the meaning changes for Counter objects.

    @classmethod
    def fromkeys(cls, iterable, v=None):
        raise NotImplementedError(
            'Counter.fromkeys() is undefined.  Use Counter(iterable) instead.')

    def update(self, iterable=None, **kwds):
        '''Like dict.update() but add counts instead of replacing them.

        Source can be an iterable, a dictionary, or another Counter instance.

        >>> c = Counter('which')
        >>> c.update('witch')           # add elements from another iterable
        >>> d = Counter('watch')
        >>> c.update(d)                 # add elements from another counter
        >>> c['h']                      # four 'h' in which, witch, and watch
        4

        '''        
        if iterable is not None:
            if hasattr(iterable, 'iteritems'):
                if self:
                    self_get = self.get
                    for elem, count in iterable.iteritems():
                        self[elem] = self_get(elem, 0) + count
                else:
                    dict.update(self, iterable) # fast path when counter is empty
            else:
                self_get = self.get
                for elem in iterable:
                    self[elem] = self_get(elem, 0) + 1
        if kwds:
            self.update(kwds)

    def copy(self):
        'Like dict.copy() but returns a Counter instance instead of a dict.'
        return Counter(self)

    def __delitem__(self, elem):
        'Like dict.__delitem__() but does not raise KeyError for missing values.'
        if elem in self:
            dict.__delitem__(self, elem)

    def __repr__(self):
        if not self:
            return '%s()' % self.__class__.__name__
        items = ', '.join(map('%r: %r'.__mod__, self.most_common()))
        return '%s({%s})' % (self.__class__.__name__, items)

    # Multiset-style mathematical operations discussed in:
    #       Knuth TAOCP Volume II section 4.6.3 exercise 19
    #       and at http://en.wikipedia.org/wiki/Multiset
    #
    # Outputs guaranteed to only include positive counts.
    #
    # To strip negative and zero counts, add-in an empty counter:
    #       c += Counter()

    def __add__(self, other):
        '''Add counts from two counters.

        >>> Counter('abbb') + Counter('bcc')
        Counter({'b': 4, 'c': 2, 'a': 1})


        '''
        if not isinstance(other, Counter):
            return NotImplemented
        result = Counter()
        for elem in set(self) | set(other):
            newcount = self[elem] + other[elem]
            if newcount > 0:
                result[elem] = newcount
        return result

    def __sub__(self, other):
        ''' Subtract count, but keep only results with positive counts.

        >>> Counter('abbbc') - Counter('bccd')
        Counter({'b': 2, 'a': 1})

        '''
        if not isinstance(other, Counter):
            return NotImplemented
        result = Counter()
        for elem in set(self) | set(other):
            newcount = self[elem] - other[elem]
            if newcount > 0:
                result[elem] = newcount
        return result

    def __or__(self, other):
        '''Union is the maximum of value in either of the input counters.

        >>> Counter('abbb') | Counter('bcc')
        Counter({'b': 3, 'c': 2, 'a': 1})

        '''
        if not isinstance(other, Counter):
            return NotImplemented
        _max = max
        result = Counter()
        for elem in set(self) | set(other):
            newcount = _max(self[elem], other[elem])
            if newcount > 0:
                result[elem] = newcount
        return result

    def __and__(self, other):
        ''' Intersection is the minimum of corresponding counts.

        >>> Counter('abbb') & Counter('bcc')
        Counter({'b': 1})

        '''
        if not isinstance(other, Counter):
            return NotImplemented
        _min = min
        result = Counter()
        if len(self) < len(other):
            self, other = other, self
        for elem in ifilter(self.__contains__, other):
            newcount = _min(self[elem], other[elem])
            if newcount > 0:
                result[elem] = newcount
        return result


if __name__ == '__main__':
    import doctest
    print doctest.testmod()

I would do it in an easier way:

a_b = [e for e in a if not e in b ]

...as wich wrote, this is wrong: it works only if the items in the lists are unique. And if they are, it is better to use

a_b = list(set(a) - set(b))

This leaves a and b untouched. The result is the unique set of a - b. Done.
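
To see why a set-based difference does not answer the question as asked, here is a small comparison on the question's example lists. It is only a sketch; the variable names are chosen for illustration.

```python
from collections import Counter

a = [0, 1, 2, 1, 0]
b = [0, 1, 1]

set_diff = list(set(a) - set(b))                       # duplicates collapse first
bag_diff = list((Counter(a) - Counter(b)).elements())  # counts are kept

print(set_diff)          # [2]
print(sorted(bag_diff))  # [0, 2]
```

The set version loses the second 0, because a set holds each distinct value only once.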

Here is a relatively long but efficient and readable solution. It runs in O(n).

def list_diff(list1, list2):
    # Count the occurrences of each element of list1.
    counts = {}
    for x in list1:
        try:
            counts[x] += 1
        except KeyError:
            counts[x] = 1
    # Subtract one count per element of list2, failing on any shortfall.
    for x in list2:
        try:
            counts[x] -= 1
        except KeyError:
            raise ValueError('Not all elements of list2 are in list1')
        if counts[x] < 0:
            raise ValueError('Not all elements of list2 are in list1')
    result = []
    for k, v in counts.items():
        result += v*[k]
    return result

a = [0, 1, 1, 2, 0]
b = [0, 1, 1]
%timeit list_diff(a, b)
%timeit list_diff(1000*a, 1000*b)
%timeit list_diff(1000000*a, 1000000*b)
100000 loops, best of 3: 4.8 µs per loop
1000 loops, best of 3: 1.18 ms per loop
1 loops, best of 3: 1.21 s per loop

You can use the map construct to do this. It looks quite OK, but beware that the map line itself will return a list of Nones.

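So this is an implementation of a multiset, then...?

@Devin, yes, a multiset is essentially what I want.

Note that anything operating directly on the unordered lists will be O(n^2), that is, len(a) * len(b). To do this efficiently you need an intermediate data structure, for example one that records the number of occurrences of each value, or you need to sort the lists first. If you are only dealing with small lists, it does not matter.

Why are you subclassing list?

The OP states that this should raise an exception if a does not contain all elements in b, so the ValueError should not be silenced.

@Devin: because the title of this question is "Subtracting two lists in Python"?

Apart from ignoring the exception, which I actually want, this looks good, although I wonder about its performance. I suspect the performance is a joke. Subclassing list itself is a nice way to keep things readable without cluttering the code too much, though; I had not even thought of that.

If you changed the data structure, it would probably be faster. Why use a list rather than a dict mapping elements to counts? As for subclassing list, it does not particularly reduce clutter. Really, what is the difference between sub(a, b) and a - b? The difficulty is that you have to use MyList everywhere instead of list, which can make it hard to track down. Otherwise, it is generally just bad style. In more complex cases, such as overriding __getitem__, the behavior is unreliable because the code is shared in C rather than in Python, so more work is needed.

Heh: yes, if it really must look like a list comprehension: [a.remove(x) for x in b] :P

If you add c = list(a) before the loop and then remove the items from c, it will be three lines in total. In my opinion that is about as clear and readable as it gets.

@jkp Actually, the list comprehension returns [None, None, None].

But this is very inefficient for large lists, isn't it?

@Kimvais: yes, but a will be [2, 0].

To avoid destroying b, you could add c = list(b) and use c in place of b, but that is still not as good as Dyno Fu's answer.

Unfortunately I am stuck with 2.5. I am accepting this as the answer because it was the first to mention that collections.Counter is Python's implementation of a multiset, although in my opinion it is an ugly answer...

I just wanted to mention that I thoroughly revised this answer to make the code work (there were at least two bugs plus a more subtle one, keys instead of elements) and to update it, since 10 years have passed and Python 2 is now EOL. If you want to know what changed, check the edit history.

I wonder how efficient this is. That of course depends on the big-O complexity of the dictionary indexing that happens inside the Counter class...

Your solution does not raise an exception if b contains elements that are not in a.

As stated in the question, this is not the same as sets! What is being asked for is not the unique set of a - b.

A set object is an unordered collection of distinct hashable objects. The key word here is distinct. Don't worry, that tripped me up too, but you have to remember that a set only has distinct elements.

Following up on Natalie's comment, remember that this does not preserve list order.

I see what you mean now. Indeed, the answer as I currently wrote it is not a correct answer to this question.

I think dict insert/lookup is O(1). I think that is what it says here. Note how the times of these runs grow linearly:
%timeit list_diff(1000*a, 1000*b): 1000 loops, best of 3: 1.26 ms per loop
%timeit list_diff(10000*a, 10000*b): 100 loops, best of 3: 12.3 ms per loop
%timeit list_diff(100000*a, 100000*b): 10 loops, best of 3: 125 ms per loop
%timeit list_diff(1000000*a, 1000000*b): 1 loops, best of 3: 1.18 s per loop
Apologies for the hard-to-read formatting.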
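
One of the comments above mentions sorting the lists first as an alternative to counting. A minimal sketch of that idea, assuming the elements can be ordered with `<`; the function name `subtract_sorted` is hypothetical and does not come from any of the answers.

```python
def subtract_sorted(a, b):
    """Multiset difference via sorting: O(n log n), elements must be orderable."""
    sa, sb = sorted(a), sorted(b)
    result = []
    i = j = 0
    while i < len(sa) and j < len(sb):
        if sa[i] == sb[j]:      # matched pair: cancel one occurrence from each
            i += 1
            j += 1
        elif sa[i] < sb[j]:     # sa[i] has no partner left in b: keep it
            result.append(sa[i])
            i += 1
        else:                   # sb[j] < sa[i]: b has an element a cannot supply
            raise ValueError(f"{sb[j]!r} is missing from a")
    if j < len(sb):             # b still has elements after a ran out
        raise ValueError(f"{sb[j]!r} is missing from a")
    result.extend(sa[i:])       # everything left in a survives
    return result


print(subtract_sorted([0, 1, 2, 1, 0], [0, 1, 1]))  # [0, 2]
```

Unlike the counting approaches, this does not require hashable elements, but it returns the result in sorted order rather than in the order of a.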

c = [i for i in b if i not in a]

a = [1, 2, 3]
b = [2, 3]

map(lambda x:a.remove(x), b)
a
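
The map snippet above behaves as described on Python 2, where map returns a list. On Python 3, map returns a lazy iterator, so nothing is removed until the iterator is consumed. A short sketch of the difference, with variable names chosen for illustration:

```python
a = [1, 2, 3]
b = [2, 3]

lazy = map(a.remove, b)  # Python 3: nothing has been removed yet
print(a)                 # [1, 2, 3]

leftovers = list(lazy)   # consuming the iterator runs the removals
print(leftovers)         # [None, None]
print(a)                 # [1]
```

Because the call exists only for its side effect, a plain for loop is usually the clearer choice.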