Python equivalent of filter() that returns two output lists (i.e. a partition of a list)




Suppose I have a list and a filter function. Using something like

>>> list(filter(lambda x: x > 10, [1,4,12,7,42]))
[12, 42]

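I can get the elements that match the criterion. Is there a function I could use that outputs two lists, one of the matching elements and one of the remaining elements? I could call the `filter()` function twice, but that's kind of ugly :)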

Edit: the order of the elements should be preserved, and I may have the same element multiple times.

Try this:

def partition(pred, iterable):
    trues = []
    falses = []
    for item in iterable:
        if pred(item):
            trues.append(item)
        else:
            falses.append(item)
    return trues, falses

Usage:

>>> trues, falses = partition(lambda x: x > 10, [1,4,12,7,42])
>>> trues
[12, 42]
>>> falses
[1, 4, 7]

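An implementation based on the `partition` recipe from the `itertools` documentation was also suggested. The recipe comes from the Python 3.x docs; in Python 2.x, `filterfalse` is called `ifilterfalse`.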
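For reference, here is a sketch of that recipe in Python 3, reproduced as it appeared in the `itertools` documentation of that era (the current docs may word it differently). Note that it returns the false entries first, and that both halves are lazy iterators sharing the input through `tee`:

```python
from itertools import filterfalse, tee

def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

falses, trues = partition(lambda x: x > 10, [1, 4, 12, 7, 42])
print(list(trues), list(falses))  # [12, 42] [1, 4, 7]
```

Wrap each half in `list()` if you need two actual lists, as the question asks.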

If there are no duplicate elements in the list, you can use a set:

>>> a = [1,4,12,7,42]
>>> b = list(filter(lambda x: x > 10, a))
>>> no_b = set(a) - set(b)
>>> no_b
{1, 4, 7}

Or you can use a list comprehension:

>>> no_b = [i for i in a if i not in b]

Note: this is not a function, but once you know the result of `filter()`, you can deduce the elements that do not match your filter criterion.
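Because the question's edit says elements may repeat and their order must be kept, the set difference only fits the duplicate-free case. A quick sketch with made-up sample data (an assumption, not from the answer) shows how the two complements differ:

```python
a = [1, 4, 12, 4, 7, 42, 1]          # hypothetical sample data with duplicates
b = [x for x in a if x > 10]         # the matching elements

# Set difference: duplicates collapse and the original order is not guaranteed.
no_b_set = set(a) - set(b)

# List comprehension: keeps duplicates and order, but each `in` test scans b,
# so the cost grows with len(a) * len(b).
no_b_list = [i for i in a if i not in b]

print(b, no_b_set, no_b_list)
```

Here `no_b_set` is the set `{1, 4, 7}`, while `no_b_list` is `[1, 4, 4, 7, 1]`.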

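from itertools import filterfalse  # called ifilterfalse in Python 2

def filter2(predicate, iterable):
    items = list(iterable)  # materialize once, so an iterator is not exhausted by the first pass
    return list(filter(predicate, items)), list(filterfalse(predicate, items))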

Edit 2, but I think it's important:

from functools import reduce

def partition(l, p):
    return reduce(lambda x, y: x[not p(y)].append(y) or x, l, ([], []))

The second and third versions are as fast as the iterative one above, but with less code.

I think `groupby` might be more relevant here.

For example, splitting a list into odd and even numbers (this also works for any number of groups):
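A minimal sketch of the `groupby` idea, assuming a hypothetical `is_odd` key function and sample data. Since `itertools.groupby` only merges consecutive items with equal keys, the input is sorted by the key first; `sorted()` is stable, so each group keeps its original relative order:

```python
from itertools import groupby

def is_odd(x):  # hypothetical key function
    return x % 2 == 1

data = [1, 4, 12, 7, 42, 3]

# Sort by the key so equal keys are adjacent, then collect each run into a list.
groups = {key: list(items)
          for key, items in groupby(sorted(data, key=is_odd), key=is_odd)}
print(groups)  # {False: [4, 12, 42], True: [1, 7, 3]}
```

The trade-off is an O(n log n) sort, and the key is evaluated twice per element (once by `sorted`, once by `groupby`).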

There are already many good answers. I like to use this one:

from functools import reduce

def partition(pred, iterable):
    def _dispatch(ret, v):
        if pred(v):
            ret[0].append(v)
        else:
            ret[1].append(v)
        return ret
    return reduce(_dispatch, iterable, ([], []))

if __name__ == '__main__':
    import random
    seq = list(range(20))  # shuffle needs a mutable list, not a range object
    random.shuffle(seq)
    print(seq)
    print(partition(lambda v: v > 10, seq))

I had exactly this requirement. I don't like the itertools recipe because it involves two separate passes over the data. Here is my implementation:

def filter_twoway(test, data):
    "Like filter(), but returns the passes AND the fails as two separate lists"
    collected = {True: [], False: []}
    for datum in data:
        collected[bool(test(datum))].append(datum)  # bool() so any truthy result works, as with filter()
    return (collected[True], collected[False])

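Everyone seems to think their solution is the best, so I decided to test all of them with timeit. I used `def is_odd(x): return x & 1` as the predicate function and `xrange(1000)` as the iterable. Here is my Python version: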

Python 2.7.3 (v2.7.3:70274d53c1dd, Apr  9 2012, 20:52:43) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin

Here are my test results:

Mark Byers
1000 loops, best of 3: 325 usec per loop

cldy
1000 loops, best of 3: 1.96 msec per loop

Dan S
1000 loops, best of 3: 412 usec per loop

TTimo
1000 loops, best of 3: 503 usec per loop

These are all comparable. Now let's try the example given in the Python documentation:

import itertools

def partition(pred, iterable,
              # Optimized by replacing global lookups with local variables
              # defined as default values.
              filter=itertools.ifilter,
              filterfalse=itertools.ifilterfalse,
              tee=itertools.tee):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

This seems to be a lot faster:

100000 loops, best of 3: 2.58 usec per loop

The itertools example code beats all comers by a factor of at least 100! The moral is: don't keep reinventing the wheel.
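The recipe timing above only measures creating the two lazy iterators: nothing iterates over the input or calls the predicate. A fairer comparison has to consume both halves. A minimal Python 3 sketch, where `partition_lists`, `partition_recipe` and `consume` are hypothetical helper names; no timings are shown because they depend on the machine:

```python
import timeit
from itertools import filterfalse, tee

def is_odd(x):
    return x & 1

def partition_lists(pred, iterable):
    """Single pass, two lists; returns (trues, falses)."""
    trues, falses = [], []
    for item in iterable:
        (trues if pred(item) else falses).append(item)
    return trues, falses

def partition_recipe(pred, iterable):
    """itertools recipe: two lazy iterators; returns (falses, trues)."""
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

def consume(pair):
    # Force both halves to be produced; otherwise only generator creation is timed.
    a, b = pair
    return list(a), list(b)

data = range(1000)
for name, fn in [("lists", partition_lists), ("recipe", partition_recipe)]:
    total = timeit.timeit(lambda: consume(fn(is_odd, data)), number=1000)
    print(f"{name}: {total / 1000 * 1e6:.1f} usec per call")
```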

You could take a look at this solution:

def partition(predicate, values):
    """
    Splits the values into two lists, based on the return value of the function
    (False/True, used as index 0/1), so the false items come first. e.g.:

        >>> partition(lambda x: x > 3, range(5))
        ([0, 1, 2, 3], [4])
    """
    results = ([], [])
    for item in values:
        results[predicate(item)].append(item)
    return results

In my opinion, this is the most elegant solution proposed here.

TL;DR: The accepted and most-voted answer [1] is the simplest and also the fastest.

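Benchmarking the different approaches

The different approaches that were suggested can be classified broadly into three categories:

1. straightforward list manipulation via `lis.append`, returning a 2-tuple of lists;
2. `lis.append` mediated by a functional approach, returning a 2-tuple of lists;
3. using the canonical recipe given in the fine `itertools` documentation, returning a 2-tuple of, loosely speaking, generators.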

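Here follow vanilla implementations of the three techniques: first the functional approach, then `itertools`, and finally two different implementations of direct list manipulation, the alternative one using the trick that `False` is zero and `True` is one.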
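The `False`-is-zero, `True`-is-one trick works because `bool` is a subclass of `int`, so a comparison result can index a 2-tuple directly. A tiny sketch with made-up data:

```python
# bool is a subclass of int, so False and True index a 2-tuple as 0 and 1.
assert issubclass(bool, int)

buckets = ([], [])                 # (non-matching, matching)
for n in [3, 8, 5, 2]:
    buckets[n > 4].append(n)       # False -> buckets[0], True -> buckets[1]

print(buckets)  # ([3, 2], [8, 5])
```

A predicate returning other truthy values (e.g. `2` or `None`) would need to be wrapped in `bool()` first.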

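Note that this is Python 3, hence `reduce` comes from `functools`, and that the OP requests a tuple like `(positives, negatives)`, but my implementations all return `(negatives, positives)`.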

$ ipython
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import functools
   ...: 
   ...: def partition_fu(p, l, r=functools.reduce):
   ...:     return r(lambda x, y: x[p(y)].append(y) or x, l, ([], []))
   ...: 

In [2]: import itertools
   ...: 
   ...: def partition_it(pred, iterable,
   ...:               filterfalse=itertools.filterfalse,
   ...:               tee=itertools.tee):
   ...:     t1, t2 = tee(iterable)
   ...:     return filterfalse(pred, t1), filter(pred, t2)
   ...: 

In [3]: def partition_li(p, l):
   ...:     a, b = [], []
   ...:     for n in l:
   ...:         if p(n):
   ...:             b.append(n)
   ...:         else:
   ...:             a.append(n)
   ...:     return a, b
   ...: 

In [4]: def partition_li_alt(p, l):
   ...:     x = [], []
   ...:     for n in l: x[p(n)].append(n)
   ...:     return x
   ...:

We need a predicate to apply to our lists, and lists (again, loosely speaking) on which to operate:

In [5]: p = lambda n:n%2

In [6]: five, ten = range(50000), range(100000)

To overcome the problem in testing the `itertools` approach that was reported in a comment on 31 October 2013 at 6:17:

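"Nonsense. You've calculated the time taken to construct the generators in `filterfalse` and `filter`, but you haven't iterated through the input or called `pred` once! The advantage of the `itertools` recipe is that it does not materialize any list, or look further ahead in the input than necessary. It calls `pred` twice and takes almost twice as long as Byers et al."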

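I thought of a void loop that just instantiates all the pairs of elements in the two iterables returned by the different partition functions.

First we use two fixed lists to get an idea of the overhead implied (using the very convenient IPython magic `%timeit`):

In [7]: %timeit for e, o in zip(five, five): pass
4.21 ms ± 39.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)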

Next we use the different implementations, one after the other:

In [8]: %timeit for e, o in zip(*partition_fu(p, ten)): pass
53.9 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [9]: %timeit for e, o in zip(*partition_it(p, ten)): pass
44.5 ms ± 3.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [10]: %timeit for e, o in zip(*partition_li(p, ten)): pass
36.3 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [11]: %timeit for e, o in zip(*partition_li_alt(p, ten)): pass
37.3 ms ± 109 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [12]:

Comments

The simplest approach is also the fastest one.

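Using the `x[p(n)]` trick is, ehm, useless, because at every step you have to index a data structure, which costs you a slight penalty; it's nice to know, however, if you want to persuade a survivor of a declining culture at pythonizing.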

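The functional approach, which is operationally equivalent to the alternative `append` implementation, is ~50% slower, possibly due to the fact that we have an extra function call (with respect to the predicate evaluation) for each list element.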

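The `itertools` approach has the (customary) advantages that ❶ no potentially large list is instantiated and ❷ the input list is not entirely processed if you break out of the consumer loop; but when we use it, it is slower because the predicate has to be applied on both ends of the `tee`.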
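Both points can be seen with a predicate that records its calls. A minimal sketch, assuming a hypothetical counting predicate `is_odd` and an infinite input from `itertools.count()`; `partition_it` has the same shape as the session's version, minus the default-argument optimization:

```python
from itertools import count, filterfalse, tee

def partition_it(pred, iterable):
    # Same shape as the itertools recipe: lazy, predicate applied on both tee branches.
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

calls = []  # records every argument the predicate is called with

def is_odd(n):
    calls.append(n)
    return n % 2

# count() is infinite, so a list-building partition would never return here.
evens, odds = partition_it(is_odd, count())
for e, o in zip(evens, odds):
    if e >= 4:
        break

print(calls)  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5]
```

The input is only read up to 5 (advantage ❷), but every element that both branches reach is tested twice.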

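Aside: I've fallen in love with the `object.mutate() or object` idiom that was exposed in the answer showing a functional approach to the problem. I'm afraid that, sooner or later, I'm going to abuse it.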
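The idiom relies on `list.append` returning `None`: `None or acc` evaluates to `acc`, so a single expression both mutates the accumulator and hands it back to `reduce`. A small sketch with a hypothetical `step` function and sample data:

```python
from functools import reduce

def step(acc, item):
    # acc[item % 2] picks the even (0) or odd (1) list; append() returns None,
    # and `None or acc` evaluates to acc, which reduce passes to the next call.
    return acc[item % 2].append(item) or acc

evens, odds = reduce(step, [1, 4, 12, 7, 42], ([], []))
print(evens, odds)  # [4, 12, 42] [1, 7]
```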

Footnotes

[1] Accepted and most voted as of today, 14 September 2017, but of course I have the highest hopes for this answer of mine!

Concise code for appending to the target list:

    def partition(cond, inputList):
        a, b = [], []
        for item in inputList:
            target = a if cond(item) else b
            target.append(item)
        return a, b


    >>> a, b = partition(lambda x: x > 10, [1, 4, 12, 7, 42])
    >>> a
    [12, 42]
    >>> b
    [1, 4, 7]

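The existing answers either partition an iterable into two lists, or partition it inefficiently into two generators. Here is an implementation that partitions an iterable efficiently into two generators, i.e. one that calls the predicate function at most once per element.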
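A minimal sketch of that idea (not necessarily the answer's own code): both generators pull from one shared iterator, and an item that belongs to the other side is parked in that side's `collections.deque` buffer, so the predicate runs exactly once per element consumed. The inner `gen` helper is a hypothetical name:

```python
from collections import deque

def partition(pred, iterable):
    """Return (trues, falses) generators; pred is called at most once per item."""
    it = iter(iterable)
    trues, falses = deque(), deque()

    def gen(mine, other, want):
        while True:
            if mine:                      # items the other generator already classified
                yield mine.popleft()
                continue
            try:
                item = next(it)
            except StopIteration:
                return
            if bool(pred(item)) == want:
                yield item
            else:
                other.append(item)        # buffer it for the other side

    return gen(trues, falses, True), gen(falses, trues, False)

trues, falses = partition(lambda x: x > 10, [1, 4, 12, 7, 42])
print(list(trues), list(falses))  # [12, 42] [1, 4, 7]
```

The buffers can grow large if one side is consumed much further ahead than the other; in the worst case this degrades to holding one whole side in memory, like the list-based answers.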




import collections
input_list = ['a','b','ana','beta','gamma']
filter_key = lambda x: len(x) == 1


## grouping code: one list per value returned by filter_key
cc = collections.defaultdict(list)
for item in input_list:
    cc[filter_key(item)].append(item)

print( cc )

This approach will also work for any number of categories generated by the `filter_key` function.
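For instance, with a key function that has more than two outcomes. A sketch using a hypothetical key (word length) and made-up sample words:

```python
import collections

words = ['a', 'b', 'ana', 'beta', 'gamma', 'pi']

# Hypothetical key with several possible values: bucket words by their length.
by_length = collections.defaultdict(list)
for word in words:
    by_length[len(word)].append(word)

print(dict(by_length))  # {1: ['a', 'b'], 3: ['ana'], 4: ['beta'], 5: ['gamma'], 2: ['pi']}
```

Each bucket keeps the input order, and the key is evaluated once per item.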