Joining lists in Python
Is there a short syntax in Python for joining a list of lists into a single list (or iterator)? For example, I have a list as follows, and I want to iterate over a, b and c:
x = [["a","b"], ["c"]]
The best I can come up with is the following:
result = []
[ result.extend(el) for el in x ]
for el in result:
    print el
This is known as flattening, and there are a LOT of implementations out there. How about this one, though it will only work for nesting that is one level deep:
>>> x = [["a","b"], ["c"]]
>>> for el in sum(x, []):
...     print el
...
a
b
c
From those links, it seems the most complete-fast-elegant-etc implementation is the following:
def flatten(l, ltypes=(list, tuple)):
    ltype = type(l)
    l = list(l)
    i = 0
    while i < len(l):
        while isinstance(l[i], ltypes):
            if not l[i]:
                l.pop(i)
                i -= 1
                break
            else:
                l[i:i + 1] = l[i]
        i += 1
    return ltype(l)
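A quick check of the behaviour described, with the definition repeated so the snippet runs on its own:

```python
def flatten(l, ltypes=(list, tuple)):
    # same function as above, repeated so this example is self-contained
    ltype = type(l)
    l = list(l)
    i = 0
    while i < len(l):
        while isinstance(l[i], ltypes):
            if not l[i]:
                l.pop(i)
                i -= 1
                break
            else:
                l[i:i + 1] = l[i]
        i += 1
    return ltype(l)

# handles arbitrary nesting, mixed lists/tuples, and empty sublists
print(flatten([1, [2, 3], [[4], (5, 6)], []]))  # [1, 2, 3, 4, 5, 6]

# the type of the outer container is preserved
print(flatten((1, (2, 3))))  # (1, 2, 3)
```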
What you're describing is known as flattening a list, and with this new knowledge you'll be able to find many solutions on Google (there is no built-in flatten method). Here is one of them, from:
Sadly, Python doesn't have a simple way to flatten lists. Try this:
def flatten(some_list):
    for element in some_list:
        if type(element) in (tuple, list):
            for item in flatten(element):
                yield item
        else:
            yield element
It will recursively flatten the list; you can then do:
result = []
[ result.extend(el) for el in x ]
for el in flatten(result):
    print el
There's always reduce (which is being deprecated to functools):
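A sketch of the reduce version (shown with functools.reduce and print() calls so it also runs on Python 3; the input is the question's example):

```python
from functools import reduce  # plain built-in reduce on Python 2
import operator

x = [["a", "b"], ["c"]]

# the lambda spelling
flat = reduce(lambda a, b: a + b, x)
print(flat)  # ['a', 'b', 'c']

# operator.add is the plus operator packaged as a function
flat2 = reduce(operator.add, x, [])
print(flat2)  # ['a', 'b', 'c']
```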
Unfortunately the plus operator for list concatenation can't be used as a function (or fortunately, if you prefer your lambdas ugly for improved visibility). If you're only going one level deep, a nested comprehension also works:
>>> x = [["a","b"], ["c"]]
>>> [inner
... for outer in x
... for inner in outer]
['a', 'b', 'c']
On one line, that becomes:
>>> [j for i in x for j in i]
['a', 'b', 'c']
This works recursively for infinitely nested elements:
def iterFlatten(root):
    if isinstance(root, (list, tuple)):
        for element in root:
            for e in iterFlatten(element):
                yield e
    else:
        yield root
Result:
>>> b = [["a", ("b", "c")], "d"]
>>> list(iterFlatten(b))
['a', 'b', 'c', 'd']
Or, done recursively:
def flatten(input):
    ret = []
    if not isinstance(input, (list, tuple)):
        return [input]
    for i in input:
        if isinstance(i, (list, tuple)):
            ret.extend(flatten(i))
        else:
            ret.append(i)
    return ret
The shortest? Late to the party, but I'm new to Python and come from a Lisp background. This is what I came up with (check out the variable names for the lulz); it seems to work. Test:
flatten((1,2,3,(4,5,6,(7,8,(((1,2)))))))
Returns:
[1, 2, 3, 4, 5, 6, 7, 8, 1, 2]
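The answer's own snippet didn't survive this copy of the page; a recursive sketch that reproduces the test above (the name flatten is assumed from the call) would be:

```python
def flatten(tree):
    # walk nested lists/tuples depth-first, collecting the leaves in order
    result = []
    if isinstance(tree, (list, tuple)):
        for branch in tree:
            result.extend(flatten(branch))
    else:
        result.append(tree)
    return result

print(flatten((1, 2, 3, (4, 5, 6, (7, 8, (((1, 2))))))))
# [1, 2, 3, 4, 5, 6, 7, 8, 1, 2]
```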
For one-level flattening, if you care about speed, this is faster than any of the previous answers under every condition I tried. (That is, if you need the result as a list. If you only need to iterate through it on the fly, the chain example is probably better.) It works by pre-allocating a list of the final size and copying the parts in by slice, which is a lower-level block copy than any of the iterator methods:

The sorted list of times, with notes:
[(0.5391559600830078, 'flatten4b'), # join() above.
(0.5400412082672119, 'flatten4c'), # Same, with sum(len(b) for b in a)
(0.5419249534606934, 'flatten4a'), # Similar, using zip()
(0.7351131439208984, 'flatten1b'), # list(itertools.chain.from_iterable(a))
(0.7472689151763916, 'flatten1'), # list(itertools.chain(*a))
(1.5468521118164062, 'flatten3'), # [i for j in a for i in j]
(26.696547985076904, 'flatten2')] # sum(a, [])
If you need a list rather than a generator, use list():
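A related sketch: the comments elsewhere on the page note that chain.from_iterable avoids unpacking the outer list into an intermediate argument tuple, and can be faster when joining many iterables:

```python
from itertools import chain

x = [["a", "b"], ["c"]]

# chain(*x) unpacks x into positional arguments;
# chain.from_iterable(x) consumes the outer iterable lazily instead
y = list(chain.from_iterable(x))
print(y)  # ['a', 'b', 'c']
```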
I had a similar problem when I had to create a dictionary that contained the elements of an array and their counts. The answer is relevant because I flatten the lists, get the elements I need, and then group and count. I used Python's map function to produce a tuple of each element and its count, and groupby over the array. Note that the groupby takes the array element itself as the keyfunc. As a relatively new Python coder, I find this easier for me to comprehend, while also being Pythonic.

Before I discuss the code, here is a sample of the data I had to flatten first:
{ "_id" : ObjectId("4fe3a90783157d765d000011"), "status" : [ "opencalais" ],
"content_length" : 688, "open_calais_extract" : { "entities" : [
{"type" :"Person","name" : "Iman Samdura","rel_score" : 0.223 },
{"type" : "Company", "name" : "Associated Press", "rel_score" : 0.321 },
{"type" : "Country", "name" : "Indonesia", "rel_score" : 0.321 }, ... ]},
"title" : "Indonesia Police Arrest Bali Bomb Planner", "time" : "06:42 ET",
"filename" : "021121bn.01", "month" : "November", "utctime" : 1037836800,
"date" : "November 21, 2002", "news_type" : "bn", "day" : "21" }
This is a query result from Mongo. The code below flattens these collections of lists:
def flatten_list(items):
    return sorted([entity['name'] for entity in
                   [entities for sublist in
                    [item['open_calais_extract']['entities'] for item in items]
                    for entities in sublist]])
First, I extract all of the "entities" collections, and then for each entities collection, iterate over the dictionaries and extract the name attribute.
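The map-and-groupby counting step described above isn't shown in this copy of the answer; a sketch of what it could look like (count_names is a hypothetical name), using the element itself as the groupby key:

```python
from itertools import groupby

def count_names(names):
    # groupby only groups consecutive equal items, so sort first;
    # with no keyfunc, the element itself serves as the key
    return [(name, len(list(group))) for name, group in groupby(sorted(names))]

print(count_names(["Associated Press", "Indonesia", "Associated Press"]))
# [('Associated Press', 2), ('Indonesia', 1)]
```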
Performance comparison:

import itertools
import timeit
big_list = [[0]*1000 for i in range(1000)]
timeit.repeat(lambda: list(itertools.chain.from_iterable(big_list)), number=100)
timeit.repeat(lambda: list(itertools.chain(*big_list)), number=100)
timeit.repeat(lambda: (lambda b: map(b.extend, big_list))([]), number=100)
timeit.repeat(lambda: [el for list_ in big_list for el in list_], number=100)
[100*x for x in timeit.repeat(lambda: sum(big_list, []), number=1)]
Produces:
>>> import itertools
>>> import timeit
>>> big_list = [[0]*1000 for i in range(1000)]
>>> timeit.repeat(lambda: list(itertools.chain.from_iterable(big_list)), number=100)
[3.016212113769325, 3.0148865239060227, 3.0126415732791028]
>>> timeit.repeat(lambda: list(itertools.chain(*big_list)), number=100)
[3.019953987082083, 3.528754223385439, 3.02181439266457]
>>> timeit.repeat(lambda: (lambda b: map(b.extend, big_list))([]), number=100)
[1.812084445152557, 1.7702404451095965, 1.7722977998725362]
>>> timeit.repeat(lambda: [el for list_ in big_list for el in list_], number=100)
[5.409658160700605, 5.477502077679354, 5.444318360412744]
>>> [100*x for x in timeit.repeat(lambda: sum(big_list, []), number=1)]
[399.27587954973444, 400.9240571138051, 403.7521153804846]
This is with Python 2.7.1 on Windows XP 32-bit, but @temoto in the comments above got from_iterable faster than map+extend, so it is quite platform- and input-dependent. Stay away from sum(big_list, []).
Ah, "sum(l, i)" is shorthand for "reduce(plus_operator, l, i)". That's kinda cool.

Your "most complete-elegant-etc" is not "elegant" at all!! See the docs of itertools.chain for true elegance!

@hasen j: I believe he means best for arbitrarily nested lists. chain assumes a consistent, one-deep list of lists (which is probably all the question needs), but flatten handles things like [a, b, [c], [d, [e, f]], [[[g]]]].

Unfortunately this breaks if you're using pylab, because numpy's sum is imported into the global namespace, and that function doesn't work that way.

GAH, I can't believe they are deprecating it. Anyway, you don't need the extra empty list; this will work just fine: reduce(lambda a, b: a + b, x).

A version of the plus operator is defined as a function in the operator module, which is faster and less ugly than the lambda: functools.reduce(operator.add, [[1, 2, 3], [4, 5]], []). Or, personally, use sum(); I think the lambda approach is pretty. :-)

If you're going to reduce, use extend, not add, to avoid littering memory with temporary lists. Wrap extend with a function that extends and then returns the list itself.

No need to list() it! for item in itertools.chain(*a): do something with item.

A little explanation would be nice too. result = []; map(result.extend, a) is approximately 30% faster than itertools.chain. But chain.from_iterable is a tiny bit faster than map+extend. [Python 2.7, x86_64]

This explains what happens with *a: it sends the elements of a as arguments to chain, as if removing the outer [ and ].

chain.from_iterable is significantly faster if you're joining lots of iterables. For me it was roughly 50% faster when creating ctypes arrays of OpenGL vertices from 100s of Python lists each containing 10s or 100s of vertices. The '*' operator converts the iterable into an intermediate tuple that it passes to chain.

Duplicate of:

sum(ListofList, []) # shorter

@recursive Shorter, but different = worse performance; see the comments on the other variants for an explanation.

This little snippet seems to be the fastest way around for non-recursive flattening. Needs more upvotes.
The join() implementation timed above (flatten4b):

def join(a):
    """Joins a sequence of sequences into a single sequence. (One-level flattening.)
    E.g., join([(1,2,3), [4, 5], [6, (7, 8, 9), 10]]) = [1,2,3,4,5,6,(7,8,9),10]
    This is very efficient, especially when the subsequences are long.
    """
    n = sum([len(b) for b in a])
    l = [None]*n
    i = 0
    for b in a:
        j = i+len(b)
        l[i:j] = b
        i = j
    return l
And the itertools.chain version for the original example:

from itertools import chain
x = [["a","b"], ["c"]]
y = list(chain(*x))