Python: Expand and flatten ragged nested lists
I know that flattening nested lists has been discussed in detail before, but I think my task is a bit different, and I couldn't find any information on it.
I am writing a scraper, and its output is a nested list. The top-level list elements should become rows of data in spreadsheet form. However, since the nested lists often have different lengths, I need to expand them before flattening the list.
Here is an example. I have
[ [ "id1", [["x", "y", "z"], [1, 2]], ["a", "b", "c"]],
[ "id2", [["x", "y", "z"], [1, 2, 3]], ["a", "b"]],
[ "id3", [["x", "y"], [1, 2, 3]], ["a", "b", "c", ""]] ]
What I ultimately want is
[[ "id1", "x", "y", "z", 1, 2, "", "a", "b", "c", ""],
[ "id2", "x", "y", "z", 1, 2, 3, "a", "b", "", ""],
[ "id3", "x", "y", "", 1, 2, 3, "a", "b", "c", ""]]
However, an intermediate list like this
[ [ "id1", [["x", "y", "z"], [1, 2, ""]], ["a", "b", "c", ""]],
[ "id2", [["x", "y", "z"], [1, 2, 3]], ["a", "b", "", ""]],
[ "id3", [["x", "y", ""], [1, 2, 3]], ["a", "b", "c", ""]] ]
which I could then simply flatten, would be fine too.
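For reference, once the rows are padded as in the intermediate list above, flattening each row is the easy part. The following is only a minimal sketch; `flatten_row` is a hypothetical helper name, not something from the original post, and it assumes the only containers in a row are plain `list` objects.

```python
def flatten_row(row):
    """Recursively flatten one row of arbitrarily nested lists."""
    flat = []
    for item in row:
        if isinstance(item, list):
            # Descend into nested lists and splice their items in place.
            flat.extend(flatten_row(item))
        else:
            flat.append(item)
    return flat


# The already-padded intermediate list from the question.
padded = [
    ["id1", [["x", "y", "z"], [1, 2, ""]], ["a", "b", "c", ""]],
    ["id2", [["x", "y", "z"], [1, 2, 3]], ["a", "b", "", ""]],
    ["id3", [["x", "y", ""], [1, 2, 3]], ["a", "b", "c", ""]],
]

rows = [flatten_row(row) for row in padded]
for row in rows:
    print(row)
```

Every resulting row has the same length, so the rows can be written straight to a spreadsheet. The hard part of the question is therefore the padding, not the flattening.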
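The top-level list elements (rows) are built one per iteration and appended to the full list. I think it is easier to transform the full list at the end.
The structure of the nested elements should be the same in every row, but I cannot be certain of that yet. I suppose I would have a problem if the structure looked like this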
[ [ "id1", [[x, y, z], [1, 2]], ["a", "b", "c"]],
[ "id2", [[x, y, z], [1, 2, 3]], ["bla"], ["a", "b"]],
[ "id3", [[x, y], [1, 2, 3]], ["a", "b", "c", ""]] ]
which should become
[[ "id1", x, y, z, 1, 2, "", "", "a", "b", "c", ""],
[ "id2", x, y, z, 1, 2, 3, "bla", "a", "b", "", ""],
[ "id3", x, y, "", 1, 2, 3, "", "a", "b", "c", ""]]
Thanks for any comments, and please excuse me if this is trivial; I am still fairly new to Python.
def recursive_pad(l, spacer=""):
    # Copy the list and its sublists so the function never modifies its arguments.
    l = [list(x) if isinstance(x, list) else x for x in l]
    # Use real lists, not map(): in Python 3 a map object is exhausted after one pass.
    are_subelements_lists = [isinstance(x, list) for x in l]
    if not any(are_subelements_lists):
        return l
    # Would catch [[], [], "42"]
    if not all(are_subelements_lists):
        raise Exception("Cannot mix lists and non-lists!")
    lengths = [len(x) for x in l]
    if max(lengths) == min(lengths):
        # We're already done
        return l
    # Pad it out
    for sublist in l:
        list_pad(sublist, spacer, max(lengths))
    return l


def list_pad(l, spacer, pad_to):
    # Append spacers in place until the list has pad_to items.
    for i in range(len(l), pad_to):
        l.append(spacer)


if __name__ == "__main__":
    print(recursive_pad([
        [[["x", "y", "z"], [1, 2]], ["a", "b", "c"]],
        [[["x", "y", "z"], [1, 2, 3]], ["a", "b"]],
        [[["x", "y"], [1, 2, 3]], ["a", "b", "c", ""]],
    ]))
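Edit: In fact, I misunderstood your question. This code solves a slightly different problem.
Actually, there is no solution for the general case where the structures are not the same. For example, a naive algorithm would match ["bla"] with ["a", "b", "c"], and the result would be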
[ [ "id1", x, y, z, 1, 2, "", "a", "b", "c", "", "", ""],
[ "id2", x, y, z, 1, 2, 3, "bla", "", "", "", "a", "b"],
[ "id3", x, y, "", 1, 2, 3, "a", "b", "c", "", "", ""]]
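However, if you know that there will be a number of rows, each starting with an ID followed by a nested list structure, the following algorithm should work: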
import itertools


def normalize(l):
    # just hack the first item to have only lists of lists or lists of items
    # (note: this modifies the rows of the input in place)
    for sublist in l:
        sublist[0] = [sublist[0]]

    # break the nesting: yield every innermost list as one group
    def flatten(l):
        for item in l:
            if not isinstance(item, list) or 0 == len([x for x in item if isinstance(x, list)]):
                yield item
            else:
                for subitem in flatten(item):
                    yield subitem

    l = [list(flatten(i)) for i in l]

    # extend all lists to greatest length
    list_lengths = {}
    for i in range(0, len(l[0])):
        for item in l:
            list_lengths[i] = max(len(item[i]), list_lengths.get(i, 0))
    for i in range(0, len(l[0])):
        for item in l:
            item[i] += [''] * (list_lengths[i] - len(item[i]))

    # flatten each row
    return [list(itertools.chain(*sublist)) for sublist in l]


l = [ [ "id1", [["x", "y", "z"], [1, 2]], ["a", "b", "c"]],
      [ "id2", [["x", "y", "z"], [1, 2, 3]], ["a", "b"]],
      [ "id3", [["x", "y"], [1, 2, 3]], ["a", "b", "c", ""]] ]
l = normalize(l)
print(l)
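The "extend all lists to greatest length" step is the heart of this approach, so it may help to look at it in isolation. The sketch below is an illustration, not part of the answer: `pad_groups` and its `fill` parameter are hypothetical names, and it assumes each row has already been broken into the same number of flat groups. Unlike the code above, it builds new lists instead of modifying its input.

```python
def pad_groups(rows, fill=""):
    """Pad the i-th group of every row to the width of the widest i-th group.

    Assumes every row has the same number of groups and every group is a
    flat list. Returns new lists; the input is left untouched.
    """
    # zip(*rows) walks the rows column by column.
    widths = [max(len(group) for group in column) for column in zip(*rows)]
    return [
        [group + [fill] * (width - len(group)) for group, width in zip(row, widths)]
        for row in rows
    ]


# Rows as they look after the nesting has been broken up.
rows = [
    [["id1"], ["x", "y", "z"], [1, 2], ["a", "b", "c"]],
    [["id2"], ["x", "y", "z"], [1, 2, 3], ["a", "b"]],
    [["id3"], ["x", "y"], [1, 2, 3], ["a", "b", "c", ""]],
]

padded = pad_groups(rows)
for row in padded:
    print(row)
```

Because the groups are matched purely by position, a row with an extra group (such as the ["bla"] case in the question) would be misaligned, which is the limitation described above.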
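For the "same structure" case, I have a simple solution that uses a recursive generator and the izip_longest function from itertools. This code is for Python 2, but with a few tweaks (noted in the comments) it can be made to work on Python 3: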
from itertools import izip_longest # in py3, this is renamed zip_longest

def flatten(nested_list):
    return zip(*_flattengen(nested_list)) # in py3, wrap this in list()

def _flattengen(iterable):
    for element in izip_longest(*iterable, fillvalue=""):
        if isinstance(element[0], list):
            for e in _flattengen(element):
                yield e
        else:
            yield element
In Python 3.3 it becomes even simpler, because the recursive step, for e in _flattengen(element): yield e, can be written as yield from _flattengen(element).
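Putting those notes together, a Python 3 version might look like the sketch below. It follows the answer's approach, with two changes that are my own assumptions rather than part of the original: a `fill` parameter for the padding value, and rows returned as lists instead of tuples. It still relies on every row having the same nesting structure.

```python
from itertools import zip_longest


def flatten(nested_list, fill=""):
    """Pad and flatten rows that all share the same nesting structure."""
    # _flattengen yields columns; zip(*...) transposes them back into rows.
    return [list(row) for row in zip(*_flattengen(nested_list, fill))]


def _flattengen(iterable, fill):
    # Walk all rows in parallel, one position at a time.
    for element in zip_longest(*iterable, fillvalue=fill):
        if isinstance(element[0], list):
            # A position holding lists: descend. zip_longest pads the
            # shorter lists with the fill value on the next level down.
            yield from _flattengen(element, fill)
        else:
            # A position holding plain values: this is one output column.
            yield element


data = [
    ["id1", [["x", "y", "z"], [1, 2]], ["a", "b", "c"]],
    ["id2", [["x", "y", "z"], [1, 2, 3]], ["a", "b"]],
    ["id3", [["x", "y"], [1, 2, 3]], ["a", "b", "c", ""]],
]

result = flatten(data)
for row in result:
    print(row)
```

Note that the check only looks at `element[0]`, so it decides whether to descend based on the first row alone. If the rows differ in structure at some position, the result is undefined, which is why deviating rows need to be handled beforehand.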
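Please clarify how you want blanks to be represented, because [x, y, , 1, 2, 3, "a", "b", "c", ""] does not look like a valid Python list: you have to put something after y and before 1. Do you want it to be None? But that would conflict with the "" used as a blank at the end of the list... It is also unclear what x, y and z are. Are they some kind of constants or variables defined beforehand?
Edited to clarify. Some items in the lists are already blank, so the lists can be expanded with blanks. I build the lists from elements/lists that I extract from the page.
How do you want to handle the last example? I mean, the second row in the dataset has 4 elements, while the rest have only 3. Should the remaining rows be padded with blanks from the right?
Extended the question again. In that case, blanks should be inserted so that the result is as shown. I aligned the corresponding lists/columns.
There is a typo: length=map(len(l)) should be length=map(len,l)
@BigYellowCactus, thanks, fixed!
If I understand correctly, this should extend the lists to a common length? However, when I run it, it does not actually change anything.
This is an elegant and flexible solution, and a good place to use recursion effectively. I am testing it, and it seems to work. I have identified most of the cases where the structure deviates and will handle those beforehand.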