Python: split lines into paragraphs
Input: a list of lines.
Output: a list of lists of lines, i.e. the input list split at (one or more) empty lines.

This is the least ugly solution I have so far:
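def split_at_empty(lines):
    paragraphs = []
    p = []
    def flush():
        # nonlocal lets flush() rebind the outer p; without it, the assignment
        # p = [] makes p local to flush() and raises UnboundLocalError.
        nonlocal p
        if p:
            paragraphs.append(p)
            p = []
    for l in lines:
        if l:
            p.append(l)
        else:
            flush()
    flush()
    return paragraphs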
There must be a better solution (maybe even a functional one)! Anyone?
Example input list:
['','2','3','','5','6','7','8','','','11']
Output:
[['2','3'],['5','6','7','8'],['11']]
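For a functional take, the standard library's itertools.groupby can do the grouping on its own. This is a sketch, not one of the answers in this thread; it assumes only falsy lines (here, empty strings) count as separators, so a line holding just spaces would be treated as text.

```python
from itertools import groupby

def split_at_empty(lines):
    # groupby with key=bool clusters consecutive lines by truthiness:
    # runs of non-empty strings (key True) and runs of empty strings (key False).
    # Keeping only the True runs drops the separators, however many there are.
    return [list(group) for is_text, group in groupby(lines, key=bool) if is_text]

print(split_at_empty(['', '2', '3', '', '5', '6', '7', '8', '', '', '11']))
# [['2', '3'], ['5', '6', '7', '8'], ['11']]
```

Because groupby consumes its input lazily, this also accepts any iterable of lines, not only a list.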
You can join the list into a string and then split it again:
>>> a = ['', '2', '3', '', '5', '6', '7', '8', '', '', '11']
>>> [x.strip().split(' ') for x in ' '.join(a).split('  ')]
[['2', '3'], ['5', '6', '7', '8'], ['11']]
You should probably use a regular expression to catch any number of spaces (I added another empty line before '11').
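A minimal sketch of that regex idea, assuming no line itself contains a space (the space is reused as the joiner): each empty line adds one extra space, so splitting on runs of two or more spaces treats any number of consecutive empty lines as a single break.

```python
import re

def split_at_empty(lines):
    # Assumption: no line contains a space, since ' ' doubles as the separator.
    # strip() removes the spaces contributed by leading/trailing empty lines.
    joined = ' '.join(lines).strip()
    # A paragraph break shows up as a run of two or more spaces.
    return [part.split(' ') for part in re.split(r' {2,}', joined) if part]

a = ['', '2', '3', '', '5', '6', '7', '8', '', '', '', '11']  # extra '' before '11'
print(split_at_empty(a))
# [['2', '3'], ['5', '6', '7', '8'], ['11']]
```

The `if part` filter only matters when every line is empty, where the joined string strips down to ''.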
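Here is a generator-based solution:
def split_at_empty(lines):
    # Indices of the empty lines, bracketed by sentinels: -1 before the first
    # line (so a non-empty first line is kept) and len(lines) after the last.
    sep = [-1] + [i for (i, l) in enumerate(lines) if not l] + [len(lines)]
    for start, end in zip(sep[:-1], sep[1:]):
        # Skip the empty stretch between two adjacent separators.
        if start + 1 < end:
            yield lines[start + 1:end]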
It yields:
['2', '3']
['5', '6', '7', '8']
['11']
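Whether the leading sentinel in sep is 0 or -1 matters when the first line is not empty: with 0, the slice for the first paragraph starts at index 1 and silently drops line 0. A small check, using a hypothetical helper paragraphs_with_sentinel that mirrors the generator's slicing logic but returns a list:

```python
def paragraphs_with_sentinel(lines, first):
    # Hypothetical helper: same separator/slice logic as the generator,
    # with the leading sentinel passed in as `first`.
    sep = [first] + [i for i, l in enumerate(lines) if not l] + [len(lines)]
    return [lines[s + 1:e] for s, e in zip(sep[:-1], sep[1:]) if s + 1 < e]

lu = ['AAAAA', 'BB', '', 'HU', 'JU', 'GU']
print(paragraphs_with_sentinel(lu, 0))   # [['BB'], ['HU', 'JU', 'GU']]; 'AAAAA' is lost
print(paragraphs_with_sentinel(lu, -1))  # [['AAAAA', 'BB'], ['HU', 'JU', 'GU']]
```

The sample input in the question happens to start with '', so both sentinels give the same result there.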
Another way, working directly on the lists:
li = ['', '2', '3', '', '5', '6', '7', '8', '', '', '11']
lo = ['5055', '', '', '2', '54', '87', '', '1', '2', '5', '8', '', '']
lu = ['AAAAA', 'BB', '', 'HU', 'JU', 'GU']

def selines(L):
    ye = []
    for x in L:
        if x:
            ye.append(x)
        elif ye:
            # An empty line closes the current paragraph (if it has content).
            yield ye
            ye = []
    if ye:
        yield ye

for lx in (li, lo, lu):
    print(lx)
    print(list(selines(lx)))
    print()
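Because selines only iterates over its input, it also works on a lazy source such as a file, so a huge input never has to be loaded as a list. A sketch, assuming the text comes from a file-like object (io.StringIO stands in for a real open file here) and that the trailing newline is stripped so blank lines become '':

```python
import io

def selines(L):
    # Same logic as the answer above: collect non-empty lines, emit on a blank.
    ye = []
    for x in L:
        if x:
            ye.append(x)
        elif ye:
            yield ye
            ye = []
    if ye:
        yield ye

# Hypothetical file contents; a real file object would be used the same way.
text = io.StringIO("2\n3\n\n5\n6\n7\n8\n\n\n11\n")
# The generator expression keeps the pipeline lazy: only the current
# paragraph is held in memory, not the whole file.
for paragraph in selines(line.rstrip('\n') for line in text):
    print(paragraph)
```

Lines containing only spaces would still count as text here; stripping with line.strip() instead would treat them as blank, if that is the desired rule.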
Result:
['Princess Maria Amelia of Brazil (1831\x961853)']
['was the daughter of Dom Pedro I,', "founder of Brazil's independence and its first emperor,"]
['and Amelie of Leuchtenberg.']
["The only child from her father's second marriage,", 'Maria Amelia was born in France', "following Pedro I's 1831 abdication in favor of his son Dom Pedro II."]
['Before Maria Amelia was a month old, Pedro I left for Portugal', 'to restore its crown to his eldest daughter Dona Maria II.', "He defeated his brother Miguel I (who had usurped Maria II's throne),", 'only to die a few months later of tuberculosis.']
[['2', '3'], ['5', '6', '7', '8'], ['11']]
['', '2', '3', '', '5', '6', '7', '8', '', '', '11']
[['2', '3'], ['5', '6', '7', '8'], ['11']]
['5055', '', '', '2', '54', '87', '', '1', '2', '5', '8', '', '']
[['5055'], ['2', '54', '87'], ['1', '2', '5', '8']]
['AAAAA', 'BB', '', 'HU', 'JU', 'GU']
[['AAAAA', 'BB'], ['HU', 'JU', 'GU']]
Slightly less ugly than the original:
def split_at_empty(lines):
    r = [[]]
    for l in lines:
        if l:
            r[-1].append(l)
        else:
            r.append([])
    return [l for l in r if l]
(The last line removes the empty lists that would otherwise have been added.)
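Since the question asks for something functional, the same accumulate-into-the-last-list idea can also be written as a fold with functools.reduce. A sketch; note that step mutates the accumulator in place, so it is functional in shape rather than pure, and unlike the loop above it only opens a new list when the last one already has content:

```python
from functools import reduce

def split_at_empty(lines):
    def step(acc, line):
        # Extend the last paragraph, or open a new one on an empty line,
        # but only if the last one is non-empty (runs of blanks open one list).
        if line:
            acc[-1].append(line)
        elif acc[-1]:
            acc.append([])
        return acc
    paragraphs = reduce(step, lines, [[]])
    # Only the final list can be empty, so at most one needs dropping.
    return paragraphs if paragraphs[-1] else paragraphs[:-1]

print(split_at_empty(['', '2', '3', '', '5', '6', '7', '8', '', '', '11']))
# [['2', '3'], ['5', '6', '7', '8'], ['11']]
```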
For the list-comprehension obsessed:
def split_at_empty(L):
    return [L[start:end + 1] for start, end in zip(
        [n for n in range(len(L)) if L[n] and (n == 0 or not L[n - 1])],
        [n for n in range(len(L)) if L[n] and (n + 1 == len(L) or not L[n + 1])]
    )]
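To see how the two comprehensions pair up, here are the start and end indices they compute for the sample input. Every run of non-empty lines has exactly one start and one end, in order, so zip matches them correctly:

```python
L = ['', '2', '3', '', '5', '6', '7', '8', '', '', '11']

# A paragraph starts at a non-empty line whose predecessor is empty (or absent)...
starts = [n for n in range(len(L)) if L[n] and (n == 0 or not L[n - 1])]
# ...and ends at a non-empty line whose successor is empty (or absent).
ends = [n for n in range(len(L)) if L[n] and (n + 1 == len(L) or not L[n + 1])]

print(starts)  # [1, 4, 10]
print(ends)    # [2, 7, 10]
# The end index is inclusive, hence the slice L[s:e + 1].
print([L[s:e + 1] for s, e in zip(starts, ends)])
# [['2', '3'], ['5', '6', '7', '8'], ['11']]
```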
Or better:
def split_at_empty(lines):
    L = [i for i, a in enumerate(lines) if not a]
    return [lines[s + 1:e] for s, e in zip([-1] + L, L + [len(lines)])
            if e > s + 1]
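A quick way to gain confidence in index-juggling versions like this one is to compare them with a plain loop on random inputs. A sketch; split_or_better and split_loop are hypothetical names for a copy of the answer above and for a straightforward loop:

```python
import random

def split_or_better(lines):
    # Copy of the "Or better" answer above.
    L = [i for i, a in enumerate(lines) if not a]
    return [lines[s + 1:e] for s, e in zip([-1] + L, L + [len(lines)])
            if e > s + 1]

def split_loop(lines):
    # Reference implementation: a plain loop, same rule as selines.
    out, cur = [], []
    for line in lines:
        if line:
            cur.append(line)
        elif cur:
            out.append(cur)
            cur = []
    if cur:
        out.append(cur)
    return out

random.seed(0)  # fixed seed so the check is reproducible
for _ in range(1000):
    # Random lengths 0..12 mixing text and blanks, so leading/trailing blanks,
    # runs of blanks and the empty list can all come up.
    lines = [random.choice(['', 'a', 'b']) for _ in range(random.randint(0, 12))]
    assert split_or_better(lines) == split_loop(lines), lines
print("all agree")
```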
Post a sample of the input list.
@Jo So your "solution" doesn't work: the local p in flush() is responsible for "UnboundLocalError: local variable 'p' referenced before assignment". That's not serious.
@eyquem My bad. Too much JavaScript lately. To make it work, we have to make it a bit uglier.
@Jo OK, you're a good guy.
I thought of this, but don't you think it's a bit convoluted, with too much overhead and long lines, the only benefit being fewer lines?
Maybe this is the best one! Using a generator makes it cleaner, thanks. Now this is an accepted answer. Please edit it to be concise and clear, and you'll hear back from me :)
@Jo Hello, I thought carefully about your comment. I corrected the code above two days ago, and today I also corrected my other answers, because you are right: I tend to write answers that are too long. I even deleted the comments I had just written here, having no wish to clutter StackOverflow with my useless comments. Thank you for pointing out my shortcomings; I'll keep it in mind.
Not bad, really! The overhead is OK. I rather like this one for its simplicity. The only problem is if the input list is huge.
Unfortunately, the first one is wrong. The second one is good, but doesn't make much use of list comprehensions.
Both work with the original sample input (and other inputs). What input list are you using?
Sorry, you're right, but using L and lines with different indices is a bit confusing. Also, L[n] and not L[n+1] are easier to read than L[n] != '' and L[n+1] == ''. (Maybe you want to roll back my change.)