Python: splitting lines into paragraphs

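Input: a list of lines

Output: a list of lists of lines, i.e. the input list split at (one or more) empty lines

This is the least ugly solution I have so far: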

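def split_at_empty(lines):
    paragraphs = []
    p = []
    def flush():
        if p:
            paragraphs.append(p)
        # Note: assigning to p here makes p local to flush(), so the
        # "if p" test above raises UnboundLocalError (see the comments).
        p = []
    for l in lines:
        if l:
            p.append(l)
        else:
            flush()
    flush()
    return paragraphs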
There must be a better solution (maybe even a functional one)! Anyone?

Sample input list:

['','2','3','','5','6','7','8','','','11']
Output:

[['2','3'],['5','6','7','8'],['11']]

You can join the list into a string and then split it again:

>>> a = ['', '2', '3', '', '5', '6', '7', '8', '', '', '11']
>>> [x.strip().split(' ') for x in ' '.join(a).split('  ')]
[['2', '3'], ['5', '6', '7', '8'], ['11']]
You should probably use a regular expression to catch any number of spaces (I added another one before '11').
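The regular-expression variant suggested above could look roughly like the sketch below. It is not the answerer's code: the name split_with_regex is made up for illustration, and the lines are joined with newlines instead of spaces (an assumption that no line contains a newline) so that lines containing spaces survive the round trip.

```python
import re

def split_with_regex(lines):
    # Join with newlines (assumption: no line contains '\n' itself),
    # then drop separators at both ends so no empty chunk appears.
    joined = "\n".join(lines).strip("\n")
    if not joined:
        return []
    # Two or more consecutive newlines mean at least one empty line in between.
    return [chunk.split("\n") for chunk in re.split(r"\n{2,}", joined)]

# Three empty lines before '11' this time
a = ['', '2', '3', '', '5', '6', '7', '8', '', '', '', '11']
print(split_with_regex(a))
```

Unlike splitting on exactly two spaces, the quantifier {2,} absorbs any run of empty lines, so no empty paragraph slips through.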



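Here is a generator-based solution:

def split_at_empty(lines):
    # Indices of the empty lines, with sentinels just before the first
    # line (-1) and just after the last one (len(lines)).
    sep = [-1] + [i for (i, l) in enumerate(lines) if not l] + [len(lines)]
    for start, end in zip(sep[:-1], sep[1:]):
        # Skip gaps between adjacent empty lines.
        if start + 1 < end:
            yield lines[start+1:end]
It yields

['2', '3']
['5', '6', '7', '8']
['11']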

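Another way is to work directly on the list:

li = ['', '2', '3', '', '5', '6', '7', '8', '', '', '11']

lo = ['5055', '', '', '2', '54', '87', '', '1', '2', '5', '8', '', '']

lu = ['AAAAA', 'BB', '', 'HU', 'JU', 'GU']

def selines(L):
    ye = []  # lines of the paragraph currently being built
    for x in L:
        if x:
            ye.append(x)
        elif ye:
            # An empty line after a non-empty paragraph: emit it, start a new one.
            yield ye
            ye = []
    if ye:
        yield ye

for lx in (li, lo, lu):
    print(lx)
    print(list(selines(lx)))
    print()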
Result:

['', '2', '3', '', '5', '6', '7', '8', '', '', '11']
[['2', '3'], ['5', '6', '7', '8'], ['11']]

['5055', '', '', '2', '54', '87', '', '1', '2', '5', '8', '', '']
[['5055'], ['2', '54', '87'], ['1', '2', '5', '8']]

['AAAAA', 'BB', '', 'HU', 'JU', 'GU']
[['AAAAA', 'BB'], ['HU', 'JU', 'GU']]
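
Because this kind of generator only iterates over its input once, the same idea can be applied to a file or any other stream without loading everything into a list first. The sketch below is an illustration under assumptions that are not in the thread: the name paragraphs is hypothetical, each line is expected to end in a newline as when reading a file, and whitespace-only lines are treated as empty.

```python
import io

def paragraphs(stream):
    block = []  # lines of the paragraph currently being built
    for raw in stream:
        line = raw.rstrip("\n")
        if line.strip():
            block.append(line)
        elif block:
            # Blank line after a non-empty paragraph: emit it, start over.
            yield block
            block = []
    if block:
        yield block

# io.StringIO stands in for an open text file here.
text = io.StringIO("2\n3\n\n5\n6\n7\n8\n\n\n11\n")
for para in paragraphs(text):
    print(para)
```

Only one paragraph is held in memory at a time, which matters when the input is very large.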

Slightly less ugly than the original:

def split_at_empty(lines):
    r = [[]]
    for l in lines:
        if l:
            r[-1].append(l)
        else:
            r.append([])
    return [l for l in r if l]

(The last line removes the empty lists that would otherwise be left in the result.)


And for the list-comprehension obsessed:

def split_at_empty(L):
    # First list: indices where a paragraph starts; second: where it ends.
    return [L[start:end+1] for start, end in zip(
        [n for n in range(len(L)) if L[n] and (n == 0 or not L[n-1])],
        [n for n in range(len(L)) if L[n] and (n+1 == len(L) or not L[n+1])]
        )]
Or better:

def split_at_empty(lines):
    # L holds the indices of the empty lines.
    L = [i for i, a in enumerate(lines) if not a]
    return [lines[s + 1:e] for s, e in zip([-1] + L, L + [len(lines)])
            if e > s + 1]
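
Since the question asks for something more functional, one more option is worth sketching. It does not come from any of the answers here: it uses itertools.groupby from the standard library to group consecutive lines by truthiness and keeps only the non-empty groups. The name split_with_groupby is made up for illustration.

```python
from itertools import groupby

def split_with_groupby(lines):
    # groupby yields (key, group) pairs for runs of consecutive items with
    # the same key; with key=bool the runs alternate between empty and
    # non-empty lines, and only the non-empty runs are kept.
    return [list(group) for non_empty, group in groupby(lines, key=bool) if non_empty]

print(split_with_groupby(['', '2', '3', '', '5', '6', '7', '8', '', '', '11']))
```

It accepts any iterable, not just a list, because nothing is indexed or sliced.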



Post a sample of your input list.

@Jo So your "solution" doesn't work: the local p in flush() is responsible for UnboundLocalError: local variable 'p' referenced before assignment. That's not serious.

@eyquem. My bad. Too much immersed in JavaScript. To make it work we have to make it a bit uglier.

@Jo Well, you're a nice person.

I thought about this, but don't you think it's a bit complicated, with too much overhead and long lines, the only benefit being fewer lines?

Maybe this is the best one! Using a generator makes it cleaner, thanks.

Now, this is the accepted answer. Please edit it to be concise and clear, and you'll get the rep :)

@Jo Hello, I thought your comment over. I corrected the code above two days ago. Today I also corrected my other answers. Because you're right, I tend to write answers that are too long. I even deleted the comments I had written here earlier; there is no point in cluttering stackoverflow's memory with my useless comments. Thanks for pointing out my shortcoming, I'll keep it in mind.

Not bad, really! The overhead is ok. I rather like the simplicity. The only problem would be if the input list is huge.

Unfortunately, the first one is wrong. The second is fine, but doesn't make good use of list comprehensions.

Both work with the original sample input (and others). What input list are you using?

Sorry, you're right, but the lines using L with different indices are a bit confusing. Also, L[n] and not L[n+1] is easier to read than L[n] != '' and L[n+1] == '' (maybe you want to revert my change).
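
The UnboundLocalError discussed in the comments comes from the assignment p = [] inside flush(), which makes p a local name of the nested function. A minimal sketch of one way to repair the original code is shown below. It assumes Python 3, where the nonlocal statement is available; the thread itself predates that and does not show this fix.

```python
def split_at_empty(lines):
    paragraphs = []
    p = []

    def flush():
        # nonlocal (Python 3 only) lets the nested function rebind the
        # p of the enclosing function instead of creating a new local.
        nonlocal p
        if p:
            paragraphs.append(p)
        p = []

    for l in lines:
        if l:
            p.append(l)
        else:
            flush()
    flush()
    return paragraphs

print(split_at_empty(['', '2', '3', '', '5', '6', '7', '8', '', '', '11']))
```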