Python: combine multiple lines when a line doesn't end with a dot

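I'm cleaning up some subtitle text, and I'm looking for a way to merge separate lines into one whenever a line doesn't end with a ".". Consider this example: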

No, he's building one actually. 
Baller. 
Anyway, he's also offering a hundred k to people willing to skip or drop out of college
To pursue their idea.

I'd like to convert it to:

No, he's building one actually. 
Baller. 
Anyway, he's also offering a hundred k to people willing to skip or drop out of college to pursue their idea.

so that every line ends with a dot. Do you have any suggestions on how to achieve this?

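One option is to use a lookbehind to assert, for example, a lowercase character a-z, then match the newline and capture an uppercase character A-Z at the start of the next line in a capture group.

In the replacement, call lower() on the capture group and prepend a space. Because the matched newline is not part of the replacement, the two lines end up joined.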

import re

s = ("No, he's building one actually. \n"
                "Baller. \n"
                "Anyway, he's also offering a hundred k to people willing to skip or drop out of college\n"
                "To pursue their idea.")

s = re.sub(r"(?<=[a-z])\r?\n([A-Z])", lambda x: " " + x.group(1).lower(), s)
print(s)

Another option, to make the match a bit broader, is to assert any non-whitespace character except a dot on the left, using [^\s.]:

(?<=[^\s.])\r?\n([A-Z])
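
To show what the broader lookbehind buys you, here is a minimal sketch. The function name join_broken_lines and the sample text are made up for illustration; the pattern is the one shown above. A line that ends with a digit is not joined by the [a-z] version, but it is joined by the [^\s.] version.

```python
import re

# Broader variant: any non-whitespace character except a dot may precede the newline
BROAD = re.compile(r"(?<=[^\s.])\r?\n([A-Z])")
# Narrow variant from the first snippet: only a lowercase letter may precede the newline
NARROW = re.compile(r"(?<=[a-z])\r?\n([A-Z])")


def join_broken_lines(text, pattern=BROAD):
    """Replace a qualifying newline with a space and lowercase the next letter."""
    return pattern.sub(lambda m: " " + m.group(1).lower(), text)


sample = "He raised 100\nMillion dollars.\nDone."

print(join_broken_lines(sample, NARROW))  # unchanged: "0" is not in [a-z]
print(join_broken_lines(sample, BROAD))   # the first two lines are joined
```

Note that both variants still require the newline to come directly after the last visible character, so a line with trailing spaces is left alone.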

Code:

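with open("your_file_name.txt", "r") as f:
    lines = f.readlines()

list_1 = []
list_2 = []

# Strip the trailing newline (and any trailing spaces) from every line
for i in lines:
    list_1.append(i.rstrip())

count = 0
not_add_count = None  # index of a line that was already merged into the previous one
for i in list_1:
    if count != not_add_count:
        # Keep the line as is if it ends with a dot or there is no next line to merge
        if i.endswith(".") or count + 1 >= len(list_1):
            list_2.append(i)
        else:
            # Merge with the next line, lowercasing only its first letter
            nxt = list_1[count + 1]
            list_2.append(i + " " + nxt[:1].lower() + nxt[1:])
            not_add_count = count + 1
    count += 1

for i in list_2:
    print(i)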

Summary:

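We first strip the newline characters from the lines. We then put every sentence that ends with a dot into a separate list, unless that sentence follows one that does not end with a dot, in which case it is appended to the previous sentence; the purpose of not_add_count is to mark the position of such a sentence. Finally, we print the filtered result.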

Note:

The purpose of this answer is just to show another way.
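
The loop above joins a line with the one that follows it. If a sentence may be spread over three or more lines, one possible variation is to keep an accumulator. This is only a sketch, not the answer's method: the function name merge_lines and the variable names are made up for illustration, and it also assumes that blank lines should be skipped.

```python
def merge_lines(lines):
    """Join consecutive lines until one of them ends with a dot."""
    merged = []
    pending = ""  # text collected so far that does not yet end with a dot
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no text and are skipped
        if pending:
            # Continuation line: lowercase only its first character
            line = pending + " " + line[:1].lower() + line[1:]
        if line.endswith("."):
            merged.append(line)
            pending = ""
        else:
            pending = line
    if pending:
        merged.append(pending)  # keep trailing text that never reached a dot
    return merged


sample = [
    "No, he's building one actually. \n",
    "Baller. \n",
    "Anyway, he's also offering a hundred k to people willing to skip or drop out of college\n",
    "To pursue their idea.",
]

for line in merge_lines(sample):
    print(line)
```

To use it on a file, pass the open file object (or the result of readlines()) as lines.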

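Thanks for the tip! I loaded the text from a txt file using ".read()", and when I run your command things don't get fixed... am I doing something wrong?

@albus_c I tested it with: with open('yourfile', 'r') as f: s = f.read() s = re.sub(r"(?...

You don't need \r?, because Python ... by default

@DillonDavis Thanks for the comment, I did not know that. Since it is optional, it should not matter in this case.

@Thefourthbird There are some empty lines, which I handle with text = os.linesep.join([s for s in text.splitlines() if s]). That is the only difference from your script (and I still don't get the result I'd like to see).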
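
The thread does not say why the pattern failed on the commenter's file. One possible cause, stated here purely as an assumption, is trailing whitespace before the newline, which stops the lookbehind from matching. The sketch below is a variation on the pattern from the answer that tolerates trailing spaces or tabs and blank lines; the name join_lines_tolerant and the sample text are hypothetical.

```python
import re

# Assumption: a line may end in spaces or tabs, and blank lines may sit between the parts.
# The lookbehind still requires a non-dot, non-whitespace character before that whitespace.
TOLERANT = re.compile(r"(?<=[^\s.])[ \t]*(?:\r?\n[ \t]*)+([A-Z])")


def join_lines_tolerant(text):
    """Join a line that does not end with a dot to the next non-blank line."""
    return TOLERANT.sub(lambda m: " " + m.group(1).lower(), text)


sample = "skip or drop out of college  \n\nTo pursue their idea. \nBaller."
print(join_lines_tolerant(sample))
```

Lines that end with a dot followed by spaces are left untouched, because the character before the whitespace is a dot.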