Python: join multiple lines when a line does not end with a dot


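I am cleaning up some subtitle text, and I am looking for a way to merge separate lines into one line when they do not end with ".". Consider this example: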

No, he's building one actually. 
Baller. 
Anyway, he's also offering a hundred k to people willing to skip or drop out of college
To pursue their idea.

I would like to convert it to

No, he's building one actually. 
Baller. 
Anyway, he's also offering a hundred k to people willing to skip or drop out of college to pursue their idea.

so that every line ends with a dot. Do you have any suggestions on how to achieve this?

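One option is to use a lookbehind to assert, for example, a lowercase character a-z, then match the newline and capture an uppercase character A-Z at the start of the next line in a capture group.

In the replacement, call lower() on the capture group and prepend a space, so that the matched newline is replaced by that space.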

import re

s = ("No, he's building one actually. \n"
     "Baller. \n"
     "Anyway, he's also offering a hundred k to people willing to skip or drop out of college\n"
     "To pursue their idea.")

s = re.sub(r"(?<=[a-z])\r?\n([A-Z])", lambda x: " " + x.group(1).lower(), s)
print(s)

Another option is to assert, to the left, any non-whitespace character other than a dot, [^\s.], which makes the match a bit broader:

(?<=[^\s.])\r?\n([A-Z])
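
To illustrate how the broader pattern differs from the a-z lookbehind, here is a minimal sketch. The function name `join_unfinished_lines` and the sample text are made up for this illustration; only the pattern itself comes from the answer above.

```python
import re

def join_unfinished_lines(text):
    # Replace a line break with a space when the character before it is
    # neither whitespace nor a dot and the next line starts with A-Z.
    # The captured uppercase letter is lowercased in the replacement.
    return re.sub(
        r"(?<=[^\s.])\r?\n([A-Z])",
        lambda m: " " + m.group(1).lower(),
        text,
    )

# Hypothetical sample: lines ending in a digit and in a comma.
sample = "He offered 100\nTo every student,\nWho applied.\nBaller.\nAnyway."
result = join_unfinished_lines(sample)
print(result)
```

With the narrower `(?<=[a-z])` lookbehind, the lines ending in `100` and in `,` would not be joined, because neither ends in a lowercase letter.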

Code:
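with open("your_file_name.txt", "r") as f:
    file = f.readlines()

list_1 = []
list_2 = []

# Strip the line break and any trailing spaces, and skip empty lines.
for i in file:
    i = i.rstrip()
    if i:
        list_1.append(i)

count = 0
not_add_count = None  # index of a line already merged into the previous one
for i in list_1:
    if count != not_add_count:
        # Keep the line as-is if it ends with a dot or if it is the last line.
        if i[-1] == "." or count + 1 == len(list_1):
            list_2.append(i)
        else:
            # Join with the next line, lowercasing only its first letter.
            nxt = list_1[count + 1]
            list_2.append(i + " " + nxt[0].lower() + nxt[1:])
            not_add_count = count + 1
    count += 1

for i in list_2:
    print(i)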

Summary:
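We first strip the line breaks from the lines. We then put each sentence that ends with a dot into a separate list, unless it follows a sentence that does not end with a dot, in which case it is appended to that sentence instead; the purpose of not_add_count is to mark the position of such a sentence. Finally, we print everything that remains after this filtering.
Note: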
The purpose of this answer is only to show another way.
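
The loop in this answer joins a line with only the one line that follows it. As a rough sketch of how the same idea could handle runs of several unfinished lines, the variant below keeps a buffer until a dot is reached. The name `merge_lines` and the buffer approach are assumptions made for this illustration and are not part of the original answer.

```python
def merge_lines(lines):
    """Join consecutive lines until the accumulated text ends with a dot."""
    merged = []
    pending = ""  # text of the sentence that has not ended yet
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore empty lines
        if pending:
            # Continue the open sentence; lowercase only the first letter.
            pending += " " + line[0].lower() + line[1:]
        else:
            pending = line
        if pending.endswith("."):
            merged.append(pending)
            pending = ""
    if pending:
        merged.append(pending)  # the last sentence had no closing dot
    return merged

sample = [
    "No, he's building one actually. \n",
    "Baller. \n",
    "Anyway, he's also offering a hundred k to people willing to skip or drop out of college\n",
    "To pursue their idea.",
]
for line in merge_lines(sample):
    print(line)
```

Note that lowercasing the first letter is a simplification: a continuation line starting with a proper noun or "I" would be lowercased as well.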
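Thanks for the tip! I loaded the text from a txt file using .read(), and when I run your command nothing gets fixed... am I doing something wrong?
@albus_c I have tested it with: with open('yourfile', 'r') as f: s = f.read() followed by s = re.sub(r"(?…
You don't need the \r?, because Python takes care of that by default.
@DillonDavis Thanks for your comment, I did not know that. Since it is optional, it should not matter in this case.
@Thefourthbird There are some empty lines, which I am handling with text = os.linesep.join([s for s in text.splitlines() if s]). This is the only difference from your script (and I still do not get the result I hope to see).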
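
The thread does not say why the substitution fails on the asker's file. One possibility, offered here only as an assumption, is that trailing whitespace or leftover blank lines sit between the last word and the line break, so the lookbehind never matches. The sketch below normalises the text before applying the pattern from the answer; `normalize` and `join_lines` are hypothetical helper names.

```python
import re

def normalize(text):
    # Drop blank lines, strip trailing whitespace, and join with plain "\n"
    # so that the pattern does not have to deal with "\r\n" or stray spaces.
    return "\n".join(
        line.rstrip() for line in text.splitlines() if line.strip()
    )

def join_lines(text):
    # Apply the broader lookbehind pattern to the normalised text.
    return re.sub(
        r"(?<=[^\s.])\n([A-Z])",
        lambda m: " " + m.group(1).lower(),
        normalize(text),
    )

# Hypothetical input with trailing spaces, a blank line and Windows line ends.
raw = "Anyway, he's also offering a hundred k  \r\n\r\nTo pursue their idea.\r\n"
print(join_lines(raw))
```

Joining with "\n" instead of os.linesep keeps the in-memory text independent of the platform; Python translates line endings again when the text is written to a file opened in text mode.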