Python: combine lines when a line doesn't end with a dot
I'm cleaning up some subtitle text, and I'm looking for a way to merge lines into one when they don't end with a ".". Consider this example:
No, he's building one actually.
Baller.
Anyway, he's also offering a hundred k to people willing to skip or drop out of college
To pursue their idea.
I'd like to convert it to:
No, he's building one actually.
Baller.
Anyway, he's also offering a hundred k to people willing to skip or drop out of college to pursue their idea.
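so that every line ends with a dot. Do you have any suggestions on how to achieve this?
One option is to use a lookbehind to assert, for example, a lowercase character a-z, then match the newline and capture an uppercase character A-Z on the next line. In the replacement, call lower() on the captured group and prepend a space, so that the matched newline is not part of the replacement.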
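import re

s = ("No, he's building one actually. \n"
     "Baller. \n"
     "Anyway, he's also offering a hundred k to people willing to skip or drop out of college\n"
     "To pursue their idea.")

# The lookbehind requires a lowercase letter right before the newline; the
# capital letter on the next line is captured, lowercased and joined with a space
s = re.sub(r"(?<=[a-z])\r?\n([A-Z])", lambda x: " " + x.group(1).lower(), s)
print(s)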
Another option, to make it a bit broader, is to assert on the left any non-whitespace character except a dot, using [^\s.], for example:
(?<=[^\s.])\r?\n([A-Z])
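To see what the broader lookbehind changes, the sketch below runs both patterns on a hypothetical input whose continued line ends with a comma instead of a letter. The sample text and the `join` helper are made up for illustration.

```python
import re

# Hypothetical sample: the first line ends with a comma, not a letter
s = ("Anyway, he's also offering a hundred k to people,\n"
     "To pursue their idea.\n"
     "Baller.")

# Original pattern: only joins when the line ends with a lowercase letter
narrow = r"(?<=[a-z])\r?\n([A-Z])"
# Broader pattern: joins when the line ends with any non-whitespace
# character other than a dot
broad = r"(?<=[^\s.])\r?\n([A-Z])"


def join(pattern, text):
    # Replace the newline with a space and lowercase the captured capital
    return re.sub(pattern, lambda m: " " + m.group(1).lower(), text)


print(join(narrow, s))
print(join(broad, s))
```

Only the broader pattern joins the comma-terminated line; both leave the line after "idea." alone, because a dot is excluded in either case.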
Code:
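# Read the file and drop the trailing "\n" from each line
with open("your_file_name.txt", "r") as f:
    list_1 = [line.rstrip("\n") for line in f]

list_2 = []
count = 0
not_add_count = None  # index of a line already merged into the previous one
for i in list_1:
    if count != not_add_count:
        # endswith() is safe on empty lines, where i[-1] would raise IndexError;
        # the last line has no successor, so it is kept as-is
        if i.endswith(".") or count + 1 == len(list_1):
            list_2.append(i)
        else:
            # Append the next line (lowercased) to this one; only two lines are joined
            list_2.append(i + " " + list_1[count + 1].lower())
            not_add_count = count + 1
    count += 1

for i in list_2:
    print(i)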
Summary:
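We first split the subtitles into lines and strip the newline from each. Sentences that end with a dot go into a separate list, unless the sentence follows one that does not end with a dot, in which case it has already been appended to that previous sentence; the purpose of not_add_count is to record the position of such a sentence so it is skipped. Finally, we print the cleaned-up result.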
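The answer's code joins at most two lines, so a sentence spread over three or more subtitle lines still ends up split. A minimal sketch of a buffer-based variant follows, assuming it is acceptable to skip blank lines and to lowercase only the first character of each continuation line (the answer lowercases the whole line). `merge_lines` is a hypothetical helper name.

```python
def merge_lines(lines):
    """Join consecutive lines until the accumulated text ends with a dot."""
    merged = []
    buffer = ""
    for line in lines:
        line = line.rstrip()  # drop the newline and any trailing spaces
        if not line:
            continue  # skip blank lines
        if buffer:
            # Continuation line: join with a space, lowercase only its first character
            buffer += " " + line[0].lower() + line[1:]
        else:
            buffer = line
        if buffer.endswith("."):
            merged.append(buffer)
            buffer = ""
    if buffer:
        merged.append(buffer)  # text that never reached a dot is kept anyway
    return merged


subtitles = [
    "No, he's building one actually. \n",
    "Baller.\n",
    "Anyway, he's also offering a hundred k to people willing to skip or drop out of college\n",
    "To pursue their idea.\n",
]
for line in merge_lines(subtitles):
    print(line)
```

Because it accepts any iterable of strings, an open file object can be passed directly, e.g. merge_lines(f) inside a with open(...) as f: block. Lowercasing only the first character avoids mangling words like "I" or names in the middle of a continuation line.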
Note:
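This answer is only meant to show another way of doing it.
Thanks for the tip! I loaded the text from a .txt file with .read(), but when I run your command nothing changes... Am I doing something wrong?
@albus_c I've tested it with with open('yourfile','r') as f: s=f.read() followed by s=re.sub(r"(?…
You don't need the \r?, since Python normalizes line endings by default.
@DillonDavis Thanks for the comment, I didn't know that. Since it is optional, it shouldn't matter in this case.
@Thefourthbird There are some empty lines, which I'm handling with text = os.linesep.join([s for s in text.splitlines() if s]). That's the only difference from your script (and I still don't get the result I'd hope to see).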
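The comments touch on two details: the optional \r? and removing blank lines. The sketch below writes a temporary file so it runs anywhere, shows that reading in text mode turns "\r\n" into "\n" (universal newlines), drops blank lines with splitlines(), and uses a pattern that also tolerates spaces or tabs before the newline. The sample data and the trailing-space handling are assumptions about what subtitle files may contain, not a diagnosis of the problem described in the comment.

```python
import os
import re
import tempfile

# Hypothetical sample with Windows line endings, a blank line and a
# trailing space after "college"
raw = (b"Baller.\r\n"
       b"\r\n"
       b"Anyway, he's also offering a hundred k to people willing to skip or drop out of college \r\n"
       b"To pursue their idea.\r\n")
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# Text mode uses universal newlines by default: "\r\n" is read back as "\n"
with open(path, "r") as f:
    s = f.read()
os.remove(path)

# Drop blank lines; join with "\n" rather than os.linesep so the result
# does not depend on the platform
s = "\n".join(line for line in s.splitlines() if line.strip())

# Allow optional spaces/tabs between the last non-dot character and the newline
pattern = r"(?<=[^\s.])[ \t]*\n([A-Z])"
s = re.sub(pattern, lambda m: " " + m.group(1).lower(), s)
print(s)
```

Joining with os.linesep reintroduces "\r\n" on Windows; the answer's original pattern copes with that through \r?, but a trailing space before the newline defeats both lookbehinds unless it is allowed for explicitly, as above.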