Problem in Python: splitting part of a text on sentence-ending characters


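I want to analyze an XML file. Part of my program splits the data into sentences, but my sentence-ending characters disappear... I need them so that I can add annotations with XML tags at the beginning and end of each sentence.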

Currently I have:

import re

line_end_chars = "!", "?", ".",">"


regexPattern = '|'.join(map(re.escape, line_end_chars))

line_list = re.split(regexPattern, texte)

The problem: if I run this code on the text

" Je pense que cela est compliqué de coder. Où puis-je apprendre?"

it gives me:

["Je pense que cela est compliqué de coder",
"Où puis-je apprendre"]

instead of what I want, which is:

["Je pense que cela est compliqué de coder.",
"Où puis-je apprendre?"]

I could then use .replace to add my XML tags.
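For context on why the characters vanish: re.split discards whatever the pattern matches, unless the pattern contains a capturing group, in which case the matched text is returned as separate items in the result. A minimal sketch of the difference, using the question's own pattern (the variable names `plain`, `captured`, `without_delims` and `with_delims` are illustrative only):

```python
import re

line_end_chars = "!", "?", ".", ">"
text = "Je pense que cela est compliqué de coder. Où puis-je apprendre?"

# Plain alternation: the matched terminators are discarded by re.split.
plain = "|".join(map(re.escape, line_end_chars))
without_delims = re.split(plain, text)

# Same alternation inside a capturing group: re.split now returns the
# terminators as separate items between the text pieces.
captured = "(" + plain + ")"
with_delims = re.split(captured, text)

print(without_delims)
print(with_delims)
```

Note that both results end with an empty string, because the text finishes on a terminator, and that the pieces keep the whitespace that followed the previous terminator.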

One possible solution is to use re.sub instead of re.split, and then str.splitlines():
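A minimal sketch of that approach, assuming the text contains no newlines of its own. The exact pattern is an assumption and not necessarily the one this answer used:

```python
import re

text = " Je pense que cela est compliqué de coder. Où puis-je apprendre?"

# Put a newline after every terminator (kept via the \1 backreference),
# swallowing the whitespace that follows it, then split on those newlines.
marked = re.sub(r"([!?.>])\s*", r"\1\n", text.strip())
line_list = marked.splitlines()

print(line_list)
```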

Prints:

['Je pense que cela est compliqué de coder.', 'Où puis-je apprendre?']

I can think of two ways to do this:

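import re

# Method 1) pair each piece from re.split with the delimiter that followed it
line_end_chars = "!", "?", ".", ">"
regexPattern = '|'.join(map(re.escape, line_end_chars))
s = "Je pense que cela est compliqué de coder. Où puis-je apprendre?"
linelist = []

# zip stops at the shorter list, so the trailing empty piece is ignored
for substr, delim in zip(re.split(regexPattern, s), re.findall(regexPattern, s)):
    # strip() removes the space left over after the previous delimiter
    linelist.append((substr + delim).strip())
print(linelist)

# Method 2) scan character by character, closing a sentence at each delimiter
line_end_chars = ["!", "?", ".", ">"]
s = "Je pense que cela est compliqué de coder. Où puis-je apprendre?"
linelist = []

temp_str = ""
for c in s:
    if c in line_end_chars:
        linelist.append((temp_str + c).strip())
        temp_str = ""
    else:
        temp_str += c
print(linelist)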

Both print:

['Je pense que cela est compliqué de coder.', 'Où puis-je apprendre?']
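Since the stated goal is to wrap each sentence in XML tags, the split and the tagging can also be done in one pass. This is only a sketch: the helper name `tag_sentences` and the tag name `s` are hypothetical, and any trailing text without a terminator is dropped.

```python
import re
from xml.sax.saxutils import escape

def tag_sentences(text, tag="s"):
    # A sentence is a run of non-terminator characters followed by one terminator.
    sentences = re.findall(r"[^!?.>]+[!?.>]", text)
    # escape() protects &, < and > so the result stays well-formed XML.
    return "".join(f"<{tag}>{escape(s.strip())}</{tag}>" for s in sentences)

tagged = tag_sentences("Je pense que cela est compliqué de coder. Où puis-je apprendre?")
print(tagged)
```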

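Thanks for your help, it works great!!! Now I can finish my program more efficiently.

Thank you, I tried your methods too and they work great as well! I am saving all of them to learn from.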
