Python 正则表达式将字符添加到匹配的字符串_Python_Regex_Nlp

Python 正则表达式将字符添加到匹配的字符串

python regex nlp

Python 正则表达式将字符添加到匹配的字符串,python,regex,nlp,Python,Regex,Nlp,我有一个很长的字符串，它是一个段落，但是句点后没有空格。例如： para = "I saw this film about 20 years ago and remember it as being particularly nasty. I believe it is based on a true incident: a young man breaks into a nurses\' home and rapes, tortures and kills various women.It i

我有一个很长的字符串，它是一个段落，但是句点后没有空格。例如：

para = "I saw this film about 20 years ago and remember it as being particularly nasty. I believe it is based on a true incident: a young man breaks into a nurses\' home and rapes, tortures and kills various women.It is in black and white but saves the colour for one shocking shot.At the end the film seems to be trying to make some political statement but it just comes across as confused and obscene.Avoid."

我正在尝试使用re.sub来解决这个问题，但输出不是我所期望的

这就是我所做的：

re.sub("(?<=\.).", " \1", para)

re.sub

将匹配的字符替换为

\x01

，而不是匹配任何前面有句点的字符并在其前面添加空格。为什么？如何在匹配字符串之前添加字符？

您可以使用以下正则表达式（带有and断言）：

（？我想这就是你想要做的。你可以传入一个函数来进行替换
import re

def my_replace(match):
    return " " + match.group()

my_string = "dhd.hd hd hs fjs.hello"
print(re.sub(r'(?<=\.).', my_replace, my_string))

正如@Seanny123所指出的，这将添加一个空格，即使在该句点之后已经有空格。
该（？可以使用的最简单的正则表达式替换是：
re.sub(r'\.(?=\w)', '. ', para)

它只是简单地匹配每个句点，并使用前瞻，（？=\w）
确保句点后面有一个单词字符，而不是空格，并用
替换，稍微修改一下的正则表达式也可以：
print re.sub(r"([\.])([^\s])", r"\1 \2", para)

# I saw this film about 20 years ago and remember it as being particularly nasty. I believe it is based on a true incident: a young man breaks into a nurses' home and rapes, tortures and kills various women. It is in black and white but saves the colour for one shocking shot. At the end the film seems to be trying to make some political statement but it just comes across as confused and obscene. Avoid.

我认为问题应该是“但是经过一段时间后，没有空白。”因为在第一个句号之后有空格吗？@shash678是的，你是对的。我没有这么说，因为在我的情况下，有多个空格是可以的，我不想让问题复杂化。这里总是有一个便宜的答案：text=text.replace（“.”，“.”。replace（“+”，”）
（字符串连接是因为堆栈交换占用了双空格）。基本上，将每个句点替换为句点+空格，将每个句点+空格+空格替换为句点+单个空格。不需要正则表达式，也不必导入任何内容。如果已经存在空格，则此答案会添加一个空格。@seanny123 OP states“经期后没有空格".我们可以整天为要求争论。我在打电话，不想费心去完善这个。只是不要放弃，继续前进，伙计。对不起，关于语气和内容。我试图提供信息，搞砸了。我的错。@seanny123不，伙计，你很好。你是对的。我只是累了，在电话上，所以很难打出有意义的代表谎言。我会在我的amswer中添加一条，添加您的评论，以确保这是有价值的信息。@Seanny123是正确的，这个解决方案是有效的，因为在这种情况下，我可以使用多个空格。如果我们对这个解决方案进行一点推广，那么我们需要注意额外的空格。您的方法好得多！它减少了复杂性只需匹配一个字符即可实现exity。很有效，谢谢：）你不打算用这种方式将“…”替换为“…”。@AlbertoSantini很好的捕获，已更新以防止出现这种情况，谢谢！更简单的是：re.sub（r“\”（\S）”，r“\1”，para）+1以避免四处张望。
import re

def my_replace(match):
    return " " + match.group()

my_string = "dhd.hd hd hs fjs.hello"
print(re.sub(r'(?<=\.).', my_replace, my_string))

dhd. hd hd hs fjs. hello

re.sub(r'\.(?=[^ .])', '. ', para)

re.sub(r'\.(?=\w)', '. ', para)

print re.sub(r"([\.])([^\s])", r"\1 \2", para)

# I saw this film about 20 years ago and remember it as being particularly nasty. I believe it is based on a true incident: a young man breaks into a nurses' home and rapes, tortures and kills various women. It is in black and white but saves the colour for one shocking shot. At the end the film seems to be trying to make some political statement but it just comes across as confused and obscene. Avoid.