Python NLP - Separate punctuation only at the beginning and end of words


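I'm new to NLP and am trying basic preprocessing steps while I learn. I'm trying to separate punctuation from the beginning and end of words so the tokens can be used for embeddings. While doing this, I don't want to break words such as can't and I'm, because I'll handle those separately.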

s = "This is what I'm trying to do, but I can't figure out how."
Desired output:

s_separated = "This is what I'm trying to do , but I can't figure out how ."
Try this:

import re

# Insert a space between a word character and a following , . ! ; or :
s = "This is what I'm trying to do, but I can't figure out how."
res = re.sub(r'(?<=\w)(?=[,.!;:])', ' ', s)
print(res)
Can you include your current approach and where it fails? The regex for a word boundary is \b, so something like r'\b,' will find a comma at the end of a word.
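A short illustration of where \b sits relative to a comma, using the sentence from the question (the variable names are only for illustration). \b marks the edge between a word character and a non-word character, so a comma that ends a word has the boundary on its left, not its right.

```python
import re

s = "This is what I'm trying to do, but I can't figure out how."

# Boundary before the comma: "o" on the left, "," on the right, so this matches.
before = re.findall(r"\b,", s)
# Boundary after the comma would need a word character right after it;
# here a space follows, so there is no match.
after = re.findall(r",\b", s)

# Since \b only fires next to a word character, "\b" followed by a punctuation
# class means "punctuation directly after a word".
separated = re.sub(r"\b([,.!?;:])", r" \1", s)

print(before)      # [',']
print(after)       # []
print(separated)   # This is what I'm trying to do , but I can't figure out how .
```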