Split a string into sentences using Python


I have the following string:

string = 'This is one sentence  ${w_{1},..,w_{i}}$. This is another sentence. '

Now I want to split it into two sentences.

However, when I do this:

string.split('.')

I get:

['This is one sentence  ${w_{1},',
 '',
 ',w_{i}}$',
 ' This is another sentence',
 ' ']

Does anyone know how to improve this so that the "." inside $...$ is not detected?

Also, how would you handle this one:

string2 = 'This is one sentence  ${w_{1},..,w_{i}}$! This is another sentence. Is this a sentence? Maybe !  '

Edit 1:

The expected output would be:

For string 1:

['This is one sentence  ${w_{1},..,w_{i}}$','This is another sentence']

For string 2:

['This is one sentence  ${w_{1},..,w_{i}}$','This is another sentence', 'Is this a sentence', 'Maybe !  ']

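Split on '. ' (a period followed by a space), because that combination only occurs at the end of a sentence, not in the middle of one: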

string = 'This is one sentence  ${w_{1},..,w_{i}}$. This is another sentence. '

string.split('. ')

This returns:

['This is one sentence  ${w_{1},..,w_{i}}$', 'This is another sentence', '']
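The trailing empty string appears because the input ends with '. '. As a minimal sketch (the variable names `text` and `sentences` are only illustrative), the empty piece can be filtered out after splitting:

```python
text = 'This is one sentence  ${w_{1},..,w_{i}}$. This is another sentence. '

# Split on "period + space", then drop pieces that are empty
# or contain only whitespace, trimming the ones that remain.
sentences = [part.strip() for part in text.split('. ') if part.strip()]
print(sentences)
```

This only works as long as every sentence ends in a period followed by a space.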

For the more general case, you can use re.split as follows:

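import re
mystr = 'This is one sentence  ${w_{1},..,w_{i}}$. This is another sentence. '
re.split(r"[.!?]\s{1,}", mystr)
# ['This is one sentence  ${w_{1},..,w_{i}}$', 'This is another sentence', '']
str2 = 'This is one sentence  ${w_{1},..,w_{i}}$! This is another sentence. Is this a sentence? Maybe !  '
re.split(r"[.!?]\s{1,}", str2)
# ['This is one sentence  ${w_{1},..,w_{i}}$', 'This is another sentence', 'Is this a sentence', 'Maybe ', '']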

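The characters inside the brackets are the punctuation marks you choose to split on. The \s{1,} at the end requires at least one whitespace character after the punctuation mark, so any "." that is not followed by whitespace is ignored. This also handles the exclamation mark case.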
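The re.split approach can be wrapped in a small helper. This is a sketch under a few assumptions: the name `split_sentences` is hypothetical, empty pieces are dropped, and `(?:\s+|$)` is used instead of `\s{1,}` so that a final sentence with no trailing whitespace is also terminated.

```python
import re

def split_sentences(text):
    """Split on '.', '!' or '?' when followed by whitespace or end of string."""
    pieces = re.split(r'[.!?](?:\s+|$)', text)
    # Trim each piece and discard the empty strings left at the end.
    return [piece.strip() for piece in pieces if piece.strip()]

str1 = 'This is one sentence  ${w_{1},..,w_{i}}$. This is another sentence. '
str2 = 'This is one sentence  ${w_{1},..,w_{i}}$! This is another sentence. Is this a sentence? Maybe !  '

print(split_sentences(str1))
print(split_sentences(str2))
```

The dots inside `$...$` are left alone here only because none of them is followed by whitespace; math containing ". " would still be split.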

Here is a (somewhat hacky) way to get the punctuation back:

punct = re.findall(r"[.!?]\s{1,}", str2)
# ['! ', '. ', '? ', '!  ']
sent = [x + y for x, y in zip(re.split(r"[.!?]\s{1,}", str2), punct)]
sent
# ['This is one sentence  ${w_{1},..,w_{i}}$! ', 'This is another sentence. ', 'Is this a sentence? ', 'Maybe !  ']
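An alternative to zipping the pieces back together is to split only on the whitespace that follows a terminator, using a lookbehind so the punctuation stays attached to its sentence. This is a sketch of a different technique, not the one shown above; `split_keep_punct` is a hypothetical name.

```python
import re

def split_keep_punct(text):
    """Split at whitespace preceded by '.', '!' or '?', keeping the punctuation."""
    # (?<=[.!?]) asserts that the character before the whitespace is a
    # terminator without consuming it, so it remains part of the sentence.
    return re.split(r'(?<=[.!?])\s+', text.strip())

str2 = 'This is one sentence  ${w_{1},..,w_{i}}$! This is another sentence. Is this a sentence? Maybe !  '
print(split_keep_punct(str2))
```

Stripping the input first avoids an empty trailing element when the text ends with whitespace.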

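You can use re.findall with an alternation pattern. To make sure each sentence starts and ends with a non-whitespace character, use a positive lookahead at the start and a positive lookbehind at the end: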

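re.findall(r'((?=[^.!?\s])(?:\$.*?\$|[^.!?])*(?<=[^.!?\s]))\s*[.!?]', string)

The \$.*?\$ alternative consumes a whole $...$ span at once, so punctuation inside it is never treated as the end of a sentence. Both dollar signs must be escaped, since an unescaped $ is an end-of-string anchor and the pattern would then split inside the math.

For the second string, this returns:

['This is one sentence  ${w_{1},..,w_{i}}$', 'This is another sentence', 'Is this a sentence', 'Maybe']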
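The findall approach can be written as a commented, compiled pattern. This is a sketch that assumes inline math is always delimited by a pair of literal dollar signs; note that a literal `$` has to be written as `\$` in the pattern. The names `SENTENCE` and `find_sentences` are hypothetical.

```python
import re

SENTENCE = re.compile(
    r'((?=[^.!?\s])'        # sentence starts on a non-space, non-terminator
    r'(?:\$.*?\$|[^.!?])*'  # consume a whole $...$ span, or one non-terminator
    r'(?<=[^.!?\s]))'       # sentence ends on a non-space, non-terminator
    r'\s*[.!?]'             # optional whitespace, then the terminator itself
)

def find_sentences(text):
    """Return sentences without their terminators, treating $...$ as atomic."""
    return SENTENCE.findall(text)

string1 = 'This is one sentence  ${w_{1},..,w_{i}}$. This is another sentence. '
string2 = 'This is one sentence  ${w_{1},..,w_{i}}$! This is another sentence. Is this a sentence? Maybe !  '

print(find_sentences(string1))
print(find_sentences(string2))
```

Because the math span is matched as one unit, this variant does not depend on the dots inside `$...$` lacking a following space.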

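Something you should consider: in LaTeX, the proper ellipsis is \ldots, not literal dots such as `..`. You are using `.` as your delimiter, which is why it splits your string at every `.` it finds, regardless of its context within the string.

@ZeChanoLes Yes, of course. I will change that. However, the problem remains, because I have other cases where I cannot simply replace it.

Good idea! But what if the sentences are badly formatted? Maybe you could ignore everything between the $ signs? I'm not sure.

You could make sure all sentences are correctly formatted, so that whenever a sentence ends with a ".", there is a space before the next sentence starts. That way the formatting would never be wrong. Is that possible in your code?

Thank you very much for your answer, especially your last hack! Really cool. +1

Very nice! Thank you very much for your answer!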
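The idea raised in the comments, ignoring everything between the dollar signs, could be sketched by masking each math span before splitting and restoring it afterwards. This is only an illustration under stated assumptions: `split_outside_math` is a hypothetical helper, math is delimited by single `$` pairs, and the text contains no NUL characters (they serve as placeholder markers).

```python
import re

def split_outside_math(text):
    """Split on '.', '!' or '?' that lie outside $...$ spans."""
    spans = []

    def stash(match):
        # Remember the math span and leave a NUL-delimited index in its place.
        spans.append(match.group(0))
        return f'\x00{len(spans) - 1}\x00'

    masked = re.sub(r'\$.*?\$', stash, text)
    # With the math hidden, every remaining terminator ends a sentence,
    # even when no space follows it.
    parts = re.split(r'[.!?]', masked)

    def restore(piece):
        return re.sub(r'\x00(\d+)\x00', lambda m: spans[int(m.group(1))], piece)

    return [restore(part).strip() for part in parts if part.strip()]

print(split_outside_math('First $a.b$.Second one.'))
```

Unlike the whitespace-based patterns, this version also copes with sentences that are not separated by a space, at the cost of splitting on every terminator outside math, including ones in abbreviations or decimal numbers.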
