Python: parsing and replacing quotes

python regex parsing nlp


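I am trying to parse a text file so that I can run some statistics on it in Python. To do that, I want to replace certain punctuation marks with tokens. One example of such a token covers all punctuation that ends a sentence (for example, ? is replaced by a token). I managed to do this with regular expressions. Now I am trying to parse quotation marks, so I think I need a way to tell opening quotes from closing quotes. I am reading the input file line by line, and I cannot guarantee that the quotes will be balanced.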

For example:

 "Death to the traitors!" cried the exasperated burghers.
 "Go along with you," growled the officer, "you always cry the same thing over again. It is very tiresome."

should become:

 [Open] Death to the traitors! [Close] cried the exasperated burghers.
 [Open] Go along with you, [Close] growled the officer, [Open] you always cry the same thing over again. It is very tiresome. [Close]

Is it possible to do this with regular expressions? Is there a simpler or better way?

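You can use the sub function from the re module with a replacement callback. This only rewrites quotes that come in balanced pairs on the string you pass in: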

import re

def replace_dbquote(match):
    # group(1) is the text between one pair of double quotes
    return '[Open] ' + match.group(1) + ' [Close]'

text = '"Death to the traitors!" cried the exasperated burghers. "Go along with you", growled the officer.'
# Each "..." pair is rewritten; a lone, unpaired quote is left untouched
result = re.sub(r'"([^"]*)"', replace_dbquote, text)

print(result)
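
Since the question mentions reading the file line by line with quotes that may not balance within a line, a paired pattern can miss quotations that span lines. One possible alternative, sketched here under the assumption that quotes simply alternate between opening and closing across the whole file, is to toggle a flag on every double quote and carry that flag from one line to the next. The names tag_quotes, open_tag and close_tag are made up for this illustration.

```python
import re

def tag_quotes(lines, open_tag="[Open]", close_tag="[Close]"):
    """Yield each line with double quotes replaced by alternating tags.

    Assumption: quotes strictly alternate (open, close, open, ...).
    The open/closed state is kept between lines, so a quotation that
    spans several lines is still tagged consistently.
    """
    inside = False  # True while we are between an opening and a closing quote

    def swap(_match):
        nonlocal inside
        inside = not inside
        # After flipping: inside == True means this quote opened a quotation
        return f"{open_tag} " if inside else f" {close_tag}"

    for line in lines:
        yield re.sub('"', swap, line)


sample = [
    '"Death to the traitors!" cried the exasperated burghers.',
    '"Go along with you," growled the officer, "you always cry',
    'the same thing over again. It is very tiresome."',
]
for tagged in tag_quotes(sample):
    print(tagged)
```

With a file, the same generator could be fed the open file object directly, for example tag_quotes(f) inside a with open(...) block. The trade-off is that a single stray quote flips every tag after it, so this only helps when the text as a whole is balanced even if individual lines are not.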