What regex pattern in Python identifies in-text citations of the form '(Author name, year)'?

python regex pandas dataframe nlp

I have converted a list of tokenized sentences into a DataFrame. Now I need to filter the rows (sentences) that contain a citation.

Example DataFrame:

   sentences
1  This is my house
2  This is clear water(World Health organisation, 2018).
3  This house was built in 2000 
4  According to me (Sundar, 2015)it is good.

Expected output:

   sentences
1  This is clear water(World Health organisation, 2018).
2  According to me (Sundar, 2015)it is good.

I have been trying different patterns, such as r'[(]\w+,\d{4}[)]' and r'[(\w+\s+,\d{4}]'.

You can try:

print(df[df['sentences'].str.contains(r'\d{4}\)', regex=True)])

Or:

print(df[df['sentences'].str.contains(r'\w.+\(\w.+\d{4}\)', regex=True)])

Both produce:

                                               sentences
2  This is clear water(World Health organisation, 2018).
4              According to me (Sundar, 2015)it is good.
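
A pattern that only looks for four digits before a closing parenthesis will also match parenthesised numbers that are not citations. The sketch below is one possible tightening, not part of the original answer: it rebuilds the sample DataFrame from the question and assumes a citation always has the shape "(author text without digits or parentheses, comma, four-digit year)". The name CITATION and that assumed shape are illustrative only.

```python
import pandas as pd

# Sample data copied from the question; the index mirrors the row labels shown there.
df = pd.DataFrame(
    {
        'sentences': [
            'This is my house',
            'This is clear water(World Health organisation, 2018).',
            'This house was built in 2000 ',
            'According to me (Sundar, 2015)it is good.',
        ]
    },
    index=[1, 2, 3, 4],
)

# Assumed citation shape (hypothetical, adjust to your data):
#   "("  author text with no digits or parentheses  ","  optional spaces  4-digit year  ")"
CITATION = r'\([^()\d]+,\s*\d{4}\)'

# Boolean mask of rows whose sentence contains at least one such citation.
mask = df['sentences'].str.contains(CITATION, regex=True)
cited = df[mask]
print(cited)
```

With this pattern a sentence such as 'It cost (about 1500) dollars' is not selected, because there is no comma before the number, whereas r'\d{4}\)' would select it. Citations in other styles, for example "(Smith et al. 2015)" without a comma, would need a different pattern.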