Python: What is the regex pattern to identify in-text citations of the form '(Author name, Year)'?
I have converted a list of tokenized sentences into a dataframe. Now I need to filter the rows (sentences) that contain citations.
Sample dataframe:
sentences
1 This is my house
2 This is clear water(World Health organisation, 2018).
3 This house was built in 2000
4 According to me (Sundar, 2015)it is good.
Expected output:
sentences
1 This is clear water(World Health organisation, 2018).
2 According to me (Sundar, 2015)it is good.
I have been trying str.contains with different patterns, such as r'[(]\w+,\d{4}[)]' and r'[(\w+\s+,\d{4}]', but neither returns the rows I want.
You can try:
print(df[df['sentences'].str.contains(r'\d{4}\)', regex = True)])
Or:
print(df[df['sentences'].str.contains(r'\w.+\(\w.+\d{4}\)', regex = True)])
Both produce the same output:
sentences
2 This is clear water(World Health organisation, 2018).
4 According to me (Sundar, 2015)it is good.
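A caveat on the first pattern: r'\d{4}\)' matches any four digits followed by a closing parenthesis, so a sentence such as "(built in 2000)" would also pass the filter. If that matters for your data, a stricter pattern may help. The sketch below is only one possible approach. It assumes every citation has the shape "(author text, four-digit year)" with no nested parentheses; the name CITATION and that assumed shape are mine, not something from the question. It also shows why the first attempted pattern from the question finds nothing: \w+ cannot span the spaces inside an author name, and the pattern leaves no room for the space after the comma.

```python
import pandas as pd

# Sample data copied from the question, with the same 1-based index.
df = pd.DataFrame(
    {
        "sentences": [
            "This is my house",
            "This is clear water(World Health organisation, 2018).",
            "This house was built in 2000",
            "According to me (Sundar, 2015)it is good.",
        ]
    },
    index=[1, 2, 3, 4],
)

# Assumed citation shape: "(", author text containing no parentheses,
# a comma, optional whitespace, a four-digit year, then ")".
CITATION = r"\([^()]+,\s*\d{4}\)"

# Keep only the rows whose sentence contains a citation.
cited = df[df["sentences"].str.contains(CITATION, regex=True)]
print(cited)

# First attempt from the question: matches no row in this sample.
attempt = r"[(]\w+,\d{4}[)]"
print(df["sentences"].str.contains(attempt, regex=True).any())
```

With the sample data this keeps rows 2 and 4. Citations with several authors, "et al.", or page numbers would need a looser pattern than the one assumed here.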