Python regex extraction based on a specific substring
I have a dataframe containing sentences like the following, but with many more rows:
data= {"text":["see you in five minutes.", "she is my friend.", "she goes to school in five minutes."]}
I want to split the sentences that contain "five minutes" as follows:
desired output:
              first part   desired part
0             see you in  five minutes.
1                    NaN            NaN
2  she goes to school in  five minutes.
I am using the following code, but it returns NaN:
data.text.str.extract(r"(?i)(?P<before>.*)\s(?P<minutes>(?=five minutes\s))\w+ \w+")
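Your lookahead requires a whitespace character after "five minutes", but in your data the phrase is followed by a period, so nothing matches:
(?i)(?P<before>.*)\s(?P<minutes>(?=five minutes\s))\w+ \w+
#                                              ^^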
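To see the effect in isolation, here is a minimal sketch using the standard re module with the pattern from the question unchanged. The second input string is a made-up example (not from the original data) in which a space does follow "minutes".

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r"(?i)(?P<before>.*)\s(?P<minutes>(?=five minutes\s))\w+ \w+")

# A period, not whitespace, follows "minutes", so the lookahead fails.
no_match = pattern.search("see you in five minutes.")

# Hypothetical input with a space after "minutes": the lookahead succeeds.
match = pattern.search("see you in five minutes or so")

print(no_match)
print(match.group("before"))
# The "minutes" group wraps only a zero-width lookahead, so it captures nothing.
print(repr(match.group("minutes")))
```

Note that even when the pattern matches, the minutes group is empty, because a lookahead consumes no characters. The phrase itself has to sit inside a capturing group to be extracted.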
import pandas as pd
data= {"text":["see you in five minutes.", "she is my friend.", "she goes to school in five minutes."]}
df = pd.DataFrame(data)
df2 = df.text.str.extract(r"(?i)(?P<before>.*?)(?=five minutes)(?P<after>.*)")
print(df2)
                   before          after
0             see you in   five minutes.
1                     NaN            NaN
2  she goes to school in   five minutes.
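The before column above keeps the space that precedes "five minutes". If that is unwanted, one possible variation (a sketch, not the only way) moves the whitespace outside the first group and captures the phrase directly instead of using a lookahead. The group names first_part and desired_part are my own choice, since regex group names cannot contain spaces.

```python
import pandas as pd

df = pd.DataFrame({"text": ["see you in five minutes.",
                            "she is my friend.",
                            "she goes to school in five minutes."]})

# \s* sits between the groups, so the space before "five" is in neither column.
# The lazy .*? stops at the first point where the rest of the pattern can match.
pattern = r"(?i)(?P<first_part>.*?)\s*(?P<desired_part>five minutes.*)"
result = df["text"].str.extract(pattern)

# Rename to the column labels used in the question.
result.columns = ["first part", "desired part"]
print(result)
```

Rows without "five minutes" still come back as NaN in both columns, as in the output above.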