Python regex extraction based on a specific substring

I have a DataFrame containing sentences like the following, but with many more rows:

data= {"text":["see you in five minutes.", "she is my friend.", "she goes to school in five minutes."]}
I want to extract the sentences that contain "five minutes", split as follows:

desired output:

     first part              desired part     
0    see you in              five minutes.
1    NaN                     NaN
2    she goes to school in   five minutes.
I am using the following code, but it returns NaN:

data.text.str.extract(r"(?i)(?P<before>.*)\s(?P<minutes>(?=five minutes\s))\w+ \w+")    

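Your pattern requires whitespace where there is none: the lookahead ends in \s, but in the data "five minutes" is followed by a period.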

(?i)(?P<before>.*)\s(?P<minutes>(?=five minutes\s))\w+ \w+
#                                              ^^^
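
The effect of that `\s` can be checked with the standard `re` module. This is a minimal sketch; the second test sentence ("... five minutes later") is made up for illustration and is not part of the original data. It also shows that the `minutes` group wraps only a lookahead, so it captures an empty string even when the pattern does match.

```python
import re

# The pattern from the question, unchanged.
original = r"(?i)(?P<before>.*)\s(?P<minutes>(?=five minutes\s))\w+ \w+"

text_with_period = "see you in five minutes."      # from the question's data
text_with_space = "see you in five minutes later"  # hypothetical sentence

# "minutes" is followed by ".", so the lookahead (?=five minutes\s) fails.
no_match = re.search(original, text_with_period)
print(no_match)

# With whitespace after "minutes" the lookahead succeeds.
m = re.search(original, text_with_space)
print(repr(m.group("before")))
# The named group contains only a zero-width lookahead, so it is empty.
print(repr(m.group("minutes")))
```
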
Dropping the \s from the lookahead and capturing the rest of the sentence in a second group works:

import pandas as pd

data= {"text":["see you in five minutes.", "she is my friend.", "she goes to school in five minutes."]}

df = pd.DataFrame(data)
df2 = df.text.str.extract(r"(?i)(?P<before>.*?)(?=five minutes)(?P<after>.*)")
print(df2)
                   before          after
0             see you in   five minutes.
1                     NaN            NaN
2  she goes to school in   five minutes.
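
Note that the `before` column above keeps the space that precedes "five minutes". If that matters, one possible variant (an assumption, not part of the original answer) moves the whitespace outside both groups with `\s*`. The sketch below checks the pattern with plain `re`; the helper name `split_on_phrase` is hypothetical.

```python
import re

# Hypothetical variant: \s* sits between the groups, so the whitespace
# before "five minutes" is captured by neither of them.
PATTERN = re.compile(r"(?i)(?P<before>.*?)\s*(?P<after>five minutes.*)")

def split_on_phrase(sentence):
    """Return (before, after), or (None, None) when the phrase is absent."""
    m = PATTERN.search(sentence)
    if m is None:
        return (None, None)
    return (m.group("before"), m.group("after"))

sentences = [
    "see you in five minutes.",
    "she is my friend.",
    "she goes to school in five minutes.",
]

rows = [split_on_phrase(s) for s in sentences]
for row in rows:
    print(row)
```

The same pattern string can be passed to `df.text.str.extract(...)` in place of the one above, since `str.extract` names the resulting columns after the named groups.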