Python 3.x: How do I extract all text after a matching pattern in pandas?
My DataFrame is:
   name    type
0  apple   red fruit with red peel that is edible
1  orange  thick peel that is bitter and used dried sometimes
I want to extract all the text that follows the word "peel" in each row and put it in a separate column:
   name    type                                                peel
0  apple   red fruit with red peel that is edible              that is edible
1  orange  thick peel that is bitter and used dried sometimes  that is bitter and used dried sometimes
This is what I am trying:
def get_peel(desc):
    text = desc.split(' ')
    for i,t in enumerate(text):
        if t.lower() == 'peel':
            return text[i:]
        return 'not found'

df['peel'] = df['type'].apply(get_peel)
But the result I get is:
0 not found
1 not found
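What am I doing wrong?

Use str.extract with a regular expression.
Ex:
import pandas as pd

df = pd.DataFrame({"name": ['apple', 'orange'], 'type': ['red fruit with red peel that is edible', 'thick peel that is bitter and used dried sometimes']})
# The lookbehind places the match right after the whole word "peel";
# the group captures the rest of the string, including the leading space.
df['peel'] = df['type'].str.extract(r"(?<=\bpeel\b)(.*)$", expand=False)
print(df['peel'])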
0 that is edible
1 that is bitter and used dried sometimes
Name: peel, dtype: object
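The answer above does not say what went wrong in the original function. The following is only a sketch of one likely fix, assuming the fallback `return 'not found'` ran inside the `for` loop, so that every row returned on its first non-matching word. It also assumes the goal is a string rather than a list, since `text[i:]` would return a list that still contains `peel`.

```python
import pandas as pd


def get_peel(desc):
    """Return the text after the first standalone word 'peel', or 'not found'."""
    words = desc.split(' ')
    for i, word in enumerate(words):
        if word.lower() == 'peel':
            # Skip 'peel' itself and join the remaining words back into a string.
            return ' '.join(words[i + 1:])
    # Reached only after every word has been checked, because it sits outside the loop.
    return 'not found'


df = pd.DataFrame({
    'name': ['apple', 'orange'],
    'type': ['red fruit with red peel that is edible',
             'thick peel that is bitter and used dried sometimes'],
})
df['peel'] = df['type'].apply(get_peel)
print(df['peel'].tolist())
```

Unlike the regex approach, this version only matches `peel` when it is separated from its neighbours by single spaces, so `peel,` or `peels` would not count.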
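
Could you try the following?
Creating df:
import pandas as pd

df = pd.DataFrame({'name':['apple','orange'],
                   'type':['red fruit with red peel that is edible','thick peel that is bitter and used dried sometimes']})
Code to add the new column:
# Replace the whole string with whatever follows "peel" (capture group 1).
df['peel']=df['type'].replace(regex=True,to_replace=r'.*peel(.*)',value=r'\1')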
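The two approaches differ on rows that do not contain the pattern, which may matter on real data. The sketch below uses a made-up two-row Series (the second row is hypothetical and not from the question) and a slightly different extract pattern, `\bpeel\b\s*(.*)$`, which is an assumption chosen here to drop the leading space.

```python
import pandas as pd

# Hypothetical sample: the second row has no "peel" in it.
s = pd.Series(['thick peel that is bitter', 'no skin mentioned'])

# str.extract returns NaN where the pattern does not match.
extracted = s.str.extract(r'\bpeel\b\s*(.*)$', expand=False)

# Series.replace leaves non-matching rows unchanged, so the full
# original text stays in place instead of a missing value.
replaced = s.replace(regex=True, to_replace=r'.*peel(.*)', value=r'\1')

print(extracted.tolist())
print(replaced.tolist())
```

If a missing value is the desired result for rows without a match, `str.extract` is probably the safer choice. With `replace`, the unmatched rows would need a separate check.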