Python Pandas: assign values from a list of elements to a DataFrame column (if present)
I am trying to assign values from the elements of a list to a pandas DataFrame column if the text startswith that substring.
Code:
import numpy as np
import pandas as pd

searchwords = ['harry','harry potter','lotr','secret garden']
l1 = [1, 2, 3,4,5]
l2 = ['Harry Potter is a great book',
'Harry Potter is very famous',
'I enjoyed reading Harry Potter series',
'LOTR is also a great book along',
'Have you read Secret Garden as well?'
]
df = pd.DataFrame({'id':l1,'text':l2})
df['text'] = df['text'].str.lower()
Data preview:
id text
0 1 harry potter is a great book
1 2 harry potter is very famous
2 3 i enjoyed reading harry potter series
3 4 lotr is also a great book along
4 5 have you read secret garden as well?
Attempt:
df.loc[df['text'].str.startswith(tuple(searchwords)),'tags'] if (df['text'].str.startswith(tuple(searchwords))) == True else np.NaN
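Error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
What am I doing wrong? I thought you could assign values with if/else logic using == True.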
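The error comes from putting a boolean Series inside a Python if: if needs a single True/False, so it calls bool() on the whole Series, and pandas refuses because it cannot know whether you mean "any row" or "all rows". A minimal sketch on the lowercased sample data showing the failure and the vectorized alternative, where the mask selects rows instead of driving a branch (the variable names mask, error_message and matching_ids are illustrative only):

```python
import pandas as pd

searchwords = ['harry', 'harry potter', 'lotr', 'secret garden']
df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'text': ['harry potter is a great book',
                            'harry potter is very famous',
                            'i enjoyed reading harry potter series',
                            'lotr is also a great book along',
                            'have you read secret garden as well?']})

# Element-wise test: one True/False value per row, not a single bool
mask = df['text'].str.startswith(tuple(searchwords))

# A Python `if` calls bool() on the whole Series, which pandas rejects
error_message = ''
try:
    if mask == True:
        pass
except ValueError as err:
    error_message = str(err)

# Vectorized use of the mask: select the matching rows instead of branching
matching_ids = df.loc[mask, 'id'].tolist()
print(matching_ids)  # [1, 2, 4]
```

The mask says which rows match but not which search word matched, so filling a tags column with the matched words needs a per-row or per-word step.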
I am looking for output like this:
id text tags
0 1 harry potter is a great book harry;harry potter
1 2 harry potter is very famous harry;harry potter
2 3 i enjoyed reading harry potter series NaN
3 4 lotr is also a great book along lotr
4 5 have you read secret garden as well? NaN
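One way to get exactly this output without a row-by-row lambda over the text is to build one boolean column per search word with str.startswith and then join the matching column names per row. This is a sketch of that technique, not the approach used in the answers; the intermediate names matches and joined are illustrative:

```python
import numpy as np
import pandas as pd

searchwords = ['harry', 'harry potter', 'lotr', 'secret garden']
df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'text': ['Harry Potter is a great book',
                            'Harry Potter is very famous',
                            'I enjoyed reading Harry Potter series',
                            'LOTR is also a great book along',
                            'Have you read Secret Garden as well?']})
df['text'] = df['text'].str.lower()

# One boolean column per search word: True where the text starts with that word
matches = pd.DataFrame({w: df['text'].str.startswith(w) for w in searchwords})

# Per row, join the words whose column is True (in searchwords order) with ';'
joined = matches.apply(lambda row: ';'.join(row[row].index), axis=1)

# Rows with no match give an empty string; turn those into NaN
df['tags'] = joined.where(joined != '', np.nan)
print(df)
```

Both 'harry' and 'harry potter' are reported for the first two rows because str.startswith is a plain prefix test, which is what the expected output asks for.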
Try using apply:
df['tags'] = df.text.apply(
    lambda text: [searchword for searchword in searchwords if text.startswith(searchword)]
)
This gives you a column tags containing, for each row, the list of matching tags.
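A self-contained sketch of this apply step on the lowercased sample data, printing the resulting tags column; each list keeps the order of searchwords, and rows with no match get an empty list:

```python
import pandas as pd

searchwords = ['harry', 'harry potter', 'lotr', 'secret garden']
df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'text': ['harry potter is a great book',
                            'harry potter is very famous',
                            'i enjoyed reading harry potter series',
                            'lotr is also a great book along',
                            'have you read secret garden as well?']})

# Keep every search word that the text starts with
df['tags'] = df.text.apply(
    lambda text: [searchword for searchword in searchwords if text.startswith(searchword)]
)
print(df['tags'].tolist())
# [['harry', 'harry potter'], ['harry', 'harry potter'], [], ['lotr'], []]
```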
If you prefer NaN instead of an empty list [], you can convert it in a second step:
df['tags'] = df.tags.apply(
lambda current_tag: float('nan') if len(current_tag)==0 else current_tag
)
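To get the exact harry;harry potter strings from the question instead of lists, the join can be done in that same second step. A sketch, assuming the tags column holds lists as produced by the apply step (the input DataFrame here is rebuilt by hand for illustration):

```python
import numpy as np
import pandas as pd

# Assumed input: the list-valued 'tags' column produced by the apply step
df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'tags': [['harry', 'harry potter'], ['harry', 'harry potter'],
                            [], ['lotr'], []]})

# Join non-empty lists with ';' and map empty lists to NaN
df['tags'] = df['tags'].apply(lambda tags: ';'.join(tags) if tags else np.nan)
print(df)
```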
Here is another version:
df["tags"] = df["text"].str.split(" ").apply(
    lambda x: list(set(x) & set(searchwords))
)
If you want NaN instead of an empty list, add the following:
import numpy as np
df['tags'] = df['tags'].apply(lambda x: np.nan if len(x)==0 else x)
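Note how this version differs from a startswith test: str.split(" ") breaks each text into single words, so a multi-word entry such as 'harry potter' can never be in the intersection, and a word is found wherever it occurs in the sentence, not only at the start. A quick check on the lowercased sample data:

```python
import pandas as pd

searchwords = ['harry', 'harry potter', 'lotr', 'secret garden']
df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'text': ['harry potter is a great book',
                            'harry potter is very famous',
                            'i enjoyed reading harry potter series',
                            'lotr is also a great book along',
                            'have you read secret garden as well?']})

# Intersect the set of single words with the set of search terms
df["tags"] = df["text"].str.split(" ").apply(
    lambda x: list(set(x) & set(searchwords))
)
print(df["tags"].tolist())
# [['harry'], ['harry'], ['harry'], ['lotr'], []]
```

Row id 3 gets 'harry' even though the sentence does not start with it, and 'harry potter' is never reported.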
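Thanks for the quick reply. However, this code searches the whole sentence. I am only looking at the beginning of the sentence, which is why I use startswith rather than contains. Ids 3 and 5 should be NaN.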
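A sketch of how the split-based idea could be limited to the start of the sentence: compare the tokens of each search term with the same number of leading tokens of the text. The helper name leading_matches is hypothetical, and tokens are split on single spaces only, so punctuation attached to a word (for example "lotr,") would prevent a match:

```python
import numpy as np
import pandas as pd

searchwords = ['harry', 'harry potter', 'lotr', 'secret garden']
df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'text': ['harry potter is a great book',
                            'harry potter is very famous',
                            'i enjoyed reading harry potter series',
                            'lotr is also a great book along',
                            'have you read secret garden as well?']})

def leading_matches(text, words):
    """Hypothetical helper: return the search terms whose tokens equal the
    first tokens of `text` (whole-word match at the start only)."""
    tokens = text.split(" ")
    return [w for w in words if tokens[:len(w.split(" "))] == w.split(" ")]

# Join matches with ';'; an empty join ('') is falsy, so `or` yields NaN
df['tags'] = df['text'].apply(
    lambda t: ';'.join(leading_matches(t, searchwords)) or np.nan
)
print(df)
```

Unlike a plain str.startswith prefix test, this requires whole words, so 'harry' would not match a text starting with 'harrys'.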