Python Pandas: assign values to a DataFrame column from a list of elements (if present)


I am trying to assign a value from the elements of a list to a pandas DataFrame column whenever the text startswith that substring.

Code:

import numpy as np
import pandas as pd

searchwords = ['harry','harry potter','lotr','secret garden']

l1 = [1, 2, 3,4,5]
l2 = ['Harry Potter is a great book',
      'Harry Potter is very famous',
      'I enjoyed reading Harry Potter series',
      'LOTR is also a great book along',
      'Have you read Secret Garden as well?'
]
df = pd.DataFrame({'id':l1,'text':l2})
df['text'] = df['text'].str.lower()

Data preview:

   id   text
0   1   harry potter is a great book
1   2   harry potter is very famous
2   3   i enjoyed reading harry potter series
3   4   lotr is also a great book along
4   5   have you read secret garden as well?

Attempt:

df.loc[df['text'].str.startswith(tuple(searchwords)),'tags'] if (df['text'].str.startswith(tuple(searchwords))) == True else np.NaN

Error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What am I doing wrong? I thought you could assign values inside if/else logic by comparing with == True.

I am looking for output like this:

   id   text                                     tags
0   1   harry potter is a great book             harry;harry potter
1   2   harry potter is very famous              harry;harry potter
2   3   i enjoyed reading harry potter series    NaN
3   4   lotr is also a great book along          lotr
4   5   have you read secret garden as well?     NaN
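
On the error in the attempt above: str.startswith returns a boolean Series with one value per row, while a Python if/else expression needs a single True or False, so pandas refuses to pick one. Below is a minimal sketch of using that Series as a row mask instead. It assumes a shortened copy of the data and a pandas version whose str.startswith accepts a tuple (as the attempt in the question does); the value 'matched' is only a placeholder for illustration.

```python
import pandas as pd

searchwords = ['harry', 'harry potter', 'lotr', 'secret garden']
df = pd.DataFrame({
    'id': [1, 2, 3],
    'text': ['harry potter is a great book',
             'i enjoyed reading harry potter series',
             'lotr is also a great book along'],
})

# One boolean per row: does the text start with any of the search words?
mask = df['text'].str.startswith(tuple(searchwords))

# Using the whole Series in a plain `if` raises the ValueError from the question.
try:
    if mask:
        pass
except ValueError as err:
    error_message = str(err)

# Select rows with the mask instead; rows that do not match are left as NaN.
df.loc[mask, 'tags'] = 'matched'  # 'matched' is a placeholder value
```

The mask only says whether a row matched, not which search words matched, so it is not enough on its own to produce the tags column in the desired output.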

Try using apply:

df['tags'] = df.text.apply(
    lambda text: [searchword for searchword in searchwords if text.startswith(searchword)]
)

This gives you a column tags that contains, for each row, the list of matching tags.

If you prefer nan instead of an empty list [], you can do that in a second step:

df['tags'] = df.tags.apply(
    lambda current_tag: float('nan') if len(current_tag)==0 else current_tag
)
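
The two steps can also be combined into one pass. The sketch below is one possible way to get the semicolon-separated format from the question's desired output; the helper name join_prefix_tags is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

searchwords = ['harry', 'harry potter', 'lotr', 'secret garden']
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'text': ['Harry Potter is a great book',
             'Harry Potter is very famous',
             'I enjoyed reading Harry Potter series',
             'LOTR is also a great book along',
             'Have you read Secret Garden as well?'],
})
df['text'] = df['text'].str.lower()

def join_prefix_tags(text):
    # Hypothetical helper: collect every search word the text starts with.
    matches = [word for word in searchwords if text.startswith(word)]
    # Join with ';' as in the desired output, or return NaN when nothing matches.
    return ';'.join(matches) if matches else np.nan

df['tags'] = df['text'].apply(join_prefix_tags)
```

With this data, ids 1 and 2 get harry;harry potter, id 4 gets lotr, and ids 3 and 5 get NaN, because their text does not begin with any search word.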

Here is another version:

df["tags"] = df["text"].str.split(" ").apply(lambda x: list(set(x) & set(
        searchwords)))

If you want NaN instead of an empty list, add the following:

import numpy as np 

df['tags'] = df['tags'].apply(lambda x: np.nan if len(x)==0 else x)

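Thanks for the quick reply. However, this code searches the whole sentence, and I am only looking at the beginning of the sentence. That is why I am using startswith rather than contains. Ids 3 and 5 should be NaN.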
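
The difference raised in this comment can be checked without pandas. The sketch below compares a word-set intersection with a prefix check on two of the sample sentences; the function names tags_anywhere and tags_at_start are hypothetical and used only for illustration.

```python
searchwords = ['harry', 'harry potter', 'lotr', 'secret garden']

def tags_anywhere(text):
    # Word-set intersection: matches single words anywhere in the sentence.
    return sorted(set(text.split(' ')) & set(searchwords))

def tags_at_start(text):
    # Prefix check: matches only search words the sentence begins with.
    return [word for word in searchwords if text.startswith(word)]

middle = 'i enjoyed reading harry potter series'
start = 'harry potter is a great book'

anywhere_middle = tags_anywhere(middle)  # ['harry']
prefix_middle = tags_at_start(middle)    # []
anywhere_start = tags_anywhere(start)    # ['harry']
prefix_start = tags_at_start(start)      # ['harry', 'harry potter']
```

The intersection tags id 3 because harry appears in the middle of the sentence, and it can never return a multi-word entry such as harry potter, since the text is split into single words first. The prefix check avoids both effects.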