Python: compute a total score for each phrase in a list when a description contains it


python pandas


I have a long list (200,000+) of phrases:

phrase_list = ['some word', 'another example', ...]

and a two-column dataframe, with descriptions in the first column and scores in the second:

Description                                    Score
this sentence contains some word in it         6
some word is on my mind                        3
repeat another example of me                   2
this sentence has no matches                   100
another example with some word                 10

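It has 300,000+ rows. For each phrase in phrase_list, I want the total score over all rows whose description contains that phrase. So for 'some word' the score is 6 + 3 + 10 = 19, and for 'another example' it is 2 + 10 = 12.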

The code I am using so far works, but it is very slow:

phrase_score = []

for phrase in phrase_list:
    phrase_score.append([phrase, df['Score'][df['Description'].str.contains(phrase)].sum()])

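I want to end up with a pandas dataframe that has the phrase in one column and the score in the second (that part is easy once I have the list of lists). What I am looking for is a faster way to build the list.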

You can use a dictionary comprehension to generate the score for each phrase in the phrase list.

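For each phrase, it builds a mask of the rows in the dataframe that contain that phrase. The mask is

df.Description.str.contains(phrase)

This mask is then applied to the scores, which are summed, so the result is effectively

df.Score[mask].sum()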
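To make the mask step concrete, here is a minimal sketch for a single phrase, using the sample data from the question. Passing `regex=False` is my own addition, on the assumption that the phrases are literal text rather than regular expressions; the original code relies on the default regex matching.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Description': ['this sentence contains some word in it',
                    'some word is on my mind',
                    'repeat another example of me',
                    'this sentence has no matches',
                    'another example with some word'],
    'Score': [6, 3, 2, 100, 10],
})

phrase = 'some word'

# Boolean Series: True where the description contains the phrase.
# regex=False (an assumption here) treats the phrase as a plain substring.
mask = df['Description'].str.contains(phrase, regex=False)

# Keep only the scores of matching rows, then add them up.
total = df.loc[mask, 'Score'].sum()

print(mask.tolist())  # [True, True, False, False, True]
print(total)          # 19
```

Literal matching also avoids surprises if a phrase happens to contain characters such as `.` or `(` that have a special meaning in a regular expression.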

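After reading your post in more detail, I noticed that this is similar to your approach. I expected the dictionary comprehension to be faster than the for loop, but in my tests the two perform about the same. I do not know of a more efficient solution that does not involve multiprocessing.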

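I wonder whether it would be faster to tokenize the descriptions into words and look up the tokenized phrases? I am not sure there is a simple way to do that.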
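One possible shape for the tokenization idea is sketched below. It is only an illustration, not something I have benchmarked: the function name `score_phrases_by_ngrams` is made up, and it assumes phrases and descriptions can both be split on whitespace. Note that it matches whole words only, so unlike `str.contains` it would not find 'some word' inside 'some wordy'.

```python
from collections import defaultdict

import pandas as pd


def score_phrases_by_ngrams(df, phrase_list):
    """Sum Score over rows whose Description contains each phrase
    as a consecutive run of whole words (hypothetical helper)."""
    # Store every phrase as a tuple of words for fast set lookup.
    phrases = {tuple(p.split()) for p in phrase_list}
    # Only word n-grams of these lengths can possibly match a phrase.
    lengths = sorted({len(p) for p in phrases})

    totals = defaultdict(int)
    for description, score in zip(df['Description'], df['Score']):
        tokens = description.split()
        found = set()  # count each phrase at most once per row
        for n in lengths:
            for i in range(len(tokens) - n + 1):
                gram = tuple(tokens[i:i + n])
                if gram in phrases:
                    found.add(gram)
        for gram in found:
            totals[gram] += score

    # Phrases that never matched get a total of 0.
    return {p: totals[tuple(p.split())] for p in phrase_list}


df = pd.DataFrame({
    'Description': ['this sentence contains some word in it',
                    'some word is on my mind',
                    'repeat another example of me',
                    'this sentence has no matches',
                    'another example with some word'],
    'Score': [6, 3, 2, 100, 10],
})

scores = score_phrases_by_ngrams(df, ['some word', 'another example'])
print(scores)  # {'some word': 19, 'another example': 12}
```

The work here grows with the number of words in the descriptions times the number of distinct phrase lengths, rather than with the number of phrases times the number of rows, which is why it may help when the phrase list is very long.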

import pandas as pd

df = pd.DataFrame({'Description': ['this sentence contains some word in it', 
                                   'some word on my mind', 
                                   'repeat another word on my mind', 
                                   'this sentence has no matches', 
                                   'another example with some word'], 
                   'Score': [6, 3, 2, 100, 10]})

phrase_list = ['some word', 'another example']
# For each phrase: mask the rows containing it, then sum their scores
scores = {phrase: df.Score[df.Description.str.contains(phrase)].sum() 
          for phrase in phrase_list}

>>> scores
{'another example': 10, 'some word': 19}
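Since the question asks for a dataframe with the phrase in one column and the score in the other, the dictionary can be converted in one more step. This is a small sketch: the column names `Phrase` and `Score` are my own choice, and the `scores` dictionary is typed in by hand from the output above so that the snippet runs on its own.

```python
import pandas as pd

# Result of the dictionary comprehension above, written out by hand
scores = {'some word': 19, 'another example': 10}

# One row per (phrase, score) pair; column names are illustrative
result = pd.DataFrame(list(scores.items()), columns=['Phrase', 'Score'])

print(result)
```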