
Python: find multiple words from sentences in a dataframe and convert them to a sum of scores


I have the following dataframe:

    Sentence
0   Cat is a big lion
1   Dogs are descendants of wolf
2   Elephants are pachyderm
3   Pachyderm animals include rhino, Elephants and hippopotamus
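I need Python code that looks at the words in the sentences above and, for each sentence, sums the scores of its words according to this separate dataframe: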

Name          Score
cat             1
dog             2
wolf            2
lion            3
elephants       5
rhino           4
hippopotamus    5
For example, for row 0 the score would be 1 (cat) + 3 (lion) = 4.

I want to produce output like this:

    Sentence                                                      Value
0   Cat is a big lion                                                4
1   Dogs are descendants of wolf                                     4
2   Elephants are pachyderm                                          5
3   Pachyderm animals include rhino, Elephants and hippopotamus      14
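To make the answers easy to try, here is a minimal sketch that rebuilds the two dataframes from the question and computes the expected Value column in plain Python. The names df1 (sentences) and df2 (scores) are assumptions, and so is the plural fallback (retrying a word without its trailing "s" when it has no score): it is just enough for "Dogs" to match "dog" in this data, not a general stemmer.

```python
import pandas as pd

# Sentences and per-word scores, copied from the question
df1 = pd.DataFrame({'Sentence': [
    'Cat is a big lion',
    'Dogs are descendants of wolf',
    'Elephants are pachyderm',
    'Pachyderm animals include rhino, Elephants and hippopotamus',
]})
df2 = pd.DataFrame({
    'Name': ['cat', 'dog', 'wolf', 'lion', 'elephants', 'rhino', 'hippopotamus'],
    'Score': [1, 2, 2, 3, 5, 4, 5],
})

scores = dict(zip(df2['Name'], df2['Score']))


def sentence_score(sentence):
    """Sum the scores of the words in one sentence, ignoring case and commas."""
    total = 0
    for word in sentence.lower().replace(',', ' ').split():
        if word in scores:
            total += scores[word]
        elif word.endswith('s') and word[:-1] in scores:
            # Hypothetical plural fallback so that 'dogs' matches 'dog'
            total += scores[word[:-1]]
    return total


df1['Value'] = df1['Sentence'].map(sentence_score)
print(df1)
```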

First, you can try an approach based on split and map, then compute the per-sentence scores with groupby:

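import pandas as pd

# Lower-case and split each sentence into words; explode keeps the original row index
words = df1['Sentence'].str.lower().str.split(r'[\s.!?,]+', regex=True).explode()
# Look up each word's score (unscored words become NaN) and sum per original row.
# Matching is exact, so 'dogs' does not pick up the score of 'dog'.
df1['Value'] = (words.map(df2.set_index('Name')['Score'])
                .groupby(level=0).sum()
                .astype(int))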
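Run on the question's data, the split-and-map approach matches words exactly, so "Dogs" in row 1 does not pick up the score of "dog" and that row sums to 2 rather than the expected 4. The self-contained sketch below reproduces this with the question's two dataframes (named df1 and df2 as in this answer); it uses explode and groupby(level=0).sum() in place of Series.sum(level=0), whose level argument was removed in pandas 2.0.

```python
import pandas as pd

df1 = pd.DataFrame({'Sentence': [
    'Cat is a big lion',
    'Dogs are descendants of wolf',
    'Elephants are pachyderm',
    'Pachyderm animals include rhino, Elephants and hippopotamus',
]})
df2 = pd.DataFrame({
    'Name': ['cat', 'dog', 'wolf', 'lion', 'elephants', 'rhino', 'hippopotamus'],
    'Score': [1, 2, 2, 3, 5, 4, 5],
})

# One row per word, still labelled with the index of its sentence
words = df1['Sentence'].str.lower().str.split(r'[\s.!?,]+', regex=True).explode()

# Unscored words map to NaN, which groupby(...).sum() skips
df1['Value'] = (words.map(df2.set_index('Name')['Score'])
                .groupby(level=0).sum()
                .astype(int))
print(df1['Value'].tolist())  # [4, 2, 5, 14]
```

Adding a stemming or plural-handling step before the lookup, as in the nltk answer, is what closes the gap for row 1.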

Using nltk (you may need to download some data first):

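import nltk

nltk.download('punkt')      # tokenizer models used by word_tokenize
nltk.download('punkt_tab')  # newer NLTK releases look for this resource instead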
Then set up the stemmer and the tokenizer:

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()
Build a handy dictionary that maps each stemmed name to its score:

m = dict(zip(map(ps.stem, scores.Name), scores.Score))
and compute the scores:

def f(s):
    # Stem every token, look up its score, drop unscored tokens (None) and add up the rest
    return sum(filter(None, map(m.get, map(ps.stem, word_tokenize(s)))))

df.assign(Score=[*map(f, df.Sentence)])

                                            Sentence  Score
0                                  Cat is a big lion      4
1                       Dogs are descendants of wolf      4
2                            Elephants are pachyderm      5
3  Pachyderm animals include rhino, Elephants and...     14

Try findall from re with the re.I (case-insensitive) flag:

import re
# Case-insensitive match of any score name; sum each match's score (0 if nothing matched)
df.Sentence.str.findall(df1.Name.str.cat(sep='|'), flags=re.I).\
   map(lambda x: sum(df1.set_index('Name').Score[y.lower()] for y in x))
Out[49]: 
0     4
1     4
2     5
3    14
Name: Sentence, dtype: int64
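The pattern built with df1.Name.str.cat(sep='|') matches anywhere inside a word. That is what lets "Dogs" count as "dog", but it would also let a name such as "cat" match inside an unrelated word like "category". A possible refinement, sketched below, escapes each name with re.escape, requires whole-word matches with \b and allows an optional trailing s. The s? plural rule is an assumption that happens to suit this data, not part of the original answer; df (sentences) and df1 (scores) follow this answer's naming.

```python
import re

import pandas as pd

df = pd.DataFrame({'Sentence': [
    'Cat is a big lion',
    'Dogs are descendants of wolf',
    'Elephants are pachyderm',
    'Pachyderm animals include rhino, Elephants and hippopotamus',
]})
df1 = pd.DataFrame({
    'Name': ['cat', 'dog', 'wolf', 'lion', 'elephants', 'rhino', 'hippopotamus'],
    'Score': [1, 2, 2, 3, 5, 4, 5],
})

lookup = dict(zip(df1['Name'], df1['Score']))

# Whole-word match on any score name, optionally followed by a plural 's'.
# The single capture group makes findall return only the name part.
pattern = r'\b(' + '|'.join(map(re.escape, df1['Name'])) + r')s?\b'

df['Value'] = (df['Sentence']
               .str.findall(pattern, flags=re.I)
               .map(lambda names: sum(lookup[n.lower()] for n in names)))
print(df['Value'].tolist())
```

On the question's data this gives 4, 4, 5 and 14, and a sentence with no matches sums to 0 instead of raising an error.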

I was finally able to proceed.