Frequency of words in a DataFrame

list pandas dataframe

I have a pandas DataFrame whose "review" column contains lists of words. I need to find how often each word occurs in the review column.

   id      sentiment  review
0  5814_8  1          [stuff, going, moment, mj, 've, started, liste...
1  2381_9  1          [\the, classic, war, worlds\, '', timothy, hin...
2  7759_3  0          [film, starts, manager, nicholas, bell, giving...
3  3630_4  0          [must, assumed, praised, film, \the, greatest,...
4  9495_8  1          [superbly, trashy, wondrously, unpretentious, ...
5  8196_8  1          [dont, know, people, think, bad, movie, got, p...

I tried using Counter, but it fails with the error "unhashable type: 'list'".

How can I do this?

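If I understand correctly, you want the set of all words mentioned in the "review" column, together with the number of times each word occurs across all cells of that column.

In that case the solution is a one-liner: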

import pandas
from collections import Counter
import itertools

# Small sample frame mirroring the question: each 'review' cell is a list of words
df = pandas.DataFrame({'id': ['5814_8', '2381_9', '7759_3', '3630_4', '9495_8', '8196_8'], 'review':
    [['stuff', 'going', 'moment', 'mj', 've', 'started'],
    ['the', 'classic', 'war', 'worlds', '', 'timothy'],
    ['film', 'starts', 'manager', 'nicholas', 'bell'],
    ['must', 'assumed', 'praised', 'film', 'the'],
    ['superbly', 'trashy', 'wondrously', 'unpretentious'],
    ['dont', 'know', 'people', 'think', 'bad', 'movie', 'got']]})

# Chain the per-row lists into one stream of words, then count them
Counter(itertools.chain(*df['review'].tolist()))

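Result:

Counter({'': 1, 'assumed': 1, 'bad': 1, 'bell': 1, 'classic': 1, 'dont': 1, 'film': 2, 'going': 1, 'got': 1, 'know': 1, 'manager': 1, 'mj': 1, 'moment': 1, 'movie': 1, 'must': 1, 'nicholas': 1, 'people': 1, 'praised': 1, 'started': 1, 'starts': 1, 'stuff': 1, 'superbly': 1, 'the': 2, 'think': 1, 'timothy': 1, 'trashy': 1, 'unpretentious': 1, 've': 1, 'war': 1, 'wondrously': 1, 'worlds': 1})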
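
For context, the "unhashable" error mentioned in the question most likely arises because Counter uses each element it receives as a dictionary key, and iterating over the column yields whole lists rather than words. The following is a minimal sketch of that behaviour; the two-row frame is a hypothetical sample, not the asker's data.

```python
from collections import Counter
import itertools

import pandas as pd

# Hypothetical two-row frame, for illustration only
df = pd.DataFrame({'review': [['film', 'the'], ['film', 'bad']]})

try:
    # Each element of the column is a list, and lists cannot be dict keys
    Counter(df['review'])
    error = None
except TypeError as exc:
    error = str(exc)

# Flattening first means Counter only ever sees individual strings
counts = Counter(itertools.chain.from_iterable(df['review']))

print(error)
print(counts.most_common(1))
```

Here `chain.from_iterable` is used instead of unpacking with `*`; both flatten the lists, but the former avoids building an argument tuple for a large column.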

You can also use a list comprehension inside Counter:

Counter([i for s in df.review for i in s])
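
If a pandas object is preferred over a Counter, one possible alternative (not taken from the answers here, and assuming pandas 0.25 or later, where `Series.explode` is available) is to expand each list into rows and count the values. The frame below is a hypothetical sample.

```python
import pandas as pd

# Hypothetical sample frame; each 'review' cell holds a list of words
df = pd.DataFrame({'review': [['film', 'starts', 'manager'],
                              ['must', 'film', 'the'],
                              ['the', 'classic']]})

# explode() yields one row per word; value_counts() tallies them,
# sorted from most to least frequent
word_counts = df['review'].explode().value_counts()

print(word_counts.head())
```

The result is a Series indexed by word, so it can be filtered or joined like any other pandas object.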

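Do you want word frequencies within a single review or across all reviews? Please give an example of the output you expect.

The frequency of a word across all reviews, i.e. how many times each word appears in the "review" column of the DataFrame.

Thanks Viktor, the question I asked was one step toward solving my problem. Your solution helped.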