In Python, how do I count how many times a word repeats in a column for a specific category?
I've been stuck on this problem for several days, so I'd be grateful for any help. I have a dataframe whose columns are:
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   PhraseId    93636 non-null  int64
 1   SentenceId  93636 non-null  int64
 2   Phrase      93636 non-null  object
 3   Sentiment   93636 non-null  int64
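Sentiment ranges from 0 to 4, which is basically a rating from good to bad. I added two columns that may help: the number of words in each phrase, and each phrase split into a list of the words it contains.
What I want to do is create 4 bar charts (one per sentiment) showing the 15 most repeated words in that sentiment. The x-axis would be the top 15 words repeated in that sentiment.
Below I've pasted code I wrote that counts how many times each word repeats in each sentiment. That may be what the bar charts need.
Sample data: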
PhraseId SentenceId Phrase Sentiment SplitPhrase NumOfWords
44723 75358 3866 Build some robots... 0 [Build, some, robots...] 52
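For reference, the two helper columns in the sample row (SplitPhrase and NumOfWords) could be built roughly as follows. This is only a sketch: the tiny frame is a hypothetical stand-in for train_data, and it assumes that tokens, including punctuation, are already separated by spaces in Phrase. The question does not show how the columns were actually created.

```python
import pandas as pd

# Hypothetical miniature stand-in for train_data (the real frame has 93,636 rows).
train_data = pd.DataFrame({
    "Phrase": ["Build some robots", "a good film , really"],
    "Sentiment": [0, 3],
})

# Split each phrase on whitespace into a list of tokens.
# Assumption: punctuation is already space-separated in Phrase.
train_data["SplitPhrase"] = train_data["Phrase"].str.split()

# Number of tokens in each list.
train_data["NumOfWords"] = train_data["SplitPhrase"].str.len()

print(train_data)
```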
To count how many times each word repeats in each sentiment:
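from collections import Counter

# One Counter per sentiment value, keyed by that value
counters = {}
for sentiment in train_data['Sentiment'].unique():
    counters[sentiment] = Counter()
    # Boolean mask selecting the rows with this sentiment
    indices = (train_data['Sentiment'] == sentiment)
    # Each SplitPhrase entry is a list of words; add them to the tally
    for phrase in train_data['SplitPhrase'][indices]:
        counters[sentiment].update(phrase)
print(counters)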
Sample output:
{2: Counter({'the': 28041, ',': 25046, 'a': 19962, 'of': 19376, 'and': 19052, 'to': 13470, '.': 10505, "'s": 10290, 'in': 8108, 'is': 8012, 'that': 7276, 'it': 6176, 'as': 5027, 'with': 4474, 'for': 4362, 'its': 4159, 'film': 3933......}),
3: Counter({'the': 28041, ',': 25046, 'a': 19962,.....
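Given a dictionary of Counters shaped like the output above, the 15 most frequent words per sentiment can be pulled out with Counter.most_common. The sketch below uses a hypothetical toy dictionary and a hypothetical helper name, top_words; neither comes from the question.

```python
from collections import Counter

# Hypothetical toy counters with the same shape as the real output.
counters = {
    2: Counter({"the": 5, ",": 4, "film": 3, "a": 1}),
    3: Counter({"good": 6, "the": 2, "film": 1}),
}

def top_words(counters, n=15):
    """Return {sentiment: [(word, count), ...]} with at most n pairs each,
    ordered from most to least frequent."""
    return {s: c.most_common(n) for s, c in sorted(counters.items())}

top = top_words(counters, n=2)
print(top)
```

With the real data, top_words(counters) would give at most 15 (word, count) pairs per sentiment, which is the shape a bar chart needs.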
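Your explanation makes sense; however, please include sample data, not just the output of df.info(). Please see this link for how to ask a good pandas question.
OK, thanks. I've attached an image of the sample data.
No images! Please read the link I shared :)
I've edited it once more; I hope it's better now. I also changed my question slightly, because I found a way to count how many times each word repeats in each sentiment. What I need now is to create bar charts based on that.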
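One possible way to get from the per-sentiment counters to the bar charts is sketched below with matplotlib. It assumes counters is a dict mapping each sentiment to a collections.Counter, as in the question; the function name plot_top_words and the toy data are hypothetical, and the layout (one subplot per sentiment, stacked vertically) is just one reasonable choice.

```python
from collections import Counter

import matplotlib.pyplot as plt

def plot_top_words(counters, n=15):
    """Draw one bar chart per sentiment showing its n most frequent words."""
    sentiments = sorted(counters)
    # squeeze=False keeps axes 2-D even when there is a single sentiment.
    fig, axes = plt.subplots(len(sentiments), 1,
                             figsize=(10, 3 * len(sentiments)),
                             squeeze=False)
    for ax, sentiment in zip(axes[:, 0], sentiments):
        pairs = counters[sentiment].most_common(n)
        words = [word for word, _ in pairs]
        counts = [count for _, count in pairs]
        positions = range(len(words))
        ax.bar(positions, counts)
        # Label each bar with its word on the x-axis.
        ax.set_xticks(positions)
        ax.set_xticklabels(words, rotation=45, ha="right")
        ax.set_title(f"Sentiment {sentiment}: top {len(words)} words")
        ax.set_ylabel("Count")
    fig.tight_layout()
    return fig, axes

# Hypothetical toy data; with the real data, pass the counters dict
# built by the loop in the question.
counters = {
    0: Counter({"bad": 4, "film": 3, "the": 2}),
    4: Counter({"great": 5, "film": 2}),
}
fig, axes = plot_top_words(counters, n=15)
# Call plt.show() (or fig.savefig(...)) to display or save the charts.
```

Since tokens such as 'the' and ',' lead every Counter in the sample output, the charts for different sentiments may look very similar unless such tokens are filtered out first; whether to do that is a separate decision.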