Python: sum up attributes with nltk when a specific bigram occurs

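I'm new to Python, please help me :) I have a dataframe that looks like this:

I managed to count all occurrences of each bigram with the code below, but I also need to sum up the other values associated with each bigram.

I tried iterating over the dataframe with a custom dictionary, but ran into a lot of problems.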

import collections

import nltk

nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')

# Count how often each bigram occurs across all search terms
counts = collections.Counter()
for sent in df["Search term"]:
    words = nltk.word_tokenize(sent)
    counts.update(nltk.bigrams(words))

counts.most_common(10)

My attempt looks like this, but I can't iterate over the bigrams:

import nltk
nltk.download('punkt')
word_dictionary = dict()

for row in df.itertuples():
    words = nltk.word_tokenize(str(row[0]))
    print(nltk.bigrams(words))
    for bigram in nltk.bigrams(words):
        print(bigram)
        if bigram in word_dictionary:
            word_dictionary[bigram][0], word_dictionary[bigram][1] = (word_dictionary[bigram][0] + row[14]), (word_dictionary[bigram][1] + row[15])
        else:
            word_dictionary[bigram] = (row[14]), (row[15])

print(word_dictionary)
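
Two things in the attempt above probably cause trouble. With the default `df.itertuples()`, `row[0]` is the index and not the first column, so the text being tokenized may not be the search term. The dictionary values are also tuples, which are immutable, so assigning to `word_dictionary[bigram][0]` raises a `TypeError`. Here is a minimal sketch of one way to accumulate the sums. It uses only the standard library: `str.split()` stands in for `nltk.word_tokenize`, and `zip(words, words[1:])` yields the same pairs as `nltk.bigrams(words)` for a list. The function name `sum_by_bigram` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def sum_by_bigram(rows):
    """Sum two numeric values per bigram.

    rows: iterable of (text, value_a, value_b) tuples (hypothetical layout).
    """
    # Lists are mutable, so the running totals can be updated in place
    totals = defaultdict(lambda: [0, 0])
    for text, value_a, value_b in rows:
        words = str(text).split()  # stand-in for nltk.word_tokenize(text)
        for bigram in zip(words, words[1:]):  # same pairs as nltk.bigrams(words)
            totals[bigram][0] += value_a
            totals[bigram][1] += value_b
    return dict(totals)


# Hypothetical sample data, not taken from the question's dataframe
rows = [
    ("go to paris", 10, 12),
    ("go to rome", 13, 18),
    ("depart from rome", 10, 15),
]
totals = sum_by_bigram(rows)
print(totals[("go", "to")])  # [23, 30]
```

With a real dataframe, the rows could come from something like `df[["Search term", "col_a", "col_b"]].itertuples(index=False, name=None)`, where `col_a` and `col_b` are placeholders for the two columns being summed.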

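The final result should be a dictionary sorted by the first value (e.g. items); I don't care much about the format:

go to (23, 30)
depart from (10, 15)

and so on.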
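
Sorting such a dictionary by the first value of each entry could look roughly like this. The sample totals are hypothetical, and descending order is an assumption since the question does not say which direction is wanted. The sketch relies on regular dicts preserving insertion order (Python 3.7+).

```python
# Hypothetical per-bigram totals: (first value, second value)
totals = {
    ("depart", "from"): (10, 15),
    ("go", "to"): (23, 30),
    ("to", "rome"): (13, 18),
}

# Sort by the first summed value, largest first (assumed direction)
ranked = dict(sorted(totals.items(), key=lambda item: item[1][0], reverse=True))

for bigram, values in ranked.items():
    print(" ".join(bigram), values)
```

Dropping `reverse=True` gives ascending order instead.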

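Please describe the data structure you want more clearly, and include the "problems" you ran into as well as what your code currently does. That helps us pin down where the difference lies.

I did my best and added a picture =)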