Python: How to use NLTK to find the frequency distribution of specific words in a CSV file


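I have just started using Python and NLTK. I am trying to read records from a CSV file and determine the frequency of specific words across all records. So far I can do this: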

with f:
    reader = csv.reader(f)

    # Skip the header
    next(reader)

    for row in reader:
        note = row[4]
        tokens = [t for t in note.split()] 

        # Calculate row frequency distribution
        freq = nltk.FreqDist(tokens) 
        for key,val in freq.items(): 
            print (str(key) + ':' + str(val))

        # Plot the results
        freq.plot(20, cumulative=False)

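I don't know how to modify this so that the frequencies are counted across all records and only include the words I'm interested in. Apologies if this is a very simple question.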

You can define a counter outside the loop, freq_all = nltk.FreqDist(), and then update it on every row with freq_all.update(tokens):

import csv
import nltk

# f is the CSV file object opened earlier, as in the question
with f:
    reader = csv.reader(f)

    # Skip the header
    next(reader)

    # Accumulates counts across all rows
    freq_all = nltk.FreqDist()

    for row in reader:
        note = row[4]
        tokens = note.split()

        # Calculate this row's frequency distribution
        freq = nltk.FreqDist(tokens)
        freq_all.update(tokens)
        for key, val in freq.items():
            print(str(key) + ':' + str(val))

        # Plot the results for this row
        freq.plot(20, cumulative=False)

    # Plot the overall results
    freq_all.plot(20, cumulative=False)
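The code above counts every token. To cover the second half of the question (only the words you care about), one option is to filter the tokens before updating the shared counter. The following is a minimal sketch, not a definitive solution: the function name count_target_words, the sample CSV content, and the word list are all made up for illustration, and lowercasing the text is my own assumption about how matching should work. Since FreqDist is a subclass of collections.Counter, the sketch falls back to Counter if NLTK is not installed.

```python
import csv
import io

try:
    from nltk import FreqDist
except ImportError:
    # FreqDist subclasses Counter, so counting behaves the same here
    from collections import Counter as FreqDist


def count_target_words(rows, targets, column=4):
    """Count occurrences of the target words across all rows.

    Hypothetical helper: `rows` is any iterable of CSV rows (lists of
    strings) and `column` is the index of the text field, 4 as in the question.
    """
    targets = {w.lower() for w in targets}
    freq_all = FreqDist()
    for row in rows:
        if len(row) <= column:
            continue  # skip short or malformed rows
        tokens = row[column].lower().split()
        # Keep only the tokens we are interested in
        freq_all.update(t for t in tokens if t in targets)
    return freq_all


# Made-up in-memory CSV standing in for the real file
sample = io.StringIO(
    "id,a,b,c,note\n"
    "1,x,x,x,patient reports pain and fever\n"
    "2,x,x,x,no fever today but pain persists\n"
    "3,x,x,x,fever resolved\n"
)
reader = csv.reader(sample)
next(reader)  # skip the header

words = ["fever", "pain", "cough"]
freq = count_target_words(reader, words)
for word in words:
    # Words that never occur report a count of 0
    print(word + ':' + str(freq[word]))
```

With a real file you would pass the csv.reader built from your own file object instead of the sample. Note that split() only breaks on whitespace, so a token like "fever," with trailing punctuation would not match "fever"; a proper tokenizer or stripping punctuation first may be needed depending on your data.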