How to edit a graph in Python (Zipf's law)


python python-3.x


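I need help making a bar chart that shows the frequency of the ten most common words in a file. Next to each bar there should be a second bar whose height is the frequency predicted by Zipf's law. (For example, suppose the most common word appears 100 times. Zipf's law predicts that the second most common word appears about 50 times (half as often as the most common word), the third most common about 33 times (a third as often), the fourth most common about 25 times (a quarter as often), and so on.)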
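The arithmetic in that example can be sketched in a few lines. The helper name `zipf_predicted_counts` is an assumption made for illustration and is not part of the original post.

```python
def zipf_predicted_counts(top_count, n=10):
    """Predicted count for rank k is top_count / k (rank 1 is the most common word)."""
    return [top_count / rank for rank in range(1, n + 1)]


predicted = zipf_predicted_counts(100, n=4)
print([round(count) for count in predicted])  # [100, 50, 33, 25]
```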

The function takes the name of a text file (as a string) as input.

def zipf_graph(text_file):
    import string
    with open(text_file, encoding='utf8') as file:
        text = file.read()

    # Strip punctuation, then lowercase once and split into words.
    punc = string.punctuation + '’”—⎬⎪“⎫'
    new_text = text
    for char in punc:
        new_text = new_text.replace(char, '')
    text_split = new_text.lower().split()

    # Determines how many times each word appears in the file. 
    from collections import Counter
    word_and_freq = Counter(text_split)
    top_ten_words = word_and_freq.most_common(10)

    print(top_ten_words) 

    #graph info

    import numpy as np
    import matplotlib.pyplot as plt
    barWidth = 0.25
    bars1 = [1,2,3,4,5,6,7,8,9,10] # I want the top_ten_words here
    bars2 = [10,5,3.33,2.5,2,1.67,1.43,1.25,1.11,1] # Zipf Law freq here, numbers are just ex.

    r1 = np.arange(len(bars1))
    r2 = [x + barWidth for x in r1]

    plt.bar(r1, bars1, color='#7f6d5f', width=barWidth, edgecolor='white', label='Word')
    plt.bar(r2, bars2, color='#2d7f5e', width=barWidth, edgecolor='white', label='Zipf Law')
    plt.xlabel('group', fontweight='bold')
    plt.xticks([r + barWidth for r in range(len(bars1))], ['word1', 'word2', 'word3', 'word4', 'word5', 'word6', 'word7', 'word8', 'word9', 'word10']) 
    # Want words to print below bars
    plt.legend()
    plt.show()

zipf_graph('gatsby.txt')

The code prints the top ten words and their frequencies in this format (as an example, I used the book The Great Gatsby):

Matplotlib. Here is a demo:

import matplotlib.pyplot as plt
import numpy as np

plt.rcdefaults()  # reset Matplotlib's rc settings to their defaults

objects = ('Python', 'C++', 'Java', 'Perl', 'Scala', 'Lisp')
y_pos = np.arange(len(objects))
performance = [10,8,6,4,2,1]

plt.bar(y_pos, performance, align='center', alpha=0.5)
plt.xticks(y_pos, objects)
plt.ylabel('Usage')
plt.title('Programming language usage')

plt.show()
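Adapting that demo to the question might look like the sketch below. It assumes `top_ten_words` is the list of `(word, count)` pairs returned by `Counter.most_common(10)`, as in the question's code. The helper names `zipf_bar_data` and `plot_zipf_bars` are hypothetical, and the prediction is anchored on the most common word.

```python
def zipf_bar_data(top_words):
    """Split (word, count) pairs into labels, actual counts, and Zipf predictions."""
    if not top_words:
        return [], [], []
    words = [word for word, _ in top_words]
    actual = [count for _, count in top_words]
    # Anchor on the most common word: rank k is predicted to occur actual[0] / k times.
    predicted = [actual[0] / rank for rank in range(1, len(actual) + 1)]
    return words, actual, predicted


def plot_zipf_bars(top_words, bar_width=0.4):
    """Draw actual and predicted counts side by side, with the words as tick labels."""
    # Imported here so the data helper above works without Matplotlib installed.
    import matplotlib.pyplot as plt
    import numpy as np

    words, actual, predicted = zipf_bar_data(top_words)
    x = np.arange(len(words))

    plt.bar(x - bar_width / 2, actual, width=bar_width, label='Word')
    plt.bar(x + bar_width / 2, predicted, width=bar_width, label='Zipf Law')
    plt.xticks(x, words)  # places each word under its pair of bars
    plt.ylabel('Frequency')
    plt.legend()
    plt.show()
```

Inside `zipf_graph`, a call such as `plot_zipf_bars(top_ten_words)` could stand in for the placeholder `bars1`, `bars2`, and `'word1'` ... `'word10'` values.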

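This solution works for me. Notes:

I prefer to use pandas to collect my dataset.
You need a function that returns the expected frequencies according to Zipf's law. I anchored on the most frequent word, but another option is to anchor on the total count (of the top 10).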
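The alternative mentioned in the second note, anchoring on the total of the top 10 instead of on the most frequent word, could be sketched as follows. The function name `zipf_frequency_from_total` and the sample counts are hypothetical. The idea is to split the observed total in proportion to 1/rank, so the predicted counts add up to the same total as the actual counts.

```python
def zipf_frequency_from_total(total_count, n=10):
    """Split total_count across ranks 1..n in proportion to 1/rank."""
    harmonic = sum(1 / rank for rank in range(1, n + 1))  # normalizing constant
    return [total_count / (rank * harmonic) for rank in range(1, n + 1)]


actual_counts = [100, 60, 30, 20]  # made-up counts for illustration
expected = zipf_frequency_from_total(sum(actual_counts), n=len(actual_counts))
```

With this anchoring the first predicted bar no longer has to match the first actual bar, which can make the overall fit easier to judge when the top word is an outlier.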

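This sounds like a homework question (asking for the answer). What have you tried?
Please don't make more work for other people by vandalizing your posts. By posting on the Stack Exchange network, you have granted Stack Exchange an irrevocable right to distribute that content (that is, regardless of your future choices). By Stack Exchange policy, the non-vandalized version of the post is the one that is distributed. Any vandalism will therefore be reverted. If you want to know more about deleting a post, please see: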

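import matplotlib.pyplot as plt
import pandas as pd

def zipf_frequency(most_common_count, n=10):
    """Return the counts Zipf's law predicts for ranks 1..n, anchored on the top count."""
    return [most_common_count / rank for rank in range(1, n + 1)]

# top_ten_words is the list of (word, count) pairs from Counter.most_common(10)
top_ten_words_df = pd.DataFrame(top_ten_words, columns=['word', 'actual count'])
# Pass n explicitly so the column still fits if the file has fewer than ten distinct words.
top_ten_words_df['expected zipf frequency'] = zipf_frequency(
    top_ten_words_df.loc[0, 'actual count'], n=len(top_ten_words_df))

fig, ax = plt.subplots()
top_ten_words_df.plot(kind='bar', ax=ax)  # one bar per numeric column, grouped by row
ax.set_xticklabels(top_ten_words_df['word'])  # show the words under the bars
fig.tight_layout()
plt.show()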