How to edit a graph in Python (Zipf's law)

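I need help making a bar chart that shows the frequencies of the ten most common words in a file. Next to each bar there should be a second bar whose height is the frequency predicted by Zipf's law. (For example, suppose the most common word appears 100 times. Zipf's law predicts that the second most common word appears about 50 times (half as often as the most common word), the third most common word about 33 times (a third as often), the fourth most common word about 25 times (a quarter as often), and so on.)

The function takes the name of a text file (as a string) as input.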
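
To make the arithmetic in the example concrete, here is a minimal sketch. The function name `zipf_expected` is hypothetical and not part of the original post; it assumes the predicted count for rank r is simply the count of the most common word divided by r.

```python
def zipf_expected(top_count, n=10):
    """Return the counts Zipf's law predicts for ranks 1..n,
    anchored on the count of the most common word (an assumption)."""
    return [top_count / rank for rank in range(1, n + 1)]


# Reproduces the numbers from the example: 100, about 50, about 33, about 25.
print([round(value, 1) for value in zipf_expected(100, 4)])
# [100.0, 50.0, 33.3, 25.0]
```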

def zipf_graph(text_file):
    import string
    file = open(text_file, encoding = 'utf8')
    text = file.read()
    file.close()

    punc = string.punctuation + '’”—⎬⎪“⎫'
    new_text = text
    for char in punc:
        new_text = new_text.replace(char,'')
        new_text = new_text.lower()
    text_split = new_text.split()

    # Determines how many times each word appears in the file. 
    from collections import Counter
    word_and_freq = Counter(text_split)
    top_ten_words = word_and_freq.most_common(10)

    print(top_ten_words) 

    #graph info

    import numpy as np
    import matplotlib.pyplot as plt
    barWidth = 0.25
    bars1 = [1,2,3,4,5,6,7,8,9,10] # I want the top_ten_words here
    bars2 = [10,5,3.33,2.5,2,1.67,1.43,1.25,1.11,1] # Zipf Law freq here, numbers are just ex.

    r1 = np.arange(len(bars1))
    r2 = [x + barWidth for x in r1]

    plt.bar(r1, bars1, color='#7f6d5f', width=barWidth, edgecolor='white', label='Word')
    plt.bar(r2, bars2, color='#2d7f5e', width=barWidth, edgecolor='white', label='Zipf Law')
    plt.xlabel('group', fontweight='bold')
    plt.xticks([r + barWidth for r in range(len(bars1))], ['word1', 'word2', 'word3', 'word4', 'word5', 'word6', 'word7', 'word8', 'word9', 'word10']) 
    # Want words to print below bars
    plt.legend()
    plt.show()

zipf_graph('gatsby.txt')
The code prints the top ten words and their frequencies in this format (as an example, I used the book The Great Gatsby):

Use Matplotlib. Here is a demo:

import matplotlib.pyplot as plt
import numpy as np

plt.rcdefaults()  # reset Matplotlib settings to their defaults

objects = ('Python', 'C++', 'Java', 'Perl', 'Scala', 'Lisp')
y_pos = np.arange(len(objects))
performance = [10,8,6,4,2,1]

plt.bar(y_pos, performance, align='center', alpha=0.5)
plt.xticks(y_pos, objects)
plt.ylabel('Usage')
plt.title('Programming language usage')

plt.show()
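
The demo draws a single series, whereas the question asks for two bars per word with the words as tick labels. The following is only a sketch of one way to adapt it, not the answerer's code. The helper names `split_words_and_counts` and `plot_zipf_bars` are hypothetical, the sample counts are made up, and the Zipf prediction is assumed to be the top count divided by the rank.

```python
def split_words_and_counts(top_words):
    """Turn [(word, count), ...] (the shape Counter.most_common returns)
    into labels, actual counts, and Zipf-predicted counts."""
    words = [word for word, _ in top_words]
    actual = [count for _, count in top_words]
    # Assumption: predicted count for rank r is the top count divided by r.
    expected = [actual[0] / rank for rank in range(1, len(actual) + 1)]
    return words, actual, expected


def plot_zipf_bars(top_words, bar_width=0.4):
    """Draw actual and predicted counts side by side, labelled by word."""
    # Imported here so the helper above can be used without Matplotlib.
    import matplotlib.pyplot as plt

    words, actual, expected = split_words_and_counts(top_words)
    positions = list(range(len(words)))
    # Shift each series half a bar width so the pair straddles its tick.
    plt.bar([p - bar_width / 2 for p in positions], actual,
            width=bar_width, label='Word')
    plt.bar([p + bar_width / 2 for p in positions], expected,
            width=bar_width, label='Zipf Law')
    plt.xticks(positions, words)  # the words appear below each pair of bars
    plt.ylabel('Count')
    plt.legend()
    plt.show()


# Made-up counts for illustration only; not taken from any real text.
sample = [('the', 100), ('and', 60), ('a', 40)]
print(split_words_and_counts(sample))
```

Inside the question's function, calling `plot_zipf_bars(top_ten_words)` would take the place of the hard-coded `bars1`, `bars2`, and tick labels.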

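This solution works for me. Note:

  • I prefer to use pandas to hold my dataset
  • You need a function that returns the expected frequencies according to Zipf's law. I anchored on the most frequent word, but an alternative is to anchor on the total count (of the top 10)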
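
The alternative anchoring mentioned in the second note is not spelled out in the answer. One possible reading, sketched below, keeps the 1/rank shape of Zipf's law but scales it so that the predicted counts add up to the actual total of the top words. The function name `zipf_anchored_on_total` and the sample counts are hypothetical.

```python
def zipf_anchored_on_total(counts):
    """Scale Zipf's 1/rank shape so the predictions sum to sum(counts)."""
    n = len(counts)
    # Normalizing constant: 1 + 1/2 + ... + 1/n
    harmonic = sum(1 / rank for rank in range(1, n + 1))
    total = sum(counts)
    return [total / (rank * harmonic) for rank in range(1, n + 1)]


# Made-up counts for illustration only.
counts = [100, 60, 40]
expected = zipf_anchored_on_total(counts)
print(expected, sum(expected))
```

With this anchoring the first predicted bar no longer has to match the first actual bar, so a mismatch can show up at every rank, including the top one.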

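Sounds like a homework question (asking for the answer). What have you tried?

Please don't make more work for others by vandalizing your posts. By posting on the Stack Exchange network, you have granted Stack Exchange an irrevocable right to distribute that content (that is, regardless of your future choices). By Stack Exchange policy, the non-vandalized version of the post is the one that is distributed. Thus, any vandalism will be reverted. If you want to know more about deleting a post, see: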
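# Code for the pandas-based solution. It continues from the question's code:
# top_ten_words is the list of (word, count) pairs from Counter.most_common(10).
import matplotlib.pyplot as plt
import pandas as pd

def zipf_frequency(most_common_count, n=10):
    # Expected count for rank x is the top count divided by x.
    return [most_common_count / x for x in range(1, n + 1)]

top_ten_words_df = pd.DataFrame(top_ten_words, columns=['word', 'actual count'])
# Pass n explicitly so the column length matches even if fewer than 10 words exist.
top_ten_words_df['expected zipf frequency'] = zipf_frequency(
    top_ten_words_df.loc[0, 'actual count'], n=len(top_ten_words_df))

fig, ax = plt.subplots()
top_ten_words_df.plot(kind='bar', ax=ax)  # one pair of bars per word
ax.set_xticklabels(top_ten_words_df['word'])  # show the words below the bars
fig.tight_layout()
plt.show()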