Letter frequency of strings in a file using Python

python python-2.7 python-3.x

Sample text file:

airport, 2007, 175702
airport, 2008, 173294
request, 2005, 646179
request, 2006, 677820
request, 2007, 697645
request, 2008, 795265
wandered, 2005, 83769
wandered, 2006, 87688
wandered, 2007, 108634
wandered, 2008, 171015

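Each line of this text file contains a word (for example, "airport"), a year, and the number of times the word was used in that year. I created a class, with the word used as a key, that holds the year and the occurrence count for that year. Now I want to find the number of occurrences of each letter from a to z. This is done by counting how many times each letter of the alphabet appears in a word, multiplying that number by the word's total number of occurrences, and adding the results for all the other words.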

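For example:

"a" appears once in "wandered" and once in "airport", so we get 1 * (83769 + 87688 + 108634 + 171015) = 451106 total occurrences of "a" from "wandered" and 1 * (175702 + 173294) = 348996 from "airport", for a total of 800102 occurrences of the letter "a". To find the frequency of "a", we divide 800102 by the total number of letters, 25770183, which gives a frequency of about 0.031047. "b" and "c" would be 0.0 because no word in the file uses those letters.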
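The arithmetic in this example can be checked with a short sketch. The yearly counts are copied from the sample file; the variable names are illustrative only and do not come from the asker's code.

```python
# Yearly counts copied from the sample file in the question.
counts = {
    "airport": [175702, 173294],
    "request": [646179, 677820, 697645, 795265],
    "wandered": [83769, 87688, 108634, 171015],
}

# Total occurrences of each word across all years.
totals = {word: sum(years) for word, years in counts.items()}

# Each occurrence of a word contributes one "a" per "a" in its spelling.
a_total = sum(word.count("a") * total for word, total in totals.items())

# Each occurrence of a word contributes len(word) letters overall.
all_letters = sum(len(word) * total for word, total in totals.items())

# float() keeps true division under Python 2.7 as well as 3.x.
a_frequency = a_total / float(all_letters)

print(a_total)       # 800102
print(all_letters)   # 25770183
print(a_frequency)   # roughly 0.031
```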

This is what I have so far, but it's not working at all and I'm out of ideas:

from wordData import*

def letterFreq(words):
    totalLetters = 0
    letterDict = {'a':0,'b':0,'c':0,'d':0,'e':0,'f':0,'g':0,'h':0,'i':0,'j':0,'k':0,'l':0,'m':0,'n':0,'o':0,'p':0,'q':0,
                  'r':0,'s':0,'t':0,'u':0,'v':0,'w':0,'x':0,'y':0,'z':0}

    for word in words:
        totalLetters += totalOccurances(word,words)*len(word)
        for char in range(0,len(word)):
            for letter in letterDict:
                if letter == word[char]:
                    for year in words[word]:
                        letterDict[letter] += year.count
    for letters in letterDict:
        # float() prevents integer division (all zeros) under Python 2.7
        letterDict[letters] /= float(totalLetters)

    print(letterDict)

def main():
    filename = "data/very_short.csv"
    words = readWordFile(filename)
    letterFreq(words)

# The guard must sit at module level; nested inside main() it never runs.
if __name__ == '__main__':
    main()

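If you want to count all the letters in the file, use a dict (here a collections.Counter). To get the totals, just multiply by the number of times each word appears: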

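from collections import Counter

c = Counter()
with open("input.txt") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        # Fields are comma-separated, so split on "," and trim whitespace
        word, year, count = [field.strip() for field in line.split(",")]
        # Add the count once per letter instead of building word * count
        for letter in word:
            c[letter] += int(count)

# float() keeps true division under Python 2.7
print(c["a"] / float(sum(c.values())))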
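To report every letter from a to z, including letters that never occur, the same weighted counts can be normalized over string.ascii_lowercase. This is a minimal sketch, assuming the comma-separated "word, year, count" layout shown in the question; the function name letter_frequencies is hypothetical and the sample lines are copied from the question.

```python
import string
from collections import Counter

def letter_frequencies(lines):
    """Return {letter: frequency} for a-z from 'word, year, count' lines."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        word, _year, count = [field.strip() for field in line.split(",")]
        for letter in word.lower():
            if letter in string.ascii_lowercase:
                counts[letter] += int(count)  # weight by the word's count
    total = float(sum(counts.values()))
    if total == 0:
        return {letter: 0.0 for letter in string.ascii_lowercase}
    # Counter returns 0 for missing keys, so unused letters come out as 0.0.
    return {letter: counts[letter] / total for letter in string.ascii_lowercase}

sample = [
    "airport, 2007, 175702",
    "airport, 2008, 173294",
    "request, 2005, 646179",
    "request, 2006, 677820",
    "request, 2007, 697645",
    "request, 2008, 795265",
    "wandered, 2005, 83769",
    "wandered, 2006, 87688",
    "wandered, 2007, 108634",
    "wandered, 2008, 171015",
]

freqs = letter_frequencies(sample)
for letter in string.ascii_lowercase:
    print("%s: %.6f" % (letter, freqs[letter]))
```

Passing an open file object instead of the sample list should work the same way, since the function only iterates over lines.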

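I was thinking of making a list from a to z and using it to compare against all the letters of the word.

What exactly does "it's not working at all" mean? And where is readWordFile?

I have a separate .py file that has a class and functions, which get called in this .py file. My program correctly gets the total number of letters across all word occurrences in the file. What doesn't work is finding the frequency of each letter in the file.

What does "doesn't work" mean? What did you expect to happen, and what happened instead? An error (provide the full traceback)? Unexpected output (provide the input plus the expected and actual output)?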
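The wordData module is never shown in the thread, so the question's code cannot be run as posted. The following is only a guess at what such a module might contain, inferred from how the question uses readWordFile, totalOccurances, and the .count attribute; the YearCount class and all implementation details are assumptions.

```python
class YearCount(object):
    """Hypothetical record: one year's usage count for a word."""
    def __init__(self, year, count):
        self.year = year
        self.count = count

def readWordFile(filename):
    """Map each word to a list of YearCount records (assumed file layout:
    one 'word, year, count' entry per line)."""
    words = {}
    with open(filename) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            word, year, count = [field.strip() for field in line.split(",")]
            words.setdefault(word, []).append(YearCount(int(year), int(count)))
    return words

def totalOccurances(word, words):
    """Sum a word's counts over all years (spelling kept from the question)."""
    return sum(entry.count for entry in words[word])
```

With helpers along these lines, a call such as letterFreq(readWordFile("data/very_short.csv")) would have data of the shape the question's loops expect.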
