Letter frequency of strings in a file using Python (tags: python, python-2.7, python-3.x)

Sample text file:

airport, 2007, 175702
airport, 2008, 173294
request, 2005, 646179
request, 2006, 677820
request, 2007, 697645
request, 2008, 795265
wandered, 2005, 83769
wandered, 2006, 87688
wandered, 2007, 108634
wandered, 2008, 171015
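Each line of this text file contains a word (for example "airport"), a year, and the number of times the word was used in that year. What I have done so far is create a class that uses the word as a key and holds the year and the number of occurrences for that year. What I want to do now is find the number of occurrences of each letter from a to z. This is done by counting how many times each letter of the alphabet appears in a word, multiplying that number by the total number of occurrences of the word, and then adding up the results for all the words.

For example: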

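"a" appears once in "wandered" and once in "airport", so we get 1 * (83769 + 87688 + 108634 + 171015) = 451106 total occurrences of "a" in "wandered" and 1 * (175702 + 173294) = 348996 total occurrences of "a" in "airport", for a total of 800102 occurrences of the letter "a". To find the frequency of "a", we divide 800102 by the total number of all letters, which is 25770183, giving a frequency for "a" of approximately 0.031048. "b" and "c" would be 0.0, because none of the current words use those letters.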
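The arithmetic above can be checked with a few lines of Python. This is only a sketch: the sample rows are typed in by hand, and the names rows, letter_totals and total_letters are made up for the illustration. The division 800102 / 25770183 comes to roughly 0.031048.

```python
from collections import defaultdict

# Sample rows from the question: (word, year, count)
rows = [
    ("airport", 2007, 175702),
    ("airport", 2008, 173294),
    ("request", 2005, 646179),
    ("request", 2006, 677820),
    ("request", 2007, 697645),
    ("request", 2008, 795265),
    ("wandered", 2005, 83769),
    ("wandered", 2006, 87688),
    ("wandered", 2007, 108634),
    ("wandered", 2008, 171015),
]

letter_totals = defaultdict(int)
for word, _year, count in rows:
    for letter in word:
        # every use of the word contributes each of its letters once
        letter_totals[letter] += count

total_letters = sum(letter_totals.values())
freq_a = letter_totals["a"] / float(total_letters)

print(letter_totals["a"])   # 800102
print(total_letters)        # 25770183
print(round(freq_a, 6))     # 0.031048
```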

This is what I have so far, but it is not working at all and I am out of ideas:

from wordData import *

def letterFreq(words):
    totalLetters = 0
    letterDict = {'a':0,'b':0,'c':0,'d':0,'e':0,'f':0,'g':0,'h':0,'i':0,'j':0,'k':0,'l':0,'m':0,'n':0,'o':0,'p':0,'q':0,
                  'r':0,'s':0,'t':0,'u':0,'v':0,'w':0,'x':0,'y':0,'z':0}

    for word in words:
        totalLetters += totalOccurances(word,words)*len(word)
        for char in range(0,len(word)):
            for letter in letterDict:
                if letter == word[char]:
                    for year in words[word]:
                        letterDict[letter] += year.count
    for letters in letterDict:
        letterDict[letters] /= totalLetters


    print(letterDict)

def main():
    filename = "data/very_short.csv"
    words = readWordFile(filename)
    letterFreq(words)

if __name__ == '__main__':
    main()
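One possible way to restructure letterFreq, shown only as a sketch because wordData is not included in the question. It assumes that words maps each word to a list of entries with year and count attributes (the YearCount class below is a hypothetical stand-in for whatever wordData defines), and it forces float division, since under Python 2.7 dividing two integers would truncate every frequency to 0.

```python
import string


class YearCount(object):
    """Hypothetical stand-in for the class defined in wordData (not shown)."""

    def __init__(self, year, count):
        self.year = year
        self.count = count


def letter_freq(words):
    """Return {letter: relative frequency} for the letters a-z."""
    letter_totals = dict.fromkeys(string.ascii_lowercase, 0)
    for word, entries in words.items():
        # total number of times this word was used across all years
        word_total = sum(entry.count for entry in entries)
        for letter in word:
            if letter in letter_totals:
                letter_totals[letter] += word_total
    total_letters = sum(letter_totals.values())
    if total_letters == 0:
        return {letter: 0.0 for letter in letter_totals}
    # float() keeps the division from truncating under Python 2.7
    return {letter: n / float(total_letters)
            for letter, n in letter_totals.items()}


# Assumed shape of the data returned by readWordFile
words = {
    "airport": [YearCount(2007, 175702), YearCount(2008, 173294)],
    "wandered": [YearCount(2005, 83769), YearCount(2006, 87688),
                 YearCount(2007, 108634), YearCount(2008, 171015)],
}
freqs = letter_freq(words)
print(freqs["a"], freqs["b"])
```

Returning the dictionary instead of printing it inside the function makes the result easier to check or reuse.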

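If you want to count all the letters in the file, use a dict (here a Counter).

To get the totals, just multiply each word by the number of times it occurs: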

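from collections import Counter

c = Counter()
with open("input.txt") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        # fields are comma-separated: word, year, count
        word, year, count = [field.strip() for field in line.split(",")]
        # repeat the word once per use so each letter is counted that many times
        c.update(word * int(count))

# relative frequency of "a" among all counted letters
print(c["a"] / float(sum(c.values())))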
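Repeating the word once per use builds a string with millions of characters for the frequent words. A sketch of an alternative, assuming the comma-separated layout shown in the sample file: count the letters of each word once and scale those counts. Counter.update accepts a mapping and adds its values to the existing counts. Here io.StringIO stands in for the real file, and weighted_letter_counts is a name made up for the example.

```python
import io
import string
from collections import Counter


def weighted_letter_counts(lines):
    """Sum letter counts over lines of the form 'word, year, count'."""
    totals = Counter()
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        word, _year, count = [field.strip() for field in line.split(",")]
        per_word = Counter(word)
        # scale the per-word letter counts instead of repeating the string
        totals.update({letter: n * int(count)
                       for letter, n in per_word.items()})
    return totals


# A few rows of the sample file, used in place of open("input.txt")
sample = io.StringIO(u"airport, 2007, 175702\n"
                     u"airport, 2008, 173294\n"
                     u"wandered, 2005, 83769\n")
totals = weighted_letter_counts(sample)
grand_total = float(sum(totals.values()))

# Frequencies for every letter a-z; letters that never occur come out as 0.0
frequencies = {letter: totals[letter] / grand_total
               for letter in string.ascii_lowercase}
print(frequencies["a"])
```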

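Comments:

I was thinking of making a list of the letters a-z and using it to compare against all the letters of the word.

What exactly does "its [sic] not working at all" mean? And where is readWordFile?

I have a separate .py file with a class and functions that are called from this .py file. My program correctly gets the total number of letters across all occurrences of the words in the file. What does not work is finding the frequency of each letter in the file.

What does "not working" mean? What did you expect to happen, and what happened instead? An error (provide the full traceback)? Unexpected output (provide the input along with the expected and actual output)?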