Python: determining relative letter frequency

python python-3.x

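I need to create a function that takes a text file as input and returns a vector of size 26 containing the frequency, as a percentage, of each character from a to z. It must be case-insensitive. All other letters, such as å, and all symbols should be ignored.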

I tried using some of the answers here, in particular Jacob's answer.

This is my code so far:

def letterFrequency(filename):
    #f: the text file is converted to lowercase 
    f=filename.lower()
    #n: the sum of the letters in the text file
    n=float(len(f))
    import collections
    dic=collections.defaultdict(int)
    #the absolute frequencies
    for x in f:
        dic[x]+=1
    #the relative frequencies
    from string import ascii_lowercase
    for x in ascii_lowercase:
        return x,(dic[x]/n)*100

For example, if I try the following:

print(letterFrequency('I have no idea'))
>>> ('a',14.285714)

Why doesn't it print the relative values for all the letters, including letters that are not in the string, such as z in my example?

How do I make my code print a vector of size 26?

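Edit: I tried using Counter, but it prints 'a': 14.2857 and the letters come out in no particular order. I only need the relative frequencies of the letters, in alphabetical order.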

for x in ascii_lowercase:
    return x,(dic[x]/n)*100

The function returns on the first iteration of the loop.

Instead, change it to return a list of tuples:

letters = []
for x in ascii_lowercase:
    letters.append((x,(dic[x]/n)*100))
return letters
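To show that fix in context, here is a minimal sketch of the whole function with the list-of-tuples return. The name letter_frequency_pairs is made up for this illustration, and it takes the text itself rather than a file name, as the example call in the question does. Like the original code, it divides by the total number of characters, so spaces still count toward the denominator, and an empty string would raise ZeroDivisionError.

```python
from collections import defaultdict
from string import ascii_lowercase


def letter_frequency_pairs(text):
    """Return 26 (letter, percentage) tuples, one per letter a to z."""
    lowered = text.lower()
    # Denominator is every character (spaces included), as in the question's code.
    total = float(len(lowered))
    counts = defaultdict(int)
    for ch in lowered:
        counts[ch] += 1
    pairs = []
    for letter in ascii_lowercase:
        # Append inside the loop, return only after all 26 letters are done.
        pairs.append((letter, counts[letter] / total * 100))
    return pairs


result = letter_frequency_pairs('I have no idea')
print(len(result))  # one entry per letter, including letters that never occur
print(result[0])    # the entry for 'a'
```

Because the return statement now sits after the loop, letters that do not occur in the text (such as z) appear with a frequency of 0.0.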

The problem is that the return statement inside the for loop returns a tuple, so the function stops at the first iteration.

Use yield instead of return, which turns the function into a generator that works as expected.
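A rough sketch of the yield variant, assuming the text is passed in directly. The name iter_letter_frequency is hypothetical, and str.count is used here only to keep the example short. It is not the counting loop from the question.

```python
from string import ascii_lowercase


def iter_letter_frequency(text):
    """Yield one (letter, percentage) tuple per letter a to z."""
    lowered = text.lower()
    total = len(lowered)  # all characters, matching the question's denominator
    for letter in ascii_lowercase:
        # yield hands back one tuple and resumes here on the next request,
        # so the loop runs through all 26 letters instead of stopping at 'a'.
        yield letter, lowered.count(letter) / total * 100


# A generator produces values lazily; wrap it in list() to get all of them.
freqs = list(iter_letter_frequency('I have no idea'))
print(len(freqs))
```

Calling the function on its own returns a generator object, so printing it directly will not show the values. Iterate over it or convert it to a list first.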

Another way is to return a list:

return [(x,(dic[x]/n)*100) for x in ascii_lowercase]

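However, if your goal is to count items, I recommend using the collections.Counter class (see the letterFrequency(txt) listing below).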

As you can see, c=Counter(txt.lower()) does all the work of iterating over the characters and keeping the counts. A Counter behaves much like a defaultdict.

Note that Counter also has some useful methods, such as c.most_common…
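As a small illustration of those two points, the sketch below looks up a missing key and calls most_common on the example string from the question. The variable names are chosen for this example only.

```python
from collections import Counter

counts = Counter('I have no idea'.lower())

# Looking up a letter that never occurs gives 0 instead of raising KeyError.
# Unlike defaultdict, the lookup does not add the key to the Counter.
z_count = counts['z']
print(z_count, 'z' in counts)

# most_common(k) returns the k highest counts as (item, count) tuples.
# Here the single most common "character" is the space, which occurs 3 times.
top = counts.most_common(1)
print(top)
```

The space showing up as the most common item is a reminder that Counter counts every character it is given, so non-letters need to be filtered out separately if they should be ignored.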

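Thanks, this works.. but how do I remove the commas in the printed result? It prints [number, number, number], but I would really like it to look like an array.

@Gliz Use letters.append(dic[x]/n*100) and print it with: for e in letterFrequency('I have no idea'): print(e, end=' '). Is that what you want?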
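For the comma-free output asked about in the comments, one option (a sketch, not what the commenter wrote) is to join the formatted numbers with spaces. The helper name format_vector and the two-decimal format are assumptions made for this example, and the input values are illustrative.

```python
def format_vector(values, digits=2):
    """Format numbers as a single space-separated string, without commas."""
    return ' '.join(f'{v:.{digits}f}' for v in values)


# Illustrative percentages, not the output of any function on this page.
sample = [14.285714285714286, 0.0, 7.142857142857143]
line = format_vector(sample)
print(line)  # 14.29 0.00 7.14
```

This only changes how the list is displayed. The function can keep returning an ordinary list of floats.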

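def letterFrequency(txt):
    from collections import Counter
    from string import ascii_lowercase
    # Count every character of the lowercased text in one pass
    c=Counter(txt.lower())
    # Dividing a count by len(txt)/100 gives a percentage of all characters
    n=len(txt)/100.
    # One (letter, percentage) tuple per letter, in alphabetical order
    return [(x, c[x]/n) for x in ascii_lowercase]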
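The question also asks for file input, a plain vector of 26 numbers, and for characters such as å and symbols to be ignored. The following is one possible sketch along those lines, under a few assumptions: the function names are made up, the file is assumed to be UTF-8 text, and percentages are taken relative to the number of a to z letters only (not all characters, as in the code above). An input with no letters returns all zeros rather than dividing by zero.

```python
from collections import Counter
from string import ascii_letters, ascii_lowercase


def letter_frequency_vector(text):
    """Return 26 percentages for a..z, ignoring every other character."""
    # Keep only ASCII letters first, then lowercase them, so that characters
    # such as 'å', digits, spaces and punctuation never reach the counter.
    letters = [ch.lower() for ch in text if ch in ascii_letters]
    total = len(letters)
    if total == 0:
        return [0.0] * 26  # assumption: no letters means all-zero frequencies
    counts = Counter(letters)
    return [counts[letter] / total * 100 for letter in ascii_lowercase]


def letter_frequency_from_file(path, encoding='utf-8'):
    """Read a text file (encoding is an assumption) and return its vector."""
    with open(path, encoding=encoding) as handle:
        return letter_frequency_vector(handle.read())


vector = letter_frequency_vector('I have no idea')
print(len(vector))
print(round(sum(vector), 6))
```

With this denominator the 26 values add up to 100 whenever the text contains at least one letter, which is not the case when spaces and symbols are included in the total.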