Python function to find # of unique words / total words not working. Why?


python python-3.x


Why doesn't this code work?

def hapax_legomana_ratio(text):
    ''' Return the hapax_legomana ratio for this text.
    This ratio is the number of words that occur exactly once divided
    by the total number of words.
    text is a list of strings each ending in \n.
    At least one line in text contains a word.'''

    uniquewords=dict()
    words=0
    for line in text:
        line=line.strip().split()
        for word in line:
            words+=1
            if word in words:
                uniquewords[word]-=1
            else:
                uniquewords[word]=1
    HLR=len(uniquewords)/words

    print (HLR)

When I test it, it gives the wrong answer. For example, for a 9-word string that contains 3 unique words, it gives 0.2045456 instead of .33333.

There are several mistakes in your code. I think "if word in words" contains a typo: the membership test should be against uniquewords (the dict), not words (which is the count).
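As a minimal sketch of that fix, the counting loop might look like the following. The helper name count_words is hypothetical, and the sketch increments the count on a repeat (rather than decrementing as in the question), which is an assumption about what was intended.

```python
def count_words(lines):
    """Hypothetical helper: map each word to the number of times it occurs."""
    counts = {}
    for line in lines:
        for word in line.split():
            # Membership test against the dict, not the integer total.
            if word in counts:
                counts[word] += 1
            else:
                counts[word] = 1
    return counts


counts = count_words(["a b a\n", "c\n"])
```

With this shape, the total number of words is simply the sum of the dict's values, so a separate running counter is optional.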

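In more detail: the function expects the text to be already split into lines and passed as a list of those lines. I would rather suggest doing

for line in text.splitlines():

That way, you don't have to worry about passing the text as a list; a single string works.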
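If both input shapes need to be supported, one possible sketch is a small normalizing helper. The name as_lines is hypothetical and not part of the question or the answer; it only illustrates that str.splitlines() applies to a string, while a list of lines can be used directly.

```python
def as_lines(text):
    """Hypothetical helper: return a list of lines without trailing newlines,
    whether text is one string or already a list of lines."""
    if isinstance(text, str):
        return text.splitlines()
    return [line.rstrip("\n") for line in text]


from_string = as_lines("a b\nc\n")
from_list = as_lines(["a b\n", "c\n"])
```

Both calls produce the same list, so the rest of the function can iterate over lines without caring how the text was passed.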

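Also, you are using len(uniquewords), which is wrong because you store every word in the dict regardless of whether it is unique. Whether a word is unique is given by its value in the dict, looked up with the word as the key: 1 if the word has been seen once, lower once it repeats. So you should iterate over the dict's items and count the keys whose value is 1.

Also, you haven't accounted for punctuation! Suppose this is the text:

This is a test, yes it is a test.

Splitting on whitespace alone treats "test," and "test." as two different words.

Finally, if this is part of a very large project and/or you will need it again in the future, I suggest using collections.Counter from the standard library rather than doing all of this by hand.

To find the ratio of the number of words that occur exactly once to the total number of words in the text:

from collections import Counter

def hapax_legomana_ratio(text):
    words = text.split()  # a word is anything separated by whitespace
    return sum(count == 1 for count in Counter(words).values()) / len(words)

This assumes text is a string. If you have a list of lines, you can get the list of words like this:

words = [word for line in all_lines for word in line.split()]

Using the same test, this method gave me 0.045456.

Sorry, I thought the splitlines part was a replacement for line.strip and split.

Very good, thanks.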
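For reference, the punctuation issue raised above can be handled with a sketch like the one below. It is not taken from any of the answers: the function name hapax_ratio_normalized is hypothetical, and lowercasing plus stripping leading and trailing punctuation is only one assumed definition of a "word".

```python
from collections import Counter
import string


def hapax_ratio_normalized(lines):
    """Hypothetical variant: words occurring exactly once divided by total words,
    after lowercasing and stripping surrounding punctuation (an assumption)."""
    words = []
    for line in lines:
        for token in line.split():
            word = token.strip(string.punctuation).lower()
            if word:  # skip tokens that consisted only of punctuation
                words.append(word)
    counts = Counter(words)
    # Assumes at least one word, as the question's docstring states.
    return sum(1 for n in counts.values() if n == 1) / len(words)


sample = ["This is a test, yes it is a test.\n"]
ratio = hapax_ratio_normalized(sample)
```

On the sample sentence this gives 3 words that occur once (this, yes, it) out of 9, which matches the .33333 the question expected. Without the stripping step, "test," and "test." would each count as occurring once and the ratio would be 5/9.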