Python: float division by zero error with ngrams and NLTK


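My task is to use 10-fold cross-validation with unigram, bigram, and trigram taggers on a corpus and compare their accuracy. However, I am getting a float division error. All of this code except the loop was provided with the assignment, so the error is probably in the loop. For now we only use the first 1000 sentences to test the program (the line sentences = sentences[:1000]); that line will be removed once I know the program runs.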

import codecs
mypath = "/Users/myname/Desktop/"
corpusFile = codecs.open(mypath + "estonianSample.txt",mode="r",encoding="latin-1")
sentences = [[tuple(w.split("/")) for w in line[:-1].split()] for line in corpusFile.readlines()]
corpusFile.close()


from math import ceil
N=len(sentences)
chunkSize = int(ceil(N/10.0))


sentences = sentences[:1000]

chunks=[sentences[i:i+chunkSize] for i in range(0, N, chunkSize)]

for i in range(10):

    training = reduce(lambda x,y:x+y,[chunks[j] for j in range(10) if j!=i])
    testing = chunks[i]

from nltk import UnigramTagger,BigramTagger,TrigramTagger
t1 = UnigramTagger(training)
t2 = BigramTagger(training,backoff=t1)
t3 = TrigramTagger(training,backoff=t2)

t3.evaluate(testing)

The error says:

runfile('/Users/myname/pythonhw3.py', wdir='/Users/myname')
Traceback (most recent call last):
  File "<ipython-input-1-921164840ebd>", line 1, in <module>
    runfile('/Users/myname/pythonhw3.py', wdir='/Users/myname') 
  File "/Users/myname/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 580, in runfile
    execfile(filename, namespace)
  File "/Users/myname/pythonhw3.py", line 34, in <module>
    t3.evaluate(testing)
  File "/Users/myname/anaconda/lib/python2.7/site-packages/nltk/tag/api.py", line 67, in evaluate
    return accuracy(gold_tokens, test_tokens)
  File "/Users/myname/anaconda/lib/python2.7/site-packages/nltk/metrics/scores.py", line 40, in accuracy
    return float(sum(x == y for x, y in izip(reference, test))) / len(test)    
ZeroDivisionError: float division by zero

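The error occurs because the test set passed to evaluate is empty: the last frame of the traceback shows accuracy dividing by len(test), which is zero here.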
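A minimal sketch of how an empty test chunk can arise in the question's code, assuming the full corpus holds more than 1000 sentences (the figure 25000 below is a made-up size, not taken from the question). N and chunkSize are computed before sentences is truncated, so every chunk after the first starts past the end of the shortened list:

```python
from math import ceil

# Hypothetical stand-in for the tagged corpus; 25000 is an assumed size.
sentences = [[("word", "TAG")] for _ in range(25000)]

N = len(sentences)                 # measured before truncation
chunkSize = int(ceil(N / 10.0))    # 2500 for the assumed corpus size

sentences = sentences[:1000]       # the list is now much shorter than N

# Same chunking expression as in the question.
chunks = [sentences[i:i + chunkSize] for i in range(0, N, chunkSize)]
chunk_lengths = [len(c) for c in chunks]
print(chunk_lengths)  # only the first chunk contains any sentences
```

After the loop in the question finishes, testing is chunks[9], which under these assumptions is an empty list, so len(test) is 0 inside accuracy.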

The specific line that triggers the problem is

t3.evaluate(testing)

What you can do is

try:
    t3.evaluate(testing)
except ZeroDivisionError:
    # Do whatever you want it to do
    print(0)

This worked for me. Give it a try.

This answer comes four years late, but hopefully someone will find it helpful.
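Catching the exception avoids the crash, but the fold being scored is still empty. A hedged sketch of a loop that avoids empty folds in the first place: measure the list after truncating it, and train and evaluate inside the loop. The helper names k_fold_splits and trigram_accuracy are hypothetical, introduced only for illustration; the tagger chain is the one from the question.

```python
from math import ceil


def k_fold_splits(sentences, k=10):
    """Yield (training, testing) pairs for k-fold cross-validation."""
    n = len(sentences)                    # length of the list actually used
    if n == 0:
        return                            # nothing to split
    chunk_size = int(ceil(n / float(k)))
    chunks = [sentences[i:i + chunk_size] for i in range(0, n, chunk_size)]
    for i, testing in enumerate(chunks):
        # Concatenate every chunk except the held-out one.
        training = [s for j, c in enumerate(chunks) if j != i for s in c]
        yield training, testing


def trigram_accuracy(training, testing):
    """Train the backoff chain from the question and score it on one fold."""
    # Imported here so the splitting code above runs without NLTK installed.
    from nltk import UnigramTagger, BigramTagger, TrigramTagger
    t1 = UnigramTagger(training)
    t2 = BigramTagger(training, backoff=t1)
    t3 = TrigramTagger(training, backoff=t2)
    # evaluate() is the call used in the question; recent NLTK releases
    # offer the same scoring under the name accuracy().
    return t3.evaluate(testing)


# Toy data standing in for the first 1000 tagged sentences.
sample = [[("word%d" % i, "TAG")] for i in range(1000)]
fold_sizes = [(len(train), len(test)) for train, test in k_fold_splits(sample)]
print(fold_sizes)
```

With real data, the per-fold scores would come from calling trigram_accuracy(training, testing) on each pair yielded by k_fold_splits(sentences[:1000]) and averaging the results; the same pattern applies to the unigram and bigram taggers.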

Can you post the full output of the error?

Edited! Added the full output!