Python TF_IDF calculation gives an error
I'm currently working on a small project and my mind has gone completely blank. I have the following code to calculate term frequency:
from Bag import *
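words = ['the', 'new', 'the', 'shiny', 'new', 'car', 'went', 'through', 'the', 'tunnel']
carDoc = Bag()
for word in words:
    carDoc.add(word)

def tf(word, carDoc):
    # Term frequency: occurrences of word divided by total words in the document
    if word != "" and carDoc.size() > 0:
        # float() avoids integer division (which truncates to 0) on Python 2.7
        return float(carDoc.count(word)) / carDoc.size()
    return 0.0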
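The Bag module itself isn't shown. To try the TF code without it, here is a minimal sketch of a hypothetical Bag class. It supports only the four methods these snippets call (add, size, count, contains) and is built on collections.Counter. The real module may behave differently.

```python
from collections import Counter


class Bag:
    """Hypothetical stand-in for the unshown Bag module: a simple multiset."""

    def __init__(self):
        self._counts = Counter()

    def add(self, item):
        self._counts[item] += 1

    def size(self):
        # Total number of items added, duplicates included.
        return sum(self._counts.values())

    def count(self, item):
        # Counter returns 0 for missing keys without inserting them.
        return self._counts[item]

    def contains(self, item):
        return self._counts[item] > 0


def tf(word, doc):
    """Term frequency: occurrences of word divided by the document's length."""
    if word != "" and doc.size() > 0:
        # float() keeps Python 2.7 from truncating the ratio to 0.
        return float(doc.count(word)) / doc.size()
    return 0.0


words = ['the', 'new', 'the', 'shiny', 'new', 'car', 'went', 'through', 'the', 'tunnel']
carDoc = Bag()
for word in words:
    carDoc.add(word)

print(tf('the', carDoc))  # → 0.3 ('the' is 3 of 10 words)
```

Returning 0.0 for an empty word or an empty document, instead of falling through to an implicit None, means a later multiplication in tf_idf cannot fail on None.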
I also have the following inverse document frequency code:
from Bag import *
from math import log

carDoc1 = Bag()
for word in ['the', 'car']:
    carDoc1.add(word)
carDoc2 = Bag()
for word in ['the', 'shiny', 'new']:
    carDoc2.add(word)
allCarDocs = [carDoc1, carDoc2]

def idf(word, carDocs):
    # Count over the collection passed in, not the global allCarDocs
    total = len(carDocs)
    wordIsIn = 0
    for doc in carDocs:
        if doc.contains(word):
            wordIsIn = wordIsIn + 1
    # float() avoids integer division on Python 2.7
    return log(float(total) / (1 + wordIsIn))
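An idf function should count documents only in the collection it is given, not in a global such as allCarDocs, so that the same function works for any set of documents. Below is a minimal sketch that keeps the question's smoothed formula log(N / (1 + df)). Here N is the number of documents and df is the number of documents containing the word. collections.Counter stands in for Bag, which is an assumption because Bag's API isn't shown.

```python
from collections import Counter
from math import log


def idf(word, docs):
    """Smoothed inverse document frequency, as in the question: log(N / (1 + df))."""
    total = len(docs)
    # Document frequency: how many documents contain the word at least once.
    containing = sum(1 for doc in docs if doc[word] > 0)
    # float() avoids integer division under Python 2.7.
    return log(float(total) / (1 + containing))


# Counter used as a stand-in for Bag (assumption).
carDoc1 = Counter(['the', 'car'])
carDoc2 = Counter(['the', 'shiny', 'new'])
allCarDocs = [carDoc1, carDoc2]

print(idf('car', allCarDocs))     # → 0.0, since log(2 / 2) = 0
print(idf('the', allCarDocs))     # log(2 / 3), which is negative
print(idf('tunnel', allCarDocs))  # log(2 / 1)
```

With the 1 + df smoothing, a word that appears in every document (such as 'the' here) gets a negative IDF. A word that appears in exactly half of these two documents gets an IDF of 0.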

carDoc1 = Bag()
for word in ['the', 'car']:
    carDoc1.add(word)
carDoc2 = Bag()
for word in ['the', 'shiny', 'new']:
    carDoc2.add(word)
allCarDocs = [carDoc1, carDoc2]

def tf_idf(word, documents):
    return tf(word, carDoc) * idf(word, allCarDocs)
The error I get is that carDoc is not defined.
The TF and IDF functions work fine and behave as I expect, but when it comes to implementing the tf_idf function I keep getting errors. Any help getting tf_idf working in this example would be appreciated.
If your function takes (word, documents), where do you expect it to get carDoc and allCarDocs from?
Can you also post the error?
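The comment points at the cause. tf_idf(word, documents) never uses documents. Instead, it looks up carDoc and allCarDocs as globals. Python looks up a name that is neither a parameter nor a local in the global namespace of the module where the function is defined. In the snippet that defines tf_idf, allCarDocs is created but carDoc is not, which is consistent with the "not defined" error. A minimal sketch of the fix passes both the document and the collection in as parameters. The three-argument signature tf_idf(word, doc, docs) is an assumption, and collections.Counter again stands in for Bag.

```python
from collections import Counter
from math import log


def tf(word, doc):
    """Term frequency of word in one document (a Counter)."""
    size = sum(doc.values())
    if word == "" or size == 0:
        return 0.0
    return float(doc[word]) / size


def tf_idf_idf(word, docs):
    """Smoothed IDF over a collection of documents, same formula as the question."""
    containing = sum(1 for d in docs if d[word] > 0)
    return log(float(len(docs)) / (1 + containing))


def tf_idf(word, doc, docs):
    """Everything the function needs arrives as a parameter, so no globals are looked up."""
    return tf(word, doc) * tf_idf_idf(word, docs)


carDoc = Counter(['the', 'new', 'the', 'shiny', 'new', 'car', 'went', 'through', 'the', 'tunnel'])
allCarDocs = [Counter(['the', 'car']), Counter(['the', 'shiny', 'new'])]

print(tf_idf('tunnel', carDoc, allCarDocs))  # 0.1 * log(2), about 0.0693
```

The question's code uses Bag-based tf and idf. The equivalent change there is to give tf_idf a parameter for the document and have the calling code pass in both carDoc and allCarDocs, rather than relying on names defined in another file.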