Python KeyError: word '' not in vocabulary (Word2Vec)
I am working on a Python project that uses Word2Vec to recommend products. The code runs fine on a dataset of size 19401, but whenever I pass a product ID I get the error "KeyError: word '1077' not in vocabulary". I don't know how to fix this, as I know very little about it and am still learning. Please help me solve this problem.
purchases_train = []
for i in tqdm(product_train):
    temp = train_df[train_df["Clothing ID"] == i]["Review Text"].tolist()
    purchases_train.append(temp)
purchases_val = []
for i in tqdm(validation_df['Clothing ID'].unique()):
    temp = validation_df[validation_df["Clothing ID"] == i]["Review Text"].tolist()
    purchases_val.append(temp)
model = Word2Vec(window=10, sg=1, hs=0,
                 negative=10,  # for negative sampling
                 alpha=0.03, min_count=1, min_alpha=0.0007,
                 seed=14)
model.build_vocab(purchases_train, progress_per=200)
model.train(purchases_train, total_examples=model.corpus_count,
            epochs=10, report_delay=1)
# save word2vec model
model.save("word2vec_2.model")
model.init_sims(replace=True)
# extract all vectors
X = model[model.wv.vocab]
products = train_df[["Clothing ID", "Review Text"]]
# remove duplicates
products.drop_duplicates(inplace=True, subset='Clothing ID', keep="last")
# create product-ID and product-description dictionary
products_dict = products.groupby('Clothing ID')['Review Text'].apply(list).to_dict()
def similar_products(v, n=6):
    # extract most similar products for the input vector
    ms = model.similar_by_vector(v, topn=n+1)[1:]
    # extract name and similarity score of the similar products
    new_ms = []
    for j in ms:
        pair = (products_dict[j[0]][0], j[1])
        new_ms.append(pair)
    return new_ms
similar_products(model['1077'])
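If you get the error
word '847' not in vocabulary
then you can be sure: the token '847' was not present in the training data.
If you think it was there, review your data to confirm that it really is not.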
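One way to run that check is to list the tokens the corpus actually contains. The sketch below is only an illustration under stated assumptions, not the asker's code: `purchases_sample` and `id_sample` are made-up data, and `corpus_tokens` is a hypothetical helper. Word2Vec treats every item of an inner list as one token, so if the corpus is shaped like `purchases_train` in the question (one inner list per product, each item a full review string), the tokens appear to be whole review texts, and a product ID such as '1077' never becomes a token.

```python
from collections import Counter


def corpus_tokens(corpus):
    """Count tokens the way Word2Vec sees them: each item of each inner list."""
    return Counter(token for sentence in corpus for token in sentence)


# Made-up sample shaped like purchases_train in the question:
# one inner list per product, each item a complete review string.
purchases_sample = [
    ["Love this dress", "Runs small"],
    ["Great fit"],
]

tokens = corpus_tokens(purchases_sample)
print("1077" in tokens)             # False: product IDs are not tokens here
print("Love this dress" in tokens)  # True: whole reviews are the tokens

# Made-up alternative (an assumption about the intent): sequences of
# string product IDs, so that the IDs themselves become tokens.
id_sample = [["1077", "847", "1080"], ["847", "1077"]]
id_tokens = corpus_tokens(id_sample)
print("1077" in id_tokens)          # True
```

Running the same kind of count over the real training corpus shows directly whether the ID you are looking up was ever seen as a token.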
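If your code needs to be able to do something useful with words that were not in the training data, you should extend it in one of the following ways:
(1) Check whether the word is present before trying to fetch its vector:
if '847' in model:
    similar_products(model['847'])
else:
    # do something else
    ...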
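...or
(2) Catch the KeyError and do something else when it is raised.
Please post the full traceback of the error, along with the sample data you are working with.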
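Option (2) could look roughly like the sketch below. It is a minimal illustration under assumptions: `vector_or_default` is a hypothetical helper, and `toy_vectors` is a plain dict with made-up values standing in for the trained model's key-to-vector lookup, which likewise raises KeyError for a key that was never seen in training.

```python
def vector_or_default(vectors, key, default=None):
    """Return vectors[key], or `default` if the key is not in the vocabulary."""
    try:
        return vectors[key]
    except KeyError:
        return default


# Stand-in for a trained model's lookup; the values are made up.
toy_vectors = {"1077": [0.1, 0.2], "1080": [0.3, 0.4]}

for product_id in ("1077", "847"):
    vec = vector_or_default(toy_vectors, product_id)
    if vec is None:
        # Fallback path: the ID was never seen in training.
        print(f"{product_id}: not in vocabulary, skipping recommendations")
    else:
        print(f"{product_id}: vector found")
```

Returning a sentinel such as `None` keeps the fallback decision with the caller, which can then skip the product, show a default list, or log the missing ID.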