Python: implementing word2vec, but I get the error that the word 'car_NOUN' is not in the vocabulary
I wrote code to implement word2vec, and now I am testing the embeddings with w2v_model.wv['car_NOUN'], but I get the following error: "word 'car_NOUN' not in vocabulary". I am sure the word car_NOUN is in the vocabulary, so what is wrong? Can anyone help me?
About the code: I use spaCy to restrict the words in the tweets to content words, i.e. nouns, verbs, and adjectives. I convert the words to lowercase and append the POS tag with an underscore. For example: love_VERB, old-fashioned_NOUN. Then I wanted to run word2vec on the new list, but I got this error:
KeyError Traceback (most recent call last)
<ipython-input-145-f6fb9c62175c> in <module>()
----> 1 w2v_model.wv['car_NOUN']
2 frames
/usr/local/lib/python3.6/dist-packages/gensim/models/keyedvectors.py in word_vec(self, word, use_norm)
450 return result
451 else:
--> 452 raise KeyError("word '%s' not in vocabulary" % word)
453
454 def get_vector(self, word):
KeyError: "word 'car_NOUN' not in vocabulary"
There is a bug in your convert function: you should pass a list of lists to Word2Vec, that is, a list whose items are sentences, where each sentence is itself a list of tokens. I have changed that for you. Basically, you want to go from something like this
['prices_NOUN',
 'change_VERB',
 'want_VERB',
 'research_VERB',
 'price_NOUN',
 'many_ADJ',
 'different_ADJ',
 'sites_NOUN',
 'found_VERB',
 'cheaper_ADJ']
to something like this
[['prices_NOUN',
  'change_VERB',
  'want_VERB'],
 ['research_VERB',
  'price_NOUN',
  'many_ADJ'],
 ['different_ADJ',
  'sites_NOUN',
  'found_VERB',
  'cheaper_ADJ']]
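To see why the shape matters: the corpus is scanned item by item, each item is taken as one sentence, and each element of that sentence is taken as one token. A Python string is itself iterable, so a flat list of strings contributes single characters instead of whole words. The sketch below only imitates that scan with a Counter; scan_vocab is a hypothetical stand-in for illustration, not gensim's actual code.

```python
from collections import Counter


def scan_vocab(corpus):
    """Count tokens the way a sentence-by-sentence scan would (illustrative stand-in)."""
    counts = Counter()
    for sentence in corpus:      # each item is treated as one sentence
        for token in sentence:   # each element of a sentence is one token
            counts[token] += 1
    return counts


flat = ['prices_NOUN', 'change_VERB', 'want_VERB']
nested = [['prices_NOUN', 'change_VERB', 'want_VERB']]

flat_vocab = scan_vocab(flat)      # strings are iterated character by character
nested_vocab = scan_vocab(nested)  # tokens stay whole

print('prices_NOUN' in flat_vocab)    # False
print('p' in flat_vocab)              # True
print('prices_NOUN' in nested_vocab)  # True
```

With the flat list, the "vocabulary" consists of letters and the underscore, which would explain a lookup of 'car_NOUN' failing even though the word appears in the data.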
I also modified some of the code that trains the model so that it works for me; you may want to give it a try
!pip install wget

from gensim.models import Word2Vec
import spacy
import pandas as pd
from zipfile import ZipFile
import wget

url = 'https://raw.githubusercontent.com/dirkhovy/NLPclass/master/data/reviews.full.tsv.zip'
wget.download(url, 'reviews.full.tsv.zip')
with ZipFile('reviews.full.tsv.zip', 'r') as zf:
    zf.extractall()

# nrows: maximum number of rows to read
df = pd.read_csv('reviews.full.tsv', sep='\t', nrows=100000)
documents = df.text.values.tolist()

nlp = spacy.load('en_core_web_sm')  # you can use other methods

# POS tags to keep (content words)
included_tags = {"NOUN", "VERB", "ADJ"}

sentences = documents[:103]  # first 103 documents

# build a list of lists: one list of 'word_POS' tokens per document
new_sentences = []
for sentence in sentences:
    new_sentence = []
    for token in nlp(sentence):
        if token.pos_ in included_tags:
            new_sentence.append(token.text.lower() + '_' + token.pos_)
    new_sentences.append(new_sentence)

# initialize and train the model
# (argument names are those of gensim 3.x; gensim 4 renamed size -> vector_size and iter -> epochs)
w2v_model = Word2Vec(new_sentences,
                     size=100,
                     window=15,
                     sample=0.0001,
                     iter=200,
                     negative=5,
                     min_count=1,  # <-- it seems your min_count was too high
                     workers=4,    # must be a positive thread count; -1 starts no training threads
                     hs=0
                     )

w2v_model.wv['car_NOUN']
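A small guard can catch the flat-list mistake before the corpus reaches Word2Vec. The helper below is a minimal sketch; as_sentences is a hypothetical name, not part of gensim or of the original code, and it assumes that a flat list of tokens should be treated as one single sentence.

```python
def as_sentences(corpus):
    """Return the corpus as a list of token lists.

    A flat list of string tokens is wrapped as a single sentence
    (an assumption made for this sketch); a list of token lists is
    returned as-is, copied.
    """
    corpus = list(corpus)
    if not corpus:
        return []
    if all(isinstance(item, str) for item in corpus):
        # Flat list of tokens: treat it as one sentence.
        return [corpus]
    if any(isinstance(item, str) for item in corpus):
        raise TypeError("corpus mixes bare strings and token lists")
    return [list(sentence) for sentence in corpus]


flat = ['prices_NOUN', 'change_VERB', 'want_VERB']
nested = [['prices_NOUN', 'change_VERB'], ['want_VERB']]

print(as_sentences(flat))
print(as_sentences(nested))
```

The result of as_sentences(...) could then be passed where new_sentences is used above. Splitting a flat token list back into its original sentences is not possible after the fact, so building the nested list during preprocessing, as in the loop above, remains the better fix.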
What does ``w2v_model.wv.vocab.keys()`` show?
It shows dict_keys(['p', 'r', 'i', 'c', 'e', 's', '_', 'N', 'O', 'U', …])
Aha! Look at x - is that what you want?
I am new to Python and I don't know whether I wrote it correctly; I think x is the problem. The output of x is this, and yes, this is what I want: ['prices_NOUN', 'change_VERB', 'want_VERB', 'research_VERB', 'price_NOUN', 'many_ADJ', 'different_ADJ', 'sites_NOUN', 'found_VERB', 'cheaper_ADJ', 'cars_NOUN', 'don_VERB', 't_NOUN', 'lot_NOUN', 'time_NOUN', 'research_VERB', 'price_NOUN', 'site_NOUN', 'top_ADJ', 'e_NOUN']
Thanks, but I get the same error with this code. I ran the code you posted above and got the same error. Is this what you changed: ``w2v_model = Word2Vec(new_sentences,``, right?
That's not all - try copy-pasting the whole thing and running it, it really should work :-)
Yes, it works, thank you very much. What about the training part? Is it like this: ``w2v_model.build_vocab(new_sentences)`` ``w2v_model.train(new_sentences, total_examples=w2v_model.corpus_count, epochs=w2v_model.epochs)``
You don't need that. Passing the sentences to Word2Vec does all of that for you! If you want a good explanation,