Python: implementing word2vec, but I get the error "word 'car_NOUN' not in vocabulary"


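I wrote the code below to implement word2vec, and I am now testing the embeddings with w2v_model.wv['car_NOUN']. I get the following error: "word 'car_NOUN' not in vocabulary". I am sure the word car_NOUN is in the vocabulary, so what is the problem? Can anyone help me?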

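About the code: I use spaCy to restrict the words in the tweets to content words, i.e. nouns, verbs and adjectives. I convert the words to lowercase and append the POS tag with an underscore, for example: love_VERB. Then I want to train word2vec on the new list, but I get this error.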

love_VERB old-fashioned_NOUN

KeyError                                  Traceback (most recent call last)
<ipython-input-145-f6fb9c62175c> in <module>()
----> 1 w2v_model.wv['car_NOUN']

2 frames
/usr/local/lib/python3.6/dist-packages/gensim/models/keyedvectors.py in word_vec(self, word, use_norm)
    450             return result
    451         else:
--> 452             raise KeyError("word '%s' not in vocabulary" % word)
    453 
    454     def get_vector(self, word):

KeyError: "word 'car_NOUN' not in vocabulary"

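There is a bug in your convert function: you should pass a list of lists to Word2Vec, i.e. a list in which each sentence is itself a list of tokens. I have changed that for you. Basically, you want to go from something like this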

['prices_NOUN',
  'change_VERB',
  'want_VERB',
  'research_VERB',
  'price_NOUN',
  'many_ADJ',
  'different_ADJ',
  'sites_NOUN',
  'found_VERB',
  'cheaper_ADJ',]

to something like this:

[['prices_NOUN',
  'change_VERB',
  'want_VERB',],
 ['research_VERB',
  'price_NOUN',
  'many_ADJ',],
 ['different_ADJ',
  'sites_NOUN',
  'found_VERB',
  'cheaper_ADJ',]]
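
To see why the shape of the input matters, here is a minimal sketch in plain Python. The `scan_vocab` helper is hypothetical and only imitates the nested iteration a word2vec-style vocabulary scan performs (over sentences, then over the items of each sentence); it is not gensim's actual implementation. With a flat list, every string is treated as a "sentence", so its characters end up as the "words".

```python
from collections import Counter


def scan_vocab(sentences):
    """Count items by iterating over sentences, then over each sentence's items."""
    counts = Counter()
    for sentence in sentences:
        for token in sentence:  # iterating over a str yields single characters
            counts[token] += 1
    return counts


# One flat list of strings: each string is mistaken for a sentence.
flat = ['prices_NOUN', 'change_VERB', 'car_NOUN']
# One list of tokens per sentence: the shape Word2Vec expects.
nested = [['prices_NOUN', 'change_VERB'], ['car_NOUN']]

flat_vocab = scan_vocab(flat)
nested_vocab = scan_vocab(nested)

print('car_NOUN' in flat_vocab)    # False: the keys are single characters
print('car_NOUN' in nested_vocab)  # True: the keys are whole tokens
```

This matches the symptom of a vocabulary made of keys such as 'p', 'r', 'i', 'c' instead of 'prices_NOUN'.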
I also modified some of the code that trains the model so that it works for me; you may want to try it:

! pip install wget

from gensim.models.word2vec import FAST_VERSION
from gensim.models import Word2Vec
import spacy
import pandas as pd
from zipfile import ZipFile
import wget

url = 'https://raw.githubusercontent.com/dirkhovy/NLPclass/master/data/reviews.full.tsv.zip'
wget.download(url, 'reviews.full.tsv.zip')

with ZipFile('reviews.full.tsv.zip', 'r') as zf:
    zf.extractall()

# nrows: maximum number of rows to read
df = pd.read_csv('reviews.full.tsv', sep='\t', nrows=100000)
documents = df.text.values.tolist()

nlp = spacy.load('en_core_web_sm')  # you can use other methods
# POS tags to keep (content words only)
included_tags = {"NOUN", "VERB", "ADJ"}

sentences = documents[:103]  # first 103 documents
new_sentences = []
for sentence in sentences:
    new_sentence = []
    for token in nlp(sentence):
        if token.pos_ in included_tags:
            new_sentence.append(token.text.lower()+'_'+token.pos_)
    new_sentences.append(new_sentence)


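# initialize and train the model
# (size/iter are the gensim 3.x names; gensim 4 renamed them to vector_size/epochs)
w2v_model = Word2Vec(new_sentences,
                     size=100,
                     window=15,
                     sample=0.0001,
                     iter=200,
                     negative=5,
                     min_count=1,  # <-- it seems your min_count was too high
                     workers=4,    # must be a positive thread count; -1 starts no worker threads, so nothing is trained
                     hs=0
                     )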

w2v_model.wv['car_NOUN']
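
The `min_count` remark in the code above can be checked without training anything. The following sketch assumes only the documented meaning of `min_count` (tokens whose total frequency is below it are dropped); `surviving_vocab` and the toy `corpus` are hypothetical names made up for illustration, not part of gensim.

```python
from collections import Counter


def surviving_vocab(sentences, min_count):
    """Return the tokens whose total frequency is at least min_count."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return {token for token, n in counts.items() if n >= min_count}


# Toy corpus: 'car_NOUN' occurs once, 'price_NOUN' occurs twice.
corpus = [
    ['car_NOUN', 'cheap_ADJ'],
    ['price_NOUN', 'change_VERB'],
    ['price_NOUN', 'high_ADJ'],
]

print('car_NOUN' in surviving_vocab(corpus, min_count=1))  # True
print('car_NOUN' in surviving_vocab(corpus, min_count=2))  # False
print(sorted(surviving_vocab(corpus, min_count=2)))        # ['price_NOUN']
```

On a small sample such as the 103 documents used here, many tokens occur only a few times, so a higher threshold can remove the word being looked up. On the trained model, `'car_NOUN' in w2v_model.wv` is a quick way to test membership before indexing.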


What does w2v_model.wv.vocab.keys() show?

It shows dict_keys(['p', 'r', 'i', 'c', 'e', 's', '_', 'N', 'O', 'U', …])

Aha! Look at x - is that what you want?

I am new to Python and I don't know whether what I wrote is correct; I think x is the problem. The output of x is this, and yes, this is what I want: ['prices_NOUN', 'change_VERB', 'want_VERB', 'research_VERB', 'price_NOUN', 'many_ADJ', 'different_ADJ', 'sites_NOUN', 'found_VERB', 'cheaper_ADJ', 'cars_NOUN', 'don_VERB', 't_NOUN', 'lot_NOUN', 'time_NOUN', 'research_VERB', 'price_NOUN', 'site_NOUN', 'top_ADJ', 'e_NOUN']

Thanks, but I get the same error with my code. I ran the code you posted above and got the same error. You changed this: w2v_model = Word2Vec(new_sentences, right?

That's not all - try copying and pasting the whole thing and running it, it really should work :-)

Yes, it works, thank you so much. What about the training part? Is it like this: ``w2v_model.build_vocab(new_sentences) w2v_model.train(new_sentences, total_examples=w2v_model.corpus_count, epochs=w2v_model.epochs)``

You don't need that. Passing the sentences to Word2Vec does all of it for you! If you want a good explanation, …