Python: implementing word2vec, but I get the error "word 'car_NOUN' not in vocabulary"

python pandas nlp

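I wrote the code below to train word2vec on my data. Now I am testing the embeddings with w2v_model.wv['car_NOUN'], but I get the following error: "word 'car_NOUN' not in vocabulary". I am sure the word car_NOUN is in the vocabulary, so what is the problem? Can anyone help me?

About the code: I used spaCy to restrict the words in the tweets to content words, i.e. nouns, verbs, and adjectives. I convert the words to lower case and append the POS tag with an underscore, e.g. love_VERB. Then I wanted to train word2vec on the new list, but I got this error.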

love_VERB old-fashioned_NOUN

KeyError                                  Traceback (most recent call last)
<ipython-input-145-f6fb9c62175c> in <module>()
----> 1 w2v_model.wv['car_NOUN']

2 frames
/usr/local/lib/python3.6/dist-packages/gensim/models/keyedvectors.py in word_vec(self, word, use_norm)
    450             return result
    451         else:
--> 452             raise KeyError("word '%s' not in vocabulary" % word)
    453 
    454     def get_vector(self, word):

KeyError: "word 'car_NOUN' not in vocabulary"

There is a bug in your convert function: you should pass a list of lists to Word2Vec, i.e. a list of sentences in which each sentence is itself a list of tokens. I have changed that for you. Basically, you want to go from something like this

['prices_NOUN',
  'change_VERB',
  'want_VERB',
  'research_VERB',
  'price_NOUN',
  'many_ADJ',
  'different_ADJ',
  'sites_NOUN',
  'found_VERB',
  'cheaper_ADJ',]

to something like this

[['prices_NOUN',
  'change_VERB',
  'want_VERB',],
  ['research_VERB',
  'price_NOUN',
  'many_ADJ',],
  ['different_ADJ',
  'sites_NOUN',
  'found_VERB',
  'cheaper_ADJ',]]
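
To see why the flat list breaks the vocabulary, here is a minimal sketch in plain Python. It does not call gensim: `build_vocab_counts` is a hypothetical helper that only mimics how a word2vec-style trainer walks its input (an outer iterable of sentences, each an iterable of tokens). Under that assumption, a string passed as a "sentence" is walked character by character.

```python
from collections import Counter


def build_vocab_counts(sentences):
    """Count tokens the way a word2vec-style trainer walks its input:
    the outer loop yields sentences, the inner loop yields tokens."""
    counts = Counter()
    for sentence in sentences:
        for token in sentence:
            counts[token] += 1
    return counts


flat = ['prices_NOUN', 'change_VERB', 'want_VERB']
nested = [['prices_NOUN', 'change_VERB', 'want_VERB']]

flat_vocab = build_vocab_counts(flat)      # each string is treated as a sentence
nested_vocab = build_vocab_counts(nested)  # each inner list is a sentence

print('prices_NOUN' in flat_vocab)    # False
print('p' in flat_vocab)              # True
print('prices_NOUN' in nested_vocab)  # True
```

With the flat list the "words" end up being single characters, which is why a lookup such as 'car_NOUN' fails even though the string is present in the input.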

I also modified some of the training code so that it runs for me; you may want to try it:

! pip install wget

from zipfile import ZipFile

import pandas as pd
import spacy
import wget
from gensim.models import Word2Vec

url = 'https://raw.githubusercontent.com/dirkhovy/NLPclass/master/data/reviews.full.tsv.zip'
wget.download(url, 'reviews.full.tsv.zip')

with ZipFile('reviews.full.tsv.zip', 'r') as zf:
    zf.extractall()

# nrows caps the number of rows read from the file
df = pd.read_csv('reviews.full.tsv', sep='\t', nrows=100000)
documents = df.text.values.tolist()

nlp = spacy.load('en_core_web_sm')  # any spaCy pipeline with a POS tagger works
# keep only content words
included_tags = {"NOUN", "VERB", "ADJ"}

sentences = documents[:103]  # first 103 documents
new_sentences = []  # list of token lists, one inner list per document
for sentence in sentences:
    new_sentence = []
    for token in nlp(sentence):
        if token.pos_ in included_tags:
            new_sentence.append(token.text.lower() + '_' + token.pos_)
    new_sentences.append(new_sentence)


# build the vocabulary and train in one call
# (gensim 3.x argument names; gensim 4.x renamed size -> vector_size and iter -> epochs)
w2v_model = Word2Vec(new_sentences,
                     size=100,
                     window=15,
                     sample=0.0001,
                     iter=200,
                     negative=5,
                     min_count=1,  # <-- it seems your min_count was too high
                     workers=4,  # must be a positive thread count; -1 is not supported
                     hs=0
                     )

w2v_model.wv['car_NOUN']
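
Before training, it can help to check the shape of the input. The following is only a sketch and not part of gensim: `check_sentences` is a hypothetical helper that raises if the corpus is a flat list of strings rather than a list of token lists.

```python
def check_sentences(sentences):
    """Return the corpus unchanged if it is a list of token lists.

    Raises TypeError when a sentence is a plain string, which is the
    shape that makes word2vec learn single characters instead of words.
    """
    for i, sentence in enumerate(sentences):
        if isinstance(sentence, str):
            raise TypeError(
                f"sentence {i} is a string ({sentence!r}); "
                "expected a list of tokens, e.g. ['car_NOUN', 'drive_VERB']"
            )
        for token in sentence:
            if not isinstance(token, str):
                raise TypeError(
                    f"sentence {i} contains a non-string token: {token!r}"
                )
    return sentences


good = [['prices_NOUN', 'change_VERB'], ['car_NOUN']]
check_sentences(good)  # passes silently

try:
    check_sentences(['prices_NOUN', 'change_VERB'])  # flat list of strings
except TypeError as err:
    print(err)
```

One possible use is to wrap the corpus at the call site, for example Word2Vec(check_sentences(new_sentences), ...), so that a wrongly shaped corpus fails early instead of silently producing a character vocabulary.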

What does w2v_model.wv.vocab.keys() show?

It shows dict_keys(['p', 'r', 'i', 'c', 'e', 's', '_', 'N', 'O', 'U', ...])

Aha! Take a look: is this what you want?

I am new to Python and I don't know whether what I wrote is correct. I think x is the problem. This is the output of x: ['prices_NOUN', 'change_VERB', 'want_VERB', 'research_VERB', 'price_NOUN', 'many_ADJ', 'different_ADJ', 'sites_NOUN', 'found_VERB', 'cheaper_ADJ', 'cars_NOUN', 'don_VERB', 't_NOUN', 'lot_NOUN', 'time_NOUN', 'research_VERB', 'price_NOUN', 'site_NOUN', 'top_ADJ', 'e_NOUN']

Thanks. I ran the code you posted above and I still get the same error. The part you changed is this, w2v_model = Word2Vec(new_sentences, right?

That is not all of it. Try copying and pasting the whole thing and running it; it really should work :-)

Yes, it works, thank you very much. What about the training part? Is it like this: w2v_model.build_vocab(new_sentences) followed by w2v_model.train(new_sentences, total_examples=w2v_model.corpus_count, epochs=w2v_model.epochs)?

You don't need that. Passing the sentences to Word2Vec does all of that for you! If you want a good explanation, ...
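
The dict_keys(['p', 'r', 'i', ...]) output in the comments is the telltale sign of a corpus passed as a flat list of strings. A small check along the following lines would flag it after training. This is a sketch: `looks_like_char_vocab` is a hypothetical helper, and it accepts any iterable of vocabulary keys, such as w2v_model.wv.vocab.keys() in gensim 3.x or w2v_model.wv.key_to_index in gensim 4.x.

```python
def looks_like_char_vocab(keys):
    """Return True if the vocabulary is non-empty and every key is a single character."""
    keys = list(keys)
    return bool(keys) and all(len(key) == 1 for key in keys)


# Keys as reported in the comments: single characters, so the corpus shape was wrong.
print(looks_like_char_vocab(['p', 'r', 'i', 'c', 'e', 's', '_', 'N', 'O', 'U']))  # True

# Keys as expected after the fix: whole word_POS tokens.
print(looks_like_char_vocab(['prices_NOUN', 'car_NOUN']))  # False
```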