
Which trained embedding vectors from Gensim (word2vec model) should be used in TensorFlow: the non-normalized or the normalized ones?


I want to use vectors trained with Gensim (word2vec model) in a neural network (TensorFlow). There are two sets of weights I could use for this. The first set is model.syn0, and the second is model.vectors_norm (after calling model.init_sims(replace=True)). The second set is the one used to compute similarities. Which one has the correct ordering (matching model.wv.index2word and model.wv.vocab[X].index) and the correct weights for the neural network's embedding layer?

If you use Google's GoogleNews vectors as the pretrained model, you can use model.syn0. If you use Facebook's fastText word embeddings, you can load the binary file directly.
Below are examples of loading both.

Loading the Google News pretrained embeddings:

import gensim

model_path = 'googlenews_vectors.kv'  # any local path for the cached copy

# First run: load the word2vec-format binary (slow), then cache it in gensim's native format.
model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True, limit=500000)
model.save(model_path)

# Later runs: reload the cached copy, memory-mapped, which is much faster.
model = gensim.models.KeyedVectors.load(model_path, mmap='r')
model.syn0norm = model.syn0  # avoid recomputing the unit-normalized vectors used for similarity queries
index2word_set = set(model.index2word)

model[word] gives the vector representation of the word, which can be used to find similarity.
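As a quick sanity check (just a sketch; the example words are arbitrary), the loaded KeyedVectors object can be queried directly:

print(model['king'][:5])                   # first few dimensions of the raw vector
print(model.similarity('king', 'queen'))   # cosine similarity between two words
print(model.most_similar('king', topn=3))  # nearest neighbours by cosine similarity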
Loading the fastText pretrained embeddings:

from gensim.models import FastText

# First run: load Facebook's fastText binary (gensim 3.x API), then cache it in gensim's native format.
model = FastText.load_fasttext_format('cc.en.300')
model.save('fasttext_en_bin')  # save to a binary file so later loads are faster

# Later runs: reload the cached copy, memory-mapped.
model = FastText.load('fasttext_en_bin', mmap='r')
index2word_set = set(model.wv.index2word)

model.wv[word] (or simply model[word]) gives the vector representation of the word, which can be used to find similarity.
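Worth noting (a sketch with an illustrative misspelling): because fastText composes vectors from character n-grams, the loaded model can return a vector even for a word that is not in its vocabulary:

print('helllo' in model.wv.vocab)  # False: the misspelling is not in the vocabulary
print(model.wv['helllo'][:5])      # still returns a vector, built from subword n-grams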
General example:

if word in index2word_set:
    feature_vec = model[word]
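
To tie this back to the original question: the rows of model.syn0 are already ordered to match model.index2word (and model.vocab[word].index), so the matrix can be handed to an embedding layer as it is. A minimal tf.keras sketch, assuming model is the KeyedVectors loaded above (the layer settings are only illustrative):

import tensorflow as tf

embedding_matrix = model.syn0  # shape (vocab_size, 300); row i belongs to model.index2word[i]

embedding_layer = tf.keras.layers.Embedding(
    input_dim=embedding_matrix.shape[0],
    output_dim=embedding_matrix.shape[1],
    embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
    trainable=False,  # keep the pretrained vectors frozen
)

# Token ids fed into the layer must use the same word-to-index mapping:
word_to_index = {word: i for i, word in enumerate(model.index2word)}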

Why use model.syn0norm = model.syn0?

@Eghbal That line basically avoids recomputing the normalized vectors, to save time.

Since the similarity operations have to run on unit-normalized vectors, I think the main question still isn't answered here: when using these weights in a neural network, should we use Gensim's normalized weights or just syn0?

You can use syn0.
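
For reference, a small numpy sketch of how the two sets of weights relate (assuming model is a freshly loaded KeyedVectors): vectors_norm / syn0norm is simply syn0 with every row rescaled to unit L2 length, which is what the similarity queries operate on.

import numpy as np

raw = model.syn0                                         # the trained vectors, row-ordered by model.index2word
unit = raw / np.linalg.norm(raw, axis=1, keepdims=True)  # each row rescaled to unit length

# On a freshly loaded model, model.init_sims() fills model.syn0norm with exactly these unit rows.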