What is the 'null_word' parameter in Python gensim Word2Vec?


The Word2Vec object in gensim has a null_word parameter that is not explained in the docs:

    class gensim.models.word2vec.Word2Vec(sentences=None, size=100, alpha=0.025, window=5, min_count=5, max_vocab_size=None, sample=0.001, seed=1, workers=3, min_alpha=0.0001, sg=0, hs=0, negative=5, cbow_mean=1, hashfxn=, iter=5, null_word=0, trim_rule=None, sorted_vocab=1, batch_words=10000)

What is the null_word parameter?

Checking the source code, I found this:

    if self.null_word:
        # create null pseudo-word for padding when using concatenative L1 (run-of-words)
        # this word is only ever input – never predicted – so count, huffman-point, etc doesn't matter
        word, v = '\0', Vocab(count=1, sample_int=0)
        v.index = len(self.wv.vocab)
        self.wv.index2word.append(word)
        self.wv.vocab[word] = v

What is "concatenative L1"?

The null_word is only used when running PV-DM in concatenation mode, that is, with the parameters dm=1, dm_concat=1 at model initialization.

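In this non-default mode, the doctag vector and the vectors of the neighboring words within window positions of the target word are concatenated into one very wide input layer, rather than being averaged as is more typical.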
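
To get a feel for how wide that input layer becomes, here is a rough sketch of the arithmetic. The helper input_layer_width is hypothetical (not part of gensim), and the formula assumes one slot per doctag plus one slot per context position on each side of the target word, which is how I understand concatenation mode to size its input.

```python
def input_layer_width(vector_size, window, dm_tag_count=1, concat=False):
    """Width of the PV-DM input layer for a single target word.

    Illustrative helper only; the name and signature are assumptions,
    not a gensim API.
    """
    if concat:
        # One slot per doctag, plus `window` slots on each side of the target
        return (dm_tag_count + 2 * window) * vector_size
    # Averaging collapses all input vectors into one vector
    return vector_size


print(input_layer_width(100, 5))               # 100
print(input_layer_width(100, 5, concat=True))  # 1100
```

With 100-dimensional vectors and a window of 5, the concatenated input is 11 times wider than the averaged one, which is where the extra memory and training time come from.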
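
Models trained in this mode are much larger and much slower than in the other modes. For target words near the beginning or end of a text example, there may not be enough neighboring words to fill this input layer, but the model still needs a value for each of those slots. So the null_word is essentially used as padding.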
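
The padding idea can be sketched in plain Python. This is only an illustration under my own assumptions: padded_context is a made-up helper, and gensim's real training code differs in detail. The placeholder string is the same '\0' used in the snippet quoted in the question.

```python
NULL_WORD = '\0'  # same placeholder string as in the quoted gensim snippet


def padded_context(tokens, target_pos, window):
    """Return exactly 2 * window context slots around tokens[target_pos].

    Hypothetical helper for illustration; positions that fall outside
    the text are filled with NULL_WORD.
    """
    slots = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # the target word itself is predicted, not used as input
        pos = target_pos + offset
        if 0 <= pos < len(tokens):
            slots.append(tokens[pos])
        else:
            slots.append(NULL_WORD)  # no real neighbor here, so pad
    return slots


tokens = ["the", "quick", "brown", "fox"]
print(padded_context(tokens, 0, 2))  # ['\x00', '\x00', 'quick', 'brown']
print(padded_context(tokens, 2, 2))  # ['the', 'quick', 'fox', '\x00']
```

Every target word ends up with the same number of input slots, so the concatenated layer always has the same width regardless of where the word sits in the text.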
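
While the original Paragraph Vectors paper mentions using this mode in some of its experiments, this mode alone is not sufficient to reproduce the paper's results. (As far as I know, no one has been able to reproduce those results, and other comments from one of the authors imply that the original paper has some error or omission in its process.)

Additionally, I have not found this mode to offer a clear benefit that justifies the added time and memory. (It might take very large datasets or very long training times to show any benefit.)

So you should not be too concerned about this model property unless you are doing advanced experiments with this less common mode, in which case you can review the source code for all the details of how it is used as padding.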