Python: how to save word vectors in spaCy


python python-3.x nlp


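I have the following code. The goal is to get a vector representation of each word in a list. I intend to use these word vectors for other applications, such as word clustering.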

import numpy as np
import pandas as pd
from sklearn.preprocessing import normalize
import en_vectors_web_lg
nlp = en_vectors_web_lg.load() 

def vectorize(text):
    return nlp(text, disable=['parser', 'tagger', 'ner']).vector

category=['Dell','Python','Asus','Apple','C','perl','Java','iphone','nokia','LG','Lenovo']
for ntext in category:
    doc = nlp(ntext)

    vectors = normalize(np.stack(vectorize(t) for t in doc.text))

I realize I am doing something wrong in the code above. How can I save the word vector for each word in the list "category"?

I haven't seen much documentation on using the en_vectors_web_lg model, but I do know that en_core_web_lg ships with vectors along with other features.

Here is how to vectorize each word/term in the list:

import spacy

nlp = spacy.load('en_core_web_lg')

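category=['Dell','Python','Asus','Apple','C','perl','Java','iphone','nokia','LG','Lenovo']
# nlp.pipe processes the texts in batches; the parser, tagger and NER are not needed for vectors
docs = list(nlp.pipe(category, disable=['parser', 'tagger', 'ner']))
# Doc.vector is the average of its token vectors (for a one-word Doc, that word's vector)
vectors = [d.vector for d in docs]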
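To actually keep the vectors around for later use such as clustering, one option is to pair each word with its vector and write everything to disk with NumPy. This is a minimal sketch, not part of the original answer: random 300-dimensional float32 arrays stand in for the spaCy vectors so it runs without downloading en_core_web_lg, and the file name category_vectors.npz is hypothetical.

```python
import os
import tempfile

import numpy as np

# Assumption: random arrays stand in for [d.vector for d in docs] from spaCy.
category = ['Dell', 'Python', 'Asus', 'Apple', 'C', 'perl', 'Java',
            'iphone', 'nokia', 'LG', 'Lenovo']
rng = np.random.default_rng(0)
vectors = [rng.standard_normal(300).astype(np.float32) for _ in category]

# Keep each word paired with its vector.
word_vectors = dict(zip(category, vectors))

# Stack into one (n_words, 300) matrix, e.g. as input to a clustering step.
matrix = np.stack([word_vectors[w] for w in category])

# Save words and matrix together in one .npz archive (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "category_vectors.npz")
np.savez(path, words=np.array(category), vectors=matrix)

# Load them back and rebuild the word -> vector mapping.
with np.load(path) as data:
    loaded = dict(zip(data["words"].tolist(), data["vectors"]))

print(matrix.shape)  # (11, 300)
```

Storing the words as a plain string array alongside the matrix keeps the archive loadable with NumPy's default allow_pickle=False.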

Each resulting vector is 300-dimensional.

You may also be interested in vector_norm: the L2 norm of the token's vector (the square root of the sum of the squared values).

The vector norm of "dell" is 8.001050178690836.
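The L2 norm is straightforward to reproduce by hand. In this minimal NumPy sketch, a small hand-picked array (hypothetical, not a real spaCy vector) stands in for a token vector so the arithmetic is easy to check. With a loaded model, vec would be something like nlp("dell")[0].vector, and the result should agree with vector_norm.

```python
import numpy as np

# Hypothetical stand-in for a spaCy token vector such as nlp("dell")[0].vector.
vec = np.array([3.0, 4.0, 12.0], dtype=np.float32)

# L2 norm as described: square root of the sum of the squared values.
manual_norm = float(np.sqrt(np.sum(vec ** 2)))

# NumPy's built-in equivalent.
builtin_norm = float(np.linalg.norm(vec))

# Dividing by the norm gives a unit-length vector, which is what
# sklearn's normalize() does row by row with its default norm='l2'.
unit = vec / manual_norm

print(manual_norm, builtin_norm)  # 13.0 13.0
```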

spaCy also has a built-in cosine-similarity method, .similarity(), for comparing vectors.
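Cosine similarity is the dot product of two vectors divided by the product of their L2 norms. The NumPy sketch below writes that formula out so the arithmetic is visible. It is an illustration, not spaCy's own code. Toy 2-dimensional vectors replace real 300-dimensional word vectors, and returning 0.0 for an all-zero vector is a convention chosen here. For clustering, the second half computes every pairwise similarity at once: it L2-normalizes the rows and takes a single matrix product.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product divided by the product of the two L2 norms."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0  # convention used here for an all-zero vector
    return float(np.dot(a, b) / denom)


a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])
print(round(cosine_similarity(a, b), 4))  # 0.7071

# Pairwise similarities for an (n_words, dim) matrix of toy vectors:
# normalize each row to unit length, then one matrix product.
words = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
unit_rows = words / np.linalg.norm(words, axis=1, keepdims=True)
sim_matrix = unit_rows @ unit_rows.T
```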

For reference, a single entry of vectors (one 300-dimensional word vector) prints like this (truncated):

[-0.94557    0.46092    0.43141   -0.52199    0.55764    0.18107
  0.45607    0.031909   0.097713   0.061064   0.061381  -0.37256
 -0.21712   -0.065784  -0.4061    -0.11485   -0.48388    1.5697
  ...
  0.03717   -0.6773    -0.19379    0.31747   -0.19495    0.37144  ]