Python NLP question: how to reduce spaCy training time
I am building a custom NER model with spaCy. On an ordinary laptop (quad-core processor, 8 GB RAM) training takes about 30 minutes, and when I move the code to a high-compute virtual machine with 20 cores and 80 GB RAM it still takes about 30 minutes. Is there anything I need to change to speed up training? My environment and code are below.
spaCy version 2.2.3
Platform Windows-10-10.0.17134-SP0
Python version 3.6.7
Models en
import spacy
import random


def train_spacy(data, iterations):
    TRAIN_DATA = data
    # create a blank English Language class
    nlp = spacy.blank('en')
    nlp.vocab.vectors.name = 'spacy_pretrained_vectors'
    # create the built-in pipeline components and add them to the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last=True)
    else:
        # reuse the existing component so `ner` is always defined
        ner = nlp.get_pipe('ner')
    # add labels
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get('entities'):
            ner.add_label(ent[2])
    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        optimizer = nlp.begin_training()
        for itn in range(iterations):
            print("Starting iteration " + str(itn))
            random.shuffle(TRAIN_DATA)
            losses = {}
            # one update per example, i.e. a batch size of 1
            for text, annotations in TRAIN_DATA:
                nlp.update(
                    [text],  # batch of texts
                    [annotations],  # batch of annotations
                    drop=0.2,  # dropout - make it harder to memorise data
                    sgd=optimizer,  # callable to update weights
                    losses=losses)
            print(losses)
    return nlp
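
The loop above passes a single text to each nlp.update call. One thing that may be worth trying is grouping several examples into each update. The sketch below is not part of the original code: make_batches, split_batch, and the batch size of 2 are hypothetical names and values chosen for illustration, and it is written in plain Python so it runs without spaCy. spaCy 2.x also provides spacy.util.minibatch and spacy.util.compounding for the same purpose. Whether batching shortens training on the 20-core machine is an assumption to measure, not something the post establishes.

```python
import random


def make_batches(examples, batch_size):
    """Yield successive lists of at most `batch_size` examples."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


def split_batch(batch):
    """Turn [(text, annotations), ...] into two parallel lists."""
    texts, annotations = zip(*batch)
    return list(texts), list(annotations)


# Hypothetical sample data in the same (text, annotations) shape as TRAIN_DATA.
examples = [
    ('Adobe upgrade not happening.', {'entities': [(0, 5, 'Item')]}),
    ('VPN - Unable to Change Server', {'entities': [(0, 3, 'Item')]}),
    ('Sharefile - User account disabled', {'entities': [(0, 9, 'Item')]}),
]

random.shuffle(examples)
for batch in make_batches(examples, batch_size=2):
    texts, annotations = split_batch(batch)
    # Inside train_spacy, this is where the batched call would go, e.g.:
    # nlp.update(texts, annotations, drop=0.2, sgd=optimizer, losses=losses)
    print(len(texts), "examples in this update")
```

With this shape, the inner per-example loop in train_spacy would become a loop over batches, and the arguments to nlp.update would be the two lists instead of one-element lists.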
TRAIN_DATA = [
    ('Adobe upgrade not happening.',
     {'entities': [(0, 5, 'Item'), (6, 27, 'itemAspect')]}),
    ('VPN - Unable to Change Server',
     {'entities': [(0, 3, 'Item'), (6, 29, 'itemAspect')]}),
    ('Mailbox not getting updated in outlook.',
     {'entities': [(8, 27, 'itemAspect'), (0, 7, 'Item')]}),
    ('Sharefile - User account disabled',
     {'entities': [(0, 9, 'Item'), (12, 33, 'itemAspect')]}),
]
prdnlp = train_spacy(TRAIN_DATA, 20)
# Save our trained Model
modelfile = input("Enter your Model Name: ")
prdnlp.to_disk(modelfile)
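
Each entity in TRAIN_DATA is a (start, end, label) tuple of character offsets into the text. As a quick sanity check before training, the offsets can be sliced back out of the text to see exactly what each label covers. This is a minimal sketch; entity_spans is a hypothetical helper name, not a spaCy function.

```python
def entity_spans(text, annotations):
    """Return (substring, label) for every (start, end, label) entity."""
    return [(text[start:end], label)
            for start, end, label in annotations.get('entities', [])]


# First example from TRAIN_DATA, copied here so the snippet runs on its own.
sample = ('Adobe upgrade not happening.',
          {'entities': [(0, 5, 'Item'), (6, 27, 'itemAspect')]})

for span, label in entity_spans(*sample):
    print(repr(span), label)
# 'Adobe' Item
# 'upgrade not happening' itemAspect
```

Running the same helper over every entry makes off-by-one offsets easy to spot, since a wrong boundary shows up as a clipped or padded substring.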