Python NLP question: how to reduce spaCy training time

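I am building a custom NER model with spaCy. On an ordinary laptop (quad-core processor, 8 GB RAM), training takes about 30 minutes. When I moved the code to a high-compute virtual machine with 20 cores and 80 GB RAM, it still takes 30 minutes. Is there anything I need to change to speed up training? My environment and code are below.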

spaCy version      2.2.3
Platform           Windows-10-10.0.17134-SP0
Python version     3.6.7
Models             en

import spacy
import random

def train_spacy(data,iterations):
    TRAIN_DATA = data
    # create a blank English Language class
    nlp = spacy.blank('en')
    nlp.vocab.vectors.name = 'spacy_pretrained_vectors'

    # create the built-in pipeline components and add them to the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last=True)
    else:
        # otherwise reuse the existing component so labels can be added to it
        ner = nlp.get_pipe('ner')


    # add labels
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get('entities'):
            ner.add_label(ent[2])

    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        optimizer = nlp.begin_training()
        for itn in range(iterations):
            print("Starting iteration " + str(itn))
            random.shuffle(TRAIN_DATA)
            losses = {}
            for text, annotations in TRAIN_DATA:
                nlp.update(
                    [text],  # batch of texts
                    [annotations],  # batch of annotations
                    drop=0.2,  # dropout - make it harder to memorise data
                    sgd=optimizer,  # callable to update weights
                    losses=losses)
            print(losses)
    return nlp

TRAIN_DATA = [('Adobe upgrade not happening.',
  {'entities': [(0, 5, 'Item'), (6, 27, 'itemAspect')]}),
 ('VPN - Unable to Change Server',
  {'entities': [(0, 3, 'Item'), (6, 29, 'itemAspect')]}),
 ('Mailbox not getting updated in outlook.',
  {'entities': [(8, 27, 'itemAspect'), (0, 7, 'Item')]}),
 ('Sharefile - User account disabled',
  {'entities': [(0, 9, 'Item'), (12, 33, 'itemAspect')]})]

prdnlp = train_spacy(TRAIN_DATA, 20)

# Save our trained Model
modelfile = input("Enter your Model Name: ")
prdnlp.to_disk(modelfile)
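One thing worth checking in the code above: the inner loop calls `nlp.update` once per `(text, annotations)` pair, so every epoch performs one sequential update per example. A plausible (unmeasured) reason the 20-core VM is no faster is that this per-call overhead is not something extra cores or RAM can absorb. spaCy v2's own NER training example groups examples into minibatches instead. The sketch below is pure Python so it runs without spaCy; `compounding`, `minibatch` and `train_in_batches` are hypothetical stand-ins, and `update` is a placeholder for `nlp.update`. It only shows how batching cuts the number of update calls.

```python
import itertools
import random
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def compounding(start: float, stop: float, compound: float) -> Iterator[float]:
    """Yield an endless series start, start*compound, ... capped at `stop`.

    Hypothetical stand-in for the schedule idea behind spacy.util.compounding.
    """
    curr = float(start)
    while True:
        yield min(curr, stop)
        curr *= compound


def minibatch(items: Sequence[T], sizes: Iterator[float]) -> Iterator[List[T]]:
    """Split `items` into consecutive batches whose sizes are drawn from `sizes`."""
    it = iter(items)
    while True:
        size = max(1, int(next(sizes)))  # never request an empty batch
        batch = list(itertools.islice(it, size))
        if not batch:  # input exhausted
            return
        yield batch


def train_in_batches(
    data: List[Tuple[str, dict]],
    iterations: int,
    update: Callable[[Sequence[str], Sequence[dict]], None],
    seed: int = 0,
) -> int:
    """Shuffle every epoch, call `update` once per minibatch, return the call count.

    `update` is a hypothetical placeholder for nlp.update(texts, annotations, ...).
    """
    data = list(data)  # shuffle a copy, not the caller's list
    rng = random.Random(seed)
    calls = 0
    for _ in range(iterations):
        rng.shuffle(data)
        # restart the batch-size schedule each epoch: 4, 4.004, ... capped at 32
        for batch in minibatch(data, compounding(4.0, 32.0, 1.001)):
            texts, annotations = zip(*batch)
            update(texts, annotations)
            calls += 1
    return calls


if __name__ == "__main__":
    examples = [("text %d" % i, {"entities": []}) for i in range(100)]
    n_calls = train_in_batches(examples, 20, lambda texts, annots: None)
    # the original per-example loop would make 100 x 20 = 2000 calls
    print("update calls:", n_calls)  # → update calls: 500
```

With 100 examples and 20 iterations the batched loop makes 500 update calls instead of 2000. In spaCy 2.2 the same pattern is usually written with the library's own `spacy.util.minibatch` and `spacy.util.compounding`, unpacking each batch with `zip(*batch)` and calling `nlp.update(texts, annotations, drop=0.2, sgd=optimizer, losses=losses)` once per batch. Fewer, larger updates also change the optimization, so accuracy should be rechecked; whether this brings the 30 minutes down for this particular dataset is an assumption, not a measured result.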