Python: training a custom model

I have been training my NER model on some text, trying to find the cities in it and tag them with custom entities.

For example:

    ('paragraph Designated Offices Party A New York Party B Delaware paragraph pricing source calculation Market Value shall generally accepted pricing source reasonably agreed parties paragraph Spot rate Spot Rate specified paragraph reasonably agreed parties',
     {'entities': [(37, 41, 'DesignatedBankLoc'), (54, 62, 'CounterpartyBankLoc')]})
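Character offsets in this annotation format are easy to get wrong, so it can help to print the text each span actually covers before training. The following is a minimal sketch (the helper name `show_spans` is made up for illustration, and the text is shortened to the part that holds the annotated spans). In the example as posted, `(37, 41)` appears to cover `'New '` rather than `'New York'`, which would be `(37, 45)`.

```python
# Training example as posted (text shortened; offsets are unaffected
# because only the tail of the string was dropped).
text = "paragraph Designated Offices Party A New York Party B Delaware paragraph pricing source"
entities = [(37, 41, "DesignatedBankLoc"), (54, 62, "CounterpartyBankLoc")]


def show_spans(text, entities):
    """Return (label, covered_text) for each (start, end, label) annotation."""
    return [(label, text[start:end]) for start, end, label in entities]


for label, covered in show_spans(text, entities):
    # repr() makes stray leading/trailing whitespace visible
    print(label, repr(covered))
# DesignatedBankLoc 'New '
# CounterpartyBankLoc 'Delaware'
```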

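I am looking for two entities here, DesignatedBankLoc and CounterpartyBankLoc. A single text can also contain multiple entities.

Currently, I am training on 60 rows of data as follows: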

import random

import spacy


def train_spacy(data, iterations):
    # Note: this uses the spaCy 2.x training API (create_pipe, begin_training)
    TRAIN_DATA = data
    nlp = spacy.blank('en')  # create blank Language class
    # create the built-in pipeline components and add them to the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last=True)
    else:
        ner = nlp.get_pipe('ner')  # reuse the existing component


    # add labels
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get('entities'):
            # ent is a (start_char, end_char, label) tuple
            ner.add_label(ent[2])

    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        optimizer = nlp.begin_training()
        for itn in range(iterations):
            print("Starting iteration " + str(itn))
            random.shuffle(TRAIN_DATA)
            losses = {}
            for text, annotations in TRAIN_DATA:
                nlp.update(
                    [text],  # batch of texts
                    [annotations],  # batch of annotations
                    drop=0.5,  # dropout - make it harder to memorise data
                    sgd=optimizer,  # callable to update weights
                    losses=losses)
            print(losses)
    return nlp


prdnlp = train_spacy(TRAIN_DATA, 100)

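My problem is:

When the input contains a city that was seen during training, the model predicts correctly, whether the text pattern is the same or different. But when the input contains a city that never appeared in the training dataset, the model predicts no entity at all, again regardless of whether the text pattern is the same or different.

Please tell me why this happens, and help me understand conceptually how the model is being trained.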

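From experience: you have 60 rows of data and train for 100 iterations. You are overfitting on the values of the entities rather than on their position.

To check this, try inserting the city names at random places in a sentence and see what happens. If the algorithm tags them, you are probably overfitting.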
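The check described above can be sketched roughly as follows. This is an illustration only: `insert_at_random_position`, `probe` and `memorising_predict` are hypothetical names, and `memorising_predict` is a stand-in for a trained model that has simply memorised its training cities.

```python
import random


def insert_at_random_position(sentence, city, rng):
    """Insert `city` between whitespace-separated tokens of `sentence`.

    Returns the new text and the (start, end) character span of the city.
    """
    tokens = sentence.split()
    position = rng.randint(0, len(tokens))  # 0 = before first token, len = after last
    prefix = " ".join(tokens[:position])
    start = len(prefix) + (1 if prefix else 0)  # +1 for the separating space
    new_tokens = tokens[:position] + [city] + tokens[position:]
    return " ".join(new_tokens), (start, start + len(city))


def probe(predict, sentence, cities, seed=0):
    """For each city, insert it at a random position and record whether
    `predict` returns an entity that exactly covers the inserted span."""
    rng = random.Random(seed)
    report = {}
    for city in cities:
        text, span = insert_at_random_position(sentence, city, rng)
        found = [(start, end) for start, end, _label in predict(text)]
        report[city] = span in found
    return report


# Hypothetical stand-in for a model that only "recognises" memorised cities.
def memorising_predict(text, known=("New York", "Delaware")):
    hits = []
    for city in known:
        idx = text.find(city)
        if idx != -1:
            hits.append((idx, idx + len(city), "DesignatedBankLoc"))
    return hits


sentence = "pricing source reasonably agreed parties"
print(probe(memorising_predict, sentence, ["New York", "Chicago"]))
# {'New York': True, 'Chicago': False}
```

With the model from the question, `predict` could be something like `lambda text: [(e.start_char, e.end_char, e.label_) for e in prdnlp(text).ents]`. If trained cities are tagged wherever they land while unseen cities never are, that points to the model keying on the values.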

There are two solutions:

Create more training data with more varied values for these entities.
Test different numbers of iterations.
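For the first point, real and varied examples are preferable. As a stopgap, one possible approach (an assumption on my part, not something stated above) is to generate extra examples by swapping the annotated values for other city names and recomputing the offsets. A minimal sketch, where `substitute_entities` and the city list are made up for illustration and spans are assumed not to overlap:

```python
import random


def substitute_entities(text, entities, replacements, rng):
    """Return (new_text, annotations) where each annotated span is replaced by
    a random value from `replacements[label]` and offsets are recomputed.

    Assumes the (start, end, label) spans do not overlap.
    """
    pieces = []
    new_entities = []
    cursor = 0  # position in the original text
    length = 0  # length of the text built so far
    for start, end, label in sorted(entities):
        unchanged = text[cursor:start]
        value = rng.choice(replacements[label])
        pieces.append(unchanged)
        length += len(unchanged)
        new_entities.append((length, length + len(value), label))
        pieces.append(value)
        length += len(value)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), {"entities": new_entities}


# Shortened example text; (37, 45) is used here so the span covers "New York".
text = "paragraph Designated Offices Party A New York Party B Delaware paragraph pricing source"
entities = [(37, 45, "DesignatedBankLoc"), (54, 62, "CounterpartyBankLoc")]
cities = ["London", "Chicago", "Singapore", "Frankfurt"]  # illustrative values
replacements = {"DesignatedBankLoc": cities, "CounterpartyBankLoc": cities}

rng = random.Random(0)
new_text, annotations = substitute_entities(text, entities, replacements, rng)
for start, end, label in annotations["entities"]:
    print(label, repr(new_text[start:end]))
```

The surrounding words stay identical in every generated example, so this only adds variety to the entity values, not to the contexts.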

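Thanks for your reply. I would like to know how dropout and the number of iterations affect the model, and how to check for overfitting. I tried training models with the same number of iterations but different dropout values, and I get loss values in both cases. How do I compare the two and see which one works better?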
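The loss printed by the training loop is measured on the training data, so on its own it cannot show overfitting. One common way to compare two runs is to hold out some annotated examples that are never trained on and score each model's predictions against them. A minimal sketch of exact-match span scoring follows; `span_prf` and the two stub predictors are hypothetical, and the dev example is invented for illustration.

```python
def span_prf(gold_examples, predict):
    """Exact-match precision, recall and F1 over (start, end, label) spans.

    gold_examples: list of (text, {"entities": [(start, end, label), ...]})
    predict: callable mapping text -> list of (start, end, label)
    """
    tp = fp = fn = 0
    for text, annotations in gold_examples:
        gold = {tuple(ent) for ent in annotations["entities"]}
        pred = {tuple(ent) for ent in predict(text)}
        tp += len(gold & pred)   # predicted and correct
        fp += len(pred - gold)   # predicted but not in the gold data
        fn += len(gold - pred)   # in the gold data but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented held-out example; its cities do not occur in the training data.
dev = [
    ("Party A London Party B Chicago",
     {"entities": [(8, 14, "DesignatedBankLoc"), (23, 30, "CounterpartyBankLoc")]}),
]


# Stub predictors standing in for two trained models.
def model_a(text):
    return [(8, 14, "DesignatedBankLoc"), (23, 30, "CounterpartyBankLoc")]


def model_b(text):
    return [(8, 14, "DesignatedBankLoc")]


print(span_prf(dev, model_a))  # (1.0, 1.0, 1.0)
print(span_prf(dev, model_b)[:2])  # (1.0, 0.5)
```

Training loss that keeps falling while the held-out score stalls or drops is the usual sign of overfitting, and the same score can be used to compare dropout values or iteration counts.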