I'm trying to train a spaCy NER model. I have about 2,600 paragraphs of data, each between 200 and 800 words long. I need to add two new entity labels, PRODUCT and SPECIFICATION. Is this approach a good one, or is there a better alternative? If it is, can anyone suggest suitable values for the compounding factor and the batch size, and what range the loss should fall into during training? So far the losses I'm getting range from about 400 down to 5.
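For reference, spaCy's NER training data is a list of (text, annotations) tuples with character offsets. Below is a minimal sketch of that format; the example sentences and entity spans are invented for illustration, not taken from the question's data:

```python
# Hypothetical training examples in spaCy's offset format:
# (text, {"entities": [(start_char, end_char, label), ...]})
TRAIN_DATA = [
    ("The X200 drill delivers 750 W of power.",
     {"entities": [(4, 8, "PRODUCT"), (24, 29, "SPECIFICATION")]}),
    ("Order the AquaPump 3 with a 20 L tank.",
     {"entities": [(10, 20, "PRODUCT"), (28, 32, "SPECIFICATION")]}),
]

# Sanity-check that each offset pair actually covers the intended span;
# misaligned offsets are a common cause of bad NER training results.
for text, ann in TRAIN_DATA:
    for start, end, label in ann["entities"]:
        print(label, repr(text[start:end]))
```

A quick loop like the one above is worth running once over the full 2,600-paragraph dataset before training, since spaCy silently learns from whatever spans the offsets describe.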
import random
from pathlib import Path

import plac
import spacy
from spacy.util import minibatch, compounding

# LABEL and ret_data (the training examples) are defined elsewhere in the script.

def main(model=None, new_model_name='product_details_parser',
         output_dir=Path('/xyz_path/'), n_iter=20):
    """Set up the pipeline and entity recognizer, and train the new
    entity."""
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank('en')  # create blank Language class
        print("Created blank 'en' model")
    # Add entity recognizer to model if it's not in the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner)
    # otherwise, get it, so we can add labels to it
    else:
        ner = nlp.get_pipe('ner')
    ner.add_label(LABEL)  # add new entity label to entity recognizer
    if model is None:
        optimizer = nlp.begin_training()
    else:
        # Note that 'begin_training' initializes the models, so it'll zero out
        # existing entity types.
        optimizer = nlp.entity.create_optimizer()
    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        for itn in range(n_iter):
            random.shuffle(ret_data)
            losses = {}
            # batch up the examples using spaCy's minibatch
            batches = minibatch(ret_data, size=compounding(1., 32., 1.001))
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer,
                           drop=0.35, losses=losses)
            print('Losses', losses)

if __name__ == '__main__':
    plac.call(main)

Posted on 2019-12-02 18:56:59
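For intuition about the compounding factor the question asks about, here is a pure-Python sketch approximating what `compounding(1., 32., 1.001)` yields as a batch-size schedule (this is an assumption about `spacy.util.compounding`'s behavior; check the spaCy docs for the exact implementation): each value is the previous one multiplied by the compound factor, capped at the stop value, and `minibatch` truncates it to an integer.

```python
def compounding(start, stop, compound):
    """Yield an infinite series: start, start*compound, ..., capped at stop.

    Pure-Python sketch of spacy.util.compounding for the start < stop case.
    """
    curr = float(start)
    while True:
        yield min(curr, stop)  # never exceed the stop value
        curr *= compound

# With the values from the training loop above, batch sizes grow very
# slowly from 1 toward 32: after n batches the raw size is 1.001 ** n,
# so it takes ~700 batches before the integer batch size reaches 2.
sizes = compounding(1., 32., 1.001)
print([int(next(sizes)) for _ in range(5)])  # → [1, 1, 1, 1, 1]
```

A larger compound factor (e.g. 1.005) or a larger start value reaches the maximum batch size much sooner, which is usually what you want with a few thousand training paragraphs.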
Instead of this style of training, you could start with the simple training method (https://spacy.io/usage/training#training-simple-style). Compared with your approach, this simple method may take some time, but it produces better results.
https://stackoverflow.com/questions/54053415