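I am currently working on an NER model. I have a bunch of data stored in CoNLL format that I need to convert to spaCy format. In CoNLL, every word of a sentence has a tag next to it. In spaCy, annotations are given only for the words that actually carry an entity label. How do I convert from the following (CoNLL) format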
From O
2001 B-DateTime
to I-DateTime
2004 I-DateTime
, O
I O
was O
a O
stagehand O
for O
Hartford B-Company
Stage I-Company
Company O
. O
to the following (spaCy) format
TRAIN_DATA = [('what is the price of polo?', {'entities': [(21, 25, 'PrdName')]}),
('what is the price of ball?', {'entities': [(21, 25, 'PrdName')]}),
('what is the price of jegging?', {'entities': [(21, 28, 'PrdName')]}),
('what is the price of t-shirt?', {'entities': [(21, 28, 'PrdName')]}),
('what is the price of jeans?', {'entities': [(21, 26, 'PrdName')]}),
('what is the price of bat?', {'entities': [(21, 24, 'PrdName')]}),
('what is the price of shirt?', {'entities': [(21, 26, 'PrdName')]}),
('what is the price of bag?', {'entities': [(21, 24, 'PrdName')]}),
('what is the price of cup?', {'entities': [(21, 24, 'PrdName')]}),
('what is the price of jug?', {'entities': [(21, 24, 'PrdName')]}),
('what is the price of plate?', {'entities': [(21, 26, 'PrdName')]}),
('what is the price of glass?', {'entities': [(21, 26, 'PrdName')]}),
('what is the price of watch?', {'entities': [(21, 26, 'PrdName')]})]
Posted on 2021-07-27 05:07:13
Just use spacy convert.
spacy convert input.conll -c conll ./output/
Note that by default this produces a binary .spacy file. The JSON format is deprecated in v3, and there is not much support for it.
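If you do need the tuple format shown in the question rather than a .spacy file, the character offsets can be computed by hand. The following is only a minimal sketch, not part of spaCy: the function names `sentence_to_example` and `conll_to_spacy` are hypothetical, and it assumes BIO tags in the last column, blank lines between sentences, and tokens re-joined with single spaces (CoNLL does not record the original spacing, so the offsets refer to the re-joined text).

```python
from typing import Iterable, List, Tuple

Example = Tuple[str, dict]


def sentence_to_example(pairs: List[Tuple[str, str]]) -> Example:
    """Turn [(token, BIO tag), ...] into (text, {'entities': [(start, end, label)]})."""
    entities = []
    current = None  # open entity as [start, end, label], or None
    pos = 0         # character offset of the next token in the joined text
    for token, tag in pairs:
        start, end = pos, pos + len(token)
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current and current[2] == label:
            # Continuation of the open entity: extend its end offset.
            current[1] = end
        else:
            # Anything else closes the open entity (if there is one).
            if current:
                entities.append(tuple(current))
                current = None
            # "B-" opens a new entity; a stray "I-" is treated the same way.
            if prefix in ("B", "I"):
                current = [start, end, label]
        pos = end + 1  # assumption: exactly one space between tokens
    if current:
        entities.append(tuple(current))
    text = " ".join(token for token, _ in pairs)
    return text, {"entities": entities}


def conll_to_spacy(lines: Iterable[str]) -> List[Example]:
    """Group 'token ... tag' lines into sentences and convert each one."""
    examples, pairs = [], []
    for line in lines:
        parts = line.split()
        if not parts:  # a blank line ends the current sentence
            if pairs:
                examples.append(sentence_to_example(pairs))
                pairs = []
            continue
        pairs.append((parts[0], parts[-1]))  # first column = token, last = tag
    if pairs:
        examples.append(sentence_to_example(pairs))
    return examples


sample = """From O
2001 B-DateTime
to I-DateTime
2004 I-DateTime
, O
I O
was O
a O
stagehand O
for O
Hartford B-Company
Stage I-Company
Company O
. O"""

TRAIN_DATA = conll_to_spacy(sample.splitlines())
print(TRAIN_DATA)
```

For the sample sentence this gives one entry whose entities cover "2001 to 2004" (DateTime) and "Hartford Stage" (Company). Punctuation ends up surrounded by spaces in the re-joined text, which is a side effect of the single-space assumption.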
https://stackoverflow.com/questions/68524723