This is my first time trying spaCy. I have spaCy training data in the following form:
[
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"Michael",
"tag":"-",
"ner":"U-PER"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"Irwin",
"tag":"-",
"ner":"U-PER"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"Jordan",
"tag":"-",
"ner":"U-PER"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"is",
"tag":"-",
"ner":"O"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"an",
"tag":"-",
"ner":"O"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"American",
"tag":"-",
"ner":"O"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"scientist",
"tag":"-",
"ner":"O"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"Professor",
"tag":"-",
"ner":"O"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"at",
"tag":"-",
"ner":"O"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"the",
"tag":"-",
"ner":"O"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"University",
"tag":"-",
"ner":"U-ORG"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"of",
"tag":"-",
"ner":"U-ORG"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"California",
"tag":"-",
"ner":"U-ORG"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"Berkeley",
"tag":"-",
"ner":"U-LOC"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"and",
"tag":"-",
"ner":"O"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"a",
"tag":"-",
"ner":"O"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"researcher",
"tag":"-",
"ner":"O"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"in",
"tag":"-",
"ner":"O"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"machine",
"tag":"-",
"ner":"O"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"learning",
"tag":"-",
"ner":"O"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"statistics",
"tag":"-",
"ner":"O"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"and",
"tag":"-",
"ner":"O"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"artificial",
"tag":"-",
"ner":"O"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"intelligence",
"tag":"-",
"ner":"O"
}
]
}
]
}
]
},
{
"id":0,
"paragraphs":[
{
"sentences":[
{
"tokens":[
{
"orth":"",
"tag":"",
"ner":"O"
}
]
}
]
}
]
}
]
So far, all the examples I have seen for training a spaCy model (https://spacy.io/usage/training#spacy-train-cli) use the following type of input:

Can anyone give an example of training spaCy on input in the first format?
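One thing stands out in the data above: every token is its own document (each with `"id":0`), instead of a single document holding the whole sentence. Below is a minimal sketch, assuming the file is a plain JSON list in exactly this shape. It collapses the one-token documents into one document with a single sentence and drops the empty trailing token. The function name `merge_token_docs` and the sample data are hypothetical, and the code only restructures the JSON; it does not call spaCy.

```python
import json


def merge_token_docs(docs):
    """Collapse a list of one-token docs into one doc with a single sentence.

    Hypothetical helper. Tokens whose "orth" is empty (like the last entry
    in the question's data) are dropped.
    """
    tokens = []
    for doc in docs:
        for para in doc["paragraphs"]:
            for sent in para["sentences"]:
                for tok in sent["tokens"]:
                    if tok["orth"]:
                        tokens.append(dict(tok))  # copy so the input is not mutated
    return [{"id": 0, "paragraphs": [{"sentences": [{"tokens": tokens}]}]}]


# Hypothetical sample in the same per-token shape as the question's file.
sample = json.loads("""[
  {"id": 0, "paragraphs": [{"sentences": [{"tokens": [{"orth": "Michael", "tag": "-", "ner": "U-PER"}]}]}]},
  {"id": 0, "paragraphs": [{"sentences": [{"tokens": [{"orth": "Irwin", "tag": "-", "ner": "U-PER"}]}]}]},
  {"id": 0, "paragraphs": [{"sentences": [{"tokens": [{"orth": "", "tag": "", "ner": "O"}]}]}]}
]""")
print(json.dumps(merge_token_docs(sample), indent=2))
```

For a real file, `json.load` the original list, pass it through `merge_token_docs`, and `json.dump` the result to a new file.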
Posted on 2019-09-12 15:39:46
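I recently updated the IOB/NER converter and created a set of example inputs that spacy convert -c iob accepts, together with the corresponding training data output in this format:
The updated converter will be released in the next version, but if you want to try it sooner, you can install the master branch from source.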
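Since the answer points to `spacy convert -c iob`, another route is to turn the per-token JSON into IOB text first. Below is a minimal sketch. It assumes the converter accepts one sentence per line with space-separated `word|POS|NER` triples and IOB2 entity tags, so check the example inputs mentioned in the answer against your spaCy version before relying on it. BILUO tags are mapped mechanically (`U-` to `B-`, `L-` to `I-`). As a result, a run such as `University of California`, which is labeled `U-ORG` on every token, still comes out as three one-token entities; fixing that requires relabeling the data itself. The function names are hypothetical.

```python
import json


def biluo_to_iob(tag):
    """Map one BILUO entity tag to IOB2 (U- -> B-, L- -> I-, B-/I- unchanged)."""
    if tag in ("O", "", "-"):
        # Assumption: treat missing or empty labels as outside any entity.
        return "O"
    prefix, label = tag.split("-", 1)
    if prefix == "U":
        return "B-" + label
    if prefix == "L":
        return "I-" + label
    return tag  # B- and I- mean the same thing in both schemes


def docs_to_iob_line(docs):
    """Join every non-empty token into one sentence line of word|POS|NER triples.

    Hypothetical helper. Assumes no token text contains "|" or spaces,
    which would break the space-separated layout.
    """
    parts = []
    for doc in docs:
        for para in doc["paragraphs"]:
            for sent in para["sentences"]:
                for tok in sent["tokens"]:
                    if tok["orth"]:  # skip the empty trailing token
                        parts.append("{}|{}|{}".format(
                            tok["orth"], tok["tag"], biluo_to_iob(tok["ner"])))
    return " ".join(parts)


# Hypothetical sample in the same per-token shape as the question's file.
sample = json.loads("""[
  {"id": 0, "paragraphs": [{"sentences": [{"tokens": [{"orth": "Michael", "tag": "-", "ner": "U-PER"}]}]}]},
  {"id": 0, "paragraphs": [{"sentences": [{"tokens": [{"orth": "is", "tag": "-", "ner": "O"}]}]}]}
]""")
print(docs_to_iob_line(sample))  # Michael|-|B-PER is|-|O
```

Writing the resulting line(s) to a text file gives input that can then be passed to `spacy convert -c iob`.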
https://stackoverflow.com/questions/57897258