I am trying to parse a .conll file. This is the parsing code I am using:
from io import open
from conllu import parse_incr  # the loop below calls parse_incr, not parse_tree_incr
import glob
import os

for filename in glob.glob('./licenses-conll-format/22-MIT/MIT_permissionCopy.conll'):
    # Open the file as UTF-8 and print every parsed sentence
    with open(filename, "r", encoding="utf-8") as data_file:
        for tokentree in parse_incr(data_file):
            print(tokentree.serialize())

Output:
24 Permission _ NN NN _ 27 nsubjpass _ _
25 is _ VBZ VBZ _ 27 auxpass _ _
26 hereby _ RB RB _ 27 advmod _ _
27 granted _ VBN VBN _ 11 rcmod _ _
28 , _ , , _ 27 punct _ _
29 free _ JJ JJ _ 27 advmod _ _
30 of _ IN IN _ 0 erased _ _
31 charge _ NN NN _ 29 prep_of _ _

This seems to be missing some annotations (I-PERMISSION, B-PERMISSION, etc.) that are in the original .conll file:
24 Permission _ NN NN _ 27 nsubjpass _ _ B-PERMISSION COPY
25 is _ VBZ VBZ _ 27 auxpass _ _ I-PERMISSION
26 hereby _ RB RB _ 27 advmod _ _ I-PERMISSION
27 granted _ VBN VBN _ 11 rcmod _ _ I-PERMISSION
28 , _ , , _ 27 punct _ _ O
29 free _ JJ JJ _ 27 advmod _ _ I-PERMISSION
30 of _ IN IN _ 0 erased _ _ I-PERMISSION
31 charge _ NN NN _ 29 prep_of _ _ I-PERMISSION
32 , _ , , _ 27 punct _ _ O

Any ideas on how to get all the annotations?
Posted on 2020-04-03 11:24:56
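By default the parser reads only the ten standard CoNLL-U columns, so the extra annotation column is dropped. You can specify the tuple of fields yourself:
fields = ('id', 'form', 'lemma', 'upostag', 'xpostag', 'feats', 'head', 'deprel', 'deps', 'misc', 'rest')
for tokentree in parse_incr(data_file, fields=fields):
    print(tokentree.serialize())

Output: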
24 Permission _ NN NN _ 27 nsubjpass _ _ B-PERMISSION
25 is _ VBZ VBZ _ 27 auxpass _ _ I-PERMISSION
26 hereby _ RB RB _ 27 advmod _ _ I-PERMISSION
27 granted _ VBN VBN _ 11 rcmod _ _ I-PERMISSION

https://stackoverflow.com/questions/61010075
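
To see why naming an eleventh field recovers the annotations, it may help to look at the column mapping on its own. The following is a minimal sketch using only the standard library. It is not how the conllu package is implemented, and the field name `label` and the helper `read_tokens` are hypothetical names chosen for illustration. It also assumes columns can be split on whitespace, as they appear in the listings above; real files are usually tab-separated.

```python
# Column names: the ten standard CoNLL-U fields plus one hypothetical
# "label" column for the trailing annotation (the name is an assumption).
FIELDS = ("id", "form", "lemma", "upostag", "xpostag",
          "feats", "head", "deprel", "deps", "misc", "label")


def read_tokens(lines, fields=FIELDS):
    """Yield one dict per token line, mapping column names to values."""
    for line in lines:
        line = line.strip()
        # Skip blank lines (sentence breaks) and comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on whitespace; anything beyond the named columns stays
        # together in the last field instead of being dropped.
        values = line.split(None, len(fields) - 1)
        yield dict(zip(fields, values))


sample = [
    "24 Permission _ NN NN _ 27 nsubjpass _ _ B-PERMISSION COPY",
    "25 is _ VBZ VBZ _ 27 auxpass _ _ I-PERMISSION",
    "28 , _ , , _ 27 punct _ _ O",
]

tokens = list(read_tokens(sample))
for token in tokens:
    print(token["form"], token["label"])
```

With only ten names in `fields`, the eleventh column would have nowhere to go, which mirrors the missing annotations in the question.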