In the R package udpipe, if we write the following code:
library(udpipe)
x <- udpipe("The economy is weak but the outlook is bright. the property market will be booming next year", "english")
The result is:
doc_id paragraph_id sentence_id sentence start end term_id token_id token lemma upos
1 doc1 1 1 The economy is weak but the outlook is bright 1 3 1 1 The the DET
2 doc1 1 1 The economy is weak but the outlook is bright 5 11 2 2 economy economy NOUN
3 doc1 1 1 The economy is weak but the outlook is bright 13 14 3 3 is be AUX
4 doc1 1 1 The economy is weak but the outlook is bright 16 19 4 4 weak weak ADJ
5 doc1 1 1 The economy is weak but the outlook is bright 21 23 5 5 but but CCONJ
6 doc1 1 1 The economy is weak but the outlook is bright 25 27 6 6 the the DET
7 doc1 1 1 The economy is weak but the outlook is bright 29 35 7 7 outlook outlook NOUN
8 doc1 1 1 The economy is weak but the outlook is bright 37 38 8 8 is be AUX
9 doc1 1 1 The economy is weak but the outlook is bright 40 45 9 9 bright bright ADJ
xpos feats head_token_id dep_rel deps misc
1 DT Definite=Def|PronType=Art 2 det <NA> <NA>
2 NN Number=Sing 4 nsubj <NA> <NA>
3 VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 4 cop <NA> <NA>
4 JJ Degree=Pos 0 root <NA> <NA>
5 CC <NA> 9 cc <NA> <NA>
6 DT Definite=Def|PronType=Art 7 det <NA> <NA>
7 NN Number=Sing 9 nsubj <NA> <NA>
8 VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 9 cop <NA> <NA>
9 JJ Degree=Pos 4 conj <NA> SpacesAfter=\\n
I have read https://universaldependencies.org/ext-feat-index.html, but I still do not understand what the feats column means here.
Answered on 2019-11-11 09:41:44
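These are the morphological features of the words: for example gender, number and case for nouns; person, number, aspect and so on for verbs. In the feats column each token carries a list of Name=Value pairs separated by "|", or <NA> when the token has no features.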
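To make the Name=Value structure concrete, here is a minimal sketch in base R that splits one feats string into a named character vector. The helper name parse_feats is hypothetical (it is not part of udpipe), and the input string is copied from the output in the question.

```r
# Hypothetical helper, not part of udpipe: split one value of the `feats`
# column into a named character vector (feature name -> feature value).
parse_feats <- function(feats) {
  # Tokens without morphological features have NA in the feats column.
  if (is.na(feats)) return(character(0))
  # Features are separated by "|", and each one has the form Name=Value.
  pairs <- strsplit(feats, "|", fixed = TRUE)[[1]]
  kv <- strsplit(pairs, "=", fixed = TRUE)
  values <- vapply(kv, function(p) p[2], character(1))
  names(values) <- vapply(kv, function(p) p[1], character(1))
  values
}

# The feats value of the token "is" from the output above.
verb_feats <- parse_feats("Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin")
print(verb_feats)
```

Read this way, the token "is" is a finite verb form (VerbForm=Fin) in the indicative mood, third person singular, present tense.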
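This part of the Universal Dependencies (UD) annotation is not universal at all. The page you linked lists every morphological feature that can occur in any language in UD. Most of them do not apply to most languages, and some phenomena may appear several times under different names in different treebanks. To make things even trickier, some of the treebanks that UDPipe models are trained on contain no morphological features at all. UDPipe, of course, can only output what it was able to learn from the treebanks.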
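UD contains six different English treebanks, and therefore there are also six different models in UDPipe. The overview on the UD website explains how the treebanks differ and also describes the morphological features used for English. The default for English is UD_English-EWT.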
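If you want to be explicit about which treebank model produced the feats column, something like the sketch below may help. It assumes the udpipe package is installed and that the model can be downloaded; the wrapper name annotate_with_model is hypothetical, and "english-ewt" is assumed to be the model name corresponding to UD_English-EWT.

```r
# Hypothetical wrapper: annotate text with an explicitly chosen model
# instead of relying on the default behind the "english" shorthand.
# Assumes the udpipe package is installed and network access is available.
annotate_with_model <- function(text, model_name = "english-ewt") {
  # Download the model file (by default into the working directory).
  info <- udpipe::udpipe_download_model(language = model_name)
  # Load the downloaded model from disk.
  model <- udpipe::udpipe_load_model(file = info$file_model)
  # Annotate; the result is a data.frame that includes the feats column.
  udpipe::udpipe(x = text, object = model)
}

# Example call (commented out because it downloads a model):
# x <- annotate_with_model("The economy is weak but the outlook is bright.")
# unique(x$feats)
```

Running the same sentence through two different English models and comparing the feats column is a quick way to see how much the morphological annotation depends on the treebank.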
https://stackoverflow.com/questions/58795322