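Is there a way to add a dictionary of user-defined words to udpipe models?
For example, with the default english model used below, some words should have been identified as keywords, such as R, Python, SQL, javascript, Excel, and noSQL.
I would like to augment the default english model with my own custom words so that the textrank_keywords function does a better job of identifying relevant keywords.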
library(udpipe)
library(textrank)  # provides textrank_keywords()
library(dplyr)
tagger <- udpipe_download_model("english")
tagger <- udpipe_load_model(tagger$file_model)
# read data
rawdata <- c("Automating and R/Python package development.","You have a sound knowledge of another data analysis language (R,Python, SQL, javascript) and you don't care in which relational database, Excel, bigdata or noSQL store your data is located.")
# annotate
rawdata_annotate <- udpipe_annotate(tagger, rawdata) %>% as_tibble()
keyw <- textrank_keywords(rawdata_annotate$lemma,
                          relevant = rawdata_annotate$upos %in% c("PROPN", "NOUN", "VERB", "ADJ"))
have <- keyw$terms
[1] "package" "analysis" "sound" "relational"
rawdata_annotate %>% dplyr::filter(token %in% c('R', 'Python', 'SQL', 'javascript', 'Excel', 'noSQL')) %>% dplyr::select(token, lemma, upos)
  token      lemma      upos
  <chr>      <chr>      <chr>
1 R          R          PROPN
2 Python     python     NOUN
3 R          r          NOUN
4 Python     python     NOUN
5 SQL        sql        NOUN
6 javascript javascript NOUN
7 Excel      Excel      PROPN
8 noSQL      nosql      AUX

Posted on 2021-06-11 22:38:50
I think I found the answer. Basically, I need to create a custom CoNLL-U file containing my own annotations, and then train a model on it.
https://bnosac.github.io/udpipe/docs/doc3.html
https://stackoverflow.com/questions/67936661
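To make the idea concrete, here is a minimal sketch of what that could look like. It is not taken from the original answer: the example sentence, the helper names conllu_row and train_custom_model, and the output file name are all hypothetical. It also assumes you only care about the tagger, so the parser is skipped with annotation_parser = "none" and the HEAD/DEPREL columns are left as "_". In practice a single sentence is far too little data; you would probably append custom sentences like this to an existing English treebank before training.

```r
# Hypothetical helper: one CoNLL-U token line with the 10 tab-separated columns
# ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC ("_" means unspecified).
conllu_row <- function(id, form, upos, lemma = form) {
  paste(id, form, lemma, upos, "_", "_", "_", "_", "_", "_", sep = "\t")
}

# A hand-annotated example sentence (made up for illustration) in which the
# custom terms are tagged PROPN instead of AUX/NOUN.
custom_conllu <- c(
  "# sent_id = custom-1",
  "# text = I use noSQL and Excel .",
  conllu_row(1, "I", "PRON"),
  conllu_row(2, "use", "VERB"),
  conllu_row(3, "noSQL", "PROPN"),
  conllu_row(4, "and", "CCONJ"),
  conllu_row(5, "Excel", "PROPN"),
  conllu_row(6, ".", "PUNCT"),
  ""  # a blank line terminates the sentence
)

conllu_file <- tempfile(fileext = ".conllu")
writeLines(custom_conllu, conllu_file)

# Hypothetical wrapper around udpipe_train(); defined here but not called,
# because training needs the udpipe package and a realistically sized corpus.
train_custom_model <- function(conllu_file, out_file = "custom_english.udpipe") {
  udpipe::udpipe_train(
    file = out_file,
    files_conllu_training = conllu_file,
    annotation_tokenizer = "default",
    annotation_tagger = "default",
    annotation_parser = "none"  # assumption: no dependency annotation available
  )
}
```

If training succeeds, the returned object should contain a file_model entry that can be passed to udpipe_load_model(), after which the annotation and textrank_keywords steps from the question can be rerun unchanged.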