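I'm having a hard time dealing with inconsistencies between the taxonomies used by GBIF and NatureServe. For example, the Aphrodite fritillary (Speyeria aphrodite) is listed in NatureServe as Argynnis aphrodite (this genus-level synonymy affects all the fritillaries). I can see that NatureServe, to its credit, lists an alternate name, so matching these is not impossible. But I also have species names, such as those in Tecmessa, where the accepted genus name is given as Cerura in a field returned by the NatureServe API. I'd like to match these automatically, since they are essentially 1:1 synonyms.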
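As far as possible, I'd like to handle these synonyms separately from issues like species-level lumping and splitting. Yes, ideally I'd have a solution that also alerts me to a bad match because, say, a species recognized by NatureServe is listed only as a subspecies in other taxonomies, e.g. Polites egeremet. Using the relatedItisNames and conceptName fields, I might conflate it with Polites otho and map both Polites to Wallengrenia otho in the GBIF backbone. For this question, I don't feel I need an answer on how to handle that situation, because species-level lumping and splitting is a much harder problem and I'm not sure a programmatic answer is even appropriate. Instead, I'm looking for answers that resolve synonyms that are not fully cross-referenced between the GBIF backbone and NatureServe.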
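How can I use the NatureServe, GBIF, and other APIs from R to resolve 1:1 synonyms between NatureServe and other widely accepted taxonomies? The R packages taxize, rgbif, and natserv all seem like good candidate tools.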
# this matches the genus *Argynnis*, which is valid but not what I'm looking for (i.e., not a species)
rgbif::name_backbone_verbose("Argynnis aphrodite")
# this doesn't find the match exactly
taxize::get_ids("Argynnis aphrodite")
# this doesn't include the taxon concept info I need
natserv::ns_search_spp("Argynnis aphrodite")
### how can I link it all together?

Posted on 2022-10-11 17:50:38
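Forgive the idiosyncratic code. When time permits, I'll edit it to be more minimal and more general. For the most part, iteratively searching the NatureServe Explorer fields works well for 1:1 synonyms, but it will not flag splits or other taxonomic changes that have not yet been incorporated into the GBIF backbone taxonomy. Here is what I implemented to let me move forward for now: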
library(tidyverse)
# define some helper functions
# return NA instead of character(0) or NULL
null_to_NA <- function(x){
  c(x, NA)[1]
}
`%ni%` <- Negate(`%in%`)
# get example data
# search NatureServe for each name, restricted to Maryland (US)
pleps <- purrr::map_dfr(c("Argynnis aphrodite"
                          , "Polites egeremet"
                          , "Polites otho"
                          , "Tecmessa scitiscripta"
                          , "Peridea bordeloni")
                        , function(lep){
                          natserv::ns_search_spp(
                            lep
                            , location = list(nation = "US"
                                              , subnation = "MD"))[[1]] %>%
                            unnest(cols = nations) %>%
                            unnest(cols = "subnations"
                                   , names_repair = "unique") %>%
                            filter(subnationCode == "MD") %>%
                            # split the binomial into genus and species
                            separate(scientificName
                                     , sep = " "
                                     , into = c("genus", "species")
                                     , remove = FALSE) %>%
                            mutate(gs = paste(genus, species, sep = "_")
                                   , withspace = paste(genus
                                                       , species
                                                       , sep = " ")) %>%
                            # drop hybrids, varieties, and malformed epithets
                            filter(species %ni% c("x", "var", " ", "") &
                                     !grepl("^[^[:alnum:]]", species))
                        })
# function to iteratively search NS fields for alternate names
alt_gs <- function(dat){
  # seq_len() is safe when dat has zero rows (1:0 would iterate twice)
  map_dfr(seq_len(nrow(dat))
          , function(lrow){
            # get the ns taxon concept info
            ns_deets <- natserv::ns_altid(as.character(dat$uniqueId[[lrow]]))
            # start with itis: keep the name inside the <i>...</i> tags
            itis.name <- null_to_NA(
              stringr::str_replace(ns_deets$relatedItisNames
                                   , "(.*\\<i\\>)(.*)(\\<\\/i\\>.*)"
                                   , "\\2"))
            # if itis doesn't work, try conceptName
            better.name <- if_else(is.na(itis.name)
                                   , null_to_NA(
                                     stringr::str_replace(ns_deets$conceptName
                                                          , "(.*\\<i\\>)(.*)(\\<\\/i\\>.*)"
                                                          , "\\2"))
                                   , itis.name)
            # goal is to line up with gbif backbone
            gbif.name <- rgbif::name_backbone(better.name)
            # plain character value, so the output column is named natserv.name
            natserv.name <- dat$withspace[[lrow]]
            return(data.frame(natserv.name, gbif.name))
          })
}
# run it
alt_gs(pleps)
# ok that I have NA for Peridea here
# not trying to solve the splitting for Polites
# perfect isn't the goal, just want to deal with trivial mismatches somehow.

https://stackoverflow.com/questions/74032178
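Since the approach above does not flag lumps or splits, one cheap follow-up check is to look for GBIF names that more than one NatureServe name resolved to. The following is only a minimal sketch under stated assumptions: the `matches` table, its column names (`natserv_name`, `gbif_name`), and its values are hypothetical stand-ins for whatever the matching step returns, not actual output of `alt_gs()` or of either API.

```r
# Hypothetical result table: one row per NatureServe name, with the GBIF
# backbone name it resolved to. Column names and values are illustrative
# assumptions, not real API output.
matches <- data.frame(
  natserv_name = c("Argynnis aphrodite", "Polites egeremet",
                   "Polites otho", "Tecmessa scitiscripta"),
  gbif_name    = c("Speyeria aphrodite", "Wallengrenia otho",
                   "Wallengrenia otho", "Cerura scitiscripta"),
  stringsAsFactors = FALSE
)

# Count how many distinct NatureServe names share each GBIF name.
n_sources <- tapply(matches$natserv_name, matches$gbif_name,
                    function(x) length(unique(x)))

# A GBIF name reached from more than one NatureServe name is a candidate
# lump or split and probably needs manual review.
matches$many_to_one <- as.vector(n_sources[matches$gbif_name]) > 1

# Rows to review by hand
matches[matches$many_to_one, ]
```

With this illustrative input, only the two Polites rows are flagged; the genus-level synonyms pass through as ordinary 1:1 matches. The check says nothing about which of the flagged names is correct, only that the mapping is not 1:1.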