text <- c('d__Viruses|f__Closteroviridae|g__Closterovirus|s__Citrus_tristeza_virus',
'd__Viruses|o__Tymovirales|f__Alphaflexiviridae|g__Mandarivirus|s__Citrus_yellow_vein_clearing_virus',
'd__Viruses|o__Ortervirales|f__Retroviridae|s__Columba_palumbus_retrovirus')

I tried this, but it failed:
str_extract(text, pattern = 'f.*\\|')

How can I get the following?
f__Closteroviridae
f__Alphaflexiviridae
f__Retroviridae
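Any help would be greatly appreciated!

Posted on 2020-09-25 03:23:13
The pattern 'f.*\\|' is greedy, so it runs from the first "f" to the last "|" in each string. Make the regex non-greedy, and since you don't want the "|" in the final output, use a positive lookahead.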
stringr::str_extract(text, 'f.*?(?=\\|)')
#[1] "f__Closteroviridae" "f__Alphaflexiviridae" "f__Retroviridae"

In base R, we can use sub:
sub('.*(f_.*?)\\|.*', '\\1', text)
#[1] "f__Closteroviridae" "f__Alphaflexiviridae" "f__Retroviridae"

Posted on 2020-09-25 03:31:39
For a base R solution, I would use regmatches and gregexpr:
m <- gregexpr("\\bf__[^|]+", text)
as.character(regmatches(text, m))
[1] "f__Closteroviridae" "f__Alphaflexiviridae" "f__Retroviridae"

The advantage of using gregexpr as above is that, if an input contains more than one f__ match, we capture all of them. For example:
x <- 'd__Viruses|f__Closteroviridae|g__Closterovirus|f__some_virus'
m <- gregexpr("\\bf__[^|]+", x)
regmatches(x, m)[[1]]
[1] "f__Closteroviridae" "f__some_virus"

Data:
text <- c('d__Viruses|f__Closteroviridae|g__Closterovirus|s__Citrus_tristeza_virus',
'd__Viruses|o__Tymovirales|f__Alphaflexiviridae|g__Mandarivirus|s__Citrus_yellow_vein_clearing_virus',
'd__Viruses|o__Ortervirales|f__Retroviridae|s__Columba_palumbus_retrovirus')

https://stackoverflow.com/questions/64057330
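One case the sample data does not cover is a string with no f__ field at all. In that situation sub returns the input unchanged and str_extract returns NA, so the approaches above can disagree. The following is a minimal base R sketch, assuming you want one result per input and NA for a missing field. The helper name extract_family is hypothetical and does not come from the original answers; it reuses the "\\bf__[^|]+" pattern shown above.

```r
# Hypothetical helper (not from the original answers): return the first
# f__ field of each string, or NA where the string has no such field.
extract_family <- function(x) {
  # regexpr() locates the first match per element; -1 means "no match"
  m <- regexpr("\\bf__[^|]+", x)
  hit <- !is.na(m) & m > -1L
  out <- rep(NA_character_, length(x))
  # regmatches() returns only the elements that matched, so assign by position
  out[hit] <- regmatches(x, m)
  out
}

x <- c('d__Viruses|f__Closteroviridae|g__Closterovirus',
       'd__Viruses|o__Ortervirales')  # the second string has no f__ field
extract_family(x)
#[1] "f__Closteroviridae" NA
```

Because the result always has the same length as the input, it can be assigned directly to a data frame column. If several f__ fields per string are possible, the gregexpr version above remains the better fit.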