I have a vector called Identifier:
c("NC.1.OA", "NC.1.OA.0", "NC.1.OA.1", "NC.1.OA.1.a", "NC.1.OA.1.b",
"NC.1.OA.1.c", "NC.1.OA.2", "NC.1.OA.2.0", "NC.1.OA.3", "NC.1.OA.4"
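)
I want to extract OA from each element.
I tried:
gsub(".*\\.(.*)\\..*", "\\1", Identifier)
Basically, I want to extract the text between the second and third periods. If there are only two periods (NC.1.OA), I want to extract everything after the second period.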
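To make the problem concrete, here is a small sketch of what the attempted pattern returns on a shortened copy of the vector. Because the leading .* is greedy, the capture group ends up holding the text between the last two periods, so the result depends on how many fields each element has. The variable name attempt is only for illustration.

```r
# Shortened copy of the vector from the question
Identifier <- c("NC.1.OA", "NC.1.OA.0", "NC.1.OA.1", "NC.1.OA.1.a")

# The attempted pattern: the greedy leading .* swallows as much as it can,
# so the captured group is the second-to-last field, not the third one
attempt <- gsub(".*\\.(.*)\\..*", "\\1", Identifier)
print(attempt)
# [1] "1"  "OA" "OA" "1"
```

Only the elements with exactly four fields happen to give OA, which is why the pattern needs to count periods from the left instead.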
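Answered 2018-11-12 06:19:32
Repeat (non-periods followed by a period) twice, then capture the following non-periods. The substring you want is in the captured group: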
Identifier = c("NC.1.OA", "NC.1.OA.0", "NC.1.OA.1", "NC.1.OA.1.a", "NC.1.OA.1.b",
"NC.1.OA.1.c", "NC.1.OA.2", "NC.1.OA.2.0", "NC.1.OA.3", "NC.1.OA.4"
)
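gsub("(?:[^.]+\\.){2}([^.]+).*", "\\1", Identifier)
Output:
 [1] "OA" "OA" "OA" "OA" "OA" "OA" "OA" "OA" "OA" "OA"
To elaborate, (?:[^.]+\\.) is a non-capturing group that matches one or more non-period characters followed by a single period. The {2} after the group repeats the preceding token (the group) twice, i.e. "non-periods, a period, non-periods, a period". The final ([^.]+) then captures the non-period characters after the second period, which is the text between the second period and the third period (or the end of the string). The trailing .* consumes the rest of the string, so the whole input is replaced by the captured group.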
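One thing the pattern above does not cover is an element with fewer than two periods: nothing matches, so the string comes back unchanged. A minimal sketch of a wrapper that returns NA in that case follows. The function name third_field, the anchors ^ and $, and the use of perl = TRUE are assumptions added for illustration, not part of the answer above.

```r
# Hypothetical helper: return the third period-separated field, or NA
# when the element has fewer than three fields
third_field <- function(x) {
  # Same idea as the pattern above, anchored at both ends
  pattern <- "^(?:[^.]+\\.){2}([^.]+).*$"
  matched <- grepl(pattern, x, perl = TRUE)
  ifelse(matched, sub(pattern, "\\1", x, perl = TRUE), NA_character_)
}

# "NC.1" has only one period, so it is reported as NA
print(third_field(c("NC.1.OA", "NC.1.OA.1.a", "NC.1")))
# [1] "OA" "OA" NA
```

Since the pattern can match at most once per string, sub is enough here and gsub is not needed.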
Answered 2018-11-12 06:18:20
Here is an alternative to sub/gsub that uses sapply with strsplit: split each element on periods and take the third piece.
sapply(Identifier, function(x) unlist(strsplit(x, "\\."))[3])
NC.1.OA NC.1.OA.0 NC.1.OA.1 NC.1.OA.1.a NC.1.OA.1.b NC.1.OA.1.c
"OA" "OA" "OA" "OA" "OA" "OA"
NC.1.OA.2 NC.1.OA.2.0 NC.1.OA.3 NC.1.OA.4
"OA" "OA" "OA" "OA"
Answered 2018-11-12 06:45:41
We can also try stringr:
Identifier = c("NC.1.OA", "NC.1.OA.0", "NC.1.OA.1", "NC.1.OA.1.a", "NC.1.OA.1.b",
"NC.1.OA.1.c", "NC.1.OA.2", "NC.1.OA.2.0", "NC.1.OA.3", "NC.1.OA.4"
)
library(stringr)
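# An unescaped . matches any character, and the trailing one requires
# a character after OA, so "NC.1.OA" yields NA
str_extract(Identifier, ".OA.")
# [1] NA ".OA." ".OA." ".OA." ".OA." ".OA." ".OA." ".OA." ".OA." ".OA."
# Matching the literal text OA works, but only when the value is known in advance
str_extract(Identifier, "OA")
# [1] "OA" "OA" "OA" "OA" "OA" "OA" "OA" "OA" "OA" "OA"
# Make the trailing character optional, then strip the periods
gsub('\\.', '', str_extract(Identifier, ".OA.?"))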
# [1] "OA" "OA" "OA" "OA" "OA" "OA" "OA" "OA" "OA" "OA"
https://stackoverflow.com/questions/53256738