Hi, how can I extract the number that sits between two dashes in a text string?
Here is a sample dataset:
text.var <- c("abd-GEN-eft-na-M-D-BINED-10-XX1","abd-GEN-eft-na-M-D-BINED-2-XX2","abd-GEN-eft-na-M-D-BINED-3-XX1")
id <- c(1,2,3)
data <- data.frame("id"=id,"text"=text.var)
> data
  id                            text
1  1 abd-GEN-eft-na-M-D-BINED-10-XX1
2  2  abd-GEN-eft-na-M-D-BINED-2-XX2
3  3  abd-GEN-eft-na-M-D-BINED-3-XX1

I want to extract the number that sits between two "-" characters. The result I am hoping for is:
> data
  id                            text number
1  1 abd-GEN-eft-na-M-D-BINED-10-XX1     10
2  2  abd-GEN-eft-na-M-D-BINED-2-XX2      2
3  3  abd-GEN-eft-na-M-D-BINED-3-XX1      3

Could anyone give me a hint?
Thanks
Answered on 2019-09-19 17:11:46
You can use the str_extract function from the "stringr" package:
library(stringr)
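str_extract(text.var, "(?<=-)[0-9]+(?=-)")

(?<= ) and (?= ) are the regex lookbehind and lookahead constructs. Here they require a dash immediately before and immediately after the digits, without making the dashes part of the match. Note that str_extract returns character strings, so wrap the call in as.numeric if you need numbers.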
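For readers who would rather not load an extra package, the same lookaround pattern can be sketched in base R. This is only an illustration and not part of the original answer; the second input string ("ab1-GEN-10-XX1") is made up to show what the lookbehind contributes.

```r
text.var <- c("abd-GEN-eft-na-M-D-BINED-10-XX1",
              "abd-GEN-eft-na-M-D-BINED-2-XX2",
              "abd-GEN-eft-na-M-D-BINED-3-XX1")

# perl = TRUE switches to the PCRE engine, which supports lookbehind and lookahead.
hits <- regexpr("(?<=-)[0-9]+(?=-)", text.var, perl = TRUE)

# regmatches() returns only the matched digits; the dashes are not part of the match.
# Note: elements with no match are dropped, so this assumes every string matches.
number <- as.numeric(regmatches(text.var, hits))
print(number)

# Made-up input whose first segment ends in a digit.
tricky <- "ab1-GEN-10-XX1"
with_lookbehind <- regmatches(tricky, regexpr("(?<=-)[0-9]+(?=-)", tricky, perl = TRUE))
lookahead_only  <- regmatches(tricky, regexpr("[0-9]+(?=-)", tricky, perl = TRUE))
print(c(with_lookbehind, lookahead_only))
```

On the made-up string, the pattern with both lookarounds picks "10", while a lookahead-only pattern stops at the "1" in "ab1". For the sample data in the question, both patterns give the same result.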
Answered on 2019-09-19 17:11:12
You can do this with sub and a regular expression.
text.var <- c("abd-GEN-eft-na-M-D-BINED-10-XX1","abd-GEN-eft-na-M-D-BINED-2-XX2","abd-GEN-eft-na-M-D-BINED-3-XX1")
id <- c(1,2,3)
number = as.numeric(sub(".*-(\\d+)-.*", "\\1", text.var))
data <- data.frame("id"=id,"text"=text.var, number)
data
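  id                            text number
1  1 abd-GEN-eft-na-M-D-BINED-10-XX1     10
2  2  abd-GEN-eft-na-M-D-BINED-2-XX2      2
3  3  abd-GEN-eft-na-M-D-BINED-3-XX1      3

Some additional details
In the regular expression, -\\d+- matches a sequence of digits surrounded by dashes. I put parentheses around the \\d+ part so that the digits found are captured, giving -(\\d+)-. The .* before and after -(\\d+)- matches all the other characters, so the pattern covers the whole string. sub therefore replaces the entire string with just the captured digits (\\1), which gives strings that contain only the number. I use as.numeric to turn these into numbers rather than strings.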
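One thing the explanation above implies but does not spell out: when a string contains no "-digits-" segment, sub finds nothing to replace and returns the string unchanged, so as.numeric then yields NA with a coercion warning. The sketch below guards against that case. The helper name extract_number and the input "no-number-here" are hypothetical and do not appear in the original answer.

```r
# Hypothetical helper: returns NA (without a warning) where no "-digits-" segment exists.
extract_number <- function(x) {
  pattern <- ".*-(\\d+)-.*"
  hit <- grepl(pattern, x)              # which strings contain -digits- at all
  out <- rep(NA_real_, length(x))       # default to NA for non-matching strings
  out[hit] <- as.numeric(sub(pattern, "\\1", x[hit]))
  out
}

result <- extract_number(c("abd-GEN-eft-na-M-D-BINED-10-XX1", "no-number-here"))
print(result)
```

Checking with grepl first keeps the conversion to numeric limited to strings that actually matched.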
Answered on 2019-09-19 17:25:26
We can use str_extract
library(stringr)
library(dplyr)
data %>%
  mutate(number = as.numeric(str_extract(text, "\\d+(?=-)")))
#  id                            text number
#1  1 abd-GEN-eft-na-M-D-BINED-10-XX1     10
#2  2  abd-GEN-eft-na-M-D-BINED-2-XX2      2
#3  3  abd-GEN-eft-na-M-D-BINED-3-XX1      3

https://stackoverflow.com/questions/58015866