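I want to change the variable names in my data.frame from, for example, "pmm_StartTimev4_E2_C19_1" to "pmm_StartTimev4_E2_C19". So if a name ends with an underscore followed by any number, that suffix gets removed.
However, I only want this to happen when the variable name contains the word "Start".
I have some messy code that doesn't work. Any help would be greatly appreciated!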
# Current data frame:
dfbefore <- data.frame(a=c("pmm_StartTimev4_E2_C19_1","pmm_StartTimev4_E2_E2_C1","delivery_C1_C12"),b=c("pmm_StartTo_v4_E2_C19_2","complete_E1_C12_1","pmm_StartTo_v4_E2_C19"))
# Desired data frame:
dfafter <- data.frame(a=c("pmm_StartTimev4_E2_C19","pmm_StartTimev4_E2_E2_C1","delivery_C1_C12"),b=c("pmm_StartTo_v4_E2_C19","complete_E1_C12_1","pmm_StartTo_v4_E2_C19"))
# Current code:
sub((.*{1,}[0-9]*).*","",grep("Start",names(df),value = TRUE)
Posted on 2018-06-13 19:00:22
How about something like gsub()?
stripcol <- function(x) {
gsub("(.*Start.*)_\\d+$", "\\1", as.character(x))
}
dfnew <- dfbefore
dfnew[] <- lapply(dfbefore, stripcol)
The regular expression looks for "Start" and captures everything except the trailing underscore and digits. lapply then applies the function to every column.
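To see what the pattern does in isolation, here is a minimal sketch that runs the same regular expression on a plain character vector. The vector `x` and the name `res` are just for illustration; the values are copied from the question's data frame.

```r
# Illustrative values copied from the question's data frame
x <- c("pmm_StartTimev4_E2_C19_1",  # contains "Start", ends in _<digits>
       "complete_E1_C12_1",         # ends in _<digits> but has no "Start"
       "pmm_StartTo_v4_E2_C19")     # contains "Start" but has no _<digits> suffix

# (.*Start.*) captures a prefix that must contain "Start";
# _\\d+$ matches the trailing underscore-plus-digits, which is dropped
res <- sub("(.*Start.*)_\\d+$", "\\1", x)
res
# [1] "pmm_StartTimev4_E2_C19" "complete_E1_C12_1"      "pmm_StartTo_v4_E2_C19"
```

Only the first element changes: the second lacks "Start", so the pattern does not match, and the third has no numeric suffix after its last underscore.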
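Posted on 2018-06-13 18:59:55
We can use sub with a capture group that must contain the 'Start' substring, followed by an underscore and one or more digits. In the replacement, use a backreference to the capture group. Because there are multiple columns, loop over them with lapply, apply sub, and assign the output back to a copy of the original data.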
out <- dfbefore
out[] <- lapply(dfbefore, sub,
pattern = "^(.*_Start.*)_\\d+$", replacement ="\\1")
out
dfafter[] <- lapply(dfafter, as.character)
all.equal(out, dfafter, check.attributes = FALSE)
#[1] TRUE
Posted on 2018-06-13 19:01:57
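doit <- function(x){
  x <- as.character(x)
  # Only touch values that contain "Start"
  if(grepl("Start",x)){
    # Remove a trailing underscore followed by one or more digits
    x <- sub("_[0-9]+$","",x)
  }
  return(x)
}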
apply(dfbefore,c(1,2),doit)
     a                          b
[1,] "pmm_StartTimev4_E2_C19"   "pmm_StartTo_v4_E2_C19"
[2,] "pmm_StartTimev4_E2_E2_C1" "complete_E1_C12_1"
[3,] "delivery_C1_C12"          "pmm_StartTo_v4_E2_C19"
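The question's own attempt worked on names(df), so if the goal is to rename columns rather than change cell values, the same idea can be applied to the names directly. This is only a sketch under that assumption: the data frame `df` and its columns are hypothetical, built so that the column names follow the question's pattern.

```r
# Hypothetical data frame whose column names follow the question's pattern
df <- data.frame(pmm_StartTimev4_E2_C19_1 = 1:2,
                 complete_E1_C12_1        = 3:4,
                 pmm_StartTo_v4_E2_C19    = 5:6)

# Only names containing the literal text "Start" are candidates for renaming
has_start <- grepl("Start", names(df), fixed = TRUE)

# Drop a trailing "_<digits>" from those names only
names(df)[has_start] <- sub("_[0-9]+$", "", names(df)[has_start])
names(df)
# [1] "pmm_StartTimev4_E2_C19" "complete_E1_C12_1"      "pmm_StartTo_v4_E2_C19"
```

One thing to watch for: stripping the suffix can leave two columns with the same name (for example "x_Start_1" and "x_Start_2" would both become "x_Start"), so it may be worth checking anyDuplicated(names(df)) afterwards.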
https://stackoverflow.com/questions/50844306