The data I collected comes from Amazon Mechanical Turk and has a column named "LifetimeApprovalRate". The column contains values like these:
head(ES$LifetimeApprovalRate)
[1] "100% (32/32)" "50% (16/32)" "100% (11/11)" "100% (4/4)"

I would like to use this information to create three new variables:
ES$rate: "100%" "50%" "100%" "100%"
ES$approve: "32" "16" "11" "4"
ES$total: "32" "32" "11" "4"

I'm afraid everything I've tried produces horrible lists that are hard to turn into anything useful.
Posted on 2015-06-24 14:41:00
tidyr's separate is also handy for this sort of thing:
library(tidyr)
> dat <- data.frame(x = 1:4,y = c("100% (32/32)", "50% (16/32)", "100% (11/11)", "100% (4/4)"))
> separate(dat,y,c("rate","approve","total"),sep = "[()/ ]+",extra = "drop")
x rate approve total
1 1 100% 32 32
2 2 50% 16 32
3 3 100% 11 11
4 4 100% 4 4

Posted on 2015-06-24 14:36:37
You could try strsplit:
nm1 <- c('rate', 'approve', 'total')
ES[nm1] <- do.call(rbind,
strsplit(as.character(ES$LifetimeApprovalRate),'[()/ ]+'))
ES[nm1[-1]] <- lapply(ES[nm1[-1]], as.numeric)
ES
# LifetimeApprovalRate rate approve total
#1 100% (32/32) 100% 32 32
#2 50% (16/32) 50% 16 32
#3 100% (11/11) 100% 11 11
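#4 100% (4/4) 100% 4 4

Below is a similar option using the devel version of data.table, i.e. v1.9.5. Instructions for installing the development version are here. We use tstrsplit to split the column 'LifetimeApprovalRate' and assign the output columns to the new columns named in nm1. The type.convert=TRUE option also converts the column classes.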
library(data.table) # v1.9.5+
setDT(ES)[, (nm1):=tstrsplit(LifetimeApprovalRate,'[()/ ]+', type.convert=TRUE)]
# LifetimeApprovalRate rate approve total
#1: 100% (32/32) 100% 32 32
#2: 50% (16/32) 50% 16 32
#3: 100% (11/11) 100% 11 11
#4: 100% (4/4) 100% 4 4

Data:
ES <- structure(list(LifetimeApprovalRate = structure(c(2L, 4L, 1L,
3L), .Label = c("100% (11/11)", "100% (32/32)", "100% (4/4)",
"50% (16/32)"), class = "factor")), .Names = "LifetimeApprovalRate",
row.names = c(NA, -4L), class = "data.frame")

https://stackoverflow.com/questions/31029459
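As an additional base R sketch (not taken from either answer), regexec() together with regmatches() can pull out the three pieces with capture groups instead of splitting on delimiters. It assumes every value has exactly the form "N% (a/b)"; a value that does not match yields NA in all three new columns. The names pattern, x, parts and m are illustrative only, and as.character() is applied first so the same code also works on the factor column from the data above.

```r
# Sample data rebuilt from the question (plain character column)
ES <- data.frame(
  LifetimeApprovalRate = c("100% (32/32)", "50% (16/32)", "100% (11/11)", "100% (4/4)"),
  stringsAsFactors = FALSE
)

# Capture groups: (1) the percentage, (2) the approved count, (3) the total count
pattern <- "^([0-9.]+%) \\(([0-9]+)/([0-9]+)\\)$"
x <- as.character(ES$LifetimeApprovalRate)  # also handles a factor column
parts <- regmatches(x, regexec(pattern, x))

# Each element is c(full_match, group1, group2, group3); keep only the groups.
# Non-matching values give character(0), which `[` turns into NA, NA, NA.
m <- do.call(rbind, lapply(parts, `[`, 2:4))

ES$rate <- m[, 1]
ES$approve <- as.numeric(m[, 2])
ES$total <- as.numeric(m[, 3])
ES
```

Anchoring the pattern with ^ and $ makes malformed values show up as NA rather than being silently split into the wrong columns, which is the main difference from the delimiter-based approaches.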