I have some data from Wikipedia:
RHCP_data
V1 V2 V3 V4
1 bar:kiedis from:01/01/1983 till:01/11/1986 color:vocals
2 bar:kiedis from:01/12/1986 till:end color:vocals
3 bar:flea from:01/01/1983 till:end color:bass
4 bar:smith from:03/12/1988 till:end color:drums
5 bar:klinghoffer from:01/10/2009 till:end color:lead
6 bar:slovak from:01/01/1983 till:01/12/1983 color:lead
7 bar:slovak from:01/02/1985 till:25/06/1988 color:lead
...
...
I tried to strip the variable names using tidyr, which works well for a single column:
separate(RHCP_data, "V1", into = c("a", "b"), sep = ":")[2]
b
1 kiedis
2 kiedis
3 flea
4 smith
5 klinghoffer
6 slovak
7 slovak
...
...
What I would like to understand is why this does not work:
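for (i in 1:4) {
  RHCP_data[, i] <- separate(RHCP_data, paste0("V", i), into = c("a", "b"), sep = ":")[2][, 1]
}
I get an error:
Error: Invalid column specification
Obviously the dataset is small, so it is not a problem in this case, but I feel there is something about tidyr or loops that I do not understand. Any help is appreciated.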
Answered 2015-10-27 07:35:13
To pass the column as a variable, you need to use separate_ instead of separate.
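In more recent tidyr releases separate_() is deprecated, and separate() picks its column with tidy-select, so a column name held in a string can be passed through all_of(). The following is a minimal sketch under that assumption, not the answerer's code: the two-row RHCP_data excerpt and the key/value names are made up for illustration. It selects the result by name rather than by position, because a fixed position such as [2] only points at the value part while the first column is being split.

```r
library(tidyr)

# Hypothetical two-row excerpt of the question's data (illustration only)
RHCP_data <- data.frame(
  V1 = c("bar:kiedis", "bar:flea"),
  V2 = c("from:01/01/1983", "from:01/01/1983"),
  V3 = c("till:01/11/1986", "till:end"),
  V4 = c("color:vocals", "color:bass"),
  stringsAsFactors = FALSE
)

cleaned <- RHCP_data
for (col_name in names(RHCP_data)) {
  # all_of() tells tidy-select that col_name holds a column name as a string
  parts <- separate(RHCP_data, col = all_of(col_name),
                    into = c("key", "value"), sep = ":")
  # Pick the value part by name, so the position of the split column does not matter
  cleaned[[col_name]] <- parts[["value"]]
}
cleaned
```

Each pass splits the original, untouched RHCP_data, so earlier iterations cannot interfere with later ones.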
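If you want to use a loop, I suggest the following (df here is the question's RHCP_data):
library(tidyr)
lst = lapply(seq(ncol(df)), function(x) {
  # after splitting column x, its two pieces sit at positions x and x+1; keep the second
  separate_(df, paste0('V', x), into = paste0(c("a", "b"), x), sep = ":")[x:(x+1)][,2]
})
# reassemble the cleaned columns under the original names
data.frame(setNames(lst, names(df)))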
# V1 V2 V3 V4
#1 kiedis 01/01/1983 01/11/1986 vocals
#2 kiedis 01/12/1986 end vocals
#3 flea 01/01/1983 end bass
#4 smith 03/12/1988 end drums
#5 klinghoffer 01/10/2009 end lead
#6 slovak 01/01/1983 01/12/1983 lead
#7 slovak 01/02/1985 25/06/1988 lead
Answered 2015-10-27 07:34:55
We can simply use cSplit, without any loop.
library(splitstackshape)
DT <- cSplit(RHCP_data, 1:ncol(RHCP_data), ':')
DT[, seq(2, ncol(DT), by=2), with=FALSE]
# V1_2 V2_2 V3_2 V4_2
#1: kiedis 01/01/1983 01/11/1986 vocals
#2: kiedis 01/12/1986 end vocals
#3: flea 01/01/1983 end bass
#4: smith 03/12/1988 end drums
#5: klinghoffer 01/10/2009 end lead
#6: slovak 01/01/1983 01/12/1983 lead
#7: slovak 01/02/1985 25/06/1988 lead
https://stackoverflow.com/questions/33362006