I have a dataset with the following row naming scheme:
a.X.V
where:
a is a fixed-length core ID
X is a variable-length string that subsets a, which means I should keep X
V is a variable-length ID which specifies the individual elements of a.X to be averaged
. is one of {-,_}
What I want to do is take the column means of all a.X's. Example:
sampleList <- list("a.12.1"=c(1,2,3,4,5), "b.1.23"=c(3,4,1,4,5), "a.12.21"=c(5,7,2,8,9), "b.1.555"=c(6,8,9,0,6))
sampleList
$a.12.1
[1] 1 2 3 4 5
$b.1.23
[1] 3 4 1 4 5
$a.12.21
[1] 5 7 2 8 9
$b.1.555
[1] 6 8 9 0 6
Currently, I am manually stripping the .V's with gsub to get a general list:
sampleList <- t(as.data.frame(sampleList))
y <- rownames(sampleList)
y <- gsub("(\\w\\.\\d+)\\.\\d+", "\\1", y)
Is there a faster way to do this?
This is one half of two problems I ran into in my workflow. The other half is answered here.
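For reference, the grouping described above can be sketched in base R without writing a pattern per group. This is only a minimal sketch under one assumption that is not stated in the question: V is everything after the last separator, and the separator is one of ".", "-" or "_". The names group_key, groups and group_means are made up for illustration.

```r
# Example data from the question.
sampleList <- list("a.12.1"  = c(1, 2, 3, 4, 5),
                   "b.1.23"  = c(3, 4, 1, 4, 5),
                   "a.12.21" = c(5, 7, 2, 8, 9),
                   "b.1.555" = c(6, 8, 9, 0, 6))

# Assumption: V is the part after the last separator (".", "-" or "_").
# Dropping that suffix leaves the a.X key that defines each group.
group_key <- sub("[._-][^._-]+$", "", names(sampleList))

# Split the list by key, then take element-wise (column) means per group.
groups <- split(sampleList, group_key)
group_means <- lapply(groups, function(g) colMeans(do.call(rbind, g)))

group_means
```

Because the key is derived from the names themselves, new a.X combinations are picked up automatically; the trade-off is that a name which does not follow the a.X.V scheme would be grouped under whatever remains after its last separator is removed.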
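Answered 2012-10-19 22:43:59
You can use a vector of patterns to find the positions of the columns you want to group. I included a pattern that I know will not match anything, to show that the solution is robust to that case.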
# A *named* vector of patterns you want to group by
patterns <- c(a.12="^a.12",b.12="^b.12",c.12="^c.12")
# Find the locations of those patterns in your list
inds <- lapply(patterns, grep, x=names(sampleList))
# Calculate the mean of each list element that matches the pattern
out <- lapply(inds, function(i)
  if (l <- length(i)) Reduce("+", sampleList[i]) / l else NULL)
# Set the names of the output
names(out) <- names(patterns)
Answered 2012-10-19 22:47:13
Perhaps you could consider reshaping your data structure so that it is easier to apply some standard tools:
sampleList <- list("a.12.1"=c(1,2,3,4,5),
"b.1.23"=c(3,4,1,4,5), "a.12.21"=c(5,7,2,8,9),
"b.1.555"=c(6,8,9,0,6))
library(reshape2)
m1 <- melt(do.call(cbind,sampleList))
m2 <- cbind(m1,colsplit(m1$Var2,"\\.",c("coreID","val1","val2")))
The result looks like this:
head(m2)
  Var1   Var2 value coreID val1 val2
1    1 a.12.1     1      a   12    1
2    2 a.12.1     2      a   12    1
3    3 a.12.1     3      a   12    1
Then you can more easily do things like:
aggregate(value~val1,mean,data=subset(m2,coreID=="a"))
Answered 2012-10-19 22:52:36
If you want to put your "a", "X", and "V" into their own columns, R is well equipped to do that. You can then use ave, by, aggregate, subset, etc.
data.frame(do.call(rbind, sampleList),
do.call(rbind, strsplit(names(sampleList), '\\.')))
#         X1 X2 X3 X4 X5 X1.1 X2.1 X3.1
# a.12.1   1  2  3  4  5    a   12    1
# b.1.23   3  4  1  4  5    b    1   23
# a.12.21  5  7  2  8  9    a   12   21
# b.1.555  6  8  9  0  6    b    1  555
https://stackoverflow.com/questions/12976569