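To calculate effect sizes (d or g) and meta-analyse a dichotomous predictor of a continuous outcome, I need the mean, SD and sample size of each group for every study.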
I am trying to write some code that creates this summary data from the raw data, so that I don't have to work through each study by hand.
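For reference, once each study is reduced to per-sex means, SDs and ns, d and g follow directly from those columns. A minimal sketch, assuming the pooled-SD form of Cohen's d and the common approximate small-sample correction for Hedges' g; the helper name smd() and the example numbers are hypothetical (metafor's escalc(measure = "SMD") is one packaged alternative).

```r
# Hypothetical helper: standardized mean difference from summary statistics.
# Group 1 vs group 2 (e.g. male vs female); swap the groups to flip the sign.
smd <- function(m1, sd1, n1, m2, sd2, n2) {
  df <- n1 + n2 - 2
  # Pooled standard deviation across the two groups
  sp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df)
  d <- (m1 - m2) / sp          # Cohen's d
  J <- 1 - 3 / (4 * df - 1)    # approximate small-sample correction factor
  data.frame(d = d, g = J * d) # Hedges' g = J * d
}

# Made-up summary values, not taken from the question's data
res <- smd(m1 = 10, sd1 = 2, n1 = 20, m2 = 9, sd2 = 2, n2 = 20)
res
```

Because the arithmetic is vectorised, whole columns of a per-study summary table can be passed in at once. Groups with a single observation have SD = NA, so their d and g will also be NA.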
Example raw dataset:
Study <- c("andrew", "andrew", "andrew", "andrew", "peters", "peters", "peters", "jess", "jess", "jess")
Score = c(100, 308, 584, 241, 241, 111, 431, 123, 321, 411)
Sex = c(1, 1, 1, 2, 2, 1, 2, 2, 1, 1)
data = cbind(Score, Sex, Study)  # note: cbind() coerces everything to a character matrix
data
> Score Sex Study
> [1,] "100" "1" "andrew"
> [2,] "308" "1" "andrew"
> [3,] "584" "1" "andrew"
> [4,] "241" "2" "andrew"
> [5,] "241" "2" "peters"
> [6,] "111" "1" "peters"
> [7,] "431" "2" "peters"
> [8,] "123" "2" "jess"
> [9,] "321" "1" "jess"
> [10,] "411" "1" "jess"

How can I turn this into the following data frame for the meta-analysis, with the data split by Sex and Study?
Study MeanMale MeanFemale SDMale SDFemale NrowsMale NrowsFemale
andrew X X X X X X
peters X X X X X X
jess X X X X X X

I imagine describeBy, statsBy or splitting the data could work, but getting the output into the required format is cumbersome. The next goal would be to add a Year column, e.g.:
Study <- c("andrew", "andrew", "andrew", "andrew", "peters", "peters", "peters", "jess", "jess", "jess")
Score = c(100, 308, 584, 241, 241, 111, 431, 123, 321, 411)
Sex = c(1, 1, 1, 2, 2, 1, 2, 2, 1, 1)
Year = c(1992, 1992, 1992, 1992, 1988, 1988, 1988, 1977, 1977, 1977)
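data = data.frame(Study, Year, Score, Sex)  # data.frame() keeps Score and Year numeric; cbind() would make them character

to produce the following data.frame: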
Study Year MeanMale MeanFemale SDMale SDFemale NrowsMale NrowsFemale
andrew 1992 X X X X X X
peters 1988 X X X X X X
jess 1977 X X X X X X

Posted on 2015-06-14 05:30:29
We can use the development version of data.table, i.e. v1.9.5. Instructions for installing the development version are here.
We convert the data.frame to a data.table (setDT(data)), group by Sex and Study to get the mean, sd and .N (the number of rows), and then reshape from long to wide format with dcast, which accepts multiple value.var columns.
library(data.table) # v1.9.5+
# summarise Score by Sex and Study, then cast the Sex levels into columns
dcast(setDT(data)[, list(Mean = mean(Score), SD = sd(Score), Nrows = .N),
                  .(Sex, Study)],
      Study ~ c('Male', 'Female')[Sex],
      value.var = c('Mean', 'SD', 'Nrows'))
# Study Female_Mean Male_Mean Female_SD Male_SD Female_Nrows Male_Nrows
#1: andrew 241 330.6667 NA 242.79484 1 3
#2: jess 123 366.0000 NA 63.63961 1 2
#3: peters 336 111.0000 134.3503 NA 2 1

Edit
Following @Arun's comment: dcast from data.table also accepts multiple functions via fun.aggregate.
dcast(setDT(data), Study ~ c('Male', 'Female')[Sex],
      fun.aggregate = list(mean, sd, length), value.var = "Score")
# Study Female_mean_Score Male_mean_Score Female_sd_Score Male_sd_Score
#1: andrew 241 330.6667 NA 242.79484
#2: jess 123 366.0000 NA 63.63961
#3: peters 336 111.0000 134.3503 NA
# Female_length_Score Male_length_Score
#1: 1 3
#2: 1 2
#3: 2 1

Alternatively, in base R we can get the mean, sd and number of rows with aggregate and then reshape to wide format:
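# one row per Study x Sex; do.call(data.frame, ...) flattens the matrix column
# into Score.Mean, Score.SD and Score.Nrows
d1 <- do.call(data.frame, aggregate(Score ~ ., transform(data, Sex = c('Male', 'Female')[Sex]),
              FUN = function(x) c(Mean = mean(x), SD = sd(x), Nrows = length(x))))
# spread the Sex levels into columns
reshape(d1, idvar = 'Study', timevar = 'Sex', direction = 'wide')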
# Study Score.Mean.Female Score.SD.Female Score.Nrows.Female Score.Mean.Male
#1 andrew 241 NA 1 330.6667
#3 jess 123 NA 1 366.0000
#5 peters 336 134.3503 2 111.0000
# Score.SD.Male Score.Nrows.Male
#1 242.79484 3
#3 63.63961 2
#5 NA 1

Data
data <- data.frame(Score, Sex, Study)

Posted on 2015-06-14 05:39:00
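This is fairly straightforward with dplyr and reshape2. We convert Sex to a labelled factor, use mutate to get the SD and sample size by group, then melt and cast the data to get the means by group with readable variable names: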
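library(reshape2); library(dplyr)
data$Sex <- factor(data$Sex, levels = c(1, 2), labels = c('Male', 'Female'))
# group by Study *and* Sex so that SD and Nrow are computed within each sex
data <- mutate(group_by(data, Study, Sex), SD = sd(Score), Nrow = length(Score))
data <- as.data.frame(ungroup(data))  # plain data.frame for reshape2::melt
data <- melt(data, id.vars = c('Study', 'Sex'))
data$value <- as.numeric(data$value)
# mean() gives the mean Score; SD and Nrow repeat within a group, so it returns them unchanged
dcast(data, Study ~ variable + Sex, mean, na.rm = TRUE)

https://stackoverflow.com/questions/30825742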
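The follow-up goal in the question, carrying a Year column through, only requires adding Year to the grouping variables. A minimal base-R sketch extending the aggregate()/reshape() answer, assuming each Study has exactly one Year (as in the example data); the object names dat, long and wide are illustrative.

```r
# Raw data from the question, including the Year column
Study <- c("andrew", "andrew", "andrew", "andrew", "peters", "peters",
           "peters", "jess", "jess", "jess")
Score <- c(100, 308, 584, 241, 241, 111, 431, 123, 321, 411)
Sex   <- c(1, 1, 1, 2, 2, 1, 2, 2, 1, 1)
Year  <- c(1992, 1992, 1992, 1992, 1988, 1988, 1988, 1977, 1977, 1977)
dat <- data.frame(Study, Year, Score, Sex = c("Male", "Female")[Sex])

# Per Study x Year x Sex summaries; do.call(data.frame, ...) flattens the
# matrix column that aggregate() returns when FUN yields several values
long <- do.call(data.frame, aggregate(
  Score ~ Study + Year + Sex, data = dat,
  FUN = function(x) c(Mean = mean(x), SD = sd(x), N = length(x))))

# One row per Study/Year, one column per statistic and sex
wide <- reshape(long, idvar = c("Study", "Year"), timevar = "Sex",
                direction = "wide")
wide
```

The resulting columns are named Score.Mean.Female, Score.SD.Female, Score.N.Female and so on. If a study could span several years, each Study/Year pair would simply become its own row.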