I'm struggling with a data transformation in R. The data I receive looks like this:
input <- data.frame(AF = sample(0:1, 100, replace=TRUE),
CAD = sample(0:1, 100, replace=TRUE),
CHF = sample(0:1, 100, replace=TRUE),
DEM = sample(0:1, 100, replace=TRUE),
DIAB = sample(0:1, 100, replace=TRUE))
input$Counts <- rowSums(input)
The result I want to achieve is:
output <- data.frame(Condition = c('AF', 'CAD', 'CHF', 'DEM', 'DIAB'),
'1' = sample(11:20, 5, replace=TRUE),
'2' = sample(11:20, 5, replace=TRUE),
'3' = sample(11:20, 5, replace=TRUE),
'4' = sample(11:20, 5, replace=TRUE),
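'5' = sample(11:20, 5, replace=TRUE))
where each cell is the count of observations that match both the condition (now in the first column) and the row sum (now spread across separate columns).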
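To make that definition concrete, a single cell of the table, for example AF with a row sum of 2, could be computed as in the minimal sketch below. The seed and the name `cell_AF_2` are assumptions added for illustration only.

```r
# Illustrative data with the same shape as `input`; the seed is an assumption
set.seed(1234)
input <- data.frame(AF   = sample(0:1, 100, replace = TRUE),
                    CAD  = sample(0:1, 100, replace = TRUE),
                    CHF  = sample(0:1, 100, replace = TRUE),
                    DEM  = sample(0:1, 100, replace = TRUE),
                    DIAB = sample(0:1, 100, replace = TRUE))
input$Counts <- rowSums(input)

# Cell (AF, 2): observations that have AF and exactly two conditions in total.
# Summing a logical vector counts its TRUE values.
cell_AF_2 <- sum(input$AF == 1 & input$Counts == 2)
cell_AF_2
```

Every other cell follows the same pattern with a different condition column and a different value of Counts.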
My solution is below, but I wonder whether there is a more elegant one?
data.frame(Condition = colnames(input[ ,1:5]),
"One" = c(nrow(input[input$AF==1 & input$Counts==1,]),
nrow(input[input$CAD==1 & input$Counts==1,]),
nrow(input[input$CHF==1 & input$Counts==1,]),
nrow(input[input$DEM==1 & input$Counts==1,]),
nrow(input[input$DIAB==1 & input$Counts==1,])),
"Two" = c(nrow(input[input$AF==1 & input$Counts==2,]),
nrow(input[input$CAD==1 & input$Counts==2,]),
nrow(input[input$CHF==1 & input$Counts==2,]),
nrow(input[input$DEM==1 & input$Counts==2,]),
nrow(input[input$DIAB==1 & input$Counts==2,])),
"Three" = c(nrow(input[input$AF==1 & input$Counts==3,]),
nrow(input[input$CAD==1 & input$Counts==3,]),
nrow(input[input$CHF==1 & input$Counts==3,]),
nrow(input[input$DEM==1 & input$Counts==3,]),
nrow(input[input$DIAB==1 & input$Counts==3,])),
"Four" = c(nrow(input[input$AF==1 & input$Counts==4,]),
nrow(input[input$CAD==1 & input$Counts==4,]),
nrow(input[input$CHF==1 & input$Counts==4,]),
nrow(input[input$DEM==1 & input$Counts==4,]),
nrow(input[input$DIAB==1 & input$Counts==4,])),
"Five" = c(nrow(input[input$AF==1 & input$Counts==5,]),
nrow(input[input$CAD==1 & input$Counts==5,]),
nrow(input[input$CHF==1 & input$Counts==5,]),
nrow(input[input$DEM==1 & input$Counts==5,]),
nrow(input[input$DIAB==1 & input$Counts==5,])),
"Six" = c(nrow(input[input$AF==1 & input$Counts==6,]),
nrow(input[input$CAD==1 & input$Counts==6,]),
nrow(input[input$CHF==1 & input$Counts==6,]),
nrow(input[input$DEM==1 & input$Counts==6,]),
nrow(input[input$DIAB==1 & input$Counts==6,]))
)
Posted on 2017-03-13 12:07:41
Maybe you are looking for aggregate.
Here is one solution.
myMat <- t(aggregate(.~Counts, data=input, FUN=sum)[-1,-1])
myMat
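      2  3  4  5 6
AF    3 10 15 15 2
CAD   2 14 16 18 2
CHF   2 14 18 16 2
DEM   4  8 16 18 2
DIAB  5 14 22 17 2
The first argument to aggregate, . ~ Counts, is a formula that applies an operation to every other column, grouped by Counts. The second argument specifies the data set, and the third says that the operation is sum. The first column and first row are dropped from the output with [-1, -1] because they are not relevant to the desired result: the first column is Counts itself and the first row is the Counts == 0 group. The output is then transposed with t. The headers 2 to 6 are the leftover row numbers of the aggregate output, not Counts values; they correspond to Counts of 1 to 5. To change the column names, use colnames<-: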
colnames(myMat) <- c("One", "Two", "Three", "Four", "Five")
Reproducible data
set.seed(1234)
input <- data.frame(AF = sample(0:1, 100, replace=TRUE),
CAD = sample(0:1, 100, replace=TRUE),
CHF = sample(0:1, 100, replace=TRUE),
DEM = sample(0:1, 100, replace=TRUE),
DIAB = sample(0:1, 100, replace=TRUE))
input$Counts <- rowSums(input)
Posted on 2017-03-13 12:24:55
You can also use dplyr and tidyr to convert to long format and back to wide (although in this particular case, aggregate is easier):
library(dplyr)
library(tidyr)
# take the input dataset
input %>%
# transform to long format
gather(condition, measurement,AF:DIAB) %>%
# summarise by Counts and condition
group_by(Counts, condition) %>%
summarise(measure = sum(measurement)) %>%
# transform back to the desired wide format
spread(Counts, measure)
https://stackoverflow.com/questions/42762789
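As a cross-check on both answers, the same table can be sketched in base R with rowsum, which labels each group by its actual Counts value instead of by position. The vector name `conditions`, the seed, and the by-name removal of the Counts == 0 group are assumptions made for this sketch and are not part of either answer.

```r
# Illustrative data with the same shape as in the thread; the seed is arbitrary
set.seed(1234)
input <- data.frame(AF   = sample(0:1, 100, replace = TRUE),
                    CAD  = sample(0:1, 100, replace = TRUE),
                    CHF  = sample(0:1, 100, replace = TRUE),
                    DEM  = sample(0:1, 100, replace = TRUE),
                    DIAB = sample(0:1, 100, replace = TRUE))
input$Counts <- rowSums(input)

# Hypothetical helper vector naming the condition columns
conditions <- c("AF", "CAD", "CHF", "DEM", "DIAB")

# Sum each condition column within each Counts group;
# the row names of the result are the Counts values themselves
by_counts <- rowsum(input[conditions], group = input$Counts)

# Drop the Counts == 0 group by name (a no-op if it is absent), then transpose
by_counts <- by_counts[rownames(by_counts) != "0", , drop = FALSE]
result <- t(by_counts)
result
```

Because the zero group is removed by name and not by position, this variant should behave the same whether or not any row has a Counts of 0.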