I'm new to R, and I have a question about modifying my data file:
id <- c(1, 2,3,4,5,6,7,8,9,10)
number <- c(1,1,1,1,1,1,8,8,2,2)
country <- c("France", "France", "France", "France", "France", "France", "Spain", "Spain", "Belgium", "Belgium")
year <- c(2010,2010,2011,2011,2010,2010,2009,2009,1996,1996)
sex <- c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F")
disease <- c("hiv","hiv","hiv","hiv","cancer","cancer","cancer","cancer","tubercolosis","tubercolosis")
value <- c(15,1,0,2,50,120,600,47,0,0)
What I want is the same data plus 5 new rows holding the sum of the M and F entries in the value column. Like this:
id <- c(1, 2,3,4,5,6,7,8,9,10,11,12,13,14,15)
number <- c(1,1,1,1,1,1,8,8,2,2,1,1,1,8,2)
country <- c("France", "France", "France", "France", "France", "France", "Spain", "Spain", "Belgium", "Belgium","France", "France", "France", "Spain", "Belgium")
year <- c(2010,2010,2011,2011,2010,2010,2009,2009,1996,1996,2010,2011,2010,2009,1996)
sex <- c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F","T","T","T","T","T")
disease <- c("hiv","hiv","hiv","hiv","cancer","cancer","cancer","cancer","tubercolosis","tubercolosis","hiv","hiv","cancer","cancer","tubercolosis")
value <- c(15,1,0,2,50,120,600,47,0,0,16,2,170,647,0)
To be clear:
> whatIhave
   id number country year sex      disease value
1   1      1  France 2010   M          hiv    15
2   2      1  France 2010   F          hiv     1
3   3      1  France 2011   M          hiv     0
4   4      1  France 2011   F          hiv     2
5   5      1  France 2010   M       cancer    50
6   6      1  France 2010   F       cancer   120
7   7      8   Spain 2009   M       cancer   600
8   8      8   Spain 2009   F       cancer    47
9   9      2 Belgium 1996   M tubercolosis     0
10 10      2 Belgium 1996   F tubercolosis     0
> whatIwant
   id number country year sex      disease value
1   1      1  France 2010   M          hiv    15
2   2      1  France 2010   F          hiv     1
3   3      1  France 2011   M          hiv     0
4   4      1  France 2011   F          hiv     2
5   5      1  France 2010   M       cancer    50
6   6      1  France 2010   F       cancer   120
7   7      8   Spain 2009   M       cancer   600
8   8      8   Spain 2009   F       cancer    47
9   9      2 Belgium 1996   M tubercolosis     0
10 10      2 Belgium 1996   F tubercolosis     0
11 11      1  France 2010   T          hiv    16
12 12      1  France 2011   T          hiv     2
13 13      1  France 2010   T       cancer   170
14 14      8   Spain 2009   T       cancer   647
15 15      2 Belgium 1996   T tubercolosis     0
This introduces a new value T in the sex column, meaning the sum F + M. The 5 new rows are the last 5 rows; there are 5 of them because I need to add the F and M values for each combination of country, year and disease. number is tied to the country, and id just identifies each row. My real data frame is obviously much larger than this.
How can I do this? Thanks.
Answered on 2016-06-20 23:00:00
Here is a quick solution using data.table:
library(data.table)
# calculate the sums and store it in a separate data table dtpart2
dtpart2 <- setDT(df)[ , .(value= sum(value)), by = .(number, country, year, disease)]
# create columns of sex and id
dtpart2[, id := max(df$id) + seq_len(nrow(dtpart2))][, sex := "T"]
# set the same column order as in the original data frame
setcolorder(dtpart2, names(df))
# Append the two data sets
newdata <- rbind(df,dtpart2)
#>     id number country year sex      disease value
#>  1:  1      1  France 2010   M          hiv    15
#>  2:  2      1  France 2010   F          hiv     1
#>  3:  3      1  France 2011   M          hiv     0
#>  4:  4      1  France 2011   F          hiv     2
#>  5:  5      1  France 2010   M       cancer    50
#>  6:  6      1  France 2010   F       cancer   120
#>  7:  7      8   Spain 2009   M       cancer   600
#>  8:  8      8   Spain 2009   F       cancer    47
#>  9:  9      2 Belgium 1996   M tubercolosis     0
#> 10: 10      2 Belgium 1996   F tubercolosis     0
#> 11: 11      1  France 2010   T          hiv    16
#> 12: 12      1  France 2011   T          hiv     2
#> 13: 13      1  France 2010   T       cancer   170
#> 14: 14      8   Spain 2009   T       cancer   647
#> 15: 15      2 Belgium 1996   T tubercolosis     0
Data:
df <- data.frame(id, number, country, year, sex, disease, value)
Answered on 2016-06-20 20:28:53
df <-
  data.frame(
    number = c(1,1,1,1,1,1,8,8,2,2),
    country = c("France", "France", "France", "France", "France", "France", "Spain", "Spain", "Belgium", "Belgium"),
    year = c(2010,2010,2011,2011,2010,2010,2009,2009,1996,1996),
    sex = c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F"),
    disease = c("hiv","hiv","hiv","hiv","cancer","cancer","cancer","cancer","tubercolosis","tubercolosis"),
    value = c(15,1,0,2,50,120,600,47,0,0))
# sum value over M and F within each group; number is a grouping variable here, so it is not summed
df2 <- aggregate(df["value"], by = list(number = df$number, country = df$country, disease = df$disease, year = df$year), FUN = sum)
df2$sex <- "T"
# put the columns in the same order as df
df2 <- df2[, colnames(df)]
newdf <- rbind(df, df2)
newdf
   number country year sex      disease value
1       1  France 2010   M          hiv    15
2       1  France 2010   F          hiv     1
3       1  France 2011   M          hiv     0
4       1  France 2011   F          hiv     2
5       1  France 2010   M       cancer    50
6       1  France 2010   F       cancer   120
7       8   Spain 2009   M       cancer   600
8       8   Spain 2009   F       cancer    47
9       2 Belgium 1996   M tubercolosis     0
10      2 Belgium 1996   F tubercolosis     0
11      2 Belgium 1996   T tubercolosis     0
12      8   Spain 2009   T       cancer   647
13      1  France 2010   T       cancer   170
14      1  France 2010   T          hiv    16
15      1  France 2011   T          hiv     2
https://stackoverflow.com/questions/37931061
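For comparison, here is a minimal base R sketch that also keeps the id column and appends the T rows in order of first appearance, as in whatIwant. It is only one possible way to do it and rests on some assumptions: the groups are defined by number, country, year and disease, and group_key() is a hypothetical helper introduced here for illustration, not part of any package.

```r
# Example data copied from the question.
df <- data.frame(
  id      = 1:10,
  number  = c(1, 1, 1, 1, 1, 1, 8, 8, 2, 2),
  country = c("France", "France", "France", "France", "France", "France",
              "Spain", "Spain", "Belgium", "Belgium"),
  year    = c(2010, 2010, 2011, 2011, 2010, 2010, 2009, 2009, 1996, 1996),
  sex     = c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F"),
  disease = c("hiv", "hiv", "hiv", "hiv", "cancer", "cancer", "cancer",
              "cancer", "tubercolosis", "tubercolosis"),
  value   = c(15, 1, 0, 2, 50, 120, 600, 47, 0, 0),
  stringsAsFactors = FALSE
)

# Sum value over sex within each number/country/year/disease group.
totals <- aggregate(value ~ number + country + year + disease,
                    data = df, FUN = sum)

# group_key() is a hypothetical helper: it builds one string per group so the
# totals can be reordered to follow each group's first appearance in df
# (aggregate() returns the groups sorted, not in their original order).
group_key <- function(d) paste(d$number, d$country, d$year, d$disease, sep = "|")
totals <- totals[order(match(group_key(totals), group_key(df))), ]

# Mark the new rows as totals and continue the id sequence.
totals$sex <- "T"
totals$id  <- max(df$id) + seq_len(nrow(totals))

# Append the totals using the same column order as df.
result <- rbind(df, totals[, names(df)])
rownames(result) <- NULL
```

Note that the formula interface of aggregate() drops rows containing NA by default, so data with missing values may need separate handling before applying this.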