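I have a summarized data frame and would like to add rows for the missing years, filling those new rows with 0 where appropriate. Here is my starting data: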
start_data <- structure(list(park = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("apis",
"grpo", "isro", "miss", "piro", "sacn", "slbe", "voya"), class = "factor"),
loc_01 = structure(c(1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L,
4L, 2L, 2L, 2L, 2L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("apis",
"isro", "miss", "non_apis", "non_grpo", "non_isro", "non_miss",
"non_piro", "non_sacn", "non_slbe", "non_voya", "piro", "sacn",
"slbe", "voya"), class = "factor"), year = c(2005L, 2006L,
2007L, 2008L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2003L,
2005L, 2006L, 2007L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L
), agriculture = c(0, 0, 0, 0, 2.83549420428, 0, 26.41126099384,
9.07370206906, 11.00043405833, 1.440345049, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0), beaver = c(0, 0.29706355242, 1.25997210478,
1.5123175298, 9.14483092902, 0.70214089206, 2.78157443836,
4.42825988163, 0.9900762968, 2.3401234612, 2.8808849429,
1.2604019414, 0.54011663526, 0.729245712, 5.45502002852,
2.7912116718, 3.0604650244, 1.51253347654, 0.9002514858,
2.7548091776), blowdown = c(0, 0, 0, 0, 0, 0, 2.23207970694,
0, 0, 0, 0, 0, 0, 0.81011784036, 0, 0, 0, 0, 0, 0)), class = "data.frame", row.names = c(NA,
-20L), .Names = c("park", "loc_01", "year", "agriculture", "beaver",
"blowdown"))
which looks like this:
park loc_01 year agriculture beaver blowdown
1 apis apis 2005 0.000000 0.0000000 0.0000000
2 apis apis 2006 0.000000 0.2970636 0.0000000
3 apis apis 2007 0.000000 1.2599721 0.0000000
4 apis apis 2008 0.000000 1.5123175 0.0000000
5 apis non_apis 2004 2.835494 9.1448309 0.0000000
6 apis non_apis 2005 0.000000 0.7021409 0.0000000
7 apis non_apis 2006 26.411261 2.7815744 2.2320797
8 apis non_apis 2007 9.073702 4.4282599 0.0000000
9 apis non_apis 2008 11.000434 0.9900763 0.0000000
10 apis non_apis 2009 1.440345 2.3401235 0.0000000
11 isro isro 2003 0.000000 2.8808849 0.0000000
12 isro isro 2005 0.000000 1.2604019 0.0000000
13 isro isro 2006 0.000000 0.5401166 0.0000000
14 isro isro 2007 0.000000 0.7292457 0.8101178
15 isro non_isro 2003 0.000000 5.4550200 0.0000000
16 isro non_isro 2004 0.000000 2.7912117 0.0000000
17 isro non_isro 2005 0.000000 3.0604650 0.0000000
18 isro non_isro 2006 0.000000 1.5125335 0.0000000
19 isro non_isro 2007 0.000000 0.9002515 0.0000000
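20 isro non_isro 2008 0.000000 2.7548092 0.0000000
I would like to add rows for the missing years, depending on the park. For example, where park == apis, both apis and non_apis should have the years 2004:2009. Here I could reference a list, such as apis.yrs <- 2004:2009. Where park == isro, the years 2003:2008 should be present for both isro and non_isro. Again, another list could be created for these years, isro.yrs <- 2003:2008. When these new rows are added, agriculture, beaver, and blowdown should all be filled with 0.
This is my end goal: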
goal <- structure(list(park = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L), .Label = c("", "apis", "isro"), class = "factor"), loc_01 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L,
2L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("apis", "isro", "miss",
"non_apis", "non_isro", "non_miss", "non_piro", "non_sacn", "non_slbe",
"non_voya", "piro", "sacn", "slbe", "voya"), class = "factor"),
year = c(2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2004L,
2005L, 2006L, 2007L, 2008L, 2009L, 2003L, 2004L, 2005L, 2006L,
2007L, 2008L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L),
agriculture = c(0, 0, 0, 0, 0, 0, 2.835494204, 0, 26.41126099,
9.073702069, 11.00043406, 1.440345049, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0), beaver = c(0, 0, 0.297063552, 1.259972105,
1.51231753, 0, 9.144830929, 0.702140892, 2.781574438, 4.428259882,
0.990076297, 2.340123461, 2.880884943, 0, 1.260401941, 0.540116635,
0.729245712, 0, 5.455020029, 2.791211672, 3.060465024, 1.512533477,
0.900251486, 2.754809178), blowdown = c(0, 0, 0, 0, 0, 0,
0, 0, 2.232079707, 0, 0, 0, 0, 0, 0, 0, 0.81011784, 0, 0,
0, 0, 0, 0, 0)), class = "data.frame", row.names = c(NA,
-24L), .Names = c("park", "loc_01", "year", "agriculture", "beaver",
"blowdown"))
which looks like this:
park loc_01 year agriculture beaver blowdown
1 apis apis 2004 0.000000 0.0000000 0.0000000
2 apis apis 2005 0.000000 0.0000000 0.0000000
3 apis apis 2006 0.000000 0.2970636 0.0000000
4 apis apis 2007 0.000000 1.2599721 0.0000000
5 apis apis 2008 0.000000 1.5123175 0.0000000
6 apis apis 2009 0.000000 0.0000000 0.0000000
7 apis non_apis 2004 2.835494 9.1448309 0.0000000
8 apis non_apis 2005 0.000000 0.7021409 0.0000000
9 apis non_apis 2006 26.411261 2.7815744 2.2320797
10 apis non_apis 2007 9.073702 4.4282599 0.0000000
11 apis non_apis 2008 11.000434 0.9900763 0.0000000
12 apis non_apis 2009 1.440345 2.3401235 0.0000000
13 isro isro 2003 0.000000 2.8808849 0.0000000
14 isro isro 2004 0.000000 0.0000000 0.0000000
15 isro isro 2005 0.000000 1.2604019 0.0000000
16 isro isro 2006 0.000000 0.5401166 0.0000000
17 isro isro 2007 0.000000 0.7292457 0.8101178
18 isro isro 2008 0.000000 0.0000000 0.0000000
19 isro non_isro 2003 0.000000 5.4550200 0.0000000
20 isro non_isro 2004 0.000000 2.7912117 0.0000000
21 isro non_isro 2005 0.000000 3.0604650 0.0000000
22 isro non_isro 2006 0.000000 1.5125335 0.0000000
23 isro non_isro 2007 0.000000 0.9002515 0.0000000
24 isro non_isro 2008 0.000000 2.7548092 0.0000000
where 2004 and 2009 were added for loc_01 == apis, and 2004 and 2008 were added for loc_01 == isro.
-al
Posted on 2014-08-06 16:33:50
For this kind of problem, expand.grid + merge is useful.
You could try something like this:
Step 1: Create a data.frame with all the combinations you want to include.
toMerge <- rbind(
expand.grid(park = "apis",
loc_01 = c("apis", "non_apis"),
year = 2004:2009),
expand.grid(park = "isro",
loc_01 = c("isro", "non_isro"),
year = 2003:2008)
)
Step 2: merge it with the original data.frame.
merge(start_data, toMerge, all = TRUE)
# park loc_01 year agriculture beaver blowdown
# 1 apis apis 2004 NA NA NA
# 2 apis apis 2005 0.000000 0.0000000 0.0000000
# 3 apis apis 2006 0.000000 0.2970636 0.0000000
# 4 apis apis 2007 0.000000 1.2599721 0.0000000
# 5 apis apis 2008 0.000000 1.5123175 0.0000000
# 6 apis apis 2009 NA NA NA
# 7 apis non_apis 2004 2.835494 9.1448309 0.0000000
# 8 apis non_apis 2005 0.000000 0.7021409 0.0000000
# 9 apis non_apis 2006 26.411261 2.7815744 2.2320797
# 10 apis non_apis 2007 9.073702 4.4282599 0.0000000
# 11 apis non_apis 2008 11.000434 0.9900763 0.0000000
# 12 apis non_apis 2009 1.440345 2.3401235 0.0000000
# 13 isro isro 2003 0.000000 2.8808849 0.0000000
# 14 isro isro 2004 NA NA NA
# 15 isro isro 2005 0.000000 1.2604019 0.0000000
# 16 isro isro 2006 0.000000 0.5401166 0.0000000
# 17 isro isro 2007 0.000000 0.7292457 0.8101178
# 18 isro isro 2008 NA NA NA
# 19 isro non_isro 2003 0.000000 5.4550200 0.0000000
# 20 isro non_isro 2004 0.000000 2.7912117 0.0000000
# 21 isro non_isro 2005 0.000000 3.0604650 0.0000000
# 22 isro non_isro 2006 0.000000 1.5125335 0.0000000
# 23 isro non_isro 2007 0.000000 0.9002515 0.0000000
# 24 isro non_isro 2008 0.000000 2.7548092 0.0000000
From here, replacing the NA values with 0 is straightforward.
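The answer leaves the NA replacement unstated, so here is a minimal sketch of one way to do it in base R. It uses a small toy data frame (hypothetical values, standing in for start_data) so that it runs on its own. The names toy, filled, and value_cols are illustrative assumptions and not part of the original answer.

```r
# Toy stand-in for start_data (hypothetical values, not the original data)
toy <- data.frame(
  park   = c("apis", "apis", "apis"),
  loc_01 = c("apis", "apis", "non_apis"),
  year   = c(2005L, 2006L, 2004L),
  beaver = c(0, 0.3, 9.1),
  stringsAsFactors = FALSE
)

# Step 1: every park / loc_01 / year combination that should exist
toMerge <- expand.grid(park = "apis",
                       loc_01 = c("apis", "non_apis"),
                       year = 2004:2006,
                       stringsAsFactors = FALSE)

# Step 2: full outer join on the shared columns (park, loc_01, year)
filled <- merge(toy, toMerge, all = TRUE)

# Step 3: replace NA with 0, but only in the measurement columns
value_cols <- setdiff(names(filled), c("park", "loc_01", "year"))
filled[value_cols][is.na(filled[value_cols])] <- 0
```

Restricting the replacement to value_cols keeps the key columns untouched. With the real data, the same last two lines should apply to the result of merge(start_data, toMerge, all = TRUE), where value_cols would pick up agriculture, beaver, and blowdown.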
https://stackoverflow.com/questions/25165299