I have two columns that I am trying to complete and expand at the same time. Here is a sample dataset.
library(tibble)
library(dplyr)
library(tidyr)
# Sample data
df <- tibble(
  type = c("apple", "apple", "apple", "orange", "orange", "orange", "pear", "pear"),
  year = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2012),
  val = c(1:8))
df
# A tibble: 8 x 3
type year val
<chr> <dbl> <int>
1 apple 2010 1
2 apple 2011 2
3 apple 2012 3
4 orange 2010 4
5 orange 2011 5
6 orange 2012 6
7 pear 2010 7
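8 pear 2012 8
First, the type "pear" is missing the year 2011. In addition, type is missing a value that could be in the dataset but currently is not: "banana". I want to include "banana" and also fill in the missing years (2010:2012) for every type.
So far I have only been able to do one or the other. I would like a way to do both. The problem with the fill argument in complete() is that it only allows a single value for filling in the missing elements.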
# Want to complete and expand
# Missing year 2011 in "pear" type and missing "banana" type so want to include and fill years 2010:2012
# complete
df %>%
  complete(type = c("apple", "orange", "pear", "banana"),
           fill = list(val = 0))
# A tibble: 9 x 3
type year val
<chr> <dbl> <int>
1 apple 2010 1
2 apple 2011 2
3 apple 2012 3
4 banana NA 0
5 orange 2010 4
6 orange 2011 5
7 orange 2012 6
8 pear 2010 7
9 pear 2012 8
# expand
df %>%
  expand(type = c("apple", "orange", "pear", "banana"), year)
# A tibble: 12 x 2
type year
<chr> <dbl>
1 apple 2010
2 apple 2011
3 apple 2012
4 banana 2010
5 banana 2011
6 banana 2012
7 orange 2010
8 orange 2011
9 orange 2012
10 pear 2010
11 pear 2011
12 pear 2012
My expected output is:
# A tibble: 12 x 3
type year val
<chr> <dbl> <dbl>
1 apple 2010 1
2 apple 2011 2
3 apple 2012 3
4 orange 2010 4
5 orange 2011 5
6 orange 2012 6
7 pear 2010 7
8 pear 2011 0
9 pear 2012 8
10 banana 2010 0
11 banana 2011 0
12 banana 2012 0
I can reference df twice, as shown below, but if possible I would like to find a way that does not require this.
df %>%
  expand(type = c("apple", "orange", "pear", "banana"), year) %>%
  left_join(df, by = c("type", "year")) %>%
  mutate(val = replace_na(val, 0))
# A tibble: 12 x 3
type year val
<chr> <dbl> <int>
1 apple 2010 1
2 apple 2011 2
3 apple 2012 3
4 banana 2010 0
5 banana 2011 0
6 banana 2012 0
7 orange 2010 4
8 orange 2011 5
9 orange 2012 6
10 pear 2010 7
11 pear 2011 0
12 pear 2012 8
Posted on 2022-05-05 00:10:04
Make type a factor that has banana as one of its levels; complete() will then work as you expect:
library(dplyr)
library(tidyr)
df %>%
  mutate(type = factor(type, levels = c(unique(type), "banana"))) %>%
  complete(type, year, fill = list(val = 0))
# A tibble: 12 × 3
type year val
<fct> <dbl> <int>
1 apple 2010 1
2 apple 2011 2
3 apple 2012 3
4 orange 2010 4
5 orange 2011 5
6 orange 2012 6
7 pear 2010 7
8 pear 2011 0
9 pear 2012 8
10 banana 2010 0
11 banana 2011 0
12 banana 2012 0
https://stackoverflow.com/questions/72120620
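If converting type to a factor is not desirable, a possible alternative is to pass the full set of types to complete() as a named argument next to year, in the same way the question already does with expand(). The sketch below is a minimal illustration and not part of the original answer; the name all_types is a hypothetical helper introduced here, and the data is the sample df from the question.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Same sample data as in the question
df <- tibble(
  type = c("apple", "apple", "apple", "orange", "orange", "orange", "pear", "pear"),
  year = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2012),
  val = 1:8
)

# Hypothetical helper (not in the original post): every type the result should contain
all_types <- c("apple", "orange", "pear", "banana")

# Supplying `type = all_types` together with `year` asks complete() for every
# type/year combination; `fill` replaces the missing val entries with 0
result <- df %>%
  complete(type = all_types, year, fill = list(val = 0))

result
```

With this approach type stays a character column, so the rows should come back sorted by type alphabetically, as in the expand() output in the question, instead of following the factor level order.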