Question: Complete and expand missing data in two columns

Stack Overflow user

Asked 2022-05-04 23:57:19

Answers: 1 · Views: 27 · Follows: 0 · Score: 1

I have two columns that I am trying to complete and expand at the same time. Here is a sample dataset.

library(tibble)
library(dplyr)
library(tidyr)    

# Sample data
df <- tibble(
  type = c("apple", "apple", "apple", "orange", "orange", "orange", "pear", "pear"),
  year = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2012),
  val = c(1:8))

df
# A tibble: 8 x 3
  type    year   val
  <chr>  <dbl> <int>
1 apple   2010     1
2 apple   2011     2
3 apple   2012     3
4 orange  2010     4
5 orange  2011     5
6 orange  2012     6
7 pear    2010     7
8 pear    2012     8

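First, the "pear" type is missing the year 2011. Second, `type` is missing a value that could be in the dataset but currently is not: "banana". I want to add "banana" and, at the same time, fill in the missing years (2010:2012) for every type.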
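One way to confirm which type/year combinations are absent is to build every combination of the values already present and then drop the rows that exist. This is a sketch that assumes dplyr's `anti_join()` and tidyr's `expand()`. It only finds gaps among *observed* values, which is why "banana" does not show up and has to be supplied explicitly.

```r
library(tibble)
library(dplyr)
library(tidyr)

df <- tibble(
  type = c("apple", "apple", "apple", "orange", "orange", "orange", "pear", "pear"),
  year = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2012),
  val = 1:8
)

# Every combination of observed types and observed years ...
missing_combos <- df %>%
  expand(type, year) %>%
  # ... minus the combinations that already have a row in df
  anti_join(df, by = c("type", "year"))

missing_combos  # only pear / 2011; "banana" never appears because it is not in df
```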

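So far I can only do one or the other, and I suspect there is a way to do both. The problem with the `fill` argument in `complete()` is that it only allows one value to fill in the missing elements.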

# Want to complete and expand
# Missing year 2011 in "pear" type and missing "banana" type so want to include and fill years 2010:2012

# complete
df %>% 
    complete(type = c("apple", "orange", "pear", "banana"), 
             fill = list(val = 0))
# A tibble: 9 x 3
  type    year   val
  <chr>  <dbl> <int>
1 apple   2010     1
2 apple   2011     2
3 apple   2012     3
4 banana    NA     0
5 orange  2010     4
6 orange  2011     5
7 orange  2012     6
8 pear    2010     7
9 pear    2012     8
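In the `complete()` call above only `type` is supplied. As a result, `year` is never expanded and "banana" gets a single row with `year = NA`. As I understand the tidyr documentation, `complete()` passes its `...` arguments to `expand()` and then joins the result back onto the data. If so, supplying `year` next to the explicit `type` vector should do both steps in one call, without referring to `df` twice. A minimal sketch, assuming tidyr 1.0 or later:

```r
library(tibble)
library(dplyr)
library(tidyr)

df <- tibble(
  type = c("apple", "apple", "apple", "orange", "orange", "orange", "pear", "pear"),
  year = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2012),
  val = 1:8
)

out <- df %>%
  # Expand the full list of types (including "banana") against the observed
  # years, then set val to 0 in every row that complete() had to add
  complete(type = c("apple", "orange", "pear", "banana"), year,
           fill = list(val = 0))

out
```

`expand()` sorts the values it combines, so the rows should come out in alphabetical order of `type` (apple, banana, orange, pear) rather than in the order of the expected output. Add an `arrange()` step if the order matters.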

# expand
df %>% 
    expand(type = c("apple", "orange", "pear", "banana"), year)
# A tibble: 12 x 2
   type    year
   <chr>  <dbl>
 1 apple   2010
 2 apple   2011
 3 apple   2012
 4 banana  2010
 5 banana  2011
 6 banana  2012
 7 orange  2010
 8 orange  2011
 9 orange  2012
10 pear    2010
11 pear    2011
12 pear    2012

My expected output is:

# A tibble: 12 x 3
   type    year   val
   <chr>  <dbl> <dbl>
 1 apple   2010     1
 2 apple   2011     2
 3 apple   2012     3
 4 orange  2010     4
 5 orange  2011     5
 6 orange  2012     6
 7 pear    2010     7
 8 pear    2011     0
 9 pear    2012     8
10 banana  2010     0
11 banana  2011     0
12 banana  2012     0

I can do it by referring to `df` twice, as shown below, but I would like to avoid that if possible.

df %>% 
    expand(type = c("apple", "orange", "pear", "banana"), year) %>% 
    left_join(df, by = c("type", "year")) %>% 
    mutate(val = replace_na(val, 0))
# A tibble: 12 x 3
   type    year   val
   <chr>  <dbl> <int>
 1 apple   2010     1
 2 apple   2011     2
 3 apple   2012     3
 4 banana  2010     0
 5 banana  2011     0
 6 banana  2012     0
 7 orange  2010     4
 8 orange  2011     5
 9 orange  2012     6
10 pear    2010     7
11 pear    2011     0
12 pear    2012     8

tidyr

1 Answer

Stack Overflow user

Accepted answer

Posted 2022-05-05 00:10:04

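Make `type` a factor that includes "banana" as one of its levels. `complete()` expands a factor over all of its levels, including unused ones, so it then works as you expect: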

library(dplyr)
library(tidyr)

df %>%
  mutate(type = factor(type, levels = c(unique(type), "banana"))) %>%
  complete(type, year, fill = list(val = 0))

# A tibble: 12 × 3
   type    year   val
   <fct>  <dbl> <int>
 1 apple   2010     1
 2 apple   2011     2
 3 apple   2012     3
 4 orange  2010     4
 5 orange  2011     5
 6 orange  2012     6
 7 pear    2010     7
 8 pear    2011     0
 9 pear    2012     8
10 banana  2010     0
11 banana  2011     0
12 banana  2012     0
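The factor approach can be combined with tidyr's `full_seq()` if the years should be every integer between the first and last observed year, not only the years that happen to appear. This matters when some year is missing for *every* type. The following is a sketch of that variant, assuming a yearly period of 1. On this sample data it gives the same 12 rows as above.

```r
library(tibble)
library(dplyr)
library(tidyr)

df <- tibble(
  type = c("apple", "apple", "apple", "orange", "orange", "orange", "pear", "pear"),
  year = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2012),
  val = 1:8
)

out <- df %>%
  # Add "banana" as an extra level that has no rows yet
  mutate(type = factor(type, levels = c(unique(type), "banana"))) %>%
  # complete() uses every factor level; full_seq(year, 1) generates
  # min(year):max(year) in steps of 1, filling gaps inside that range
  complete(type, year = full_seq(year, 1), fill = list(val = 0))

out
```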

Score: 2

Page content originally from Stack Overflow.

Original link:

https://stackoverflow.com/questions/72120620
