group_by and mutate with a group condition

Stack Overflow user
Asked 2021-03-28 19:05:19
Answers: 2 · Views: 90 · Followers: 0 · Votes: 1

I need to create a goal variable that equals 1 if the rows with dummy$ciiu_comparado = 1 make up more than 50% of all rows in the group, and 0 otherwise. For example:

17/26=0.65 -> 1

The expected result is the goal variable.

Note: the data should be grouped by year and id.
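The rule can be checked by hand for the two groups in the data. The sketch below uses only the counts visible in the question (17 ones out of 26 rows for 2019/178036, 0 ones out of 3 rows for 2020/732437); the vector names `ones`, `total` and `share` are illustrative and not part of the original post.

```r
# Counts read off the question's data, keyed by "year/id" (labels are illustrative)
ones  <- c("2019/178036" = 17, "2020/732437" = 0)   # rows with ciiu_comparado == 1
total <- c("2019/178036" = 26, "2020/732437" = 3)   # rows in each group

share <- ones / total                 # proportion of ones per group
goal  <- as.integer(share > 0.5)      # strictly more than 50% gives 1, otherwise 0
names(goal) <- names(share)

print(round(share, 2))
print(goal)
```

Every row of a group then receives that group's value of goal.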

Data

db = structure(list(year = structure(c("2020", "2020", "2020", "2019",
    "2019", "2019", "2019", "2019", "2019", "2019", "2019", "2019",
    "2019", "2019", "2019", "2019", "2019", "2019", "2019", "2019",
    "2019", "2019", "2019", "2019", "2019", "2019", "2019", "2019",
    "2019"), label = "AÑO", format.stata = "%9s"), id = structure(c(732437,
    732437, 732437, 178036, 178036, 178036, 178036, 178036, 178036,
    178036, 178036, 178036, 178036, 178036, 178036, 178036, 178036,
    178036, 178036, 178036, 178036, 178036, 178036, 178036, 178036,
    178036, 178036, 178036, 178036), label = "EXPEDIENTE", format.stata = "%12.0g"),
    n_shareholder = c(3L, 3L, 3L, 26L, 26L, 26L, 26L, 26L, 26L,
    26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L,
    26L, 26L, 26L, 26L, 26L, 26L, 26L, 26L), dummy = structure(list(
        ciiu_comparado = c(0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1,
        1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1)), class = c("tbl_df",
    "tbl", "data.frame"), row.names = c(NA, -29L)), n_dummy = c(3L,
    3L, 3L, 17L, 17L, 9L, 17L, 9L, 9L, 9L, 17L, 17L, 17L, 9L,
    17L, 17L, 9L, 17L, 17L, 9L, 17L, 9L, 17L, 17L, 17L, 17L,
    17L, 9L, 17L), goal = c(0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), row.names = c(NA,
-29L), groups = structure(list(year = structure(c("2019", "2020"
), label = "AÑO", format.stata = "%9s"), id = structure(c(178036,
732437), label = "EXPEDIENTE", format.stata = "%12.0g"), .rows = structure(list(
    4:29, 1:3), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
# A tibble: 29 x 6
# Groups:   year, id [2]
   year      id n_shareholder dummy$ciiu_comparado n_dummy  goal
   <chr>  <dbl>         <int>                <dbl>   <int> <dbl>
 1 2020  732437             3                    0       3     0
 2 2020  732437             3                    0       3     0
 3 2020  732437             3                    0       3     0
 4 2019  178036            26                    1      17     1
 5 2019  178036            26                    1      17     1
 6 2019  178036            26                    0       9     1
 7 2019  178036            26                    1      17     1
 8 2019  178036            26                    0       9     1
 9 2019  178036            26                    0       9     1
10 2019  178036            26                    0       9     1
# ... with 19 more rows

2 Answers

Stack Overflow user

Accepted answer

Posted 2021-03-28 19:21:15

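The following creates the variable as defined in the question.

  • the comparison dummy$ciiu_comparado == 1 returns a logical vector (FALSE/TRUE);

  • sum(<logical>) counts the TRUE values, because FALSE/TRUE are coded internally as 0/1;

  • and n() is the number of rows in the group.

Output omitted.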
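The three points above can be seen in isolation on a single group. This is a minimal base R sketch on a made-up vector `x` (not the question's data), with `length(x)` standing in for what `n()` returns inside a grouped `mutate()`.

```r
x <- c(1, 1, 0, 1, 0)            # made-up 0/1 indicator for one group

is_one <- x == 1                 # logical vector: TRUE TRUE FALSE TRUE FALSE
n_ones <- sum(is_one)            # TRUE is counted as 1, FALSE as 0
share  <- n_ones / length(x)     # length(x) plays the role of n() here

goal <- as.integer(share > 0.5)  # 1 when more than half of the values are 1

print(c(n_ones = n_ones, share = share, goal = goal))
```

Because the mean of a logical vector is the proportion of TRUE values, `mean(x == 1)` gives the same share as `sum(x == 1) / length(x)`.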

library(dplyr)

db %>%
  group_by(year, id) %>%
  mutate(goal = sum(dummy$ciiu_comparado == 1)/n(),
         goal = as.integer(goal > 0.5))

goal can also be computed in a single statement.

db %>%
  group_by(year, id) %>%
  mutate(goal = +(sum(dummy$ciiu_comparado)/n() > 0.5))
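
In the one-liner, the unary `+` coerces the logical result of the comparison to an integer 0/1. For readers without dplyr, the same grouped rule can be sketched in base R with `ave()`. This is an illustrative alternative and not part of the answer; the data frame `toy` and its columns are made up.

```r
# Made-up data: two (year, id) groups with a 0/1 indicator column
toy <- data.frame(
  year = c("2020", "2020", "2020", "2019", "2019", "2019", "2019"),
  id   = c(1, 1, 1, 2, 2, 2, 2),
  flag = c(0, 0, 1, 1, 1, 0, 1)
)

# One grouping key per (year, id) pair; drop = TRUE discards unused combinations
key <- interaction(toy$year, toy$id, drop = TRUE)

# ave() applies FUN within each group and recycles the result over its rows;
# unary + turns the logical comparison into 0/1
toy$goal <- ave(toy$flag, key, FUN = function(v) +(mean(v == 1) > 0.5))

print(toy)
```

Here the 2020 group has one 1 in three rows, so its goal is 0, while the 2019 group has three 1s in four rows, so its goal is 1.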
Votes: 1

Stack Overflow user

Posted 2021-03-28 19:24:17

You can do it like this:

library(dplyr)
db %>% 
    group_by(year, id) %>% 
    # n() is the size of the current group; nrow(.) would count every row of db
    mutate(new_goal = ifelse(sum(dummy$ciiu_comparado) > 0.5 * n(), 1, 0)) %>% 
    ungroup()
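
One thing to watch in snippets like this: inside a grouped `mutate()`, `n()` is the number of rows in the current group, whereas `nrow(.)` in a magrittr pipe refers to the whole data frame being piped. On the question's data both thresholds happen to give the same answer, but they can disagree. The sketch below shows this on made-up data; `toy`, `g`, `flag` and the two result columns are illustrative names, and it assumes dplyr is installed.

```r
library(dplyr)

# Made-up data: group "a" has 2 ones in 3 rows, group "b" has 0 ones in 7 rows
toy <- data.frame(
  g    = c(rep("a", 3), rep("b", 7)),
  flag = c(1, 1, 0, rep(0, 7))
)

res <- toy %>%
  group_by(g) %>%
  mutate(
    # threshold is half of the group size: for "a", 2 > 1.5 is TRUE
    by_group_size = as.integer(sum(flag) > 0.5 * n()),
    # threshold is half of all 10 rows: for "a", 2 > 5 is FALSE
    by_total_rows = as.integer(sum(flag) > 0.5 * nrow(.))
  ) %>%
  ungroup()

print(res)
```

Group "a" has a majority of ones, so only the `n()` version flags it; the `nrow(.)` version compares against the size of the full table.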
Votes: 1

The original page content is provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/66845166
