How to manipulate (summarize) data in R?

Stack Overflow user
Asked 2019-12-03 19:22:19
2 answers, 165 views, 0 follows, score 3

I have a dataset like the following:

library(tibble)

df <- tribble(
  ~id,  ~price, ~number_of_book,
  "1",    10,         3,
  "1",     5,         1,
  "2",     7,         4,
  "2",     6,         2,
  "2",     3,         4,
  "3",     4,         1,
  "4",     5,         1,
  "4",     6,         1,
  "5",     1,         2,
  "5",     9,         3
)

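As you can see in the dataset, id "1" has 3 books that cost $10 each and 1 book that costs $5. Basically, I want to see the share (%) of the number of books in each price bin. Here is the dataset I want: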

desired <- tribble(
  ~id,    ~less_than_three,   ~`three-five`,  ~`five-six`, ~more_than_six,
  "1",          "0%",              "25%",     "0%",         "75%",
  "2",          "0%",              "40%",     "20%",        "40%",
  "3",          "0%",              "100%",    "0%",         "0%",
  "4",          "0%",              "50%",     "50%",        "0%",
  "5",          "40%",             "0%",      "0%",         "60%"
)

So far, I have binned the prices. To do that, I ran the following code:

out <- cut(df$price, breaks = c(0, 3, 5, 6, 10),
           labels = c("<3","3-5","5-6", ">6")) 

out = table(out) / sum(table(out)) 

Unfortunately, my coding knowledge is limited and I can't get any further. Could you help me get the desired dataset?


2 Answers

Stack Overflow user

Accepted answer

Posted 2019-12-03 19:52:09

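Using dplyr, you can add a column cols that will be used for the column names. Then you can sum the number of books for each combination of id and interval. Next, you can calculate the percentages by dividing these sums by the total for that id, and apply scales::percent to format them as percentages instead of decimals. Finally, you just need to tell pivot_wider which variables to take the names and values from, and reorder the columns to match the original label order. (This is a bit more involved than the other answer because it handles the case where a given (id, cols/interval) pair has more than one row, which the janitor approach simplifies away.)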

library(dplyr)
library(tidyr)

labels = c("less_than_three","three_to_five","five_to_six", "more_than_six")

df %>% 
  group_by(id, cols = cut(price, breaks = c(0, 3, 5, 6, 10), labels = labels)) %>% 
  summarise(n = sum(number_of_book)) %>% 
  group_by(id) %>% 
  mutate(pct = scales::percent(n/sum(n), 1)) %>% 
  pivot_wider(id_cols = id, names_from = cols, values_from = pct) %>% 
  select_at(c('id', labels)) %>% 
  ungroup

# # A tibble: 5 x 5
#   id    less_than_three three_to_five five_to_six more_than_six
#   <chr> <chr>           <chr>         <chr>       <chr>        
# 1 1     NA              25%           NA          75%          
# 2 2     40%             NA            20%         40%          
# 3 3     NA              100%          NA          NA           
# 4 4     NA              50%           50%         NA           
# 5 5     40%             NA            NA          60%       
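One detail worth checking: the question's desired output counts the price-3 books of id 2 under three-five, but the output above reports them under less_than_three (40%). That follows from cut()'s default right = TRUE, which closes each interval on the right. The base-R sketch below compares the boundary behavior for the prices that sit exactly on a break. The 2.5 break is a hypothetical workaround, not part of either answer, and it only makes sense under the assumption that prices are whole dollars.

```r
prices <- c(3, 5, 6)
breaks <- c(0, 3, 5, 6, 10)

# Default (right = TRUE): bins are (0,3], (3,5], (5,6], (6,10],
# so a price of exactly 3 falls in the first bin
right_closed <- cut(prices, breaks = breaks)

# right = FALSE: bins are [0,3), [3,5), [5,6), [6,10);
# 3 moves up one bin, but so do 5 and 6
left_closed <- cut(prices, breaks = breaks, right = FALSE)

# HYPOTHETICAL workaround, assuming whole-dollar prices: move the first
# break to 2.5 so that 3, 4 and 5 all share the (2.5,5] bin
whole_dollar <- cut(prices, breaks = c(0, 2.5, 5, 6, 10))

print(data.frame(price = prices, right_closed, left_closed, whole_dollar))
```

Neither right = TRUE nor right = FALSE alone reproduces the desired table, because it treats 3 as left-closed but 5 and 6 as right-closed.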

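If you want to replace the NAs with 0% (which I think makes sense in this context and matches the output shown in the question), you can use the approach mentioned in the comments below: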

df %>% 
  group_by(id, cols = cut(price, breaks = c(0, 3, 5, 6, 10), labels = labels)) %>% 
  summarise(n = sum(number_of_book)) %>% 
  group_by(id) %>% 
  mutate(pct = scales::percent(n/sum(n), 1)) %>% 
  pivot_wider(id_cols = id, names_from = cols, values_from = pct,
              values_fill = list(pct = '0%')) %>% 
  select_at(c('id', labels)) %>% 
  ungroup

# # A tibble: 5 x 5
#   id    less_than_three three_to_five five_to_six more_than_six
#   <chr> <chr>           <chr>         <chr>       <chr>        
# 1 1     0%              25%           0%          75%          
# 2 2     40%             0%            20%         40%          
# 3 3     0%              100%          0%          0%           
# 4 4     0%              50%           50%         0%           
# 5 5     40%             0%            0%          60%         
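As a dependency-free alternative (not part of either answer), here is a minimal base-R sketch that continues from the question's own cut() call: xtabs() sums number_of_book for each id and bin, and prop.table() divides each row by its total. The data frame is rebuilt with data.frame() only so the sketch runs without tibble; it uses the question's right-closed breaks, so id 2's price-3 books land in "<3".

```r
# Question data rebuilt in base R so the sketch needs no extra packages
df <- data.frame(
  id             = c("1", "1", "2", "2", "2", "3", "4", "4", "5", "5"),
  price          = c(10, 5, 7, 6, 3, 4, 5, 6, 1, 9),
  number_of_book = c(3, 1, 4, 2, 4, 1, 1, 1, 2, 3)
)

# Same bins as in the question: (0,3], (3,5], (5,6], (6,10]
df$bin <- cut(df$price, breaks = c(0, 3, 5, 6, 10),
              labels = c("<3", "3-5", "5-6", ">6"))

# Cross-tabulate id x bin, weighting each row by its number of books;
# rows sharing the same (id, bin) pair are summed into one cell
counts <- xtabs(number_of_book ~ id + bin, data = df)

# Divide each row by its row total (margin = 1) to get per-id shares
shares <- prop.table(counts, margin = 1)

# Format as whole percentages such as "25%", keeping the id/bin labels
pct <- matrix(paste0(round(100 * shares), "%"),
              nrow = nrow(shares), dimnames = dimnames(shares))
print(pct, quote = FALSE)
```

Because xtabs() keeps every level of the bin factor, empty bins come out as 0% without a separate fill step.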
Votes: 1

Stack Overflow user

Posted 2019-12-03 19:47:37

We can use cut to get the intervals, then use tidyr to reshape the data to wide format, and finally use janitor to add the percentages.

library(dplyr)
library(tidyr)
library(janitor)

df %>% 
  mutate(interval = cut(price, c(0,3,5,6,Inf))) %>% 
  select(-price) %>% 
  pivot_wider(names_from = interval, values_from = number_of_book) %>% 
  adorn_percentages()

#>  id (6,Inf] (3,5] (5,6] (0,3]
#>   1    0.75  0.25    NA    NA
#>   2    0.40    NA   0.2   0.4
#>   3      NA  1.00    NA    NA
#>   4      NA  0.50   0.5    NA
#>   5    0.60    NA    NA   0.4
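The accepted answer points out that this approach does not account for an (id, interval) pair having more than one row. In that case pivot_wider() warns and produces list-columns; its values_fn argument can sum the duplicates before adorn_percentages() runs. The sketch below assumes a hypothetical extra row ("4", 6, 2) that is not in the question's data, added only to create such a duplicate.

```r
library(dplyr)
library(tidyr)
library(janitor)

# Question data plus one HYPOTHETICAL extra row ("4", 6, 2), so id 4
# has two rows in the (5,6] interval
df_dup <- tibble::tribble(
  ~id, ~price, ~number_of_book,
  "1",  10, 3,
  "1",   5, 1,
  "2",   7, 4,
  "2",   6, 2,
  "2",   3, 4,
  "3",   4, 1,
  "4",   5, 1,
  "4",   6, 1,
  "5",   1, 2,
  "5",   9, 3,
  "4",   6, 2
)

wide <- df_dup %>%
  mutate(interval = cut(price, c(0, 3, 5, 6, Inf))) %>%
  select(-price) %>%
  # Sum duplicate (id, interval) cells instead of creating list-columns
  pivot_wider(names_from = interval, values_from = number_of_book,
              values_fn = list(number_of_book = sum)) %>%
  # Row-wise percentages over the numeric columns (id is the label column)
  adorn_percentages()

print(wide)
```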
Votes: 3
Original page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/59164019
