Q: How to restructure data with multiple variables in R, keeping some variables static and others dynamic

Stack Overflow user

Asked on 2020-04-09 02:13:18

Answers: 1, Views: 291, Followers: 0, Votes: 6

I am using the following data in R.

uid     Date                  batch_no       marking       seq
K-1     16/03/2020  12:11:33  7              S1            FRD
K-1     16/03/2020  12:11:33  7              S1            FHL
K-2     16/03/2020  12:11:33  8              SE_hold1      ABC
K-3     16/03/2020  12:11:33  9              SD_hold2      DEF
K-4     16/03/2020  12:11:33  8              S1            XYZ
K-5     16/03/2020  12:11:33                 NA            ABC
K-6     16/03/2020  12:11:33  7                            ZZZ
K-7     16/03/2020  12:11:33  NA             S2            NA
K-8     16/03/2020  12:11:33  6              S3            FRD

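The batch_no column has eight unique values (including NA). The seq column has six unique values, including NA and blank, and not all of these values need to be present for every day's date.

The marking column has about 25 unique values, but values with the suffix _hold# should be treated as Hold; after that, there are 6 unique values, including blank and NA.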
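The `_hold#` rule described above can be sketched in base R as follows. This is only an illustration: the sample vector is made up from the values shown in the question, and the pattern `_hold[0-9]+$` is an assumption about what "suffix _hold#" means.

```r
# Hypothetical sample of raw marking values, modelled on the question's data.
marking <- c("S1", "SE_hold1", "SD_hold2", "S2", NA, "")

# Treat any value ending in "_hold" followed by digits as the category "Hold".
# NA and blank values are left untouched.
is_hold <- !is.na(marking) & grepl("_hold[0-9]+$", marking)
marking_clean <- ifelse(is_hold, "Hold", marking)

print(marking_clean)
```

Anchoring the pattern at the end of the string avoids collapsing a value that merely contains the text "hold" somewhere else.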

The requirement is to merge the dcast data in the following order, so that there is a single summary view for analysis.

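I want to keep all the unique values static in the code, so that if a particular value is not available for a particular date, I get 0 or - in the summary table.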
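One way to picture "static" values is a factor with a fixed list of levels, so that values missing on a given day are still counted as 0. The sketch below uses base R only; `seq_levels` and `seq_today` are hypothetical names and sample data, not part of the question.

```r
# Assumed master list of seq values; in practice this would hold every
# value that should always appear in the report.
seq_levels <- c("FRD", "FHL", "ABC", "DEF", "XYZ", "ZZZ")

# One day's data in which "DEF" and "ZZZ" happen to be missing.
seq_today <- c("FRD", "ABC", "ABC", "XYZ", "FHL")

# factor() with explicit levels keeps unused levels, so table() reports them as 0.
counts <- table(factor(seq_today, levels = seq_levels))
summary_tbl <- data.frame(seq = names(counts), count = as.integer(counts))

print(summary_tbl)
```

Because the levels are hard-coded, the summary always has the same rows in the same order, whatever the day's data contains.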

Expected output:

seq      count  percentage   Marking     count     Percentage     batch_no   count    Percentage
FRD      1      12.50%       S1          2         25.00%         6          1        12.50%
FHL      1      12.50%       S2          1         12.50%         7          2        25.00%
ABC      2      25.00%       S3          1         12.50%         8          2        25.00%
DEF      1      12.50%       Hold        2         25.00%         9          1        12.50%
XYZ      1      12.50%       NA          1         12.50%         NA         1        12.50%
ZZZ      1      12.50%       (Blank)     1         12.50%         (Blank)    1        12.50%
FRD      1      12.50%         -         -           -             -         -           -
NA       1      12.50%         -         -           -             -         -           -
(Blank)  0      0.00%          -         -           -             -         -           -
Total    8      112.50%        -         8         100.00%         -         8         100.00%

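For seq, the percentages add up to more than 100% because the same uid is counted twice, once for FRD and once for FHL. This is an accepted case. In Total, the count of uid will be the distinct count.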
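The arithmetic behind a total above 100% can be checked with a small base R sketch. The two vectors below are typed in from the sample rows of the question; dividing by the number of distinct uid values is an interpretation of the description above, not code from the original post.

```r
# uid and seq values copied from the nine sample rows in the question.
uid     <- c("K-1", "K-1", "K-2", "K-3", "K-4", "K-5", "K-6", "K-7", "K-8")
seq_val <- c("FRD", "FHL", "ABC", "DEF", "XYZ", "ABC", "ZZZ", NA, "FRD")

# The denominator is the number of distinct uids, not the number of rows.
n_uid <- length(unique(uid))

# Count every seq value, including NA, and express it as a percentage of n_uid.
counts <- table(seq_val, useNA = "ifany")
pct <- as.numeric(counts) / n_uid * 100

total_pct <- sum(pct)
print(total_pct)
```

Nine rows divided by eight distinct uids gives 112.5%, which matches the Total row for seq in the expected output.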

dataframe

dplyr

1 Answer

Stack Overflow user

Accepted answer

Answered on 2020-04-09 03:41:26

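There are a few ways to tackle this. One is to start by cleaning the data, then join it to a table containing all the values you explicitly need, and then summarise. Note: this will produce many explicit 0's, because the values from these three columns are combined into one table.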

library(dplyr)
library(stringr)
library(tidyr)

# Convert the three columns to character first: fct_explicit_na() only accepts
# factors or character vectors, and batch_no may have been read in as numeric.
# Then collapse every "*hold*" marking into "Hold" and turn NA into "(Blank)".
# Note: forcats >= 1.0.0 deprecates fct_explicit_na() in favour of
# fct_na_value_to_level().
df = df_original %>% 
  mutate_at(vars(c("seq", "batch_no", "marking")), as.character) %>% 
  mutate(marking = if_else(str_detect(marking, "hold"), "Hold", marking)) %>% 
  mutate_at(vars(c("seq", "batch_no", "marking")), forcats::fct_explicit_na, na_level = "(Blank)") 

## You need to do something similar with vectors of the possible values
## i.e. I don't know all the levels of your factors
#--------------------------------------------------------------------------
# Appending the NA and (Blank) levels ensures they are included in case the
# batch of data doesn't have them

df_seq = data.frame(seq = c(df$seq %>% levels(), "NA", "(Blank)") %>% unique())
df_batch_no = data.frame(batch_no = c(df$batch_no %>% levels(), "NA", "(Blank)") %>% unique())
df_marking = data.frame(marking = c(df$marking %>% levels(), "NA", "(Blank)") %>% unique())

# would have been really nice to use janitor::tabyl but your output won't allow

# seq: the denominator is the number of distinct uids, so the percentages can
# add up to more than 100% when one uid has several seq values.
df_seq_summary = df %>%
  group_by(seq) %>% 
  summarise(count = n()) %>% 
  right_join(df_seq, by = "seq") %>% 
  mutate(count = replace_na(count, 0),
         percentage = count / n_distinct(df$uid)) %>% 
  mutate(row = row_number())

df_marking_summary = df %>%
  group_by(marking) %>% 
  summarise(count = n()) %>% 
  right_join(df_marking, by = "marking") %>% 
  mutate(count = replace_na(count, 0),
         percentage = count / sum(count)) %>% 
  mutate(row = row_number())

df_batch_no_summary = df %>%
  group_by(batch_no) %>% 
  summarise(count = n()) %>% 
  right_join(df_batch_no, by = "batch_no") %>% 
  mutate(count = replace_na(count, 0),
         percentage = count / sum(count)) %>% 
  mutate(row = row_number())

# Place the three summaries side by side by row number, append a "Total" row,
# and format the percentage columns.
df = df_seq_summary %>% 
  full_join(df_marking_summary, by = "row", suffix = c("", "_marking")) %>% 
  full_join(df_batch_no_summary, by = "row", suffix = c("", "_batch_no")) %>% 
  select(-row) %>% 
  bind_rows(summarise_all(., ~(if (is.numeric(.)) sum(if_else(. > 0, as.double(.), 0), na.rm = TRUE) else "Total"))) %>% 
  mutate_at(vars(contains("percentage")), scales::percent, accuracy = 0.01)
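As a variation on the right_join step, the zero rows can also come from a factor with fixed levels combined with `dplyr::count(.drop = FALSE)`. This is a swapped-in technique, not the answer's method; `marking_levels` and `df_demo` are hypothetical names and sample data.

```r
library(dplyr)

# Assumed master list of marking values (hypothetical; extend as needed).
marking_levels <- c("S1", "S2", "S3", "Hold", "(Blank)")

# Small made-up data set for illustration.
df_demo <- data.frame(
  uid = c("K-1", "K-2", "K-3", "K-4"),
  marking = c("S1", "Hold", "Hold", NA),
  stringsAsFactors = FALSE
)

marking_summary <- df_demo %>%
  mutate(
    # Make missing values an explicit category, then fix the set of levels.
    marking = if_else(is.na(marking), "(Blank)", marking),
    marking = factor(marking, levels = marking_levels)
  ) %>%
  # .drop = FALSE keeps levels that do not occur in the data, with a count of 0.
  count(marking, name = "count", .drop = FALSE) %>%
  mutate(percentage = count / sum(count))

print(marking_summary)
```

This avoids building a separate lookup table per column, at the cost of requiring the full list of levels to be written out in the code.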

Votes: 3

The original content of this page is provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/61112899
