
Collapse rows by `rleid` group, except where values are duplicated

Stack Overflow user
Asked 2021-02-26 10:21:54
Answers: 1 · Views: 49 · Followers: 0 · Votes: 1

I have speech data in `utterance` and gaze data in the columns `A_aoi`, `B_aoi`, and `C_aoi`. Some `utterance` rows are duplicated:

df <- data.frame(
  line = c(1,2,3,4,4,4,5,6,6,7,8),
  speaker = c("b", "a", NA, "c", "c", "c", NA, "c", "c", "a", "a"),
  utterance = c("Hey sweetheart!", "Louise!", "(0.234)", "What?", "What?", "What?", "(0.778)", "um::", "um::", "Wake up,", "breakfast's ready"),
  A_aoi = c("B", "B", "C", "B", NA, "C", "C", NA, "C", "C", "C"),
  B_aoi = c("C", "C", "C", "C", "A", "C", NA, NA, "C", "C", NA),
  C_aoi = c("A", NA, NA, "B", NA, "C", "C", "A", "A", "A", "A")
)

What I need to do is collapse rows by `rleid` group. However, where `utterance` is duplicated, the repeated values should not be pasted together.
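
For readers unfamiliar with `data.table::rleid()`, here is a minimal sketch of the grouping it produces for the `speaker` column above. The variable name `grp` is only illustrative; it mirrors the grouping column used in the code that follows.

```r
library(data.table)

# Same `speaker` values as in `df` above
speaker <- c("b", "a", NA, "c", "c", "c", NA, "c", "c", "a", "a")

# rleid() assigns a new id each time the value differs from the previous element,
# so consecutive identical speakers share one id
grp <- rleid(speaker)
grp
# 1 2 3 4 4 4 5 6 6 7 7
```

Rows that share an id are the ones that get collapsed into a single row.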

Collapsing the rows by `rleid` group is not a problem:

library(dplyr)
library(data.table)
library(stringr)
df %>%
  group_by(grp = rleid(speaker)) %>% 
  summarise(across(c(line, speaker), first), 
            utterance = str_c(utterance, collapse = ' '), 
            A_aoi = str_c(if_else(!is.na(A_aoi), A_aoi, "*" ), collapse = ""), 
            B_aoi = str_c(if_else(!is.na(B_aoi), B_aoi, "*" ), collapse = ""),
            C_aoi = str_c(if_else(!is.na(C_aoi), C_aoi, "*" ), collapse = ""), .groups = 'drop') %>%
  select(- grp)

However, this also pastes together the duplicated `utterance` values. The expected result is:

# A tibble: 7 x 6
   line speaker utterance                  A_aoi B_aoi C_aoi
  <dbl> <chr>   <chr>                      <chr> <chr> <chr>
1     1 b       Hey sweetheart!            B     C     A    
2     2 a       Louise!                    B     C     *    
3     3 NA      (0.234)                    C     C     *    
4     4 c       What?                      B*C   CAC   B*C  
5     5 NA      (0.778)                    C     *     C    
6     6 c       um::                       *C    *C    AA   
7     7 a       Wake up, breakfast's ready CC    C*    AA 

Any help is much appreciated!

EDIT

I have a step-by-step solution, but if anyone has a better, less convoluted one, I'd be very grateful:

# step 1 -- collapse only `aoi` columns:
df_a <- df %>%
  group_by(grp = rleid(speaker)) %>% 
  summarise(across(c(line, speaker), first),  
            A_aoi = str_c(if_else(!is.na(A_aoi), A_aoi, "*" ), collapse = ""), 
            B_aoi = str_c(if_else(!is.na(B_aoi), B_aoi, "*" ), collapse = ""),
            C_aoi = str_c(if_else(!is.na(C_aoi), C_aoi, "*" ), collapse = ""), .groups = 'drop') %>%
  select(- c(grp, line, speaker))

# step 2 -- remove duplicates (logical indexing, so it also works when no `line` is duplicated):
df_b <- df[!duplicated(df$line), ]

# step 3 -- collapse `utterance`:
df_c <- df_b %>%
  group_by(grp = rleid(speaker)) %>% 
  summarise(across(c(line, speaker), first), 
            utterance = str_c(utterance, collapse = ' '), .groups = 'drop') %>%
  select(- grp)

# step 4 -- bind:
bind_cols(df_c, df_a)

1 Answer

Stack Overflow user

Accepted answer

Answered 2021-02-26 15:10:50

What about using `unique(utterance)`? Does that give you what you want?

df %>%
  group_by(grp = rleid(speaker)) %>% 
  summarise(across(c(line, speaker), first), 
    utterance = str_c(unique(utterance), collapse = ' '), 
    A_aoi = str_c(if_else(!is.na(A_aoi), A_aoi, "*" ), collapse = ""), 
    B_aoi = str_c(if_else(!is.na(B_aoi), B_aoi, "*" ), collapse = ""),
    C_aoi = str_c(if_else(!is.na(C_aoi), C_aoi, "*" ), collapse = ""), .groups = 'drop') %>%
  select(- grp)

Output

# A tibble: 7 x 6
   line speaker utterance                  A_aoi B_aoi C_aoi
  <dbl> <chr>   <chr>                      <chr> <chr> <chr>
1     1 b       Hey sweetheart!            B     C     A    
2     2 a       Louise!                    B     C     *    
3     3 NA      (0.234)                    C     C     *    
4     4 c       What?                      B*C   CAC   B*C  
5     5 NA      (0.778)                    C     *     C    
6     6 c       um::                       *C    *C    AA   
7     7 a       Wake up, breakfast's ready CC    C*    AA
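
Note that `unique(utterance)` drops every repeated string within a group, so a speaker who genuinely says the same thing on two different `line` values would also lose the repetition. If that matters, one possible variant (an assumption, not part of the accepted answer) keys on `line` instead, much as step 2 of the question does. The name `res` is illustrative. `utterance` is computed before `line` is summarised, because later expressions in `summarise()` see the already-summarised `line`.

```r
library(dplyr)
library(data.table)
library(stringr)

df <- data.frame(
  line = c(1,2,3,4,4,4,5,6,6,7,8),
  speaker = c("b", "a", NA, "c", "c", "c", NA, "c", "c", "a", "a"),
  utterance = c("Hey sweetheart!", "Louise!", "(0.234)", "What?", "What?", "What?", "(0.778)", "um::", "um::", "Wake up,", "breakfast's ready"),
  A_aoi = c("B", "B", "C", "B", NA, "C", "C", NA, "C", "C", "C"),
  B_aoi = c("C", "C", "C", "C", "A", "C", NA, NA, "C", "C", NA),
  C_aoi = c("A", NA, NA, "B", NA, "C", "C", "A", "A", "A", "A")
)

res <- df %>%
  group_by(grp = rleid(speaker)) %>%
  summarise(
    # keep one utterance per `line`; must come before `line` is reduced by first()
    utterance = str_c(utterance[!duplicated(line)], collapse = " "),
    # gaze columns: replace NA with "*" and paste every row of the group
    A_aoi = str_c(if_else(!is.na(A_aoi), A_aoi, "*"), collapse = ""),
    B_aoi = str_c(if_else(!is.na(B_aoi), B_aoi, "*"), collapse = ""),
    C_aoi = str_c(if_else(!is.na(C_aoi), C_aoi, "*"), collapse = ""),
    across(c(line, speaker), first),
    .groups = "drop"
  ) %>%
  # restore the column order of the expected result and drop `grp`
  select(line, speaker, utterance, A_aoi, B_aoi, C_aoi)

res
```

On the example `df` this gives the same seven rows as the output above; the two approaches only differ when identical utterances occur on different lines within one group.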
Votes: 2

Page content originally provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/66383981
