
If statement inside dplyr after group_by, do() and more chains/multiple conditions

Stack Overflow user
Asked on 2021-02-02 10:21:20

I hope you are all doing well.

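I have a dataset with many columns, and I am trying to remove duplicate rows based on several conditions. I provide an example below to demonstrate the problem. The idea is, for each ID, to check all columns: if all columns are the same, keep the latest row. If two rows are otherwise identical but their comments differ, check whether the comment is "Add comment for down/upgrading client": if all rows have that same comment, keep the first row; otherwise keep the latest row that does not contain that comment.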

I have been trying the following approaches:

library(dplyr)

## dataframe
ID <- c("H1", "H1", " H1", " H2", "H2", "H3", "H3", " H3", "H4")
rating <- c("C", "C", "C+", "D", "C", "C", "C+", "C+", "C")
Commnets <- c("Add comment for down/upgrading client", "updated",
              "Add comment for down/upgrading client",
              "Add comment for down/upgrading client",
              "Add comment for down/upgrading client",
              "down", "down",
              "Add comment for down/upgrading client",
              "Add comment for down/upgrading client")
Date <- c("2018-12-10", "2018-12-10", "2018-11-10",
          "2018-11-10", "2018-11-10",
          "2018-10-10", "2018-10-02", "2018-10-02", "2020-09-03")
df <- data.frame(ID, rating, Commnets, Date, stringsAsFactors = FALSE)

# Attempt 1
df$Date <- as.Date(df$Date)
df <- df %>%
  group_by(ID, rating, Date) %>%
  arrange(desc(Date)) %>% # in each group, arrange in desc by Date
  filter(row_number() == 1) # this will solve the first problem

# Attempt 2 (does not run; a sketch of the intended logic)
df$Date <- as.Date(df$Date)
df <- df %>%
  group_by(ID, rating, Date) %>%
  arrange(desc(Date)) %>% # I think that I need **do** here but not sure how
  ifelse(rowSums("Add comment for down/upgrading client" == $Comments) == length($Comments),
         filter(row_number() == 1),
         rowSums("Add comment for down/upgrading client" == $Comments)[1, ])

1 Answer

Stack Overflow user

Accepted answer

Posted on 2021-02-02 11:14:45

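You can `arrange` the data in decreasing order of `Date` and count the number of unique `Commnets` values for each combination of `ID`, `rating` and `Date`. If the comment is the same throughout a group, select the first row; if the comments differ, select the last row, i.e. the latest.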

library(dplyr)

df %>%
  mutate(ID = trimws(ID), 
         Date = as.Date(Date)) %>%
  arrange(ID, rating, Commnets, desc(Date)) %>%
  group_by(ID,rating,Date)  %>%
  slice(if(n_distinct(Commnets) == 1) 1L else n())

#  ID    rating Commnets                              Date      
#  <chr> <chr>  <chr>                                 <date>    
#1 H1    C      updated                               2018-12-10
#2 H1    C+     Add comment for down/upgrading client 2018-11-10
#3 H2    C      Add comment for down/upgrading client 2018-11-10
#4 H2    D      Add comment for down/upgrading client 2018-11-10
#5 H3    C      down                                  2018-10-10
#6 H3    C+     down                                  2018-10-02
#7 H4    C      Add comment for down/upgrading client 2020-09-03
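
As a hedged variation on the code above: because `Date` is part of the grouping key, all rows in a group share one date, so the "last" row in each group appears to be decided by the sort order of `Commnets` (the placeholder text happens to sort first). The sketch below makes the rule explicit instead. It assumes that "Add comment for down/upgrading client" is the only placeholder text, and the names `placeholder` and `result` are introduced here for illustration only. Note the scalar `if`/`else` inside `slice()`, which makes one decision per group, whereas `ifelse()` is vectorised and returns one value per element.

```r
library(dplyr)

# Sample data from the question (column name `Commnets` kept as spelled there).
df <- data.frame(
  ID = c("H1", "H1", " H1", " H2", "H2", "H3", "H3", " H3", "H4"),
  rating = c("C", "C", "C+", "D", "C", "C", "C+", "C+", "C"),
  Commnets = c("Add comment for down/upgrading client", "updated",
               "Add comment for down/upgrading client",
               "Add comment for down/upgrading client",
               "Add comment for down/upgrading client",
               "down", "down",
               "Add comment for down/upgrading client",
               "Add comment for down/upgrading client"),
  Date = c("2018-12-10", "2018-12-10", "2018-11-10",
           "2018-11-10", "2018-11-10",
           "2018-10-10", "2018-10-02", "2018-10-02", "2020-09-03"),
  stringsAsFactors = FALSE
)

# Assumption: this is the only placeholder comment in the data.
placeholder <- "Add comment for down/upgrading client"

result <- df %>%
  mutate(ID = trimws(ID),          # remove stray leading spaces such as " H1"
         Date = as.Date(Date)) %>%
  group_by(ID, rating, Date) %>%
  # One decision per group: if every comment is the placeholder, keep row 1;
  # otherwise keep the first row whose comment is not the placeholder.
  slice(if (all(Commnets == placeholder)) 1L
        else which(Commnets != placeholder)[1]) %>%
  ungroup()
```

On the sample data this keeps the same seven rows as the output above. If a group could hold several different non-placeholder comments, this sketch keeps the first one in the original row order, which may or may not be the intended tie-break.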
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/66002807
