Cleaning data

Stack Overflow user
Asked 2022-10-17 09:55:01
Answers: 2 · Views: 45 · Followers: 0 · Votes: 0

I have a dataset like this:

# Sample data, as produced by dput()
temp <- structure(list(Date = c("23/06/2002", "24/06/2002", "25/06/2002", 
"25/06/2002", "26/06/2002", "02/07/2002", "03/07/2002", "24/07/2002", 
"15/07/2002", "17/07/2002", "22/07/2002"), payment = c(-1000, 
1000, -1000, -1000, 1000, -1000, -1000, -1000, 1200, 1200, 200
), Code = c("M567", "M567", "M567", "M567", "XYZ", "M567", 
"ABX", "M567", "M567", "M567", "M300"), ID = c("187", "98", 
"187", "187", "12ee", NA, NA, NA, "111", "111", "11")), class = c("data.table", 
"data.frame"), row.names = c(NA, -11L))

Input:

          Date payment Code   ID
 1: 23/06/2002   -1000 M567  187
 2: 24/06/2002    1000 M567   98
 3: 25/06/2002   -1000 M567  187
 4: 25/06/2002   -1000 M567  187
 5: 26/06/2002    1000  XYZ 12ee
 6: 02/07/2002   -1000 M567 <NA>
 7: 03/07/2002   -1000  ABX <NA>
 8: 24/07/2002   -1000 M567 <NA>
 9: 15/07/2002    1200 M567  111
10: 17/07/2002    1200 M567  111
11: 22/07/2002     200 M300   11

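I want to remove rows whose payments sum to 0 for the same payment amount and the same Code (the ID column does not need to be considered).

For example, a 1000 payment with Code M567 should cancel against any one -1000 payment with Code M567, and the remaining M567 rows should stay unchanged.

  • I only need to cancel the +ve and -ve pairs in the dataset.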

Expected output:

         Date payment Code   ID
1: 25/06/2002   -1000 M567  187
2: 25/06/2002   -1000 M567  187
3: 26/06/2002    1000  XYZ 12ee
4: 02/07/2002   -1000 M567 <NA>
5: 03/07/2002   -1000  ABX <NA>
6: 24/07/2002   -1000 M567 <NA>
7: 15/07/2002    1200 M567  111
8: 17/07/2002    1200 M567  111
9: 22/07/2002     200 M300   11

2 Answers

Stack Overflow user

Accepted answer

Posted 2022-10-17 10:38:59

Here is one possible approach:

library(dplyr)
temp %>% 
  # pair up consecutive rows: rows 1-2 form group 1, rows 3-4 group 2, ...
  group_by(group = as.integer(gl(n(), 2, n()))) %>% 
  # total payment within each pair
  mutate(x = sum(payment)) %>% 
  # drop pairs that sum to zero and share the same Code
  filter(!(x == 0 & first(Code) == last(Code))) %>% 
  ungroup() %>% 
  select(-x, -group)

  Date       payment Code  ID   
  <chr>        <dbl> <chr> <chr>
1 25/06/2002   -1000 M567  187  
2 25/06/2002   -1000 M567  187  
3 26/06/2002    1000 XYZ   12ee 
4 02/07/2002   -1000 M567  NA   
5 03/07/2002   -1000 ABX   NA   
6 24/07/2002   -1000 M567  NA   
7 15/07/2002    1200 M567  111  
8 17/07/2002    1200 M567  111  
9 22/07/2002     200 M300  11   
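
Note that this grouping pairs rows purely by position, so it only cancels a +ve/-ve pair when the two rows sit next to each other in slots (1, 2), (3, 4), and so on. A minimal base R sketch that does not depend on row order is given below. It assumes that "cancelling" means matching each positive payment with one negative payment of the same absolute amount and the same Code, earliest rows first, and that payment contains no NA values. The function name cancel_pairs and its helper variables are hypothetical and not part of the original answer.

```r
# Sample data from the question, rebuilt as a plain data.frame
temp <- data.frame(
  Date = c("23/06/2002", "24/06/2002", "25/06/2002", "25/06/2002",
           "26/06/2002", "02/07/2002", "03/07/2002", "24/07/2002",
           "15/07/2002", "17/07/2002", "22/07/2002"),
  payment = c(-1000, 1000, -1000, -1000, 1000, -1000, -1000, -1000,
              1200, 1200, 200),
  Code = c("M567", "M567", "M567", "M567", "XYZ", "M567",
           "ABX", "M567", "M567", "M567", "M300"),
  ID = c("187", "98", "187", "187", "12ee", NA, NA, NA, "111", "111", "11"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: cancel +x / -x pairs within the same Code
cancel_pairs <- function(df) {
  # rows can only cancel if they share Code and absolute payment amount
  key <- paste(df$Code, abs(df$payment), sep = "|")
  sgn <- sign(df$payment)

  # position of each row among the rows with the same key and the same sign
  rank_in_sign <- ave(seq_along(sgn), key, sgn, FUN = seq_along)

  # number of positive and negative rows per key
  n_pos <- ave(as.integer(sgn == 1), key, FUN = sum)
  n_neg <- ave(as.integer(sgn == -1), key, FUN = sum)

  # only min(n_pos, n_neg) rows of each sign have a partner to cancel with
  n_pairs <- pmin(n_pos, n_neg)

  # keep the rows that are left without a partner
  df[rank_in_sign > n_pairs, , drop = FALSE]
}

result <- cancel_pairs(temp)
nrow(result)  # 9
```

On the sample data this drops rows 1 and 2 (the -1000 and 1000 payments for M567) and keeps the other nine rows, which matches the expected output in the question.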
Votes: 2

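Stack Overflow user

Posted 2022-10-17 11:54:04

You can work out which codes have payments that sum to zero, store them in a vector, and remove from the original data.frame every row that carries one of those codes: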

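# total payment per Code
totalpay <- aggregate(payment ~ Code, data = temp, FUN = sum)
# codes whose payments sum to zero
zeropay <- totalpay$Code[totalpay$payment == 0]
# keep only rows whose Code is not in zeropay
temp <- temp[!temp$Code %in% zeropay, ]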

What's more, you can keep the vector zeropay as a record of the codes that were removed.
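
This approach removes whole codes rather than individual +ve/-ve pairs. As a quick check, the sketch below computes the per-Code totals for the question's sample data. The two-column data frame sample_pay is an assumption made for brevity (only payment and Code are needed for the totals); it is not part of the original answer.

```r
# payment and Code columns from the question's sample data
sample_pay <- data.frame(
  payment = c(-1000, 1000, -1000, -1000, 1000, -1000, -1000, -1000,
              1200, 1200, 200),
  Code = c("M567", "M567", "M567", "M567", "XYZ", "M567",
           "ABX", "M567", "M567", "M567", "M300"),
  stringsAsFactors = FALSE
)

# total payment per Code
totalpay <- aggregate(payment ~ Code, data = sample_pay, FUN = sum)
print(totalpay)

# codes whose total is exactly zero
zeropay <- totalpay$Code[totalpay$payment == 0]
length(zeropay)  # 0
```

None of the four totals is zero for this sample (M567, for instance, sums to -1600), so zeropay is empty and no rows would be dropped here.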

Votes: 1
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/74095492
