
Merge two data frames without duplicating metric values

Stack Overflow user
Asked 2020-01-31 01:15:06
Answers: 1, Views: 33, Followers: 0, Votes: 2

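I have two data frames that I want to merge on the leader values so that I can see the total runs and walks for each group. Each leader can have multiple members on their team, but the problem I am running into is that when I merge them, the metrics are also copied onto the newly added rows.

Here is an example of the two datasets I have:

Dataset 1: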

+-------------+-----------+------------+-------------+
| leader name | leader id | total runs | total walks |
+-------------+-----------+------------+-------------+
| ab          |        11 |          4 |           9 |
| tg          |        47 |          8 |           3 |
+-------------+-----------+------------+-------------+

Dataset 2:

+-------------+-----------+--------------+-----------+
| leader name | leader id | member name  | member id |
+-------------+-----------+--------------+-----------+
| ab          |        11 | gfh          |       589 |
| ab          |        11 | tyu          |       739 |
| tg          |        47 | rtf          |       745 |
| tg          |        47 | jke          |       996 |
+-------------+-----------+--------------+-----------+

I want to merge the two datasets so that they look like this:

+-------------+-----------+--------------+------------+------------+-------------+
| leader name | leader id | member name  | member id  | total runs | total walks |
+-------------+-----------+--------------+------------+------------+-------------+
| ab          |        11 | gfh          |        589 |          4 |           9 |
| ab          |        11 | tyu          |        739 |            |             |
| tg          |        47 | rtf          |        745 |          8 |           3 |
| tg          |        47 | jke          |        996 |            |             |
+-------------+-----------+--------------+------------+------------+-------------+

But right now I keep getting:

+-------------+-----------+--------------+------------+------------+-------------+
| leader name | leader id | member name  | member id  | total runs | total walks |
+-------------+-----------+--------------+------------+------------+-------------+
| ab          |        11 | gfh          |        589 |          4 |           9 |
| ab          |        11 | tyu          |        739 |          4 |           9 |
| tg          |        47 | rtf          |        745 |          8 |           3 |
| tg          |        47 | jke          |        996 |          8 |           3 |
+-------------+-----------+--------------+------------+------------+-------------+

It does not matter whether they are blank, NA, or 0, as long as the values are not duplicated. Is there a way to do this?


1 Answer

Stack Overflow user

Accepted answer

Answered 2020-01-31 01:18:59

We can do a left_join and then apply replace to the 'total' columns, setting every row after the first in each leader group to NA

library(dplyr)
left_join(df2, df1 ) %>%
      group_by(leadername) %>%
      mutate_at(vars(starts_with('total')), ~ replace(., row_number() > 1, NA))
# A tibble: 4 x 6
# Groups:   leadername [2]
#  leadername leaderid membername memberid totalruns totalwalks
#  <chr>         <dbl> <chr>         <dbl>     <dbl>      <dbl>
#1 ab               11 gfh             589         4          9
#2 ab               11 tyu             739        NA         NA
#3 tg               47 rtf             745         8          3
#4 tg               47 jke             996        NA         NA

Or without using group_by

left_join(df2, df1 ) %>%
     mutate_at(vars(starts_with('total')), ~ 
         replace(., duplicated(leadername), NA))
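
As a hedged aside, newer dplyr releases mark mutate_at as superseded in favour of across. The sketch below is one possible rewrite of the same idea, not part of the original answer. It assumes dplyr 1.0.0 or later, rebuilds the sample data inline so it runs on its own, and groups by both leadername and leaderid, which is an extra precaution in case two leaders share a name.

```r
library(dplyr)

# Sample data copied from the question (rebuilt here so the block is self-contained)
df1 <- data.frame(leadername = c("ab", "tg"), leaderid = c(11, 47),
                  totalruns = c(4, 8), totalwalks = c(9, 3))
df2 <- data.frame(leadername = c("ab", "ab", "tg", "tg"),
                  leaderid = c(11, 11, 47, 47),
                  membername = c("gfh", "tyu", "rtf", "jke"),
                  memberid = c(589, 739, 745, 996))

# Join on explicit keys, then keep the totals only on the first row of each leader
out_across <- left_join(df2, df1, by = c("leadername", "leaderid")) %>%
  group_by(leadername, leaderid) %>%
  mutate(across(starts_with("total"), ~ replace(.x, row_number() > 1, NA))) %>%
  ungroup()

out_across
```

Passing by explicitly also silences the message that left_join prints when it has to guess the join columns.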

Or a base R option is

out <- merge(df2, df1, all.x = TRUE)
i1 <- duplicated(out$leadername)
out[i1, c("totalruns", "totalwalks")] <- NA
out
#  leadername leaderid membername memberid totalruns totalwalks
#1         ab       11        gfh      589         4          9
#2         ab       11        tyu      739        NA         NA
#3         tg       47        rtf      745         8          3
#4         tg       47        jke      996        NA         NA
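
To illustrate why blanking the repeated values matters, here is a small base R check, offered as a sketch rather than as part of the original answer. It compares column sums after a plain merge with column sums after the duplicates are set to NA. The names naive, cleaned, and metric_cols are hypothetical and exist only for this illustration.

```r
# Sample data copied from the question (rebuilt here so the block is self-contained)
df1 <- data.frame(leadername = c("ab", "tg"), leaderid = c(11, 47),
                  totalruns = c(4, 8), totalwalks = c(9, 3))
df2 <- data.frame(leadername = c("ab", "ab", "tg", "tg"),
                  leaderid = c(11, 11, 47, 47),
                  membername = c("gfh", "tyu", "rtf", "jke"),
                  memberid = c(589, 739, 745, 996))

metric_cols <- c("totalruns", "totalwalks")

# Plain merge: each leader's totals are repeated on every member row
naive <- merge(df2, df1, by = c("leadername", "leaderid"), all.x = TRUE)

# Same merge, but keep the totals only on the first row per leader
cleaned <- naive
dup <- duplicated(cleaned[c("leadername", "leaderid")])
cleaned[dup, metric_cols] <- NA

# Summing the naive result double counts; the cleaned result matches df1
naive_sums   <- colSums(naive[metric_cols])
cleaned_sums <- colSums(cleaned[metric_cols], na.rm = TRUE)
source_sums  <- colSums(df1[metric_cols])
```

With the sample data, naive_sums is twice source_sums because every leader has two members, while cleaned_sums equals source_sums.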

Data

df1 <- structure(list(leadername = c("ab", "tg"), leaderid = c(11, 47
), totalruns = c(4, 8), totalwalks = c(9, 3)), class = "data.frame", row.names = c(NA, 
-2L))

df2 <- structure(list(leadername = c("ab", "ab", "tg", "tg"), leaderid = c(11, 
11, 47, 47), membername = c("gfh", "tyu", "rtf", "jke"), memberid = c(589, 
739, 745, 996)), class = "data.frame", row.names = c(NA, -4L))
Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/59996754
