首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >用一个值略有不同的列连接多列上的两个R数据格式

用一个值略有不同的列连接多列上的两个R数据格式
EN

Stack Overflow用户
提问于 2022-08-23 22:55:32
回答 1查看 26关注 0票数 0

我在R中有两个要连接在一起的数据,但是其中一个列的值是一个或两个值(特别是每个列中的yardline_100列)。下面是我用来加入这两者的代码:

代码语言:javascript
复制
fin_df <- df1 %>% 
  left_join(df2,
            by = c("posteam" = "posteam",
                   "qtr" = "qtr", 
                   "down" = "down",
                   "yardline_100" = "yardline_100"))

有什么办法可以使他们加入,即使这一列是一个或两个?您会注意到,最后两个值行在该列中有不同的数字。以下是数据格式的示例:

代码语言:javascript
复制
df1 <- structure(list(play_id = c(4596, 4629, 4658, 4682, 4723, 4766, 
4790, 4828, 4849, 4878, 4899, 4938), posteam = c("MIN", "MIN", 
"MIN", "MIN", "MIN", "CIN", "CIN", "CIN", "CIN", "CIN", "CIN", 
"CIN"), qtr = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), yardline_100 = c(63, 
58, 55, 50, 38, 61, 55, 52, 52, 20, 15, 15), down = c(2, 1, 2, 
3, 1, 1, 2, 3, 4, 1, 2, 3)), row.names = c(NA, -12L), class = c("nflverse_data", 
"tbl_df", "tbl", "data.table", "data.frame"), nflverse_timestamp = structure(1659046255.35538, class = c("POSIXct", 
"POSIXt")), nflverse_type = "play by play", nflfastR_version = structure(list(
    c(4L, 3L, 0L, 9020L)), class = c("package_version", "numeric_version"
)), .internal.selfref = <pointer: 0x0000021967f81ef0>)

df2 <- structure(list(posteam = c("MIN", "MIN", "MIN", "MIN", "MIN", 
"CIN", "CIN", "CIN", "CIN", "CIN", "CIN", "CIN"), qtr = c(5, 
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), yardline_100 = c(63, 58, 55, 
50, 38, 61, 55, 53, 52, 20, 16, 16), down = c(2, 1, 2, 3, 1, 
1, 2, 3, 4, 1, 2, 3), play_id_SR = c("a9f97fb0-1407-11ec-ae9a-d77d9ecb2022", 
"d49d54d0-1407-11ec-ae9a-d77d9ecb2022", "e8f74ad0-1407-11ec-ae9a-d77d9ecb2022", 
"0208ae60-1408-11ec-ae9a-d77d9ecb2022", "257fd030-1408-11ec-ae9a-d77d9ecb2022", 
"fe058030-1408-11ec-ae9a-d77d9ecb2022", "0da68200-1409-11ec-ae9a-d77d9ecb2022", 
"26a5bd20-1409-11ec-ae9a-d77d9ecb2022", "70eacce0-1409-11ec-ae9a-d77d9ecb2022", 
"99e5fb10-1409-11ec-ae9a-d77d9ecb2022", "a7646b00-1409-11ec-ae9a-d77d9ecb2022", 
"de2683d0-1409-11ec-ae9a-d77d9ecb2022")), row.names = c(NA, -12L
), class = c("tbl_df", "tbl", "data.frame"))
EN

回答 1

Stack Overflow用户

发布于 2022-08-24 00:10:46

一个选项是使用fuzzyjoin

代码语言:javascript
复制
library(fuzzyjoin)
df1 %>% 
    fuzzy_left_join(
        df2,
        by = c("posteam", "qtr", "down", "yardline_100"),
        match_fun = list(`==`, `==`, `==`, function(x, y) abs(x - y) <= 2)) %>%
    select(-matches("(posteam|qtr|down).y")) %>%
    rename_with(~str_remove(.x, "(?<=(posteam|qtr|down)).x"))
## A tibble: 12 x 7
#   play_id posteam   qtr yardline_100.x  down yardline_100.y play_id_SR                          
#     <dbl> <chr>   <dbl>          <dbl> <dbl>          <dbl> <chr>                               
# 1    4596 MIN         5             63     2             63 a9f97fb0-1407-11ec-ae9a-d77d9ecb2022
# 2    4629 MIN         5             58     1             58 d49d54d0-1407-11ec-ae9a-d77d9ecb2022
# 3    4658 MIN         5             55     2             55 e8f74ad0-1407-11ec-ae9a-d77d9ecb2022
# 4    4682 MIN         5             50     3             50 0208ae60-1408-11ec-ae9a-d77d9ecb2022
# 5    4723 MIN         5             38     1             38 257fd030-1408-11ec-ae9a-d77d9ecb2022
# 6    4766 CIN         5             61     1             61 fe058030-1408-11ec-ae9a-d77d9ecb2022
# 7    4790 CIN         5             55     2             55 0da68200-1409-11ec-ae9a-d77d9ecb2022
# 8    4828 CIN         5             52     3             53 26a5bd20-1409-11ec-ae9a-d77d9ecb2022
# 9    4849 CIN         5             52     4             52 70eacce0-1409-11ec-ae9a-d77d9ecb2022
#10    4878 CIN         5             20     1             20 99e5fb10-1409-11ec-ae9a-d77d9ecb2022
#11    4899 CIN         5             15     2             16 a7646b00-1409-11ec-ae9a-d77d9ecb2022
#12    4938 CIN         5             15     3             16 de2683d0-1409-11ec-ae9a-d77d9ecb2022

请注意列function(x, y) abs(x - y) <= 2的匹配函数"yardline_100"

最后两行(select(...)rename_with(...))是删除重复列所必需的:fuzzyjoin似乎创建重复列(即".x“和".y"-suffixed)列,甚至在精确匹配时也是如此;最后两个命令删除这些重复的精确匹配列。

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/73465795

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档