首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >如何在R中匹配大数据帧和rbind

如何在R中匹配大数据帧和rbind
EN

Stack Overflow用户
提问于 2021-04-06 15:26:55
回答 1查看 52关注 0票数 2

我正在做下面的数据帧。

File1:

代码语言:javascript
复制
UniqueId     Match_ID               Parameters     Value
UID-245       xty-tt09-23-ert        Age            28
UID-245       xty-tt09-23-ert        Sex            Male
UID-245       xty-tt09-23-ert        Height         5.7
UID-245       xty-tt09-23-ert        Balance        200
UID-246       xty-tt09-03-ert        Balance        303
UID-246       xty-tt09-03-ert        Weight         89

File2:

代码语言:javascript
复制
UniqueId     Match_ID               Parameters     Value
UID-245       xty-tt09-23-ert        Age            29
UID-245       xty-tt09-23-ert        Sex            Female
UID-245       xty-tt09-23-ert        Height         5.7
UID-245       xty-tt09-23-ert        Balance        200
UID-246       xty-tt09-03-ert        Balance        300

我需要在不同的UniqueId基础上比较数据帧和所需的以下输出。

其中,我们需要将File1参数值与File2进行比较,以检查两者是否完全匹配。此外,我们需要在输出数据帧中创建一个逗号分隔的不匹配参数字符串作为Variable_Mismatched

代码语言:javascript
复制
UniqueID   Count_File1  Count_File2 Matched_Parameters   Mismatched_Parameters   Variable_Mismatched
UID-245     4            4            2                   2                      Age,Sex
UID-246     2            1            0                   1                      Balance
EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2021-04-06 18:01:22

这是一种data.table方法。在代码的注释中有解释

确保您的File1File2data.table-format中。这可以通过使用data.table::fread()或在现有data.frames上使用data.table::setDT()读取中的文件来完成。

代码语言:javascript
复制
library(data.table)
library(tibble) #for lst-function
# Sample data -----
File1 <- fread("UniqueId     Match_ID               Parameters     Value
UID-245       xty-tt09-23-ert        Age            28
UID-245       xty-tt09-23-ert        Sex            Male
UID-245       xty-tt09-23-ert        Height         5.7
UID-245       xty-tt09-23-ert        Balance        200
UID-246       xty-tt09-03-ert        Balance        303
UID-246       xty-tt09-03-ert        Weight         89")
File2 <- fread("UniqueId     Match_ID               Parameters     Value
UID-245       xty-tt09-23-ert        Age            29
UID-245       xty-tt09-23-ert        Sex            Female
UID-245       xty-tt09-23-ert        Height         5.7
UID-245       xty-tt09-23-ert        Balance        200
UID-246       xty-tt09-03-ert        Balance        300")

# Part 1: get counts per file & UniqueId -----
L <- tibble::lst(File1, File2)
DT <- rbindlist(L, use.names = TRUE, fill = TRUE, idcol = "File")
# Summarise by UniqueId to get file-count, the first part of the answer
ans1 <- dcast(DT[, .N, by = .(File, UniqueId)], 
             UniqueId ~ paste0("Count_", File),
             value.var = "N")
# Part 2: Find matches and mismatches -----
ans2 <- copy(File1)[ File2, Value2 := i.Value, on = .(UniqueId, Parameters)]
# Only keep complete cases
ans2 <- ans2[!is.na(Value) & !is.na(Value2), ]
ans2 <- ans2[, .(Matched_Parameters    = sum(Value == Value2),
                 Mismatched_Parameters = sum(!Value == Value2),
                 Variable_Mismatched   = paste0(Parameters[!Value == Value2], 
                                                collapse = ","),
                 Variable_Matched      = paste0(Parameters[Value == Value2], 
                                                collapse = ",")), 
             by = .(UniqueId)]
# Part 3: Join together -----
ans1[ans2, on = .(UniqueId)]
#    UniqueId Count_File1 Count_File2 Matched_Parameters Mismatched_Parameters Variable_Mismatched Variable_Matched
# 1:  UID-245           4           4                  2                     2             Age,Sex   Height,Balance
# 2:  UID-246           2           1                  0                     1             Balance
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/66964378

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档