I have many datasets that I want to merge so that each record appears only once. Here is some representative data:
df1 <- read.table(text="info var1 var2
1 C001 mytest1 NA
2 C002 mytest2 NA
3 C003 myse1 data1
4 C004 NA NA
5 C007 where1 India
6 C010 ohio city
11 C016 number fifty
12 C017 city rome", header=T, stringsAsFactors=F)
and this
df2 <- read.table(text="info var1 var2
1 C003 myse1 data1
2 C007 where1 India
3 C010 ohio city
4 C016 number fifty
5 C017 city rome
6 C022 country India
7 C023 number 10", header=T, stringsAsFactors=F)
df3 <- read.table(text="info var1 var2 var3
1 C017 city rome ind
2 C022 country India bes
3 C027 this there NA", header=T, stringsAsFactors=F)
I want to combine them based on info, but keep each info value unique. When I want to merge all the files, I do this:
library(tidyverse)
library(dplyr)
list(df1, df2, df3) %>% reduce(full_join, by = "info")
But I want the output to look like this:
info var1.x var2.x var3
C001 mytest1 NA NA
C002 mytest2 NA NA
C003 myse1 data1 NA
C004 NA NA NA
C007 where1 India NA
C010 ohio city NA
C016 number fifty NA
C017 city rome ind
C022 country India bes
C023 number 10 NA
C027 this there NA

Posted on 2019-03-20 17:22:23
I think this should work for you.
bind_rows(df1, df2, df3) %>%
unique() %>%
mutate(rsum = rowSums(!is.na(.))) %>%
group_by(info) %>%
filter(rsum == max(rsum)) %>%
select(-rsum)
info var1 var2 var3
<chr> <chr> <chr> <chr>
1 C001 mytest1 <NA> <NA>
2 C002 mytest2 <NA> <NA>
3 C003 myse1 data1 <NA>
4 C004 <NA> <NA> <NA>
5 C007 where1 India <NA>
6 C010 ohio city <NA>
7 C016 number fifty <NA>
8 C023 number 10 <NA>
9 C017 city rome ind
10 C022 country India bes
11 C027 this there <NA>

Posted on 2019-03-20 17:02:40
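The solution below first builds the set of unique keys from the shared "info" column of the datasets you want to merge. It then uses left joins (merge with all.x=T) to add the var1 and var2 values collected from the data frames that have them, and the var3 values from df3.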
library(dplyr)
info <- data.frame(info=unique(c(df1$info,df2$info,df3$info)))
var1s <- unique(rbind(df1[,c("info","var1")],df2[,c("info","var1")],df3[,c("info","var1")]))
var2s <- unique(rbind(df1[,c("info","var2")],df2[,c("info","var2")],df3[,c("info","var2")]))
var3s <- unique(df3[,c("info","var3")])
merge(x=info,y=var1s,by="info",all.x=T) %>% merge(y=var2s,by="info",all.x=T) %>% merge(y=var3s,by="info",all.x=T)
Result:
> merge(x=info,y=var1s,by="info",all.x=T) %>% merge(y=var2s,by="info",all.x=T) %>% merge(y=var3s,by="info",all.x=T)
info var1 var2 var3
1 C001 mytest1 <NA> <NA>
2 C002 mytest2 <NA> <NA>
3 C003 myse1 data1 <NA>
4 C004 <NA> <NA> <NA>
5 C007 where1 India <NA>
6 C010 ohio city <NA>
7 C016 number fifty <NA>
8 C017 city rome ind
9 C022 country India bes
10 C023 number 10 <NA>
11 C027 this there <NA>

https://stackoverflow.com/questions/55266094
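
Both answers above hard-code df1, df2 and df3. If the number of data frames varies, the same idea (stack everything, then keep one row per info) might be sketched in base R as below. This is only a sketch under assumptions: `combine_by_key` and the toy frames `a` and `b` are hypothetical names that do not appear in the question or answers, and the rule "keep the first non-missing value per column" is a simplification, so if two frames disagree on a value for the same info, the first one silently wins.

```r
# Hypothetical helper (not from the answers above): stack any number of data
# frames and collapse rows sharing a key into a single row.
combine_by_key <- function(dfs, key = "info") {
  # Union of all column names, so frames lacking a column can be padded
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
    d[all_cols]
  })
  stacked <- do.call(rbind, padded)

  # Within each key group, keep the first non-missing value of every column;
  # indexing with NA yields an NA of the column's own type when none exists
  pieces <- lapply(split(stacked, stacked[[key]]), function(g) {
    as.data.frame(
      lapply(g, function(x) x[which(!is.na(x))[1]]),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, unname(pieces))
  rownames(out) <- NULL
  out
}

# Toy data (assumed for illustration, much smaller than df1..df3)
a <- data.frame(info = c("C001", "C017"),
                var1 = c("mytest1", "city"),
                var2 = c(NA, "rome"),
                stringsAsFactors = FALSE)
b <- data.frame(info = c("C017", "C027"),
                var1 = c("city", "this"),
                var2 = c("rome", "there"),
                var3 = c("ind", NA),
                stringsAsFactors = FALSE)

combined <- combine_by_key(list(a, b))
print(combined)
```

With the data from the question, the call would be `combine_by_key(list(df1, df2, df3))`. Because `split()` orders the groups by key, the rows should come back sorted by info.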