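I have two data tables:
before is the data table in its "raw" state (before any cleaning or other operations); after is the data table once the various cleaning steps and manipulations have been applied. Most of their column names match.
Is it possible to build a third data table in which columns with matching names sit next to each other, with the names modified where needed (name.before, name.after), and all the leftover columns placed at the end?
For example:
The before table: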
data.table::data.table(a = c(1,2,3), b = c(1,2,3), c = c(1,2,3))
   a b c
1: 1 1 1
2: 2 2 2
3: 3 3 3
The after table:
data.table::data.table(a = c("a","b","c"), c = c("a","b","c"), d = c(1,2,3))
   a c d
1: a a 1
2: b b 2
3: c c 3
The expected output would be:
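   a.before a.after c.before c.after d
1:        1       a        1       a 1
2:        2       b        2       b 2
3:        3       c        3       c 3
The goal is to compare the matching columns easily, to verify that the column output is appropriate after applying various functions to the data.table.
Posted on 2020-09-20 21:03:41
One option is to cbind the two tables, reorder the result with setcolorder on the ordered column names, and, if the aim is only to tell before from after among the duplicated column names, apply make.unique.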
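As a small illustration of the two helpers named above, the sketch below applies order() and make.unique() to a plain character vector. The vector nms is made up for illustration and mirrors the column names that cbind() would produce for the example tables; it is not part of the original answer.

```r
# Column names as they would appear after cbind(before, after):
# a, b, c from the first table, then a, c, d from the second.
# (Illustrative vector, not taken from the original answer.)
nms <- c("a", "b", "c", "a", "c", "d")

# order() returns the positions that sort the names. Ties keep their
# original order, so each "before" column lands ahead of its "after" twin.
idx <- order(nms)
sorted <- nms[idx]

# make.unique() appends .1, .2, ... to repeated names and leaves the
# first occurrence untouched.
unique_nms <- make.unique(sorted)
print(unique_nms)
```

Because make.unique() only appends numeric suffixes, the duplicated columns come out as a and a.1 rather than a.before and a.after. Applied to the tables themselves, the same idea looks like this: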
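library(data.table)
# dt1 and dt2 stand for the question's before and after tables
dt1 <- data.table(a = c(1,2,3), b = c(1,2,3), c = c(1,2,3))
dt2 <- data.table(a = c("a","b","c"), c = c("a","b","c"), d = c(1,2,3))
# bind side by side, then sort the columns by name so duplicates become neighbours
out <- setcolorder(cbind(dt1, dt2), order(c(names(dt1), names(dt2))))[]
# make.unique appends .1 to the second copy of each duplicated name
setnames(out, make.unique(names(out)))[]
# drop the columns that exist only in dt1 (here: b)
out[, setdiff(names(dt1), names(dt2)) := NULL][]
#   a a.1 c c.1 d
#1: 1   a 1   a 1
#2: 2   b 2   b 2
#3: 3   c 3   c 3
If we specifically need the before/after suffixes: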
out <- setcolorder(cbind(dt1, dt2), order(c(names(dt1), names(dt2))))[]
out[, setdiff(names(dt1), names(dt2)) := NULL][]
i1 <- duplicated(names(out), fromLast = TRUE)
i2 <- duplicated(names(out))
names(out)[i1] <- paste0(names(out)[i1], ".before")
names(out)[i2] <- paste0(names(out)[i2], ".after")
out
#   a.before a.after c.before c.after d
#1:        1       a        1       a 1
#2:        2       b        2       b 2
#3:        3       c        3       c 3
Posted on 2020-09-20 21:08:46
A base R approach:
cols_after <- colnames(after)
cols_before <- colnames(before)
inter <- intersect(cols_after, cols_before)
in_after <- cols_after %in% inter
n_after <- paste0(cols_after[in_after], ".after")
colnames(after)[in_after] <- n_after
in_before <- cols_before %in% inter
n_before <- paste0(cols_before[in_before], ".before")
colnames(before)[in_before] <- n_before
# some merge procedure merge_df or simple cbind
merge_df <- cbind(after, before)
merge_df_names <- merge_df[, c(as.vector(t(data.frame(n_before, n_after))),
                               colnames(merge_df)[!(colnames(merge_df) %in% c(n_before, n_after))])]
# if merge_df is data.table we need with = FALSE
# merge_df[, c(as.vector(t(data.frame(n_before ,n_after))), colnames(merge_df)[!(colnames(merge_df) %in% c(n_before, n_after))]), with = FALSE]
merge_df_names
# to remove b column if needed
# merge_df_names <- merge_df_names[, setdiff(colnames(merge_df_names), "b")]
# if merge_df is a data.table we need with = FALSE
Posted on 2020-09-20 21:09:18
Updated to show the unaffected columns (b, d) at the end.
This tidyverse approach accepts data.table objects and returns a data.table object:
library(tidyverse)
cols_to_rename <- intersect(colnames(before), colnames(after))
rename_cols <- function(data, suffix)
data %>% rename_with(~paste0(., suffix), all_of(cols_to_rename))
bind_cols(rename_cols(before, ".before"), rename_cols(after, ".after")) %>%
select(starts_with(paste0(cols_to_rename, ".")), everything())
   a.before a.after c.before c.after b d
1:        1       a        1       a 1 1
2:        2       b        2       b 2 2
3:        3       c        3       c 3 3
https://stackoverflow.com/questions/63983568
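For the verification use case described in the question, the steps from the answers could also be wrapped in a single function. The following is a minimal sketch and not taken from the thread: compare_cols is a hypothetical name, and it assumes both tables have the same number of rows in the same row order. It pairs columns by name, keeps columns found only in after at the end, and drops columns found only in before, as in the question's expected output.

```r
library(data.table)

# Hypothetical helper (not from the thread). Assumes `before` and `after`
# are data.tables with the same number of rows in the same row order.
compare_cols <- function(before, after) {
  stopifnot(nrow(before) == nrow(after))
  shared <- intersect(names(before), names(after))

  out <- copy(after)  # work on a copy so the caller's table is untouched
  setnames(out, shared, paste0(shared, ".after"))

  # Add the matching column from `before` for every shared name.
  for (nm in shared) {
    set(out, j = paste0(nm, ".before"), value = before[[nm]])
  }

  # rbind() on the two name vectors interleaves them pair by pair:
  # x.before, x.after, y.before, y.after, ...
  paired <- c(rbind(paste0(shared, ".before"), paste0(shared, ".after")))

  # Columns not listed in `paired` keep their order and move to the end.
  setcolorder(out, paired)
  out[]
}

before <- data.table(a = c(1, 2, 3), b = c(1, 2, 3), c = c(1, 2, 3))
after  <- data.table(a = c("a", "b", "c"), c = c("a", "b", "c"), d = c(1, 2, 3))
res <- compare_cols(before, after)
print(names(res))
```

Using set() inside the loop, rather than := with variables in j, avoids name clashes if one of the tables happens to have a column called shared or before.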