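I have a list of data frames containing a lot of genetic variant information. Now I want to extract some columns from these data frames. One problem is that in some of the data frames the columns are named differently. Is there a recommended way to deal with this?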
# example data
df1 <- data.frame(Gene = 1:10, Consequence1= 11:20, other_stuff = 21:30)
df2 <- data.frame(Gene = 1:10, Consequence= 11:20, other_stuff = 21:30)
df3 <- data.frame(Gene = 1:10, Consequence= 11:20, other_stuff = 21:30)
family1 <- list(cpht = df1, hm = df2, ht = df3)
family2 <- list(cpht = df1, hm = df2, ht = df3)
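gene_lists <- list(family1 = family1, family2 = family2)
The columns I want to extract are "Gene" and either "Consequence1" or "Consequence". One option would be to rename the columns called "Consequence1" to "Consequence", but so far I have not managed to get that to work.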
Many thanks for your help!
Sebastian
Posted on 2021-01-07 15:17:51
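I'm not sure what final result you want, but one approach is to loop over the nested list with lapply and use grepl to pick out the columns you need: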
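# keep every column whose name starts with "Gene" or "Consequence"; drop = FALSE keeps a data frame even if only one column matches
gene_columns <- lapply(gene_lists, function(family) lapply(family, function(df) df[, names(df)[grepl("^(Gene|Consequence)", names(df))], drop = FALSE]))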
gene_columns$family1$ht
#>    Gene Consequence
#> 1     1          11
#> 2     2          12
#> 3     3          13
#> 4     4          14
#> 5     5          15
#> 6     6          16
#> 7     7          17
#> 8     8          18
#> 9     9          19
#> 10   10          20
Edit: to rename the column Consequence1 and then select only Consequence, you can do:
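gene_columns <- lapply(gene_lists, function(x) lapply(x, function(x) {
  # rename a column called exactly "Consequence1" to "Consequence" (no-op if absent)
  names(x)[grepl("^Consequence1$", names(x))] <- "Consequence"
  # every data frame now uses the same two names, so select them directly
  x[, c("Gene", "Consequence")]
}))
https://stackoverflow.com/questions/65614719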
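A further variant, sketched here as an assumption and not part of the original answer: instead of a regular expression or a single hard-coded rename, list the accepted column names explicitly and map whichever one is present to a standard name. The helper `standardize_columns` and its `aliases` argument are hypothetical names chosen for illustration; the example data mirrors the question's.

```r
# Hypothetical helper, not part of the original answer: keep "Gene" plus the
# first column whose name is in `aliases` (checked in the order given) and
# rename that column to "Consequence".
standardize_columns <- function(df, aliases = c("Consequence", "Consequence1")) {
  hit <- intersect(aliases, names(df))  # aliases present in df, in alias order
  if (length(hit) == 0) {
    stop("no column matching any of: ", paste(aliases, collapse = ", "))
  }
  out <- df[, c("Gene", hit[1]), drop = FALSE]
  names(out) <- c("Gene", "Consequence")
  out
}

# Example data in the same shape as the question's
df1 <- data.frame(Gene = 1:10, Consequence1 = 11:20, other_stuff = 21:30)
df2 <- data.frame(Gene = 1:10, Consequence = 11:20, other_stuff = 21:30)
df3 <- data.frame(Gene = 1:10, Consequence = 11:20, other_stuff = 21:30)
family1 <- list(cpht = df1, hm = df2, ht = df3)
family2 <- list(cpht = df1, hm = df2, ht = df3)
gene_lists <- list(family1 = family1, family2 = family2)

# Apply the helper to every data frame in every family
gene_columns <- lapply(gene_lists, function(family) lapply(family, standardize_columns))
gene_columns$family1$cpht
```

Unlike the pattern `^(Gene|Consequence)`, which also keeps any other column whose name merely starts with those words, the explicit alias list only accepts names you have listed, and it stops with an error instead of silently returning fewer columns when none of them is present.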