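I have 6 txt files split into two groups (A files and T files). I would like to import all of them into R, intersect every A file with every T file, and obtain a matrix of the A-versus-T overlaps for this example. I was thinking of building two lists of vectors and then working from those to compute the matrix.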
A_1.txt
tomato
zucchini
potato
banana
coconut
salt
A_2.txt
tomato
zucchini
potato
A_3.txt
zucchini
potato
T_1.txt
tomato
zucchini
potato
banana
coconut
salt
T_2.txt
tomato
zucchini
potato
banana
T_3.txt
potato
banana
coconut
The matrix I would like to get is:
    T_1 T_2 T_3
A_1   6   4   3
A_2   3   3   1
A_3   2   2   1
Can anyone tell me how to do this in R?
I read the files in like this:
A_files <- list.files("/home/A/", full.names = TRUE)
T_files <- list.files("/home/T/", full.names = TRUE)
myAlist <- lapply(A_files, read.delim, header=FALSE)
myTlist <- lapply(T_files, read.delim, header=FALSE)
Answered 2020-02-03 15:43:30
Here is what I would do with my preferred set of tools:
library(data.table)
library(magrittr)
filenames <- dir(pattern = "^[AT]_\\d\\.txt$")
vec <-
lapply(filenames, fread, header = FALSE) %>%
set_names(filenames %>% stringr::str_remove("\\.txt$")) %>%
rbindlist(idcol = "file")
vecA <- vec[file %like% "^A"]
vecT <- vec[file %like% "^T"]
vecA[vecT, on = .(V1), allow.cartesian = TRUE] %>%
dcast(file ~ i.file, length)
   file T_1 T_2 T_3
1:  A_1   6   4   3
2:  A_2   3   3   1
3:  A_3   2   2   1
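Explanation
It is assumed that all files (A_1.txt, A_2.txt, ..., T_2.txt, T_3.txt) are stored in the same folder, where the file names are picked up by pattern. The files are read with fread and combined by rbindlist into one data.table, with the file name (minus the .txt extension) in the file column. Then the two datasets are separated into vecA and vecT. (This is only for clarity and to make the code less convoluted.)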
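The reading step described above can also be sketched in base R, for readers who want the two lists of character vectors mentioned in the question. This is only an illustration under stated assumptions: the folder dir_path and the two small files written into it are hypothetical stand-ins for the real data, and myAlist and myTlist are the names used in the question.

```r
# Hypothetical setup: write two small files to a temporary folder so the
# sketch runs on its own; real data would already be on disk.
dir_path <- file.path(tempdir(), "veg_demo")
dir.create(dir_path, showWarnings = FALSE)
writeLines(c("tomato", "zucchini", "potato"), file.path(dir_path, "A_2.txt"))
writeLines(c("potato", "banana", "coconut"), file.path(dir_path, "T_3.txt"))

# Pick up the file names by pattern.
paths <- list.files(dir_path, pattern = "^[AT]_\\d\\.txt$", full.names = TRUE)

# Read each file into a character vector and name it after the file
# (file name without the .txt extension).
items <- lapply(paths, readLines)
names(items) <- tools::file_path_sans_ext(basename(paths))

# Split into the two groups by name prefix.
myAlist <- items[startsWith(names(items), "A")]
myTlist <- items[startsWith(names(items), "T")]
str(myAlist)
```

Reading with readLines gives plain character vectors, so there is no factor conversion to worry about.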
The result of the join is
vecA[vecT, on = .(V1), allow.cartesian = TRUE]
    file       V1 i.file
 1:  A_1   tomato    T_1
 2:  A_2   tomato    T_1
 3:  A_1 zucchini    T_1
 4:  A_2 zucchini    T_1
 5:  A_3 zucchini    T_1
 6:  A_1   potato    T_1
 7:  A_2   potato    T_1
 8:  A_3   potato    T_1
 9:  A_1   banana    T_1
10:  A_1  coconut    T_1
11:  A_1     salt    T_1
12:  A_1   tomato    T_2
13:  A_2   tomato    T_2
14:  A_1 zucchini    T_2
15:  A_2 zucchini    T_2
16:  A_3 zucchini    T_2
17:  A_1   potato    T_2
18:  A_2   potato    T_2
19:  A_3   potato    T_2
20:  A_1   banana    T_2
21:  A_1   potato    T_3
22:  A_2   potato    T_3
23:  A_3   potato    T_3
24:  A_1   banana    T_3
25:  A_1  coconut    T_3
    file       V1 i.file
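The join-then-count idea does not depend on data.table. As a rough base R analogue (a swapped-in technique, not the answer's own code), merge can play the role of the join and table the role of the counting dcast. The helper to_long and the names A_sets, T_sets, A_file and T_file are made up for this sketch, and it assumes each item appears at most once per file.

```r
A_sets <- list(
  A_1 = c("tomato", "zucchini", "potato", "banana", "coconut", "salt"),
  A_2 = c("tomato", "zucchini", "potato"),
  A_3 = c("zucchini", "potato")
)
T_sets <- list(
  T_1 = c("tomato", "zucchini", "potato", "banana", "coconut", "salt"),
  T_2 = c("tomato", "zucchini", "potato", "banana"),
  T_3 = c("potato", "banana", "coconut")
)

# Hypothetical helper: turn a named list of vectors into a long data frame
# with one row per (file, item) pair.
to_long <- function(sets, id) {
  out <- data.frame(rep(names(sets), lengths(sets)),
                    unlist(sets, use.names = FALSE),
                    stringsAsFactors = FALSE)
  names(out) <- c(id, "V1")
  out
}
longA <- to_long(A_sets, "A_file")
longT <- to_long(T_sets, "T_file")

# Inner join on the item: one row for every A file / T file pair sharing it.
joined <- merge(longA, longT, by = "V1")

# Counting rows per (A file, T file) pair gives the overlap matrix.
counts <- table(joined$A_file, joined$T_file)
counts
```

If an item could occur more than once in a file, the join would count it several times, so calling unique on each vector first would be a sensible precaution.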
Reproducible data
This is one way to create the 6 input files from the sample dataset given in the question:
library(data.table)
library(magrittr)
fread("A_1.txt
tomato
zucchini
potato
banana
coconut
salt
A_2.txt
tomato
zucchini
potato
A_3.txt
zucchini
potato
T_1.txt
tomato
zucchini
potato
banana
coconut
salt
T_2.txt
tomato
zucchini
potato
banana
T_3.txt
potato
banana
coconut", header = FALSE) %>%
.[, fwrite(.(V1[-1]), V1[1]), by = cumsum(V1 %like% "^[AT]_\\d\\.txt$")]
Answered 2020-02-03 18:50:49
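Here is an approach using base R commands. In R versions before 4.0.0, functions such as read.csv convert character columns to factors by default. It is important not to allow that here: including the argument as.is=TRUE in the read.csv call keeps the data as character. First, let's make the data more easily available: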
myAlist <- list(A_1 = c("tomato", "zucchini", "potato", "banana", "coconut",
"salt"), A_2 = c("tomato", "zucchini", "potato"), A_3 = c("zucchini",
"potato"))
myTlist <- list(T_1 = c("tomato", "zucchini", "potato", "banana", "coconut",
"salt"), T_2 = c("tomato", "zucchini", "potato", "banana"), T_3 = c("potato",
"banana", "coconut"))
Now we create a function that finds the intersection of two groups and counts the shared items:
Shared <- function(a, t) {
length(intersect(myAlist[[a]], myTlist[[t]]))
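}
We compare each group in A with each group in T, e.g. A_1 with T_1, T_2, T_3, and so on:
# Index vectors are named a_idx and t_idx so that T (shorthand for TRUE) is not masked
(a_idx <- rep(1:3, each=3))
# [1] 1 1 1 2 2 2 3 3 3
(t_idx <- rep(1:3, 3))
# [1] 1 2 3 1 2 3 1 2 3
Finally, we compute the number of shared items:
nshare <- mapply(Shared, a_idx, t_idx)
myTbl <- matrix(nshare, 3, byrow=TRUE, dimnames=list(A=names(myAlist), T=names(myTlist)))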
myTbl
#      T
# A     T_1 T_2 T_3
#   A_1   6   4   3
#   A_2   3   3   1
#   A_3   2   2   1
https://stackoverflow.com/questions/60038901
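As a more compact variant of the base R approach, the index vectors and mapply can be replaced by two nested sapply calls over the lists themselves. This is a sketch, not part of either answer. The last step is a guess at what the question means by proportions: it assumes the share of each A file's items that also appear in the T file, and the name props is made up for the illustration.

```r
myAlist <- list(
  A_1 = c("tomato", "zucchini", "potato", "banana", "coconut", "salt"),
  A_2 = c("tomato", "zucchini", "potato"),
  A_3 = c("zucchini", "potato")
)
myTlist <- list(
  T_1 = c("tomato", "zucchini", "potato", "banana", "coconut", "salt"),
  T_2 = c("tomato", "zucchini", "potato", "banana"),
  T_3 = c("potato", "banana", "coconut")
)

# Outer sapply runs over the T sets (columns), inner sapply over the A sets
# (rows); each cell is the number of shared items.
counts <- sapply(myTlist, function(t_items)
  sapply(myAlist, function(a_items) length(intersect(a_items, t_items))))
counts

# Assumption: a proportion is the shared count divided by the size of the
# A set. The vector of sizes is recycled down the columns, so row i is
# divided by the size of the i-th A set.
props <- counts / lengths(myAlist)
round(props, 2)
```

Because the lists are named, the row and column names of the matrix come for free, and the code does not need to know in advance how many files each group has.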