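I am new to R and want to do a simple manipulation on the columns of my files. Can anyone help me with this?
I have two large files, A and B. Columns I and II of file A follow a specific pattern, which I want to capture and carry over to the second column of file B. Basically, each name in column I of file A has several different names associated with it in column II. So I want to write all of the names associated with each column-I name into file B.
Here is the structure of my files and the desired output: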
File A:
family ID
let-7/98/4458/4500 hsa-let-7a
let-7/98/4458/4500 hsa-let-7b
let-7/98/4458/4500 hsa-let-7c
let-7/98/4458/4500 hsa-let-7d
let-7/98/4458/4500 hsa-let-7e
let-7/98/4458/4500 hsa-let-7f
let-7/98/4458/4500 hsa-miR-98
miR-1ab/206/613 hsa-miR-1
miR-1ab/206/613 hsa-miR-206
.
.
.
Output for file A:
miR family ID
let-7/98/4458/4500 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98
miR-1ab/206/613 hsa-miR-1/hsa-miR-206
.
.
.
File B:
let-7/98/4458/4500
let-7/98/4458/4500
miR-1ab/206/613
miR-1ab/206/613
miR-1ab/206/613
miR-1ab/206/613
.
.
Desired output for file B:
let-7/98/4458/4500 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98
let-7/98/4458/4500 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98
miR-1ab/206/613 hsa-miR-1/hsa-miR-206
miR-1ab/206/613 hsa-miR-1/hsa-miR-206
miR-1ab/206/613 hsa-miR-1/hsa-miR-206
miR-1ab/206/613 hsa-miR-1/hsa-miR-206
.
.
Posted on 2014-05-05 14:56:34
A demonstration of my comment:
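# Collapse the IDs of each family in A into one "/"-separated string,
# then join the result onto B by the shared "family" column.
out <- merge(aggregate(ID ~ family, A, paste, collapse="/"), B)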
out
# family
# 1 let-7/98/4458/4500
# 2 let-7/98/4458/4500
# 3 miR-1ab/206/613
# 4 miR-1ab/206/613
# 5 miR-1ab/206/613
# 6 miR-1ab/206/613
# ID
# 1 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98
# 2 hsa-let-7a/hsa-let-7b/hsa-let-7c/hsa-let-7d/hsa-let-7e/hsa-let-7f/hsa-miR-98
# 3 hsa-miR-1/hsa-miR-206
# 4 hsa-miR-1/hsa-miR-206
# 5 hsa-miR-1/hsa-miR-206
# 6 hsa-miR-1/hsa-miR-206
Here is the sample data for "A" and "B":
A <- structure(
list(family = c("let-7/98/4458/4500","let-7/98/4458/4500","let-7/98/4458/4500",
"let-7/98/4458/4500","let-7/98/4458/4500","let-7/98/4458/4500",
"let-7/98/4458/4500","miR-1ab/206/613","miR-1ab/206/613"),
ID = c("hsa-let-7a","hsa-let-7b","hsa-let-7c","hsa-let-7d","hsa-let-7e",
"hsa-let-7f", "hsa-miR-98", "hsa-miR-1","hsa-miR-206")),
.Names = c("family", "ID"), class = "data.frame", row.names = c(NA, -9L))
B <- structure(
list(family = c("let-7/98/4458/4500","let-7/98/4458/4500","miR-1ab/206/613",
"miR-1ab/206/613","miR-1ab/206/613", "miR-1ab/206/613")),
.Names = "family", class = "data.frame", row.names = c(NA, -6L))
https://stackoverflow.com/questions/23465920
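The one-liner in the answer can also be split into two steps, which may be easier to follow. The sketch below is an illustration rather than the answerer's code: it uses a shortened copy of the sample data, and it swaps `merge()` for a `match()` lookup on the assumption that the original row order of B should be kept (`merge()` sorts its result by the key column by default). The name `collapsed` is made up for this example.

```r
# Shortened stand-ins for files A and B (values taken from the sample data above).
A <- data.frame(
  family = c(rep("let-7/98/4458/4500", 3), rep("miR-1ab/206/613", 2)),
  ID     = c("hsa-let-7a", "hsa-let-7b", "hsa-miR-98", "hsa-miR-1", "hsa-miR-206"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  family = c("miR-1ab/206/613", "let-7/98/4458/4500", "miR-1ab/206/613"),
  stringsAsFactors = FALSE
)

# Step 1: one row per family, with all of its IDs pasted together using "/".
collapsed <- aggregate(ID ~ family, data = A, FUN = paste, collapse = "/")

# Step 2: look up each row of B in the collapsed table.
# match() gives NA for a family that does not occur in A, so such rows get NA
# instead of being dropped, and B keeps its original row order.
B$ID <- collapsed$ID[match(B$family, collapsed$family)]

print(B)
```

If rows of B whose family is missing from A should be dropped instead, as the default `merge()` call does, they can be filtered afterwards with `B[!is.na(B$ID), ]`.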