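I'm still learning how to use tidyr. I want to gather the path columns into multiple rows, keeping the "gene_ID" column by duplicating its value where applicable. Example input data: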
gene_ID	path1	path2	path3	path4	path5	path6	path7	path8
CAMNT_0043146643	RNA transport
CAMNT_0029561721	Ribosome
CAMNT_0024703307	Sphingolipid signaling pathway	Lysosome
CAMNT_0020981363	mRNA surveillance pathway	Hippo signaling pathway	cAMP signaling pathway	cGMP - PKG signaling pathway	Regulation of actin cytoskeleton	Meiosis - yeast	Oocyte meiosis	Focal adhesion
CAMNT_0020021387	Spliceosome	Protein processing in endoplasmic reticulum	MAPK signaling pathway	Endocytosis
CAMNT_0003293445	Spliceosome	Protein processing in endoplasmic reticulum	MAPK signaling pathway	Endocytosis
Example of the desired output data:
gene_ID Pathway
CAMNT_0043146643 RNA transport
CAMNT_0029561721 Ribosome
CAMNT_0024703307 Lysosome
CAMNT_0024703307 Sphingolipid signaling pathway
CAMNT_0020981363 mRNA surveillance pathway
CAMNT_0020981363 Hippo signaling pathway
CAMNT_0020981363 cAMP signaling pathway
CAMNT_0020981363 cGMP - PKG signaling pathway
CAMNT_0020981363 Regulation of actin cytoskeleton
CAMNT_0020981363 Meiosis - yeast
CAMNT_0020981363 Oocyte meiosis
CAMNT_0020981363 Focal adhesion
CAMNT_0020021387 Spliceosome
CAMNT_0020021387 Protein processing in endoplasmic reticulum
CAMNT_0020021387 MAPK signaling pathway
CAMNT_0020021387 Endocytosis
CAMNT_0003293445 Spliceosome
CAMNT_0003293445 Protein processing in endoplasmic reticulum
CAMNT_0003293445 MAPK signaling pathway
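CAMNT_0003293445 Endocytosis
Currently, I am trying:
temp <- gather(extract, "gene_ID", path1:path8)
but I get an error message: "Error: Invalid column specification". I have tried reading in the df both with and without a header, but the same error occurs. I'm open to using another approach, but I have trouble with the NAs, because not all "gene_ID" rows have the same number of columns.
Any suggestions on how to proceed?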
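A note on the call above: in tidyr's gather(), the two arguments after the data frame name the new key and value columns, and the columns to gather come after them. The call in the question passes "gene_ID" as the key name and path1:path8 as the value name, which is the likely source of the error. A minimal sketch of the corrected argument order, assuming a small hypothetical stand-in for `extract` (three path columns instead of eight, with NA for missing pathways):

```r
library(tidyr)

# Hypothetical stand-in for the question's `extract` data frame;
# only three path columns are used here, and missing pathways are NA.
extract <- data.frame(
  gene_ID = c("CAMNT_0043146643", "CAMNT_0024703307", "CAMNT_0020021387"),
  path1 = c("RNA transport", "Sphingolipid signaling pathway", "Spliceosome"),
  path2 = c(NA, "Lysosome", "Protein processing in endoplasmic reticulum"),
  path3 = c(NA, NA, "MAPK signaling pathway"),
  stringsAsFactors = FALSE
)

# `path` and `Pathway` are the new key and value column names;
# the columns to gather follow them. na.rm = TRUE drops the NA cells.
temp <- gather(extract, path, Pathway, path1:path3, na.rm = TRUE)

# The key column (path1, path2, ...) is not needed in the result.
temp$path <- NULL
```

With the full data the column range would be path1:path8. If the missing cells were read in as empty strings rather than NA, they would need to be converted to NA first for na.rm to take effect.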
Answered on 2015-12-15 19:08:59
df <- data.frame(x = c("a", "b", "c","d","e"),
path1=c("test1","test1","test2","test2","test3"),
path2=c("testa","","testg","testd",""))
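library(reshape2)
# Treat empty strings as missing so that melt() can drop them
df[df == ""] <- NA
# Reshape to long format, keeping x as the identifier and removing NA values
melt(df, id.vars = "x", na.rm = TRUE)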
# x variable value
# 1 a path1 test1
# 2 b path1 test1
# 3 c path1 test2
# 4 d path1 test2
# 5 e path1 test3
# 6 a path2 testa
# 8 c path2 testg
# 9 d path2 testd
Answered on 2015-12-15 19:15:59
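Here is a tidyr solution:
library(tidyr)
library(dplyr)
df %>%
  gather(path, Pathway, path1, path2) %>%  # stack path1 and path2 into one column
  filter(Pathway != "") %>%                 # drop the empty entries
  select(-path)                             # drop the helper key column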
x Pathway
1 a test1
2 b test1
3 c test2
4 d test2
5 e test3
6 a testa
7 c testg
8 d testd
https://stackoverflow.com/questions/34296778
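In newer tidyr releases gather() is superseded by pivot_longer(), so the same reshaping can also be sketched with that function. This is not part of the answers above; it assumes tidyr 1.0.0 or later and reuses the toy `df` from the first answer, with the name `long` chosen only for illustration:

```r
library(tidyr)

# Same toy data as in the first answer; empty strings mark missing pathways.
df <- data.frame(
  x = c("a", "b", "c", "d", "e"),
  path1 = c("test1", "test1", "test2", "test2", "test3"),
  path2 = c("testa", "", "testg", "testd", ""),
  stringsAsFactors = FALSE
)

# Treat empty strings as missing so they can be dropped while reshaping.
df[df == ""] <- NA

# Stack the path columns into one Pathway column, dropping NA cells.
long <- pivot_longer(
  df,
  cols = c(path1, path2),
  names_to = "path",
  values_to = "Pathway",
  values_drop_na = TRUE
)

# Keep only the identifier and the pathway.
long <- long[, c("x", "Pathway")]
```

Unlike gather(), pivot_longer() returns the rows grouped by the original row rather than by source column, so the pathways for each identifier end up next to each other.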