
Using gather() from the tidyr package in R: "Invalid column specification"

Stack Overflow user
Asked 2015-12-15 18:29:35
Answers: 2 · Views: 3.8K · Followers: 0 · Votes: 1

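I am still learning how to use tidyr. I want to use "gather()" to split the path columns into multiple rows, keeping the "gene_ID" column by duplicating it where applicable. Example input data: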

    gene_ID path1   path2   path3   path4   path5   path6   path7   path8
CAMNT_0043146643    RNA transport                           
CAMNT_0029561721    Ribosome                            
CAMNT_0024703307    Sphingolipid signaling pathway  Lysosome                        
CAMNT_0020981363    mRNA surveillance pathway   Hippo signaling pathway cAMP signaling pathway  cGMP - PKG signaling pathway    Regulation of actin cytoskeleton    Meiosis - yeast Oocyte meiosis  Focal adhesion
CAMNT_0020021387    Spliceosome Protein processing in endoplasmic reticulum MAPK signaling pathway  Endocytosis             
CAMNT_0003293445    Spliceosome Protein processing in endoplasmic reticulum MAPK signaling pathway  Endocytosis             

Example of the desired output:

gene_ID Pathway
CAMNT_0043146643    RNA transport
CAMNT_0029561721    Ribosome
CAMNT_0024703307    Lysosome
CAMNT_0024703307    Sphingolipid signaling pathway
CAMNT_0020981363    mRNA surveillance pathway
CAMNT_0020981363    Hippo signaling pathway
CAMNT_0020981363    cAMP signaling pathway
CAMNT_0020981363    cGMP - PKG signaling pathway
CAMNT_0020981363    Regulation of actin cytoskeleton
CAMNT_0020981363    Meiosis - yeast
CAMNT_0020981363    Oocyte meiosis
CAMNT_0020981363    Focal adhesion
CAMNT_0020021387    Spliceosome
CAMNT_0020021387    Protein processing in endoplasmic reticulum
CAMNT_0020021387    MAPK signaling pathway
CAMNT_0020021387    Endocytosis
CAMNT_0003293445    Spliceosome
CAMNT_0003293445    Protein processing in endoplasmic reticulum
CAMNT_0003293445    MAPK signaling pathway
CAMNT_0003293445    Endocytosis

Currently, I am trying:

temp<-gather(extract,"gene_ID",path1:path8)

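But I get an error message: "Error: Invalid column specification". I have tried reading in the df both with and without a header, and the same error occurs. I am open to using another method, but I have had trouble with "NAs" because not all "gene_ID" rows have the same number of columns.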

Any suggestions on how to proceed?


2 Answers

Stack Overflow user

Accepted answer

Posted 2015-12-15 19:08:59

df <- data.frame(x = c("a", "b", "c","d","e"),
                 path1=c("test1","test1","test2","test2","test3"),
                 path2=c("testa","","testg","testd",""))
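library(reshape2)
# Treat empty strings as missing so melt() can drop them
df[df==""] <- NA
# Reshape to long format, keeping x as the id column and dropping NA values
melt(df, id.vars="x", na.rm=TRUE)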
#   x variable value
# 1 a    path1 test1
# 2 b    path1 test1
# 3 c    path1 test2
# 4 d    path1 test2
# 5 e    path1 test3
# 6 a    path2 testa
# 8 c    path2 testg
# 9 d    path2 testd
Votes: 1

Stack Overflow user

Posted 2015-12-15 19:15:59

Here is a tidyr solution:

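library(tidyr)
library(dplyr)

# df is the example data frame defined in the answer above
df %>%
  gather(path, Pathway, path1, path2) %>%  # key column "path", value column "Pathway"
  filter(Pathway != "") %>%                # drop the empty cells
  select(-path)                            # the key column is no longer needed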

  x Pathway
1 a   test1
2 b   test1
3 c   test2
4 d   test2
5 e   test3
6 a   testa
7 c   testg
8 d   testd
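
Applied to the layout in the question, the same idea might look like the sketch below. The call in the question most likely fails because gather() takes the new key name and the new value name as its second and third arguments, so "gene_ID" is read as the key and path1:path8 as the value name rather than as the columns to gather. The miniature `extract` data frame here is a hypothetical stand-in with only three path columns, and the pivot_longer() variant assumes tidyr 1.0.0 or later.

```r
library(tidyr)
library(dplyr)

# Hypothetical miniature of the question's table (three path columns only)
extract <- data.frame(
  gene_ID = c("CAMNT_0043146643", "CAMNT_0024703307"),
  path1   = c("RNA transport", "Sphingolipid signaling pathway"),
  path2   = c("", "Lysosome"),
  path3   = c("", ""),
  stringsAsFactors = FALSE
)

# gather(data, key, value, columns...): name the key and value columns first,
# then list the columns to gather
result <- extract %>%
  gather(path, Pathway, path1:path3) %>%
  filter(Pathway != "") %>%   # drop cells that had no pathway
  select(-path)

# Equivalent sketch with pivot_longer(), which supersedes gather()
# in tidyr >= 1.0.0
result_longer <- extract %>%
  pivot_longer(cols = path1:path3, names_to = "path", values_to = "Pathway") %>%
  filter(Pathway != "") %>%
  select(-path)

print(result)
```

Both variants keep one row per non-empty pathway and repeat gene_ID as needed. For the full data, path1:path8 would replace path1:path3.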
Votes: 2
Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/34296778
