I have a df that looks like this:
> df
Names Symbol GeneID Description Paths Colors
IL-1 CP1 3553 Receptor Path1|Path2|Path5 Green|Blue|Pink
IL-6 CFT5 3569 Receptor Path3|Path1|Path2 Red|Green|Blue
TNF DFR4 7124 Receptor Path4|Path3|Path1 Yellow|Red|Green
CCL2 FGTZ 6347 Receptor Path4|Path5|Path2 Yellow|Pink|Blue
IL-1 SED 3552 Receptor Path6|Path5|Path3 Purple|Pink|Red
PAI1 SWA 5054 Receptor Path1 Green
IL-12 SSS 3593 Receptor Path1|Path2 Green|Blue
IL-8 SDE 3576 Receptor Path1|Path3|Path5 Green|Red|Pink
CTGF SDFR 1490 Receptor Path4|Path5|Path1 Yellow|Pink|Green
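TGF FDGT 7046 Receptor Path5|Path3 Pink|Red
I want to split the columns named Paths and Colors, and then count how many times each Path# occurs in the Paths column. That way I can get a df like the one below, where Path1 appears 7 times and its corresponding color is Green. Path5 appears 6 times, which is why it is listed second with its corresponding color (Pink), and so on.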
> df2
Paths Colors
Path1 Green
Path5 Pink
Path3 Red
Path2 Blue
Path4 Yellow
Path6 Purple
I tried to do this with the following code:
Paths <- data.frame(do.call('rbind', strsplit(as.character(df$Paths), '|', fixed = TRUE)))
df2 <- table(unlist(Paths))
df2 <- data.frame(sort(df2, decreasing = T))
But this only handles Paths, rather than splitting both Paths and Colors.
Any suggestions? Base R is preferred.
Answered 2018-05-25 20:30:54
In base R, the answer is:
## Split the Paths and Colors columns into one element per entry.
## unlist() keeps rows with fewer entries (e.g. "Path1" alone) from being
## recycled to the width of the longest row, which would inflate the counts.
Paths <- unlist(strsplit(as.character(df$Paths), '|', fixed = TRUE))
Colors <- unlist(strsplit(as.character(df$Colors), '|', fixed = TRUE))
## Paste each Path together with its Color, separated by a space
Pathways <- paste(Paths, Colors, sep = " ")
## Then, the number of times that each Path/Color pair appears is counted.
occurrences <- table(Pathways)
occurrences <- data.frame(sort(occurrences, decreasing = TRUE))
occurrences
Answered 2018-05-25 15:45:56
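There are two parts to your question. First, we need to reshape the data into a tidy format: one row per path, together with its corresponding color.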
library(dplyr)
## Assign the result back to df so the later steps work on the tidy data
df <- df %>% tidyr::separate_rows(Paths, Colors, sep = "\\|")
df
Names Symbol GeneID Description Paths Colors
1 IL-1 CP1 3553 Receptor Path1 Green
2 IL-1 CP1 3553 Receptor Path2 Blue
3 IL-1 CP1 3553 Receptor Path5 Pink
...
Now we need to count the most frequent paths:
counts <- df %>% count(Paths) %>% arrange(desc(n))
# A tibble: 6 x 2
Paths n
<chr> <int>
1 Path1 7
2 Path5 6
3 Path3 5
4 Path2 4
5 Path4 3
6 Path6 1
Finally, we can join the corresponding colors and drop the n column. Here is one way.
counts %>% inner_join(df) %>% distinct(Paths, Colors)
Joining, by = "Paths"
# A tibble: 6 x 2
Paths Colors
<chr> <chr>
1 Path1 Green
2 Path5 Pink
3 Path3 Red
4 Path2 Blue
5 Path4 Yellow
6 Path6 Purple
https://stackoverflow.com/questions/50511944
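Since the question asks for base R, the three steps described in the thread (split, count, attach colors) can also be sketched without any packages. This is only a minimal sketch under some assumptions: each path always maps to the same color (true for the sample data), the columns hold strings rather than factors, and the helper names `paths`, `colors` and `counts` are illustrative, not taken from the original posts. The data frame is rebuilt from the question with only the two relevant columns.

```r
# Sample data copied from the question (only the Paths and Colors columns).
df <- data.frame(
  Paths = c("Path1|Path2|Path5", "Path3|Path1|Path2", "Path4|Path3|Path1",
            "Path4|Path5|Path2", "Path6|Path5|Path3", "Path1",
            "Path1|Path2", "Path1|Path3|Path5", "Path4|Path5|Path1",
            "Path5|Path3"),
  Colors = c("Green|Blue|Pink", "Red|Green|Blue", "Yellow|Red|Green",
             "Yellow|Pink|Blue", "Purple|Pink|Red", "Green",
             "Green|Blue", "Green|Red|Pink", "Yellow|Pink|Green",
             "Pink|Red"),
  stringsAsFactors = FALSE
)

# Step 1: one element per path/color. unlist() flattens the ragged rows
# without recycling the shorter ones.
paths  <- unlist(strsplit(df$Paths,  "|", fixed = TRUE))
colors <- unlist(strsplit(df$Colors, "|", fixed = TRUE))
stopifnot(length(paths) == length(colors))

# Step 2: count each path and sort from most to least frequent.
counts <- sort(table(paths), decreasing = TRUE)

# Step 3: look up the first color seen for each path, in count order.
# (Assumes a path never appears with two different colors.)
df2 <- data.frame(
  Paths  = names(counts),
  Colors = colors[match(names(counts), paths)],
  stringsAsFactors = FALSE
)
df2
```

With the sample data this gives the paths in the order Path1, Path5, Path3, Path2, Path4, Path6 with the colors Green, Pink, Red, Blue, Yellow, Purple, which matches the df2 requested in the question. If a path could appear with more than one color, `match()` would silently keep only the first one, so that assumption is worth checking on real data.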