Q: Split cols, bind dfs and keep the most repeated values

Stack Overflow user

Asked 2018-05-24 22:31:17

2 answers, 42 views, 0 follows, 0 votes

I have a df that looks like this:

> df

Names   Symbol  GeneID  Description    Paths                Colors
IL-1    CP1     3553    Receptor       Path1|Path2|Path5    Green|Blue|Pink
IL-6    CFT5    3569    Receptor       Path3|Path1|Path2    Red|Green|Blue
TNF     DFR4    7124    Receptor       Path4|Path3|Path1    Yellow|Red|Green
CCL2    FGTZ    6347    Receptor       Path4|Path5|Path2    Yellow|Pink|Blue
IL-1    SED     3552    Receptor       Path6|Path5|Path3    Purple|Pink|Red
PAI1    SWA     5054    Receptor       Path1                Green 
IL-12   SSS     3593    Receptor       Path1|Path2          Green|Blue 
IL-8    SDE     3576    Receptor       Path1|Path3|Path5    Green|Red|Pink
CTGF    SDFR    1490    Receptor       Path4|Path5|Path1    Yellow|Pink|Green
TGF     FDGT    7046    Receptor       Path5|Path3          Pink|Red

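I want to split the columns named Paths and Colors, and then count how many times each Path# appears in the Paths column. That should give me a df like the one below, where Path1 appears 7 times and its corresponding color is Green. Path5 appears 6 times, which is why it is listed second with its corresponding color (Pink), and so on.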

> df2
Paths Colors
Path1 Green
Path5 Pink
Path3 Red
Path2 Blue
Path4 Yellow
Path6 Purple

I tried to do this with the following code:

Paths <- data.frame(do.call('rbind', strsplit(as.character(df$Paths), '|', fixed = TRUE)))
df2 <- table(unlist(Paths))
df2 <- data.frame(sort(df2, decreasing = TRUE))

But this only handles one column; it does not split Paths and Colors together.

Any suggestions? Preferably using base R.

dataframe

split
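A side note on the attempt in the question (and on any solution built on do.call('rbind', strsplit(...))): when rows have different numbers of paths, rbind() recycles the shorter rows up to the longest length (silently when the lengths divide evenly, with a warning otherwise). A single-path row such as "Path1" therefore becomes Path1 Path1 Path1 and is counted three times. A minimal illustration using two of the question's rows; unlist() avoids the problem because it keeps exactly the tokens present in the data:

```r
# Two rows from the question's Paths column: three paths, then a single path
paths  <- c("Path1|Path2|Path5", "Path1")
pieces <- strsplit(paths, "|", fixed = TRUE)

# rbind() recycles the short row, so row 2 becomes Path1 Path1 Path1
m <- do.call(rbind, pieces)
print(m)
print(table(m))       # Path1 is counted 4 times, although it appears only twice

# unlist() keeps exactly the tokens that are in the data
tokens <- unlist(pieces)
print(table(tokens))  # Path1 2, Path2 1, Path5 1
```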

2 Answers

Stack Overflow user

Accepted answer

Posted 2018-05-25 20:30:54

Here is a base R solution:

## Split the Paths and Colors columns into one long vector each.
## unlist() is used instead of do.call('rbind', ...) because rbind() recycles
## rows with fewer than three entries (e.g. "Path1" -> Path1 Path1 Path1),
## which would inflate the counts.
Paths  <- unlist(strsplit(as.character(df$Paths),  '|', fixed = TRUE))
Colors <- unlist(strsplit(as.character(df$Colors), '|', fixed = TRUE))

## Paste each Path to its Color, separated by a space
Pathways <- paste(Paths, Colors, sep = " ")

## Then count how many times each Path (with its Color) appears, most frequent first
occurrences <- table(Pathways)
occurrences <- data.frame(sort(occurrences, decreasing = TRUE))
occurrences
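The pasted strings (e.g. "Path1 Green") keep both values in a single column. If you want df2 laid out exactly as in the question, with separate Paths and Colors columns, a minimal base R sketch follows. It rebuilds only the Paths and Colors columns of the question's df, assumes each path is always paired with the same color (true for this data), and takes each path's color from its first occurrence via match():

```r
# The Paths and Colors columns from the question's df
df <- data.frame(
  Paths  = c("Path1|Path2|Path5", "Path3|Path1|Path2", "Path4|Path3|Path1",
             "Path4|Path5|Path2", "Path6|Path5|Path3", "Path1", "Path1|Path2",
             "Path1|Path3|Path5", "Path4|Path5|Path1", "Path5|Path3"),
  Colors = c("Green|Blue|Pink", "Red|Green|Blue", "Yellow|Red|Green",
             "Yellow|Pink|Blue", "Purple|Pink|Red", "Green", "Green|Blue",
             "Green|Red|Pink", "Yellow|Pink|Green", "Pink|Red"),
  stringsAsFactors = FALSE
)

# One element per path and per color, in matching order (no rbind(), so no recycling)
paths  <- unlist(strsplit(df$Paths,  "|", fixed = TRUE))
colors <- unlist(strsplit(df$Colors, "|", fixed = TRUE))

# Occurrences of each path, most frequent first
counts <- sort(table(paths), decreasing = TRUE)

# Color of each path, taken from its first occurrence
df2 <- data.frame(
  Paths  = names(counts),
  Colors = colors[match(names(counts), paths)],
  n      = as.integer(counts),
  stringsAsFactors = FALSE
)
df2
```

The n column is kept only so the ordering can be checked (7, 6, 5, 4, 3, 1); drop it with df2$n <- NULL to match the question's df2.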

Votes: 0

Stack Overflow user

Posted 2018-05-25 15:45:56

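Your question has two parts. First, reshape the data into a tidy format, with one row per path and its corresponding color. Store the result back in df, because the next steps use it:

library(dplyr)  # provides %>%, count(), arrange(), inner_join() and distinct()
(df <- df %>% tidyr::separate_rows(Paths, Colors, sep = "\\|"))  # the outer () also prints df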

    Names Symbol GeneID Description Paths Colors
1   IL-1    CP1   3553    Receptor Path1  Green
2   IL-1    CP1   3553    Receptor Path2   Blue
3   IL-1    CP1   3553    Receptor Path5   Pink
...
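Because the question prefers base R, here is a sketch of the same reshaping without tidyr. separate_paths() is a hypothetical helper, not part of any package; it assumes Paths and Colors are character columns with the same number of |-separated entries in each row, and it is shown on three rows of the question's df:

```r
# Three rows of the question's df (other columns omitted for brevity)
df <- data.frame(
  Names  = c("IL-1", "PAI1", "TGF"),
  Paths  = c("Path1|Path2|Path5", "Path1", "Path5|Path3"),
  Colors = c("Green|Blue|Pink", "Green", "Pink|Red"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: base R stand-in for separate_rows(df, Paths, Colors, sep = "\\|")
separate_paths <- function(df) {
  paths  <- strsplit(df$Paths,  "|", fixed = TRUE)
  colors <- strsplit(df$Colors, "|", fixed = TRUE)
  # Each row must have as many colors as paths, otherwise the pairing is ambiguous
  stopifnot(identical(lengths(paths), lengths(colors)))
  # Repeat each original row once per path, then overwrite the two split columns
  long <- df[rep(seq_len(nrow(df)), lengths(paths)), , drop = FALSE]
  long$Paths  <- unlist(paths)
  long$Colors <- unlist(colors)
  rownames(long) <- NULL
  long
}

long <- separate_paths(df)
long
```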

Next, count how often each path occurs, most frequent first:

counts <- df %>% count(Paths) %>% arrange(desc(n))

# A tibble: 6 x 2
  Paths     n
  <chr> <int>
1 Path1     7
2 Path5     6
3 Path3     5
4 Path2     4
5 Path4     3
6 Path6     1

Finally, join the corresponding colors back on and drop the n column. Here is one way:

counts %>% inner_join(df) %>% distinct(Paths, Colors)

Joining, by = "Paths"
# A tibble: 6 x 2
  Paths Colors
  <chr> <chr> 
1 Path1 Green 
2 Path5 Pink  
3 Path3 Red   
4 Path2 Blue  
5 Path4 Yellow
6 Path6 Purple
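The join + distinct(Paths, Colors) step yields one row per path only because, in this data, every path always carries the same color; if a path ever appeared with two colors, distinct() would keep both pairs. A small base R check on long-format rows, using a hypothetical helper conflicting_paths() and a few rows taken from the question's data:

```r
# A few long-format rows (one path and its color per row), as produced by separate_rows()
long <- data.frame(
  Paths  = c("Path1", "Path2", "Path5", "Path1", "Path5", "Path3"),
  Colors = c("Green", "Blue",  "Pink",  "Green", "Pink",  "Red"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of paths that appear with more than one color
conflicting_paths <- function(long) {
  n_colors <- tapply(long$Colors, long$Paths, function(x) length(unique(x)))
  names(n_colors)[n_colors > 1]
}

ok <- conflicting_paths(long)   # character(0): every path has a single color

# Add a row that pairs Path1 with a second color
bad <- rbind(long, data.frame(Paths = "Path1", Colors = "Red", stringsAsFactors = FALSE))
clash <- conflicting_paths(bad) # "Path1"
```

If the check returns any names, decide how to resolve them (for example, keep the most frequent color per path) before relying on distinct() for a one-row-per-path result.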

Votes: 0

Page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/50511944
