
Pasting values in a data frame based on duplicated values in another column in R

Stack Overflow user
Asked 2018-07-29 10:32:03
2 answers · 736 views · 0 followers · 1 vote

I have the following data frame:

dt <- data.frame(genotype = c("X1", "X2", "X3", "X4", "X5", "X6", "X7",  "X8", "X1", "X2", "X3", "X4",
                              "X5", "X6", "X7",  "X8", "X1", "X2", "X3", "X4", "X5", "X6", "X7",  "X8"),
                 variable = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", 
                              "B", "B", "B", "B", "C", "C", "C", "C", "C", "C", "C", "C"), 
                 value = c(1L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 3L,  3L, 4L, 5L, 5L,
                           1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L), stringsAsFactors = FALSE)
dt
   genotype variable value
1        X1        A     1
2        X2        A     1
3        X3        A     2
4        X4        A     3
5        X5        A     4
6        X6        A     5
7        X7        A     6
8        X8        A     7
9        X1        B     1
10       X2        B     2
11       X3        B     3
12       X4        B     3
13       X5        B     3
14       X6        B     4
15       X7        B     5
16       X8        B     5
17       X1        C     1
18       X2        C     2
19       X3        C     3
20       X4        C     4
21       X5        C     5
22       X6        C     6
23       X7        C     7
24       X8        C     8

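I want to create a new column by pasting together the genotype values of all rows that share the same value within each variable.

The desired output is shown below.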

out <- data.frame(genotype = c("X1", "X2", "X3", "X4", "X5", "X6", "X7",  "X8", "X1", "X2", "X3", "X4",
                           "X5", "X6", "X7",  "X8", "X1", "X2", "X3", "X4", "X5", "X6", "X7",  "X8"),
              variable = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", 
                           "B", "B", "B", "B", "C", "C", "C", "C", "C", "C", "C", "C"), 
              value = c(1L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 3L, 3L, 4L, 5L, 
                        5L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L),
              lab = c("X1, X2", "X1, X2", "X3", "X4", "X5", "X6", "X7", "X8", "X1", 
                      "X2", "X3, X4, X5", "X3, X4, X5", "X3, X4, X5", "X6", "X7, X8", 
                      "X7, X8", "X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"), stringsAsFactors = FALSE)
out
   genotype variable value        lab
1        X1        A     1     X1, X2
2        X2        A     1     X1, X2
3        X3        A     2         X3
4        X4        A     3         X4
5        X5        A     4         X5
6        X6        A     5         X6
7        X7        A     6         X7
8        X8        A     7         X8
9        X1        B     1         X1
10       X2        B     2         X2
11       X3        B     3 X3, X4, X5
12       X4        B     3 X3, X4, X5
13       X5        B     3 X3, X4, X5
14       X6        B     4         X6
15       X7        B     5     X7, X8
16       X8        B     5     X7, X8
17       X1        C     1         X1
18       X2        C     2         X2
19       X3        C     3         X3
20       X4        C     4         X4
21       X5        C     5         X5
22       X6        C     6         X6
23       X7        C     7         X7
24       X8        C     8         X8

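I tried aggregate as shown below, but it returns only one row per (variable, value) group, so the duplicated rows are lost and I do not get the desired result.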

cons <- aggregate(. ~value+variable, data=dt,
                  function(x) paste(unique(x), collapse = ","))
cons
   value variable genotype
1      1        A    X1,X2
2      2        A       X3
3      3        A       X4
4      4        A       X5
5      5        A       X6
6      6        A       X7
7      7        A       X8
8      1        B       X1
9      2        B       X2
10     3        B X3,X4,X5
11     4        B       X6
12     5        B    X7,X8
13     1        C       X1
14     2        C       X2
15     3        C       X3
16     4        C       X4
17     5        C       X5
18     6        C       X6
19     7        C       X7
20     8        C       X8

How can I get the desired output, preferably in base R?

2 Answers

Stack Overflow user

Accepted answer

Posted 2018-07-29 10:38:55

You can do this easily with dplyr.

library(dplyr)

dt %>% group_by(variable, value) %>%
  mutate(lab = toString(genotype)) %>%
  as.data.frame()

#    genotype variable value        lab
# 1        X1        A     1     X1, X2
# 2        X2        A     1     X1, X2
# 3        X3        A     2         X3
# 4        X4        A     3         X4
# 5        X5        A     4         X5
# 6        X6        A     5         X6
# 7        X7        A     6         X7
# 8        X8        A     7         X8
# 9        X1        B     1         X1
# 10       X2        B     2         X2
# 11       X3        B     3 X3, X4, X5
# 12       X4        B     3 X3, X4, X5
# 13       X5        B     3 X3, X4, X5
# 14       X6        B     4         X6
# 15       X7        B     5     X7, X8
# 16       X8        B     5     X7, X8
# 17       X1        C     1         X1
# 18       X2        C     2         X2
# 19       X3        C     3         X3
# 20       X4        C     4         X4
# 21       X5        C     5         X5
# 22       X6        C     6         X6
# 23       X7        C     7         X7
# 24       X8        C     8         X8

Edit: as @markus suggested, this can also be done in base R using transform together with ave:

 transform(dt, lab = ave(genotype, variable, value, FUN = toString))
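To see why the base R one-liner works: ave() splits its first argument by the grouping vectors, applies FUN to each group, and writes the results back in the original row order, so the single string returned by toString() is recycled to every row of its group. Here is a minimal sketch on a smaller, made-up data frame (dt_small is hypothetical, not the asker's data), assuming genotype is a character column rather than a factor:

```r
# Hypothetical toy data: three genotypes measured for two variables
dt_small <- data.frame(
  genotype = c("X1", "X2", "X3", "X1", "X2", "X3"),
  variable = c("A", "A", "A", "B", "B", "B"),
  value    = c(1L, 1L, 2L, 1L, 2L, 2L),
  stringsAsFactors = FALSE  # ave() + toString() assumes a character column here
)

# One collapsed string per (variable, value) group, repeated on every row of the group
dt_small$lab <- ave(dt_small$genotype, dt_small$variable, dt_small$value,
                    FUN = toString)

dt_small$lab
# "X1, X2" "X1, X2" "X3" "X1" "X2, X3" "X2, X3"
```

The row count and row order of dt_small are unchanged, which is the difference from the asker's aggregate call.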
Votes: 7

Stack Overflow user

Posted 2018-07-29 10:54:53

There is nothing wrong with aggregate, as long as you use merge to bring the duplicated rows back.

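# Collapse genotypes into one string per (variable, value) group
res <- aggregate(genotype ~ variable + value, dt, paste, collapse = ", ")
# Merge back onto dt to restore one row per observation; [-3] drops the per-row genotype
res <- merge(dt, res, by = c("value", "variable"))[-3]
names(res)[3] <- "genotype"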

head(res, 15)
#   value variable   genotype
#1      1        A     X1, X2
#2      1        A     X1, X2
#3      1        B         X1
#4      1        C         X1
#5      2        A         X3
#6      2        B         X2
#7      2        C         X2
#8      3        A         X4
#9      3        B X3, X4, X5
#10     3        B X3, X4, X5
#11     3        B X3, X4, X5
#12     3        C         X3
#13     4        A         X5
#14     4        B         X6
#15     4        C         X4
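Note that the code above drops the per-row genotype column and leaves the rows sorted by the merge keys, so it does not exactly match the layout the question asks for. A possible variant that keeps genotype, names the new column lab, and restores the original row order is sketched below. The toy data frame dt_toy and the helper column row_id are hypothetical names introduced only for this illustration:

```r
# Hypothetical toy data with the same shape as the asker's dt
dt_toy <- data.frame(
  genotype = c("X1", "X2", "X3", "X1", "X2", "X3"),
  variable = c("A", "A", "A", "B", "B", "B"),
  value    = c(1L, 1L, 2L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Remember the original row order, because merge() sorts by the key columns
dt_toy$row_id <- seq_len(nrow(dt_toy))

# One collapsed string per (variable, value) group
labs <- aggregate(genotype ~ variable + value, data = dt_toy, FUN = toString)
names(labs)[names(labs) == "genotype"] <- "lab"

# Join the group labels back onto every original row
res2 <- merge(dt_toy, labs, by = c("variable", "value"))

# Restore the original order and column layout
res2 <- res2[order(res2$row_id), c("genotype", "variable", "value", "lab")]
rownames(res2) <- NULL
res2
```

Renaming the aggregated column before merging avoids the genotype.x / genotype.y suffixes and the positional indexing used above.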
Votes: 3
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/51579405
