I have a data.frame with 2 columns, where the values in the second column are repeated. For example:
HUGO Cell
1 CD28 T cells
2 CD3D T cells
3 CD3G T cells
4 CD8A lymphocytes
5 EOMES lymphocytes
6 FGFBP2 lymphocytes
7 GNLY lymphocytes
8 NCR1 NK cells
9 PTGDR NK cells
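10 SH2D1B NK cells
I want all the values in column HUGO that correspond to each unique name in column Cell to be collected into a named list, listed after that unique name.
For example: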
T cells: CD28 CD3D CD3G
lymphocytes: CD8A EOMES FGFBP2 GNLY
...
I tried
reshape(data.frame, timevar = "HUGO", idvar = "Cell", direction = "wide"),
but it only returns the number of values for each name in the Cell column.
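Posted 2017-10-26 14:01:16
Here are some possibilities, depending on what you want. The first 5 use no packages.
1) aggregate/c -- This gives a data frame whose second column holds the character vector of HUGO names for each Cell.
aggregate(HUGO ~ Cell, DF, c)
giving: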
Cell HUGO
1 lymphocytes CD8A, EOMES, FGFBP2, GNLY
2 NK cells NCR1, PTGDR, SH2D1B
3 T cells CD28, CD3D, CD3G
2) aggregate/toString -- This gives a data frame whose second column contains, for each Cell, a single string of comma-separated HUGO names.
aggregate(HUGO ~ Cell, DF, toString)
giving:
Cell HUGO
1 lymphocytes CD8A, EOMES, FGFBP2, GNLY
2 NK cells NCR1, PTGDR, SH2D1B
3 T cells CD28, CD3D, CD3G
3) unstack -- This gives a list with one component per Cell, each component being the HUGO names for that Cell.
unstack(DF)
giving:
$lymphocytes
[1] "CD8A" "EOMES" "FGFBP2" "GNLY"
$`NK cells`
[1] "NCR1" "PTGDR" "SH2D1B"
$`T cells`
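[1] "CD28" "CD3D" "CD3G"
4) tapply -- This gives a matrix whose rows are the Cells and whose columns are the sequence numbers of the HUGO names within each Cell.
DF2 <- transform(DF, seq = ave(seq_along(HUGO), Cell, FUN = seq_along))
tapply(DF2$HUGO, DF2[-1], c)
giving: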
seq
Cell 1 2 3 4
lymphocytes "CD8A" "EOMES" "FGFBP2" "GNLY"
NK cells "NCR1" "PTGDR" "SH2D1B" NA
T cells "CD28" "CD3D" "CD3G" NA
5) reshape -- Using DF2 from the previous option together with reshape gives a data frame:
reshape(DF2, timevar = "seq", idvar = "Cell", dir = "wide")
giving:
Cell HUGO.1 HUGO.2 HUGO.3 HUGO.4
1 T cells CD28 CD3D CD3G <NA>
4 lymphocytes CD8A EOMES FGFBP2 GNLY
8 NK cells NCR1 PTGDR SH2D1B <NA>
6) spread -- This gives an object of class "tbl_df" as output (a subclass of "data.frame").
library(dplyr)
library(tidyr)
DF %>%
group_by(Cell) %>%
mutate(seq = 1:n()) %>%
ungroup() %>%
spread(seq, HUGO)
giving:
Cell 1 2 3 4
1 lymphocytes CD8A EOMES FGFBP2 GNLY
2 NK cells NCR1 PTGDR SH2D1B <NA>
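3 T cells CD28 CD3D CD3G <NA>
7) read.zoo -- read.zoo gives a zoo object with Cell as the time index.
Since the times are really character strings, we use FUN = identity to avoid having them interpreted. fortify.zoo converts the result to a data frame. DF2 is from above.
library(zoo)
fortify.zoo(read.zoo(DF2, split = "seq", index = "Cell", FUN = identity))
giving: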
Index 1 2 3 4
1 lymphocytes CD8A EOMES FGFBP2 GNLY
2 NK cells NCR1 PTGDR SH2D1B <NA>
3 T cells CD28 CD3D CD3G <NA>
8) dcast -- This gives a data.table as output.
library(data.table)
DT <- data.table(DF)
DT[, seq:=1:.N, by = Cell]
dcast(DT, Cell ~ seq, value.var = "HUGO")
giving:
Cell 1 2 3 4
1: NK cells NCR1 PTGDR SH2D1B NA
2: T cells CD28 CD3D CD3G NA
3: lymphocytes CD8A EOMES FGFBP2 GNLY
Note: the input DF used above, in reproducible form:
DF <- structure(list(HUGO = c("CD28", "CD3D", "CD3G", "CD8A", "EOMES",
"FGFBP2", "GNLY", "NCR1", "PTGDR", "SH2D1B"), Cell = c("T cells",
"T cells", "T cells", "lymphocytes", "lymphocytes", "lymphocytes",
"lymphocytes", "NK cells", "NK cells", "NK cells")), .Names = c("HUGO",
"Cell"), class = "data.frame", row.names = c(NA, -10L))
https://stackoverflow.com/questions/46955522
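The question asks for output of the form "T cells: CD28 CD3D CD3G", which none of the options prints directly. The following is a minimal base R sketch of one way to get there, not part of the original answer. It rebuilds the example input so it runs on its own, groups HUGO by Cell with split() (which yields the same kind of named list as unstack(DF) in option 3), and pastes each group into one string. The names hugo_by_cell and lines_out are illustrative, and fixing the factor levels to order of first appearance is an assumption made here so the result does not depend on the locale's sort order.

```r
# Rebuild the example input so this block is self-contained
DF <- data.frame(
  HUGO = c("CD28", "CD3D", "CD3G", "CD8A", "EOMES",
           "FGFBP2", "GNLY", "NCR1", "PTGDR", "SH2D1B"),
  Cell = c("T cells", "T cells", "T cells", "lymphocytes", "lymphocytes",
           "lymphocytes", "lymphocytes", "NK cells", "NK cells", "NK cells"),
  stringsAsFactors = FALSE
)

# Named list with one character vector of HUGO names per unique Cell.
# Levels follow order of first appearance (an assumption of this sketch).
hugo_by_cell <- split(DF$HUGO, factor(DF$Cell, levels = unique(DF$Cell)))

# Collapse each group into a single "name: values" string
lines_out <- paste0(names(hugo_by_cell), ": ",
                    vapply(hugo_by_cell, paste, character(1), collapse = " "))

writeLines(lines_out)
```

If a list is all that is needed, hugo_by_cell can be used as is; lines_out is only for display in the "name: values" layout.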