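I have a large dataset with more than 20 columns and more than 2000 rows. I would like to know how many times different variables occur together. It would be even better if I could make a heat map from it (a co-occurrence heat map or a correlation heat map). However, I am not sure whether this can be done with dummy/binary variables. Any suggestions?
I need to convert this example dataset (x)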
  A B C D E F
1 0 1 1 1 1 0
2 0 1 1 0 0 1
3 1 0 0 0 1 0
4 0 0 1 1 1 1
5 0 0 1 1 0 0

into something like this:
  A B C D E F
A 0 0 0 0 1 0
B 0 0 2 1 1 1
C 0 2 0 3 2 2
D 0 1 3 0 2 1
E 1 1 2 2 0 1
F 0 1 2 1 1 0

Answered on 2018-02-06 03:20:37
Given a matrix X, we have
(A <- t(X) %*% X)
#   A B C D E F
# A 1 0 0 0 1 0
# B 0 2 2 1 1 1
# C 0 2 4 3 2 2
# D 0 1 3 3 2 1
# E 1 1 2 2 3 1
# F 0 1 2 1 1 2

If you want the diagonal to contain zeros, add diag(A) <- 0. The heat map can then be drawn with, for example,
heatmap(A, Rowv = NA, Colv = NA)

Answered on 2018-02-06 03:22:37
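# A is the data frame given under DATA below.
# For each pair of columns, count the rows where both are 1.
temp = sapply(colnames(A), function(x)
    sapply(colnames(A), function(y)
        sum(rowSums(A[,c(x, y)]) == 2)))
diag(temp) = 0  # drop each column's co-occurrence with itself
temp
#  A B C D E F
#A 0 0 0 0 1 0
#B 0 0 2 1 1 1
#C 0 2 0 3 2 2
#D 0 1 3 0 2 1
#E 1 1 2 2 0 1
#F 0 1 2 1 1 0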
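As a cross-check on the nested sapply loop above, the same counts can be obtained with a single matrix product. The sketch below is illustrative only: the names dat, loop_counts and prod_counts are assumptions, and the data frame is retyped from the question's example. crossprod(m) is base R's equivalent of t(m) %*% m, and as.matrix() is used because the matrix product expects a numeric matrix rather than a data frame.

```r
# Example data from the question, as a data frame of 0/1 indicators
dat <- data.frame(
  A = c(0, 0, 1, 0, 0),
  B = c(1, 1, 0, 0, 0),
  C = c(1, 1, 0, 1, 1),
  D = c(1, 0, 0, 1, 1),
  E = c(1, 0, 1, 1, 0),
  F = c(0, 1, 0, 1, 0)
)

# Pairwise loop: for every pair of columns, count rows where both equal 1
loop_counts <- sapply(colnames(dat), function(i)
  sapply(colnames(dat), function(j) sum(dat[[i]] == 1 & dat[[j]] == 1)))

# Matrix product: convert to a numeric matrix, then compute t(m) %*% m
prod_counts <- crossprod(as.matrix(dat))

# Compare the two tables (the diagonal holds each column's own total)
all.equal(loop_counts, prod_counts, check.attributes = FALSE)
```

On a dataset with 20 columns and a few thousand rows either approach is cheap, but the matrix product avoids the explicit double loop. Set diag(prod_counts) <- 0 afterwards if only pairs are of interest.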
library(reshape2)
library(ggplot2)
df1 = melt(temp)  # long format with columns Var1, Var2, value
graphics.off()    # close any open graphics devices
ggplot(df1, aes(x = Var1, y = Var2, fill = value)) +
    geom_tile() +
    theme_classic()
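The question also mentions a correlation heat map. A minimal sketch, assuming plain Pearson correlation is acceptable: cor() works directly on 0/1 columns, and for two binary variables the Pearson correlation is the phi coefficient. The names x, phi and long are illustrative, and the matrix is retyped from the question's example.

```r
# Example data from the question (rows = observations, columns = variables)
x <- matrix(
  c(0, 1, 1, 1, 1, 0,
    0, 1, 1, 0, 0, 1,
    1, 0, 0, 0, 1, 0,
    0, 0, 1, 1, 1, 1,
    0, 0, 1, 1, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(1:5, LETTERS[1:6])
)

# Pearson correlation between the 0/1 columns (phi coefficient)
phi <- cor(x)

# Long format for plotting: one row per pair, columns Var1, Var2, value
long <- as.data.frame(as.table(phi), responseName = "value")
head(long)
```

The long data frame has the same column names as df1 above, so it can be passed to the same ggplot call. Because correlations range from -1 to 1, a diverging fill scale such as scale_fill_gradient2() may read better than the default.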

DATA
A = structure(list(A = c(0L, 0L, 1L, 0L, 0L), B = c(1L, 1L, 0L, 0L,
0L), C = c(1L, 1L, 0L, 1L, 1L), D = c(1L, 0L, 0L, 1L, 1L), E = c(1L,
0L, 1L, 1L, 0L), F = c(0L, 1L, 0L, 1L, 0L)), .Names = c("A",
"B", "C", "D", "E", "F"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))

https://stackoverflow.com/questions/48629718