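I want to get a Mahalanobis distance for the two scores in each group, after grouping by another variable. In this case, there would be one Mahalanobis distance per Attribute (across the 2 scores in each group). The output should be 3 Mahalanobis distances (one each for A, B and C).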
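For reference, this is the standard definition of the quantity that R's mahalanobis() computes. It is the squared distance of each observation (row) $x_i$ from a centre $\mu$ under a covariance matrix $\Sigma$, so it produces one value per row, not one per group:

```latex
D_i^2 = (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu)
```

If the rows are multivariate normal with $p$ variables (here $p = 2$, Score1 and Score2), $D_i^2$ approximately follows a $\chi^2_p$ distribution when $\mu$ and $\Sigma$ are estimated from the data. This is why chi-square p-values with 2 degrees of freedom are a common follow-up.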
Currently I'm using the following (my original data frame has some NAs, so I included one in the reprex):
library(tidyverse)
library(purrr)
df <- tibble(Attribute = unlist(map(LETTERS[1:3], rep, 5)),
Score1 = c(runif(7), NA, runif(7)),
Score2 = runif(15))
mah_db <- df %>%
dplyr::group_by(Attribute) %>%
dplyr::summarise(MAH = mahalanobis(Score1:Score2,
center = base::colMeans(Score1:Score2),
cov(Score1:Score2, use = "pairwise.complete.obs")))
This throws an error:
Caused by error in `base::colMeans()`:
! 'x' must be an array of at least two dimensions
But as far as I can tell, I am giving colMeans two columns.
So what is going wrong here? And even once this is fixed, would it give the complete solution?
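The cause of the error can be reproduced in base R alone. Inside dplyr::summarise(), Score1 and Score2 evaluate to plain numeric vectors, so Score1:Score2 is R's `:` sequence operator and not a tidyselect column range. With the real columns, R uses only the first element of each vector and warns about it. The result is an ordinary vector with no dimensions, which colMeans() rejects. This sketch uses made-up numbers: score1, score2 and m are illustrative names and do not come from the question. cbind() builds the two-column matrix that colMeans(), cov() and mahalanobis() expect. Even with that fix, mahalanobis() still returns one value per row, not one per group.

```r
# Made-up values standing in for the first Score1 / Score2 elements.
score1 <- 0.25
score2 <- 3.9

# `:` builds a sequence: 0.25 1.25 2.25 3.25, a plain vector with no dim.
seq_vec <- score1:score2
print(seq_vec)

# colMeans() requires at least two dimensions, so this errors; capture the message.
msg <- tryCatch(colMeans(seq_vec), error = function(e) conditionMessage(e))
print(msg)

# cbind() gives a proper two-column matrix, which colMeans() accepts.
m <- cbind(Score1 = c(0.1, 0.4, 0.7), Score2 = c(0.2, 0.8, 0.5))
print(colMeans(m))
```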
Posted on 2022-05-13 11:22:00
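It seems your question is more about the statistics than about dplyr. So I'll only give a small example based on your data, adapted from the example in ?mahalanobis. You may also want to look here or here.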
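df0 <- df  # keep the question's data; df is reused for each group below
# Group A: keep only the two score columns
df <- subset(x = df0, Attribute == "A", select = c("Score1", "Score2"))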
df$mahalanobis <- mahalanobis(x = df, center = colMeans(df), cov = cov(df))
df$p <- pchisq(q = df$mahalanobis, df = 2, lower.tail = FALSE)
plot(density(df$mahalanobis, bw = 0.3), ylim = c(0, 0.8),
     main = "Squared Mahalanobis distances")
grid()
rug(df$mahalanobis)
# Group B: drop the row with the NA before computing means and covariance
df <- subset(x = df0, Attribute == "B", select = c("Score1", "Score2"))
df <- df[complete.cases(df), ]
df$mahalanobis <- mahalanobis(x = df, center = colMeans(df), cov = cov(df))
df$p <- pchisq(q = df$mahalanobis, df = 2, lower.tail = FALSE)
lines(density(df$mahalanobis, bw = 0.3), col = "red")
rug(df$mahalanobis, col = "red")
# Group C
df <- subset(x = df0, Attribute == "C", select = c("Score1", "Score2"))
df$mahalanobis <- mahalanobis(x = df, center = colMeans(df), cov = cov(df))
df$p <- pchisq(q = df$mahalanobis, df = 2, lower.tail = FALSE)
lines(density(df$mahalanobis, bw = 0.3), col = "green")
rug(df$mahalanobis, col = "green")
Hope this helps (it was too long for a comment).
(Of course, the code could be made shorter, but this version shows what happens at each step.)
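A shorter version could look like the minimal base-R sketch below. It assumes a data frame with the question's columns (Attribute, Score1, Score2) and applies the same steps to every group through split() and lapply(). Each group drops its incomplete rows, computes mahalanobis() with its own means and covariance, and converts the result to chi-square p-values with df equal to the number of score columns. The data are regenerated with set.seed(1), which the original reprex does not use. mahalanobis_by_group is a hypothetical helper name, not a library function.

```r
# Reproducible stand-in for the question's reprex (set.seed is an addition).
set.seed(1)
df0 <- data.frame(Attribute = rep(LETTERS[1:3], each = 5),
                  Score1 = c(runif(7), NA, runif(7)),
                  Score2 = runif(15))

# Hypothetical helper: squared Mahalanobis distances and p-values for one group.
mahalanobis_by_group <- function(d) {
  x <- d[complete.cases(d[, c("Score1", "Score2")]), ]  # drop rows with NA
  m <- as.matrix(x[, c("Score1", "Score2")])
  x$mahalanobis <- mahalanobis(m, center = colMeans(m), cov = cov(m))
  x$p <- pchisq(x$mahalanobis, df = ncol(m), lower.tail = FALSE)
  x
}

# Apply the helper to each Attribute group and stack the results.
res <- do.call(rbind, lapply(split(df0, df0$Attribute), mahalanobis_by_group))
rownames(res) <- NULL
print(res)
```

With dplyr, the same helper should also work per group as `df0 %>% group_by(Attribute) %>% group_modify(~ mahalanobis_by_group(.x))`. Note that this still gives one distance per row. Averaging them to get one number per group is not informative. When the centre and covariance are estimated from the same n rows, the squared distances in a group always sum to (n - 1) * p. Here that is 8 for A and C and 6 for B.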
https://stackoverflow.com/questions/72226396