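I'm trying to compute by hand the distances and weights that the R package kknn outputs. When the data are not scaled, I can reproduce the Euclidean distances and the inverse weights correctly, as follows: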
Euclidean distances
sqrt((6-8)^2 + (4-5)^2) = 2.236068
sqrt((6-3)^2 + (4-7)^2) = 4.242641
sqrt((6-7)^2 + (4-3)^2) = 1.414214
Inverse weights
1 / (2.236068 / 4.242641) = 1.897367
1 / (1.414214 / 4.242641) = 3.000000
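The hand arithmetic above can be checked in base R without loading kknn. This is only a minimal sketch: the names `train`, `query`, `d`, `nearest` and `w_inv` are illustrative, and dividing by the largest of the three distances simply mirrors the calculation written out above rather than any documented kknn rule.

```r
# Training points and the held-out query point from the question.
train <- data.frame(height = c(8, 3, 7), weight = c(5, 7, 3))
query <- c(height = 6, weight = 4)

# Euclidean distance from the query to every training row.
d <- sqrt((train$height - query[["height"]])^2 +
          (train$weight - query[["weight"]])^2)

# Inverse weights for the two nearest neighbours:
# 1 / (distance / largest distance).
nearest <- sort(d)[1:2]
w_inv <- 1 / (nearest / max(d))

print(round(d, 6))
print(round(w_inv, 6))
```

The two weights come out in nearest-first order, which is the same order kknn uses for its `D` and `W` columns.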
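I can't see how the rectangular weights are computed, because I get:
1/2 * 1 = 0.50
1/2 * 1 = 0.50
whereas the kknn package gives 1 and 1.
Finally, when the data are scaled, I have had no luck at all computing the distances and weights. Any help is much appreciated, as I'm trying to understand how the kknn package works.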
library(kknn)
training <- data.frame(class = c(1, 0, 1), height = c(8, 3, 7), weight = c(5, 7, 3))
training
holdouts <- data.frame(class = 1, height = 6, weight = 4)
holdouts
rectangular_no_scale <- kknn(class ~., training, holdouts, distance = 2, kernel = "rectangular", k = 2, scale = FALSE)
rectangular_no_scale[["D"]]
rectangular_no_scale[["W"]]
inversion_no_scale <- kknn(class ~., training, holdouts, distance = 2, kernel = "inv", k = 2, scale = FALSE)
inversion_no_scale[["D"]]
inversion_no_scale[["W"]]
rectangular_with_scale <- kknn(class ~., training, holdouts, distance = 2, kernel = "rectangular", k = 2, scale = TRUE)
rectangular_with_scale[["D"]]
rectangular_with_scale[["W"]]
inversion_with_scale <- kknn(class ~., training, holdouts, distance = 2, kernel = "inv", k = 2, scale = TRUE)
inversion_with_scale[["D"]]
inversion_with_scale[["W"]]
Answered 2021-01-22 21:50:01
The source code of kknn (just type kknn and press Return at the console) helps in understanding the calculations:
library(kknn)
training <- data.frame(class = c(1, 0, 1), height = c(8, 3, 7), weight = c(5, 7, 3))
training
#> class height weight
#> 1 1 8 5
#> 2 0 3 7
#> 3 1 7 3
holdouts <- data.frame(class = 1, height = 6, weight = 4)
holdouts
#> class height weight
#> 1 1 6 4
# Euclidean distance
d <- sqrt((training$height-holdouts$height)^2 +(training$weight-holdouts$weight)^2)
d <- d[order(d)]
d
#> [1] 1.414214 2.236068 4.242641
rectangular_no_scale <- kknn(class ~., training, holdouts, distance = 2, kernel = "rectangular", k = 2, scale = FALSE)
rectangular_no_scale[["D"]]
#> [1] 1.414214 2.236068
d[1:2]
#> [1] 1.414214 2.236068
rectangular_no_scale[["W"]]
#> [,1] [,2]
#> [1,] 1 1
#
# source code:
# if (kernel == "rectangular")
# W <- matrix(1, nrow = p, ncol = k)
# This is why you get 1,1 : weights are the same and not normalized
inversion_no_scale <- kknn(class ~., training, holdouts, distance = 2, kernel = "inv", k = 2, scale = FALSE)
inversion_no_scale[["D"]]
#> [1] 1.414214 2.236068
d[1:2]
#> [1] 1.414214 2.236068
inversion_no_scale[["W"]]
#> [,1] [,2]
#> [1,] 3 1.897367
#
# Source code :
# W <- D/maxdist
# if (kernel == "inv")
# W <- 1/W
max(d)/d[1:2]
#> [1] 3.000000 1.897367
rectangular_with_scale <- kknn(class ~., training, holdouts, distance = 2, kernel = "rectangular", k = 2, scale = TRUE)
height_sd <- sqrt(var(training$height))
weight_sd <- sqrt(var(training$weight))
training_scaled <- training
training_scaled$height <- training$height / height_sd
training_scaled$weight <- training$weight / weight_sd
holdouts_scaled <- holdouts
holdouts_scaled$height <- holdouts$height / height_sd
holdouts_scaled$weight <- holdouts$weight / weight_sd
rectangular_with_scale[["D"]]
#> [1] 0.6267832 0.9063270
d_scaled <- sqrt((training_scaled$height-holdouts_scaled$height)^2 +(training_scaled$weight-holdouts_scaled$weight)^2)
d_scaled <- d_scaled[order(d_scaled)]
d_scaled
#> [1] 0.6267832 0.9063270 1.8803495
rectangular_with_scale[["W"]]
#> [,1] [,2]
#> [1,] 1 1
# Same as before : 1,1
inversion_with_scale <- kknn(class ~., training, holdouts, distance = 2, kernel = "inv", k = 2, scale = TRUE)
inversion_with_scale[["D"]]
#> [1] 0.6267832 0.9063270
d_scaled[1:2]
#> [1] 0.6267832 0.9063270
inversion_with_scale[["W"]]
#> [,1] [,2]
#> [1,] 3 2.074692
max(d_scaled)/d_scaled[1:2]
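#> [1] 3.000000 2.074692
In summary, the rectangular kernel gives every neighbour the same weight, and no normalization is needed to find the k nearest neighbours, which is why the weights are simply set to 1.
Scaling just divides each column by its standard deviation; the calculation then proceeds exactly as before.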
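The whole recipe can be collected into one base-R function as a sanity check. This is a hedged sketch, not kknn's implementation: `manual_knn_weights` is a hypothetical helper, and normalising by the (k + 1)-th smallest distance is an assumption on my part. With three training rows and k = 2 it coincides with the `max(d)` used above, so it reproduces the numbers in this example either way.

```r
# Hypothetical helper (not part of kknn): recompute D and W by hand.
manual_knn_weights <- function(train, query, k = 2, scale = TRUE,
                               kernel = c("rectangular", "inv")) {
  kernel <- match.arg(kernel)
  train <- as.matrix(train)
  query <- as.numeric(query)
  stopifnot(nrow(train) > k)
  if (scale) {
    # Divide each column by its training standard deviation.
    sds <- apply(train, 2, sd)
    train <- sweep(train, 2, sds, "/")
    query <- query / sds
  }
  # Euclidean distance from the query to each training row, ascending.
  d <- sort(unname(sqrt(rowSums(sweep(train, 2, query, "-")^2))))
  D <- d[seq_len(k)]
  # Assumption: normalise by the (k + 1)-th smallest distance; here that
  # is the same value as max(d).
  maxdist <- d[k + 1]
  W <- if (kernel == "rectangular") rep(1, k) else 1 / (D / maxdist)
  list(D = D, W = W)
}

train <- data.frame(height = c(8, 3, 7), weight = c(5, 7, 3))
query <- c(height = 6, weight = 4)

res_raw    <- manual_knn_weights(train, query, scale = FALSE, kernel = "inv")
res_scaled <- manual_knn_weights(train, query, scale = TRUE,  kernel = "inv")
res_rect   <- manual_knn_weights(train, query, scale = TRUE,  kernel = "rectangular")

print(res_raw)
print(res_scaled)
print(res_rect)
```

Comparing `res_scaled$D` and `res_scaled$W` with `inversion_with_scale[["D"]]` and `inversion_with_scale[["W"]]` is a quick way to confirm that the scaling step is understood correctly.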
https://stackoverflow.com/questions/65816165