
Stress attribute -- sklearn.manifold.MDS / Python

Stack Overflow user

Asked 2016-04-05 13:45:24

Answers: 1 · Views: 3.4K · Followers: 0 · Votes: 8

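I am using scikit-learn's MDS to perform dimensionality reduction on some data. I want to check the stress value to assess the quality of the reduction, and I expected it to fall between 0 and 1. However, I am getting values outside that range. Here is a minimal example: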

# %matplotlib inline  (Jupyter notebook magic; omit when running as a plain script)

from sklearn import manifold
from matplotlib import pyplot as plt

import numpy


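def similarity_measure(vec1, vec2):
    # vec1 and vec2 are (x_values, y_values) pairs of 1-D arrays.
    # "x" component: element-wise angle of each (x, y) point.
    vec1_x = numpy.arctan2(vec1[1], vec1[0])
    vec2_x = numpy.arctan2(vec2[1], vec2[0])
    # "y" component: a single scalar magnitude over all (x, y) points.
    vec1_y = numpy.sqrt(numpy.sum(vec1[0] * vec1[0] + vec1[1] * vec1[1]))
    vec2_y = numpy.sqrt(numpy.sum(vec2[0] * vec2[0] + vec2[1] * vec2[1]))

    # Cosine similarity of the (angle, magnitude) representations, in [-1, 1].
    dot  = numpy.sum(vec1_x * vec2_x + vec1_y * vec2_y)
    mag1 = numpy.sqrt(numpy.sum(vec1_x * vec1_x + vec1_y * vec1_y))
    mag2 = numpy.sqrt(numpy.sum(vec2_x * vec2_x + vec2_y * vec2_y))
    return dot / (mag1 * mag2)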

plt.figure(figsize=(15, 15))

delta = numpy.zeros((100, 100))
data_x = numpy.random.randint(0, 100, (100, 100))
data_y = numpy.random.randint(0, 100, (100, 100))

for j in range(100):
    for k in range(100):
        if j <= k:
            dist = similarity_measure((data_x[j].flatten(), data_y[j].flatten()), (data_x[k].flatten(), data_y[k].flatten()))
            delta[j, k] = delta[k, j] = dist

# Map similarity in [-1, 1] to a dissimilarity in [0, 1], then rescale so the largest is 1.
delta = 1-((delta+1)/2)
delta /= numpy.max(delta)

mds = manifold.MDS(n_components=2, max_iter=3000, eps=1e-9, random_state=0,
               dissimilarity="precomputed", n_jobs=1)
coords = mds.fit(delta).embedding_
print(mds.stress_)

plt.scatter(coords[:, 0], coords[:, 1], marker='x', s=50)
plt.tight_layout()

In my test, this printed:

263.412196461

and produced a scatter plot of the 2-D embedding.

Without knowing the maximum possible value, how can I interpret this number? Or how can I normalize it so that it lies between 0 and 1?

Thanks.
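A rough way to read the printed 263.41, assuming `stress_` is scikit-learn's raw stress as reconstructed in the answer (half the sum of squared residuals over the full symmetric matrix, which equals the sum over unordered pairs). Here d_ij is the distance between points i and j in the 2-D embedding and δ_ij is the precomputed dissimilarity, already rescaled so the largest is 1. With 100 points there are 4950 pairs, so the value corresponds to a root-mean-square residual of roughly 0.23 per pair. Because the raw stress grows with the number of pairs and with the square of the dissimilarity scale, it has no fixed upper bound.

```latex
\begin{aligned}
\sigma_{\text{raw}} &= \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(d_{ij}-\delta_{ij}\right)^2
                     = \sum_{i<j}\left(d_{ij}-\delta_{ij}\right)^2 \\
\binom{100}{2} &= 4950 \\
\sqrt{263.41 / 4950} &\approx \sqrt{0.0532} \approx 0.231
\end{aligned}
```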

Tags: scikit-learn, stress-testing, mds, python, machine-learning

1 Answer

Stack Overflow user

Answered 2020-10-08 22:29:31

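While searching for how to compute Kruskal's stress, I found a French-language course by Rakotomalala. It includes a code example that appears to compute Kruskal's stress correctly: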

import pandas
import numpy
from sklearn import manifold
from sklearn.metrics import euclidean_distances

## Input data format (file.csv) : dissimilarity matrix
#   ;  A  ;  B  ;  C  ;  D  ; E
# A ; 0   ; 0.9 ; 0.8 ; 0.5 ; 0.8
# B ; 0.9 ; 0   ; 0.7 ; 0   ; 1
# C ; 0.8 ; 0.7 ; 0   ; 0.2 ; 0.4
# D ; 0.5 ; 0   ; 0.2 ; 0   ; 0.8
# E ; 0.8 ; 1   ; 0.4 ; 0.8 ; 0


## Load data (';'-separated file, first row = column labels, first column = row labels)
data = pandas.read_csv("file.csv", sep=";", header=0, index_col=0)

## MDS
mds = manifold.MDS(n_components=2, random_state=1, dissimilarity="precomputed")
mds.fit(data)
# Coordinates of points in the plan (n_components=2)
points = mds.embedding_

## sklearn Stress
print("sklearn stress :")
print(mds.stress_)
print("")

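## Manual calculation of sklearn stress
## (0.5 * sum over the full symmetric matrix = sum over unique pairs; may differ
##  slightly from mds.stress_, which sklearn records just before its last update)
DE = euclidean_distances(points)
stress = 0.5 * numpy.sum((DE - data.values)**2)
print("Manual calculation of sklearn stress :")
print(stress)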
print("")

## Kruskal's stress (or stress formula 1)
stress1 = numpy.sqrt(stress / (0.5 * numpy.sum(data.values**2)))
print("Kruskal's Stress :")
print("[Poor > 0.2 > Fair > 0.1 > Good > 0.05 > Excellent > 0.025 > Perfect > 0.0]")
print(stress1)
print("")
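The answer's two formulas can be wrapped into reusable helpers. This is a minimal sketch under stated assumptions: the names `raw_stress` and `kruskal_stress1` are hypothetical, the input is a symmetric dissimilarity matrix with a zero diagonal, and the demo embedding is a plain 2-D projection of random data standing in for `mds.embedding_`. It also illustrates why Stress-1 is easier to interpret than the raw value: multiplying all dissimilarities and coordinates by 10 multiplies the raw stress by 100 but leaves Stress-1 unchanged, and collapsing every point onto one location gives Stress-1 = 1 exactly, so any embedding worth keeping should score well below 1. Kruskal's formula as usually stated divides by the squared embedding distances rather than the squared dissimilarities; the helper follows the answer's normalization.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances


def raw_stress(dissimilarities, embedding):
    """Half the sum of squared residuals over the full symmetric matrix,
    i.e. the sum over unordered pairs (the answer's manual calculation)."""
    dissimilarities = np.asarray(dissimilarities, dtype=float)
    distances = euclidean_distances(embedding)
    return 0.5 * np.sum((distances - dissimilarities) ** 2)


def kruskal_stress1(dissimilarities, embedding):
    """Stress-1 as normalized in the answer: sqrt(raw / (0.5 * sum(delta**2)))."""
    dissimilarities = np.asarray(dissimilarities, dtype=float)
    denominator = 0.5 * np.sum(dissimilarities ** 2)
    return np.sqrt(raw_stress(dissimilarities, embedding) / denominator)


# Hypothetical demo data: 20 random points in 5-D and their pairwise distances.
rng = np.random.default_rng(0)
points = rng.random((20, 5))
delta = euclidean_distances(points)

# Stand-in for mds.embedding_: keep only the first two coordinates.
coords = points[:, :2]

print("raw stress:", raw_stress(delta, coords))
print("Stress-1:  ", kruskal_stress1(delta, coords))

# Scaling everything by 10 multiplies the raw stress by 100 ...
print("raw stress (x10):", raw_stress(10 * delta, 10 * coords))
# ... but leaves Stress-1 unchanged.
print("Stress-1 (x10):  ", kruskal_stress1(10 * delta, 10 * coords))

# Collapsing all points onto a single location gives Stress-1 == 1.
print("collapsed:", kruskal_stress1(delta, np.zeros((20, 2))))
```

With the answer's own variables, `kruskal_stress1(data.values, mds.embedding_)` computes the same quantity as its `stress1`.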

Votes: 1

Page content originally provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/36428205
