
Q: ScikitLearn, how to use Locally Linear Embedding on an external dataset

Stack Overflow user

Asked on 2021-01-08 20:44:55

1 answer · 119 views · 0 followers · 0 votes

Using the following pages:

https://scikit-learn.org/stable/auto_examples/manifold/plot_lle_digits.html#sphx-glr-auto-examples-manifold-plot-lle-digits-py
https://scikit-learn.org/stable/auto_examples/manifold/plot_swissroll.html#sphx-glr-auto-examples-manifold-plot-swissroll-py

I managed to get LLE working on the MNIST dataset and on the Swiss roll dataset, but I can't figure out how to make it run on an external dataset such as https://www.kaggle.com/manufacturingai/predicting-fraud-w-fast-ai.

Here is my attempt:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
from matplotlib import offsetbox
from sklearn import (manifold, datasets)

n_neighbors = 30
f_fontsize = 8
data = np.genfromtxt('../content/creditcard.csv', skip_header=True)
features = data[:, :3]
targets = data[:, 3]   # The last column is identified as the target

def plotcreditfraudfig(X, color, X_sr, err):

  fig = plt.figure()

  ax = fig.add_subplot(211, projection='3d')
  ax.scatter(X[:, 0], X[:, 1], X[:, 2],cmap=plt.cm.Spectral)

  ax.set_title("Original data")
  ax = fig.add_subplot(212)
  ax.scatter(X_sr[:, 0], X_sr[:, 1],cmap=plt.cm.Spectral)
  plt.axis('tight')
  plt.xticks([]), plt.yticks([])
  plt.title('Projected data')
  plt.show()

clf = manifold.LocallyLinearEmbedding(n_neighbors=n_neighbors, n_components=2, method='standard')
clf.fit(X=features, y=targets)

print("Done. Reconstruction error: %g" %clf.reconstruction_error_)

X_llecf=clf.transform(X)
plot_embedding(X_llecf, "Locally Linear Embedding")

---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

<ipython-input-106-91224a1ba194> in <module>()
      1 data = np.genfromtxt('../content/creditcard.csv', skip_header=True)
----> 2 features = data[:, :3]
      3 targets = data[:, 3]   # The last column is identified as the target
      4 
      5 clf = manifold.LocallyLinearEmbedding(n_neighbors=n_neighbors, n_components=2, method='standard')

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
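The error says `data` is 1-D. A likely cause (an assumption, since only the traceback is shown) is that `np.genfromtxt` splits on whitespace by default, so each comma-separated row is read as one field that cannot be parsed as a float; with the default `loose=True` it becomes `nan`, giving a 1-D array with one `nan` per row. The sketch below reproduces this on a tiny in-memory CSV (the column names and values are hypothetical) and shows that `delimiter=","` yields a 2-D array. Note that `genfromtxt` does not strip quotes, so if the real file quotes any values, `pd.read_csv` is the more robust reader.

```python
import io

import numpy as np

# Tiny stand-in for creditcard.csv (hypothetical columns and values).
csv_text = "V1,V2,Amount,Class\n0.1,0.2,10.0,0\n0.3,0.4,20.0,1\n0.5,0.6,30.0,0\n"

# Default delimiter=None splits on whitespace: each row is one unparsable field -> nan.
wrong = np.genfromtxt(io.StringIO(csv_text), skip_header=1)
print(wrong.ndim, wrong)

# Declaring the file comma-separated gives one column per field.
right = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(right.shape)

features = right[:, :-1]  # every column except the last
targets = right[:, -1]    # the last column is the label
print(features.shape, targets)
```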

Tags: python, scikit-learn, dimensionality-reduction

1 answer

Stack Overflow user

Accepted answer

Answered on 2021-01-11 23:16:31

I solved it by changing the features and targets to:

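data = pd.read_csv('../content/creditcard.csv')  # a pandas DataFrame, not the NumPy array from genfromtxt
X_features = data.drop('Class', axis=1)  # all columns except the label
y_targets = data['Class']                # the 'Class' label column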
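`data.drop('Class', axis=1)` and `data['Class']` only work when `data` is a pandas DataFrame, for example one returned by `pd.read_csv`. Below is a minimal end-to-end sketch under that assumption, using a small synthetic DataFrame in place of creditcard.csv; the feature column names, row counts and `n_neighbors=10` are hypothetical. LLE builds a neighbor graph and solves an eigenproblem over all samples, so on a large file it can be slow and memory-hungry; the sketch therefore fits on a random subsample via `DataFrame.sample` and uses `eigen_solver="dense"`, which is reasonable only for a few thousand rows at most.

```python
import numpy as np
import pandas as pd
from sklearn import manifold

# Synthetic stand-in for pd.read_csv('../content/creditcard.csv'):
# hypothetical feature columns plus a 0/1 'Class' label column.
rng = np.random.default_rng(0)
n_rows = 600
data = pd.DataFrame(rng.normal(size=(n_rows, 4)), columns=["V1", "V2", "V3", "Amount"])
data["Class"] = rng.integers(0, 2, size=n_rows)

# Fit on a random subsample to keep the neighbor graph and eigenproblem small.
sample = data.sample(n=300, random_state=0)

X_features = sample.drop("Class", axis=1).to_numpy(dtype=np.float64)  # (n_samples, n_features)
y_targets = sample["Class"].to_numpy()  # not used by LLE; useful for coloring a plot

lle = manifold.LocallyLinearEmbedding(
    n_neighbors=10, n_components=2, method="standard", eigen_solver="dense"
)
X_lle = lle.fit_transform(X_features)  # one 2-D point per sampled row

print(X_lle.shape, "reconstruction error: %g" % lle.reconstruction_error_)
```

To plot the result, `X_lle[:, 0]` and `X_lle[:, 1]` can be passed to `plt.scatter` with `c=y_targets`.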

But I had to do more than that: because the matrix was not positive semi-definite, I had to clean some rows before defining X_features and y_targets:

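import numpy as np
import pandas as pd

def clean_dataset(df):
    """Drop rows containing NaN, inf or -inf and cast all columns to float64."""
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df = df.dropna()  # drop NaN rows without mutating the caller's DataFrame
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)  # also drop +/-inf rows
    return df[indices_to_keep].astype(np.float64)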
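An equivalent approach, assuming every column is numeric, keeps only the rows in which all values are finite, since `np.isfinite` is False for NaN, inf and -inf. The helper name `keep_finite_rows` and the small DataFrame below are hypothetical, for illustration only.

```python
import numpy as np
import pandas as pd


def keep_finite_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast to float64 and drop any row containing NaN, inf or -inf."""
    numeric = df.astype(np.float64)                      # raises if a column is not numeric
    mask = np.isfinite(numeric.to_numpy()).all(axis=1)   # True where the whole row is finite
    return numeric[mask]


df = pd.DataFrame({
    "V1": [0.1, np.nan, 0.3, 0.4],
    "Amount": [10.0, 20.0, np.inf, 40.0],
    "Class": [0, 1, 0, 1],
})
cleaned = keep_finite_rows(df)
print(cleaned)
```

Here rows 1 (NaN) and 2 (inf) are dropped, and the original index labels of the kept rows are preserved.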

Votes: 0

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/65629201
