Question: Isolation Forest

Stack Overflow user

Asked 2017-07-06 14:20:48

1 answer · 2K views · 0 followers · 4 votes

I am currently using the IsolationForest method in Python to identify outliers in my dataset, but I don't fully understand the example on sklearn:

forest.html#sphx-glr-auto-examples-ensemble-plot-isolation-forest-py

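Specifically, what exactly is the plot showing us? The observations have already been defined as normal or outliers, so I assume the shading of the contour plot indicates whether an observation really is an outlier (e.g. do observations with higher anomaly scores lie in the darker shaded regions?).

Finally, how is the following part of the code actually used (in particular the y_pred variables)?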

# fit the model
clf = IsolationForest(max_samples=100, random_state=rng)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)

I'm guessing it is only provided for completeness, in case someone wants to print the output?

Thanks in advance for your help!

outliers

anomaly-detection

python

scikit-learn

1 Answer

Stack Overflow user

Accepted answer

Posted 2017-07-06 14:39:14

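For each observation, predict tells whether (+1) or not (-1) it should be considered an inlier according to the fitted model. In other words, -1 marks a predicted outlier.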

A simple example using the Iris data

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
data = load_iris()

X = data.data
y = data.target
# Synthetic "outliers": uniform noise with the same shape as the Iris feature matrix
X_outliers = rng.uniform(low=-4, high=4, size=(X.shape[0], X.shape[1]))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = IsolationForest(random_state=0)
clf.fit(X_train)

y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)

print(y_pred_test)
print(y_pred_outliers)

Result:

[-1 -1 -1 -1  1 -1  1  1  1  1  1  1  1  1  1  1  1  1  1 -1  1  1  1 -1
  1 -1 -1  1 -1  1  1  1  1  1  1  1 -1  1  1  1  1  1  1 -1  1]

[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1]

Interpretation:

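print(y_pred_test) returns both 1 and -1. This means that some samples of X_test are not outliers and some are (source).

On the other hand, print(y_pred_outliers) returns only -1. This means that all samples of X_outliers (150 in total, the same number of rows as the Iris data) are predicted to be outliers.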
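Because the predictions are just an array of +1 and -1 values, they can be used directly as a mask rather than only printed. The following is a minimal sketch of that idea on the same Iris split; the names outlier_mask, n_outliers, n_inliers and flagged_rows are illustrative and not part of the original answer.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

X = load_iris().data
X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)

clf = IsolationForest(random_state=0).fit(X_train)
y_pred_test = clf.predict(X_test)

# predict() labels inliers +1 and outliers -1, so a boolean mask separates them.
outlier_mask = y_pred_test == -1          # illustrative name
n_outliers = int(outlier_mask.sum())
n_inliers = int((~outlier_mask).sum())
flagged_rows = X_test[outlier_mask]       # the observations flagged as outliers

print(f"{n_outliers} flagged as outliers, {n_inliers} as inliers")
print(flagged_rows.shape)
```

The exact counts depend on the scikit-learn version and its default contamination setting, so they may not match the output printed above.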

Using your code

After your code, simply print y_pred_outliers:

# fit the model
clf = IsolationForest(max_samples=100, random_state=rng)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers) 

print(y_pred_outliers)
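On the question about the shaded contour: predict only gives the thresholded +1/-1 label, while decision_function returns a continuous score where lower values mean more abnormal. One way to get a shaded background like the one in the plot is to evaluate that score on a grid, as in the sketch below. The 2-D training data here is synthetic and chosen only for illustration, and the grid bounds and resolution are assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# Hypothetical 2-D training data: two Gaussian clusters around (2, 2) and (-2, -2).
X = 0.3 * rng.randn(100, 2)
X_train = np.r_[X + 2, X - 2]

clf = IsolationForest(max_samples=100, random_state=rng).fit(X_train)

# Evaluate the continuous anomaly score on a grid covering the plane.
xx, yy = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(-5, 5, 50))
grid = np.c_[xx.ravel(), yy.ravel()]
Z = clf.decision_function(grid).reshape(xx.shape)

# Lower score = more abnormal: compare a cluster centre with a far corner.
score_centre = clf.decision_function(np.array([[2.0, 2.0]]))[0]
score_corner = clf.decision_function(np.array([[5.0, -5.0]]))[0]
print(score_centre > score_corner)

# To draw the shaded background (requires matplotlib):
# import matplotlib.pyplot as plt
# plt.contourf(xx, yy, Z)
# plt.scatter(X_train[:, 0], X_train[:, 1], s=10)
# plt.show()
```

Read this way, the shading describes the fitted model's score across the whole plane, and the scattered points show where the training, test and outlier samples fall within it.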

4 votes

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/44951597
