Q: How do I index outliers in Python?

Stack Overflow user

Asked 2017-09-19 08:03:00

Answers: 2 · Views: 1.3K · Followers: 0 · Votes: 1

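I'm trying to remove outliers from a list in Python. I want to get the index of each outlier in the original list, so that I can remove the element at the same position from another, corresponding list.

A simple example

My list containing the outlier:

y = [1,2,3,4,500] # 500 is the outlier; it has an index of 4

My corresponding list:

x = [1,2,3,4,5] # I want to remove 5, which has the same index of 4

My result/goal:

y=[1,2,3,4]

x=[1,2,3,4]

Here is my code. I want to achieve the same result with klist and avglatlist: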

import numpy as np

klist=['1','2','3','4','5','6','7','8','4000']
avglatlist=['1','2','3','4','5','6','7','8','9']


klist = np.array(klist).astype(np.float)      
klist=klist[(abs(klist - np.mean(klist))) < (2 * np.std(klist))]

indices=[]
for k in klist:
    if (k-np.mean(klist))>((2*np.std(klist))):
        i=klist.index(k)
        indices.append(i)

print('indices'+str(indices))

avglatlist = np.array(avglatlist).astype(np.float) 


for index in sorted(indices, reverse=True):
    del avglatlist[index]


print(len(klist))
print(len(avglatlist))

Tags: python, python-3.x, numpy, machine-learning, outliers

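2 Answers

Stack Overflow user

Answered 2017-09-19 12:07:13

How do I get the index of each outlier in the list?

Assume an outlier is defined as a value more than 2 standard deviations away from the mean. That means you want the indices of the values in the list whose z-score has an absolute value greater than 2.

I would use np.where: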

import numpy as np
from scipy.stats import zscore

klist = np.array([1, 2, 3, 4, 5, 6, 7, 8, 4000])
avglatlist = np.arange(1, klist.shape[0] + 1)

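# Indices of the outliers: positions where |z-score| > 2
indices = np.where(np.absolute(zscore(klist)) > 2)[0]
# Every position that is not an outlier index
indices_filter = [i for i in range(len(klist)) if i not in indices]
print(avglatlist[indices_filter])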
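
As a rough sketch of the same idea without SciPy: the z-score is (value - mean) / standard deviation, so it can be computed directly with NumPy. The names `z`, `outlier_idx`, `klist_clean` and `avglatlist_clean` are introduced here for illustration only, and the sketch assumes the population standard deviation (ddof=0), which is also the default of `scipy.stats.zscore`.

```python
import numpy as np

klist = np.array([1, 2, 3, 4, 5, 6, 7, 8, 4000], dtype=float)
avglatlist = np.arange(1, klist.shape[0] + 1)

# Population z-score (ddof=0): distance from the mean in units of std.
z = (klist - klist.mean()) / klist.std()

# Positions of the outliers in the original array.
outlier_idx = np.flatnonzero(np.abs(z) > 2)

# np.delete returns a copy with the given positions removed,
# so the same index array can be applied to both arrays.
klist_clean = np.delete(klist, outlier_idx)
avglatlist_clean = np.delete(avglatlist, outlier_idx)

print(outlier_idx)
print(klist_clean)
print(avglatlist_clean)
```

Here only position 8 (the value 4000) is flagged, and both arrays end up with 8 elements.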

If you don't actually need to know the indices, use a boolean mask instead:

import numpy as np
from scipy.stats import zscore

klist = np.array([1, 2, 3, 4, 5, 6, 7, 8, 4000])
avglatlist = np.arange(1, klist.shape[0] + 1)

mask = np.absolute(zscore(klist)) > 2
print(avglatlist[~mask])

Both solutions print:

[1 2 3 4 5 6 7 8]
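
If this filtering is needed in more than one place, the mask approach could be wrapped in a small helper. This is only a sketch: `drop_outliers` and its `threshold` parameter are hypothetical names, not part of NumPy or SciPy, and it assumes the same "more than 2 standard deviations from the mean" rule as above.

```python
import numpy as np

def drop_outliers(values, *others, threshold=2.0):
    """Hypothetical helper: drop the positions where `values` lies more than
    `threshold` standard deviations from its mean, in `values` and in every
    parallel sequence passed in `others`."""
    values = np.asarray(values).astype(float)
    # True for the positions to keep (same rule as |z-score| <= threshold).
    keep = np.abs(values - values.mean()) <= threshold * values.std()
    return [values[keep]] + [np.asarray(o)[keep] for o in others]

# The string lists from the question.
klist = ['1', '2', '3', '4', '5', '6', '7', '8', '4000']
avglatlist = ['1', '2', '3', '4', '5', '6', '7', '8', '9']

klist_clean, avglatlist_clean = drop_outliers(klist, avglatlist)
print(len(klist_clean), len(avglatlist_clean))
```

Only `values` is converted to float; the other sequences keep their element type, so `avglatlist_clean` is still an array of strings in this usage.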

Votes: 1

Stack Overflow user

Answered 2017-09-19 08:34:54

You're really close. All you need to do is apply the same filter to the NumPy version of avglatlist. I changed a few variable names for clarity.

import numpy as np

klist = ['1', '2', '3', '4', '5', '6', '7', '8', '4000']
avglatlist = ['1', '2', '3', '4', '5', '6', '7', '8', '9']


# Convert the string lists to float arrays (the np.float alias was removed in NumPy 1.24)
klist_np = np.array(klist).astype(float)
avglatlist_np = np.array(avglatlist).astype(float)

# Build the mask once from klist_np and apply it to both arrays
keep = abs(klist_np - np.mean(klist_np)) < (2 * np.std(klist_np))
klist_filtered = klist_np[keep]
avglatlist_filtered = avglatlist_np[keep]
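
For context, the index-based loop in the question cannot work on NumPy arrays as written: an ndarray has no `.index` method, and `del` on an array element is not supported. The sketch below demonstrates both on a small made-up array (`arr` is illustrative, not data from the question) and shows the NumPy calls that fill the same role.

```python
import numpy as np

arr = np.array([1.0, 2.0, 4000.0])

# list.index has no ndarray counterpart: this raises AttributeError.
try:
    arr.index(4000.0)
except AttributeError as exc:
    print("ndarray has no .index:", exc)

# Array elements cannot be removed in place with del.
try:
    del arr[2]
except (ValueError, TypeError) as exc:
    print("cannot del an array element:", exc)

# NumPy equivalents: find the positions, then build a trimmed copy.
idx = np.flatnonzero(arr == 4000.0)
trimmed = np.delete(arr, idx)
print(idx, trimmed)
```

Note also that the question filters klist before searching it for outliers, so even with these calls fixed the loop would be looking through an array from which the outlier had already been removed.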

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/46289800
