文章/答案/技术大牛

发布

问将重复元素设置为零
EN

Stack Overflow用户

提问于 2015-06-28 06:46:19

回答 4查看 334关注 0票数 2

如何将数组“数据”中的重复元素转换为0？这件事必须一排排地进行。

data = np.array([[1,8,3,3,4],
                 [1,8,9,9,4]])

答案如下：

ans = array([[1,8,3,0,4],
             [1,8,9,0,4]])

numpy

python

回答 4

Stack Overflow用户

回答已采纳

发布于 2015-06-28 06:54:27

方法#1

一种使用np.unique的方法-

# Find out the unique elements and their starting positions
unq_data, idx = np.unique(data,return_index=True)

# Find out the positions for each unique element, their duplicate positions
dup_idx = np.setdiff1d(np.arange(data.size),idx)

# Set those duplicate positioned elemnents to 0s
data[dup_idx] = 0

样本运行-

In [46]: data
Out[46]: array([1, 8, 3, 3, 4, 1, 3, 3, 9, 4])

In [47]: unq_data, idx = np.unique(data,return_index=True)
    ...: dup_idx = np.setdiff1d(np.arange(data.size),idx)
    ...: data[dup_idx] = 0
    ...: 

In [48]: data
Out[48]: array([1, 8, 3, 0, 4, 0, 0, 0, 9, 0])

方法#2

您也可以使用sorting和differentiation作为一种更快的方法-

# Get indices  for sorted data
sort_idx = np.argsort(data)

# Get duplicate indices and set those in data to 0s
dup_idx = sort_idx[1::][np.diff(np.sort(data))==0]
data[dup_idx] = 0

运行时测试-

In [110]: data = np.random.randint(0,100,(10000))
     ...: data1 = data.copy()
     ...: data2 = data.copy()
     ...: 

In [111]: def func1(data):
     ...:     unq_data, idx = np.unique(data,return_index=True)
     ...:     dup_idx = np.setdiff1d(np.arange(data.size),idx)
     ...:     data[dup_idx] = 0
     ...: 
     ...: def func2(data):
     ...:     sort_idx = np.argsort(data)
     ...:     dup_idx = sort_idx[1::][np.diff(np.sort(data))==0]
     ...:     data[dup_idx] = 0
     ...:     

In [112]: %timeit func1(data1)
1000 loops, best of 3: 1.36 ms per loop

In [113]: %timeit func2(data2)
1000 loops, best of 3: 467 µs per loop

扩展到2D情况的 :

方法2可以扩展到2D数组的情况下，避免像这样的循环-

# Get indices  for sorted data
sort_idx = np.argsort(data,axis=1)

# Get sorted linear indices
row_offset = data.shape[1]*np.arange(data.shape[0])[:,None]
sort_lin_idx = sort_idx[:,1::] + row_offset

# Get duplicate linear indices and set those in data as 0s
dup_lin_idx = sort_lin_idx[np.diff(np.sort(data,axis=1),axis=1)==0]
data.ravel()[dup_lin_idx] = 0

样本运行-

In [6]: data
Out[6]: 
array([[1, 8, 3, 3, 4, 0, 3, 3],
       [1, 8, 9, 9, 4, 8, 7, 9],
       [1, 8, 9, 9, 4, 8, 7, 3]])

In [7]: sort_idx = np.argsort(data,axis=1)
   ...: row_offset = data.shape[1]*np.arange(data.shape[0])[:,None]
   ...: sort_lin_idx = sort_idx[:,1::] + row_offset
   ...: dup_lin_idx = sort_lin_idx[np.diff(np.sort(data,axis=1),axis=1)==0]
   ...: data.ravel()[dup_lin_idx] = 0
   ...: 

In [8]: data
Out[8]: 
array([[1, 8, 3, 0, 4, 0, 0, 0],
       [1, 8, 9, 0, 4, 0, 7, 0],
       [1, 8, 9, 0, 4, 0, 7, 3]])

票数 3

Stack Overflow用户

发布于 2015-06-28 06:54:34

下面是一种简单的纯Python方法：

seen = set()
for i, x in enumerate(data):
    if x in seen:
        data[i] = 0
    else:
        seen.add(x)

票数 2

Stack Overflow用户

发布于 2015-06-28 07:02:21

您可以使用嵌套的for循环，将数组的每个元素与每个其他元素进行比较，以检查是否存在重复记录。语法可能有点错误，因为我对numpy并不十分熟悉。

for x in range(0, len(data))
   for y in range(x+1, len(data))
      if(data[x] == data[y])
         data[x] = 0

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/31096939

复制

相似问题

问将重复元素设置为零
EN

回答 4

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问将重复元素设置为零EN

回答 4

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问将重复元素设置为零
EN