Python: Finding the indices of non-unique elements in a NumPy array


python arrays python-2.7 numpy


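I have found other approaches, for example ones that remove duplicate elements from an array. My requirement is slightly different. If I start with: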

array([[1, 2, 3],
       [2, 3, 4],
       [1, 2, 3],
       [3, 2, 1],
       [3, 4, 5]])

I want to end up with:

array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])

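That is the result I ultimately want, but there is one extra requirement. I also want to store an array of the indices that will be discarded, or of the indices that are kept (a la numpy.take).

I am using Numpy 1.8.1

We want to find the rows of the array that are not duplicated, while preserving order.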

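I use a technique that combines each row of a into a single element, so that the unique rows can be found with np.unique(b, return_index=True, return_inverse=True), where b is the combined array. I then modified it to output the count of each unique row using the index and inverse. From there, I can select all unique rows that have counts == 1.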
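A minimal sketch of the steps just described (not the answer's original code). It assumes each row is viewed as a single np.void element and that the counts are rebuilt from the inverse with np.bincount; the names b, keep_idx, and discard_idx are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4],
              [1, 2, 3],
              [3, 2, 1],
              [3, 4, 5]])

# View every row as one opaque (void) element so np.unique compares whole rows.
row_dtype = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
b = np.ascontiguousarray(a).view(row_dtype).ravel()

# inverse[i] is the position of row i within the array of unique rows.
_, index, inverse = np.unique(b, return_index=True, return_inverse=True)
inverse = inverse.ravel()

# Rebuild the per-unique-row counts without return_counts (absent before NumPy 1.9).
counts = np.bincount(inverse)

# A row is kept when its unique value occurs exactly once; original order is preserved.
keep_mask = counts[inverse] == 1
keep_idx = np.flatnonzero(keep_mask)
discard_idx = np.flatnonzero(~keep_mask)

result = a[keep_idx]
print(result)
print(discard_idx)
```

Because the mask is indexed by original row position, result keeps the rows in their input order, and keep_idx can be passed to numpy.take as the question asks.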

For np.version >= 1.9:
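From NumPy 1.9 onward, np.unique accepts return_counts=True, so the counts no longer have to be rebuilt by hand. The following is a hedged sketch under the same row-as-void-element assumption; the variable names are illustrative and not taken from the answer.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4],
              [1, 2, 3],
              [3, 2, 1],
              [3, 4, 5]])

# View every row as one opaque (void) element so np.unique compares whole rows.
row_dtype = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
b = np.ascontiguousarray(a).view(row_dtype).ravel()

# counts[k] is how often the k-th unique row occurs; inverse maps rows to k.
_, inverse, counts = np.unique(b, return_inverse=True, return_counts=True)

# Broadcast the counts back onto the original rows and keep those seen once.
keep_mask = counts[inverse.ravel()] == 1
result = a[keep_mask]
discard_idx = np.flatnonzero(~keep_mask)
print(result)
print(discard_idx)
```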

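If you want to remove every instance of an element that occurs more than once, you can loop over the array, collect the indices of the elements that have duplicates, and finally delete them: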

import numpy

# The array to check:
array = numpy.array([[1, 2, 3],
                     [2, 3, 4],
                     [1, 2, 3],
                     [3, 2, 1],
                     [3, 4, 5]])

# List that contains the indices of duplicates (which should be deleted)
deleteIndices = []

for i in range(len(array)):  # Loop through entire array
    indices = list(range(len(array)))  # All indices in array
    del indices[i]  # All indices in array, except the i'th element currently being checked

    for j in indices:  # Loop through every other element in array, except the i'th
        if (array[i] == array[j]).all():  # Check if the i'th row is equal to the j'th row
            deleteIndices.append(j)  # If they are equal, j is appended to deleteIndices

# Drop repeated entries and sort deleteIndices in ascending order:
deleteIndices = sorted(set(deleteIndices))

# Delete duplicates
array = numpy.delete(array, deleteIndices, axis=0)

This yields:

>>> array
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])

>>> deleteIndices
[0, 2]

This way, you both remove the duplicates and get a list of the indices to discard.
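The same pairwise comparison can also be written without explicit Python loops. This is a sketch of an alternative based on broadcasting, not part of the answer above; it builds an n-by-n comparison table, so it is only reasonable for arrays with a modest number of rows. The names equal, is_duplicated, and delete_indices are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4],
              [1, 2, 3],
              [3, 2, 1],
              [3, 4, 5]])

# equal[i, j] is True when row i and row j match in every column.
equal = (a[:, None, :] == a[None, :, :]).all(axis=2)

# Every row equals itself, so a row is duplicated when it has more than one match.
is_duplicated = equal.sum(axis=1) > 1

delete_indices = np.flatnonzero(is_duplicated)
result = np.delete(a, delete_indices, axis=0)
print(result)
print(delete_indices)
```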

The numpy_indexed package (disclaimer: I am its author) can be used to solve this kind of problem in a vectorized manner:

import numpy as np
import numpy_indexed as npi

# arr is the 2-D array from the question
index = npi.as_index(arr)
keep = index.count == 1
discard = np.invert(keep)
print(index.unique[keep])

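You can proceed as follows:

Suppose your array is a:

uniq, uniq_idx, counts = np.unique(a, axis=0, return_index=True, return_counts=True)

The desired array:

new_arr = uniq[counts == 1]

The indices of the non-unique rows:

a_idx = np.arange(a.shape[0])  # the indices of array a
nuniq_idx = a_idx[np.in1d(a_idx, uniq_idx[counts == 1], invert=True)]

You get: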

#new_arr
array([[2, 3, 4],
       [3, 2, 1],
       [3, 4, 5]])

# nuniq_idx
array([0, 2])
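Note that np.unique returns its unique rows in sorted order, so new_arr matches the original row order here only because the surviving rows happen to be sorted already. A hedged sketch of an order-preserving variant follows, assuming a NumPy version that supports the axis argument of np.unique (1.13 or later); it maps the counts back onto the original rows through return_inverse, and the names row_counts, keep_idx, and discard_idx are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4],
              [1, 2, 3],
              [3, 2, 1],
              [3, 4, 5]])

# inverse maps every original row to its unique row; counts holds the multiplicities.
_, inverse, counts = np.unique(a, axis=0, return_inverse=True, return_counts=True)

# Number of occurrences of each original row, in original order.
row_counts = counts[inverse.ravel()]

keep_idx = np.flatnonzero(row_counts == 1)
discard_idx = np.flatnonzero(row_counts > 1)

# np.take with the kept indices reproduces the rows in their input order.
kept_rows = np.take(a, keep_idx, axis=0)
print(kept_rows)
print(discard_idx)
```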

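You can count how many times each row appears using the methods suggested in, for example, 1 and 2. I think that covers your question.

@ajcr I can't use return_counts, so 1 is of no use to me. Unfortunately, 2 seems to require a sorted array, and I need to preserve order.

@codedog Did either of these two answers help? If not, can you tell us what else you need, apart from the request to exclude [1,2,3] because it occurs more than once?

@DanPatterson Thanks for pointing that out, I have edited my solution.

Yesterday I found out that we had already implemented this in a C extension. I did not test this solution explicitly, but it looks very similar to what is implemented there. That is why I accepted it as the solution. Thanks.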
