How to avoid the memoryview error in numpy?


python python-3.x numpy


In this snippet, train_dataset, test_dataset and valid_dataset are of type numpy.ndarray.

def check_overlaps(images1, images2):
    images1.flags.writeable=False
    images2.flags.writeable=False
    print(type(images1))
    print(type(images2))
    start = time.clock()
    hash1 = set([hash(image1.data) for image1 in images1])
    hash2 = set([hash(image2.data) for image2 in images2])
    all_overlaps = set.intersection(hash1, hash2)
    return all_overlaps, time.clock()-start

r, execTime = check_overlaps(train_dataset, test_dataset)    
print("# overlaps between training and test sets:", len(r), "execution time:", execTime)
r, execTime = check_overlaps(train_dataset, valid_dataset)   
print("# overlaps between training and validation sets:", len(r), "execution time:", execTime) 
r, execTime = check_overlaps(valid_dataset, test_dataset) 
print("# overlaps between validation and test sets:", len(r), "execution time:", execTime)

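But this produces the following error (formatted as code to keep it readable):

ValueError                                Traceback (most recent call last)
in <module>()
     12     return all_overlaps, time.clock()-start
     13
---> 14 r, execTime = check_overlaps(train_dataset, test_dataset)
     15 print("# overlaps between training and test sets:", len(r), "execution time:", execTime)
     16 r, execTime = check_overlaps(train_dataset, valid_dataset)

in check_overlaps(images1, images2)
      7     print(type(images2))
      8     start = time.clock()
----> 9     hash1 = set([hash(image1.data) for image1 in images1])
     10     hash2 = set([hash(image2.data) for image2 in images2])
     11     all_overlaps = set.intersection(hash1, hash2)

in <listcomp>(.0)
      7     print(type(images2))
      8     start = time.clock()
----> 9     hash1 = set([hash(image1.data) for image1 in images1])
     10     hash2 = set([hash(image2.data) for image2 in images2])
     11     all_overlaps = set.intersection(hash1, hash2)

ValueError: memoryview: hashing is restricted to formats 'B', 'b' or 'c'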

The problem is that I don't even know what this error means, let alone how to fix it. Any help?

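The problem is that your method of hashing arrays only works on Python 2, so your code fails as soon as it tries to compute hash(image1.data). The error message tells you that hashing is only supported for memoryviews whose format is unsigned bytes ('B'), signed bytes ('b') or single bytes ('c'), and I haven't found a way to get such a view out of an np.ndarray without copying. The only approach I came up with involves copying the array, which might not be feasible in your application depending on the amount of data. That said, you could try changing your function to: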

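import time

def check_overlaps(images1, images2):
    # time.clock() was removed in Python 3.8; perf_counter() replaces it
    start = time.perf_counter()
    # tobytes() copies each image into an immutable, hashable bytes object
    hash1 = set([hash(image1.tobytes()) for image1 in images1])
    hash2 = set([hash(image2.tobytes()) for image2 in images2])
    all_overlaps = set.intersection(hash1, hash2)
    return all_overlaps, time.perf_counter()-start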
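To see why hashing the .data view fails while tobytes() works, here is a minimal sketch. The tiny images array is a made-up stand-in for the real datasets and is not part of the original code; the rest only uses standard numpy attributes.

```python
import numpy as np

# Hypothetical stand-in for the real datasets: 4 tiny float32 "images".
images = np.arange(16, dtype=np.float32).reshape(4, 2, 2)
images.flags.writeable = False

row = images[0]
# The buffer format of float32 data is 'f', which is not 'B', 'b' or 'c'.
print("memoryview format:", row.data.format)

try:
    hash(row.data)
    view_hashable = True
except ValueError as exc:
    view_hashable = False
    print("hash(row.data) failed:", exc)

# tobytes() copies the pixels into an immutable bytes object, which is hashable.
key = row.tobytes()
# An independent copy of the same pixels yields the same bytes and the same hash.
same = hash(key) == hash(images[0].copy().tobytes())
print("equal hashes for equal pixels:", same)
```

Equal images always give equal hashes this way, but different images can in principle collide, so storing the bytes objects themselves in the sets may be the safer choice if exact overlap counts matter.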

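Yes, you're right, I'm working on Python 3+. I used the bytes() function as hash(bytes(image1)), and it works perfectly. Thanks for your help. The data comes from a large dataset of about 200,000 MNIST images.

@user6692576 Glad I could help.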

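bytes() actually gives exactly the same result as ndarray.tobytes() and also makes a copy of the data. I suspect it even calls that function internally, so for your purposes you can probably use them interchangeably. Also, arr.tostring() == arr.tobytes() == bytes(arr) (tostring() is the older name for tobytes() and has been deprecated since NumPy 1.19). From my simple benchmark, hash(bytes(xxx)) seems to be much slower (about 4x) than hash(xxx.tobytes()), so using xxx.tobytes() is probably the better idea.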
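The comparison above can be reproduced with a rough sketch like the following. The 28x28 float32 image and the repetition count n are assumptions chosen for illustration; the timings depend on the machine and the NumPy version, so the 4x figure may not show up everywhere.

```python
import timeit
import numpy as np

# Hypothetical 28x28 float32 image, roughly MNIST-sized (an assumption).
image = np.random.rand(28, 28).astype(np.float32)

# For a C-contiguous array like this one, both calls produce the same bytes.
same_bytes = bytes(image) == image.tobytes()
print("bytes(x) == x.tobytes():", same_bytes)

n = 2000  # number of repetitions, chosen arbitrarily
t_bytes = timeit.timeit(lambda: hash(bytes(image)), number=n)
t_tobytes = timeit.timeit(lambda: hash(image.tobytes()), number=n)
print(f"hash(bytes(x)):    {t_bytes:.4f} s for {n} calls")
print(f"hash(x.tobytes()): {t_tobytes:.4f} s for {n} calls")
```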
