Python: Why does dict perform containment tests faster than set?


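I know that under the hood, Python sets and Python dicts are very similar. Reading their respective source code, it is clear that they perform almost identical lookups. While reading about this, I decided to test the author's claim that "set lookups are faster than dict lookups" by cobbling together the following benchmark: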

from timeit import timeit
import random

universe = range(1,100000)
keys = random.sample(universe, 50000)
lookups = random.sample(universe, 50000)
dict_set = dict((k,True) for k in keys)
set_set = set(keys)

def dict_lookup():
    for l in lookups:
        l in dict_set

def set_lookup():
    for l in lookups:
        l in set_set

if __name__ == '__main__':
    set_victories = 0
    dict_victories = 0
    for i in range(100):
        dict_time = timeit('dict_lookup()', setup="from __main__ import dict_lookup", number=10000)
        set_time = timeit('set_lookup()', setup="from __main__ import set_lookup", number=10000)
        print("dict time: {}".format(dict_time))
        print("set time:  {}".format(set_time))
        if set_time < dict_time:
            set_victories += 1
        else:
            dict_victories += 1
    print("Sets were faster in  {} trials".format(set_victories))
    print("Dicts were faster in {} trials".format(dict_victories))

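When I tested your code, I thought the numbers might be a bit small. So I scaled them up to 10**6 keys and 10**6 lookups, and had random.sample draw them from a universe of 10**8 values, a sample-to-universe ratio of 1/100: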

import random
from time import time


def timeit(func):
    def wrap(*args):
        start = time()
        result = func(*args)
        return time()-start
    return wrap


def get_set_and_dict():
    universe = range(1, 10**8)
    keys = random.sample(universe, 10**6)
    lookups = random.sample(universe,10**6)
    dict_set = dict((k,True) for k in keys)
    set_set = set(keys)
    return dict_set, set_set, lookups


@timeit
def test(container, lookups):

    for i in lookups:
        a = i in container


def main():
    dict_set, set_set, lookups = get_set_and_dict()
    acc_set = acc_dict = 0
    rounds = 100
    for _ in range(rounds):
        acc_dict += test(dict_set, lookups)
        acc_set += test(set_set, lookups)
    print("Set time: {:.4f}s\n Dict time: {:.4f}s".format(acc_set/rounds, acc_dict/rounds))

if __name__ == '__main__':
    main()

>> Set time: 0.1263s
>> Dict time: 0.1578s
But it makes sense for set and dict to differ: even though they are similar, they are not the same thing.



Perhaps the conclusion simply depends on how you set up the experiment.
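To make the point about experimental setup concrete, here is a minimal sketch of a lower-noise variant of the benchmark. It is not the code from either post: the helper names (build, count_hits, best_time) and the small sizes are assumptions chosen for illustration. It uses timeit.Timer.repeat and keeps the minimum of several runs, which is the figure the timeit documentation suggests looking at.

```python
import random
import timeit


def build(n_keys=1_000, universe_size=100_000, seed=0):
    """Build a dict and a set holding the same keys, plus a list of lookups."""
    rng = random.Random(seed)  # fixed seed so repeated runs use the same data
    universe = range(1, universe_size)
    keys = rng.sample(universe, n_keys)
    lookups = rng.sample(universe, n_keys)
    return dict.fromkeys(keys, True), set(keys), lookups


def count_hits(container, lookups):
    """Run one containment test per lookup; the container is a local name here."""
    hits = 0
    for key in lookups:
        if key in container:
            hits += 1
    return hits


def best_time(container, lookups, repeat=5, number=20):
    """Return the best (minimum) time in seconds for one pass over lookups."""
    timer = timeit.Timer(lambda: count_hits(container, lookups))
    return min(timer.repeat(repeat=repeat, number=number)) / number


dict_set, set_set, lookups = build()
print("dict: {:.6f}s per pass".format(best_time(dict_set, lookups)))
print("set:  {:.6f}s per pass".format(best_time(set_set, lookups)))
```

Both containers are timed through the same function, so any per-call overhead is shared and only the containment test differs. Whichever container comes out ahead may still vary with the Python version and the machine.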

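The test functions perform 50000 lookups in the dict/set. They also perform 50000 lookups of the dict/set itself in the globals dict. If there is a systematic difference between the lookup times of dict_set and set_set (one of them may have hit a hash collision), it could easily swamp the difference you are trying to measure. Try declaring the functions like
def dict_lookup(dict_set=dict_set):
so that the dict/set becomes a local lookup, which should be more consistent in speed (it is basically a list index).

@jasonharper, I just made the change you suggested on my machine, and I'm afraid it makes things even stranger. Looking at the numbers, my set lookups now take about 50.5 seconds and my dict lookups about 48.0 seconds.

You seem to be providing empirical evidence that dict key lookup is slightly faster, which is interesting. I found no evidence to the contrary, and what you provided looks like a good indication that dict key hash lookup is slightly faster (at least on average). Note that you only tested integer keys. Try floats and strings!

@kabanus, good idea. Before running this, I mapped str across universe, so all of the strings should be in Python's intern cache. Still, further observations show that the set fairly reliably takes 41.75 seconds, while the dict takes 40.25 seconds. I'll do floats and non-interned strings in a bit.

@kabanus, interesting! Floats did the same thing as ints, but backwards. Initially, dicts took about 50.9 and sets 50.8. After what was presumably a bunch of caching, JIT, or something else settled, dicts took 50.5 and sets 50.0. They flipped.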
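The default-argument suggestion in the comments above can be sketched as follows. This is an illustration rather than the original benchmark: the names container, lookup_global, lookup_local and opnames are made up for the example. It uses the dis module to show that the first function resolves the container through module globals, while the second receives it as a local.

```python
import dis

container = {1: True, 2: True}


def lookup_global(keys):
    """Count hits; `container` is looked up in module globals on each iteration."""
    hits = 0
    for k in keys:
        if k in container:
            hits += 1
    return hits


def lookup_local(keys, container=container):
    """Count hits; `container` is bound once as a default argument, so it is a local."""
    hits = 0
    for k in keys:
        if k in container:
            hits += 1
    return hits


def opnames(func):
    """Return the set of bytecode instruction names used by func."""
    return {ins.opname for ins in dis.get_instructions(func)}


print("LOAD_GLOBAL in lookup_global:", "LOAD_GLOBAL" in opnames(lookup_global))
print("LOAD_GLOBAL in lookup_local: ", "LOAD_GLOBAL" in opnames(lookup_local))
```

Both functions return the same counts; only the way the name is resolved differs. How much this changes the timings is something to measure rather than assume.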
$ python3 set-vs-dict.py
dict time: 57.754860900342464
set time:  56.8056653002277
dict time: 50.8890880998224
set time:  50.642351899296045
dict time: 49.936297399923205
set time:  50.66272980067879
dict time: 49.92973940074444
set time:  50.65518939960748
dict time: 49.949383799917996
set time:  50.66877659969032
dict time: 49.93578719999641
set time:  50.64872649963945
dict time: 49.96432110015303
set time:  50.676835800521076
dict time: 49.95099350064993
set time:  50.64867010060698
dict time: 49.98275039996952
set time:  50.648987299762666
dict time: 49.92164439987391
set time:  50.66931669972837
dict time: 49.98953749984503
set time:  50.652459900826216
dict time: 49.95234560035169
set time:  50.65124330017716
dict time: 49.98174169939011
set time:  50.6712632002309
dict time: 49.93824000004679
set time:  50.65437529981136
dict time: 49.95089349988848
set time:  50.65370349958539
dict time: 49.963413699530065
set time:  50.65550949983299
dict time: 49.955208600498736
set time:  50.66121090017259
dict time: 49.94347499962896
set time:  50.64449250046164
dict time: 49.95420549996197
set time:  50.66687630023807
dict time: 49.92143050022423
set time:  50.64667259994894
dict time: 50.05037229973823
set time:  50.67966340016574
dict time: 49.93846719991416
set time:  50.64651320036501
dict time: 49.921281000599265
set time:  50.67906459979713
dict time: 49.942994699813426
set time:  50.65166569966823
dict time: 49.94313340075314
set time:  50.656177499331534
dict time: 49.94610709976405
set time:  50.65122799947858
dict time: 49.93874369934201
set time:  50.661101600155234
dict time: 49.94996269978583
set time:  50.63938449975103
dict time: 49.9602530002594
set time:  50.65474760066718
dict time: 49.91891669947654
set time:  50.663624899461865
dict time: 49.959330099634826
set time:  50.653377699665725
dict time: 49.98555530048907
set time:  50.64655719976872
dict time: 49.945239200256765
set time:  50.65128379967064
dict time: 49.95342260040343
set time:  50.65899199992418
dict time: 49.92802210059017
set time:  50.67100259941071
dict time: 49.942902400158346
set time:  50.74889140017331
dict time: 49.994800799526274
set time:  50.731577299535275
dict time: 49.98310230020434
set time:  50.747778999619186
dict time: 49.99376400001347
set time:  50.73122859932482
dict time: 50.00640409998596
set time:  50.68737949989736
dict time: 49.94556000083685
set time:  50.722481600008905
dict time: 49.98192979954183
set time:  50.72525530029088
dict time: 49.99698970001191
set time:  50.736096899956465
dict time: 49.94320739991963
set time:  50.71096289996058
dict time: 49.972679699771106
set time:  50.71838010009378
dict time: 49.957800599746406
set time:  50.747396499849856
dict time: 49.97235369961709
set time:  50.69941039942205
dict time: 49.951399500481784
set time:  50.647985899820924
dict time: 49.94027389958501
set time:  50.66828709933907
dict time: 49.94174600020051
set time:  50.65279300045222
dict time: 49.96716000046581
set time:  50.64943030010909
dict time: 49.95117200072855
set time:  50.65525580011308
dict time: 49.962328700348735
set time:  50.66319840028882
dict time: 49.960031100548804
set time:  50.672181099653244
dict time: 49.93908840045333
set time:  50.651302699930966
dict time: 49.94130470044911
set time:  50.655242399312556
dict time: 50.04310019966215
set time:  50.67391949985176
dict time: 49.93010629992932
set time:  50.64970660023391
dict time: 49.991717299446464
set time:  50.65591560024768
dict time: 49.952454400248826
set time:  50.649492600001395
dict time: 49.92677689995617
set time:  50.635977199301124
dict time: 49.95432769972831
set time:  50.64075019955635
dict time: 49.94808299932629
set time:  50.664196100085974
dict time: 49.966013699769974
set time:  50.649582100100815
dict time: 49.9813024001196
set time:  50.64982909988612
dict time: 49.93897459935397
set time:  50.66509110014886
dict time: 49.95878900028765
set time:  50.649003400467336
dict time: 49.96674569975585
set time:  50.69693780038506
dict time: 49.91303739976138
set time:  50.675189800560474
dict time: 49.950330699793994
set time:  50.64532170072198
dict time: 49.95022019930184
set time:  50.65448010060936
dict time: 49.95197269972414
set time:  50.65391890052706
dict time: 49.94361769966781
set time:  50.67086180020124
dict time: 49.95455109979957
set time:  50.670443600043654
dict time: 49.94633509963751
set time:  50.65955980028957
dict time: 49.967472000047565
set time:  50.66301089990884
dict time: 49.95830660033971
set time:  50.67482869978994
dict time: 49.984512499533594
set time:  50.67321899998933
dict time: 50.01141999941319
set time:  50.84260869957507
dict time: 50.31206789985299
set time:  51.02959220018238
dict time: 50.28449110034853
set time:  51.03110689949244
dict time: 50.303432799875736
set time:  51.02032170072198
dict time: 50.281682999804616
set time:  51.05188430007547
dict time: 50.30898350011557
set time:  51.01742030028254
dict time: 50.3027657000348
set time:  51.02114639990032
dict time: 50.00038649979979
set time:  50.65360379964113
dict time: 49.93306410033256
set time:  50.63413709960878
dict time: 49.95266539976001
set time:  50.65499630011618
dict time: 49.94854210037738
set time:  50.703547400422394
dict time: 49.96691229939461
set time:  50.69470370002091
dict time: 49.95223430078477
set time:  50.70982529968023
dict time: 49.954243999905884
set time:  50.791720499284565
dict time: 49.97948960028589
set time:  50.69436000008136
dict time: 49.98102519940585
set time:  50.73820179980248
dict time: 49.96782180014998
set time:  50.722959300503135
dict time: 49.9863857999444
set time:  50.70789400022477
dict time: 49.9592831004411
set time:  50.707397900521755
dict time: 49.94034240022302
set time:  50.667025099508464
dict time: 49.96215169969946
set time:  50.72984409984201
dict time: 49.98776920046657
set time:  50.72097889985889
Sets were faster in  2 trials
Dicts were faster in 98 trials
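The comments also suggest repeating the measurement with floats and strings. A minimal sketch of such a comparison is below, under stated assumptions: the sizes are much smaller than in the posts, the converters table and the hits helper are invented for the example, and sys.intern is called explicitly for the interned-string case rather than relying on strings being interned automatically.

```python
import random
import sys
import timeit

rng = random.Random(0)  # fixed seed so every key type sees the same data
universe = range(1, 100_000)
int_keys = rng.sample(universe, 5_000)
int_lookups = rng.sample(universe, 5_000)

# One converter per key type to compare.
converters = {
    "int": int,
    "float": float,
    "str": str,
    "interned str": lambda k: sys.intern(str(k)),
}


def hits(container, lookups):
    """Count how many lookups are present in container."""
    return sum(1 for key in lookups if key in container)


results = {}
for name, convert in converters.items():
    keys = [convert(k) for k in int_keys]
    lookups = [convert(k) for k in int_lookups]
    as_dict, as_set = dict.fromkeys(keys, True), set(keys)
    # Best of three runs, five passes each, for both containers.
    t_dict = min(timeit.repeat(lambda: hits(as_dict, lookups), repeat=3, number=5))
    t_set = min(timeit.repeat(lambda: hits(as_set, lookups), repeat=3, number=5))
    results[name] = (hits(as_dict, lookups), hits(as_set, lookups))
    print("{:<13} dict {:.5f}s  set {:.5f}s".format(name, t_dict, t_set))
```

The hit counts stored in results serve as a sanity check that the dict and the set answer every lookup identically for each key type; the printed timings are only indicative at this scale.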