Get the Murmur3 hash of an integer in the Cassandra Python driver

I want to convert an integer partition key to its Murmur3 hash with the following code:

from cassandra.metadata import Murmur3Token

a = 3202012
h = Murmur3Token.hash_fn(a)
But I get the following error:

object of type 'int' has no len()

Strings work without any problem.

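If you don't want to read my whole post, in short: the simplest solution is to use hex(a) for your integer a, i.e. h = Murmur3Token.hash_fn(hex(a)).

Your error occurs because Murmur3Token.hash_fn(...) is not implemented for the Python integer (int) type.

Hash functions usually hash only a single block of bytes of arbitrary size, not values of any other type. Other types are usually converted to bytes first, which is called serialization.

Your function supports both the bytes and str types, so you have to convert your number to one of these two. I have implemented three conversion solutions below (variants 0, 1, 2):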
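The error from the question can be reproduced without the driver, since it comes from taking the length of an int. The following minimal sketch uses only the standard library; the 4-byte big-endian width is an arbitrary choice made for illustration and is not something the driver requires.

```python
n = 3202012

# An int has no length, which is what a byte-oriented hash function trips over.
try:
    len(n)
except TypeError as exc:
    error_message = str(exc)

# Serialized forms do have a length and can be passed to a hash over bytes/str.
as_text = hex(n)                  # text form of the number
as_bytes = n.to_bytes(4, "big")   # fixed-width binary form (width chosen for this example)

print(error_message)
print(as_text, len(as_text))
print(as_bytes, len(as_bytes))
```

Either serialized value is acceptable input for a function that expects str or bytes; which one to pick is discussed in the variants that follow.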

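  • Variant 0 converts to a hex string by simply calling hex(n). This is the simplest solution to use. For large integers it is slower than the other solutions, but for small integers it is faster. It is also very stable: the hex representation, and therefore the hash value, will never change.
  • Variant 1 converts to bytes through pickle.dumps(n). This is slightly more involved than the previous solution and needs the pickle module. It is faster than hex for large integers but slower for small ones. It may also be somewhat unstable, because the pickle format may change over time and then produce different bytes and hashes.
  • Variant 2 converts to bytes through my function to_bytes(n). For small inputs it is about 1.5x faster than the pickle solution, and for large inputs it runs at almost the same speed. This solution is also stable because, unlike the pickle format, my format will not change (unless you modify my function).
  • In general, if you just want simplicity, or if your integers are smaller than 1024 bits, use hex(n). If your integers are very large (more than 1024 bits), use my function to_bytes(n). You can still use hex(n) there, since it is only about 2.5x slower, but with a lot of data that slowdown becomes considerable, so to_bytes(n) is the better choice.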

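    Note: the 3 serialization functions described in this post all produce different hash values. Choose one of them and always use it for your numbers so that you get the same result every time.

    My code needs the following Python modules, installed once from the command line with python -m pip install cassandra-driver timerit matplotlib. The modules timerit and matplotlib are used only for the time/speed measurements and are not needed afterwards for the actual hash computation.

    Below are 1) the code, 2) a plot of the time measurements, 3) the console output.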
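The reason the hashes differ is that the three serializations produce different byte sequences for the same number. The sketch below shows this with the standard library only. to_bytes is copied from the answer's code; from_bytes is a hypothetical inverse added here just to show that the zig-zag encoding is lossless, and the two pickle protocol numbers are picked only to illustrate that pickle output depends on the protocol.

```python
import pickle

def to_bytes(n, order="little"):
    # Zig-zag encode: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
    n = (n << 1) if n >= 0 else (((-n - 1) << 1) | 1)
    return n.to_bytes((n.bit_length() + 7) // 8, order)

def from_bytes(data, order="little"):
    # Hypothetical inverse of to_bytes (not part of the original answer).
    u = int.from_bytes(data, order)
    return (u >> 1) if (u & 1) == 0 else -(u >> 1) - 1

n = 3202012
serialized = {
    "hex": hex(n).encode("ascii"),
    "pickle": pickle.dumps(n),
    "to_bytes": to_bytes(n),
}
for name, data in serialized.items():
    print(name.ljust(8), data)

# The same value pickled with two protocols yields two different byte strings,
# which is why a hash over pickle output is tied to the protocol used.
pickle_v2 = pickle.dumps(n, protocol=2)
pickle_v4 = pickle.dumps(n, protocol=4)
print(pickle_v2 != pickle_v4)

# Round trip through the zig-zag encoding, including negative values.
round_trip_ok = all(from_bytes(to_bytes(v)) == v for v in (0, 1, -1, 2, -2, n, -n, 2 ** 100))
print(round_trip_ok)
```

Since a hash only sees the bytes, any change in serialization (a different variant, byte order, or pickle protocol) changes the resulting hash.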


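    Integers should be converted to bytes or to a string in some way, because most hashing algorithms work only on bytes, so call Murmur3Token.hash_fn(str(a)).

    I used the 3 functions on the same input (n = 3202012), but the results are not the same. Murmur3Token.hash_fn(hex(n)) => -8130291462150033527, Murmur3Token.hash_fn(to_bytes(n)) => -7849657241401383516, Murmur3Token.hash_fn(pickle.dumps(a)) => 1742844748022614520. Cassandra also has a built-in function that returns the token. Its result is: -2971817988961560522

    @ghanad Yes, I described different ways of serializing an int to a string or to bytes. Because these are 3 different serializations, they produce completely different byte sequences, so the hashes of those bytes are completely different too. I provided 3 different ways only so that the author can choose one of them; using the same serialization function every time will then give the same result/hash for the same integer.
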
    # Needs: python -m pip install cassandra-driver timerit matplotlib
    from cassandra.metadata import Murmur3Token
    
    # Input
    n = 3202012
    
    # ---------- Variant 0, convert to string ----------
    print(Murmur3Token.hash_fn(hex(n)))
    
    # ---------- Variant 1, convert to bytes using pickle ----------
    import pickle
    print(Murmur3Token.hash_fn(pickle.dumps(n)))
    
    # ---------- Variant 2, convert signed int to min num of bytes ----------
    def to_bytes(n, order = 'little'): # order can be 'little' or 'big'
        # Zig-Zag encode signed integer to unsigned integer, i.e. map
        # 0 to 0, -1 to 1, 1 to 2, -2 to 3, 2 to 4, -3 to 5, 3 to 6, etc
        n = (n << 1) if n >= 0 else (((-n - 1) << 1) | 1)
        return n.to_bytes((n.bit_length() + 7) // 8, order)
        
    print(Murmur3Token.hash_fn(to_bytes(n)))
    
    
    
    # ---------- Time/Speed measure and Visualizing ----------
    
    import random
    from timerit import Timerit
    random.seed(0)
    Timerit._default_asciimode = True
    ncycle = 16
    
    def round_fixed(n, c):
        s = str(round(n, c))
        return s + '0' * (c - (len(s) - 1 - s.rfind('.')))
    
    stats = []
    for bit_len_log in range(0, 21, 1):
        stats.append([])
        bit_len = 1 << bit_len_log
        n = random.randrange(1 << bit_len)
        num_runs = round(max(1, 2 ** 11 / bit_len)) * 3
        print('bit length =', bit_len if bit_len < 1024 else f'{round(bit_len / 1024)}Ki')
        rt = None
        for fi, (f, fn)  in enumerate([
            #(str, 'str'),
            (hex, 'hex'),
            (lambda x: pickle.dumps(x), 'pickle'),
            (to_bytes, 'to_bytes'),
        ]):
            print(f'var{fi} ({str(fn).ljust(len("to_bytes"))}): ', end = '', flush = True)
            tim = Timerit(num = num_runs, verbose = 0)
            for t in tim:
                for i in range(ncycle):
                    Murmur3Token.hash_fn(f(n))
            ct = tim.mean() / ncycle
            print(f'{round_fixed(ct * 10 ** 6, 2)} mcs', end = '')
            if rt is None:
                rt = ct
                print()
            else:
                print(f', speedup {round(rt / ct, 2)}x')
            stats[-1].append({
                'bll': bit_len_log, 'fi': fi, 'fn': fn, 't': ct * 10 ** 6, 'su': rt / ct,
            })
    
    import math, matplotlib.pyplot as plt
    plt.rcParams['figure.figsize'] = (9.6, 5.4)
    
    for yt in ['t']:
        plt.xlabel('bit len')
        plt.yscale('log')
        plt.ylabel('time, mcs')
        for i in range(len(stats[0])):
            p, = plt.plot([e[i]['bll'] for e in stats], [e[i][yt] for e in stats])
            p.set_label(stats[0][i]['fn'])
        plt.xticks([stats[i][0]['bll'] for i in range(0, len(stats), 2)], [f"2^{stats[i][0]['bll']}" for i in range(0, len(stats), 2)])
        p10f, p10l = [r(3 * math.log(e) / math.log(10)) for e, r in zip(plt.ylim(), (math.floor, math.ceil))]
        pows = [i / 3 for i in range(p10f, p10l + 1)]
        plt.yticks([10. ** p for p in pows], [round(10. ** p, 1) for p in pows])
        plt.legend()
        plt.show()
        plt.clf()
    
    Console output:

    -8130291462150033527
    1742844748022614520
    -7849657241401383516
    bit length = 1
    var0 (hex     ): 1.21 mcs
    var1 (pickle  ): 3.06 mcs, speedup 0.39x
    var2 (to_bytes): 2.15 mcs, speedup 0.56x
    bit length = 2
    var0 (hex     ): 1.25 mcs
    var1 (pickle  ): 3.04 mcs, speedup 0.41x
    var2 (to_bytes): 2.14 mcs, speedup 0.58x
    bit length = 4
    var0 (hex     ): 1.23 mcs
    var1 (pickle  ): 3.03 mcs, speedup 0.41x
    var2 (to_bytes): 2.15 mcs, speedup 0.57x
    bit length = 8
    var0 (hex     ): 1.24 mcs
    var1 (pickle  ): 3.08 mcs, speedup 0.4x
    var2 (to_bytes): 2.19 mcs, speedup 0.56x
    bit length = 16
    var0 (hex     ): 1.26 mcs
    var1 (pickle  ): 2.98 mcs, speedup 0.42x
    var2 (to_bytes): 2.16 mcs, speedup 0.58x
    bit length = 32
    var0 (hex     ): 1.31 mcs
    var1 (pickle  ): 3.05 mcs, speedup 0.43x
    var2 (to_bytes): 2.18 mcs, speedup 0.6x
    bit length = 64
    var0 (hex     ): 1.32 mcs
    var1 (pickle  ): 3.43 mcs, speedup 0.38x
    var2 (to_bytes): 2.18 mcs, speedup 0.61x
    bit length = 128
    var0 (hex     ): 1.40 mcs
    var1 (pickle  ): 3.44 mcs, speedup 0.41x
    var2 (to_bytes): 2.22 mcs, speedup 0.63x
    bit length = 256
    var0 (hex     ): 1.59 mcs
    var1 (pickle  ): 3.47 mcs, speedup 0.46x
    var2 (to_bytes): 2.29 mcs, speedup 0.69x
    bit length = 512
    var0 (hex     ): 1.97 mcs
    var1 (pickle  ): 3.70 mcs, speedup 0.53x
    var2 (to_bytes): 2.47 mcs, speedup 0.8x
    bit length = 1Ki
    var0 (hex     ): 2.69 mcs
    var1 (pickle  ): 4.02 mcs, speedup 0.67x
    var2 (to_bytes): 2.84 mcs, speedup 0.95x
    bit length = 2Ki
    var0 (hex     ): 4.43 mcs
    var1 (pickle  ): 5.35 mcs, speedup 0.83x
    var2 (to_bytes): 3.45 mcs, speedup 1.28x
    bit length = 4Ki
    var0 (hex     ): 7.27 mcs
    var1 (pickle  ): 5.96 mcs, speedup 1.22x
    var2 (to_bytes): 5.16 mcs, speedup 1.41x
    bit length = 8Ki
    var0 (hex     ): 13.66 mcs
    var1 (pickle  ): 8.31 mcs, speedup 1.64x
    var2 (to_bytes): 8.37 mcs, speedup 1.63x
    bit length = 16Ki
    var0 (hex     ): 25.39 mcs
    var1 (pickle  ): 14.27 mcs, speedup 1.78x
    var2 (to_bytes): 13.78 mcs, speedup 1.84x
    bit length = 32Ki
    var0 (hex     ): 48.91 mcs
    var1 (pickle  ): 24.59 mcs, speedup 1.99x
    var2 (to_bytes): 23.80 mcs, speedup 2.06x
    bit length = 64Ki
    var0 (hex     ): 95.75 mcs
    var1 (pickle  ): 43.23 mcs, speedup 2.21x
    var2 (to_bytes): 44.24 mcs, speedup 2.16x
    bit length = 128Ki
    var0 (hex     ): 189.91 mcs
    var1 (pickle  ): 81.09 mcs, speedup 2.34x
    var2 (to_bytes): 84.14 mcs, speedup 2.26x
    bit length = 256Ki
    var0 (hex     ): 376.56 mcs
    var1 (pickle  ): 155.73 mcs, speedup 2.42x
    var2 (to_bytes): 164.22 mcs, speedup 2.29x
    bit length = 512Ki
    var0 (hex     ): 781.47 mcs
    var1 (pickle  ): 318.82 mcs, speedup 2.45x
    var2 (to_bytes): 324.04 mcs, speedup 2.41x
    bit length = 1024Ki
    var0 (hex     ): 1503.77 mcs
    var1 (pickle  ): 608.79 mcs, speedup 2.47x
    var2 (to_bytes): 648.54 mcs, speedup 2.32x
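
The comment quoted earlier points out that Cassandra's own token function returns yet another value. If the goal is to match that token rather than just to get some stable hash, the integer has to be serialized the way Cassandra serializes the partition key column. The sketch below assumes a single-column partition key of CQL type int (4 bytes, big-endian, signed) or bigint (8 bytes); the helper names cql_int_bytes and cql_bigint_bytes are hypothetical, and composite keys would need a different encoding. Whether the result matches the value from the comment depends on the actual column type, which the thread does not state.

```python
import struct

def cql_int_bytes(value):
    # Assumed wire format of a CQL `int`: 4-byte big-endian signed integer.
    return struct.pack(">i", value)

def cql_bigint_bytes(value):
    # Assumed wire format of a CQL `bigint`: 8-byte big-endian signed integer.
    return struct.pack(">q", value)

try:
    from cassandra.metadata import Murmur3Token
except ImportError:
    Murmur3Token = None  # driver not installed; only the serialization is shown

n = 3202012
int_key = cql_int_bytes(n)
bigint_key = cql_bigint_bytes(n)
print(int_key, bigint_key)

if Murmur3Token is not None:
    # Hash the serialized key with the same function used in the question.
    print(Murmur3Token.hash_fn(int_key))
    print(Murmur3Token.hash_fn(bigint_key))
```

This differs from the three variants above in purpose: they give a self-consistent hash of your own choosing, while this one tries to follow the server's serialization.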