Get a Murmur3 hash from an integer in the Cassandra Python driver
I want to convert an integer partition key to a Murmur3 hash with the code below:
from cassandra.metadata import Murmur3Token
a = 3202012
h = Murmur3Token.hash_fn(a)
But I get the following error:
TypeError: object of type 'int' has no len()
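It works without problems for strings.

If you don't want to read the whole post: the simplest solution is to pass hex(a) for an integer a, i.e. h = Murmur3Token.hash_fn(hex(a)).

You get the error because Murmur3Token.hash_fn(...) is not implemented for Python's integer (int) type. Hash functions usually hash a single block of bytes of arbitrary size and nothing else. Values of other types are usually converted to bytes first, which is called serialization.

The function supports both the bytes and str types, so you have to convert your number to one of those two. I implemented three conversions below (variants 0, 1, 2):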
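Variant 0. Convert to a hex string with hex(n). This is the simplest solution to use. It is slower than the other two for large integers but faster for small ones. It is also very stable: the hex representation, and therefore the hash value, will never change.
Variant 1. Convert to bytes with pickle.dumps(n). This is slightly more complex than the previous solution and needs the pickle module. It is faster than hex(n) for large integers but slower for small ones. It may also be somewhat unstable, because the pickle format can change over time and then produce different bytes and a different hash.
Variant 2. Convert to bytes with my function to_bytes(n). For small inputs it is about 1.5x faster than the pickle solution, and for large inputs the speed is almost the same. It is also stable: unlike the pickle format, my format will not change (unless you modify my function).

For integers of up to about 1024 bits, use hex(n). If your integers are very large (more than 1024 bits), use my function to_bytes(n). You can also keep using hex(n), which is only about 2.5x slower, but with a lot of data that slowdown can be considerable, and then to_bytes(n) is the better choice.

Note: the three serialization functions described in this post all produce different hash values. Pick one of them and always use it for your numbers, so that you get the same result every time.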
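To see why the three variants cannot agree, it helps to look at what each one actually hands to the hash function. The sketch below uses only the standard library and repeats to_bytes from the answer; the helper name serializations is made up for illustration. It also shows that pickle output depends on the protocol version, which is the kind of instability mentioned for variant 1.

```python
import pickle

def to_bytes(n, order='little'):
    # Same zig-zag based encoding as variant 2 in the answer.
    n = (n << 1) if n >= 0 else (((-n - 1) << 1) | 1)
    return n.to_bytes((n.bit_length() + 7) // 8, order)

def serializations(n):
    # Hypothetical helper: the value each variant would pass to hash_fn.
    return {
        'hex': hex(n),
        'pickle': pickle.dumps(n),
        'to_bytes': to_bytes(n),
    }

n = 3202012
for name, data in serializations(n).items():
    print(name.ljust(8), repr(data))

# The same integer pickled with two protocol versions gives different bytes,
# so the hashes of those bytes would differ as well.
print(pickle.dumps(n, protocol=2) != pickle.dumps(n, protocol=4))
```

Since every variant produces a different input, each one maps the same integer to a different token. Only consistency within one variant matters.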
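My code needs the following Python modules, installed once from the command line with python -m pip install cassandra-driver timerit matplotlib. The timerit and matplotlib modules are used only for the time/speed measurements and are not needed for the actual hash computation.
Below are 1) the code, 2) the timing plot and 3) the console output.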
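An integer has to be converted to bytes or a string in some way, because most hash algorithms only work on bytes, so do Murmur3Token.hash_fn(str(a)).

I used the three functions on the same input (n=3202012), but the results are not the same. Murmur3Token.hash_fn(hex(n)) => -8130291462150033527, Murmur3Token.hash_fn(to_bytes(n)) => -7849657241401383516, Murmur3Token.hash_fn(pickle.dumps(a)) => 1742844748022614520. Cassandra also has a built-in function that returns the token. Its result is: -2971817988961560522

@ghanad Yes, I described different ways of serializing an int to a string or to bytes. Because these are three different serializations, they produce completely different byte sequences, so the hashes of those bytes are completely different too. I provided three methods only so that the author can choose one of them. Using the same serialization function every time then gives the same result/hash for the same integer.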
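The token Cassandra reports differs from all three variants, most likely because the server hashes the partition key as serialized for its CQL type rather than as a hex string or a pickle. A CQL int is a 4-byte and a bigint an 8-byte big-endian two's-complement value. The sketch below assumes a single-column partition key of one of those two types (the question does not say which), and the function names are made up for illustration. Whether the hash of these bytes matches the token quoted above depends on the real column type, so it should be checked against the server.

```python
import struct

def cql_int_bytes(n):
    # Assumption: the column type is CQL int (32-bit signed, big-endian).
    return struct.pack('>i', n)

def cql_bigint_bytes(n):
    # Assumption: the column type is CQL bigint (64-bit signed, big-endian).
    return struct.pack('>q', n)

n = 3202012
print(cql_int_bytes(n).hex())
print(cql_bigint_bytes(n).hex())

# With cassandra-driver installed, either result can be hashed the same way
# as in the answer:
#   from cassandra.metadata import Murmur3Token
#   Murmur3Token.hash_fn(cql_bigint_bytes(n))
```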
# Needs: python -m pip install cassandra-driver timerit matplotlib
from cassandra.metadata import Murmur3Token

# Input
n = 3202012

# ---------- Variant 0, convert to string ----------
print(Murmur3Token.hash_fn(hex(n)))

# ---------- Variant 1, convert to bytes using pickle ----------
import pickle
print(Murmur3Token.hash_fn(pickle.dumps(n)))

# ---------- Variant 2, convert signed int to min num of bytes ----------
def to_bytes(n, order = 'little'): # order can be 'little' or 'big'
    # Zig-Zag encode signed integer to unsigned integer, i.e. map
    # 0 to 0, -1 to 1, 1 to 2, -2 to 3, 2 to 4, -3 to 5, 3 to 6, etc
    n = (n << 1) if n >= 0 else (((-n - 1) << 1) | 1)
    return n.to_bytes((n.bit_length() + 7) // 8, order)

print(Murmur3Token.hash_fn(to_bytes(n)))

# ---------- Time/Speed measure and Visualizing ----------
import random
from timerit import Timerit
random.seed(0)
Timerit._default_asciimode = True
ncycle = 16

def round_fixed(n, c):
    s = str(round(n, c))
    return s + '0' * (c - (len(s) - 1 - s.rfind('.')))

stats = []
for bit_len_log in range(0, 21, 1):
    stats.append([])
    bit_len = 1 << bit_len_log
    n = random.randrange(1 << bit_len)
    num_runs = round(max(1, 2 ** 11 / bit_len)) * 3
    print('bit length =', bit_len if bit_len < 1024 else f'{round(bit_len / 1024)}Ki')
    rt = None
    for fi, (f, fn) in enumerate([
        #(str, 'str'),
        (hex, 'hex'),
        (lambda x: pickle.dumps(x), 'pickle'),
        (to_bytes, 'to_bytes'),
    ]):
        print(f'var{fi} ({str(fn).ljust(len("to_bytes"))}): ', end = '', flush = True)
        tim = Timerit(num = num_runs, verbose = 0)
        for t in tim:
            for i in range(ncycle):
                Murmur3Token.hash_fn(f(n))
        ct = tim.mean() / ncycle
        print(f'{round_fixed(ct * 10 ** 6, 2)} mcs', end = '')
        if rt is None:
            rt = ct
            print()
        else:
            print(f', speedup {round(rt / ct, 2)}x')
        stats[-1].append({
            'bll': bit_len_log, 'fi': fi, 'fn': fn, 't': ct * 10 ** 6, 'su': rt / ct,
        })

import math, matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (9.6, 5.4)
for yt in ['t']:
    plt.xlabel('bit len')
    plt.yscale('log')
    plt.ylabel('time, mcs')
    for i in range(len(stats[0])):
        p, = plt.plot([e[i]['bll'] for e in stats], [e[i][yt] for e in stats])
        p.set_label(stats[0][i]['fn'])
    plt.xticks([stats[i][0]['bll'] for i in range(0, len(stats), 2)], [f"2^{stats[i][0]['bll']}" for i in range(0, len(stats), 2)])
    p10f, p10l = [r(3 * math.log(e) / math.log(10)) for e, r in zip(plt.ylim(), (math.floor, math.ceil))]
    pows = [i / 3 for i in range(p10f, p10l + 1)]
    plt.yticks([10. ** p for p in pows], [round(10. ** p, 1) for p in pows])
    plt.legend()
    plt.show()
    plt.clf()

Console output:
-8130291462150033527
1742844748022614520
-7849657241401383516
bit length = 1
var0 (hex ): 1.21 mcs
var1 (pickle ): 3.06 mcs, speedup 0.39x
var2 (to_bytes): 2.15 mcs, speedup 0.56x
bit length = 2
var0 (hex ): 1.25 mcs
var1 (pickle ): 3.04 mcs, speedup 0.41x
var2 (to_bytes): 2.14 mcs, speedup 0.58x
bit length = 4
var0 (hex ): 1.23 mcs
var1 (pickle ): 3.03 mcs, speedup 0.41x
var2 (to_bytes): 2.15 mcs, speedup 0.57x
bit length = 8
var0 (hex ): 1.24 mcs
var1 (pickle ): 3.08 mcs, speedup 0.4x
var2 (to_bytes): 2.19 mcs, speedup 0.56x
bit length = 16
var0 (hex ): 1.26 mcs
var1 (pickle ): 2.98 mcs, speedup 0.42x
var2 (to_bytes): 2.16 mcs, speedup 0.58x
bit length = 32
var0 (hex ): 1.31 mcs
var1 (pickle ): 3.05 mcs, speedup 0.43x
var2 (to_bytes): 2.18 mcs, speedup 0.6x
bit length = 64
var0 (hex ): 1.32 mcs
var1 (pickle ): 3.43 mcs, speedup 0.38x
var2 (to_bytes): 2.18 mcs, speedup 0.61x
bit length = 128
var0 (hex ): 1.40 mcs
var1 (pickle ): 3.44 mcs, speedup 0.41x
var2 (to_bytes): 2.22 mcs, speedup 0.63x
bit length = 256
var0 (hex ): 1.59 mcs
var1 (pickle ): 3.47 mcs, speedup 0.46x
var2 (to_bytes): 2.29 mcs, speedup 0.69x
bit length = 512
var0 (hex ): 1.97 mcs
var1 (pickle ): 3.70 mcs, speedup 0.53x
var2 (to_bytes): 2.47 mcs, speedup 0.8x
bit length = 1Ki
var0 (hex ): 2.69 mcs
var1 (pickle ): 4.02 mcs, speedup 0.67x
var2 (to_bytes): 2.84 mcs, speedup 0.95x
bit length = 2Ki
var0 (hex ): 4.43 mcs
var1 (pickle ): 5.35 mcs, speedup 0.83x
var2 (to_bytes): 3.45 mcs, speedup 1.28x
bit length = 4Ki
var0 (hex ): 7.27 mcs
var1 (pickle ): 5.96 mcs, speedup 1.22x
var2 (to_bytes): 5.16 mcs, speedup 1.41x
bit length = 8Ki
var0 (hex ): 13.66 mcs
var1 (pickle ): 8.31 mcs, speedup 1.64x
var2 (to_bytes): 8.37 mcs, speedup 1.63x
bit length = 16Ki
var0 (hex ): 25.39 mcs
var1 (pickle ): 14.27 mcs, speedup 1.78x
var2 (to_bytes): 13.78 mcs, speedup 1.84x
bit length = 32Ki
var0 (hex ): 48.91 mcs
var1 (pickle ): 24.59 mcs, speedup 1.99x
var2 (to_bytes): 23.80 mcs, speedup 2.06x
bit length = 64Ki
var0 (hex ): 95.75 mcs
var1 (pickle ): 43.23 mcs, speedup 2.21x
var2 (to_bytes): 44.24 mcs, speedup 2.16x
bit length = 128Ki
var0 (hex ): 189.91 mcs
var1 (pickle ): 81.09 mcs, speedup 2.34x
var2 (to_bytes): 84.14 mcs, speedup 2.26x
bit length = 256Ki
var0 (hex ): 376.56 mcs
var1 (pickle ): 155.73 mcs, speedup 2.42x
var2 (to_bytes): 164.22 mcs, speedup 2.29x
bit length = 512Ki
var0 (hex ): 781.47 mcs
var1 (pickle ): 318.82 mcs, speedup 2.45x
var2 (to_bytes): 324.04 mcs, speedup 2.41x
bit length = 1024Ki
var0 (hex ): 1503.77 mcs
var1 (pickle ): 608.79 mcs, speedup 2.47x
var2 (to_bytes): 648.54 mcs, speedup 2.32x
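
The stability claim for variant 2 relies on to_bytes mapping distinct integers to distinct byte strings. One way to check that is to write the inverse and confirm a round trip. The from_bytes function below is a hypothetical addition, not part of the original answer, and is only a minimal sketch of the zig-zag decoding step.

```python
def to_bytes(n, order='little'):
    # Variant 2 from the answer: zig-zag encode, then use the minimal byte count.
    n = (n << 1) if n >= 0 else (((-n - 1) << 1) | 1)
    return n.to_bytes((n.bit_length() + 7) // 8, order)

def from_bytes(data, order='little'):
    # Hypothetical inverse of to_bytes.
    z = int.from_bytes(data, order)
    # Even values encode non-negative ints, odd values encode negative ints.
    return (z >> 1) if z & 1 == 0 else -((z >> 1) + 1)

# Round trip over a few signed values, including a very large one.
for n in (0, 1, -1, 2, -2, 3202012, -3202012, 1 << 100):
    assert from_bytes(to_bytes(n)) == n
```

Because the encoding is reversible, two different integers can never share the same bytes, so any hash collision would come from the hash function itself and not from the serialization.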