Python 一个求职面试学习指南告诉我用数组构建一个哈希？_Python

Python 一个求职面试学习指南告诉我用数组构建一个哈希？

python

Python 一个求职面试学习指南告诉我用数组构建一个哈希？,python,Python,如何仅使用python中的数组构建快速查找且无冲突的字典我有超过5年的经验，为数以百万计的用户构建了功能，花费了1000万美元，但根本不知道从哪里开始。这可能是一个很好的起点。仅使用数组的自定义词典的实现。但这有很多局限性。值得注意的限制是：可能会发生碰撞。正如在评论中已经提到的，如果我们想要一个没有冲突的字典，我认为键中应该有某种形式的约束，我们可以在散列期间（换句话说，在选择目标bucket/array_位置期间）将其用作属性，以确保没有单个key:value存在于同一个bucket中

如何仅使用python中的数组构建快速查找且无冲突的字典

我有超过5年的经验，为数以百万计的用户构建了功能，花费了1000万美元，但根本不知道从哪里开始。

这可能是一个很好的起点。仅使用数组的自定义词典的实现。但这有很多局限性。值得注意的限制是：

可能会发生碰撞。正如在评论中已经提到的，如果我们想要一个没有冲突的字典，我认为键中应该有某种形式的约束，我们可以在散列期间（换句话说，在选择目标bucket/array_位置期间）将其用作属性，以确保没有单个key:value存在于同一个bucket中。约束的示例可能是每个键具有唯一的长度，或者每个键以不同的字符开头，或者每个键在其内容（转换为整数）相加时具有不同的整数和，等等
存储桶/阵列大小不调整大小。目前尺寸为100。因此，如果您存储101个项目，则可以保证至少发生1次碰撞
密钥支持的类型当前仅限于int和str
哈希算法可以改进。目前，它对int不做任何处理，而对于str，它只对每个字符的ASCII码执行xor

一般来说，该算法平均运行O（1），但在最坏情况下仍然运行O（n）。设置新关键点只需要获取数组中的目标位置并将其放置在那里，而获取值则计算相同的目标位置并获取其中的值

代码

def my_hash(variable):
    """Currently, only types int and str are supported"""
    if isinstance(variable, int):
        return round(variable)
    elif isinstance(variable, str):
        if len(variable) == 0: return 0
        result = ord(variable[0])
        for index in range(1, len(variable)):
            result ^= ord(variable[index])
        return result
    return 0

class MyDict:
    def __init__(self):
        INITIAL_ARRAY_SIZE = 100
        self.internal_array = [[] for _ in range(INITIAL_ARRAY_SIZE)]

    def set(self, key, value):
        hash_value = my_hash(key)
        array_index = hash_value % len(self.internal_array)

        # In the case that the bucket at self.internal_array[array_index] is not empty and there is
        # a new key to be placed on it, then there is a collision. At this point, it would be O(n)
        # where n is the number of colliding keys. This could perhaps be improved into O(log n) if
        # we maintain each bucket as sorted, or even O(n) if we maintain a secondary hash table on
        # each bucket which of course requires a second hasher logic.
        for item in self.internal_array[array_index]:
            if item[0] == key:
                item[1] = value
                break
            print("WARNING: Collision detected for key", key)
        else:
             self.internal_array[array_index].append([key, value])   

    def get(self, key):
        hash_value = my_hash(key)
        array_index = hash_value % len(self.internal_array)

        # As noted above in set method, there is a collision if the self.internal_array[array_index]
        # bucket contains more than one key:value pair.
        for item in self.internal_array[array_index]:
            if item[0] == key:
                return item[1]
        return None

    def show(self):
        print("Showing buckets with contents:")
        for index, bucket in enumerate(self.internal_array):
            if len(bucket) == 0: continue
            for item in bucket:
                print("Bucket[", index, "] ==> key:", item[0], "value:", item[1])

        print("Showing all buckets:")
        print(self.internal_array)

obj = MyDict()

print("Setting keys...")
obj.set("GOAT", "Michael Jordan")
obj.set("humans", "The Wildlife Killers")
obj.set("py-2", "sorry, end of life")
obj.set("GOAT", "LeBron James")
obj.set(69, "an ordinary number")
obj.set("Linus", "Gates")
obj.set(12, "los meses en un ano")
obj.set("beaver", "the best animal out there!")

print("\nGetting keys...")
print("Linus is", obj.get("Linus"))
print("12 is", obj.get(12))
print("GOAT is", obj.get("GOAT")) # key with an updated value
print("69 is", obj.get(69))
print("humans is", obj.get("humans"))
print("beaver is", obj.get("beaver"))
print("py-2 is", obj.get("py-2"))
print("COVID is", obj.get("COVID")) # non-existent key

print("\nDisplaying everything...")
obj.show()

输出

Setting keys...
WARNING: Collision detected for key 12

Getting keys...
Linus is Gates
12 is los meses en un ano
GOAT is LeBron James
69 is an ordinary number
humans is The Wildlife Killers
beaver is the best animal out there!
py-2 is sorry, end of life
COVID is None

Displaying everything...
Showing buckets with contents:
Bucket[ 7 ] ==> key: beaver value: the best animal out there!
Bucket[ 12 ] ==> key: humans value: The Wildlife Killers
Bucket[ 12 ] ==> key: 12 value: los meses en un ano
Bucket[ 22 ] ==> key: py-2 value: sorry, end of life
Bucket[ 29 ] ==> key: GOAT value: LeBron James
Bucket[ 69 ] ==> key: 69 value: an ordinary number
Bucket[ 77 ] ==> key: Linus value: Gates
Showing all buckets:
[[], [], [], [], [], [], [], [['beaver', 'the best animal out there!']], [], [], [], [], [['humans', 'The Wildlife Killers'], [12, 'los meses en un ano']], [], [], [], [], [], [], [], [], [], [['py-2', 'sorry, end of life']], [], [], [], [], [], [], [['GOAT', 'LeBron James']], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [[69, 'an ordinary number']], [], [], [], [], [], [], [], [['Linus', 'Gates']], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]

这也许是一个很好的起点。仅使用数组的自定义词典的实现。但这有很多局限性。值得注意的限制是：

可能会发生碰撞。正如在评论中已经提到的，如果我们想要一个没有冲突的字典，我认为键中应该有某种形式的约束，我们可以在散列期间（换句话说，在选择目标bucket/array_位置期间）将其用作属性，以确保没有单个key:value存在于同一个bucket中。约束的示例可能是每个键具有唯一的长度，或者每个键以不同的字符开头，或者每个键在其内容（转换为整数）相加时具有不同的整数和，等等
存储桶/阵列大小不调整大小。目前尺寸为100。因此，如果您存储101个项目，则可以保证至少发生1次碰撞
密钥支持的类型当前仅限于int和str
哈希算法可以改进。目前，它对int不做任何处理，而对于str，它只对每个字符的ASCII码执行xor

代码

def my_hash(variable):
    """Currently, only types int and str are supported"""
    if isinstance(variable, int):
        return round(variable)
    elif isinstance(variable, str):
        if len(variable) == 0: return 0
        result = ord(variable[0])
        for index in range(1, len(variable)):
            result ^= ord(variable[index])
        return result
    return 0

class MyDict:
    def __init__(self):
        INITIAL_ARRAY_SIZE = 100
        self.internal_array = [[] for _ in range(INITIAL_ARRAY_SIZE)]

    def set(self, key, value):
        hash_value = my_hash(key)
        array_index = hash_value % len(self.internal_array)

        # In the case that the bucket at self.internal_array[array_index] is not empty and there is
        # a new key to be placed on it, then there is a collision. At this point, it would be O(n)
        # where n is the number of colliding keys. This could perhaps be improved into O(log n) if
        # we maintain each bucket as sorted, or even O(n) if we maintain a secondary hash table on
        # each bucket which of course requires a second hasher logic.
        for item in self.internal_array[array_index]:
            if item[0] == key:
                item[1] = value
                break
            print("WARNING: Collision detected for key", key)
        else:
             self.internal_array[array_index].append([key, value])   

    def get(self, key):
        hash_value = my_hash(key)
        array_index = hash_value % len(self.internal_array)

        # As noted above in set method, there is a collision if the self.internal_array[array_index]
        # bucket contains more than one key:value pair.
        for item in self.internal_array[array_index]:
            if item[0] == key:
                return item[1]
        return None

    def show(self):
        print("Showing buckets with contents:")
        for index, bucket in enumerate(self.internal_array):
            if len(bucket) == 0: continue
            for item in bucket:
                print("Bucket[", index, "] ==> key:", item[0], "value:", item[1])

        print("Showing all buckets:")
        print(self.internal_array)

obj = MyDict()

print("Setting keys...")
obj.set("GOAT", "Michael Jordan")
obj.set("humans", "The Wildlife Killers")
obj.set("py-2", "sorry, end of life")
obj.set("GOAT", "LeBron James")
obj.set(69, "an ordinary number")
obj.set("Linus", "Gates")
obj.set(12, "los meses en un ano")
obj.set("beaver", "the best animal out there!")

print("\nGetting keys...")
print("Linus is", obj.get("Linus"))
print("12 is", obj.get(12))
print("GOAT is", obj.get("GOAT")) # key with an updated value
print("69 is", obj.get(69))
print("humans is", obj.get("humans"))
print("beaver is", obj.get("beaver"))
print("py-2 is", obj.get("py-2"))
print("COVID is", obj.get("COVID")) # non-existent key

print("\nDisplaying everything...")
obj.show()

输出

Setting keys...
WARNING: Collision detected for key 12

Getting keys...
Linus is Gates
12 is los meses en un ano
GOAT is LeBron James
69 is an ordinary number
humans is The Wildlife Killers
beaver is the best animal out there!
py-2 is sorry, end of life
COVID is None

Displaying everything...
Showing buckets with contents:
Bucket[ 7 ] ==> key: beaver value: the best animal out there!
Bucket[ 12 ] ==> key: humans value: The Wildlife Killers
Bucket[ 12 ] ==> key: 12 value: los meses en un ano
Bucket[ 22 ] ==> key: py-2 value: sorry, end of life
Bucket[ 29 ] ==> key: GOAT value: LeBron James
Bucket[ 69 ] ==> key: 69 value: an ordinary number
Bucket[ 77 ] ==> key: Linus value: Gates
Showing all buckets:
[[], [], [], [], [], [], [], [['beaver', 'the best animal out there!']], [], [], [], [], [['humans', 'The Wildlife Killers'], [12, 'los meses en un ano']], [], [], [], [], [], [], [], [], [], [['py-2', 'sorry, end of life']], [], [], [], [], [], [], [['GOAT', 'LeBron James']], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [[69, 'an ordinary number']], [], [], [], [], [], [], [], [['Linus', 'Gates']], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]

没有碰撞？输入没有约束？最简单的解决方案是两个数组，一个用于键，一个用于键values@DarkLeader好，这并不能为您提供快速查找。可能构建类似于二叉树的东西，它将存储在一个数组中，但存储的数据将是具有两个键和值的节点，并且几乎没有添加和检索数据的函数。我想问一下“无冲突”是什么意思。大多数哈希表都有冲突以及处理冲突的方法。所以我想知道“无碰撞”是否真的意味着没有碰撞，或者说是最小的碰撞，并适当地处理它们。如果你真的不需要冲突，你要么需要无限的存储空间，要么对键有一些约束。没有冲突？输入没有约束？最简单的解决方案是两个数组，一个用于键，一个用于键values@DarkLeader好，这并不能为您提供快速查找。可能构建类似于二叉树的东西，它将存储在一个数组中，但存储的数据将是具有两个键和值的节点，并且几乎没有添加和检索数据的函数。我想问一下“无冲突”是什么意思。大多数哈希表都有冲突以及处理冲突的方法。所以我想知道“无碰撞”是否真的意味着没有碰撞，或者说是最小的碰撞，并适当地处理它们。如果你真的不需要冲突，你要么需要无限的存储空间，要么对键有一些约束。