试图复制Python'；s针对非字符串的字符串实习功能_Python_Python 2.7

试图复制Python'；s针对非字符串的字符串实习功能

python python-2.7

试图复制Python'；s针对非字符串的字符串实习功能,python,python-2.7,Python,Python 2.7,对于一个自我项目，我想做如下事情： class Species(object): # immutable. def __init__(self, id): # ... (using id to obtain height and other data from file) def height(self): # ... class Animal(object): # mutable. def __init__(self, nicknam

对于一个自我项目，我想做如下事情：

class Species(object): # immutable.
    def __init__(self, id):
        # ... (using id to obtain height and other data from file)
    def height(self):
        # ...

class Animal(object): # mutable.

    def __init__(self, nickname, species_id):
        self.nickname = nickname
        self.species = Species(id)
    def height(self):
        return self.species.height()

正如您所看到的，我并不需要每个id都有一个以上的物种（id）实例，但每次使用该id创建动物对象时，我都会创建一个实例，并且我可能需要多次调用，例如，

Animal（somename，3）

为了解决这个问题，我要做的是创建一个类，这样对于它的两个实例，比如a和b，下面的结果总是正确的：

(a == b) == (a is b)

这是Python对字符串文本所做的事情，称为实习。例如：

a = "hello"
b = "hello"
print(a is b)

该打印将产生true（如果直接使用pythonshell，则只要字符串足够短）

我只能猜测CPython是如何做到这一点的（它可能涉及一些C魔法），所以我正在做我自己的版本。到目前为止，我已经：

class MyClass(object):

    myHash = {} # This replicates the intern pool.

    def __new__(cls, n): # The default new method returns a new instance
        if n in MyClass.myHash:
            return MyClass.myHash[n]

        self = super(MyClass, cls).__new__(cls)
        self.__init(n)
        MyClass.myHash[n] = self

        return self

    # as pointed out on an answer, it's better to avoid initializating the instance 
    # with __init__, as that one's called even when returning an old instance.
    def __init(self, n): 
        self.n = n

a = MyClass(2)
b = MyClass(2)

print a is b # <<< True

类MyClass（对象）：
myHash={}#这将复制实习生池。
def uu new（cls，n）：#默认的new方法返回一个新实例
如果MyClass.myHash中有n：
返回MyClass.myHash[n]
self=super（MyClass，cls）。\uuuuuu新的\uuuuuuu（cls）
自初始化（n）
MyClass.myHash[n]=self
回归自我
#正如答案中指出的，最好避免初始化实例
#使用uu init uuu，即使在返回旧实例时也会调用该函数。
定义初始化（自，n）：
self.n=n
a=MyClass（2）
b=MyClass（2）
print a是b#是，实现返回缓存对象的\uuuuu new\uuuu
方法是创建有限数量实例的适当方法。如果您不希望创建大量实例，您可以实现\uuuuueq\uuuuu
，并通过值而不是身份进行比较，但这样做并没有什么坏处
请注意，不可变对象通常应在\uuuu new\uuuuu
中完成其所有初始化，而不是\uuuu init\uuuuuu
，因为后者是在创建对象后调用的。此外，\uuuuuu init\uuuuu
将对从\uuuu new\uuuuu
返回的类的任何实例调用\uuuuuuu init>，因此在缓存时，每次返回缓存对象时都将再次调用该类
另外，\uuuu new\uuuu
的第一个参数是类对象而不是实例，因此您可能应该将其命名为cls
，而不是self
（如果需要，您可以在方法的后面使用self
而不是instance
。
<，我要推荐一些东西。第一，如果你想要“真正”的不变性，从名为tuple的继承（通常人们对此不太在意，但当你在实习时，打破不变性会导致更大的问题）。其次，使用锁允许线程安全行为
因为这相当复杂，我将提供一个物种
代码的修改副本，并附上注释解释：
import collections
import operator
import threading

# Inheriting from a namedtuple is a convenient way to get immutability
class Species(collections.namedtuple('SpeciesBase', 'species_id height ...')):
    __slots__ = ()  # Prevent creation of arbitrary values on instances; true immutability of declared values from namedtuple makes true immutable instances

    # Lock and cache, with underscore prefixes to indicate they're internal details
    _cache_lock = threading.Lock()
    _cache = {} 

    def __new__(cls, species_id):  # Switching to canonical name cls for class type
        # Do quick fail fast check that ID is in fact an int/long
        # If it's int-like, this will force conversion to true int/long
        # and minimize risk of incompatible hash/equality checks in dict
        # lookup
        # I suspect that in CPython, this would actually remove the need
        # for the _cache_lock due to the GIL protecting you at the
        # critical stages (because no byte code is executing comparing
        # or hashing built-in int/long types), but the lock is a good idea
        # for correctness (avoiding reliance on implementation details)
        # and should cost little
        species_id = operator.index(species_id)

        # Lock when checking/mutating cache to make it thread safe
        try:
            with cls._cache_lock:
                return cls._cache[species_id]
        except KeyError:
            pass

        # Read in data here; not done under lock on assumption this might
        # be expensive and other Species (that already exist) might be
        # created/retrieved from cache during this time
        species_id = ...
        height = ...
        # Pass all the values read to the superclass (the namedtuple base)
        # constructor (which will set them and leave them immutable thereafter)
        self = super(Species, cls).__new__(cls, species_id, height, ...)

        with cls._cache_lock:
            # If someone tried to create the same species and raced
            # ahead of us, use their version, not ours to ensure uniqueness
            # If no one raced us, this will put our new object in the cache
            self = cls._cache.setdefault(species_id, self)
        return self

如果您想为通用库进行实习（在这里，用户可能是线程化的，您不能相信他们不会打破不变性不变量），类似于上面的内容是一个基本的结构。它速度快，即使构造很重，也能最大限度地减少暂停的机会（交换条件是可能多次重构对象，如果多个线程试图一次性构造对象，则丢弃除一个副本以外的所有副本），等等
当然，如果构造成本低且实例很小，那么只需编写一个\uuuuueq\uuuu
（如果它在逻辑上是不可变的，则可能编写一个\uuuuuhash\uuuuuu
）并使用它即可。
您可以在Python的各种类型的实现中看到类似的代码。事实上，当完成的工作是原子的时，它们不会锁定“原子”操作，这支持我上面的评论，但它们使用相同的“尝试使用锁从缓存中获取，如果失败，释放锁，执行昂贵的工作，重新获取锁并更新缓存（如果没有竞争）”模式，我在上面使用的缓存大小是有限的（操作组必须以原子方式执行）。