试图复制Python';s针对非字符串的字符串实习功能

试图复制Python';s针对非字符串的字符串实习功能,python,python-2.7,Python,Python 2.7,对于一个自我项目,我想做如下事情: class Species(object): # immutable. def __init__(self, id): # ... (using id to obtain height and other data from file) def height(self): # ... class Animal(object): # mutable. def __init__(self, nicknam

对于一个自我项目,我想做如下事情:

class Species(object): # immutable.
    def __init__(self, id):
        # ... (using id to obtain height and other data from file)
    def height(self):
        # ...

class Animal(object): # mutable.

    def __init__(self, nickname, species_id):
        self.nickname = nickname
        self.species = Species(id)
    def height(self):
        return self.species.height()
正如您所看到的,我并不需要每个id都有一个以上的物种(id)实例,但每次使用该id创建动物对象时,我都会创建一个实例,并且我可能需要多次调用,例如,
Animal(somename,3)

为了解决这个问题,我要做的是创建一个类,这样对于它的两个实例,比如a和b,下面的结果总是正确的:

(a == b) == (a is b)
这是Python对字符串文本所做的事情,称为实习。例如:

a = "hello"
b = "hello"
print(a is b)
该打印将产生true(如果直接使用pythonshell,则只要字符串足够短)

我只能猜测CPython是如何做到这一点的(它可能涉及一些C魔法),所以我正在做我自己的版本。到目前为止,我已经:

class MyClass(object):

    myHash = {} # This replicates the intern pool.

    def __new__(cls, n): # The default new method returns a new instance
        if n in MyClass.myHash:
            return MyClass.myHash[n]

        self = super(MyClass, cls).__new__(cls)
        self.__init(n)
        MyClass.myHash[n] = self

        return self

    # as pointed out on an answer, it's better to avoid initializating the instance 
    # with __init__, as that one's called even when returning an old instance.
    def __init(self, n): 
        self.n = n

a = MyClass(2)
b = MyClass(2)

print a is b # <<< True
类MyClass(对象):
myHash={}#这将复制实习生池。
def uu new(cls,n):#默认的new方法返回一个新实例
如果MyClass.myHash中有n:
返回MyClass.myHash[n]
self=super(MyClass,cls)。\uuuuuu新的\uuuuuuu(cls)
自初始化(n)
MyClass.myHash[n]=self
回归自我
#正如答案中指出的,最好避免初始化实例
#使用uu init uuu,即使在返回旧实例时也会调用该函数。
定义初始化(自,n):
self.n=n
a=MyClass(2)
b=MyClass(2)

print a是b#是,实现返回缓存对象的
\uuuuu new\uuuu
方法是创建有限数量实例的适当方法。如果您不希望创建大量实例,您可以实现
\uuuuueq\uuuuu
,并通过值而不是身份进行比较,但这样做并没有什么坏处

请注意,不可变对象通常应在
\uuuu new\uuuuu
中完成其所有初始化,而不是
\uuuu init\uuuuuu
,因为后者是在创建对象后调用的。此外,
\uuuuuu init\uuuuu
将对从
\uuuu new\uuuuu
返回的类的任何实例调用
\uuuuuuu init>,因此在缓存时,每次返回缓存对象时都将再次调用该类


另外,
\uuuu new\uuuu
的第一个参数是类对象而不是实例,因此您可能应该将其命名为
cls
,而不是
self
(如果需要,您可以在方法的后面使用
self
而不是
instance

<,我要推荐一些东西。第一,如果你想要“真正”的不变性,从名为tuple的
继承(通常人们对此不太在意,但当你在实习时,打破不变性会导致更大的问题)。其次,使用锁允许线程安全行为

因为这相当复杂,我将提供一个
物种
代码的修改副本,并附上注释解释:

import collections
import operator
import threading

# Inheriting from a namedtuple is a convenient way to get immutability
class Species(collections.namedtuple('SpeciesBase', 'species_id height ...')):
    __slots__ = ()  # Prevent creation of arbitrary values on instances; true immutability of declared values from namedtuple makes true immutable instances

    # Lock and cache, with underscore prefixes to indicate they're internal details
    _cache_lock = threading.Lock()
    _cache = {} 

    def __new__(cls, species_id):  # Switching to canonical name cls for class type
        # Do quick fail fast check that ID is in fact an int/long
        # If it's int-like, this will force conversion to true int/long
        # and minimize risk of incompatible hash/equality checks in dict
        # lookup
        # I suspect that in CPython, this would actually remove the need
        # for the _cache_lock due to the GIL protecting you at the
        # critical stages (because no byte code is executing comparing
        # or hashing built-in int/long types), but the lock is a good idea
        # for correctness (avoiding reliance on implementation details)
        # and should cost little
        species_id = operator.index(species_id)

        # Lock when checking/mutating cache to make it thread safe
        try:
            with cls._cache_lock:
                return cls._cache[species_id]
        except KeyError:
            pass

        # Read in data here; not done under lock on assumption this might
        # be expensive and other Species (that already exist) might be
        # created/retrieved from cache during this time
        species_id = ...
        height = ...
        # Pass all the values read to the superclass (the namedtuple base)
        # constructor (which will set them and leave them immutable thereafter)
        self = super(Species, cls).__new__(cls, species_id, height, ...)

        with cls._cache_lock:
            # If someone tried to create the same species and raced
            # ahead of us, use their version, not ours to ensure uniqueness
            # If no one raced us, this will put our new object in the cache
            self = cls._cache.setdefault(species_id, self)
        return self
如果您想为通用库进行实习(在这里,用户可能是线程化的,您不能相信他们不会打破不变性不变量),类似于上面的内容是一个基本的结构。它速度快,即使构造很重,也能最大限度地减少暂停的机会(交换条件是可能多次重构对象,如果多个线程试图一次性构造对象,则丢弃除一个副本以外的所有副本),等等


当然,如果构造成本低且实例很小,那么只需编写一个
\uuuuueq\uuuu
(如果它在逻辑上是不可变的,则可能编写一个
\uuuuuhash\uuuuuu
)并使用它即可。

您可以在Python的各种类型的实现中看到类似的代码。事实上,当完成的工作是原子的时,它们不会锁定“原子”操作,这支持我上面的评论,但它们使用相同的“尝试使用锁从缓存中获取,如果失败,释放锁,执行昂贵的工作,重新获取锁并更新缓存(如果没有竞争)”模式,我在上面使用的缓存大小是有限的(操作组必须以原子方式执行)。