Storing Python dedupe (structured deduplication) results in a database


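I'm using the Python dedupe project (python-dedupe) to find duplicate organization names in my data. Many of the examples focus on how to process the data rather than on what to do with the results. Are there any best practices for taking the results, putting them into a database, and querying the groups of duplicate records?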

So far, my idea is to structure the two tables like this (using SQLAlchemy), but something about it feels off:

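from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship  # SQLAlchemy 1.4+

Base = declarative_base()


class Organization(Base):
    __tablename__ = 'organization'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    # Points at duplicate_organization.cluster_id, which is not that table's primary key
    cluster_id = Column(Integer, ForeignKey('duplicate_organization.cluster_id'))


class DuplicateOrganization(Base):
    __tablename__ = 'duplicate_organization'

    id = Column(Integer, primary_key=True)
    cluster_id = Column(Integer)
    name = Column(String)
    # One cluster row -> many organization rows, joined on cluster_id
    organizations = relationship("Organization")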
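
For comparison, here is a minimal sketch of one alternative layout, not an established best practice. It assumes the clustering step yields a list of (record_ids, scores) pairs, one pair per cluster, and it uses the standard-library sqlite3 module instead of SQLAlchemy so that it stays self-contained. The entity_map table, the duplicate_groups helper, and all sample data are hypothetical names and values made up for illustration.

```python
import sqlite3

# Hypothetical clustering output: one (record_ids, scores) pair per cluster.
clustered = [
    ((1, 2, 3), (0.95, 0.91, 0.88)),
    ((4,), (1.0,)),
    ((5, 6), (0.80, 0.80)),
]

# Made-up source records: (id, name).
organizations = [
    (1, "Acme Inc"),
    (2, "ACME Incorporated"),
    (3, "Acme, Inc."),
    (4, "Globex"),
    (5, "Initech"),
    (6, "Initech LLC"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organization (id INTEGER PRIMARY KEY, name TEXT);
-- One row per organization: which cluster it belongs to, and how confidently.
CREATE TABLE entity_map (
    organization_id INTEGER PRIMARY KEY REFERENCES organization(id),
    cluster_id INTEGER NOT NULL,
    score REAL
);
CREATE INDEX ix_entity_map_cluster ON entity_map (cluster_id);
""")
conn.executemany("INSERT INTO organization VALUES (?, ?)", organizations)

# Flatten the clusters into (organization_id, cluster_id, score) rows.
rows = [
    (record_id, cluster_id, score)
    for cluster_id, (record_ids, scores) in enumerate(clustered)
    for record_id, score in zip(record_ids, scores)
]
conn.executemany("INSERT INTO entity_map VALUES (?, ?, ?)", rows)


def duplicate_groups(conn):
    """Return {cluster_id: [names]} for clusters with more than one member."""
    query = """
        SELECT m.cluster_id, o.name
        FROM entity_map AS m
        JOIN organization AS o ON o.id = m.organization_id
        WHERE m.cluster_id IN (
            SELECT cluster_id FROM entity_map
            GROUP BY cluster_id HAVING COUNT(*) > 1
        )
        ORDER BY m.cluster_id, o.id
    """
    groups = {}
    for cluster_id, name in conn.execute(query):
        groups.setdefault(cluster_id, []).append(name)
    return groups


groups = duplicate_groups(conn)
```

The design choice here is that cluster membership lives in a single mapping table keyed by the organization id, so re-running the deduplication only means rebuilding entity_map and the source table is left untouched. The same layout could presumably be expressed as SQLAlchemy models if that fits the rest of the project better.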