Flask-SQLAlchemy - How to properly bulk insert data into the database when the data has relationships
I have data files with rows of Things. Each Thing has a Gene. This is a one-to-many relationship, because each Gene can be part of many Things, but each Thing has exactly one Gene. Imagine models roughly like the following:
class Gene(db.Model):
    __tablename__ = "gene"
    id = db.Column(db.Integer, primary_key=True)
    name1 = db.Column(db.Integer, index=True, unique=True, nullable=False)  # nullable might not be right
    name2 = db.Column(db.String(120), index=True, unique=True)
    things = db.relationship("Thing", back_populates='gene')

    def __init__(self, name1, name2=None):
        self.name1 = name1
        self.name2 = name2

    @classmethod
    def find_or_create(cls, name1, name2=None):
        record = cls.query.filter_by(name1=name1).first()
        if record is not None:
            if record.name2 is None and name2 is not None:
                record.name2 = name2
        else:
            record = cls(name1, name2)
            db.session.add(record)
        return record
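The find-or-create logic above can be sketched outside of Flask-SQLAlchemy as well. The following is a minimal, hypothetical illustration of the same pattern using the standard-library sqlite3 module (table and column names mirror the model above, but this is not the author's code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene ("
    "id INTEGER PRIMARY KEY, "
    "name1 INTEGER UNIQUE NOT NULL, "
    "name2 TEXT UNIQUE)"
)

def find_or_create(conn, name1, name2=None):
    """Return the id of the gene with this name1, creating it if needed.

    If the row exists but name2 is missing, backfill it - same behavior
    as Gene.find_or_create above.
    """
    row = conn.execute(
        "SELECT id, name2 FROM gene WHERE name1 = ?", (name1,)
    ).fetchone()
    if row is not None:
        gene_id, existing_name2 = row
        if existing_name2 is None and name2 is not None:
            conn.execute("UPDATE gene SET name2 = ? WHERE id = ?", (name2, gene_id))
        return gene_id
    cur = conn.execute("INSERT INTO gene (name1, name2) VALUES (?, ?)", (name1, name2))
    return cur.lastrowid

a = find_or_create(conn, 1)          # creates the row
b = find_or_create(conn, 1, "abc")   # finds it and backfills name2; a == b
```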
class Thing(db.Model):
    __tablename__ = "thing"
    id = db.Column(db.Integer, primary_key=True)
    gene_id = db.Column(db.Integer, db.ForeignKey("gene.id"), nullable=False, index=True)
    gene = db.relationship("Gene", back_populates='things')
    data = db.Column(db.Integer)
I want to bulk-insert many Things, but I worry that with
db.engine.execute(Thing.__table__.insert(), things)
the relationships won't end up in the database. Is there some way to keep the relationships while bulk adding, or to add the rows first and establish the relationships afterwards? Everything I have found seems to assume you want to insert very simple models, and I'm not sure how to proceed when the model is more complex (the example above is a simplified version).
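One common way around this is to resolve the foreign keys yourself before the bulk insert: create or look up all the parents first, build a name-to-id map, and then stamp each child row with its parent's id. A minimal sketch of that idea, using the standard-library sqlite3 module rather than SQLAlchemy (the data values are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene  (id INTEGER PRIMARY KEY, name1 INTEGER UNIQUE NOT NULL);
CREATE TABLE thing (id INTEGER PRIMARY KEY,
                    gene_id INTEGER NOT NULL REFERENCES gene(id),
                    data INTEGER);
""")

# Step 1: parents first - insert every gene (ignoring duplicates),
# then build a name1 -> id map from what is actually in the database.
conn.executemany("INSERT OR IGNORE INTO gene (name1) VALUES (?)", [(10,), (20,)])
gene_ids = {name1: gid for gid, name1 in conn.execute("SELECT id, name1 FROM gene")}

# Step 2: children second - resolve each row's foreign key from the map,
# then bulk insert all Things in one executemany call.
rows = [(10, 111), (10, 222), (20, 333)]       # (gene name1, thing data)
thing_rows = [(gene_ids[name1], data) for name1, data in rows]
conn.executemany("INSERT INTO thing (gene_id, data) VALUES (?, ?)", thing_rows)

count = conn.execute(
    "SELECT COUNT(*) FROM thing WHERE gene_id = ?", (gene_ids[10],)
).fetchone()[0]
# count == 2: both Things for gene 10 kept their relationship
```

The same two-pass idea carries over to SQLAlchemy's `bulk_insert_mappings`: it skips relationship handling, so the dictionaries you pass it must already contain the `gene_id` values.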
-- Update 1 --
This seems to indicate that there is no real solution, and this seems to confirm it.

Actually, I have since reworked the code quite a bit and I think it has improved, so I am updating my answer as well. I define the following two tables, Sets and Data; for each set in Sets there are many rows in Data:
class Sets(sa_dec_base):
    __tablename__ = 'Sets'
    id = sa.Column(sa.Integer, primary_key=True)
    FileName = sa.Column(sa.String(250), nullable=False)
    Channel = sa.Column(sa.Integer, nullable=False)
    Loop = sa.Column(sa.Integer, nullable=False)
    Frequencies = sa.Column(sa.Integer, nullable=False)
    Date = sa.Column(sa.String(250), nullable=False)
    Time = sa.Column(sa.String(250), nullable=False)
    Instrument = sa.Column(sa.String(250), nullable=False)
    Set_Data = sa_orm.relationship('Data')
    Set_RTD_spectra = sa_orm.relationship('RTD_spectra')
    Set_RTD_info = sa_orm.relationship('RTD_info')
    __table_args__ = (sa.UniqueConstraint('FileName', 'Channel', 'Loop'),)
class Data(sa_dec_base):
    __tablename__ = 'Data'
    id = sa.Column(sa.Integer, primary_key=True)
    Frequency = sa.Column(sa.Float, nullable=False)
    Magnitude = sa.Column(sa.Float, nullable=False)
    Phase = sa.Column(sa.Float, nullable=False)
    Set_ID = sa.Column(sa.Integer, sa.ForeignKey('Sets.id'))
    Data_Set = sa_orm.relationship('Sets', foreign_keys=[Set_ID])
Then I wrote this function to bulk insert the data while preserving the relationship:
def insert_set_data(session, set2insert, data2insert, Data):
    """
    Insert a set and its related data, with a unique-constraint check on the set.

    set2insert is the prepared Sets object without an id; the database itself
    assigns a correct, unique id. data2insert is a large pandas DataFrame, so
    bulk_insert_mappings is used.
    """
    session.add(set2insert)
    try:
        session.flush()
    except sa.exc.IntegrityError:  # catches the unique-constraint error if the set is already in the db
        session.rollback()
        print('already inserted', set2insert.FileName, 'loop', set2insert.Loop, 'channel', set2insert.Channel)
    else:  # no error: flush has assigned the id to the set (set2insert.id)
        data2insert['Set_ID'] = set2insert.id  # pass the set's id into data2insert as the foreign key, preserving the relationship
        data2insert = data2insert.to_dict(orient='records')  # convert the df to records for bulk_insert_mappings
        session.bulk_insert_mappings(Data, data2insert)
        session.commit()  # commit only once, so it happens only if both the set and its data were inserted correctly
        print('inserting', set2insert.FileName, 'loop', set2insert.Loop, 'channel', set2insert.Channel)
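The flush-to-get-an-id pattern in insert_set_data can be sketched in a self-contained way with the standard-library sqlite3 module. This is a hypothetical simplification, not the author's code: `lastrowid` plays the role of `session.flush()`, the duplicate set is caught via IntegrityError, and the commit happens only once, after both the set and its data succeed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sets (id INTEGER PRIMARY KEY, filename TEXT, channel INTEGER, loop INTEGER,
                   UNIQUE (filename, channel, loop));
CREATE TABLE data (id INTEGER PRIMARY KEY, frequency REAL,
                   set_id INTEGER REFERENCES sets(id));
""")

def insert_set_data(conn, set_row, frequencies):
    """Insert one set and its data rows; skip both if the set already exists."""
    try:
        cur = conn.execute(
            "INSERT INTO sets (filename, channel, loop) VALUES (?, ?, ?)", set_row
        )
    except sqlite3.IntegrityError:
        conn.rollback()          # duplicate set: discard it and insert none of its data
        return None
    set_id = cur.lastrowid       # id assigned by the database, like session.flush()
    conn.executemany(
        "INSERT INTO data (frequency, set_id) VALUES (?, ?)",
        [(f, set_id) for f in frequencies],  # stamp the foreign key on every row
    )
    conn.commit()                # commit once, only after set and data both succeeded
    return set_id

first = insert_set_data(conn, ("a.dat", 1, 0), [10.0, 20.0])  # inserted
dup = insert_set_data(conn, ("a.dat", 1, 0), [30.0])          # skipped: duplicate set
```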
Anyway, there may well be other, better solutions.

Thanks for your answer. Since this is an old question, your answer would be more useful to others if you explained what it does and why you wrote it that way.