Python: comparing two tables with SQLAlchemy
I'm having a lot of trouble with this. I'm trying to compare two tables in two different databases to see which tuples were added, which were deleted, and which were updated. I'm doing it with the following code:
```python
from sqlalchemy import create_engine, text

# Plan:
# 1. Query both databases for all tuples of the relation and save each result
#    to a list, so its tuples can be iterated over more than once.
# 2. Hash each list into a map: key = primary key, value = tuple.
# 3. Iterate over the "after" relation and look up each key in the "before" map.
#    Found and different -> update; found and equal -> nothing; not found -> insert.
# 4. Iterate over the "before" relation and look up each key in the "after" map.
#    Found -> nothing; not found -> delete.

dev_engine = create_engine('mysql://...')
prod_engine = create_engine('mysql://...')


def transactions(exchange):
    # The table name is built by concatenation, so `exchange` must be trusted.
    query = text("select * from " + exchange + "_instrument")

    with dev_engine.connect() as dev_connect, prod_engine.connect() as prod_connect:
        # A result can only be iterated once, so materialize both as lists.
        instruments_list = list(dev_connect.execute(query))
        print('made instruments_list')
        instruments_after_list = list(prod_connect.execute(query))
        print('made instruments_after_list')

    # Build the maps from the lists, not from the already-consumed results.
    # row._mapping gives column-name access (SQLAlchemy 1.4+).
    before_map = {row._mapping['instrument_id']: row for row in instruments_list}
    after_map = {row._mapping['instrument_id']: row for row in instruments_after_list}
    print('formed maps')

    update_count = insert_count = delete_count = 0
    change_list = []

    for prod_row in instruments_after_list:
        result = list(prod_row)
        row = before_map.get(prod_row._mapping['instrument_id'])
        if row is None:
            insert_count += 1
            result.append("inserted")
            change_list.append(result)
        elif tuple(row) != tuple(prod_row):
            update_count += 1
            # Mark every changed column as "old--->new".
            for i in range(len(row)):
                if row[i] != prod_row[i]:
                    result[i] = str(row[i]) + '--->' + str(prod_row[i])
            result.append("updated")
            change_list.append(result)

    for before_row in instruments_list:
        if before_row._mapping['instrument_id'] not in after_map:
            delete_count += 1
            # Rows are immutable, so copy into a list before tagging it.
            result = list(before_row)
            result.append("deleted")
            change_list.append(result)

    for el in change_list:
        print(el)
    print("Insert: " + str(insert_count))
    print("Update: " + str(update_count))
    print("Delete: " + str(delete_count))


def main():
    transactions("...")


if __name__ == "__main__":
    main()
```
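The comparison described in the comments at the top of the code does not depend on the database, so it can be pulled out into a pure function over already-fetched rows and checked on small inputs. This is a minimal sketch under assumptions: rows are plain tuples, the primary key sits at a hypothetical `key_index` position, and the name `diff_relations` is made up for illustration. It still holds both relations in memory, exactly like the original approach.

```python
from typing import List, Sequence, Tuple

Row = Sequence[object]


def diff_relations(before: List[Row], after: List[Row],
                   key_index: int = 0) -> Tuple[List[list], int, int, int]:
    """Hypothetical helper: classify rows as inserted, updated or deleted.

    Returns (changes, inserts, updates, deletes). `key_index` is the
    position of the primary-key column (an assumption of this sketch).
    """
    before_map = {row[key_index]: row for row in before}
    after_map = {row[key_index]: row for row in after}

    changes: List[list] = []
    inserts = updates = deletes = 0

    # "after" rows: unknown key -> insert; known key with new values -> update.
    for new_row in after:
        old_row = before_map.get(new_row[key_index])
        if old_row is None:
            inserts += 1
            changes.append(list(new_row) + ["inserted"])
        elif tuple(old_row) != tuple(new_row):
            updates += 1
            # Same "old--->new" marking as the code above.
            marked = [f"{old}--->{new}" if old != new else new
                      for old, new in zip(old_row, new_row)]
            changes.append(marked + ["updated"])

    # "before" rows whose key no longer exists in "after" -> delete.
    for old_row in before:
        if old_row[key_index] not in after_map:
            deletes += 1
            changes.append(list(old_row) + ["deleted"])

    return changes, inserts, updates, deletes


before = [(1, "AAPL", 10), (2, "MSFT", 20), (3, "IBM", 30)]
after = [(1, "AAPL", 11), (3, "IBM", 30), (4, "GOOG", 40)]
changes, ins, upd, dels = diff_relations(before, after)
print(changes)  # [[1, 'AAPL', '10--->11', 'updated'], [4, 'GOOG', 40, 'inserted'], [2, 'MSFT', 20, 'deleted']]
```

Separating the logic this way makes it easier to swap the data source later (for example, batches or streams) without touching the classification rules.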
instruments is the "before" table and instruments_after is the "after" table, so I want to see the changes that occur when going from instruments to instruments_after.
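The code above works fine, but it fails when instruments or instruments_after is very large. I have a table with more than 4 million rows, and simply loading it into memory makes Python exit. I tried to get around this by using LIMIT and OFFSET in the query to build instruments_list in chunks, but Python still exits, because two lists of that size take up too much space. My last resort is to select one batch from one relation, then iterate over batches of the second relation and compare them, but that is very error-prone. Is there another way around this? I have considered giving my VM more memory, but I feel the space complexity of the code is the real problem, and that is what should be fixed first.

Comment: This doesn't answer your question, but producing a MySQL diff might be useful.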