Python: Comparing two tables with SQLAlchemy

python mysql sqlalchemy

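I'm having a lot of trouble with this problem. I'm trying to compare two different tables in two different databases to see which tuples were added, which were deleted, and which were updated. I do this with the following code: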

from sqlalchemy import create_engine

# query the databases to get all tuples from the relations
# save each relation to a list in order to be able to iterate over their tuples multiple times
# iterate through the lists, hash each tuple with k, v being primary key, tuple
# iterate through the "after" relation. for each tuple in the new relation, hash its key in the "before" relation. 
# If it's found and the tuple is different, consider that an update, else, do nothing.
# If it is not found, consider that an insert
# iterate through the "before" relation. for each tuple in the "before" relation, hash by the primary key
# if the tuple is found in the "after" relation, do nothing
# if not, consider that a delete.

dev_engine = create_engine('mysql://...')
prod_engine = create_engine('mysql://...')

def transactions(exchange):
    dev_connect = dev_engine.connect()
    prod_connect = prod_engine.connect()

    get_dev_instrument = "select * from " + exchange + "_instrument;"
    instruments = dev_engine.execute(get_dev_instrument)
    instruments_list = [r for r in instruments]
    print('made instruments_list')

    get_prod_instrument = "select * from " + exchange + "_instrument;"
    instruments_after = prod_engine.execute(get_prod_instrument)
    instruments_after_list = [r2 for r2 in instruments_after]
    print('made instruments_after_list')


    before_map = {}
    after_map = {}

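    # Build the lookup maps from the materialized lists: the result
    # objects were already consumed by the list comprehensions above.
    for row in instruments_list:
        before_map[row['instrument_id']] = row
    for y in instruments_after_list:
        after_map[y['instrument_id']] = y
    print('formed maps')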
    update_count = insert_count = delete_count = 0

    change_list = []
    for prod_row in instruments_after_list:
        result = list(prod_row)
        try:
            row = before_map[prod_row['instrument_id']]
            if not row == prod_row:
                update_count += 1
                for i in range(len(row)):
                    if not row[i] == prod_row[i]:
                        result[i] = str(row[i]) + '--->' + str(prod_row[i])
                result.append("updated")
                change_list.append(result)
        except KeyError:
            insert_count += 1
            result.append("inserted")
            change_list.append(result)

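    for before_row in instruments_list:
        # A key missing from the "after" map means the row was deleted.
        if before_row['instrument_id'] not in after_map:
            delete_count += 1
            # Copy into a list: result rows themselves cannot be appended to.
            result = list(before_row)
            result.append("deleted")
            change_list.append(result)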

    for el in change_list:
        print(el)

    print("Insert: " + str(insert_count))
    print("Update: " + str(update_count))
    print("Delete: " + str(delete_count))

    dev_connect.close()
    prod_connect.close()

def main():

    transactions("...")

main()

instruments is the "before" table and instruments_after is the "after" table, so I want to see the changes that were made in going from instruments to instruments_after.

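The code above works, but it fails when instruments or instruments_after is very large. I have a table with over 4 million rows, and simply loading it into memory causes Python to exit. I tried to get around this by using LIMIT and OFFSET in the queries and appending to instruments_list and instruments_after_list piece by piece, but Python still exits because two lists of that size take up too much space. My last option is to select a batch from one relation, then iterate over batches of the second relation and compare them, but that is very error-prone. Is there another way around this? I have considered allocating more memory to my VM, but I feel the space complexity of the code is the real problem and should be addressed first.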
This doesn't answer your question, but a MySQL diff may be useful.
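
One way to keep memory roughly constant is to avoid building the two lists at all: if both queries return rows ordered by the primary key, the two result sets can be walked in lockstep, much like a merge join. The following is only a sketch of that idea, not a drop-in replacement for the code in the question. The function name diff_sorted and the sample tuples are made up for illustration. It assumes each side is an iterable of rows sorted by key (for example, the result of "select ... order by instrument_id" streamed from the server, which SQLAlchemy can do with the stream_results execution option) and that Python compares the keys in the same order MySQL sorted them, which holds for integer keys but not necessarily for strings under every collation.

```python
def diff_sorted(before_rows, after_rows, key):
    """Yield (change, before_row, after_row) for two iterables sorted by key.

    Only one row from each side is held in memory at a time.
    """
    missing = object()  # sentinel marking an exhausted side
    before_iter, after_iter = iter(before_rows), iter(after_rows)
    b = next(before_iter, missing)
    a = next(after_iter, missing)

    while b is not missing or a is not missing:
        if a is missing or (b is not missing and key(b) < key(a)):
            # Key exists only on the "before" side.
            yield ("deleted", b, None)
            b = next(before_iter, missing)
        elif b is missing or key(a) < key(b):
            # Key exists only on the "after" side.
            yield ("inserted", None, a)
            a = next(after_iter, missing)
        else:
            # Same key on both sides: report it only if the rows differ.
            if b != a:
                yield ("updated", b, a)
            b = next(before_iter, missing)
            a = next(after_iter, missing)


# Hypothetical sample data: (instrument_id, symbol), sorted by instrument_id.
before = [(1, "AAA"), (2, "BBB"), (4, "DDD")]
after = [(1, "AAA"), (2, "BBX"), (3, "CCC")]

changes = list(diff_sorted(before, after, key=lambda row: row[0]))
for change in changes:
    print(change)
```

Because the function is a generator, the counts from the question can be accumulated while iterating instead of collecting everything into change_list first. Note that if the driver buffers the whole result set on the client, the memory problem remains even with this approach, so the rows do need to be streamed.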