
Python: Comparing two tables with SQLAlchemy

Tags: python, mysql, sqlalchemy

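I'm having a lot of trouble with this problem. I'm trying to compare two different tables in two different databases to see which tuples were added, which were deleted, and which were updated. I do this with the following code: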

from sqlalchemy import create_engine

# query the databases to get all tuples from the relations
# save each relation to a list in order to be able to iterate over their tuples multiple times
# iterate through the lists, hash each tuple with k, v being primary key, tuple
# iterate through the "after" relation. for each tuple in the new relation, hash its key in the "before" relation. 
# If it's found and the tuple is different, consider that an update, else, do nothing.
# If it is not found, consider that an insert
# iterate through the "before" relation. for each tuple in the "before" relation, hash by the primary key
# if the tuple is found in the "after" relation, do nothing
# if not, consider that a delete.

dev_engine = create_engine('mysql://...')
prod_engine = create_engine('mysql://...')

def transactions(exchange):
    dev_connect = dev_engine.connect()
    prod_connect = prod_engine.connect()

    get_dev_instrument = "select * from " + exchange + "_instrument;"
    instruments = dev_engine.execute(get_dev_instrument)
    instruments_list = [r for r in instruments]
    print 'made instruments_list'

    get_prod_instrument = "select * from " + exchange + "_instrument;"
    instruments_after = prod_engine.execute(get_prod_instrument)
    instruments_after_list = [r2 for r2 in instruments_after]
    print 'made instruments after_list'


    before_map = {}
    after_map = {}

    # build the maps from the saved lists; the result objects were already consumed above
    for row in instruments_list:
        before_map[row['instrument_id']] = row
    for y in instruments_after_list:
        after_map[y['instrument_id']] = y
    print 'formed maps'
    update_count = insert_count = delete_count = 0

    change_list = []
    for prod_row in instruments_after_list:
        result = list(prod_row)
        try:
            row = before_map[prod_row['instrument_id']]
            if not row == prod_row:
                update_count += 1
                for i in range(len(row)):
                    if not row[i] == prod_row[i]:
                        result[i] = str(row[i]) + '--->' + str(prod_row[i])
                result.append("updated")
                change_list.append(result)
        except KeyError:
            insert_count += 1
            result.append("inserted")
            change_list.append(result)

    for before_row in instruments_list:

        result = list(before_row)  # copy to a list so "deleted" can be appended
        try:
            after_row = after_map[before_row['instrument_id']]
        except KeyError:
            delete_count += 1
            result.append("deleted")
            change_list.append(result)

    for el in change_list:
        print el

    print "Insert: " + str(insert_count)
    print "Update: " + str(update_count)
    print "Delete: " + str(delete_count)

    dev_connect.close()
    prod_connect.close()

def main():

    transactions("...")

main()

instruments is the "before" table and instruments_after is the "after" table, so I want to see the changes that were made to turn instruments into instruments_after.


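The above code works fine, but it fails when instruments and instruments_after are very large. I have a table with over 4 million rows, and simply loading it into memory causes Python to exit. I have tried to get around this by using LIMIT, OFFSET in my queries and appending to the instruments_list lists in pieces, but Python still exits because two lists of that size take up too much space. My last option is to select a batch from one relation, then iterate over batches of the second relation and compare them, but that is extremely error-prone. Is there another way around this? I have considered allocating more memory to my VM, but I feel the space complexity of the code is the real problem, and that is what should be fixed first.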
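
One way to avoid holding either table in memory, sketched here under assumptions rather than offered as a definitive answer: if both queries return rows ordered by the primary key (for example by appending ORDER BY instrument_id), the two result sets can be walked in lockstep, like the merge step of a merge sort, so only one row from each side is held at a time. The function name diff_sorted, its key argument, and the sample rows below are illustrative and not from the original post. The sketch also assumes both databases sort the key the same way.

```python
def diff_sorted(before_rows, after_rows, key=lambda row: row[0]):
    """Yield (kind, before_row, after_row) for two row streams sorted by key.

    Both inputs must be sorted ascending by the same unique key.
    Only the current row of each stream is kept in memory.
    """
    end = object()  # sentinel marking an exhausted stream
    before_iter, after_iter = iter(before_rows), iter(after_rows)
    b = next(before_iter, end)
    a = next(after_iter, end)
    while b is not end or a is not end:
        if a is end or (b is not end and key(b) < key(a)):
            # key exists only in "before": the row was deleted
            yield ("deleted", b, None)
            b = next(before_iter, end)
        elif b is end or key(a) < key(b):
            # key exists only in "after": the row was inserted
            yield ("inserted", None, a)
            a = next(after_iter, end)
        else:
            # same key on both sides: report only if the contents differ
            if tuple(b) != tuple(a):
                yield ("updated", b, a)
            b = next(before_iter, end)
            a = next(after_iter, end)


# Hypothetical sample data standing in for the two ordered query results.
before = [(1, "AAPL", 10), (2, "MSFT", 20), (4, "IBM", 40)]
after = [(1, "AAPL", 10), (2, "MSFT", 25), (3, "GOOG", 30)]

changes = list(diff_sorted(before, after))
for change in changes:
    print(change)
```

With real queries, the two lists would be replaced by the result iterators of the ordered SELECT statements. Whether rows are actually streamed instead of buffered on the client depends on the database driver; SQLAlchemy's stream_results execution option is one setting to look at for that.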

This doesn't answer your question, but producing a MySQL diff may be useful.