
Python: Speeding up a SQLite3 update function using a CSV file


Because of SQLite 3.7.13's limitation on updating columns through a join, I created the following Python script to work around the problem. However, due to the volume of data I'm handling, I'm running into system resource problems and the update method takes far too long.

I have a SQLite3 table containing 7,159,587 records with the following schema:

dist_trpwrenc (id integer primary key autoincrement, IP TEXT, VNE_INTERNAL TEXT, VNE_ENTERPRISE TEXT, VNE_EXTERNAL TEXT)
I have a CSV file containing 9,224,812 records, including duplicates. Here is a sample of the data in the CSV file:

"IP","VNE"
"192.168.1.1","internal"
"192.168.1.1","enterprise"
"192.168.1.1","external"
"192.168.2.1","internal"
"192.168.2.1","external"
The Python script takes the CSV file and updates the dist_trpwrenc table as in the following example:

--------------------------------------------------------------
|      IP     | VNE_INTERNAL | VNE_ENTERPRISE | VNE_EXTERNAL |
| 192.168.1.1 |      x       |       x        |      x       |
| 192.168.2.1 |      x       |                |      x       |
--------------------------------------------------------------
I'm looking for a faster way to process the updates. Is that possible with SQLite3/Python?

#!/usr/bin/python

import sys, csv, sqlite3, time

s = time.strftime('%Y%m%d%H%M%S')

# Create exception file from standard output
class Logger(object):
    def __init__(self):
        self.terminal = sys.stdout
        self.log = open((s)+"_log", "a")

    def write(self, message):
        self.terminal.write(message)
        self.log.write(message)

def dist_trpwrenc_update():
    sys.stdout = Logger()

    con = sqlite3.connect(sys.argv[1]) # input database name (e.g. database.db) and creates in current working directory.
    cur = con.cursor()

    try:
        with open(sys.argv[2], "rb") as f: # input CSV file 
            reader = csv.reader(f, delimiter=',')
            reader.next()  # skip the "IP","VNE" header row
            for row in reader:
                try:
                    ip, vne = row[0], row[1]  # CSV columns: "IP","VNE"
                    ipupdate = (ip,)

                    if vne == 'internal':
                        cur.execute("UPDATE dist_trpwrenc SET VNE_INTERNAL='x' WHERE IP=?;", ipupdate)
                        con.commit()
                        print ip, 'updated:', vne, 'successfully!'
                    elif vne == 'enterprise':
                        cur.execute("UPDATE dist_trpwrenc SET VNE_ENTERPRISE='x' WHERE IP=?;", ipupdate)
                        con.commit()
                        print ip, 'updated:', vne, 'successfully!'
                    elif vne == 'external':
                        cur.execute("UPDATE dist_trpwrenc SET VNE_EXTERNAL='x' WHERE IP=?;", ipupdate)
                        con.commit()
                        print ip, 'updated:', vne, 'successfully!'
                    else:
                        print ip, 'not updated: unrecognized VNE value', vne
                except (KeyboardInterrupt, SystemExit):
                    raise
    except IOError:
        raise

    # Close SQLite database connection  
    con.close()

    # Stop logging
    sys.stdout = sys.__stdout__

def main():
    dist_trpwrenc_update()

if __name__=='__main__':
    main()
Thanks everyone for the tips. I ended up taking a different approach that uses only an SQL CASE statement:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys, sqlite3, time

# Functions
s = time.strftime('%Y%m%d%H%M%S')

# Create file from standard output for database import
class Logger(object):
    def __init__(self):
        self.terminal = sys.stdout
        self.log = open((s) + "_" + sys.argv[1], "a")        

    def write(self, message):
        self.terminal.write(message)
        self.log.write(message)

# Function to create CSV from a SQL query.
def sqlExport():
    sys.stdout = Logger() # Start screen capture to log file

    con = sqlite3.connect(sys.argv[1]) # input database name (e.g. database.db) and creates in current working directory.
    cur = con.cursor()

    try:
        cur.execute("SELECT network, SUM(CASE WHEN VNE = 'V1' THEN 1 END) AS VNECH1, SUM(CASE WHEN VNE = 'V2' THEN 1 END) AS VNECH2, SUM(CASE WHEN VNE = 'V3' THEN 1 END) AS VNECH3 FROM data_table GROUP BY network ORDER BY network;")
        data = cur.fetchall()

        for row in data:
            print '"'+row[0]+'","'+str(row[1])+'","'+str(row[2])+'","'+str(row[3])+'"'

    except (KeyboardInterrupt, SystemExit):
        raise

    con.close()
    sys.stdout = sys.__stdout__ # stops capturing data from database export.

# Primary function to execute
def main():
    sqlExport()

if __name__=='__main__':
    main()
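As a side note, the export script above builds each CSV line by hand, which prints `None` for empty sums and would break if a network name ever contained a quote. A minimal sketch of the same export using `csv.writer` instead; the in-memory database and sample rows here are stand-ins for the real `sys.argv[1]` file:

```python
import csv
import io
import sqlite3

# In-memory stand-in for the real database, with hypothetical sample rows.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE data_table (network TEXT, VNE TEXT)')
con.executemany('INSERT INTO data_table VALUES (?, ?)',
                [("192.168.1.1", "V1"), ("192.168.1.1", "V2")])

buf = io.StringIO()  # the real script would write to a file instead
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)

cur = con.execute(
    "SELECT network, "
    "SUM(CASE WHEN VNE = 'V1' THEN 1 END), "
    "SUM(CASE WHEN VNE = 'V2' THEN 1 END), "
    "SUM(CASE WHEN VNE = 'V3' THEN 1 END) "
    "FROM data_table GROUP BY network ORDER BY network")
for row in cur:
    writer.writerow(row)  # quoting and None -> empty field handled for us
```

`csv.writer` converts each value with `str()` and writes `None` as an empty field, so no manual quote-concatenation is needed.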
  • Make sure there is an index on the IP field
  • First add all the rows from the CSV to a set to eliminate the duplicates, which should save millions of operations
  • Drop the logging. You can assume an update completed, because otherwise an exception would have been raised, so the log tells you nothing
  • Try committing less often, although doing all the updates in a single transaction may have problems of its own

  • If all that is not enough, and if it fits your application (e.g. the CSV contains every row of the dist_trpwrenc table), dropping the existing records or table and repopulating it from the CSV with INSERT queries may be faster than millions of updates.
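The tips above could be sketched roughly as follows. This is an illustration rather than the asker's actual script: it uses an in-memory database with a couple of hypothetical rows in place of the real 7-million-record table, dedupes the CSV rows with a set, and performs all updates inside a single transaction:

```python
import sqlite3

# In-memory stand-in for the real database file (sys.argv[1]).
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute('CREATE TABLE dist_trpwrenc ('
            'id INTEGER PRIMARY KEY AUTOINCREMENT, IP TEXT, '
            'VNE_INTERNAL TEXT, VNE_ENTERPRISE TEXT, VNE_EXTERNAL TEXT)')
cur.execute('CREATE INDEX IF NOT EXISTS idx_dist_trpwrenc_ip '
            'ON dist_trpwrenc (IP)')  # index the lookup column
cur.executemany('INSERT INTO dist_trpwrenc (IP) VALUES (?)',
                [("192.168.1.1",), ("192.168.2.1",)])

# Whitelist mapping CSV VNE values to column names (column names cannot
# be bound as ? parameters, so they are substituted from this fixed dict).
columns = {'internal': 'VNE_INTERNAL',
           'enterprise': 'VNE_ENTERPRISE',
           'external': 'VNE_EXTERNAL'}

# Hypothetical rows; in the real script these would come from csv.reader.
rows = [("192.168.1.1", "internal"), ("192.168.1.1", "internal"),
        ("192.168.2.1", "external")]
unique = set(rows)  # drop duplicate CSV rows before touching the database

with con:  # one transaction, committed once on exit
    for ip, vne in unique:
        col = columns.get(vne)
        if col:
            cur.execute("UPDATE dist_trpwrenc SET %s = 'x' WHERE IP = ?" % col,
                        (ip,))

result = cur.execute('SELECT IP, VNE_INTERNAL, VNE_ENTERPRISE, VNE_EXTERNAL '
                     'FROM dist_trpwrenc ORDER BY IP').fetchall()
```

Deduplicating first and committing once avoids the per-row fsync that makes the original script so slow.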

    It looks like your VNE_* columns are only used as flags. Have you considered a schema that defines them as integer columns storing 0 or 1? Just use BEGIN / COMMIT to run all the updates inside a single transaction. Thanks for the suggestions. I think I may need to port my table to a MySQL database and write a JOIN/UPDATE query; my system simply doesn't have enough resources.