Threading in a Python script


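I have to read some files and put some of the information from them into a MySQL database. I have 82 files, and I want to read several of them at the same time. For this, I have the following function definitions: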

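# Imports are not shown in the original question; these are assumptions.
import vcf                    # PyVCF, provides vcf.Reader
import MySQLdb as MS          # assumed: MS.connect(host=..., passwd=..., db=...) matches MySQLdb
from threading import Thread

def sql_processes(db1, infile_name, cursor, z):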
    print infile_name
    PrintLog("Adding " + infile_name + " to MySQL...")
    vcf_reader = vcf.Reader(open(infile_name, 'r'))
    for record in vcf_reader:
        snp_position='_'.join([record.CHROM, str(record.POS)])
        ref_F = float(record.INFO['DP4'][0])
        ref_R = float(record.INFO['DP4'][1])
        alt_F = float(record.INFO['DP4'][2])
        alt_R = float(record.INFO['DP4'][3])
        AF = (alt_F+alt_R)/(alt_F+alt_R+ref_F+ref_R)
        sql_test_query = "SELECT * from snps where snp_pos='" + snp_position + "'"
        try:
            sql_insert_table = "INSERT INTO snps (snp_pos, " + str(z) + "g) VALUES ('" + snp_position + "', " + str(AF) + ")"
            cursor.execute(sql_insert_table)
        except db1.IntegrityError, e:
            sql_insert_table = "UPDATE snps SET " + str(z) + "g=" + str(AF) + " WHERE snp_pos='" + snp_position + "'";
            cursor.execute(sql_insert_table)
        db1.commit()
    PrintLog("Added " + infile_name + " to MySQL!")

def extractAF(files_vcf):
    z=6
    snp_dict=[]
    #First connection
    #db1 = MS.connect(host="localhost",user="root",passwd="sequentia2")
    #cursor = db1.cursor()
    #sql_create_db = "CREATE DATABASE SUPER_SNP_calling"
    #cursor.execute(sql_create_db)
    #db1.commit()
    #db1.close()
    #Second connection once we have created the db
    db1 = MS.connect(host="localhost",user="root",passwd="sequentia2",db="SUPER_SNP_calling")
    cursor = db1.cursor()
    #sql_create_table = "CREATE TABLE snps (snp_pos VARCHAR(40) PRIMARY KEY"
    #for num in range(0, len(files_vcf)):
    #    sql_create_table = sql_create_table + ", " + str(num) + "g FLOAT(4,3)"
    #sql_create_table = sql_create_table + ")"
    #cursor.execute(sql_create_table)
    #db1.commit()
    threads = []
    for infile_name in sorted(files_vcf):
        vcf_reader = vcf.Reader(open(infile_name, 'r'))
        t = Thread(target = sql_processes, args = (db1, infile_name, cursor, z)).start()
        threads.append(t)
        z+=1
    count_t = 1
    my_threads = []
    for t in threads:
        t.start()
        my_threads.append(t)
        if count_t == 8:
            for x in my_threads:
                x.join()
            my_threads = []
            count_t = 0
        count_t+=1
    db1.close()
    return snp_dict #this is empty, I should solve this.
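
As an aside, `sql_processes` builds its SQL by concatenating strings and falls back from INSERT to UPDATE when it catches `IntegrityError`. A minimal sketch of an alternative, assuming MySQL's `INSERT ... ON DUPLICATE KEY UPDATE` (which relies on `snp_pos` being the primary key, as in the commented-out `CREATE TABLE`) and MySQLdb's `%s` parameter style; `build_upsert` is a hypothetical helper, not part of the question's code:

```python
def build_upsert(z, snp_position, af):
    """Build (sql, params) for one row of the snps table (hypothetical helper).

    The values are passed as query parameters; the column name "<z>g" cannot
    be a parameter, so it is formatted in only after checking that z is an int.
    """
    if not isinstance(z, int):
        raise TypeError("z must be an int, got %r" % (z,))
    column = "`%dg`" % z  # backtick-quoted MySQL identifier, e.g. `6g`
    sql = ("INSERT INTO snps (snp_pos, {col}) VALUES (%s, %s) "
           "ON DUPLICATE KEY UPDATE {col} = VALUES({col})").format(col=column)
    return sql, (snp_position, af)

# Usage inside the record loop of sql_processes:
sql, params = build_upsert(6, "chr1_12345", 0.25)
# cursor.execute(sql, params)  # the driver substitutes and escapes the %s values
```

This sends one statement per record instead of two on conflicts. `VALUES(col)` inside `ON DUPLICATE KEY UPDATE` is deprecated from MySQL 8.0.20 onward but still accepted; newer servers also offer a row alias for the same purpose.
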

However, I think the problem is here:

count_t = 1
my_threads = []
for t in threads:
    t.start()
    my_threads.append(t)
    if count_t == 8:
        for x in my_threads:
            x.join()
        my_threads = []
        count_t = 0
    count_t+=1
db1.close()

I want to read 8 files at the same time, then wait for those 8 to finish before starting the next 8. But this raises the following error:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "./SUPER_mysql4.py", line 460, in sql_processes
    db1.commit()
InterfaceError: (0, '')
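
One detail in `extractAF` matters for the batching: `Thread(...).start()` returns `None`, so the line `t = Thread(...).start()` starts every thread immediately (all 82 at once, all sharing `db1` and `cursor`) and appends `None` to `threads`; the later `t.start()` would then fail with `AttributeError`. A minimal sketch of the intended "start 8, wait for all 8, start the next 8" pattern, with a hypothetical `run_in_batches` helper and a stand-in worker in place of `sql_processes`:

```python
import threading

def run_in_batches(target, args_list, batch_size=8):
    """Run target(*args) for every tuple in args_list, batch_size threads at a time.

    Threads are created first and started later, because Thread.start()
    returns None and starts the thread at once.
    """
    threads = [threading.Thread(target=target, args=args) for args in args_list]
    for i in range(0, len(threads), batch_size):
        batch = threads[i:i + batch_size]
        for t in batch:
            t.start()
        for t in batch:  # wait for the whole batch before starting the next one
            t.join()

# Usage with a stand-in worker (sql_processes would take its place):
results = []
results_lock = threading.Lock()

def fake_worker(infile_name, z):
    with results_lock:
        results.append((infile_name, z))

files = ["sample%02d.vcf" % i for i in range(82)]  # hypothetical file names
run_in_batches(fake_worker, [(f, z) for z, f in enumerate(sorted(files), start=6)])
```

Batching alone does not fix the `InterfaceError`: the threads in each batch still share one connection.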

How can I fix this?
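
The traceback comes from `db1.commit()` inside a worker thread, and every worker uses the same `db1` connection and `cursor`. MySQLdb declares `threadsafety = 1` in the DB-API sense (threads may share the module, but not connections), so a common fix is to give each worker its own connection. A minimal sketch, assuming a `connect` callable that returns a new DB-API connection and a `work(cursor, infile_name, z)` callable holding the per-record loop; `process_file` and `load_all` are hypothetical names, and `multiprocessing.pool.ThreadPool` is swapped in for the manual batching:

```python
from multiprocessing.pool import ThreadPool

def process_file(connect, work, infile_name, z):
    """Hypothetical worker: opens, uses and closes its own connection, so no
    connection or cursor is shared between threads."""
    db = connect()
    try:
        cursor = db.cursor()
        work(cursor, infile_name, z)  # e.g. the record loop from sql_processes
        db.commit()                   # one commit per file instead of per record
    finally:
        db.close()
    return infile_name

def load_all(connect, work, files, workers=8):
    """Process every file with at most `workers` threads running at once.

    Unlike fixed batches of 8, a pool thread picks up the next file as soon
    as it finishes one, so one slow file does not hold back the other seven.
    """
    jobs = [(name, z) for z, name in enumerate(sorted(files), start=6)]
    pool = ThreadPool(workers)
    try:
        return pool.map(lambda job: process_file(connect, work, job[0], job[1]), jobs)
    finally:
        pool.close()
        pool.join()
```

With the question's settings, `connect` could be `functools.partial(MS.connect, host="localhost", user="root", passwd="...", db="SUPER_SNP_calling")`, and `work` would contain the loop from `sql_processes` without its own `db1.commit()`.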

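Why do you want to read in parallel? Are the files on different disks? Unless you use multiprocessing you won't see a speedup; all Python threads effectively run as one (see the GIL).

@JasonMorgan If the work is IO-heavy you should see an improvement with threads, because CPython releases the GIL around OS-level IO calls.

Is the problem that you are trying to use the same cursor in every thread? I think each thread should use its own cursor (and its own transaction). Also, this may or may not matter, but your DB password is included in the question text.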
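
The GIL point in these comments can be checked without a database. A minimal sketch in which each hypothetical task blocks in `time.sleep`, which, like blocking file and socket I/O in CPython, releases the GIL, so 8 threads finish in roughly the time of one task rather than eight; CPU-bound Python code would not overlap this way:

```python
import threading
import time

def blocking_task(seconds):
    # stands in for I/O-bound work (file reads, DB round trips); sleep releases the GIL
    time.sleep(seconds)

def elapsed_with_threads(n_threads, seconds):
    """Run n_threads blocking tasks concurrently and return the wall-clock time."""
    threads = [threading.Thread(target=blocking_task, args=(seconds,))
               for _ in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

# 8 tasks of 0.2 s each would take about 1.6 s sequentially
print("%.2f s" % elapsed_with_threads(8, 0.2))
```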