Python: reading real-time data that is inserted in chunks

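I have a Python script that Task Scheduler runs at the start of every day. It reads a continuously growing log file and text files and inserts the data into a PostgreSQL database. A new log file is generated every day, and each log is roughly 1 GB.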

Platform: Windows 7 (I usually don't like it, but this time I had to use it). Memory: 32 GB. I searched for how to tune PostgreSQL for heavy I/O, and this is what I changed:

shared_buffers = 8GB
work_mem = 100MB
maintenance_work_mem = 512MB
checkpoint_segments = 100
checkpoint_timeout = 1h
synchronous_commit = off
full_page_writes = off
fsync = off
The script I wrote to read the log file line by line and insert it into the database:

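import logging
import time

import psycopg2 as psycopg

try:
    connectStr = "dbname='postgis20' user='postgres' password='' host='localhost'"
    cx = psycopg.connect(connectStr)
    cu = cx.cursor()
    logging.info("connected to DB")
except psycopg.Error:
    logging.error("could not connect to the database")
    raise  # nothing below can work without a connection

# Follow the growing log file like `tail -f`: an empty readline() means EOF,
# so wait a second and retry from the same position.
with open('textfile.log', 'r') as log_file:
    while True:
        where = log_file.tell()
        line = log_file.readline()
        if not line:
            time.sleep(1)
            log_file.seek(where)
        else:
            print(line, end='')  # line already ends with a newline
            dodecode(line)       # the asker's own parse/insert routine (not shown)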

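The problem is that after running the script for 6 hours, it had inserted only 5 minutes' worth of the file's data! I suspect the data is written to the log file in chunks rather than line by line, but I really don't know how to fix this so that the data in the database is closer to real time.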
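If the writer flushes in chunks, `readline()` can return the first half of a line at EOF, and the loop above would pass that fragment to `dodecode`. A minimal Python 3 sketch of a follower that buffers such fragments until the newline arrives; `follow_lines`, `poll_interval` and `idle_limit` are hypothetical names, not part of the original script:

```python
import time


def follow_lines(f, poll_interval=1.0, idle_limit=None):
    """Yield complete lines from a file that another process keeps appending to.

    A chunk that ends mid-line is kept in `pending` and only yielded once its
    terminating newline has been read. `idle_limit` (hypothetical) stops after
    that many consecutive empty reads; None follows forever, like `tail -f`.
    """
    pending = ""
    idle = 0
    while True:
        chunk = f.readline()
        if chunk:
            idle = 0
            pending += chunk
            if pending.endswith("\n"):
                yield pending
                pending = ""
            # else: incomplete line, keep reading until the rest shows up
        else:
            idle += 1
            if idle_limit is not None and idle >= idle_limit:
                return
            time.sleep(poll_interval)
```

Usage would be `for line in follow_lines(log_file): dodecode(line)`. This only guarantees whole lines; it does not by itself make the inserts faster.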

Have you considered using psycopg2's `executemany`? A simple example from an answer:

# conn is an open psycopg2 connection
namedict = ({"first_name": "Joshua", "last_name": "Drake"},
            {"first_name": "Steven", "last_name": "Foo"},
            {"first_name": "David", "last_name": "Bar"})

cur = conn.cursor()
cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)
conn.commit()  # without a commit the rows are discarded when the connection closes
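Building on that example, a sketch that groups the rows into batches and commits once per batch instead of once per row. `batched`, `insert_in_batches` and the batch size of 1000 are illustrative assumptions. Note that psycopg2's `executemany()` still sends one statement per row; `psycopg2.extras.execute_values` (psycopg2 2.7+) is the helper that folds many rows into a single statement.

```python
from itertools import islice


def batched(iterable, size):
    """Group an iterable into lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def insert_in_batches(conn, rows, size=1000):
    """One executemany() and one commit per batch of rows.

    Table `bar` and its columns come from the example above.
    """
    with conn.cursor() as cur:
        for batch in batched(rows, size):
            cur.executemany(
                "INSERT INTO bar (first_name, last_name) "
                "VALUES (%(first_name)s, %(last_name)s)",
                batch)
            conn.commit()
```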

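Do you realise that fsync=off basically means "it's fine if you eat my data, don't worry about crash safety"? With fsync=off, also setting synchronous_commit=off is completely pointless. In any case, use psycopg2 and copy_from. @Shad: search Google for "psycopg2 copy". In other words, bulk-load the input with PostgreSQL's COPY API.

@Shad Also show your table and index definitions. If you have many or large indexes, that may be a big part of the speed problem. The updates you are doing will also be a killer, and it is even worse if you commit each row individually. By the way, don't use a sleep loop; use a proper wait-for-file-change facility such as select or poll.

Are you committing? cx.commit()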
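A sketch of the COPY approach suggested in the comments: rows are rendered into an in-memory file in COPY's text format (tab-separated, `\N` for NULL, backslash escapes) and loaded with psycopg2's `cursor.copy_from()`. The helper names are hypothetical. COPY has no `WHERE NOT EXISTS`, so deduplication would have to happen afterwards, for example by copying into a staging table and running one `INSERT ... SELECT` from it.

```python
import io


def _copy_escape(value):
    """Render one value in PostgreSQL COPY text format."""
    if value is None:
        return "\\N"  # COPY's default NULL marker
    s = str(value)
    # Escape backslashes first, then the characters COPY treats specially.
    return (s.replace("\\", "\\\\")
             .replace("\t", "\\t")
             .replace("\n", "\\n")
             .replace("\r", "\\r"))


def rows_to_copy_buffer(rows):
    """Build an in-memory, tab-separated file object for cursor.copy_from()."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(_copy_escape(v) for v in row) + "\n")
    buf.seek(0)
    return buf


def copy_rows(conn, table, columns, rows):
    """Bulk-load rows with a single COPY and a single commit."""
    with conn.cursor() as cur:
        cur.copy_from(rows_to_copy_buffer(rows), table, sep="\t", columns=columns)
    conn.commit()
```

For the question's `table1` this might be `copy_rows(cx, "table1", ("messageid", "timestamp", "userid", "position", "text"), rows)`, with `position` given as WKT such as `POINT(lon lat)`, which PostGIS's geometry input accepts.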
def Insert(msgnum, time, msg):
    # cu is the module-level cursor; reading a global needs no `global` statement.
    # Values are passed as query parameters (%s) so psycopg2 quotes them,
    # instead of being concatenated into the SQL string.
    # `text` is not defined in this snippet; it comes from code not shown here.
    try:
        if msgnum in [1, 2, 3]:
            if msg['type'] in (0, 1):  # types 0 and 1 ran identical statements
                point = "POINT({} {})".format(float(msg['longitude']), float(msg['latitude']))
                cu.execute(
                    "INSERT INTO table1 (messageid, timestamp, userid, position, text) "
                    "SELECT %s, %s, %s, ST_GeomFromText(%s), %s "
                    "WHERE NOT EXISTS (SELECT * FROM table1 WHERE timestamp = %s AND text = %s)",
                    (msgnum, time, msg['UserID'], point, text, time, text))
                cu.execute(
                    "INSERT INTO table2 (field1, field2, field3, time_stamp, pos) "
                    "SELECT %s, %s, %s, %s, ST_GeomFromText(%s) "
                    "WHERE NOT EXISTS (SELECT * FROM table2 WHERE field1 = %s)",
                    (msg['UserID'], int(msg['UserName']), int(msg['UserIO']), time, point, msg['UserID']))
                cu.execute(
                    "UPDATE table2 SET field3 = %s, time_stamp = %s, pos = ST_GeomFromText(%s) "
                    "WHERE field1 = %s AND time_stamp < %s",
                    (int(msg['UserIO']), time, point, msg['UserID'], time))
            elif msg['type'] == 2:
                ....
                ....
                ....
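The `INSERT ... WHERE NOT EXISTS` plus `UPDATE` pair on `table2` costs two statements per message. On PostgreSQL 9.5 or later, and assuming a unique constraint on `table2.field1` (not shown in the question), one upsert could replace both. The `checkpoint_segments` setting above was removed in 9.5, which suggests an older server where this is not available. A hedged sketch; `UPSERT_TABLE2`, `table2_params` and `upsert_table2` are hypothetical names:

```python
# Assumes PostgreSQL 9.5+ and a UNIQUE constraint on table2.field1 (hypothetical).
UPSERT_TABLE2 = """
    INSERT INTO table2 (field1, field2, field3, time_stamp, pos)
    VALUES (%s, %s, %s, %s, ST_GeomFromText(%s))
    ON CONFLICT (field1) DO UPDATE
        SET field3 = EXCLUDED.field3,
            time_stamp = EXCLUDED.time_stamp,
            pos = EXCLUDED.pos
        WHERE table2.time_stamp < EXCLUDED.time_stamp
"""


def table2_params(time, msg):
    """Build the parameter tuple for UPSERT_TABLE2 from one decoded message."""
    point = "POINT({} {})".format(float(msg['longitude']), float(msg['latitude']))
    return (msg['UserID'], int(msg['UserName']), int(msg['UserIO']), time, point)


def upsert_table2(cur, time, msg):
    """Insert a new user row, or update it only if this message is newer."""
    cur.execute(UPSERT_TABLE2, table2_params(time, msg))
```

As in the original `UPDATE`, only `field3`, `time_stamp` and `pos` change on an existing row.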