Python: best way to check whether a file has already been processed

Edit: my original title was misleading.

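I have a SQL Server with a database, and a directory containing about 10,000 Excel files. The files contain values that I need to copy into the database, and new Excel files are added every day. In addition, each file contains a boolean field "finished" that indicates whether the file is ready to be copied to the database. However, the file name is not related to the file's content. Only the content holds the primary key and field names, which correspond to the key and field names in the DB.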

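Checking whether a file's content is already in the database by comparing primary keys over and over is not feasible, because opening the files is far too slow. I could, however, do one initial pass that checks which files are already in the database and write the result to a file (say copied.txt), so that it holds just the names of all files already copied. The actual service could then load that file into a dictionary (dict1) with the file names as keys and no values (I assume a hash table is fastest for the comparison), store the names of all Excel files currently in the directory in a second dictionary (dict2), and compare the two to build a list of every file that is in dict2 but not in dict1. I would then iterate over that list (usually only about 10-20 files), check whether each file is marked "ready to copy", and copy its values into the database. Finally, I would add the file's name to dict1 and write it back to copied.txt.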

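My idea is to run this Python script as a service that keeps looping as long as there are files to work on. When it finds no files to copy from, it should wait x seconds (maybe 45) and then start over.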
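The loop described above can be sketched roughly as follows. This is a minimal sketch, not the script's actual loop: `run_service`, `process_pending`, `max_cycles` and the injectable `sleep` parameter are hypothetical names introduced for illustration. The callback is assumed to return how many files it handled, and the loop only waits when a cycle handled nothing or raised an error.

```python
import time


def run_service(process_pending, idle_seconds=45.0, max_cycles=None, sleep=time.sleep):
    """Call process_pending() repeatedly; wait only after an empty or failed cycle.

    process_pending is a hypothetical callback that returns the number of files
    it handled. max_cycles=None means "run forever".
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        cycles += 1
        try:
            handled = process_pending()
        except Exception as err:
            # A failed cycle is treated like an empty one, so the loop never spins.
            print(f"cycle {cycles} failed: {err}")
            handled = 0
        if handled == 0:
            sleep(idle_seconds)
    return cycles


if __name__ == "__main__":
    # Simulated work: two files, then nothing, then one file.
    batches = iter([2, 0, 1])
    naps = []
    run_service(lambda: next(batches), idle_seconds=45, max_cycles=3, sleep=naps.append)
    print(naps)
```

Passing `sleep` as a parameter is only there so the loop can be tried out without actually waiting; a real service would leave it at `time.sleep`.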


That is the best concept I have come up with so far. Is there a faster or more efficient way to do this?

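It just occurred to me that sets contain only unique elements and are therefore well suited to this kind of comparison. It is a data type I barely knew, but now I can see how useful it is.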
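As an illustration of the set comparison, here is a minimal sketch assuming the processed file names are stored one per line in a text file such as copied.txt. The helper names `load_done` and `find_unprocessed` are hypothetical; the demo uses a temporary directory with empty placeholder files instead of real workbooks.

```python
import tempfile
from pathlib import Path


def load_done(done_file: Path) -> set:
    """Return the set of paths listed in done_file (empty set if it does not exist)."""
    if not done_file.exists():
        return set()
    with open(done_file, "r", encoding="utf-8") as handle:
        return {Path(entry.strip()) for entry in handle if entry.strip()}


def find_unprocessed(root: Path, done_file: Path, pattern: str = "*.xlsm") -> set:
    """Return every file under root (recursively) that is not listed in done_file."""
    on_disk = set(root.rglob(pattern))
    return on_disk - load_done(done_file)  # set difference


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sub").mkdir()
        for name in ("a.xlsm", "sub/b.xlsm", "sub/c.xlsm"):
            (root / name).touch()
        done = root / "copied.txt"
        done.write_text(str(root / "a.xlsm") + "\n", encoding="utf-8")
        print(sorted(p.name for p in find_unprocessed(root, done)))
```

Both membership tests and the difference operate on hashes, so the comparison itself stays cheap even with 10,000 names. The slow part remains opening the few workbooks that are left.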

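The part of the code that relates to my original question is steps 1-3. The scheme:
1. Load the file names from the file into a set.
2. Load the file names from the file system (a specific directory plus its subdirectories) into a set.
3. Create a list of the difference between the two sets.
4. Iterate over all remaining files: check whether they are marked as "finished", then for each row create a new record in the database and add the values to that record (one by one).
5. Add the name of the processed file to the file of file names.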

It runs once every 5 minutes. That is perfectly fine for me.

I am very new to coding, so apologies for the amateurish style. At least it works so far.

#modules
import pandas as pd
import pyodbc as db
import glob
from datetime import datetime, date
from pathlib import Path
import time
import sys

#constants
tick_time_seconds = 300

line = ("################################################################################### \n")
pathTodo = "c:\\myXlFiles\\**\\*"
pathDone = ("c:\\Done\\")
pathError = ("c:\\Error\\")

sqlServer = "MyMachine\\MySQLServer"
sqlDriver = "{SQL Server}"
sqlDatabase="master"
sqlUID="SA"
sqlPWD="PWD" 

#functions
def get_list_of_files_by_extension(path:str, extension:str) -> list:
    """Receives string path and extension;
    gets list of files with corresponding extension in path;
    returns list of files with full path."""
    fileList = glob.glob(path+extension, recursive=True)
    if not fileList:
        print("no found files")
    else:
        print("found files")
    return fileList

def write_error_to_log(description:str, errorString:str, optDetails=""):
    """Receives strings description, errorString and opt(ional)Details;
    appends the error with date and time to a logfile named after the current date;
    returns nothing."""
    logFileName = str(date.today())+".txt"
    optDetails = optDetails+"\n"
    dateTimeNow = datetime.now()
    #description comes first so the log entry says what failed
    newError = "{0}\n{1}\n{2}\n{3}{4}\n".format(line, str(dateTimeNow), description, optDetails, errorString)
    print(newError)
    with open(Path(pathError, logFileName), "a") as logFile:
        logFile.write(newError)

def sql_connector():
    """sql_connector: Receives nothing;
    creates a connection to the sql server (connection details should be constants);
    returns a connection."""
    #build the connection string without stray whitespace between the keywords
    connectionString = (
        "DRIVER="+sqlDriver+";"
        +"SERVER="+sqlServer+";"
        +"DATABASE="+sqlDatabase+";"
        +"UID="+sqlUID+";"
        +"PWD="+sqlPWD+";"
    )
    return db.connect(connectionString)

def sql_update_builder(dbField:str, dbValue:str, dbKey:str) -> str:
    """ sql_update_builder: takes strings dbField, dbValue and dbKey;
    creates a sql syntax command with the purpose to update the value of the
    corresponding field with the corresponding key;
    returns a string with a sql command."""
    return "\
            UPDATE [tbl_Main] \
            SET ["+dbField+"]='"+dbValue+"' \
            WHERE ((([tbl_Main].MyKey)="+dbKey+"));"

def sql_insert_builder(dbKey: str) -> str:
    """ sql_insert_builder: takes string dbKey;
    creates a sql syntax command with the purpose to create a new record;
    returns a string with a sql command."""
    return "\
            INSERT INTO [tbl_Main] ([MyKey])\
            VALUES ("+dbKey+")"

def append_filename_to_fileNameFile(xlFilename):
    """Receives anything as xlFilename;
    converts it to string and appends the filename (full path) to filesDone.txt;
    returns nothing."""
    with open(Path(pathDone, "filesDone.txt"), "a") as logFile:
        logFile.write(str(xlFilename)+"\n")
###################################################################################
###################################################################################
# main loop
while __name__ == "__main__":

    ###################################################################################
    """ 1. load filesDone.txt into set"""
    setDone = set()
    print(line+"reading filesDone.txt in "+pathDone)
    try:
        with open(Path(pathDone, "filesDone.txt"), "r") as filesDoneFile:
            for filePath in filesDoneFile:
                filePath = filePath.replace("\n","")
                if filePath: #skip empty lines
                    setDone.add(Path(filePath))
    except FileNotFoundError:
        print("filesDone.txt not found, assuming no file has been processed yet")
    except Exception as err:
        errorDescription = "failed to read filesDone.txt from {0}".format(pathDone)
        write_error_to_log(description=errorDescription, errorString=str(err))
        time.sleep(tick_time_seconds) #wait before retrying instead of looping without pause
        continue
    ###################################################################################
    """ 2. load filenames of all .xlsm files into set"""
    print(line+"trying to get list of files in filesystem...")
    try:
        listFileSystem = get_list_of_files_by_extension(path=pathTodo, extension=".xlsm")
    except Exception as err:
        errorDescription = "failed to read file system "
        write_error_to_log(description=errorDescription, errorString=str(err))
        time.sleep(tick_time_seconds) #wait before retrying instead of looping without pause
        continue
    else:
        setFiles = {Path(filename) for filename in listFileSystem}
    ###################################################################################
    """ 3. create set of files that are in setFiles but not in setDone"""
    print(line+"trying to compare done files and files in filesystem...")
    setDifference = setFiles.difference(setDone)
    ###################################################################################
    """ 4. iterate thru list of files """

    for filename in setDifference:
        """ 4.1 try: look if file is marked as "finalized=True";
        if the xlfile does not have sheet 7 (old ones)
        just add the xlfilename to the xlfilenameFile"""
        try:
            print("{0}trying to read finalized state ... of {1}".format(line, filename))
            filenameClean = str(filename).replace("\n","")
            xlFile = pd.ExcelFile(filenameClean)
        except Exception as err:
            errorDescription = "failed to read finalized-state from {0} to dataframe".format(filename)
            write_error_to_log(description=errorDescription, errorString=str(err))
            continue
        else:
            if "finalized" in xlFile.sheet_names:
                dataframe = xlFile.parse("finalized")
                print("finalized state ="+str(dataframe.iloc[0]["finalized"]))
                if dataframe.iloc[0]["finalized"] == False:
                    continue
            else: 
                append_filename_to_fileNameFile(filename) #add the xlfilename to the xlfilenameFile"
                continue
        ###################################################################################
        """ 4.2 try: read values to dataframe"""
        try:
            dataframe = pd.read_excel(Path(filename), sheet_name=4)
        except Exception as err:
            errorDescription = "Failed to read values from {0} to dataframe".format(filename)
            write_error_to_log(description=errorDescription, errorString=str(err))
            continue
        ###################################################################################
        """ 4.2 try: open connection to database"""
        print("{0}Trying to open connection to database {1} on {2}".format(line, sqlDatabase, sqlServer))
        try:
            sql_connection = sql_connector() #create connection to server
            stuff = sql_connection.cursor()
        except Exception as err:
            write_error_to_log(description="Failed to open connection:", errorString=str(err))
            continue
        ###################################################################################
        """ 4.3 try: write to database"""
        headers = list(dataframe) #copy header from dataframe to list; easier to iterate
        values = dataframe.values.tolist() #copy values from dataframe to list of lists [[row1][row2]...]; easier to iterate
        for row in range(len(values)): #iterate over lines
            dbKey = str(values[row][0]) #first col is key
            sqlCommandString = sql_insert_builder(dbKey=dbKey)
            """ 4.3.1 first trying to create (aka insert) new record in db ..."""
            try: 
                print("{0}Trying insert new record with the id {1}".format(line, dbKey))
                stuff.execute(sqlCommandString)
                sql_connection.commit()
                print(sqlCommandString)
            except Exception as err:
                sql_log_string = " ".join(sqlCommandString.split()) #get rid of whitespace in sql command
                write_error_to_log(description="Failed to create new record in DB:", errorString=str(err), optDetails=sql_log_string)
            else: #if record was created add the values one by one:
                print("{0}Trying to add values to record with the ID {1}".format(line, dbKey))
            """ 4.3.2 ... then trying to add the values one by one"""
            for col in range(1, len(headers)): #skip col 0 (the key)
                dbField = str(headers[col]) #field in db is header in the excel sheet
                dbValue = str(values[row][col]) #get the corresponding value
                dbValue = (dbValue.replace("\"","")).replace("\'","") #getting rid of ' and " to prevent trouble with the sql command
                sqlCommandString = sql_update_builder(dbField, dbValue, dbKey) # calling function to create a sql update command string
                try: #try to commit the sql command
                    stuff.execute(sqlCommandString)
                    sql_connection.commit()
                    print(sqlCommandString)
                except Exception as err:
                    sql_log_string = " ".join(sqlCommandString.split()) #get rid of whitespace in sql command
                    write_error_to_log(description="Failed to add values in DB:", errorString=str(err), optDetails=sql_log_string)
        sql_connection.close() #release the connection that was opened for this file
        append_filename_to_fileNameFile(filename)

    print(line)
    # wait for a certain amount of time
    for i in range(tick_time_seconds, 0, -1):
        sys.stdout.write("\r" + str(i))
        sys.stdout.flush()
        time.sleep(1)
    print(line)
    #break # this is for debugging
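The script above builds its SQL commands by string concatenation and strips quote characters from the values to keep the statements valid. A common alternative is to pass values as query parameters. pyodbc accepts `?` placeholders for this; the sketch below uses the standard-library sqlite3 module purely as a stand-in so that it runs without a SQL Server. `upsert_row`, `ALLOWED_FIELDS` and the columns `Name` and `Amount` are hypothetical. Column names cannot be passed as parameters, so the sketch checks them against a whitelist instead.

```python
import sqlite3

# Hypothetical column names; in the real script these would be the Excel headers.
ALLOWED_FIELDS = {"Name", "Amount"}


def upsert_row(cursor, key, values: dict) -> None:
    """Create the record for key if it is missing, then set each field via placeholders."""
    cursor.execute("SELECT 1 FROM tbl_Main WHERE MyKey = ?", (key,))
    if cursor.fetchone() is None:
        cursor.execute("INSERT INTO tbl_Main (MyKey) VALUES (?)", (key,))
    for field, value in values.items():
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"unexpected field name: {field!r}")
        # Only the whitelisted identifier is interpolated; the values are bound.
        cursor.execute(f"UPDATE tbl_Main SET [{field}] = ? WHERE MyKey = ?", (value, key))


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    cursor = connection.cursor()
    cursor.execute("CREATE TABLE tbl_Main (MyKey INTEGER PRIMARY KEY, Name TEXT, Amount REAL)")
    upsert_row(cursor, 1, {"Name": "O'Brien \"quoted\"", "Amount": 12.5})
    connection.commit()  # one commit per file or batch instead of one per statement
    print(cursor.execute("SELECT * FROM tbl_Main").fetchall())
    connection.close()
```

With bound parameters the quotes in a value are stored unchanged, so the `replace` calls that remove `'` and `"` would no longer be needed.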
Use an observer.
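The answer does not say which observer it means. One reading is a file-system watcher that reports new or changed files, so that the script only has to open those (the third-party watchdog package provides an event-based `Observer` for this purpose). The following is a standard-library polling sketch of the same idea, not that package's API: `DirectoryWatcher` and `poll` are hypothetical names, and changes are detected by comparing modification times between polls.

```python
import os
import tempfile
from pathlib import Path


class DirectoryWatcher:
    """Remember which files were seen and report the ones that are new or modified."""

    def __init__(self, root, suffix=".xlsm"):
        self.root = Path(root)
        self.suffix = suffix
        self.seen = {}  # path -> modification time (ns) at the last poll

    def poll(self):
        """Return a sorted list of matching files that are new or changed since the last poll."""
        changed = []
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in filenames:
                if not name.endswith(self.suffix):
                    continue
                path = Path(dirpath, name)
                mtime = path.stat().st_mtime_ns
                if self.seen.get(path) != mtime:
                    self.seen[path] = mtime
                    changed.append(path)
        return sorted(changed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        watcher = DirectoryWatcher(tmp)
        Path(tmp, "a.xlsm").touch()
        print([p.name for p in watcher.poll()])  # first poll reports a.xlsm
        print([p.name for p in watcher.poll()])  # nothing changed since then
        Path(tmp, "b.xlsm").touch()
        print([p.name for p in watcher.poll()])  # only the new file
```

Unlike a plain list of processed names, tracking modification times also re-reports a workbook that was saved again, which may matter for files whose "finished" flag is set later.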