watchdog (Python): monitor only one file format and ignore the rest in 'PatternMatchingEventHandler'

python python-3.x csv

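I am running some existing code, to which I made a few changes, to monitor file creation/addition for only one format, .csv, in a specified directory.

The problem now is:

Whenever a newly added file is not in .csv format, my program breaks (it stops monitoring but keeps running). To compensate for this, here is what I did with the ignore_patterns argument (but the program still stops monitoring after a new file of another format is added):

PatternMatchingEventHandler(patterns="*.csv", ignore_patterns=["*~"], ignore_directories=True, case_sensitive=True)

The complete code is: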

import time
import csv
from datetime import datetime
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler
from os import path
from pandas import read_csv
# class that takes care of everything
class file_validator(PatternMatchingEventHandler):
    def __init__(self, source_path):
        # setting parameters for 'PatternMatchingEventHandler'
        super(file_validator, self).__init__(patterns="*.csv", ignore_patterns=["*~"], ignore_directories=True, case_sensitive=True)
        self.source_path = source_path
        self.print_info = None

    def on_created(self, event):
        # this is the new file that was created
        new_file = event.src_path
        # details of each new .csv file
        # demographic details
        file_name = path.basename(new_file)
        file_size = f"{path.getsize(new_file) / 1000} KiB"
        file_creation = f"{datetime.fromtimestamp(path.getmtime(new_file)).strftime('%Y-%m-%d %H:%M:%S')}"
        new_data = read_csv(new_file)
        # more details
        number_columns = new_data.shape[1]
        data_types_data = [
            ('float' if i == 'float64' else ('int' if i == 'int64' else ('character' if i == 'object' else i))) for i in
            [x.name for x in list(new_data.dtypes)]]
        null_count_data = list(dict(new_data.isna().sum()).values())
        print(f"{file_name}, {file_size}, {file_creation}, {number_columns}")
        # trying to access this info, but of no help
        self.print_info = f"{file_name}, {file_size}, {file_creation}, {number_columns}"

    def return_logs(self):
        return self.print_info

# main function    
if __name__ == "__main__":
    some_path = "C:\\Users\\neevaN_Reddy\\Documents\\learning dash\\"
    my_validator = file_validator(source_path=some_path)
    my_observer = Observer()
    my_observer.schedule(my_validator, some_path, recursive=True)
    my_observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        my_observer.stop()
        my_observer.join()
    # # this doesn't print anything
    print(my_validator.return_logs)

Edit 1 (after Quentin Pradet's comment): Following your suggestion in the comments, I changed my arguments to:

super(file_validator, self).__init__(patterns="*.csv",
                                     # ignore_patterns=["*~"],
                                     ignore_directories=True, 
                                     case_sensitive=True)

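When I copy a file of another format (I tried with an .ipynb file), I get an error, after which the program stops monitoring even .csv files. The full traceback is reproduced further down this page.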

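Clearly pandas raised an error, which means the on_created function I wrote is being triggered for files that are not .csv. I suppose this means something has to go into the ignore_patterns argument so that on_created is not triggered when files of other formats are added.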
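One possible explanation for on_created firing on non-CSV files, offered as an assumption rather than a confirmed diagnosis: if the handler treats patterns as an iterable of globs, the string "*.csv" is consumed one character at a time, and the lone "*" matches every path. The sketch below imitates that behaviour with the standard fnmatch module. matches_any is a hypothetical helper written for illustration, not part of watchdog.

```python
from fnmatch import fnmatch


def matches_any(filename, patterns):
    """Return True if filename matches at least one glob in patterns.

    Hypothetical stand-in for how a handler might test its patterns;
    it simply iterates over whatever iterable it is given.
    """
    return any(fnmatch(filename, pattern) for pattern in patterns)


as_string = "*.csv"   # iterating a str yields single characters
as_list = ["*.csv"]   # iterating a list yields the whole glob

print(list(as_string))                            # ['*', '.', 'c', 's', 'v']
print(matches_any("notebook.ipynb", as_string))   # True: the bare '*' matches
print(matches_any("notebook.ipynb", as_list))     # False
print(matches_any("data.csv", as_list))           # True
```

If this is indeed the cause, passing patterns=["*.csv"] to PatternMatchingEventHandler should keep the callback from running for other file types.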
Can you try sending patterns as a list and not as a string, e.g. patterns=["*.csv"]?

With patterns="*.csv" you shouldn't have to add ignore_patterns=["*~"]. How does your program break?

@QuentinPradet, please check the edit in my question now.

After I copy a new file and/or folder of another format along with .csv files, the files are processed twice. In my case, each file's information is printed twice on the console.

Yes, there can be multiple events. You can filter them by event type. If that doesn't work, please ask another question with the minimal code that exposes the problem; that will help you get answers.
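For the "processed twice" symptom mentioned in the comments, one option besides filtering on event type is to skip a path that was already handled a moment ago. The following is a minimal sketch of that idea, not something taken from watchdog: RecentPathFilter, its window_seconds parameter, and the two-second default are all assumptions made for illustration.

```python
import time


class RecentPathFilter:
    """Remember when each path was last seen and flag quick repeats."""

    def __init__(self, window_seconds=2.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the class is easy to test
        self._last_seen = {}        # path -> timestamp of the latest event

    def is_duplicate(self, path):
        """Return True if path was already seen within the time window."""
        now = self.clock()
        previous = self._last_seen.get(path)
        self._last_seen[path] = now
        return previous is not None and (now - previous) < self.window_seconds


# Simulated stream of event paths; the second "data.csv" is a repeat.
recent = RecentPathFilter(window_seconds=2.0)
for path in ["data.csv", "data.csv", "other.csv"]:
    if recent.is_duplicate(path):
        continue
    print("processing", path)
```

Inside a handler, the same check could sit at the top of on_created, called with event.src_path, so that a repeated event returns early before the file is read.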
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\neevaN_Reddy\AppData\Local\Programs\Python\Python37\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "C:\Users\neevaN_Reddy\AppData\Local\Programs\Python\Python37\lib\site-packages\watchdog\observers\api.py", line 199, in run
    self.dispatch_events(self.event_queue, self.timeout)
  File "C:\Users\neevaN_Reddy\AppData\Local\Programs\Python\Python37\lib\site-packages\watchdog\observers\api.py", line 368, in dispatch_events
    handler.dispatch(event)
  File "C:\Users\neevaN_Reddy\AppData\Local\Programs\Python\Python37\lib\site-packages\watchdog\events.py", line 454, in dispatch
    _method_map[event_type](event)
  File "C:/Users/neevaN_Reddy/Documents/Work/Project-Aretaeus/diabetes_risk project/file validation using a class.py", line 26, in on_created
    new_data = read_csv(new_file)
  File "C:\Users\neevaN_Reddy\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\neevaN_Reddy\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers.py", line 463, in _read
    data = parser.read(nrows)
  File "C:\Users\neevaN_Reddy\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers.py", line 1154, in read
    ret = self._engine.read(nrows)
  File "C:\Users\neevaN_Reddy\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\parsers.py", line 2059, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 881, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 896, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 950, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 937, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2132, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2
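The traceback shows the ParserError leaving on_created and surfacing in Thread-1, which matches the report that monitoring stops afterwards. A defensive option, sketched here under assumptions, is to check the extension and catch parser failures inside the callback so nothing propagates to the dispatching thread. process_if_csv and the logger name are hypothetical; the parse argument stands in for pandas' read_csv.

```python
import logging
from pathlib import Path

logger = logging.getLogger("file_validator")


def process_if_csv(path, parse):
    """Call parse(path) only for .csv files; log and swallow failures.

    Returns parse's result, or None when the file is skipped or cannot
    be parsed.
    """
    # Exact, case-sensitive comparison, mirroring case_sensitive=True.
    if Path(path).suffix != ".csv":
        logger.info("skipping non-CSV file: %s", path)
        return None
    try:
        return parse(path)
    except Exception:
        # Broad on purpose in this sketch: an error escaping the callback
        # is what appears in the traceback above.
        logger.exception("could not parse %s", path)
        return None
```

In the question's handler this might be used as new_data = process_if_csv(event.src_path, read_csv), followed by an early return when new_data is None.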