Python script to extract new data from a file since the last read


I'm writing a Python script to do the following:

I want to read a log file every ten minutes, and on each read extract any data that has been appended to the file since the last read (preferably without having to re-read the whole file each time). For example:

At 09:00 I read the log file, and its contents are:

1. 2011-07-04 11:15:04,507 Processing request 17897931 from status 7 to 13
2. 2011-07-04 11:15:04,508 Processing request 17897931 from status 13 to 17
3. 2011-07-04 11:15:04,508 Processing request d0fcb681 from status 7 to 13
4. 2011-07-04 11:15:04,509 Processing request d0fcb681 from status 13 to 17
5. 2011-07-04 11:15:04,509 Processing request 178819a1 from status 7 to 13
At 09:10 I read the log file again, and now its contents are:

1. 2011-07-04 11:15:04,507 Processing request 17897931 from status 7 to 13
2. 2011-07-04 11:15:04,508 Processing request 17897931 from status 13 to 17
3. 2011-07-04 11:15:04,508 Processing request d0fcb681 from status 7 to 13
4. 2011-07-04 11:15:04,509 Processing request d0fcb681 from status 13 to 17
5. 2011-07-04 11:15:04,509 Processing request 178819a1 from status 7 to 13
6. 2011-07-04 11:15:04,510 Processing request 178819a1 from status 13 to 17
7. 2011-07-04 11:15:04,510 Processing request 17161df1 from status 7 to 13
8. 2011-07-04 11:15:04,511 Processing request 17161df1 from status 13 to 17
9. 2011-07-04 11:15:04,511 Processing request 182013e1 from status 7 to 9
How can my script extract just the new lines (lines 6 to 9)?

I have a shell script that already does this by using the file's inode. I'm looking for a Python-based solution.

My plan is to run the script via crontab.

Any ideas how I can do this?

  • Check the file size
  • Wait for the size to change
  • Open the file and seek to the previous size
  • Read
  • For example:

    import os, time

    path = "app.log"
    size = os.stat(path).st_size   # remember the current size
    time.sleep(600)                # wait ten minutes
    fh = open(path)
    fh.seek(size)                  # skip past everything already read
    new_data = fh.read()
    fh.close()
    

    If another process is writing to the log at the same moment, this example will occasionally read a partial line. I'll leave that solution as an exercise :)
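One way to handle the partial-line problem mentioned above is to hand off only data up to the last newline and carry any unfinished tail over to the next read. This is a sketch of that idea, not part of the original answer; the helper name and log path are hypothetical:

```python
def read_new_lines(path, offset, leftover=""):
    """Read data appended after `offset`, returning only complete lines.

    A trailing partial line is returned as the new `leftover` so it can
    be prepended on the next call, once the writer has finished it.
    """
    with open(path) as fh:
        fh.seek(offset)
        data = leftover + fh.read()
        offset = fh.tell()
    # Split off anything after the last newline -- it may be half-written.
    complete, sep, leftover = data.rpartition("\n")
    if not sep:  # no newline at all: everything read so far is partial
        return [], offset, data
    return complete.split("\n"), offset, leftover
```

On each poll you pass back the `offset` and `leftover` from the previous call, so half-written lines are never emitted twice or split in two.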

    Hi Luke, thanks for the answer. What does the `time.sleep(600)` line do? Does it mean the script runs continuously and wakes up every ten minutes? Couldn't this be driven by crontab instead?

    It sleeps for 10 minutes. You can use crontab instead, but then you have to write the last known size to a file so you know where to pick up next time.
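The crontab-driven variant from the comment above could be sketched like this (the log and state-file paths are hypothetical). The last offset is persisted between runs, and the inode is stored alongside it so a rotated log, which the asker's shell script detects the same way, is read from the start:

```python
import json
import os

LOG_PATH = "app.log"           # hypothetical log file
STATE_PATH = "app.log.state"   # hypothetical state file: last offset + inode


def load_state(path):
    """Return the saved state, or a fresh one if none exists yet."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except (OSError, ValueError):
        return {"offset": 0, "inode": None}


def poll_log(log_path=LOG_PATH, state_path=STATE_PATH):
    """Return data appended since the previous run; suitable for cron."""
    state = load_state(state_path)
    st = os.stat(log_path)
    # A different inode (log rotated) or a shrunken file means start over.
    if st.st_ino != state["inode"] or st.st_size < state["offset"]:
        state = {"offset": 0, "inode": st.st_ino}
    with open(log_path) as fh:
        fh.seek(state["offset"])
        new_data = fh.read()
        state["offset"] = fh.tell()
    with open(state_path, "w") as fh:
        json.dump(state, fh)
    return new_data
```

Each cron invocation then simply calls `poll_log()` and processes whatever it returns; no long-running process or `sleep` is needed.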