Multiple spiders in Python CrawlerProcess - how to get a log for each spider?


Scenario:

  • A single Scrapy project with multiple spiders
  • All spiders are run together from a script (see the CrawlerProcess sketch below)
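
For context, here is a minimal sketch of the kind of driver script meant above, running several spiders in one CrawlerProcess (the myproject module and spider classes are hypothetical placeholders):

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Hypothetical spiders standing in for the project's real ones.
from myproject.spiders import Spider1, Spider2, Spider3

process = CrawlerProcess(get_project_settings())
process.crawl(Spider1)
process.crawl(Spider2)
process.crawl(Spider3)
process.start()  # blocks until every spider has finished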
Problem:

  • All log messages share the same namespace; it is impossible to tell which message belongs to which spider
In Scrapy 0.24 I ran multiple spiders from one script and got a single log file whose messages were tagged with their spider, like this:

2015-09-30 22:55:12-0400 [scrapy] INFO: Scrapy 0.24.5 started (bot: mybot)
2015-09-30 22:55:12-0400 [scrapy] DEBUG: Enabled extensions: LogStats, ...
2015-09-30 21:55:12-0500 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, ...
2015-09-30 21:55:12-0500 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, ...
2015-09-30 21:55:12-0500 [scrapy] DEBUG: Enabled item pipelines: MybotPipeline
2015-09-30 21:55:12-0500 [spider1] INFO: Spider opened
2015-09-30 21:55:12-0500 [spider1] INFO: Crawled 0 pages ...
2015-09-30 21:55:12-0500 [spider2] INFO: Spider opened
2015-09-30 21:55:12-0500 [spider2] INFO: Crawled 0 pages ...
2015-09-30 21:55:12-0500 [spider3] INFO: Spider opened
2015-09-30 21:55:12-0500 [spider3] INFO: Crawled 0 pages ...
2015-09-30 21:55:13-0500 [spider2] DEBUG: Crawled (200) <GET ...
2015-09-30 21:55:13-0500 [spider3] DEBUG: Crawled (200) <GET ...
2015-09-30 21:55:13-0500 [spider1] DEBUG: Crawled (200) <GET ...
2015-09-30 21:55:13-0500 [spider1] INFO: Closing spider (finished)
2015-09-30 21:55:13-0500 [spider1] INFO: Dumping Scrapy stats: ...
2015-09-30 21:55:13-0500 [spider3] INFO: Closing spider (finished)
2015-09-30 21:55:13-0500 [spider3] INFO: Dumping Scrapy stats: ...
2015-09-30 21:55:13-0500 [spider2] INFO: Closing spider (finished)
2015-09-30 21:55:13-0500 [spider2] INFO: Dumping Scrapy stats: ...
Clearly, it was possible to tell which spider each message belonged to.

The question is: is there any way to get that previous behavior back?

It is also possible to give each spider its own log file. [1]
But the log file cannot be overridden from a spider's custom_settings. [2]
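
For illustration, the kind of per-spider override that [2] reports as having no effect looks like this (a sketch; the spider name and path are hypothetical):

import scrapy

class Spider1(scrapy.Spider):
    name = 'spider1'
    # Per [2], this LOG_FILE is ignored: logging is already configured
    # by the time a spider's custom_settings are applied.
    custom_settings = {
        'LOG_FILE': '../logs/spider1.log',
    }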

So, is there any way to have a different log file for each spider?

[1]

[2] I just found out that this is a known "bug":

Known workaround: change utils.log.TopLevelFormatter.filter to

def filter(self, record):
    if hasattr(record, 'spider'):
        # Records emitted on behalf of a spider: tag them with the spider's name.
        record.name = record.spider.name
    elif any(record.name.startswith(l + '.') for l in self.loggers):
        # Otherwise keep only the top-level logger name, as the stock filter does.
        record.name = record.name.split('.', 1)[0]
    return True

@Djunzu's answer was not quite ready to use, so I tried to polish it:

# -*- coding: utf-8 -*-

from scrapy.utils.project import get_project_settings
from scrapy.utils.log import configure_logging, _get_handler, TopLevelFormatter

import datetime
import logging
import time

class MyTopLevelFormatter(TopLevelFormatter):
    def __init__(self, loggers=None, name=None):
        super(MyTopLevelFormatter, self).__init__()
        self.loggers = loggers or []
        self.name = name  # the spider this handler belongs to

    def filter(self, record):
        # Skip records already tagged with this spider's name to avoid double-prefixing.
        if self.name in record.name:
            return False
        if hasattr(record, 'spider'):
            # Accept only records from our own spider, and prefix them with its name.
            if record.spider.name != self.name:
                return False
            record.name = record.spider.name + "." + record.name
        elif hasattr(record, 'crawler') and hasattr(record.crawler, 'spidercls'):
            # Likewise for records that only carry a crawler (e.g. before the spider opens).
            if record.crawler.spidercls.name != self.name:
                return False
            record.name = record.crawler.spidercls.name + "." + record.name
        elif any(record.name.startswith(l + '.') for l in self.loggers):
            record.name = record.name.split('.', 1)[0]
        return True

def log_init(name):
    # One timestamped log file per spider, e.g. ../logs/spider1_2015-09-30-21-55-12.log
    now = datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d-%H-%M-%S')
    log_file = "../logs/{0}_{1}.log".format(name, now)
    # Keep Scrapy from installing its default root handler; we add our own below.
    configure_logging({'LOG_FILE': log_file}, install_root_handler=False)
    settings = get_project_settings()
    settings['LOG_FILE'] = log_file
    settings['DISABLE_TOPLEVELFORMATTER'] = True
    handler = _get_handler(settings)
    handler.addFilter(MyTopLevelFormatter(loggers=[__name__], name=name))
    logging.root.addHandler(handler)
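
Two details make this work when several spiders run in one process: install_root_handler=False keeps configure_logging() from attaching Scrapy's default root handler, and the name-based filter means each manually added file handler only accepts records belonging to its own spider.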
Then in your spider, do this:

class MySpider(scrapy.Spider):
    # ... (rest of the spider omitted)

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        log_init(self.name)

How do I make that change? Is there a monkey-patch sample?
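
One rough sketch (untested assumption): since the workaround only changes the body of filter, it can be monkey-patched onto scrapy.utils.log.TopLevelFormatter early in the driver script, before logging is configured:

import scrapy.utils.log

def _patched_filter(self, record):
    # Same body as the workaround above: tag records with their spider's name.
    if hasattr(record, 'spider'):
        record.name = record.spider.name
    elif any(record.name.startswith(l + '.') for l in self.loggers):
        record.name = record.name.split('.', 1)[0]
    return True

scrapy.utils.log.TopLevelFormatter.filter = _patched_filter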