Outputting outgoing and incoming bandwidth at fixed time intervals with Scrapy


Is it possible to get statistics such as the outgoing and incoming bandwidth used during a crawl, at fixed time intervals, using Scrapy?

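Yes, it is possible. =)

The total bytes of requests and responses are already tracked in the stats. You can add another downloader middleware that tracks elapsed time and adds new stats.

Here are the steps: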

1) In settings.py, configure a new downloader middleware with a high order number so that it runs late in the middleware chain:

DOWNLOADER_MIDDLEWARES = {
    'testing.middlewares.InOutBandwithStats': 990,
}

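2) Put the following code in a middlewares.py file in the same directory as settings.py (the file name has to match the testing.middlewares module path used in the setting above):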

import time


class InOutBandwithStats(object):
    """Record average outgoing/incoming bytes per second in the stats."""

    def __init__(self, stats):
        self.stats = stats
        self.startedtime = time.time()

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.stats)

    def elapsed_seconds(self):
        return time.time() - self.startedtime

    def process_request(self, request, spider):
        request_bytes = self.stats.get_value('downloader/request_bytes')

        if request_bytes:
            outgoing_bytes_per_second = request_bytes / self.elapsed_seconds()
            self.stats.set_value('downloader/outgoing_bytes_per_second',
                                 outgoing_bytes_per_second)

    def process_response(self, request, response, spider):
        response_bytes = self.stats.get_value('downloader/response_bytes')

        if response_bytes:
            incoming_bytes_per_second = response_bytes / self.elapsed_seconds()
            self.stats.set_value('downloader/incoming_bytes_per_second',
                                 incoming_bytes_per_second)

        return response

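That's it. The process_request/process_response methods are called whenever a request/response is processed, and they keep the stats updated accordingly. Note that the values are averages over the whole time since the middleware was created, not the rate over the most recent interval.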
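To make the arithmetic concrete, here is a minimal sketch that reproduces the middleware's calculation outside Scrapy. FakeStats and update_rate are hypothetical names introduced only for this illustration; FakeStats merely mimics the get_value/set_value calls used above and is not Scrapy's own stats collector.

```python
class FakeStats:
    """Minimal stand-in for a stats collector (illustration only)."""

    def __init__(self):
        self._values = {}

    def get_value(self, key, default=None):
        return self._values.get(key, default)

    def set_value(self, key, value):
        self._values[key] = value


def update_rate(stats, total_key, rate_key, elapsed_seconds):
    """Store total bytes divided by elapsed time, as the middleware does."""
    total = stats.get_value(total_key)
    # Skip the update when nothing was counted yet or no time has passed.
    if total and elapsed_seconds > 0:
        stats.set_value(rate_key, total / elapsed_seconds)


stats = FakeStats()
stats.set_value('downloader/response_bytes', 2048)
update_rate(stats, 'downloader/response_bytes',
            'downloader/incoming_bytes_per_second', 4.0)
print(stats.get_value('downloader/incoming_bytes_per_second'))  # 512.0
```

With 2048 bytes received over 4 seconds, the stored value is 512.0 bytes per second.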

If you want log output at regular intervals, you can also call this there:

spider.log('Incoming bytes/sec: %s' % incoming_bytes_per_second)
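Logging from process_response as-is would emit one line per response rather than one per interval. One possible way to limit it to a fixed interval is a small time gate, sketched below. IntervalGate is a hypothetical helper, not part of Scrapy, and the injectable clock parameter is an assumption made here so the behaviour can be checked without waiting.

```python
import time


class IntervalGate:
    """Return True at most once per `interval` seconds (hypothetical helper)."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last = clock()

    def ready(self):
        now = self.clock()
        if now - self.last >= self.interval:
            # Enough time has passed: reset the reference point and allow one log.
            self.last = now
            return True
        return False


# Possible use inside the middleware (sketch, assuming a 10 second interval):
#   in __init__:          self.gate = IntervalGate(10.0)
#   in process_response:  if self.gate.ready():
#                             spider.log('Incoming bytes/sec: %s'
#                                        % incoming_bytes_per_second)
```

Because the gate is only checked when a response arrives, the log lines appear no more often than the interval, but they can be later than the interval if the crawl is idle.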