
Python: How to set a custom feed file name per request in a scrapy spider?

I have set up a spider as shown below. From Postman, I make multiple requests with different search strings to the same Facebook spider. The output feed file fb_20201025024201.json is generated with the same name for all requests.

The file names are expected to differ between requests, since I make the requests at different times.

import json
import time

import scrapy


class Facebook(scrapy.Spider):
    name = "fb"
    start_urls = [FB_SEARCH_URL]
    allowed_domains = ["facebook.com"]
    fb_url = FB_SEARCH_URL
    # Evaluated once, when the class body runs -- not once per crawl.
    timestring = time.strftime("%Y%m%d%H%M%S")
    FB_ROOT = "/fb"
    feed_uri = setting.S3_BASE_PATH + FB_ROOT + "/fb_{}.json".format(timestring)

    # Settings for saving the extracted data.
    custom_settings = {
        "ITEM_PIPELINES": {"fb_scrapping.fb_scrapping.pipelines.JSONPipeline": 200},
        "FEEDS": {
            feed_uri: {"format": "json", "encoding": "utf8", "indent": 4}
        },
        "FEED_EXPORT_ENCODING": "utf-8",
        "FEED_EXPORT_INDENT": 2,
    }

    def parse(self, response):
        search_key_list = getattr(self, "keys", None)
        if not search_key_list:
            self.logger.error("no search key details provided")
        else:
            self.logger.info(f"crawler request payload: {search_key_list}")
            for search_key in search_key_list:
                search_string = {
                    "key": search_key,
                }
                self.logger.debug(f"scraping for key: {search_string}")

                # Fetch user details for each search string.
                details_info = self.extract_info(search_string, GET_DETAILS)
                self.logger.debug("after extraction " + json.dumps(details_info))

                for search_key in search_key_list:
                    user_name = details_info["User Name"]
                    search_string = {
                        "key": search_key,
                    }
                    # Fetch more details for each key.
                    more_details_info = self.extract_info(
                        search_string, GET_MORE_DETAILS
                    )
                    doc = {
                        "search_key": search_key,
                        "INFO": details_info,
                        "MORE_INFO": more_details_info,
                    }
                    yield doc
Request 1:

curl --location --request POST 'http://localhost:8000/v1/fb/search/' \
--header 'X-CSRFToken: NDZ1cJZBq8Mbk9xEObdimb5BgI4KiAKXOYOQg6Ipeu4wDN' \
--header 'Content-Type: application/json' \
--header 'Cookie: csrftoken=NDZ1cJZBq8Mbk9xEObdimb5BgI4KiAKXOYOQg6Ipeu4wDN' \
--data-raw '[
        {
            "key": "Amazon"
        }
]'
Request 2:

curl --location --request POST 'http://localhost:8000/v1/fb/search/' \
--header 'X-CSRFToken: NDZ1cJZBq8Mbk9xEObdimb5BgI4KiAKXOYOQg6Ipeu4wDN' \
--header 'Content-Type: application/json' \
--header 'Cookie: csrftoken=NDZ1cJZBq8Mbk9xEObdimb5BgI4KiAKXOYOQg6Ipeu4wDN' \
--data-raw '[
        {
            "key": "Google"
        }
]'
Both requests generated the file fb_20201025024201.json, so they overwrite each other.

Additional info: I am using Django and Celery to trigger the scrapy tasks.
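This likely follows from Python semantics: timestring is a class attribute, so it is evaluated once, when the spider module is imported. A long-lived Celery worker imports the module a single time and then reuses the same value for every crawl, which would explain the identical file names. A minimal sketch of that evaluation behavior (the Demo class is purely illustrative):

import time


class Demo:
    # Evaluated once, when the class body executes at import time.
    timestring = time.strftime("%Y%m%d%H%M%S")


first = Demo.timestring
time.sleep(2)
second = Demo.timestring  # re-reading the attribute does not re-evaluate it
assert first == second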


Could you help me generate different files for different requests?

Scrapy feed exports do not support per-request file storage. Instead of using feed exports, you would need to implement that logic yourself, for example in the spider itself, in an item pipeline, or in a spider middleware.
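As one possible direction, here is a minimal sketch of an item pipeline that writes each crawl's items to its own timestamped JSON file. The class name PerRequestJSONPipeline, the local output directory /tmp/fb, and the file naming scheme are illustrative assumptions, not part of the original project:

import json
import os
import time


class PerRequestJSONPipeline:
    # A minimal sketch: buffers items in memory and dumps them to a file
    # whose name is computed when the crawl starts, so every run gets its
    # own file. The output path is an assumption for illustration.

    def open_spider(self, spider):
        timestring = time.strftime("%Y%m%d%H%M%S")
        self.path = "/tmp/fb/fb_{}.json".format(timestring)
        os.makedirs(os.path.dirname(self.path), exist_ok=True)
        self.items = []

    def process_item(self, item, spider):
        self.items.append(dict(item))
        return item

    def close_spider(self, spider):
        with open(self.path, "w", encoding="utf-8") as fp:
            json.dump(self.items, fp, ensure_ascii=False, indent=4)

The pipeline would be enabled through the spider's custom_settings, e.g. "ITEM_PIPELINES": {"fb_scrapping.fb_scrapping.pipelines.PerRequestJSONPipeline": 300} (the dotted path is assumed to match the project layout above). To store the file on S3 instead of writing it locally, close_spider could upload the finished file, for example with boto3's upload_file.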