Unable to scrape a URL with Python Scrapy because it contains # (a URI fragment)

I am facing the same problem. Can anyone tell me how to scrape the URL below?

start_urls = [ 'https://onlinelibrary.ectrims-congress.eu/ectrims/#!*menu=6*browseby=3*sortby=2*media=3*ce_id=1428' ]
The response I get is:

Crawled (200) <GET https://onlinelibrary.ectrims-congress.eu/ectrims/?_escaped_fragment_=%2Amenu%3D6%2Abrowseby%3D3%2Asortby%3D2%2Amedia%3D3%2Ace_id%3D1428%3E> (referer: None) ['cached']
Unfortunately, I cannot extract any data with response.xpath; it returns empty values. When I open the response URL, it does not appear to be the exact URL I want to get data from.

Please help.

Website

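Looking at the website, the content you want is rendered by JavaScript, which makes it likely that the data is loaded from an API endpoint via AJAX requests. In the XHR tab of Chrome DevTools you can see 5 requests being made. Of these, the API endpoint
https://onlinelibrary.ectrims-congress.eu/ectrims/listing/events/banners
returns the data you need once you pass the required parameters, which in Scrapy are the headers, cookies, and body.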
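To see why the original URL returns nothing useful, it may help to look at what happens to the part after `#`. A URI fragment is handled by the client and is not sent to the server, so the `#!*menu=6...` portion never reaches the site as written. The sketch below uses only the standard library; the helper names `split_hashbang` and `escaped_fragment_url` are made up for illustration, and the second one only mirrors the `_escaped_fragment_` rewrite visible in the crawl log from the question.

```python
from urllib.parse import urlsplit, quote

START_URL = (
    "https://onlinelibrary.ectrims-congress.eu/ectrims/"
    "#!*menu=6*browseby=3*sortby=2*media=3*ce_id=1428"
)


def split_hashbang(url):
    """Return (URL without its fragment, the fragment itself)."""
    parts = urlsplit(url)
    server_url = parts._replace(fragment="").geturl()
    return server_url, parts.fragment


def escaped_fragment_url(url):
    """Rewrite a '#!' URL into the '?_escaped_fragment_=' form (hypothetical helper)."""
    server_url, fragment = split_hashbang(url)
    if not fragment.startswith("!"):
        return url  # not a hashbang URL, leave it unchanged
    separator = "&" if "?" in server_url else "?"
    # Percent-encode everything after the '!' so it can travel as a query value.
    return f"{server_url}{separator}_escaped_fragment_={quote(fragment[1:], safe='')}"


server_url, fragment = split_hashbang(START_URL)
print(server_url)   # the only part a server would receive for the original URL
print(fragment)     # the part that page-side JavaScript interprets
print(escaped_fragment_url(START_URL))
```

Either way the server only receives the bare page, and the listing itself is filled in afterwards by the AJAX call, which is why `response.xpath` comes back empty.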

Code

import scrapy  # needed for the scrapy.Spider base class below
from scrapy import Request

class Ectrims(scrapy.Spider):
    name = 'library'

    headers = {
        "Connection": "keep-alive",
        "sec-ch-ua": "\" Not;A Brand\";v=\"99\", \"Google Chrome\";v=\"91\", \"Chromium\";v=\"91\"",
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "X-Requested-With": "XMLHttpRequest",
        "sec-ch-ua-mobile": "?0",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "Origin": "https://onlinelibrary.ectrims-congress.eu",
        "Sec-Fetch-Site": "same-origin",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Dest": "empty",
        "Referer": "https://onlinelibrary.ectrims-congress.eu/ectrims/",
        "Accept-Language": "en-US,en;q=0.9"
    }

    cookies = {
        "PHPSESSID": "if994kqobo2l80nk1ki7q233i5",
        "_ga": "GA1.2.212877690.1624208120",
        "_gid": "GA1.2.291339791.1624208120",
        "intercom-id-aucjjau5": "bdcafc49-97d0-42fb-b61e-46c74cfed3b0",
        "cp_user_200": "{\"1\":0}"
    }

    body = 'menu=6&browseby=3&sortby=2&media=3&ce_id=1428&getpage=1'

    def start_requests(self):
        # POST directly to the endpoint that the page calls via AJAX.
        url = 'https://onlinelibrary.ectrims-congress.eu/ectrims/listing/events/banners'
        yield Request(url=url, method='POST', cookies=self.cookies, headers=self.headers, body=self.body, callback=self.parse)

    def parse(self, response):
        # Print the raw response body returned by the endpoint.
        print(response.body)
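The hard-coded `body` above carries the same key/value pairs as the `#!` fragment in the original URL, plus `getpage=1`. A minimal sketch of deriving that body from the fragment follows. The function `fragment_to_form_body` is a hypothetical helper, and treating `getpage` as a page number is an assumption based only on its name.

```python
from urllib.parse import urlsplit, urlencode


def fragment_to_form_body(url, page=1):
    """Turn '#!*k=v*k2=v2' into a form-encoded body 'k=v&k2=v2&getpage=N'."""
    fragment = urlsplit(url).fragment.lstrip("!")
    params = {}
    for pair in fragment.split("*"):
        if not pair:
            continue  # skip the empty piece before the leading '*'
        key, _, value = pair.partition("=")
        params[key] = value
    # Assumption: 'getpage' selects the results page; it is not part of the fragment.
    params["getpage"] = str(page)
    return urlencode(params)


url = (
    "https://onlinelibrary.ectrims-congress.eu/ectrims/"
    "#!*menu=6*browseby=3*sortby=2*media=3*ce_id=1428"
)
print(fragment_to_form_body(url))
```

The returned string could be passed as `body=` in the `Request` above in place of the hard-coded value. Whether other `getpage` values return further results is something to confirm in the browser's network tab.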
Please upvote if this helps.