Unable to scrape Snapdeal data using Scrapy

python html web-scraping scrapy

This is the output I get when I try to scrape Snapdeal:

scrapy shell "https://www.snapdeal.com"

response.text

u'<HTML><HEAD>\n<TITLE>Access Denied</TITLE>\n</HEAD><BODY>\n<H1>Access Denied</H1>\n \nYou don\'t have permission to access "http&#58;&#47;&#47;www&#46;snapdeal&#46;com&#47;" on this server.<P>\nReference&#32;&#35;18&#46;1dd70b17&#46;1514632273&#46;17456300\n</BODY>\n</HTML>\n'

Can anyone help?

If I send a User-Agent header, I get the correct page:

scrapy shell

fetch("https://www.snapdeal.com", headers={'User-Agent': "Mozilla/5.0"})

response.text

Or with a standalone script:

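import scrapy
from scrapy.crawler import CrawlerProcess
# from scrapy.utils.response import open_in_browser  # handy for debugging

class MySpider(scrapy.Spider):

    name = 'myspider'

    start_urls = ['https://www.snapdeal.com/']

    def parse(self, response):
        # Show which URL was actually fetched (after any redirects).
        print('url:', response.url)

        # open_in_browser(response)  # view the downloaded page locally

        # Print the text of every element whose class is exactly "catText".
        for item in response.xpath('//*[@class="catText"]/text()').extract():
            print(item)

# --- runs as a plain script, no Scrapy project needed ---

# Override Scrapy's default User-Agent, which the site rejects.
c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
})
c.crawl(MySpider)
c.start()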
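To check quickly whether a given response is the real page or the block page, the body can be tested for the "Access Denied" title shown in the question. The sketch below is only an illustration: the helper name looks_blocked is hypothetical, and it assumes the block page always carries that exact title, as it does in the output above.

```python
import re


def looks_blocked(html: str) -> bool:
    """Return True if the page title is 'Access Denied' (hypothetical helper)."""
    # Assumption: the block page is recognisable by its <TITLE>, as in the
    # output quoted in the question. Matching is case-insensitive.
    match = re.search(r"<title>(.*?)</title>", html, flags=re.IGNORECASE | re.DOTALL)
    return bool(match) and match.group(1).strip().lower() == "access denied"
```

In the Scrapy shell or inside parse(), this could be called as looks_blocked(response.text).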

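This is scraping protection: they don't want you to scrape the site. You need to use a proxy and a different User-Agent, because scrapy shell sends Scrapy's default User-Agent. You have to replicate the whole browser request and mimic it in Scrapy.

The User-Agent used above is a minimal one, and it mostly works. If you need a real User-Agent string from a browser, visit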
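As a rough sketch of what "replicating the request" might look like, the settings dict passed to CrawlerProcess can carry extra default headers alongside the User-Agent. USER_AGENT and DEFAULT_REQUEST_HEADERS are standard Scrapy settings; the helper name build_settings and the specific header values below are assumptions chosen for illustration, not values taken from a real browser session or known to satisfy this site.

```python
# Assumed header values, loosely modelled on what a desktop browser sends.
BROWSER_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_settings(user_agent="Mozilla/5.0", extra_headers=None):
    """Build a Scrapy settings dict with browser-like headers (hypothetical helper)."""
    headers = dict(BROWSER_HEADERS)       # copy so the module-level dict is not mutated
    headers.update(extra_headers or {})   # caller-supplied headers take precedence
    return {
        "USER_AGENT": user_agent,
        "DEFAULT_REQUEST_HEADERS": headers,
    }
```

The result can be passed to CrawlerProcess in place of the {'USER_AGENT': 'Mozilla/5.0'} literal, for example CrawlerProcess(build_settings()). Headers copied from the browser's network inspector can be supplied through extra_headers.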