Python Scrapy - How to scrape a website when access is denied [Lowes]


I'm trying to build a web scraper for the Lowe's website, and the site does not seem to allow bots.

When I run it in the scrapy shell, I get:
twisted.internet.error.TimeoutError: User timeout caused connection failure:

I then ran this command:
curl -v "https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-ZLINE-24-2-8-cu-ft-Dual-Fuel-Range-with-Gas-Stove-and-Electric-Oven-in-Stainless-Steel-and-Blue-Gloss-Door/5001835677?cm_mmc=shp-_-c-u-prd--app--google--pla--186--soscooking--5001835677--0&placeholder=null&ds_rl=1286981&ds_rl=1286890&gclid=cj0kcqjwgtwgdbhdzarisadekwgo2jvgldgj3y9hjem0ympbhpji08iddk_g1vODT42ZrVZ-kPm5aISYaAuHpEALw_wcB&gclsrc=aw.ds"

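The result was that the site denied me access.

After some research, I found that I might get through if I mimic a "real user", so I tried doing that (by setting a location cookie).

However, this still gave me the same timeout error.

When loading the site, I checked the console:

But I'm not sure what I should be looking for or filtering on. Is there any documentation, or does anyone have tips, on mimicking a real user with Scrapy?


Thanks for your help!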

All I had to do was add a user agent.

1.) Install the rotating user agents package:

pip3 install scrapy-useragents

2.) Add the following to settings.py:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_useragents.downloadermiddlewares.useragents.UserAgentsMiddleware': 500,
}
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENTS = [
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/57.0.2987.110 '
     'Safari/537.36'),  # chrome
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/61.0.3163.79 '
     'Safari/537.36'),  # chrome
    ('Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) '
     'Gecko/20100101 '
     'Firefox/55.0'),  # firefox
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/61.0.3163.91 '
     'Safari/537.36'),  # chrome
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/62.0.3202.89 '
     'Safari/537.36'),  # chrome
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/63.0.3239.108 '
     'Safari/537.36'),  # chrome
]
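
For anyone curious what a rotating user-agent middleware does under the hood, here is a minimal sketch. It is not the scrapy-useragents implementation: the class name RandomUserAgentMiddleware is hypothetical, and picking an agent at random is an assumption (the package may select agents differently). It only relies on the standard Scrapy downloader middleware hooks from_crawler and process_request, and reads the same USER_AGENTS setting shown above.

```python
import random


class RandomUserAgentMiddleware:
    """Hypothetical downloader middleware: sets a random User-Agent on each request."""

    def __init__(self, user_agents):
        # Copy the list so later changes to the settings do not affect us.
        self.user_agents = list(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the middleware from the project settings.
        return cls(crawler.settings.getlist("USER_AGENTS"))

    def process_request(self, request, spider):
        # Leave the request untouched if no agents are configured.
        if self.user_agents:
            request.headers["User-Agent"] = random.choice(self.user_agents)
        # Returning None tells Scrapy to keep processing the request normally.
        return None
```

To try it, you would register the class in DOWNLOADER_MIDDLEWARES in place of the scrapy_useragents entry, using whatever module path your project has (for example a middlewares.py file, which is an assumption here).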

Try using exactly the same headers as your browser.
@david The user agent, right?
Not just the agent, but all the header values and parameters.
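
As a rough sketch of the "all the headers" suggestion: Scrapy's DEFAULT_REQUEST_HEADERS setting in settings.py lets you send extra headers with every request. The values below are illustrative assumptions, not the ones Lowe's expects; copy the real ones from the request your own browser sends (Network tab of the developer tools). Accept-Encoding is left out on purpose so that Scrapy only advertises compression formats it can decode.

```python
# settings.py (sketch): header values are placeholders, replace them with
# the ones your own browser sends for the page you want to scrape.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Upgrade-Insecure-Requests": "1",
}
```

The User-Agent itself is still handled by the user-agent middleware configured above, so it does not need to be repeated here.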