Web scraping AttributeError: 'NoneType' object has no attribute 'css' when trying to scrape old Reddit

I am trying to scrape old Reddit, but every time I get the following error:

>>> response.css('div')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'css'

Am I doing something wrong, or is it not possible to scrape old Reddit?

Here is the log:

[scrapy.core.engine] DEBUG: Crawled (200) <GET https://old.reddit.com/robots.txt> (referer: None)
2020-11-02 14:56:09 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://old.reddit.com/> from <GET http://old.reddit.com>
2020-11-02 14:56:09 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://old.reddit.com/>

Here is my scrapy shell output, in case it helps:

(scrapy_env) rana@rana-desktop:~/Documents/allproject/scrapy_projt/tutorial$
$ scrapy shell https://old.reddit.com/

In [2]: response.status
Out[2]: 200

In [3]: response.css('div')
Out[3]: 
[<Selector xpath='descendant-or-self::div' data='<div class="GoogleAd HomeAds InArticl...'>,
 <Selector xpath='descendant-or-self::div' data='<div id="header" role="banner"><a tab...'>,
 <Selector xpath='descendant-or-self::div' data='<div id="sr-header-area"><div class="...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="width-clip"><div class="d...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="dropdown srdrop" onclick=...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="drop-choices srdrop"><a h...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="sr-list"><ul class="flat-...'>,
 <Selector xpath='descendant-or-self::div' data='<div id="header-bottom-left"><a href=...'>,
 <Selector xpath='descendant-or-self::div' data='<div id="header-bottom-right"><span c...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="side"><div class="spacer"...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="spacer"><form action="htt...'>,
 <Selector xpath='descendant-or-self::div' data='<div id="searchexpando" class="infoba...'>,
 <Selector xpath='descendant-or-self::div' data='<div id="moresearchinfo"><p>use the f...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="spacer"><form method="pos...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="g-recaptcha" data-sitekey...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="status"></div>'>,
 <Selector xpath='descendant-or-self::div' data='<div id="remember-me"><input type="ch...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="submit"><span class="thro...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="clear"></div>'>,
 <Selector xpath='descendant-or-self::div' data='<div class="spacer"><div class="sideb...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="sidebox submit submit-lin...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="morelink"><a href="https:...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="nub"></div>'>,
 <Selector xpath='descendant-or-self::div' data='<div class="spacer"><div class="sideb...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="sidebox submit submit-tex...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="morelink"><a href="https:...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="nub"></div>'>,
 <Selector xpath='descendant-or-self::div' data='<div class="spacer"><a href="/premium...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="premium-banner__logo"></div>'>,
 <Selector xpath='descendant-or-self::div' data='<div class="premium-banner__title">Ge...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="content" role="main"><sec...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="listingsignupbar__cta-con...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="spacer"><style type="text...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="happening-now-wrap"><div ...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="happening-now"><div><p cl...'>,
 <Selector xpath='descendant-or-self::div' data='<div><p class="icon"><img src="//www....'>,
 <Selector xpath='descendant-or-self::div' data='<div class="close-button">x</div>'>,
 <Selector xpath='descendant-or-self::div' data='<div class="spacer"><style>body >.con...'>,
 <Selector xpath='descendant-or-self::div' data='<div id="siteTable" class="sitetable ...'>,
 <Selector xpath='descendant-or-self::div' data='<div class=" thing id-t3_jmlqpj odd  ...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="midcol unvoted"><div clas...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="arrow up login-required a...'>,

 <Selector xpath='descendant-or-self::div' data='<div class="clearleft"></div>'>,
 <Selector xpath='descendant-or-self::div' data='<div class="nav-buttons"><span class=...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="footer-parent"><div by-ze...'>,
 <Selector xpath='descendant-or-self::div' data='<div by-zero class="footer rounded"><...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="col"><ul class="flat-vert...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="col"><ul class="flat-vert...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="col"><ul class="flat-vert...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="col"><ul class="flat-vert...'>]

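You are getting this error because you received an empty response (None), so you are calling the .css() method on a None value. You received None instead of the expected response object because your spider filtered the request.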

You can see this in the following line of your execution log:

2020-11-02 14:56:09 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://old.reddit.com/>
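
To illustrate what a "Forbidden by robots.txt" decision looks like, here is a minimal sketch using Python's standard urllib.robotparser module. The rules in SAMPLE_ROBOTS_TXT and the helper is_allowed are made up for illustration: they are not Reddit's actual robots.txt, and Scrapy's own middleware may evaluate rules with a different parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real old.reddit.com robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # With a blanket "Disallow: /" rule, every path is off limits to the crawler.
    print(is_allowed(SAMPLE_ROBOTS_TXT, "Scrapy", "https://old.reddit.com/"))
```

When a check like this fails, a crawler that obeys robots.txt drops the request before downloading anything, which is consistent with there being no response object to call .css() on.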
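The request was filtered because the project is configured to obey robots.txt and the site's robots.txt disallows this URL. Setting ROBOTSTXT_OBEY = False in settings.py will cause the spider to ignore robots.txt for all requests.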


However, respecting robots.txt rules is considered good practice (arguably even an ethical obligation) in web scraping. For more details, see the robots.txt standard.
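
Coming back to the None response described in this answer, a small guard can turn the confusing AttributeError into a clearer message while experimenting. This is only a sketch: css_or_explain is a hypothetical helper name, not part of Scrapy, and it assumes nothing about the response beyond a .css() method.

```python
def css_or_explain(response, query):
    """Run a CSS query, or raise a descriptive error when there is no response."""
    if response is None:
        # No response object exists, so calling .css() would raise AttributeError.
        raise RuntimeError(
            "No response object: the request was probably filtered "
            "(for example by robots.txt). Check the crawl log."
        )
    return response.css(query)
```

In the shell, css_or_explain(response, 'div') would then point at the log instead of failing with 'NoneType' object has no attribute 'css'.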

I fixed it, thank you:
ROBOTSTXT_OBEY = False