Web scraping AttributeError: 'NoneType' object has no attribute 'css' when trying to scrape old reddit
web-scraping, scrapy, reddit
I'm trying to scrape old reddit, but every time I get the following error:
>>> response.css('div')
Traceback (most recent call last):
File "<console>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'css'
Am I doing something wrong, or is it not possible to scrape old reddit?
Here is the log:
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://old.reddit.com/robots.txt> (referer: None)
2020-11-02 14:56:09 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://old.reddit.com/> from <GET http://old.reddit.com>
2020-11-02 14:56:09 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://old.reddit.com/>
Here is my scrapy shell output, in case it helps:
(scrapy_env) rana@rana-desktop:~/Documents/allproject/scrapy_projt/tutorial$
$ scrapy shell https://old.reddit.com/
In [2]: response.status
Out[2]: 200
In [3]: response.css('div')
Out[3]:
[<Selector xpath='descendant-or-self::div' data='<div class="GoogleAd HomeAds InArticl...'>,
<Selector xpath='descendant-or-self::div' data='<div id="header" role="banner"><a tab...'>,
<Selector xpath='descendant-or-self::div' data='<div id="sr-header-area"><div class="...'>,
<Selector xpath='descendant-or-self::div' data='<div class="width-clip"><div class="d...'>,
<Selector xpath='descendant-or-self::div' data='<div class="dropdown srdrop" onclick=...'>,
<Selector xpath='descendant-or-self::div' data='<div class="drop-choices srdrop"><a h...'>,
<Selector xpath='descendant-or-self::div' data='<div class="sr-list"><ul class="flat-...'>,
<Selector xpath='descendant-or-self::div' data='<div id="header-bottom-left"><a href=...'>,
<Selector xpath='descendant-or-self::div' data='<div id="header-bottom-right"><span c...'>,
<Selector xpath='descendant-or-self::div' data='<div class="side"><div class="spacer"...'>,
<Selector xpath='descendant-or-self::div' data='<div class="spacer"><form action="htt...'>,
<Selector xpath='descendant-or-self::div' data='<div id="searchexpando" class="infoba...'>,
<Selector xpath='descendant-or-self::div' data='<div id="moresearchinfo"><p>use the f...'>,
<Selector xpath='descendant-or-self::div' data='<div class="spacer"><form method="pos...'>,
<Selector xpath='descendant-or-self::div' data='<div class="g-recaptcha" data-sitekey...'>,
<Selector xpath='descendant-or-self::div' data='<div class="status"></div>'>,
<Selector xpath='descendant-or-self::div' data='<div id="remember-me"><input type="ch...'>,
<Selector xpath='descendant-or-self::div' data='<div class="submit"><span class="thro...'>,
<Selector xpath='descendant-or-self::div' data='<div class="clear"></div>'>,
<Selector xpath='descendant-or-self::div' data='<div class="spacer"><div class="sideb...'>,
<Selector xpath='descendant-or-self::div' data='<div class="sidebox submit submit-lin...'>,
<Selector xpath='descendant-or-self::div' data='<div class="morelink"><a href="https:...'>,
<Selector xpath='descendant-or-self::div' data='<div class="nub"></div>'>,
<Selector xpath='descendant-or-self::div' data='<div class="spacer"><div class="sideb...'>,
<Selector xpath='descendant-or-self::div' data='<div class="sidebox submit submit-tex...'>,
<Selector xpath='descendant-or-self::div' data='<div class="morelink"><a href="https:...'>,
<Selector xpath='descendant-or-self::div' data='<div class="nub"></div>'>,
<Selector xpath='descendant-or-self::div' data='<div class="spacer"><a href="/premium...'>,
<Selector xpath='descendant-or-self::div' data='<div class="premium-banner__logo"></div>'>,
<Selector xpath='descendant-or-self::div' data='<div class="premium-banner__title">Ge...'>,
<Selector xpath='descendant-or-self::div' data='<div class="content" role="main"><sec...'>,
<Selector xpath='descendant-or-self::div' data='<div class="listingsignupbar__cta-con...'>,
<Selector xpath='descendant-or-self::div' data='<div class="spacer"><style type="text...'>,
<Selector xpath='descendant-or-self::div' data='<div class="happening-now-wrap"><div ...'>,
<Selector xpath='descendant-or-self::div' data='<div class="happening-now"><div><p cl...'>,
<Selector xpath='descendant-or-self::div' data='<div><p class="icon"><img src="//www....'>,
<Selector xpath='descendant-or-self::div' data='<div class="close-button">x</div>'>,
<Selector xpath='descendant-or-self::div' data='<div class="spacer"><style>body >.con...'>,
<Selector xpath='descendant-or-self::div' data='<div id="siteTable" class="sitetable ...'>,
<Selector xpath='descendant-or-self::div' data='<div class=" thing id-t3_jmlqpj odd ...'>,
<Selector xpath='descendant-or-self::div' data='<div class="midcol unvoted"><div clas...'>,
<Selector xpath='descendant-or-self::div' data='<div class="arrow up login-required a...'>,
<Selector xpath='descendant-or-self::div' data='<div class="clearleft"></div>'>,
<Selector xpath='descendant-or-self::div' data='<div class="nav-buttons"><span class=...'>,
<Selector xpath='descendant-or-self::div' data='<div class="footer-parent"><div by-ze...'>,
<Selector xpath='descendant-or-self::div' data='<div by-zero class="footer rounded"><...'>,
<Selector xpath='descendant-or-self::div' data='<div class="col"><ul class="flat-vert...'>,
<Selector xpath='descendant-or-self::div' data='<div class="col"><ul class="flat-vert...'>,
<Selector xpath='descendant-or-self::div' data='<div class="col"><ul class="flat-vert...'>,
<Selector xpath='descendant-or-self::div' data='<div class="col"><ul class="flat-vert...'>]
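You are getting this error because you received an empty response (None), so you are calling the .css() method on a None value. You received None instead of the expected response object because your spider filtered out the request.
You can see that in this line of the execution log: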
2020-11-02 14:56:09 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://old.reddit.com/>
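To see ahead of time whether a URL would be rejected this way, you can evaluate robots.txt rules yourself. The following is a minimal sketch using Python's standard-library urllib.robotparser, not Scrapy's own robots.txt handling. The rule text in SAMPLE_ROBOTS_TXT, the helper name is_allowed, and the "Scrapy" user-agent string are assumptions made for illustration; they are not the real contents of reddit's robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# It is NOT a copy of any real site's file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "Scrapy" is an assumed user-agent string for this sketch.
    print(is_allowed(SAMPLE_ROBOTS_TXT, "Scrapy", "https://old.reddit.com/"))
```

If a check like this returns False for your start URL, a spider that obeys robots.txt will drop the request before it is downloaded, which is consistent with the "Forbidden by robots.txt" log line.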
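To fix this, set the following in your project's settings.py:
ROBOTSTXT_OBEY = False
This will cause the spider to ignore robots.txt for all requests.
However, respecting robots.txt rules is considered good practice (arguably even the ethical thing to do) in web scraping. See the robots.txt standard for more details.
I fixed it. Thank you.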