Python Scrapy twisted.internet.defer._DefGen_Return: exception
Hey, I'm new to Scrapy and just wrote my first spider, but I keep running into this exception:
> Traceback (most recent call last):
>   File "/home/afraz/anaconda3/lib/python3.6/site-packages/twisted/internet/defer.py", line 1301, in _inlineCallbacks
>     result = g.send(result)
>   File "/home/afraz/anaconda3/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
>     defer.returnValue((yield download_func(request=request,spider=spider)))
>   File "/home/afraz/anaconda3/lib/python3.6/site-packages/twisted/internet/defer.py", line 1278, in returnValue
>     raise _DefGen_Return(val)
> twisted.internet.defer._DefGen_Return: <200 http://www.kmart.com.au/product/plain-crew-tee/855808>
>
> During handling of the above exception, another exception occurred:
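A note on reading this traceback: in Twisted, `defer.returnValue()` hands back a result by raising `_DefGen_Return` (visible in the frames above), which `inlineCallbacks` catches in order to deliver the value. If something raises while that value is still being processed, Python 3 prints `_DefGen_Return` as the context of the new error, so the exception that matters is the one printed after "During handling of the above exception", which the excerpt cuts off. The sketch below reproduces that shape in plain Python. `ReturnSignal`, `download` and `buggy_callback` are hypothetical stand-ins, not Twisted or Scrapy code.

```python
class ReturnSignal(Exception):
    """Hypothetical stand-in for Twisted's internal _DefGen_Return:
    an exception raised only to carry a return value."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def download():
    # Like defer.returnValue(...): hand back the result by raising it.
    raise ReturnSignal("<200 http://www.kmart.com.au/product/plain-crew-tee/855808>")


def buggy_callback(response):
    # Hypothetical spider callback that contains the real bug.
    return response.no_such_attribute


def run():
    try:
        download()
    except ReturnSignal as signal:
        # The value is passed on while still inside the except block,
        # so any error raised here gets the signal as its __context__.
        buggy_callback(signal.value)


def capture():
    """Run the chain and return the exception that actually escaped."""
    try:
        run()
    except Exception as err:
        return err
    return None


err = capture()
print(type(err).__name__, "raised while handling", type(err.__context__).__name__)
```

Left uncaught, `run()` would print `ReturnSignal` first, then "During handling of the above exception, another exception occurred:", then the `AttributeError`: the same layout as the traceback in the question, with the real bug at the bottom.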
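I want to save the output to a JSON file. Please help :)

Can you provide the whole traceback and the crawl log? Which version of Scrapy are you using? (The output of `scrapy version -v` is always useful.) Are you using any middlewares?

Scrapy    : 1.3.3
lxml      : 3.7.2.0
libxml2   : 2.9.4
cssselect : 1.0.1
parsel    : 1.1.0
w3lib     : 1.17.0
Twisted   : 17.1.0
Python    : 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 12:22:00)
pyOpenSSL : 16.2.0 (OpenSSL 1.0.2k  26 Jan 2017)
Platform  : Linux-4.8.0-52-generic-x86_64-with-debian-stretch-sid

What about the full traceback and the Scrapy crawl log?

Stack Overflow won't let me post it here; it's too long.

You could use a pastebin service or a GitHub gist, for example.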
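On the JSON question: the simplest route is Scrapy's built-in feed export, e.g. `scrapy crawl <spider_name> -o items.json`, where `<spider_name>` is whatever the spider's `name` attribute is. If you want to control the file yourself, an item pipeline can do it. The sketch below is a minimal JSON Lines writer using the standard pipeline hooks (`open_spider`, `process_item`, `close_spider`); the class name and the default path `items.jl` are hypothetical, and it would be enabled with something like `ITEM_PIPELINES = {"myproject.pipelines.JsonLinesWriterPipeline": 300}`, where `myproject` is also a placeholder.

```python
import json


class JsonLinesWriterPipeline(object):
    """Hypothetical pipeline: write each scraped item as one JSON object per line."""

    def __init__(self, path="items.jl"):
        # "items.jl" is an assumed default output path.
        self.path = path
        self.file = None

    def open_spider(self, spider):
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # scrapy.Item is dict-like, so dict(item) works as long as the
        # field values themselves are JSON-serialisable.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item
```

One object per line keeps the file valid even if the crawl stops halfway, which is why it is often preferred over a single JSON array for long crawls.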
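import re

import scrapy
from scrapy.linkextractors import LinkExtractor
# KmartItem comes from the project's items.py (import path not shown in the question).


def parse_category(self, response):
    print("inside parse_category with url {}".format(response.url))
    urls_to_extract = LinkExtractor(deny=r"javascript",
                                    restrict_css='div[id="resultcontent"] > div',
                                    ).extract_links(response)
    urls_to_visit = [url_to_extract.url for url_to_extract in urls_to_extract]
    for url in urls_to_visit:
        # yield (not return) so every product page is requested, and pass
        # the bound method self.parse_item rather than the string "parse_item".
        yield scrapy.Request(url, callback=self.parse_item,
                             meta={"parent_url": response.url})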

def parse_item(self, response):
    # scrapy.Item fields are set with item["field"] = value;
    # attribute assignment (item.description = ...) raises AttributeError.
    item = KmartItem()
    item["description"] = self.get_description(response)
    item["url"] = response.url
    item["gender"] = self.get_gender(response)
    item["name"] = response.css('h1[itemprop="name"] ::text').extract_first()
    item["category"] = self.get_category(response)
    item["image_urls"] = response.css(
        'img[id="productMainImage"] ::attr(src)').extract()
    item["product_id"] = response.css('span[class="sku"] ::text').extract_first()
    yield item

def get_gender(self, response):
    parent_url = response.meta["parent_url"]
    category_substr = re.compile("category.*").findall(parent_url)
    gender = category_substr[0].split('/')[1]
    # "is not" compares object identity, not string content; use a membership test.
    if gender not in ("men", "women"):
        gender = ("boy" if "boy" in parent_url else
                  "girl" if "girl" in parent_url else
                  "unisex children")
    return gender
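Using `return` inside the `for` loop of `parse_category` ends the method on the first URL, so only one request ever reaches the scheduler. Scrapy accepts either a single request or an iterable of requests from a callback, so turning the method into a generator with `yield` emits all of them. The sketch below shows the difference in plain Python; the strings stand in for `scrapy.Request` objects and the URLs are made up.

```python
def requests_with_return(urls):
    # Same shape as a return inside the loop: the function exits on the first pass.
    for url in urls:
        return "Request(%s)" % url


def requests_with_yield(urls):
    # A generator: each yield emits one request and the loop carries on.
    for url in urls:
        yield "Request(%s)" % url


urls = ["https://example.com/a", "https://example.com/b"]
print(requests_with_return(urls))
print(list(requests_with_yield(urls)))
```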
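Comparing strings with `is not` checks whether they are the same object, not whether they have the same text; in CPython a string produced by `split()` is usually a different object from the literal `"men"`, so the check can misfire. The gender logic of `get_gender` can be pulled out into a plain function and tested without running a crawl. This is a sketch under assumptions: `gender_from_parent_url` is a hypothetical name, it takes the path segment after `category/` (what `split('/')[1]` picks out in the spider), and the example URLs are invented.

```python
import re


def gender_from_parent_url(parent_url):
    """Hypothetical helper mirroring get_gender, using a membership test."""
    match = re.search(r"category/([^/]+)", parent_url)
    # Unlike the spider version, a URL without "category/" falls through
    # to the fallback instead of raising IndexError.
    segment = match.group(1) if match else ""
    if segment in ("men", "women"):
        return segment
    if "boy" in parent_url:
        return "boy"
    if "girl" in parent_url:
        return "girl"
    return "unisex children"


print(gender_from_parent_url("http://www.kmart.com.au/category/kids/boys-clothing"))
```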