Python Django dynamic scraper cannot scrape data
I'm new to django-dynamic-scraper and have been learning from an example. I have set everything up, but I keep getting the same error:
dynamic_scraper.models.DoesNotExist: RequestPageType matching query does not exist.
2015-11-20 18:45:11+0000 [article_spider] ERROR: Spider error processing <GET https://en.wikinews.org/wiki/Main_page>
Traceback (most recent call last):
File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/Twisted-15.4.0-py2.7-linux-x86_64.egg/twisted/internet/base.py", line 825, in runUntilCurrent
call.func(*call.args, **call.kw)
File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/Twisted-15.4.0-py2.7-linux-x86_64.egg/twisted/internet/task.py", line 645, in _tick
taskObj._oneWorkUnit()
File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/Twisted-15.4.0-py2.7-linux-x86_64.egg/twisted/internet/task.py", line 491, in _oneWorkUnit
result = next(self._iterator)
File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/utils/defer.py", line 57, in <genexpr>
work = (callable(elem, *args, **named) for elem in iterable)
--- <exception caught here> ---
File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/utils/defer.py", line 96, in iter_errback
yield next(it)
File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/offsite.py", line 26, in process_spider_output
for x in result:
File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/referer.py", line 22, in <genexpr>
return (_set_referer(r) for r in result or ())
File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/urllength.py", line 33, in <genexpr>
return (r for r in result or () if _filter(r))
File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/depth.py", line 50, in <genexpr>
return (r for r in result or () if _filter(r))
File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/dynamic_scraper/spiders/django_spider.py", line 378, in parse
rpt = self.scraper.get_rpt_for_scraped_obj_attr(url_elem.scraped_obj_attr)
File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/dynamic_scraper/models.py", line 98, in get_rpt_for_scraped_obj_attr
return self.requestpagetype_set.get(scraped_obj_attr=soa)
File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/Django-1.8.5-py2.7.egg/django/db/models/manager.py", line 127, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/Django-1.8.5-py2.7.egg/django/db/models/query.py", line 334, in get
self.model._meta.object_name
dynamic_scraper.models.DoesNotExist: RequestPageType matching query does not exist.
This is caused by missing "Request Page Types".
Each "Scraper Elem" must have its own "Request Page Type".
To fix this, add a "Request Page Type" for every "Scraper Elem" of the scraper, then run the spider again:
scrapy crawl article_spider -a id=1 -a do_action=yes
You should now be able to scrape the "Articles".
You can check the results under Home › Open_news › Articles.
Enjoy~
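To make the failure mode concrete, here is a minimal plain-Python sketch of the lookup that fails in the traceback (`requestpagetype_set.get(scraped_obj_attr=soa)`). It is a toy model, not the django-dynamic-scraper API: the `DoesNotExist` class, both helper functions, the dictionary layout, and the attribute names `base`, `title`, `description`, and `url` are assumptions made up for illustration. It also ignores the case where more than one entry matches.

```python
# Toy model of the failing lookup; plain Python, NOT the real
# django-dynamic-scraper models. All names here are hypothetical.

class DoesNotExist(Exception):
    """Stand-in for the DoesNotExist exception Django raises from .get()."""


def get_rpt_for_attr(request_page_types, attr_name):
    """Return the request page type registered for attr_name, or raise."""
    matches = [rpt for rpt in request_page_types if rpt["attr"] == attr_name]
    if not matches:
        raise DoesNotExist("RequestPageType matching query does not exist.")
    return matches[0]


def missing_request_page_types(scraper_elems, request_page_types):
    """List scraper element attributes that have no request page type."""
    covered = {rpt["attr"] for rpt in request_page_types}
    return [attr for attr in scraper_elems if attr not in covered]


# Assumed setup: four scraper elements, but only one request page type.
scraper_elems = ["base", "title", "description", "url"]
request_page_types = [{"attr": "base", "page_type": "MP"}]

print(missing_request_page_types(scraper_elems, request_page_types))
```

Any attribute this prints is one for which the lookup would raise, so in this model the crawl only succeeds once the list is empty.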