Scrapy spider can't get results from the first page

This is my spider:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from vrisko.items import VriskoItem

class vriskoSpider(CrawlSpider):
    name = 'vrisko'
    allowed_domains = ['vrisko.gr']
    start_urls = ['http://www.vrisko.gr/search/%CE%B3%CE%B9%CE%B1%CF%84%CF%81%CE%BF%CF%82/%CE%BA%CE%BF%CF%81%CE%B4%CE%B5%CE%BB%CE%B9%CE%BF']
    rules = (
        # Follow pagination links of the form ?page=N
        Rule(SgmlLinkExtractor(allow=('\?page=\d')), callback='parse_vrisko'),
    )

    def parse_vrisko(self, response):
        hxs = HtmlXPathSelector(response)
        vriskoit = VriskoItem()
        vriskoit['eponimia'] = hxs.select("//a[@itemprop='name']/text()").extract()
        vriskoit['address'] = hxs.select("//div[@class='results_address_class']/text()").extract()
        print ' '.join(vriskoit['eponimia']), ' '.join(vriskoit['address'])
        return vriskoit
All the pages I am trying to crawl have the same format, ending in ?page=x, where x is any integer.

My problem is that my spider crawls every page except the first one! Any idea why this happens?


Thanks in advance!

If you look at the Scrapy docs, the responses for start_urls go to the parse method.

So you can change your rule like this:

rules = (
    Rule(SgmlLinkExtractor(allow=('\?page=\d')), callback='parse'),
)

and rename the method from

def parse_vrisko(self, response):

to

def parse(self, response):


Alternatively, you can remove start_urls, start your spider via

def start_requests(self):

and set the callback to parse_vrisko.
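
A minimal sketch of what that alternative could look like, assuming the same old-style Scrapy imports as the spider above (scrapy.http.Request is the standard request class):

from scrapy.http import Request

class vriskoSpider(CrawlSpider):
    name = 'vrisko'
    allowed_domains = ['vrisko.gr']
    rules = (
        Rule(SgmlLinkExtractor(allow=('\?page=\d')), callback='parse_vrisko'),
    )

    def start_requests(self):
        # Send the first page directly to parse_vrisko instead of the
        # built-in parse. Caveat: CrawlSpider only applies its rules to
        # responses that go through parse, so pagination links found on
        # this first response would not be followed by the Rule.
        yield Request('http://www.vrisko.gr/search/%CE%B3%CE%B9%CE%B1%CF%84%CF%81%CE%BF%CF%82/%CE%BA%CE%BF%CF%81%CE%B4%CE%B5%CE%BB%CE%B9%CE%BF',
                      callback=self.parse_vrisko)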

Thanks a lot, but I found the answer: I replaced parse_vrisko with parse_start_url.
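
For reference, a minimal sketch of that fix, reusing the imports from the spider above: CrawlSpider calls parse_start_url on every response from start_urls, so renaming the callback to parse_start_url (and pointing the Rule at it) scrapes the first page as well as the ?page=N pages.

class vriskoSpider(CrawlSpider):
    name = 'vrisko'
    allowed_domains = ['vrisko.gr']
    start_urls = ['http://www.vrisko.gr/search/%CE%B3%CE%B9%CE%B1%CF%84%CF%81%CE%BF%CF%82/%CE%BA%CE%BF%CF%81%CE%B4%CE%B5%CE%BB%CE%B9%CE%BF']
    rules = (
        Rule(SgmlLinkExtractor(allow=('\?page=\d')), callback='parse_start_url'),
    )

    def parse_start_url(self, response):
        # CrawlSpider invokes this hook for each start_urls response,
        # and the Rule above reuses it for the paginated pages.
        hxs = HtmlXPathSelector(response)
        vriskoit = VriskoItem()
        vriskoit['eponimia'] = hxs.select("//a[@itemprop='name']/text()").extract()
        vriskoit['address'] = hxs.select("//div[@class='results_address_class']/text()").extract()
        return vriskoit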