Python 2.7: How do I use two yields in Python Scrapy?


I am using Scrapy with Python 2.7. I need two yields in my class: the first to scrape the subcategories, the second for pagination. I need something like this:

class myClass(BaseSpider):
    cmt = 0

    def __init__(self, *args, **kwargs):
        super(myClass, self).__init__(*args, **kwargs)

    def start_requests(self):
        start_urls = 'https://www.adresse.fr'
        yield Request(start_urls, callback=self.firstDef)

    def firstDef(self, response):
        body = response.css('body').extract_first()
        # put the body in a file
        if self.cmt > 10:
            url = 'https://www.theOtherAdresse.com'
            yield Request(url, callback=self.secondDef)
        print self.cmt
        self.cmt = self.cmt + 1
        yield Request(response.url, callback=self.firstDef)

    def secondDef(self, response):
        body = response.css('body').extract_first()
        # put the body in a file
        print "Finish"

What is wrong with my code? Why can't I have two yields?

Update:

I read about CrawlSpider and tried it, but I still cannot get secondDef called.

Update

My code:

class Myclass(CrawlSpider):
    reload(sys)
    pageNumber = 0
    cmt = 0
    sys.setdefaultencoding('utf8')
    name = 'myclass'
    allowed_domains = ["amazon.fr"]
    firstPage = True
    rules = [
        Rule(LinkExtractor(restrict_xpaths=('//div[@id="mainResults"]//h3[@class="newaps"]/a',)),
             callback='parse_page1', follow=True),
        Rule(LinkExtractor(restrict_xpaths=('//div[@id="bottomBar"]/div[@id="pagn"]/span[@class="pagnLink"]/a',)),
             follow=True),
        Rule(LinkExtractor(restrict_xpaths=(
            '//div[@class="s-item-container"]//a[@class="a-link-normal s-access-detail-page a-text-normal"]',)),
            callback='parse_page1', follow=True),
    ]
    arrayCategories = []
    pageCrawled = []
    fileNumbers = 0
    first = 0
    start_urls = ['https://www.amazon.fr/s/ref=sr_nr_p_6_0?fst=as%3Aoff&rh=n%3A197861031%2Cn%3A!197862031%2Cn%3A212130031%2Cn%3A3008171031%2Cp_76%3A211708031%2Cp_6%3AA1X6FK5RDHNB96&bbn=3008171031&ie=UTF8&qid=1463074601&rnid=211045031'
                    ,'https://www.amazon.fr/s/ref=sr_nr_p_6_0?fst=as%3Aoff&rh=n%3A197861031%2Cn%3A!197862031%2Cn%3A212130031%2Cn%3A3008171031%2Cp_76%3A211708031%2Cp_6%3AA1X6FK5RDHNB96&bbn=3008171031&ie=UTF8&qid=1463074601&rnid=211045031',
                    'https://www.amazon.fr/s/ref=sr_nr_n_1/275-0316831-3563928?fst=as%3Aoff&rh=n%3A197861031%2Cn%3A%21197862031%2Cn%3A212130031%2Cn%3A3008171031%2Cp_76%3A211708031%2Cp_6%3AA1X6FK5RDHNB96%2Cn%3A212136031&bbn=3008171031&ie=UTF8&qid=1463075247&rnid=3008171031',
                    ]
    def __init__(self, idcrawl=None, iddrive=None, idrobot=None, proxy=None, *args, **kwargs):
        super(Myclass, self).__init__(*args, **kwargs)

    def start_requests(self):
        for i in range (0, len(self.start_urls)):
            yield Request(self.start_urls[i], callback=self.parse)

    def parse(self, response):
        yield Request(response.url, callback = self.parse_produit)
        hxs = HtmlXPathSelector(response)

        try:
            nextPageLink = hxs.select("//a[@id='pagnNextLink']/@href").extract()[0]
            nextPageLink = urlparse.urljoin(response.url, nextPageLink)
            self.log('\nGoing to next search page: ' + nextPageLink + '\n', log.DEBUG)
            yield Request(nextPageLink, callback=self.parse)
        except:
            self.log('Whole category parsed: ', log.DEBUG)

    def parse_produit(self,response):
        print self.pageNumber
        body = response.css('body').extract_first()
        hxs = HtmlXPathSelector(response)
        body = response.css('body').extract_first()
        f = io.open('./amazon/page%s' % str(self.pageNumber), 'w+', encoding='utf-8')
        f.write(body)
        f.close()
        self.pageNumber = self.pageNumber + 1

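I don't think having two yields is your problem. I think it is the

if self.cmt>10

statement, which is why I asked whether you have ever seen self.cmt reach a value greater than 10. Here is a quick demonstration of using two yields in one method: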

def example():
    for i in range(1,10):
        yield i
        yield i * i


for e in example():
    print e

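This is its output, one value per line (first values shown): 1, 1, 2, 4, 3, 9, 4, 16, and so on up to 9, 81, which is exactly what you would expect.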
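To tie the demonstration back to the `if self.cmt > 10` guard in the question, here is a minimal sketch in plain Python, without Scrapy. The `callback` function and the strings it yields are hypothetical stand-ins for `firstDef` and its Request objects. It only illustrates that a guarded yield is skipped while the condition is false, and that the unconditional yield still runs.

```python
from __future__ import print_function


def callback(counter):
    """Mimic firstDef: an optional first yield, then an unconditional one."""
    if counter > 10:
        # Only reached once the counter has passed the threshold.
        yield "request for secondDef"
    yield "request for firstDef"


below_threshold = list(callback(0))
above_threshold = list(callback(11))
print(below_threshold)
print(above_threshold)
```

So both yields work. Whether the first one fires depends entirely on the counter getting past 10.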

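Another possibility is that Scrapy has a duplicate URL filter. If you add
, dont_filter=True
to the Request, the filter is disabled for that request. See the documentation.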
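To see why a duplicate filter could keep the counter from ever passing 10, here is a rough simulation in plain Python. It is not Scrapy code: `crawl`, `first_def`, and `max_calls` are hypothetical names, and the `seen` set is only a simplified stand-in for the idea of a duplicate filter. The callback re-requests the URL it was just given, as `firstDef` does with `response.url`.

```python
from __future__ import print_function


def crawl(start_url, callback, dont_filter=False, max_calls=15):
    """Toy scheduler: run `callback` for each queued URL, dropping repeats.

    This only imitates the idea of a duplicate filter; it is not Scrapy code.
    """
    seen = set()
    queue = [(start_url, False)]
    calls = 0
    while queue and calls < max_calls:
        url, skip_filter = queue.pop(0)
        if not skip_filter and url in seen:
            continue  # duplicate URL: silently dropped
        seen.add(url)
        calls += 1
        for next_url in callback(url):
            queue.append((next_url, dont_filter))
    return calls


def first_def(url):
    # Like firstDef in the question: re-request the page just visited.
    yield url


calls_filtered = crawl("https://www.adresse.fr", first_def)
calls_unfiltered = crawl("https://www.adresse.fr", first_def, dont_filter=True)
print(calls_filtered, calls_unfiltered)
```

In this model the callback runs once when filtering is on, and keeps running until the `max_calls` cap when it is off. If the real spider behaves the same way, `self.cmt` would stay at 1 and the request for `secondDef` would never be yielded.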
Finally, inherit your spider from scrapy.Spider:

class myClass(scrapy.Spider):

Update: do you have any evidence that firstDef() is called more than once? It does not look like it is.
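What error did you get?

Isn't BaseSpider deprecated?

@eLRuLL I don't get an error, but I cannot call secondDef.

You mean you never get "Finish"?

@Rafaelalmeda No, I tried CrawlSpider too.

@parik Did you ever solve your problem?

What I need to know is how to use two yields with two callbacks in two different defs. Maybe my example is not a good fit.

The downvote was not mine. Your answer doesn't help me; I think my problem is with the parse method in Scrapy. Thanks for your answer anyway.

No worries, Parik. I just wanted to show that two yields in one routine should work without any problem, and then move on to diagnosing what is happening in the code.

Ah, OK.

@Steve I found your example useful.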