Python Scrapy: how to print results from yield immediately

python web-scraping scrapy

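The function parse scrapes the links from the first page. The function parse_product scrapes the details from the next page and checks whether there is a third page to scrape. The function parse_finalreport scrapes the details from the third page. The problem I am facing is that the output of the third function, parse_finalreport, is printed all together at the end. I want a result like this: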

dfgcontactperson
year
abstract

dfgcontactperson
empty
empty

dfgcontactperson
year
abstract

dfgcontactperson

dfgcontactperson
empty
empty

dfgcontactperson

year
abstract
year
abstract

But the result I get looks like this:

dfgcontactperson
year
abstract

dfgcontactperson
empty
empty

dfgcontactperson
year
abstract

dfgcontactperson

dfgcontactperson
empty
empty

dfgcontactperson

year
abstract
year
abstract

My code:

def parse(self, response):
    # First page: collect the project links from each result entry.
    for row in response.xpath('//div[contains(@class,"eintrag")]'):
        link = row.xpath('.//h2/a/@href').extract()
        link = ['https://gepris.dfg.de' + item + '?language=en' for item in link]
        for p in link:
            yield scrapy.Request(p, callback=self.parse_product)

def parse_product(self, response):
    # Second page: print the contact person, then look for a final-report tab.
    dfgcontactperson = response.xpath('//div[@class="dfg_contact"]/span[@class="value"]/span/a/text()').extract()
    print(dfgcontactperson)
    finalreport = response.xpath('//ul[@class="tab1"]/li[@id="tabbutton2"]/a/@href').extract()
    finalreport = ['https://gepris.dfg.de' + item + '?language=en' for item in finalreport]
    if not finalreport:
        print('empty')
        print('empty')
    for x in finalreport:
        yield scrapy.Request(x, callback=self.parse_finalreport)

def parse_finalreport(self, response):
    # Third page: print the final report year and abstract.
    year = response.xpath('//div[@id="projektbeschreibung"]//span[contains(text(),"Final Report Year")]/following-sibling::span//text()').extract()
    abstract = response.xpath('//div[@id="projektbeschreibung"]/h4[contains(text(),"Abstract")]/following-sibling::p/text()').extract()
    print(year)
    print(abstract)

Why do you want to print it in the first place?

I want to pass the output to my C# app.

Have you considered using ...?