Python: Converting lxml to a Scrapy XXS selector

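How do I convert this pure-Python lxml code to Scrapy's built-in XXS selector (XmlXPathSelector)? The version below works, but I would like to rewrite it using the Scrapy XXS selector.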

    import urlparse

    import lxml.etree
    from scrapy.http import Request

    def parse_device_list(self, response):
        self.log("\n\n\n List of devices \n\n\n")
        self.log('Hi, this is the parse_device_list page! %s' % response.url)
        root = lxml.etree.fromstring(response.body)
        for row in root.xpath('//row'):
            allcells = row.xpath('./cell')
            # the first cell contains the link to follow
            detail_page_link = allcells[0].get("href")
            yield Request(urlparse.urljoin(response.url, detail_page_link), callback=self.parse_page)

Try this:

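    import urlparse
    from scrapy.http import Request
    from scrapy.selector import XmlXPathSelector

    def parse_device_list(self, response):
        xxs = XmlXPathSelector(response)
        for row in xxs.select('//row'):
            # href of the row's first <cell> child, mirroring allcells[0] in the question
            detail_page_link = row.select('./cell[1]/@href')[0].extract()
            yield Request(urlparse.urljoin(response.url, detail_page_link), callback=self.parse_page)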

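This seems to work, but how do I get it to iterate in order? For some reason it iterates over column A, but not in the right order. When a row in column A is empty, it picks up the column B link instead. Can I make my method take only column A, so that if column A is null it skips that row and moves on to the next row of column A, without ever taking column B?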
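
One possible way to get the behaviour asked for here is sketched below. It is not taken from the thread: the sample XML, the `first_column_links` helper and the `example.com` URL are all hypothetical, and the standard library's `xml.etree.ElementTree` with Python 3's `urllib.parse` stands in for lxml/Scrapy so the snippet runs on its own. The idea is to look only at the first `<cell>` of each row and skip the row when that cell has no `href`, rather than letting a later cell take its place. If the real feed omits the column A cell entirely instead of leaving it empty, the first `<cell>` would be column B and a different test would be needed. Rows are visited in document order here; if the ordering problem concerns when the detail pages are processed, that may instead come from Scrapy scheduling requests concurrently.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical sample data; the real feed's structure is not shown in the thread.
SAMPLE_XML = """
<rows>
  <row><cell href="/device/1">A1</cell><cell href="/other/1">B1</cell></row>
  <row><cell>A2 (no link)</cell><cell href="/other/2">B2</cell></row>
  <row><cell href="/device/3">A3</cell><cell href="/other/3">B3</cell></row>
</rows>
"""


def first_column_links(xml_text, base_url):
    """Yield absolute links from the first cell of each row, in document order."""
    root = ET.fromstring(xml_text)
    for row in root.iter("row"):
        first_cell = row.find("cell")  # first <cell> child only, or None
        if first_cell is None:
            continue  # row has no cells at all
        href = first_cell.get("href")
        if not href:
            continue  # column A has no link: skip the row, never use column B
        yield urljoin(base_url, href)


links = list(first_column_links(SAMPLE_XML, "http://example.com/list"))
print(links)
# ['http://example.com/device/1', 'http://example.com/device/3']
```

In a spider, the same check would sit inside the `for row in ...` loop, with a `continue` before the `yield Request(...)` whenever the first cell yields no `href`.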