Python: How can I use Scrapy to parse two different parts of the page source and merge the results?


I have two spiders that I currently run to scrape a single page. The spiders are shown under Header and Details below. I set it up this way because I don't know how to write the start of the query (the variable named `listings` in this example) so that, in a single pass, I can first scrape `//div[@class='patio-head']` and then `//div[@class='patio-details']`. Can anyone help me? I want to return the `Name` for each URL together with all of its corresponding details on one row. Thanks.
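For what it's worth, even with two separate spiders, the two result sets could in principle be merged afterwards by URL. A minimal sketch in plain Python (the row data and function name here are purely illustrative, not part of Scrapy):

```python
# Illustrative only: merge records produced by two separate spiders, keyed by URL.
# header_rows / detail_rows stand in for the items each spider returned.
header_rows = [
    {"url": "http://patios.blogto.com/patio/3-speed/", "Name": "3 Speed"},
]
detail_rows = [
    {"url": "http://patios.blogto.com/patio/3-speed/", "Type": "Restaurant", "Capacity": "40"},
]

def merge_by_url(headers, details):
    # Index the detail rows by URL, then attach them to the matching header row.
    by_url = {row["url"]: row for row in details}
    merged = []
    for row in headers:
        combined = dict(row)                         # copy the header fields
        combined.update(by_url.get(row["url"], {}))  # add detail fields, if any
        merged.append(combined)
    return merged

print(merge_by_url(header_rows, detail_rows))
```

The simpler route, as the answer below explains, is to avoid the merge entirely by scraping both sections in one parse.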

Header

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from PatioDetail.items import PatioItem

class MySpider(BaseSpider):
    name = "PDSHeader"
    allowed_domains = ["patios.blogto.com"]
    start_urls = ["http://patios.blogto.com/patio/25-liberty-toronto/",
                  "http://patios.blogto.com/patio/3030-dundas-west-toronto/",
                  "http://patios.blogto.com/patio/3-speed/",
                  "http://patios.blogto.com//patio/7numbers/"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        listings = hxs.select("//div[@class='patio-head']")
        items = []
        for listing in listings:
            item = PatioItem()
            item["Name"] = listing.select("div[@class='patio-head-details']/div[@class='patio-name']/h2[@class='name']/text()").extract()
            items.append(item)
        return items
Details

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from PatioDetail.items import PatioItem

class MySpider(BaseSpider):
    name = "PDSDetails"
    allowed_domains = ["patios.blogto.com"]
    start_urls = ["http://patios.blogto.com/patio/25-liberty-toronto/",
                  "http://patios.blogto.com/patio/3030-dundas-west-toronto/",
                  "http://patios.blogto.com/patio/3-speed/",
                  "http://patios.blogto.com//patio/7numbers/"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        listings = hxs.select("//div[@class='patio-details']")
        items = []
        for listing in listings:
            item = PatioItem()
            item["Type"] = listing.select("ul[@class='detail-lister']/li[@class='type-icon']/div[@class='detail-line']/span[@class='detail-desc']/text()").extract()
            item["Covered"] = listing.select("ul[@class='detail-lister']/li[@class='covered-icon']/div[@class='detail-line']/span[@class='detail-desc']/text()").extract()
            item["Heated"] = listing.select("ul[@class='detail-lister']/li[@class='heated-icon']/div[@class='detail-line']/span[@class='detail-desc']/text()").extract()
            item["Capacity"] = listing.select("ul[@class='detail-lister']/li[@class='capacity-icon last']/div[@class='detail-line']/span[@class='detail-desc']/text()").extract()
            items.append(item)
        return items

The two sections you want are on the same page. The only thing you need to do is fetch the page once and parse it for the data from both sections, instead of fetching and parsing it twice.
Before writing a spider, it is worth spending some time analyzing the structure of the page you want to scrape.

A code example would look like this:

def parse(self, response):
    hxs = HtmlXPathSelector(response)

    item = PatioItem()
    item['Name'] = hxs.select("//div[@class='patio-name']/h2/text()").extract()[0]
    node_type = hxs.select("//ul[@class='detail-lister']/li[@class='type-icon']")
    item['Type'] = node_type.select(".//span[@class='detail-desc']/text()").extract()[0]
    node_covered = hxs.select("//ul[@class='detail-lister']/li[@class='covered-icon']")
    item['Covered'] = node_covered.select(".//span[@class='detail-desc']/text()").extract()[0]
    node_heated = hxs.select("//ul[@class='detail-lister']/li[@class='heated-icon']")
    item['Heated'] = node_heated.select(".//span[@class='detail-desc']/text()").extract()[0]
    node_capacity = hxs.select("//ul[@class='detail-lister']/li[@class='capacity-icon last']")
    item['Capacity'] = node_capacity.select(".//span[@class='detail-desc']/text()").extract()[0]

    return [item,]
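One caveat with the `extract()[0]` pattern above: it raises `IndexError` whenever an XPath matches nothing on a page. A small guard in the same spirit as the `extract_first()` method that later Scrapy selector versions provide (the helper name here is my own, not a Scrapy API):

```python
def first_or_default(extracted, default=""):
    # `extracted` is the list returned by .extract(); return its first
    # element, or the default when the selector matched nothing.
    return extracted[0] if extracted else default

# Usage inside parse(), e.g.:
#   item['Heated'] = first_or_default(node_heated.select(".//span[@class='detail-desc']/text()").extract())
```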

There is also a tutorial on XPath that should help you :)

I got the first part of the code working, but could you explain how you would use a temporary dict to store the data before passing it to the item? I am new to both Python and Scrapy, so I am still learning the basics. Thanks for the code example and the link to the XPath tutorial. I had actually read that tutorial before, but I am struggling to go from its simple examples to more complex real-world problems. When I try to drop in the code, I get the following error:

File "PatioDetail\spiders\Details.py", line 17, in parse
    item['Type'] = node_type.xpath(".//span[@class='detail-desc']/text()").extract()[0]
exceptions.AttributeError: 'XPathSelectorList' object has no attribute 'xpath'

Any idea what I am doing wrong? @JillAtkins, sorry, I mixed up the selectors in scrapy and lxml. I have corrected my mistake. Thank you very much. Works perfectly. Thanks a lot.