Python Scrapy: Traversing the document

python xpath web-scraping scrapy

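Here is a mock-up of part of the document I'm working with. What I want to do is first locate the Time and Cost elements, and then get their respective values from them. I've tried various axis selectors, but none of them worked. I don't want to target the time and cost values directly; I need to find them through their relationship to the associated h4.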

<ul class="events">
  <li id="event-123456" class=eventItem>
    <div class="details">                
      <h4>Time</h4>
      <div>
        <p>17:00</p>
      </div>
      <h4>Cost</h4>
      <div>
        <p>10.00</p>
      </div>
    </div>
  </li>
  <li id="event-678901" class=eventItem>
    <div class="details">                
      <h4>Time</h4>
      <div>
        <p>21:00</p>
      </div>
      <h4>Cost</h4>
      <div>
        <p>20.00</p>
      </div>
    </div>
  </li>
</ul>
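A minimal, standard-library-only sketch of the relationship the question describes: for each event, find every h4 label and take the first div that follows it as that label's value, which is what following-sibling::div[1] expresses in XPath. It assumes the markup is well-formed XML (so class="eventItem" is quoted in the sample), and SAMPLE, label_values and parse_events are hypothetical names. Inside Scrapy you would express the same walk with XPath selectors rather than a manual loop.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the question's markup, with class="eventItem" quoted
# so that it parses as well-formed XML.
SAMPLE = """<ul class="events">
  <li id="event-123456" class="eventItem">
    <div class="details">
      <h4>Time</h4>
      <div><p>17:00</p></div>
      <h4>Cost</h4>
      <div><p>10.00</p></div>
    </div>
  </li>
  <li id="event-678901" class="eventItem">
    <div class="details">
      <h4>Time</h4>
      <div><p>21:00</p></div>
      <h4>Cost</h4>
      <div><p>20.00</p></div>
    </div>
  </li>
</ul>"""


def label_values(details):
    """Map each <h4> label to the <p> text of the first <div> after it.

    This mirrors the XPath step h4/following-sibling::div[1]/p/text().
    """
    result = {}
    children = list(details)
    for i, child in enumerate(children):
        if child.tag != "h4":
            continue
        # Scan forward for the nearest following sibling that is a <div>.
        for sibling in children[i + 1:]:
            if sibling.tag == "div":
                p = sibling.find("p")
                label = (child.text or "").strip()
                result[label] = (p.text or "").strip() if p is not None else None
                break
    return result


def parse_events(xml_text):
    """Yield one dict per <li class="eventItem">, keyed by its <h4> labels."""
    root = ET.fromstring(xml_text)
    for li in root.findall("li[@class='eventItem']"):
        details = li.find("div[@class='details']")
        if details is not None:
            yield {"id": li.get("id"), **label_values(details)}


for event in parse_events(SAMPLE):
    print(event)
```

Because the keys come from the h4 text, the code never names Time or Cost directly; each yielded dict looks like {'id': 'event-123456', 'Time': '17:00', 'Cost': '10.00'}.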

This should help:

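# Inside a spider callback such as parse(self, response)
events = response.xpath('//ul[@class = "events"]/li')
for event in events:
    item = MyItem()  # assumed: your own scrapy.Item subclass with cost/time fields

    # Find each <h4> by its label, then step to the <div> that follows it.
    item['cost'] = event.xpath(".//h4[. = 'Cost']/following-sibling::div/p/text()").extract_first()
    item['time'] = event.xpath(".//h4[. = 'Time']/following-sibling::div/p/text()").extract_first()

    yield item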

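This isn't quite right.

event.xpath(".//h4[. = 'Time']/following-sibling::div/p/text()")

will return a SelectorList with two selectors, not one, because both the Time div and the Cost div are following siblings of the Time h4. Unless you combine it with .extract_first() (which is probably what you meant), you need to use .//h4[. = 'Time']/following-sibling::div[1]/p/text()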
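To see why the unindexed expression matches twice, here is a small standard-library emulation (not Scrapy's API; DETAILS and following_div_texts are hypothetical names) that collects the p text of every div following a given h4 label, mirroring h4[. = label]/following-sibling::div/p/text(). It assumes one event's details block written as well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical standalone sample: one event's <div class="details"> block.
DETAILS = """<div class="details">
  <h4>Time</h4>
  <div><p>17:00</p></div>
  <h4>Cost</h4>
  <div><p>10.00</p></div>
</div>"""


def following_div_texts(details, label):
    """Emulate h4[. = label]/following-sibling::div/p/text().

    Returns the <p> text of EVERY <div> after the matching <h4>, in document order.
    """
    children = list(details)
    for i, child in enumerate(children):
        if child.tag == "h4" and (child.text or "").strip() == label:
            return [
                p.text
                for sibling in children[i + 1:]
                if sibling.tag == "div"
                for p in sibling.findall("p")
            ]
    return []


details = ET.fromstring(DETAILS)

time_all = following_div_texts(details, "Time")  # both divs follow the Time <h4>
cost_all = following_div_texts(details, "Cost")  # only one div follows the Cost <h4>
print(time_all, cost_all)

# Taking the first match is what .extract_first() does; div[1] instead restricts
# the XPath itself to the nearest following <div>.
print(time_all[0], cost_all[0])
```

Time yields ['17:00', '10.00'] while Cost yields ['10.00'], so taking the first result gives the right value in both cases, and the [1] predicate makes the expression return only the nearest div without relying on extract_first().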