Python Scrapy tutorial: XPath code scrapes the table multiple times

python xpath web-scraping scrapy

I am working through the Scrapy documentation tutorial and want to scrape sample data from the following site:

After running Scrapy's view command, I get the following HTML for the table I am trying to scrape. The page consists of one table per entry:

  <table class="novip">
    <tr class="novip">
      <td class="novip-portrait-picture"
        rowspan="5">
        <a class="novip-portrait-picture"
          href="/medecin/baumberger-hans-rudolf-aarau-5000-medecin.html">
          <img class="novip-portrait-picture"
            src="/customer_controlled/pictures/65903/portrait/65903.png"
            alt="Pas d'image encore"
            onError="portrait_m_image_failover(this)" />
        </a>
      </td>
      <td class="novip-left">
        <a class="novip-firmen-name"
          href="/medecin/baumberger-hans-rudolf-aarau-5000-medecin.html"
          target="_top">
          Baumberger&nbsp;Hans Rudolf
        </a>
      </td>
      <td class="novip-right"
        width="25%">
        <a class="novip"
          href="/medecin/baumberger-hans-rudolf-aarau-5000-medecin.html"
          target="_top">
          rating info:&nbsp;              <img class="novip-inforating"
            src="/img/general/stars/stars3 "
            alt="rating info"
            width="70" height="14" align="bottom" border="0" />
        </a>
      </td>
    </tr>
    <tr class="novip">
      <td class="novip-left">
        Dr. med. Facharzt FMH f&uuml;r Allgemeine Innere Medizin
      </td>
    </tr>
    <tr class="novip">
      <td class="novip-left">
        Bahnhofstrasse&nbsp;92, 5000&nbsp;Aarau
      </td>
      <td class="novip-right-telefon">
        t&eacute;l:&nbsp;062 822 46 28
      </td>
    </tr>
    <tr class="novip">
      <td class="novip-left-email">
        e-mail:&nbsp;
        <a class="novip-left-send-message-button-inactive"
          href="/eintrag/fr_keine_mitteilung_moeglich.html">
          Envoyer un message
        </a>
          &nbsp;
        <a class="novip-left-make_appointment-button-inactive"
          href="/eintrag/fr_kein_termin_moeglich.html">
          prendre un rendez-vous
        </a>
      </td>
      <td class="novip-right-fax">
        fax:&nbsp;062 822 35 20
      </td>
    </tr>
  </table>

The JSON output I get contains one name field for each name in the tables, but every field is filled with all the names from all the tables, like this:

[{"name": ["Name1, Name2, ..... NameN"] 
[{"name": ["Name1, Name2, ..... NameN"]

And so on. How can I change the code/XPath so that the name field is filled with only one name before moving on to the next table?

Make the expression for the name context-specific by prepending a dot:
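# Inside the spider's parse(self, response) method.
# Iterate over the tables (one per entry) rather than over every row, so that
# rows without a name link do not yield items with an empty name.
for sel in response.xpath('//table[@class="novip"]'):
    item = DocteurItem()
    # The leading dot restricts the search to the current table.
    item['name'] = sel.xpath('.//a[@class="novip-firmen-name"]/text()[normalize-space()]').extract_first()
    yield item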
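
Why the leading dot matters: in Scrapy, an XPath that starts with `//` is evaluated against the whole document even when it is called on a nested selector such as `sel`, whereas `.//` only looks at descendants of the current node. The sketch below illustrates the difference. To stay self-contained it uses the standard library's `xml.etree.ElementTree` instead of Scrapy's selectors, and a simplified stand-in for the page. Both are assumptions made for illustration. ElementTree supports only a subset of XPath, so the document-wide search is modelled by searching from the root element, and `Name2` and `Address2` are made-up placeholders.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the page: one table per entry.
# "Name2" and "Address2" are placeholders, not data from the real site.
PAGE = """
<html><body>
  <table class="novip">
    <tr class="novip"><td><a class="novip-firmen-name">Baumberger Hans Rudolf</a></td></tr>
    <tr class="novip"><td>Bahnhofstrasse 92, 5000 Aarau</td></tr>
  </table>
  <table class="novip">
    <tr class="novip"><td><a class="novip-firmen-name">Name2</a></td></tr>
    <tr class="novip"><td>Address2</td></tr>
  </table>
</body></html>
""".strip()

NAME_LINK = ".//a[@class='novip-firmen-name']"

root = ET.fromstring(PAGE)
tables = root.findall(".//table[@class='novip']")

# Document-wide search repeated for every table: each item gets all names.
wrong = [[a.text for a in root.findall(NAME_LINK)] for _ in tables]

# Search relative to the current table: each item gets exactly one name.
right = [table.find(NAME_LINK).text for table in tables]

print(wrong)
print(right)
```

The first list reproduces the symptom from the question (every item carries every name), while the second holds one name per table.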

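Note that I am using extract_first() instead of extract(): extract() returns a list of all matches, while extract_first() returns only the first match as a string.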
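
Two details are worth handling after extraction. `extract_first()` returns `None` when nothing matches, and the `[normalize-space()]` predicate only filters out whitespace-only text nodes. It does not trim the string that is returned, so the name still carries the surrounding newlines and the non-breaking space from `&nbsp;`. A minimal sketch of a cleanup step follows. The helper name `clean_name` is hypothetical, and the raw string is an assumption about what the text node from the HTML above would look like.

```python
from typing import Optional


def clean_name(raw: Optional[str]) -> Optional[str]:
    """Collapse runs of whitespace (including non-breaking spaces) in a name."""
    if raw is None:  # extract_first() found no match
        return None
    # str.split() with no arguments splits on any Unicode whitespace,
    # which includes the non-breaking space U+00A0.
    return " ".join(raw.split())


# Assumed shape of the raw text node for the first entry.
raw_text = "\n          Baumberger\xa0Hans Rudolf\n        "

print(clean_name(raw_text))
print(clean_name(None))
```

In a spider, this could be applied to the value returned by `extract_first()` before it is assigned to `item['name']`.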