
Python Scrapy tutorial: XPath code scrapes the table multiple times


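I am working through the Scrapy documentation tutorial and would like to scrape sample data from the following site:

After running Scrapy's view command, I get the following HTML for the table I am trying to scrape. The page consists of one such table per entry: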

  <table class="novip">
    <tr class="novip">
      <td class="novip-portrait-picture"
        rowspan="5">
        <a class="novip-portrait-picture"
          href="/medecin/baumberger-hans-rudolf-aarau-5000-medecin.html">
          <img class="novip-portrait-picture"
            src="/customer_controlled/pictures/65903/portrait/65903.png"
            alt="Pas d'image encore"
            onError="portrait_m_image_failover(this)" />
        </a>
      </td>
      <td class="novip-left">
        <a class="novip-firmen-name"
          href="/medecin/baumberger-hans-rudolf-aarau-5000-medecin.html"
          target="_top">
          Baumberger&nbsp;Hans Rudolf
        </a>
      </td>
      <td class="novip-right"
        width="25%">
        <a class="novip"
          href="/medecin/baumberger-hans-rudolf-aarau-5000-medecin.html"
          target="_top">
          rating info:&nbsp;              <img class="novip-inforating"
            src="/img/general/stars/stars3 "
            alt="rating info"
            width="70" height="14" align="bottom" border="0" />
        </a>
      </td>
    </tr>
    <tr class="novip">
      <td class="novip-left">
        Dr. med. Facharzt FMH f&uuml;r Allgemeine Innere Medizin
      </td>
    </tr>
    <tr class="novip">
      <td class="novip-left">
        Bahnhofstrasse&nbsp;92, 5000&nbsp;Aarau
      </td>
      <td class="novip-right-telefon">
        t&eacute;l:&nbsp;062 822 46 28
      </td>
    </tr>
    <tr class="novip">
      <td class="novip-left-email">
        e-mail:&nbsp;
        <a class="novip-left-send-message-button-inactive"
          href="/eintrag/fr_keine_mitteilung_moeglich.html">
          Envoyer un message
        </a>
          &nbsp;
        <a class="novip-left-make_appointment-button-inactive"
          href="/eintrag/fr_kein_termin_moeglich.html">
          prendre un rendez-vous
        </a>
      </td>
      <td class="novip-right-fax">
        fax:&nbsp;062 822 35 20
      </td>
    </tr>
  </table>

The JSON output I get contains one name field per name in the tables, but every field is filled with all the names from all the tables, like this:

[{"name": ["Name1, Name2, ..... NameN"] 
[{"name": ["Name1, Name2, ..... NameN"]

and so on. How can I change my code/XPath so that the name field is filled with only one name before moving on to the next table?

Make the expression for the name context-specific by adding a dot at the beginning:

for sel in response.xpath('//tr[@class="novip"]'):
    item = DocteurItem()
    item['name'] = sel.xpath('.//a[@class="novip-firmen-name"]/text()[normalize-space()]').extract_first()
    yield item

Note that I am using extract_first() rather than extract(), so the field holds a single string instead of a list.
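
To see why the leading dot matters, here is a minimal sketch that runs without Scrapy. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors, and the two-table sample markup, the names Name1/Name2, and the helper function names are all made up for illustration. ElementTree does not accept a leading `//` on an element at all, so the buggy behavior (in Scrapy, an expression starting with `//` searches from the document root no matter which selector it is called on) is emulated here by searching from the root inside the loop.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two entry tables shaped like the one in the question.
SAMPLE = """
<html><body>
  <table class="novip">
    <tr class="novip"><td class="novip-left">
      <a class="novip-firmen-name" href="/a.html">Name1</a>
    </td></tr>
    <tr class="novip"><td class="novip-left">Address 1</td></tr>
  </table>
  <table class="novip">
    <tr class="novip"><td class="novip-left">
      <a class="novip-firmen-name" href="/b.html">Name2</a>
    </td></tr>
    <tr class="novip"><td class="novip-left">Address 2</td></tr>
  </table>
</body></html>
""".strip()

# Relative path: descendants of whichever element the search is run on.
NAME_LINK = ".//a[@class='novip-firmen-name']"


def names_from_document_root(root):
    """Emulate the bug: each entry searches the whole document."""
    items = []
    for _table in root.iter("table"):
        names = [a.text.strip() for a in root.findall(NAME_LINK)]
        items.append({"name": names})
    return items


def names_per_table(root):
    """The fix: search relative to the current table only."""
    items = []
    for table in root.iter("table"):
        link = table.find(NAME_LINK)  # first match, like extract_first()
        items.append({"name": link.text.strip() if link is not None else None})
    return items


root = ET.fromstring(SAMPLE)
buggy = names_from_document_root(root)
fixed = names_per_table(root)
print(buggy)
print(fixed)
```

The sketch loops over tables rather than rows, which is a choice made here and not part of the answer above. In the question's markup each table holds several tr elements with class novip and only the first one contains the name link, so a loop over rows would presumably also produce items whose name is None for the remaining rows.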
