XPath: looping to scrape multiple elements on the same page while storing them separately


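Using Scrapy, I want to scrape multiple elements from the same page and store them separately. The relevant part of the page's HTML is: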

<!-- body_text //-->

    <td width="601" valign="top">

      <table border="0" width="100%" cellspacing="0" cellpadding="0">

        <tr>

          <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

        </tr>

       <tr>

         <td class="pageHeading">Pool (Pocket Billiards) Table</td>

        </tr>

        <tr>

          <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

        </tr>

        <tr>

          <td class="main">A Victoria table is more than mere wood and slate. By paying attention to the details - the hidden differences - Victoria tables have become known name as masterpieces of original design and craftmanship, and most prestigious name in billiards.<br><br>



          These tables, available in two sizes  9’ X 4.5’ and 8’ X 4’, are made of frames with selected good quality solid wood and finely crafted rose wood legs with Mahagony polish.<br><br>

Slate Beds used are either Indian Bangalore Black Slate or Imported Slate. Slates are covered with worsted wool cloth optionally from Jupiter (China) or Strachan (West of England cloth, U.K.) to have proper speed, accuracy and responsiveness of the table to spin. Chrome nuts and adjusters  are used for leveling. It is surrounded with standard imported vulcanized 'L' shaped or 'V' shaped rubber cushions or Northern Cushions (Made in England) to cause billiard balls to rebound while minimizing the lose of kinetic energy.</td>

        </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs20b"></a>VS-20B</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>Size: 9&lsquo; X 4.5&lsquo;</strong></li><li>Rose Wood Legs</li><li>Mahgony Polish</li><li>S.B. Frame</li><li><strong>Bangalore Slate</strong></li><li>Standard Accessories</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-20bbig.jpg')"><img src="images/products/vs-20b.jpg" alt="VS-20B" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs20b"></a>VS-20C</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>Size: 8&lsquo; X 4&lsquo;</strong></li><li>Rose Wood Legs</li><li>Mahgony Polish</li><li>S.B. Frame</li><li><strong>Bangalore Slate</strong></li><li>Standard Accessories</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-20cbig.jpg')"><img src="images/products/vs-20c.jpg" alt="VS-20C" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs23b"></a>VS-23B</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>Size: 9&lsquo; X 4.5&lsquo;</strong></li><li>Rose Wood Legs</li><li>Mahgony Polish</li><li>S.A.L. Frame</li><li><strong>Imported Slate</strong></li><li>Standard Accessories</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-23bbig.jpg')"><img src="images/products/vs-23b.jpg" alt="VS-23B" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs23b"></a>VS-23C</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>Size: 8&lsquo; X 4&lsquo;</strong></li><li>Rose Wood Legs</li><li>Mahgony Polish</li><li>S.A.L. Frame</li><li><strong>Imported Slate</strong></li><li>Standard Accessories</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-23cbig.jpg')"><img src="images/products/vs-23c.jpg" alt="VS-23C" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs9"></a>VS-9</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>Size: 9&lsquo; X 4.5&lsquo;</strong></li><li>Auto Ball Return System</li><li>Pro Speed Cloth</li><li>American Pocket Size</li><li>Standard Accessories</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-9big.jpg')"><img src="images/products/vs-9.jpg" alt="VS-9" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs7"></a>VS-7</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 98"L X 54" W X 31" H</strong></li><li>Solid oak for top/brand rails, Dark cherry finish</li><li>Rams head solid rubber wood with # 6 leather drop pocket.  Easy assembly</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-7big.jpg')"><img src="images/products/vs-7.jpg" alt="VS-7" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs8"></a>VS-8/Light Oak</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>POOL TABLE : 8&lsquo; X 4&lsquo;</strong></li><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 98" X 54"W X 31"H</strong></li><li>Solid oak for top/brand rails, Light oak finish</li><li>Rams head solid rubber wood with # 6 leather drop pocket, Easy assembly</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-8big.jpg')"><img src="images/products/vs-8.jpg" alt="VS-8/Light Oak" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs12"></a>VS-12</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>POOL TABLE : 8&lsquo; X 4&lsquo;</strong></li><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 99-3/4"L X 55 - 3/4" W X 31" H</strong></li><li>Black laminate, pedestal legs, with drop pocket, Steel frame Easy assembly. Accessories included.</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-12big.jpg')"><img src="images/products/vs-12.jpg" alt="VS-12" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs10"></a>VS-10</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>POOL TABLE : 8&lsquo; X 4&lsquo;</strong></li><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 98" L X 54"W X 31"H</strong></li><li>Solid oak for top/brand rails, oak finish</li><li>Rams head solid rubber wood with # 6 leather drop pocket, Easy assembly</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-10big.jpg')"><img src="images/products/vs-10.jpg" alt="VS-10" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs11"></a>VS-11</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>POOL TABLE : 8&lsquo; X 4&lsquo;</strong></li><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 100" X 56"</strong></li><li>Solid wood for top/brand rails</li><li>Mahogany finish</li><li>Rams head solid rubber with # 6 leather drop pocket</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-11big.jpg')"><img src="images/products/vs-11.jpg" alt="VS-11" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>



            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs13"></a>VS-13</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>POOL TABLE : 8&lsquo; X 4&lsquo;</strong></li><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 100" X 56"</strong></li><li>Solid wood for top/brand rails,</li><li>Dark cherry finish</li><li>Rams head solid rubber wood<br />
<br />
with # 6 leather drop pocket</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-13big.jpg')"><img src="images/products/vs-13.jpg" alt="VS-13" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>


            <tr>

          <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

        </tr>

        <tr>

          <td>

            <table cellpadding="4" cellspacing="0" width="100%" border="0">

              <tr>

                <td width="50%" valign="top" class="product_name1" colspan="2"><strong>Standard Accessories for Pool</strong></td>

              </tr>

            </table>

            <table cellpadding="4" cellspacing="4" width="100%" border="0" class="product_box1">

              <tr>

                <td width="50%" valign="top" class="product_text">

                <ul>

                  <li>Aramith Pool Ball 2.1/4" or 2.1/16"</li>

                  <li>Table Brush</li>

                  <li>60" Rest Stick C/W Brass Cross Head Rest</li>

                  <li>Wall Cue Rack</li>

                </ul></td>

                <td width="50%" valign="top" class="product_text">

                <ul>

                  <li>Plastic Triangle</li>

                  <li>Triangle Chalk X 12 Pcs.</li>

                  <li>Pool House Cue X 4 Pcs.</li>

                  <li>Table Cover</li>

                  <li>Round Type Lamp Shade X 2 Pcs.</li>

                </ul></td>

              </tr>

            </table>

          </td>                 

        </tr>

    </table></td>

<!-- body_text_eof //-->

     <td width="45" valign="top">

      <table border="0" width="45" cellspacing="0" cellpadding="0">

<!-- right_navigation //-->

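You don't need to select the elements one at a time (by changing the i index inside a loop, as you did). A path expression such as

//td[@class='product_name']/strong/a/@name
already returns a node-set containing two items. You only need to loop over the returned elements and extract each attribute string.

As for the second expression:

//td[@align='center']/a/img/@src
it has only one match, so you can extract the string directly.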

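# .extract() turns the selectors into plain strings
names = hxs.xpath('//td[@class="product_name"]/strong/text()').extract()
imageurls = hxs.xpath('//tr/td[@align="center"]/a/img/@src').extract()
for name, url in zip(names, imageurls):
    # Build a fresh item per product so earlier results are not overwritten
    # (a plain dict is used here; substitute your own Item class if you have one).
    item = {}
    item["productname"] = name
    item["imgurl"] = url
    yield item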

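This is the simplest approach, because the names and the image URLs are extracted in the same order and therefore line up with each other.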

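Hey, the whole page has about 50 matches for each expression. Is there a way to run a single loop that extracts both at the same time?

Hey, I tried your code, but it only extracts the first product name and stores it twice. The page I run it on has about 50 images and product names. Is there a way to handle multiple elements in one iteration?

My answer is based on the code snippet you wrote. If you can edit it and paste a complete snippet, I can fix it. Alternatively, you can loop with
for name, url in zip(hxs.xpath("//td[@class='product_name']/strong/text()"), hxs.xpath("//td[@align='center']/a/img/@src")):

It still only returns the first value. Nothing more.

@user3283647, use yield instead of return.

One last question: how do I extract all the text under the XPath //tr/td[@class="product_text"]/ul/li/strong? As you can see, everything under "product_text" needs to be extracted in each iteration.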