Python: using BeautifulSoup, how do I reference a table row in an HTML page?


I have an HTML page that looks like this:

    <html>
    ..
    <form post="/products.hmlt" ..>
    ..
    <table ...>
    <tr>...</tr>
    <tr>
       <td>part info</td>
    ..
    </tr>
    </table>
    ..
    </form>
    ..
    </html>

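But I get an error that says:

ResultSet object has no attribute 'findAll'

I guess the call to findAll does not return a 'beautifulsoup' object? What should I do instead?

Update

There are many tables on this page, but only one table inside the markup shown above.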
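
For readers who want to see the error for themselves, here is a minimal sketch that reproduces it. It assumes the bs4 package (BeautifulSoup 4) and a made-up HTML string; neither comes from the question, and the exact wording of the message may vary between library versions.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page in the question.
html = ("<html><body><form><table>"
        "<tr><td>part info</td></tr>"
        "</table></form></body></html>")
soup = BeautifulSoup(html, "html.parser")

# find_all (older spelling: findAll) returns a ResultSet, a list of Tags,
# not a single Tag.
forms = soup.find_all("form")
print(type(forms).__name__)
print(isinstance(forms, list))

error_message = None
try:
    # Calling a search method on the list itself is what triggers the error.
    forms.findAll("table")
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

The search methods live on individual Tag objects, so one element has to be taken out of the list before searching again.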

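findAll returns a list, so extract the element first:

form = soup.findAll('form')[0]
table = form.findAll('table')[0]  # table inside form

Of course, you should do some error checking (i.e. make sure the list is not empty) before indexing into it.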
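
The error checking mentioned above could look roughly like the sketch below. It assumes bs4 (BeautifulSoup 4) and that the row wanted is the second one in the form's table, the one holding "part info"; the helper name second_row_in_form and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup

def second_row_in_form(html):
    """Return the second <tr> of the first table in the first form, or None."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")        # find() returns one Tag, or None if absent
    if form is None:
        return None
    table = form.find("table")
    if table is None:
        return None
    rows = table.find_all("tr")     # find_all() returns a list of Tags
    if len(rows) < 2:               # guard before indexing into the list
        return None
    return rows[1]

# Hypothetical sample page, loosely modelled on the question.
sample = """<html><body>
<form><table>
<tr><td>header</td></tr>
<tr><td>part info</td></tr>
</table></form>
</body></html>"""

row = second_row_in_form(sample)
print(row.td.get_text() if row is not None else "row not found")
```

Using find() instead of indexing the result of findAll avoids an IndexError when the form or table is missing, at the cost of an explicit None check at each step.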


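I like ars's answer, and I certainly agree that error checking is needed, especially if this is going to be used in any kind of production code.

Here is a possibly more verbose/explicit way to find the data you are looking for:

from bs4 import BeautifulSoup as bs
html = '''<html><body><table><tr><td>some text</td></tr></table>
    <form><table><tr><td>some text we care about</td></tr>
    <tr><td>more text we care about</td></tr>
    </table></form></body></html>'''
soup = bs(html, 'html.parser')

# soup.form is the first <form>; findAll then searches only inside it
for tr in soup.form.findAll('tr'):
    print(tr.text)
# output:
# some text we care about
# more text we care about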


Here is the cleaned-up HTML for reference:

>>> print(soup.prettify())
<html>
 <body>
  <table>
   <tr>
    <td>
     some text
    </td>
   </tr>
  </table>
  <form>
   <table>
    <tr>
     <td>
      some text we care about
     </td>
    </tr>
    <tr>
     <td>
      more text we care about
     </td>
    </tr>
   </table>
  </form>
 </body>
</html>
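
Since the page holds several tables but only one sits inside the form, the key step is scoping the search to the form. As an alternative sketch, assuming bs4 (BeautifulSoup 4) with its CSS selector support available, the same rows can be picked out with select(); the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

html = """<html><body><table><tr><td>some text</td></tr></table>
<form><table><tr><td>some text we care about</td></tr>
<tr><td>more text we care about</td></tr>
</table></form></body></html>"""
soup = BeautifulSoup(html, "html.parser")

all_rows = soup.find_all("tr")            # rows from every table on the page
form_rows = soup.select("form table tr")  # only rows of tables inside a <form>

texts = [tr.get_text(strip=True) for tr in form_rows]
print(len(all_rows), len(form_rows))
print(texts)
```

The descendant selector "form table tr" expresses the same nesting as soup.form.findAll('tr'), but it matches rows in every form on the page instead of only the first one.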
