Python: picking up text from the next siblings (Python, BeautifulSoup)


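(Environment: Python 2.7 + BeautifulSoup 4.3.2)

Goal: extract the text "Jan 23, 2009 12:05 pm" from the code below.

The page is on a company website that requires login and redirects, so I copied the source of the target page into a file and saved it as "example.html" in C:\ to practice on.

This is part of the original source: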

<tr class="ghj">
  <td>
    <span class="city-sh">
      <sh src="./citys/1.jpg" alt="boy" title="boy" />
    </span>
    <a href="./membercity.php?mode=view&amp;u=12563">port_new_cape</a>
  </td>
  <td class="position">
      <a href="./search.php?id=12563&amp;sr=positions"
        title="Search positions">452</a>
  </td>
  <td class="details">
      <div>South</div>
  </td>
  <td>May 09, 1997</td>
  <td>Jan 23, 2009 12:05 pm&nbsp;</td>
</tr>

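I don't know how to pick it up directly, so I tried going through the siblings. However, when I run my script it gives the error message below; it seems find_next_siblings is not recognized.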

Traceback (most recent call last):
File "C:/Python27/Last Activity mydyingbride.py", line 17, in <module>
sis = cities.find_next_siblings('td')
AttributeError: 'ResultSet' object has no attribute 'find_next_siblings'

How can I practice on the local file?

I suggest you use the Python debugger to look at the current values of your variables. Anyway, here is a solution:

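from bs4 import BeautifulSoup

# Assumes the page source was saved as C:\example.html, as described in the question.
with open(r'C:\example.html') as page:
    soup = BeautifulSoup(page.read(), 'html.parser')

# find_all returns a list-like ResultSet, so call find_next_siblings on each element.
cities = soup.find_all('td', {'class': 'details'})
for city in cities:
    for s in city.find_next_siblings('td'):
        print(s)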
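To make the cause of the AttributeError concrete, here is a minimal sketch. The HTML is a shortened, made-up row modelled on the question's markup, and the name sibling_texts is only for illustration. It shows that find_all returns a ResultSet (a list of tags), while find_next_siblings belongs to each individual tag.

```python
from bs4 import BeautifulSoup

# Shortened, made-up row modelled on the question's HTML (an assumption for illustration).
html_doc = """
<table>
<tr class="ghj">
  <td class="details"><div>South</div></td>
  <td>May 09, 1997</td>
  <td>Jan 23, 2009 12:05 pm</td>
</tr>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")
cities = soup.find_all("td", {"class": "details"})

# find_all returns a ResultSet (a list subclass), not a single Tag,
# so calling cities.find_next_siblings(...) raises AttributeError.
print(type(cities).__name__)

# Iterate over the ResultSet and call the method on each Tag instead.
sibling_texts = []
for city in cities:
    for sibling in city.find_next_siblings("td"):
        sibling_texts.append(sibling.get_text())
        print(sibling.get_text())
```

If only one matching cell is expected, soup.find(...) returns a single tag (or None), so the sibling call can be made on it directly.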

The output is:

<td>May 09, 1997</td>
<td>Jan 23, 2009 12:05 pm </td>


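Thanks again, Chandan. Could you give another example of the Python debugger? I have seen the term and tried it a few times, but never got it to work...

Use Eclipse as your IDE and install the PyDev plugin in it; see the PyDev documentation for more details. By the way, if you don't plan to use Eclipse or PyDev, in its most basic form you can use pdb; check its documentation. Also, if the answer solved your problem, don't forget to upvote it.

Thanks, Chandan... I tried your solution and it works, but there are two problems: 1. It only picks up the wanted content from the first block of the page source (my example is just one part of it); there are many similar blocks. 2. It picks up both time texts, but I only want "Jan 23, 2009 12:05 pm". Would you mind helping me once more? Thanks.

Thanks, Chandan. I am digesting it. The line "print datesColumn[1].string" gives the error message "RuntimeError: maximum recursion depth exceeded". When I remove ".string" it works, but the result comes with <td> and </td> around it. Changing it to "print datesColumn[1].renderContents()" also works. Is this a problem with my Python version?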
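Since the comments ask for a debugger example, here is a minimal pdb sketch. The DEBUG flag and the one-line HTML are made up for illustration; set DEBUG to True to actually stop in the debugger.

```python
import pdb

from bs4 import BeautifulSoup

# Hypothetical flag, not part of the original code. It is left False so the
# script runs unattended; set it to True to pause in the debugger.
DEBUG = False

# Made-up one-row table, only for illustration.
html_doc = '<table><tr><td class="details">South</td><td>May 09, 1997</td></tr></table>'
soup = BeautifulSoup(html_doc, "html.parser")
cities = soup.find_all("td", {"class": "details"})

if DEBUG:
    # Execution pauses here. At the (Pdb) prompt you could type, for example:
    #   p type(cities)   to print the type of cities (a ResultSet)
    #   p len(cities)    to print how many cells matched
    #   n                to run the next line
    #   c                to continue to the end of the script
    pdb.set_trace()

first_siblings = cities[0].find_next_siblings("td")
print(first_siblings)
```

Inspecting type(cities) this way would have shown that it is a ResultSet rather than a single tag, which is what the traceback in the question is complaining about.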

from bs4 import BeautifulSoup
html_doc = '''
<tr class="ghj">
    <td><span class="city-sh"><sh src="./citys/1.jpg" alt="boy" title="boy" /></span><a href="./membercity.php?mode=view&amp;u=12563">port_new_cape</a></td>
    <td class="position"><a href="./search.php?id=12563&amp;sr=positions" title="Search positions">452</a></td>
    <td class="details"><div>South</div></td>
    <td>May 09, 1997</td>
    <td>Jan 23, 2009 12:05 pm&nbsp;</td>
</tr>
<tr class="ghj">
    <td><span class="city-sh"><sh src="./citys/1.jpg" alt="boy" title="boy" /></span><a href="./membercity.php?mode=view&amp;u=12563">port_new_cape</a></td>
    <td class="position"><a href="./search.php?id=12563&amp;sr=positions" title="Search positions">452</a></td>
    <td class="details"><div>South</div></td>
    <td>May 09, 1997</td>
    <td>Jan 24, 2009 12:05 pm&nbsp;</td>
</tr>
<tr class="ghj">
    <td><span class="city-sh"><sh src="./citys/1.jpg" alt="boy" title="boy" /></span><a href="./membercity.php?mode=view&amp;u=12563">port_new_cape</a></td>
    <td class="position"><a href="./search.php?id=12563&amp;sr=positions" title="Search positions">452</a></td>
    <td class="details"><div>South</div></td>
    <td>May 09, 1997</td>
    <td>Jan 25, 2009 12:05 pm&nbsp;</td>
</tr>
'''
soup = BeautifulSoup(html_doc, 'html.parser')
cities = soup.find_all('td', {'class': 'details'})
for city in cities:
    datesColumn = city.find_next_siblings('td')
    # Assuming you are interested in the second of the two date columns
    if len(datesColumn) == 2:
        print(datesColumn[1].string)

Jan 23, 2009 12:05 pm 
Jan 24, 2009 12:05 pm 
Jan 25, 2009 12:05 pm
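As an alternative sketch (not the approach used in the answer above), one could walk the rows and take the last cell of each. This assumes the wanted date is always the final td of a tr with class "ghj", and it uses get_text(strip=True) to drop the trailing non-breaking space. The two-row HTML and the name last_dates are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened, made-up rows modelled on the question's HTML (an assumption for illustration).
html_doc = """
<table>
<tr class="ghj">
  <td class="details"><div>South</div></td>
  <td>May 09, 1997</td>
  <td>Jan 23, 2009 12:05 pm&nbsp;</td>
</tr>
<tr class="ghj">
  <td class="details"><div>South</div></td>
  <td>May 09, 1997</td>
  <td>Jan 24, 2009 12:05 pm&nbsp;</td>
</tr>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")

last_dates = []
for row in soup.find_all("tr", {"class": "ghj"}):
    cells = row.find_all("td")
    if cells:
        # get_text(strip=True) returns the text with surrounding whitespace
        # removed, including the non-breaking space produced by &nbsp;.
        last_dates.append(cells[-1].get_text(strip=True))

print(last_dates)
```

Going row by row avoids depending on how many siblings follow the "details" cell, at the cost of assuming the date is always in the last column.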