Python 2.7: using BeautifulSoup to extract the content of p tags inside a div in an HTML file. I am getting None printed out

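I am trying to extract some data from a Selenium test report HTML file. I am getting blanks printed to the PyCharm console. I want to get all the data from the p tags, which are under a div tag.

The HTML snippet is: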

<div class='heading'>
<h1>Test Report</h1>
<p class='attribute'><strong>Start Time:</strong> 2016-08-12 11:57:33</p>
<p class='attribute'><strong>Duration:</strong> 0:48:09.007000</p>
<p class='attribute'><strong>Status:</strong> Pass 75</p>

<p class='description'>Selenium - ClearCore 501 Regression edit project automated test</p>
</div>

I added:

if __name__ == "__main__":
    extract_data_from_report_htmltestrunner()

The result I get now is:

test
None

What am I doing wrong, please?

Thanks, Riaz

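The text is in the strong tag, not the p, so find the strong tag by that text and call .parent to get the p tag (In [14] in the session below).

The session below also shows the difference between the two approaches (In [19] and In [20]). Just calling .text on the strong tag gives you "Start Time:" (In [21]).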

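Thanks, this is helpful. If I want to get the text "Start Time:" from the strong tag together with its value, how can I do that? The output I want is "Start Time: 2016-08-12 11:57:33". Is it p.find(text=True, recursive=False)?

@RiazLadhani, look at the second-to-last code snippet: calling p.text gives you all the text, including the text from child tags, while recursive=False gives you only the text that belongs directly to the parent.

Yes, I just wanted to confirm. Thanks for your help.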
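
Since the question asks for all the data from the p tags, the point made in the comments can be applied to every paragraph at once. The following is only a minimal sketch, assuming the report looks like the snippet in the question; the variable name `lines` is illustrative and not from the original post. It uses `get_text()`, which returns the tag's own text plus the text of its children, so each label comes back together with its value.

```python
from bs4 import BeautifulSoup

# Same snippet as in the question.
html = """<div class='heading'>
<h1>Test Report</h1>
<p class='attribute'><strong>Start Time:</strong> 2016-08-12 11:57:33</p>
<p class='attribute'><strong>Duration:</strong> 0:48:09.007000</p>
<p class='attribute'><strong>Status:</strong> Pass 75</p>

<p class='description'>Selenium - ClearCore 501 Regression edit project automated test</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")
div_heading = soup.find("div", class_="heading")

# get_text() joins the text of the tag and of all its children,
# so the <strong> label and the value after it come back as one string.
lines = [p.get_text().strip() for p in div_heading.find_all("p")]

for line in lines:
    print(line)
```

The first line printed is "Start Time: 2016-08-12 11:57:33", which is the output asked for in the comment. The single-argument `print(...)` form is used so the sketch runs on both Python 2.7 and Python 3.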

In [10]: html = """<div class='heading'>
   ....: <h1>Test Report</h1>
   ....: <p class='attribute'><strong>Start Time:</strong> 2016-08-12 11:57:33</p>
   ....: <p class='attribute'><strong>Duration:</strong> 0:48:09.007000</p>
   ....: <p class='attribute'><strong>Status:</strong> Pass 75</p>
   ....: 
   ....: <p class='description'>Selenium - ClearCore 501 Regression edit project automated test</p>
   ....: </div>"""

In [11]: from bs4 import BeautifulSoup

In [12]: soup = BeautifulSoup(html, "html.parser")

In [13]: div_heading = soup.find('div', {'class': 'heading'})

In [14]: p = div_heading.find('strong', text='Start Time:').parent

In [15]: print p
<p class="attribute"><strong>Start Time:</strong> 2016-08-12 11:57:33</p>

In [16]: div_heading.find("p", class_="description")
Out[16]: <p class="description">Selenium - ClearCore 501 Regression edit project automated test</p>
In [17]: div_heading.find("p", class_="description").text
Out[17]: u'Selenium - ClearCore 501 Regression edit project automated test'

In [18]: p = div_heading.find('strong', text='Start Time:').parent

In [19]: p.find(text=True, recursive=False)
Out[19]: u' 2016-08-12 11:57:33'
In [20]: p.text
Out[20]: u'Start Time: 2016-08-12 11:57:33'

In [21]:  div_heading.find('strong', text='Start Time:').text
Out[21]: u'Start Time:'
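
If the label and the value are needed separately, the same pieces shown in the session above can be collected into a dictionary. This is a hedged sketch rather than part of the original answer: the name `report` and the choice to strip the trailing colon are assumptions made for illustration. It relies on `next_sibling`, which for these paragraphs is the text node that follows the strong tag.

```python
from bs4 import BeautifulSoup

# Shortened version of the snippet from the question.
html = """<div class='heading'>
<p class='attribute'><strong>Start Time:</strong> 2016-08-12 11:57:33</p>
<p class='attribute'><strong>Duration:</strong> 0:48:09.007000</p>
<p class='attribute'><strong>Status:</strong> Pass 75</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")
div_heading = soup.find("div", class_="heading")

report = {}  # hypothetical name: maps each label to its value
for p in div_heading.find_all("p", class_="attribute"):
    strong = p.find("strong")
    label = strong.get_text().rstrip(":")   # "Start Time:" -> "Start Time"
    value = strong.next_sibling.strip()     # text node right after </strong>
    report[label] = value

print(report["Start Time"])
```

This assumes every attribute paragraph has exactly the `<strong>label</strong> value` shape; a paragraph without a strong tag, or with nothing after it, would need an extra check before calling `.strip()`.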