Python BeautifulSoup doesn't return the h1 contents properly

Tags: python, python-2.7, beautifulsoup

My code

from BeautifulSoup import BeautifulSoup

htmls = '''
<div class="main-content">
<h1 class="student">
    <p>Name: <br />
    Alex</p>
    <p>&nbsp;</p>
</h1>
</div>
<div class="department">
... more text
</div>
'''
soup = BeautifulSoup(htmls)
h1 = soup.find("h1", {"class": "student"})
print h1

Expected result

<h1 class="student">
    <p>Name: <br />
    Alex</p>
    <p>&nbsp;</p>
</h1>

But unfortunately it returns

<h1 class="student">
</h1>


My question is: why does it eat everything between the p tags? Is it calling renderContents(), or is this a parse failure?

This happens because you are using p tags inside the h1 tag. For example, if you do this (the same markup with span in place of p):

>>> from BeautifulSoup import BeautifulSoup
>>> htmls = '''
... <div class="main-content">
... <h1 class="student">
...     <span>Name: <br />
...     Alex</span>
...     <span>&nbsp;</span>
... </h1>
... </div>
... <div class="department">
... ... more text
... </div>
... '''
>>> soup = BeautifulSoup(htmls)
>>> h1 = soup.find("h1", {"class": "student"})
>>> h1
<h1 class="student">
<span>Name: <br />
    Alex</span>
<span>&nbsp;</span>
</h1>

you can see the children.

This is how the HTML p tag behaves, and that is the problem. A p element is block-level content, which an h1 is not supposed to contain, so the BeautifulSoup 3 parser appears to close the h1 as soon as it reaches the first p. It is a parsing decision rather than anything to do with rendering.

Try passing a different parser, such as html5lib, to BeautifulSoup; I think that gives you what you want. Otherwise, you should not put block elements inside an h1, per the HTML conformance requirements.

See: Installing a parser

pip install html5lib

>>> from bs4 import BeautifulSoup  # the parser argument needs BeautifulSoup 4
>>> htmls = '''
... <div class="main-content">
... <h1 class="student">
...     <p>Name: <br />
...     Alex</p>
...     <p>&nbsp;</p>
... </h1>
... </div>
... <div class="department">
... ... more text
... </div>
... '''
>>> soup = BeautifulSoup(htmls, 'html5lib')
>>> h1 = soup.find('h1', 'student')
>>> print h1
<h1 class="student">
    <p>Name: <br/>
    Alex</p>
    <p> </p>
</h1>
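
If you would rather catch this kind of markup before parsing it, something like the following may help. It is only a sketch, written for Python 3 with the standard library's html.parser rather than BeautifulSoup. The names HeadingNestingChecker and find_blocks_in_headings, and the short BLOCK_TAGS and VOID_TAGS lists, are made up for illustration and are not part of any library. html.parser only reports tag events and applies no nesting rules of its own, so the small stack below is what decides whether a block-level tag was opened while a heading was still open.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
# Assumption: a small, non-exhaustive sample of block-level tags.
BLOCK_TAGS = {"p", "div", "ul", "ol", "table", "blockquote", "pre", "form"}
# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class HeadingNestingChecker(HTMLParser):
    """Record block-level tags that open while a heading is still open."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names
        self.problems = []   # (heading, block_tag) pairs, in document order

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            # Look for the innermost heading that is still open.
            for open_tag in reversed(self.open_tags):
                if open_tag in HEADINGS:
                    self.problems.append((open_tag, tag))
                    break
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray closing tags are ignored.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def find_blocks_in_headings(html):
    """Return (heading, block_tag) pairs found in the given markup."""
    checker = HeadingNestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    htmls = '''
    <div class="main-content">
    <h1 class="student">
        <p>Name: <br />
        Alex</p>
        <p>&nbsp;</p>
    </h1>
    </div>
    '''
    print(find_blocks_in_headings(htmls))  # [('h1', 'p'), ('h1', 'p')]
```

An empty list means none of the sampled block-level tags were found inside a heading, as is the case for the span version of the markup.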