Python BeautifulSoup - get h2 text without class

My code:

<div id="title">
<h2>
My title <span class="subtitle">My Subtitle</span></h2></div>

My attempt matches everything. I want to get the title and the subtitle as two separate objects:

print title 
>> My title
print subtitle
>> My subtitle

Any help?

You can find the subtitle on its own and read the title from its previous sibling:

title = soup.find('div', id="title").h2
subtitle = title.find(class_="subtitle")
print(subtitle.previous_sibling.strip(), subtitle.get_text())

Alternatively, you can locate the text node in non-recursive mode:

title = soup.find('div', id="title").h2
print(title.find(text=True, recursive=False).strip(), 
      title.find(class_="subtitle").get_text(strip=True))

Both print:

(u'My title', u'My Subtitle')
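
The snippets above assume a `soup` object already exists. Here is a minimal self-contained sketch of both approaches on the HTML from the question. It assumes Python 3 and the built-in `html.parser`, and it uses `string=True`, the newer name for the `text=True` argument in recent Beautiful Soup 4 releases; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# HTML taken from the question
html = '''
<div id="title">
<h2>
My title <span class="subtitle">My Subtitle</span></h2></div>
'''

soup = BeautifulSoup(html, 'html.parser')
h2 = soup.find('div', id="title").h2

# Approach 1: find the subtitle tag, then read the text node just before it
subtitle_tag = h2.find(class_="subtitle")
title_1 = subtitle_tag.previous_sibling.strip()
subtitle_1 = subtitle_tag.get_text()

# Approach 2: take the first text node that is a direct child of <h2>
title_2 = h2.find(string=True, recursive=False).strip()
subtitle_2 = h2.find(class_="subtitle").get_text(strip=True)

print(title_1, '|', subtitle_1)
print(title_2, '|', subtitle_2)
```

On Python 3 each `print` writes the two values as plain strings rather than the `u'...'` tuple shown above, which is Python 2 output.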

One way that does not use the class attribute:

h2 = soup.find('div', id="title").h2
subtitle = h2.span.text
# contents[0] is the text node before the <span>; strip the surrounding whitespace
title = str(h2.contents[0]).strip()

h2.contents[0] returns a NavigableString object here. It prints the same way as its plain-string version, so if you only print it, there is no need to call str().
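
To make the point about NavigableString concrete, here is a small sketch, assuming Python 3 and the HTML from the question. It checks that the first child of the h2 is a NavigableString and also an ordinary `str`, so string methods such as `strip()` work on it directly.

```python
from bs4 import BeautifulSoup, NavigableString

html = '''
<div id="title">
<h2>
My title <span class="subtitle">My Subtitle</span></h2></div>
'''

h2 = BeautifulSoup(html, 'html.parser').find('div', id="title").h2

# First child of <h2>: the text node that precedes the <span>
first = h2.contents[0]

is_navigable = isinstance(first, NavigableString)
is_str = isinstance(first, str)  # NavigableString subclasses str

# String methods work without calling str() first
title = first.strip()

print(is_navigable, is_str, repr(first), repr(title))
```

The `repr` output shows that the raw node still carries the newline and trailing space from the markup, which is why the `strip()` call matters.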

See this example:

from bs4 import BeautifulSoup

#html source
html_source = '''
<div class="test">
     <h2>paragraph1</h2>
</div>
'''

soup = BeautifulSoup(html_source, 'html.parser')
#find h2 tag
print(soup.h2.string)

Output: paragraph1
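
Note that `.string` only helps when the tag has a single child. A short sketch comparing the simple case with the h2 from the question, where the h2 holds both a text node and a span (the markup is compressed onto one line here for brevity):

```python
from bs4 import BeautifulSoup

simple = BeautifulSoup(
    '<div class="test"><h2>paragraph1</h2></div>', 'html.parser')
mixed = BeautifulSoup(
    '<div id="title"><h2>My title <span class="subtitle">My Subtitle</span></h2></div>',
    'html.parser')

# One child only: .string returns that text
simple_string = simple.h2.string

# Text node plus <span>: .string is ambiguous, so it is None
mixed_string = mixed.h2.string

# get_text() concatenates every nested string, title and subtitle together
all_text = mixed.h2.get_text()

print(simple_string, mixed_string, repr(all_text))
```

This is why reading the whole h2 returns the title and subtitle together, and why the answers above pick out the individual text node instead.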

Another solution, using the simplified_scrapy library:

from simplified_scrapy import SimplifiedDoc
html = '''
<div id="title">
<h2>
My title <span class="subtitle">My Subtitle</span></h2></div>
'''
doc = SimplifiedDoc(html)
h2 = doc.select('div#title').h2
print ('title:',h2.firstText())
print ('subtitle:',h2.span.text)

Output:

title: My title
subtitle: My Subtitle

I added an answer that does not use the class. Maybe it helps.