Scraping HTML text with Python

html css python-3.x

How can I get the words Roger Federer from the HTML below?

<div class="profile-heading--desktop"><h1><span class="profile-heading__rank">#1 </span>Roger Federer</h1><div class="profile-subheading">Athlete, Tennis</div></div>

I'm getting #1 instead.

Use .next_sibling to get the text next to the span:

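from bs4 import BeautifulSoup
html = """
<div class="profile-heading--desktop"><h1><span class="profile-heading__rank">#1 </span>Roger Federer</h1><div class="profile-subheading">Athlete, Tennis</div></div>
"""
soup = BeautifulSoup(html, 'html.parser')
# class_ (with a trailing underscore) because class is a reserved word in Python
name = soup.find(class_='profile-heading__rank').next_sibling
print(name)  # --> Roger Federer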
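
The same .next_sibling lookup can be wrapped in a small helper that also copes with markup where the rank badge is missing or is followed by another tag instead of text. This is a minimal sketch, assuming the page keeps the profile-heading__rank class; the function name name_after_rank is hypothetical and not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup, NavigableString

HTML = '<div class="profile-heading--desktop"><h1><span class="profile-heading__rank">#1 </span>Roger Federer</h1><div class="profile-subheading">Athlete, Tennis</div></div>'


def name_after_rank(markup):
    # Hypothetical helper: return the text node right after the rank <span>, or None.
    soup = BeautifulSoup(markup, 'html.parser')
    rank = soup.find('span', class_='profile-heading__rank')
    if rank is None:
        return None  # no rank badge in this markup
    sibling = rank.next_sibling
    # next_sibling can be a Tag or None in other markup; only accept a text node
    if not isinstance(sibling, NavigableString):
        return None
    return str(sibling).strip()  # drop surrounding whitespace


print(name_after_rank(HTML))  # Roger Federer
print(name_after_rank('<h1>Roger Federer</h1>'))  # None
```

Returning None rather than raising keeps the helper usable when scraping many profile pages, some of which may lack a rank.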

Another way is to use .find(text=True, recursive=False) after finding the h1:

from bs4 import BeautifulSoup

html = '<div class="profile-heading--desktop"><h1><span class="profile-heading__rank">#1 </span>Roger Federer</h1><div class="profile-subheading">Athlete, Tennis</div></div>'
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('h1').find(text=True, recursive=False))

Output:

Roger Federer
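
The recursive=False argument is what skips the rank. A search that descends into the <span> meets the text "#1 " first, which is one likely way to end up with #1. The sketch below compares the default recursive search with recursive=False on the same markup. It uses string=True, the name Beautiful Soup 4.4.0 and later give to the text argument, so it assumes a bs4 version of at least 4.4.

```python
from bs4 import BeautifulSoup

html = '<div class="profile-heading--desktop"><h1><span class="profile-heading__rank">#1 </span>Roger Federer</h1><div class="profile-subheading">Athlete, Tennis</div></div>'
soup = BeautifulSoup(html, 'html.parser')
h1 = soup.find('h1')

# recursive=True (the default) walks all descendants of <h1>, so the first
# text node it finds is the "#1 " inside the <span>.
first_any = h1.find(string=True)

# recursive=False only looks at the direct children of <h1>; the <span> is a
# Tag, so the first direct text child is "Roger Federer".
first_direct = h1.find(string=True, recursive=False)

print(repr(str(first_any)))     # '#1 '
print(repr(str(first_direct)))  # 'Roger Federer'
```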

If the code you are using is Python, it's worth adding that language (and its version) as a tag.