Python: How to access data in nested span tags
Tags: python, html, beautifulsoup, request

I tried replacing each string, but could not get it to work. I can get all the data between … but I can't get the data once the tag is closed. How can I do that? I then tried replacing the text afterwards, but I couldn't manage that either. I am quite new to Python.
I also tried for x in soup.find_all('/span', class_ = "textLarge textWhite"), but that doesn't display anything.
Relevant HTML:
<div style="width:100%; display:inline-block; position:relative; text-
align:center; border-top:thin solid #fff; background-image:linear-
gradient(#333,#000);">
<div style="width:100%; max-width:1400px; display:inline-block;
position:relative; text-align:left; padding:20px 15px 20px 15px;">
<a href="/manpower-fit-for-military-service.asp" title="Manpower
Fit for Military Service ranked by country">
<div class="smGraphContainer"><img class="noBorder"
src="/imgs/graph.gif" alt="Small graph icon"></div>
</a>
<span class="textLarge textWhite"><span
class="textBold">FIT-FOR-SERVICE:</span> 18,740,382</span>
</div>
<div class="blockSheen"></div>
</div>
for y in soup.find_all('span', class_ = "textBold"):
    print(y.text)  # this gets FIT-FOR-SERVICE:
for x in soup.find_all('span', class_ = "textLarge textWhite"):
    print(x.text)  # this gets FIT-FOR-SERVICE: 18,740,382 but I only want the number
Expected result:
"18740382"
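I believe you have two options:
1- Use a regular expression on the parent span tag to extract only the number.
2- Use the decompose() function to remove the child span tag from the tree, then extract the text, as follows: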
from bs4 import BeautifulSoup
h = """<div style="width:100%; display:inline-block; position:relative; text-
align:center; border-top:thin solid #fff; background-image:linear-
gradient(#333,#000);">
<div style="width:100%; max-width:1400px; display:inline-block;
position:relative; text-align:left; padding:20px 15px 20px 15px;">
<a href="/manpower-fit-for-military-service.asp" title="Manpower
Fit for Military Service ranked by country">
<div class="smGraphContainer"><img class="noBorder"
src="/imgs/graph.gif" alt="Small graph icon"></div>
</a>
<span class="textLarge textWhite"><span
class="textBold">FIT-FOR-SERVICE:</span> 18,740,382</span>
</div>
<div class="blockSheen"></div>
</div>"""
soup = BeautifulSoup(h, "lxml")
soup.find('span', class_ = "textLarge textWhite").span.decompose()
res = soup.find('span', class_ = "textLarge textWhite").text.strip()
print(res)
#18,740,382
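Option 1 is only described above, not shown. A minimal sketch of it, assuming the parent span's text has already been read as the string the question's loop prints (the names parent_text, match and number are illustrative and not from the original post):

```python
import re

# Text of the parent span, as printed by x.text in the question.
parent_text = "FIT-FOR-SERVICE: 18,740,382"

# Match the first run of digits, allowing comma thousands separators.
match = re.search(r"\d[\d,]*", parent_text)
number = match.group(0) if match else None
print(number)
```

This leaves the parsed tree untouched, but it relies on the label itself containing no digits; a label with a number in it would need a stricter pattern.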
Here is how you can do it:
soup.find('span', {'class':'textLarge textWhite'}).find('span').extract()
output = soup.find('span', {'class':'textLarge textWhite'}).text.strip()
print(output)
Output:
18,740,382
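One difference from decompose() that may matter here: extract() returns the tag it removes, so the label can be kept if it is needed too. A minimal sketch, assuming a shortened copy of the question's HTML and the built-in html.parser (the names html, parent, label, label_text and value_text are illustrative):

```python
from bs4 import BeautifulSoup

# Shortened copy of the relevant markup from the question.
html = ('<span class="textLarge textWhite">'
        '<span class="textBold">FIT-FOR-SERVICE:</span> 18,740,382</span>')
soup = BeautifulSoup(html, "html.parser")

parent = soup.find("span", {"class": "textLarge textWhite"})
# extract() detaches the child span from the tree and returns it.
label = parent.find("span").extract()

label_text = label.text.strip()   # text of the removed child
value_text = parent.text.strip()  # what is left in the parent
print(label_text, value_text)
```

Both extract() and decompose() modify the parsed tree, so any later search for the textBold span in the same soup will no longer find it.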
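Instead of using x.text, you can use x.find_all(text=True, recursive=False) to get all the top-level text of a node (as a list of strings) without descending into its child nodes. Here is an example using your data:
for x in soup.find_all('span', class_ = "textLarge textWhite"):
    res = x.find_all(text=True, recursive=False)
    # join and strip the strings, then print
    print(" ".join(map(str.strip, res)))
# output: '18,740,382'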
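If the value is needed as a plain digit string or an integer, as in the question's expected result, the comma separators in the span text (18,740,382 in the HTML) can be stripped afterwards. A minimal sketch, assuming the text has already been extracted into a string (value_text, digits and count are illustrative names):

```python
# Text extracted from the parent span, still with thousands separators.
value_text = "18,740,382"

# Drop the commas to get the bare digits, then convert if a number is needed.
digits = value_text.replace(",", "")
count = int(digits)
print(digits)
```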