How to access data in a nested span tag in Python

python, html, beautifulsoup, request

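I have tried replacing every string, but I could not get it to work. I can get all the data between … but I cannot get it once the tag is closed. How can I do that? I also tried replacing the text afterwards, but could not manage it. I am quite new to Python.

I have also tried for x in soup.find_all('/span', class_ = "textLarge textWhite"), but that does not display anything.

Relevant HTML: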

<div style="width:100%; display:inline-block; position:relative; text- 
align:center; border-top:thin solid #fff; background-image:linear- 
gradient(#333,#000);">
    <div style="width:100%; max-width:1400px; display:inline-block; 
position:relative; text-align:left; padding:20px 15px 20px 15px;">
        <a href="/manpower-fit-for-military-service.asp" title="Manpower 
Fit for Military Service ranked by country">
            <div class="smGraphContainer"><img class="noBorder" 
src="/imgs/graph.gif" alt="Small graph icon"></div>
        </a>
        <span class="textLarge textWhite"><span 
class="textBold">FIT-FOR-SERVICE:</span> 18,740,382</span>
    </div>
    <div class="blockSheen"></div>
</div>

for y in soup.find_all('span', class_ = "textBold"):
    print(y.text) #this gets FIT-FOR-SERVICE:
for x in soup.find_all('span', class_ = "textLarge textWhite"):
    print(x.text) #this gets FIT-FOR-SERVICE: 18,740,382 but i only want the number

Expected result:

"18740382"

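I believe you have two options:

1- Use a regular expression on the parent span tag to extract only the number.

2- Use the decompose() function to remove the child span tag from the tree, then extract the text, like this: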

from bs4 import BeautifulSoup

h = """<div style="width:100%; display:inline-block; position:relative; text-
align:center; border-top:thin solid #fff; background-image:linear-
gradient(#333,#000);">
    <div style="width:100%; max-width:1400px; display:inline-block;
position:relative; text-align:left; padding:20px 15px 20px 15px;">
        <a href="/manpower-fit-for-military-service.asp" title="Manpower
Fit for Military Service ranked by country">
            <div class="smGraphContainer"><img class="noBorder"
src="/imgs/graph.gif" alt="Small graph icon"></div>
        </a>
        <span class="textLarge textWhite"><span
class="textBold">FIT-FOR-SERVICE:</span> 18,740,382</span>
    </div>
    <div class="blockSheen"></div>
</div>"""

soup = BeautifulSoup(h, "lxml")
# Look the parent span up once and reuse it
parent = soup.find('span', class_="textLarge textWhite")
# Remove the nested label span (FIT-FOR-SERVICE:) from the tree
parent.span.decompose()
res = parent.text.strip()

print(res)
# 18,740,382
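
Option 1 above (a regular expression on the parent span) is not shown in the answer, so here is a minimal sketch of how it might look. The pattern, the variable names, and the use of "html.parser" instead of "lxml" are assumptions made for illustration; the HTML is a trimmed copy of the span from the question.

```python
import re

from bs4 import BeautifulSoup

# Trimmed copy of the span from the question.
html = (
    '<span class="textLarge textWhite">'
    '<span class="textBold">FIT-FOR-SERVICE:</span> 18,740,382</span>'
)

# "html.parser" is an assumption here; it avoids the extra lxml dependency.
soup = BeautifulSoup(html, "html.parser")
parent = soup.find("span", class_="textLarge textWhite")

# Take the first run of digits and commas in the parent's full text.
match = re.search(r"\d[\d,]*", parent.get_text())
# Drop the thousands separators to get the plain digit string.
number = match.group(0).replace(",", "") if match else None

print(number)
# 18740382
```

Unlike decompose(), this leaves the parsed tree unchanged, but it relies on the label itself containing no digits.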

Here is how you can do it:

soup.find('span', {'class':'textLarge textWhite'}).find('span').extract()
output = soup.find('span', {'class':'textLarge textWhite'}).text.strip()

Output:

18,740,382
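
Both decompose() and extract() modify the parsed tree, which may matter if the label is still needed afterwards. As a possible alternative, the number can be read from the text node that follows the inner span. This is only a sketch: it assumes the number always sits directly after the textBold span, and the variable names and the "html.parser" choice are illustrative rather than taken from the answers.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the span from the question.
html = (
    '<span class="textLarge textWhite">'
    '<span class="textBold">FIT-FOR-SERVICE:</span> 18,740,382</span>'
)

soup = BeautifulSoup(html, "html.parser")

# Find the inner label span, then read the text node right after it.
label = soup.find("span", class_="textBold")
value = label.next_sibling.strip()

print(value)
# 18,740,382

# The label is still in the tree because nothing was removed.
print(label.get_text())
# FIT-FOR-SERVICE:
```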

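Instead of using x.text, you can use x.find_all(text=True, recursive=False) to get all of the node's top-level text (as a list of strings) without descending into child nodes. Here is an example using your data: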

for x in soup.find_all('span', class_="textLarge textWhite"):
    res = x.find_all(text=True, recursive=False)
    # join and strip the strings, then print
    print(" ".join(map(str.strip, res)))
    # Output: 18,740,382
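
For completeness, a self-contained variant of the top-level text approach might look like the sketch below. The helper name top_level_text is hypothetical, "html.parser" is an assumption, and string=True is used as the newer spelling of text=True that recent Beautiful Soup releases accept. The final int() conversion is an extra step for the case where a numeric value is wanted.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the span from the question.
html = (
    '<span class="textLarge textWhite">'
    '<span class="textBold">FIT-FOR-SERVICE:</span> 18,740,382</span>'
)


def top_level_text(tag):
    """Join the tag's direct text nodes, ignoring text inside child tags."""
    strings = tag.find_all(string=True, recursive=False)
    return " ".join(s.strip() for s in strings if s.strip())


soup = BeautifulSoup(html, "html.parser")
outer = soup.find("span", class_="textLarge textWhite")

text = top_level_text(outer)
# Remove the thousands separators before converting to an integer.
number = int(text.replace(",", ""))

print(text)
# 18,740,382
print(number)
# 18740382
```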
