Python Beautiful Soup: finding the attribute content of a child tag


Source HTML:

<div class="wrapper">
    <div id="mask" style="display: none;"></div>
    <div id="video">
        <span id="pid" hidden="">2</span>
        <div poster="https://thumbs.vodgc.net/57377706F7D28069F41A23A14DC5CC64.jpg?673333" autoplay="true" data-setup="{ &quot;techOrder&quot;: [&quot;html5&quot;]}"
            preload="none" class="video-js vjs-default-skin vjs-controls-enabled vjs-workinghover vjs-has-started media_player-dimensions vjs-paused vjs-user-inactive"
            id="media_player" role="region" aria-label="video player">
            <video id="media_player_html5_api" class="vjs-tech" preload="none" data-setup="{ &quot;techOrder&quot;: [&quot;html5&quot;]}"
                autoplay="" src="blob:https://api.vodgc.net/5bb5a7a7-6c9b-49f1-883b-784871f95d8b">
                <source src="https://vod.vodgc.net/manifest/57377706F7D28069F41A23A14DC5CC64.m3u8" type="application/x-mpegURL">
            </video>
            <div>

Pass the source tag name to the find_all method, then read the src attribute of each match:

from bs4 import BeautifulSoup as soup

s = """
<div class="wrapper">
<div id="mask" style="display: none;"></div>
<div id="video">
    <span id="pid" hidden="">2</span>
    <div poster="https://thumbs.vodgc.net/57377706F7D28069F41A23A14DC5CC64.jpg?673333" autoplay="true" data-setup="{ &quot;techOrder&quot;: [&quot;html5&quot;]}"
        preload="none" class="video-js vjs-default-skin vjs-controls-enabled vjs-workinghover vjs-has-started media_player-dimensions vjs-paused vjs-user-inactive"
        id="media_player" role="region" aria-label="video player">
        <video id="media_player_html5_api" class="vjs-tech" preload="none" data-setup="{ &quot;techOrder&quot;: [&quot;html5&quot;]}"
            autoplay="" src="blob:https://api.vodgc.net/5bb5a7a7-6c9b-49f1-883b-784871f95d8b">
            <source src="https://vod.vodgc.net/manifest/57377706F7D28069F41A23A14DC5CC64.m3u8" type="application/x-mpegURL">
        </video>
        <div>
"""
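# Parse the markup with the lxml parser (requires the lxml package).
d = soup(s, 'lxml')
# Collect the src attribute of every <source> tag in the document.
print([i['src'] for i in d.find_all('source')])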
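
Since the question asks about a child tag specifically, the lookup can also be scoped to the parent video element instead of searching the whole document. The following is a minimal sketch under a few assumptions: the markup is trimmed to the relevant tags, the names html, doc, video, source, manifest_url, css_match and css_url are illustrative, and the built-in html.parser is used in place of lxml only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (only the tags relevant to the lookup).
html = """
<div id="video">
    <video id="media_player_html5_api" class="vjs-tech"
        src="blob:https://api.vodgc.net/5bb5a7a7-6c9b-49f1-883b-784871f95d8b">
        <source src="https://vod.vodgc.net/manifest/57377706F7D28069F41A23A14DC5CC64.m3u8" type="application/x-mpegURL">
    </video>
</div>
"""

# Assumption: html.parser instead of lxml, so no extra package is needed.
doc = BeautifulSoup(html, "html.parser")

# Step 1: locate the parent <video> tag by its id.
video = doc.find("video", id="media_player_html5_api")

# Step 2: search only inside that parent for the <source> child.
source = video.find("source") if video is not None else None

# Step 3: read the attribute with .get(), which returns None if it is missing.
manifest_url = source.get("src") if source is not None else None
print(manifest_url)

# The same lookup written as a CSS selector with the child combinator.
css_match = doc.select_one("video#media_player_html5_api > source")
css_url = css_match.get("src") if css_match is not None else None
```

Scoping the search to the parent keeps the result stable if the page later gains other source tags outside the player. Using .get() rather than indexing avoids a KeyError when a tag has no src attribute.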

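Comments:

Why not just use requests?
Sorry, this is a bit off-topic. I suggest looking it up with a dict rather than directly by id. It may be similar to this question.
I think the OP wants to get the same result directly from the web.
I know this is off-topic, but is importing BeautifulSoup as soup good practice?
@KeyurPotdar It only creates an alias for the BeautifulSoup class in the current namespace. It is entirely optional, but I think it makes creating BeautifulSoup objects shorter and easier.
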
Output of the answer's find_all snippet:

['https://vod.vodgc.net/manifest/57377706F7D28069F41A23A14DC5CC64.m3u8']
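
Two points from the comments can be illustrated in code: matching with an attribute dict, and what the soup alias actually is. This is only a sketch, not the answerer's method. The helper name extract_source_urls, the sample string, and the second source tag (fallback.mp4) are hypothetical additions for illustration, and html.parser is assumed in place of lxml.

```python
from bs4 import BeautifulSoup
from bs4 import BeautifulSoup as soup  # the alias used in the answer


def extract_source_urls(html, parser="html.parser"):
    """Return the src of every <source> tag whose type is an HLS manifest.

    Hypothetical helper; the parser default is an assumption.
    """
    doc = BeautifulSoup(html, parser)
    # Match on a dict of attributes instead of a keyword argument.
    wanted = {"type": "application/x-mpegURL"}
    return [
        tag["src"]
        for tag in doc.find_all("source", attrs=wanted)
        if tag.has_attr("src")  # skip tags that lack a src attribute
    ]


# The second <source> is hypothetical, added to show the dict filter at work.
sample = """
<video id="media_player_html5_api">
    <source src="https://vod.vodgc.net/manifest/57377706F7D28069F41A23A14DC5CC64.m3u8" type="application/x-mpegURL">
    <source src="fallback.mp4" type="video/mp4">
</video>
"""

urls = extract_source_urls(sample)
print(urls)

# The alias is just a second name bound to the same class object.
print(soup is BeautifulSoup)
```

The helper takes a plain string, so HTML downloaded from the web (for example the text attribute of a requests response) could be passed in the same way. Whether the live page serves the same markup as the snippet in the question is not something the thread confirms.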