Python: How to get specific data using BeautifulSoup

python python-3.x

I'm not sure how to get a specific result out of this:

<div class="videoPlayer">
    <div class="border-radius-player">
        <div id="allplayers" style="position:relative;width:100%;height:100%;overflow: hidden;">
            <div id="box">
                <div id="player_content" class="todo" style="text-align: center; display: block;">
                     <div id="player" class="jwplayer jw-reset jw-skin-seven jw-state-paused jw-flag-user-inactive" tabindex="0">
                         <div class="jw-media jw-reset">
                              <video class="jw-video jw-reset" x-webkit-playsinline="" src="https:EXAMPLE-URL-HERE" preload="metadata"></video>
                         </div>

My code (shown at the bottom) returns part of the content, but it does not get down to the `video` element.

You are using `urllib.request`. It is a simple HTTP client and cannot execute JavaScript.

However, you still have three options to try:

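Look through the HTML source (`b`) and see whether any JavaScript on the site already contains the data you need. Usually the URL (which I assume is what you want to scrape) is somewhere on the page, inside some kind of holder (JavaScript code or a JSON object), and you can scrape it from there.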

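Look at the site's XHR requests and see whether any of them query an external source for the video data. If so, see whether you can imitate that request to get the data you need.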

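(Last resort) Use PhantomJS + Selenium to load the site in a browser, so that its JavaScript actually runs. You can learn more about how to use Selenium in this post: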
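As a rough illustration of the first option above, the sketch below searches the inline `<script>` blocks of a page for a video URL. The sample HTML, the `"file"` key, the regular expression, and the helper name `find_video_urls` are all assumptions made for illustration; a real site will embed the URL in its own way, so the pattern would need adjusting.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical page: the <video> tag is only created by JavaScript at runtime,
# but the URL already sits in an inline script as part of a player config.
SAMPLE_HTML = """
<div class="videoPlayer"><div id="box"></div></div>
<script>
  var playerConfig = {"file": "https://example.com/media/clip.mp4", "preload": "metadata"};
</script>
"""

# Assumed pattern: a "file" key followed by a quoted http(s) URL.
FILE_URL = re.compile(r'"file"\s*:\s*"(https?://[^"]+)"')


def find_video_urls(html):
    """Return every URL matched by FILE_URL inside inline <script> blocks."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for script in soup.find_all("script"):
        # .string is None for scripts without an inline body (e.g. src=...)
        urls.extend(FILE_URL.findall(script.string or ""))
    return urls


print(find_video_urls(SAMPLE_HTML))
```

If this returns an empty list on the real page, the URL is probably fetched separately, which points towards the XHR option instead.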

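`urllib.request` only downloads the static page and cannot process JavaScript. Can you do a simple string search in `b` to make sure the element you need actually exists in the HTML?

It isn't there; the HTML only goes down to the `box` div, but I thought BeautifulSoup would be able to handle that.

Try:

for video_class in soup.select("div.videoPlayer video.jw-video.jw-reset"):
    print(video_class.attrs['src'])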
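To make the check from the comments concrete, here is a minimal sketch that runs the suggested selector against two hypothetical HTML strings: one resembling what a browser holds after the player script has run, and one resembling what a static client receives. Both strings and the helper name `video_sources` are assumptions for illustration. The point is that the selector can only match when the `<video>` element is present in the downloaded markup.

```python
from bs4 import BeautifulSoup

SELECTOR = "div.videoPlayer video.jw-video.jw-reset"

# Hypothetical markup after the player script has run in a browser.
RENDERED = """
<div class="videoPlayer"><div id="box"><div id="player">
  <video class="jw-video jw-reset" src="https://example.com/clip.mp4"></video>
</div></div></div>
"""

# Hypothetical markup as a static client receives it: the tree stops at "box".
STATIC = '<div class="videoPlayer"><div id="box"></div></div>'


def video_sources(html):
    """Return the src attribute of every element matching SELECTOR."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.attrs["src"] for tag in soup.select(SELECTOR) if tag.has_attr("src")]


for name, html in (("rendered", RENDERED), ("static", STATIC)):
    # Plain string search first, then the CSS selector.
    print(name, "<video" in html, video_sources(html))
```

When the string search on the downloaded HTML finds no `<video` at all, no selector will help, and one of the three options from the answer is needed.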

import urllib.request
from bs4 import BeautifulSoup

url = "https://someurlhere"

# Send a custom User-Agent header to prevent a "Permission denied" response
a = urllib.request.Request(url, headers={'User-Agent': "Cliqz"})
b = urllib.request.urlopen(a)

soup = BeautifulSoup(b, 'html.parser')

for video_class in soup.select("div.videoPlayer"):
    print(video_class.text)