Python: Web scraping with BeautifulSoup when the table is not in the page source

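I am trying to extract data from the table on the following web page: http://ontariohockeyleague.com/stats/players/60

Here is the code I have written so far: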

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'http://ontariohockeyleague.com/stats/players/60'

#open webpage, read html, close webpage
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

#html parsing
page_soup = soup(page_html, "html.parser")
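The problem is that, as far as I can tell, the table is not actually contained in the HTML. When I inspect the page in the browser, the table sits inside this main block, but for whatever reason BeautifulSoup does not read it: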

page_soup.main
<main class="container">
<div class="container-content" data-feed_key="2976319eb44abe94" data-is-league="1" data-lang="en" data-league="ohl" data-league-code="" data-pagesize="100" data-season="63" id="stats"></div>
</main>
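If I view the page source, it does not contain the table either, only the main block above. I have also tried other parsers with BeautifulSoup, and they return the same result.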


How can I access the table?

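The table is rendered with JavaScript, so it does not appear in the initial HTML that urllib loads. You can either find the API the page uses and fetch the data from it directly, or use a headless browser to obtain the fully JavaScript-rendered HTML.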
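The static HTML itself hints at this: the `div` with `id="stats"` is empty, but its `data-*` attributes carry a feed key and other settings that the page's script presumably reads. A minimal sketch of checking this, using the `<main>` block from the question as input instead of a live request (the interpretation of the attributes as script configuration is an assumption):

```python
from bs4 import BeautifulSoup

# The <main> block as returned by page_soup.main in the question.
html = '''<main class="container">
<div class="container-content" data-feed_key="2976319eb44abe94" data-is-league="1" data-lang="en" data-league="ohl" data-league-code="" data-pagesize="100" data-season="63" id="stats"></div>
</main>'''

soup = BeautifulSoup(html, "html.parser")
stats_div = soup.find("div", id="stats")

# No child nodes: the table is not in the static HTML at all.
is_empty = len(stats_div.contents) == 0

# Collect the data-* attributes, dropping the "data-" prefix.
# Assumption: these are the settings the page's JavaScript uses.
config = {
    name[len("data-"):]: value
    for name, value in stats_div.attrs.items()
    if name.startswith("data-")
}

print(is_empty)
print(config["feed_key"], config["league"], config["lang"])
```

If `is_empty` is true and the attributes look like configuration, that is a good sign the data arrives through a separate request.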

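Judging from the network inspector, the page appears to load its data dynamically from http://lscluster.hockeytech.com/feed/ in JSON format. To get any data, you first need to obtain the key from the main site. An example follows (the data is stored in the variables seasons_data, teamsbyseason_data and statviewtype_data):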
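A request along these lines might look like the following minimal sketch. It assumes the feed accepts, as query-string parameters, the same names it echoes back in the `Parameters` block of its response; that has not been verified here. `build_feed_url` and `fetch_feed` are hypothetical helper names, and the key is hardcoded from the `data-feed_key` attribute shown in the question rather than scraped from the main site.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

FEED_URL = "http://lscluster.hockeytech.com/feed/"  # endpoint named in the answer


def build_feed_url(key, view, season_id=60, **extra):
    """Build a feed URL (hypothetical helper).

    ASSUMPTION: the query parameter names match the 'Parameters'
    block that the feed echoes back in its JSON response.
    """
    params = {
        "feed": "modulekit",
        "view": view,
        "key": key,
        "fmt": "json",
        "client_code": "ohl",
        "lang": "en",
        "season_id": season_id,
    }
    params.update(extra)
    return FEED_URL + "?" + urlencode(params)


def fetch_feed(url):
    """Download the URL and decode the JSON body (performs a network request)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


# Key taken from the data-feed_key attribute in the question's HTML.
key = "2976319eb44abe94"

statviewtype_url = build_feed_url(
    key, "statviewtype",
    type="topscorers", first=0, limit=100, sort="active", stat="all",
)
print(statviewtype_url)
```

Calling `statviewtype_data = fetch_feed(statviewtype_url)` would then perform the request, and `print(json.dumps(statviewtype_data, indent=4, sort_keys=True))` is one way to display the result in an indented, key-sorted form.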

This prints:

{
    "SiteKit": {
        "Copyright": {
            "powered_by": "Powered by HockeyTech.com",
            "powered_by_url": "http://hockeytech.com",
            "required_copyright": "Official statistics provided by Ontario Hockey League",
            "required_link": "http://leaguestat.com"
        },
        "Parameters": {
            "client_code": "ohl",
            "feed": "modulekit",
            "first": "0",
            "fmt": "json",
            "key": "2976319eb44abe94",
            "lang": "en",
            "lang_id": 1,
            "league_code": "",
            "league_id": "1",
            "limit": "100",
            "order_direction": "",
            "season_id": 60,
            "sort": "active",
            "stat": "all",
            "team_id": 0,
            "type": "topscorers",
            "view": "statviewtype"
        },

... and so on...
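Since the rest of the response is elided, a cautious way to work with it is to confirm the echoed request parameters and then list whatever other keys sit under `SiteKit`, rather than guessing their names. A small sketch on a trimmed copy of the output above (the trimmed `sample` string stands in for a real response):

```python
import json

# Trimmed copy of the response above; a real response contains more keys.
sample = '''{
    "SiteKit": {
        "Copyright": {"powered_by": "Powered by HockeyTech.com"},
        "Parameters": {
            "client_code": "ohl",
            "season_id": 60,
            "type": "topscorers",
            "view": "statviewtype"
        }
    }
}'''

data = json.loads(sample)
site_kit = data["SiteKit"]

# Confirm the feed understood the request as intended.
params = site_kit["Parameters"]
print(params["view"], params["type"], params["season_id"])

# Any key besides these two is a candidate for the actual payload.
payload_keys = [k for k in site_kit if k not in ("Copyright", "Parameters")]
print(payload_keys)
```

On a full response, `payload_keys` shows where to look next without assuming a particular key name.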

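Thanks for the reply. I'm on the right track now. This solution is great because it gives access to all of the player data, not just what is shown in the table.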