Python 3.x: Scraping data from a web page with Python 3


python-3.x web-scraping


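I'm following the same web-scraping pattern I just learned, but I can't get the script below to scrape anything. I keep getting an empty result, even though I know the tags are there. I want to find all the "mubox" elements and then extract the O/U values and the goalie information. This is so strange. What am I missing?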

from bs4 import BeautifulSoup
import requests
import pandas as pd

page_link = 'https://www.thespread.com/nhl-scores-matchups'

page_response = requests.get(page_link, timeout=10)

# here, we fetch the content from the url, using the requests library
page_content = BeautifulSoup(page_response.content, "html.parser")

# Find every <div> whose class is "mubox"
tables = page_content.find_all("div", class_="mubox")

print(tables)

# Iterate through rows
rows = []


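This site loads its data from an internal API before rendering it. The API returns an XML file that contains all the matchup information, and you can parse it with Beautiful Soup: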

from bs4 import BeautifulSoup
import requests

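# XML feed behind the page; the file name appears to encode the date
page_link = 'https://www.thespread.com/matchups/NHL/matchup-list_20181030.xml'
page_response = requests.get(page_link, timeout=10)
# "lxml" is lxml's HTML parser, which lowercases tag names,
# hence the lowercase names used in the lookups below
body = BeautifulSoup(page_response.content, "lxml")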

data = [
    (
        t.find("road").text, 
        t.find("roadgoalie").text, 
        t.find("home").text,
        t.find("homegoalie").text,
        float(t.find("ot").text),
        float(t.find("otmoney").text),
        float(t.find("ft").text),
        float(t.find("ftmoney").text)
    )
    for t in body.find_all('event')
]

print(data)
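
One caveat about the list comprehension above: it assumes every event carries all eight tags with numeric text. If a tag is missing, `t.find(...)` returns `None`, and an empty tag makes `float("")` raise. Here is a minimal sketch of a more defensive version. The sample XML is made up for illustration (only the tag names come from the code above), and `text_of` and `number_of` are hypothetical helpers, not part of Beautiful Soup. It uses the built-in "html.parser" so that it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Made-up sample standing in for the real feed; only the tag names
# are taken from the answer above, the values are invented.
SAMPLE_XML = """
<events>
  <event>
    <road>Team A</road><roadgoalie>Goalie A</roadgoalie>
    <home>Team B</home><homegoalie>Goalie B</homegoalie>
    <ot>5.5</ot><otmoney>-110</otmoney>
  </event>
  <event>
    <road>Team C</road>
    <home>Team D</home><homegoalie></homegoalie>
    <ot></ot>
  </event>
</events>
"""


def text_of(event, name):
    """Return the stripped text of a child tag, or None if missing or empty."""
    tag = event.find(name)
    if tag is None:
        return None
    text = tag.get_text(strip=True)
    return text or None


def number_of(event, name):
    """Return the child tag's text as a float, or None if it is not numeric."""
    text = text_of(event, name)
    try:
        return float(text) if text is not None else None
    except ValueError:
        return None


body = BeautifulSoup(SAMPLE_XML, "html.parser")

# One dict per event; absent or empty values become None instead of raising.
rows = [
    {
        "road": text_of(e, "road"),
        "road_goalie": text_of(e, "roadgoalie"),
        "home": text_of(e, "home"),
        "home_goalie": text_of(e, "homegoalie"),
        "ot": number_of(e, "ot"),
        "ot_money": number_of(e, "otmoney"),
    }
    for e in body.find_all("event")
]
print(rows)
```

Since the question already imports pandas, a list of dicts like `rows` can be passed straight to `pd.DataFrame(rows)` to get one column per key.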

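This is a great technique. I have a couple of related questions. Did you use the developer tools to find the API? And how do I move up and down through the tags? For example, I'd like to start at one tag and then move down to the .text of a tag nested inside it.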
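
On moving up and down through the tags, here is a minimal sketch, assuming the same Beautiful Soup setup as in the answer. The sample XML is invented (only the tag names follow the answer's code), and it uses the built-in "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Invented sample; tag names follow the answer's code, values are placeholders.
SAMPLE_XML = """
<events>
  <event>
    <road>Team A</road>
    <home>Team B</home>
  </event>
</events>
"""

soup = BeautifulSoup(SAMPLE_XML, "html.parser")

# Moving down: start at a parent tag and search inside it.
event = soup.find("event")
road = event.find("road")
print(road.text)

# Moving sideways: the next sibling tag with a given name.
home = road.find_next_sibling("home")
print(home.text)

# Moving up: every tag knows its parent.
print(home.parent.name)

# Direct child tags only (recursive=False skips deeper descendants).
child_names = [child.name for child in event.find_all(True, recursive=False)]
print(child_names)
```

Calling `find` on a tag rather than on the whole soup limits the search to that tag's descendants, which is usually what "moving down" amounts to in practice.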