Python: scraping data from a website chart that only shows its values when you hover over it


python web-scraping


I have a website here: I need to scrape the information in its chart using Python. Here is the chart:

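When I hover the mouse over the chart, I can see the date and the flu incidence rate for that day. For example: 24 October, the rate is 1.04. I want to collect the latest date and rate from this chart every week. But when I inspected the page, I found that the date and rate only appear when I move the mouse over the chart. Can anyone help me get the data? I have looked at solutions to similar questions and tried several approaches, but I still can't get it. Thanks in advance for any help.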

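Because the chart's data is filled in by JavaScript, this is hard to do by scraping the rendered page. It is easier to request the data directly from the site and then parse it yourself.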

To find out which request contains the data you need, look at the Network tab in your browser's developer tools. Here is what I saw in Firefox:

From that, I can see that you can get the data by sending a POST request to this URL:

https://aspren.dmac.adelaide.edu.au/c/portal/render_portlet?p_l_id=20734&p_p_id=chartportlet_WAR_chartportlet_INSTANCE_j3c9CC2nTCP9&p_p_lifecycle=0&p_t_lifecycle=0&p_p_state=normal&p_p_mode=view&p_p_col_id=column-1&p_p_col_pos=8&p_p_col_count=11&p_p_isolated=1&currentURL=%2F

You can fetch this URL from a Python script with the requests library.
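The query string of that URL is what the `params` dictionary in the code below encodes. As a small sketch, the standard library's `urllib.parse` can split the URL into a base URL plus a parameter dictionary, which is handy if the portlet IDs change and you copy a new URL from the Network tab. The URL here is written with `p_p_isolated=1&currentURL=%2F`, matching the parameter names the code uses; `split_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlsplit

URL = (
    "https://aspren.dmac.adelaide.edu.au/c/portal/render_portlet"
    "?p_l_id=20734&p_p_id=chartportlet_WAR_chartportlet_INSTANCE_j3c9CC2nTCP9"
    "&p_p_lifecycle=0&p_t_lifecycle=0&p_p_state=normal&p_p_mode=view"
    "&p_p_col_id=column-1&p_p_col_pos=8&p_p_col_count=11"
    "&p_p_isolated=1&currentURL=%2F"
)


def split_url(url):
    """Split a URL into (base URL without query, dict of query parameters)."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qsl percent-decodes the values, so "%2F" becomes "/".
    params = dict(parse_qsl(parts.query))
    return base, params


if __name__ == "__main__":
    base, params = split_url(URL)
    print(base)
    for key, value in params.items():
        print(f"{key} = {value}")
```

When the dictionary is passed as `params=` to `requests.post`, requests encodes it back into the query string, so the request matches the one seen in the browser.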

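The request returns an HTML page containing a script that holds the data as a JavaScript object. We need to do some work to make that data readable from Python. Here is my solution: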
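The comments in the code below mention that an HTML parser is a sturdier alternative to slicing the whole page by string offsets. A minimal sketch of that idea, using only the standard library's `html.parser`, collects the text of every `<script>` element and keeps the first one containing `series:[` (the marker the answer's code searches for). `SAMPLE_HTML`, `ScriptCollector` and `find_chart_script` are hypothetical; the real portlet response will look different.

```python
from html.parser import HTMLParser

# Made-up stand-in for the portlet response, for illustration only.
SAMPLE_HTML = (
    "<html><body><div id='chart'></div>"
    "<script>var unrelated = 1;</script>"
    "<script>$(function(){$('#chart').highcharts({series:[{name:'x'}]})});</script>"
    "</body></html>"
)


class ScriptCollector(HTMLParser):
    """Collect the raw text inside every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the last one.
        if self._in_script:
            self.scripts[-1] += data


def find_chart_script(html):
    """Return the first script that contains the chart series, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for script in collector.scripts:
        if "series:[" in script:
            return script
    return None


if __name__ == "__main__":
    print(find_chart_script(SAMPLE_HTML))
```

The string slicing from the answer can then run on this one script instead of the whole page, which makes an accidental match elsewhere in the HTML less likely.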

from datetime import datetime, timezone
import json
import re

import requests


def fetch_chart_data():
    # Fetch the data from the remote server, and raise an error if there was a
    # network problem or the server returned an error status code.
    result = requests.post(
        "https://aspren.dmac.adelaide.edu.au/c/portal/render_portlet",
        params={
            "p_l_id": "20734",
            "p_p_id": "chartportlet_WAR_chartportlet_INSTANCE_j3c9CC2nTCP9",
            "p_p_lifecycle": "0",
            "p_t_lifecycle": "0",
            "p_p_state": "normal",
            "p_p_mode": "view",
            "p_p_col_id": "column-1",
            "p_p_col_pos": "8",
            "p_p_col_count": "11",
            "p_p_isolated": "1",
            "currentURL": "/",
        },
        timeout=30,  # give up instead of hanging if the server stops responding
    )
    result.raise_for_status()

    # Parse the HTML to get the JavaScript data object, and edit it to be valid
    # JSON. This way of doing it is liable to break if the structure of the
    # data or the HTML changes, but it works. Fancier ways of doing it involve
    # using HTML and JavaScript parsers.
    text = result.text
    # Skip past "series:" so the slice starts at the "[" of the series array.
    start = text.index("series:[{") + len("series:")
    chart_json = text[start : text.index("})})")]
    # Quote the bare JavaScript keys. Only match a key that is followed by ":",
    # so text inside string values (e.g. a series name) is left unchanged.
    chart_json = re.sub(r"\b(name|data|visible)\b(?=\s*:)", r'"\1"', chart_json)

    # Convert the JSON data into a Python object. The dates in the raw data
    # are Unix timestamps in milliseconds, so convert them into timezone-aware
    # datetime objects (datetime.utcfromtimestamp is deprecated since 3.12).
    chart_data = json.loads(chart_json)
    for disease_data in chart_data:
        for data_point in disease_data["data"]:
            data_point[0] = datetime.fromtimestamp(
                data_point[0] / 1000, tz=timezone.utc
            )

    return chart_data


if __name__ == "__main__":
    print(fetch_chart_data())
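Since the question asks for the newest date and rate each week, it can help to separate the parsing from the network request (so it can be tested offline) and then pick the most recent point of each series. The sketch below follows the same steps as `fetch_chart_data()` on a made-up `SAMPLE` string built to contain the `series:[{` and `})})` markers the answer relies on; the real response will differ. `parse_chart_text` and `latest_readings` are hypothetical names, and the sample timestamps are placeholders, not real data from the site.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical stand-in for the portlet response (placeholder values).
SAMPLE = (
    '<script>$(function(){$("#chart").highcharts({'
    'series:[{name:"ILI rate",data:[[1571270400000,0.98],[1571875200000,1.04]],visible:true}]'
    '})});</script>'
)


def parse_chart_text(text):
    """Turn the portlet HTML into a list of series dicts with datetime x-values."""
    start = text.index("series:[{") + len("series:")
    end = text.index("})})")
    chart_json = text[start:end]
    # Quote only bare keys followed by ":" so string values are untouched.
    chart_json = re.sub(r"\b(name|data|visible)\b(?=\s*:)", r'"\1"', chart_json)
    chart_data = json.loads(chart_json)
    for series in chart_data:
        for point in series["data"]:
            # Timestamps are milliseconds since the Unix epoch, in UTC.
            point[0] = datetime.fromtimestamp(point[0] / 1000, tz=timezone.utc)
    return chart_data


def latest_readings(chart_data):
    """Map each series name to its most recent (date, rate) pair."""
    latest = {}
    for series in chart_data:
        if series["data"]:
            date, rate = max(series["data"], key=lambda point: point[0])
            latest[series["name"]] = (date, rate)
    return latest


if __name__ == "__main__":
    for name, (date, rate) in latest_readings(parse_chart_text(SAMPLE)).items():
        print(f"{name}: {date:%d %B %Y} -> {rate}")
```

With the real site, the same parsing would apply to `result.text` inside `fetch_chart_data()`; running the script once a week (for example from cron or Windows Task Scheduler) and saving the latest reading would cover the weekly collection described in the question.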


Wow! This is amazing!! Thank you so much, Jack! And thanks to the Stack Overflow community! This is my first time posting on Stack Overflow, and I didn't expect such a quick reply.

@文庆武, you're welcome! If this was useful, please mark my answer as the accepted answer. :)