Python: scraping data from a chart on a website where the values only appear on mouse hover

Tags: python, web-scraping

I have a website with a chart, and I need to scrape the chart's information with Python.

When I hover the mouse over the chart, I can see the date and the influenza rate for that day. For example: on 24 October the rate is 1.04. I want to collect the latest date and rate from this chart every week. But when I inspected the page, I found that the date and rate only show up when I move the mouse over the chart. Can anyone help me get the data? I have looked at solutions to similar questions and tried several approaches, but I still can't get it. Thanks in advance for any help.

Because the data is updated via JavaScript, this is hard to do by scraping the rendered page. Instead, it is easier to fetch the data directly from the site and then parse it.

To find out which request contains the data you need, look at the Network tab of your browser's developer tools. Looking there in Firefox, I see that you can get the data by sending a POST request to the URL
https://aspren.dmac.adelaide.edu.au/c/portal/render_portlet?p_l_id=20734&p_p_id=chartportlet_WAR_chartportlet_INSTANCE_j3c9CC2nTCP9&p_p_lifecycle=0&p_t_lifecycle=0&p_p_state=normal&p_p_mode=view&p_p_col_id=column-1&p_p_col_pos=8&p_p_col_count=11&p_p_isolated=1&currentURL=%2F
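You can use the requests library to fetch this URL from a Python script.

The request returns an HTML page containing a script that holds the data as a JavaScript object. We need to do a bit of work to make that data readable from Python. Here is my solution: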
from datetime import datetime, timezone
import json

import requests


def fetch_chart_data():
    # Fetch the data from the remote server, and raise an error if there was a
    # network problem or the server returned an error status code.
    result = requests.post(
        "https://aspren.dmac.adelaide.edu.au/c/portal/render_portlet",
        params={
            "p_l_id": "20734",
            "p_p_id": "chartportlet_WAR_chartportlet_INSTANCE_j3c9CC2nTCP9",
            "p_p_lifecycle": "0",
            "p_t_lifecycle": "0",
            "p_p_state": "normal",
            "p_p_mode": "view",
            "p_p_col_id": "column-1",
            "p_p_col_pos": "8",
            "p_p_col_count": "11",
            "p_p_isolated": "1",
            "currentURL": "/",
        },
        timeout=30,  # do not hang forever if the server stops responding
    )
    result.raise_for_status()

    # Parse the HTML to get the JavaScript data object, and edit it to be valid
    # JSON. This way of doing it is liable to break if the structure of the
    # data or the HTML changes, but it works. Fancier ways of doing it involve
    # using HTML and JavaScript parsers.
    text = result.text
    # "series:" is 7 characters long, so the slice starts at the opening "[".
    chart_json = text[text.index("series:[{") + 7 : text.index("})})")]
    # Note: this quotes every occurrence of these words, so it would corrupt
    # the data if one of them also appeared inside a series name.
    for key in ("name", "data", "visible"):
        chart_json = chart_json.replace(key, '"' + key + '"')

    # Convert the JSON data into a Python object. The dates in the raw data
    # are in Unix timestamp format (in thousandths of a second), so convert
    # them into timezone-aware (UTC) Python datetime objects.
    chart_data = json.loads(chart_json)
    for disease_data in chart_data:
        for data_point in disease_data["data"]:
            data_point[0] = datetime.fromtimestamp(
                data_point[0] // 1000, tz=timezone.utc
            )
    return chart_data


if __name__ == "__main__":
    print(fetch_chart_data())
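
The string-slicing step can be tried offline before pointing it at the live site. The sketch below is only an illustration: SAMPLE_HTML is a made-up stand-in for the response (the real page layout, the drawChart wrapper, and the series name "ILI rate" are all assumptions), and extract_series is a hypothetical helper, not part of the solution above. It quotes a key only where the key follows "{" or "," and is followed by ":", which is somewhat less likely to touch text inside a series name than a plain replace, though it is still not a real JavaScript parser.

```python
import json
import re

# Hypothetical sample of the embedded script; the real response may differ.
SAMPLE_HTML = (
    '<script>$(function(){drawChart({'
    'series:[{name:"ILI rate",'
    'data:[[1445644800000,1.04],[1446249600000,0.98]],'
    'visible:true}]'
    '})})</script>'
)


def extract_series(text):
    """Cut the JavaScript `series` array out of the page and parse it as JSON."""
    # Start at the "[" that follows "series:" and stop before the closing "})})".
    start = text.index("series:[{") + len("series:")
    end = text.index("})})", start)
    raw = text[start:end]
    # Quote bare object keys so that json.loads accepts the text.
    quoted = re.sub(r'([{,])\s*(name|data|visible)\s*:', r'\1"\2":', raw)
    return json.loads(quoted)


if __name__ == "__main__":
    for series in extract_series(SAMPLE_HTML):
        print(series["name"], series["data"])
```

If the real page uses other keys or single-quoted strings, the pattern would need adjusting; checking the raw response text first is the safest way to find out.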

Wow! This is amazing!! Thank you so much, Jack! Thanks to the Stack Overflow community! This is my first post on Stack Overflow and I didn't expect to get a reply this fast.

@文庆武, you're welcome! If this was useful, please mark my answer as accepted. :)
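
For the weekly collection the question asks about, the most recent point of each series can be picked out of the parsed data. This is a minimal sketch, assuming the data has the shape produced by the answer's code: a list of dicts with a "name" and a "data" list of [datetime, rate] pairs. The helper name latest_points, the sample values, and the assumption that missing rates show up as None are all illustrative.

```python
from datetime import datetime


def latest_points(chart_data):
    """Return {series name: (date, rate)} for the newest non-empty point."""
    latest = {}
    for series in chart_data:
        # Skip points without a value (assumed to appear as None).
        points = [p for p in series["data"] if p[1] is not None]
        if points:
            date, rate = max(points, key=lambda p: p[0])
            latest[series["name"]] = (date, rate)
    return latest


if __name__ == "__main__":
    # Made-up sample in the assumed shape of the parsed chart data.
    sample = [
        {
            "name": "ILI rate",
            "data": [
                [datetime(2015, 10, 17), 0.98],
                [datetime(2015, 10, 24), 1.04],
                [datetime(2015, 10, 31), None],
            ],
            "visible": True,
        },
        {"name": "Empty series", "data": [], "visible": False},
    ]
    for name, (date, rate) in latest_points(sample).items():
        print(name, date.date(), rate)
```

Running a script like this once a week from a scheduler such as cron, with the sample replaced by the fetched data, would be one way to build up the history.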