POST with Python requests: how do I get the correct table data for the request?

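I am trying to get historical economic calendar data from this website, for the following dates (February 1, 2020 to February 5, 2020).

Today is February 4, 2020.

If I use the url below, I can extract the table with BeautifulSoup, but I cannot select any date other than the current one. In my Python script I have saved one table, for today (February 4, 2020).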

import requests
import pandas as pd
from bs4 import BeautifulSoup

payload = {"country[]":["25","32","6","37","72","22","17","39","14","10","35","43","56","36","110","11","26","12","4","5"],
                "dateFrom":"2020-02-01",
                "dateTo":"2020-02-05",
                "timeZone":"8",
                "timeFilter":"timeRemain",
                "currentTab":"custom",
                "limit_from":"0"}

urlheader = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}

url = "https://www.investing.com/economic-calendar/"

req = requests.post(url, data=payload, headers=urlheader)
print(req)
soup = BeautifulSoup(req.content, "lxml")
table = soup.find('table', id="economicCalendarData")

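The table variable looks like this:

I can see that whenever I change the date range or the filter settings, it sends a POST request to "".

This is the request data I found:

This is the POST link:

So I use the code below, because I want to choose the dates: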

import requests
import pandas as pd
from bs4 import BeautifulSoup

payload = {"country[]":["25","32","6","37","72","22","17","39","14","10","35","43","56","36","110","11","26","12","4","5"],
                "dateFrom":"2020-02-01",
                "dateTo":"2020-02-05",
                "timeZone":"8",
                "timeFilter":"timeRemain",
                "currentTab":"custom",
                "limit_from":"0"}

urlheader = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}

url = "https://www.investing.com/economic-calendar/Service/getCalendarFilteredData"

req = requests.post(url, data=payload, headers=urlheader)
print(req)
soup = BeautifulSoup(req.content, "lxml")
table = soup.find('table', id="economicCalendarData")

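But this time there is no economicCalendarData, so the table variable is empty. The soup variable does contain data, but there is no table in it.

This is the table I want to save:

As mentioned, with the url from my first script I only get the table data for the current day (February 4, 2020), no matter what dates I put in the payload (dateFrom, dateTo).

For some reason, when I post to the second url, the table comes back empty. The soup variable contains data, but it is not the data I asked for. What am I doing wrong? How can I save the table for the dates I choose?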
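As a side note on the payload itself, here is a minimal sketch of building the same form fields from date objects, so the range can be changed in one place. The helper name `build_payload` is hypothetical, the field names and default values are simply copied from the payload in the question, and nothing here verifies which values the server accepts. `urlencode(..., doseq=True)` is used only to show how a list value such as `country[]` is sent as repeated form keys, which is also how requests encodes a list passed through `data=`.

```python
from datetime import date
from urllib.parse import urlencode


def build_payload(date_from, date_to, countries):
    """Return form fields shaped like the payload in the question.

    Field names and the fixed values are copied from the question;
    they are assumptions about what the endpoint expects.
    """
    if date_from > date_to:
        raise ValueError("date_from must not be later than date_to")
    return {
        "country[]": list(countries),
        "dateFrom": date_from.isoformat(),  # e.g. "2020-02-01"
        "dateTo": date_to.isoformat(),
        "timeZone": "8",
        "timeFilter": "timeRemain",
        "currentTab": "custom",
        "limit_from": "0",
    }


payload = build_payload(date(2020, 2, 1), date(2020, 2, 5), ["25", "32"])

# A list value becomes one form key per element when encoded.
encoded = urlencode(payload, doseq=True)
print(encoded)
```

The resulting dictionary can be passed as `data=payload` to `requests.post` exactly as in the scripts above.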

You are really close. If I understand your requirement, the following should get you there:

import requests
from bs4 import BeautifulSoup

url = "https://www.investing.com/economic-calendar/Service/getCalendarFilteredData"

payload = {"country[]":["25","32","6","37","72","22","17","39","14","10","35","43","56","36","110","11","26","12","4","5"],
                "dateFrom":"2020-02-01",
                "dateTo":"2020-02-05",
                "timeZone":"8",
                "timeFilter":"timeRemain",
                "currentTab":"custom",
                "limit_from":"0"}

req = requests.post(url, data=payload, headers={
    "User-Agent":"Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest"
    })
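# The response body is JSON; the HTML table rows are stored under its "data" key.
soup = BeautifulSoup(req.json()['data'],"lxml")
# Print the text of every header and data cell, one list per table row.
for items in soup.select("tr"):
    data = [item.get_text(strip=True) for item in items.select("th,td")]
    print(data)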
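Since the question asks how to save the table, here is a minimal sketch of writing the parsed rows to CSV instead of printing them. It runs offline on a made-up fragment: `SAMPLE_HTML` is a hypothetical stand-in for `req.json()['data']` and the real markup may well differ. The helper names `rows_from_html` and `write_csv` are also hypothetical. The built-in `html.parser` is used here only so the sketch does not depend on lxml.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for req.json()['data']; not real site output.
SAMPLE_HTML = (
    "<tr><th>Time</th><th>Cur.</th><th>Event</th></tr>"
    "<tr><td>09:00</td><td>EUR</td><td>Sample event A</td></tr>"
    "<tr><td>10:30</td><td>USD</td><td>Sample event B</td></tr>"
)


def rows_from_html(fragment):
    """Return one list of cell texts per <tr> in the fragment."""
    soup = BeautifulSoup(fragment, "html.parser")
    rows = []
    for tr in soup.select("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.select("th,td")]
        if cells:  # skip rows that contain no cells
            rows.append(cells)
    return rows


def write_csv(rows, handle):
    """Write the rows to an open text file handle as CSV."""
    csv.writer(handle).writerows(rows)


rows = rows_from_html(SAMPLE_HTML)

# An in-memory buffer keeps the example free of file system side effects.
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To write a real file, open it with `open("calendar.csv", "w", newline="")` and pass that handle to `write_csv`; the file name is only an example.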

Hi, if you don't mind, could you explain the json['data'] part?

The request gives you JSON content. However, the table data you are interested in sits inside its data key, and that is what you need to process with BeautifulSoup.

You should add (and look at) the full list of request headers sent by your browser.

I only see the payload mentioned above. Where can I find them?

I thought the scrollbar in your "This is the POST link" screenshot was there to show more request headers, but it may be for the response headers, sorry.
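To make the json['data'] explanation concrete, here is a small offline sketch of what happens before BeautifulSoup gets involved. `SAMPLE_BODY` is a hypothetical response body: only the "data" key comes from the answer above, and the "extra" key is a placeholder invented for illustration. The helper name `extract_html` is hypothetical too.

```python
import json

# Hypothetical response body. Only the "data" key is taken from the answer;
# "extra" is a placeholder and not a real field of the service.
SAMPLE_BODY = json.dumps({
    "data": "<tr><td>09:00</td><td>EUR</td></tr>",
    "extra": "placeholder",
})


def extract_html(body_text, key="data"):
    """Decode a JSON body and return the HTML string stored under `key`."""
    parsed = json.loads(body_text)
    if not isinstance(parsed, dict) or key not in parsed:
        raise KeyError(f"JSON body has no {key!r} key")
    value = parsed[key]
    if not isinstance(value, str):
        raise TypeError(f"expected a string under {key!r}")
    return value


html_fragment = extract_html(SAMPLE_BODY)

# The value is an ordinary string of HTML, not a parsed table.
print(type(html_fragment).__name__)
print(html_fragment)
```

Calling `req.json()` in the answer performs the same decoding step as `json.loads` here, and the returned string is what gets handed to `BeautifulSoup`. This would also explain why searching the whole response for a table with id economicCalendarData found nothing: judging by the answer's loop over `tr` elements, the rows arrive as a fragment inside the JSON rather than as a full page.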