Python web scraping with requests: POST of form data does not work

I am trying to use requests.Session to post input data to a form, and it returns a 500 status. I expect to see the search results that the form retrieves.

Thanks to help from Bertrand Martel, I was able to solve my earlier login problem involving the __RequestVerificationToken and cookies. The next step in my process is to get the search page, which I can do successfully. Now, when I try to post data to the date fields on the form (which make up the search criteria), it fails. It works when I fill in the form manually and press submit. It all seems very simple to me, but I am not sure why it does not work. Is it still a cookie problem? Any help would be appreciated.

Here is my code:
import requests
from bs4 import BeautifulSoup
EMAIL = 'myemail@gmail.com'
PASSWORD = 'somepwd'
LOGIN_URL = 'https://www.idocmarket.com/Security/LogOn'
SEARCH_URL = 'https://www.idocmarket.com/RIOCO/Document/Search'
s = requests.Session()
s.get(LOGIN_URL)
result = s.post(LOGIN_URL, data={
    "Login.Username": EMAIL,
    "Login.Password": PASSWORD
})
soup = BeautifulSoup(result.text, "html.parser")
# Report successful login
print("Login succeeded: ", result.ok)
print("Status code:", result.status_code)
result = s.get(SEARCH_URL)
auth_token = soup.find("input", {'name': '__RequestVerificationToken'}).get('value')
print('auth token:', auth_token )
print("Get Search succeeded: ", result.ok)
print("Get Search Status code:", result.status_code)
result = s.post(SEARCH_URL, data={
    "__RequestVerificationToken": auth_token,
    "StartRecordDate": "03/01/2019",
    "EndRecordDate": "03/31/2019",
    "StartDocNumber": "",
    "EndDocNumber": "",
    "Book": "",
    "Page": "",
    "Instrument": "",
    "InstrumentGroup": "",
    "PartyType": "Either",
    "PartyMatchType": "Contains",
    "PartyName": "",
    "Subdivision": "",
    "StartLot": "",
    "EndLot": "",
    "Block": "",
    "Section": "",
    "Township": "",
    "Range": "",
    "Legal": "",
    "CountyKey": "RIOCO"
})
print("post Dates succeeded: ", result.ok)
print("post Dates Status code:", result.status_code)
print(result.text)
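This time, it seems the POST needs the XSRF token along with all of the existing form parameters. A simple solution is to collect every input value from the search page and pass them all to the request: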
import requests
from bs4 import BeautifulSoup
LOGIN_URL = 'https://www.idocmarket.com/Security/LogOn'
SEARCH_URL = 'https://www.idocmarket.com/RIOCO/Document/Search'
EMAIL = 'myemail@gmail.com'
PASSWORD = 'somepwd'
s = requests.Session()
s.get(LOGIN_URL)
r = s.post(LOGIN_URL, data={
    "Login.Username": EMAIL,
    "Login.Password": PASSWORD
})
if r.status_code == 200:
    # Fetch the search page so the token and defaults come from that form
    r = s.get(SEARCH_URL)
    soup = BeautifulSoup(r.text, "html.parser")
    payload = {}
    for input_item in soup.select("input"):
        if input_item.has_attr('name'):
            # Inputs without a value attribute are sent as empty strings
            payload[input_item["name"]] = input_item.get("value", "")
    # Override the search criteria
    payload["StartRecordDate"] = '09/01/2019'
    payload["EndRecordDate"] = '09/30/2019'
    r = s.post(SEARCH_URL, data=payload)
    soup = BeautifulSoup(r.text, "html.parser")
    print(soup)
else:
    print("authentication failure")
The payload can also be built with a list comprehension:
temp_pl = [
    (t['name'], t.get('value', ''))
    for t in soup.select("input")
    if t.has_attr('name')
]
payload = dict(temp_pl)
payload["StartRecordDate"] = '09/01/2019'
payload["EndRecordDate"] = '09/30/2019'
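To try the "collect every input, then override" idea without logging in, here is a minimal offline sketch. It swaps BeautifulSoup for the standard library's html.parser so it runs with no extra dependencies. The names FormInputCollector, build_payload and SAMPLE_HTML are hypothetical, and the sample form is made up for illustration; it is not the real markup of the search page.

```python
from html.parser import HTMLParser


class FormInputCollector(HTMLParser):
    """Collect name/value pairs from every named <input> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # A missing value attribute is treated as an empty string.
            self.fields[name] = attributes.get("value") or ""


def build_payload(html, overrides):
    """Return the form defaults found in html, updated with overrides."""
    collector = FormInputCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    payload.update(overrides)
    return payload


# Made-up form for illustration only (an assumption, not the site's HTML).
SAMPLE_HTML = """
<form method="post">
  <input name="__RequestVerificationToken" type="hidden" value="abc123">
  <input name="StartRecordDate" type="text">
  <input name="EndRecordDate" type="text" value="">
  <input type="submit" value="Search">
</form>
"""

payload = build_payload(SAMPLE_HTML, {
    "StartRecordDate": "09/01/2019",
    "EndRecordDate": "09/30/2019",
})
print(payload)
```

In the real script, the HTML passed to build_payload would be the text of the response from s.get(SEARCH_URL), so the token comes from the same page that is being posted to.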
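Thank you, Bertrand. I had just come to the same conclusion and updated my code. Still, your solution for supplying the form input fields is much more elegant. Thanks again for your help.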