Python authenticated web scraping problem


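I don't understand why my session still isn't logged in after I've done all of this.

I checked whether I was missing any information from the form, such as a hidden token, but it looks like the page doesn't even have a form.

Can anyone point me in the right direction? Thanks a lot in advance.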

import requests
from bs4 import BeautifulSoup

# Placeholders: replace with your real credentials.
username = 'myUserName'
password = 'myPassword'

scrape_url = 'https://ags.aspengrove.net/Property/PropertySummary.aspx?PropertyID=1366919'

login_url = 'https://ags.aspengrove.net/Library/Security/Login.aspx?ReturnUrl=%2fIndex.aspx'
# Only the two visible fields are sent here.
login_info = {'ctl01$MainContent$tbxPerson': username, 'ctl01$MainContent$tbxPassword': password}

# Start a session so cookies persist between requests.
session = requests.Session()

# Log in.
r = session.post(url=login_url, data=login_info)

# Request the page you want to scrape.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"
}

response = session.get(url=scrape_url, headers=headers)

soup = BeautifulSoup(response.content, 'html.parser')

print(r.status_code)

# Print the text of every table cell on the scraped page.
for td in soup.find_all('td'):
    print('\n\n\n')
    print('text: ' + td.text)

session.close()
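Printing the POST's status code on its own says little about whether the login worked: requests follows redirects by default, so if the server sends a failed login (or the later GET) back to Login.aspx, you can still see 200. A rough check, assuming (not confirmed for this site) that an unauthenticated request ends up on the login page, which contains the password input, is to look at the final URL and the HTML of a response. looks_like_login_page is a hypothetical helper name:

```python
def looks_like_login_page(final_url, html):
    """Heuristic: True if a response appears to be the login form again.

    Assumption: an unauthenticated request lands on Login.aspx, whose HTML
    contains the ctl01$MainContent$tbxPassword input.
    """
    # requests exposes the URL after any redirects as response.url
    if "Login.aspx" in final_url:
        return True
    # Fallback: look for the password field name from the login page
    return 'name="ctl01$MainContent$tbxPassword"' in html
```

Call it with a response's url and text attributes; True after the GET suggests the session was never authenticated.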

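The best way to start investigating a page like this is to simply log in once yourself and see what POST request it makes. In Chrome, the POST data shows the following fields: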

ctl01_ctlScriptManager_HiddenField: 
__EVENTTARGET: ctl01$MainContent$btnLogin
__EVENTARGUMENT: 
__VIEWSTATE__: RU7PS8MwGKVZQ91AcitebHPYYQOVzR842M2hFx2MKl5L2nztwtJEk9S5v17TMvHyvsd773u8n4CcBTjJ85VWzmhpM/hshYGNtu6BlbtnOOR5HOC0dHI2H6+ZUF0SlBuX210GDTQFmDUQPLqc3y9mi+ubu1sSh8noSRjrXnQtFMYVkxaS0wwqMEaoesNq4DGqiAc06DH0GPb8BAWkN2OURO/CikJCzb0VoxR5Ev2RYf9yDHcdAel+wjf4dvji0a809KBbQ6FhQlLGuQFrKVOcfjBr99pwWoDU+yvOjyuC/550AF7GvTAk3UkirUopyh0+N+Bao+ikcOqVfUG+6uSJ2wo7nS75Lw==
__VIEWSTATE: 
__EVENTVALIDATION: Z+yHsUlIzPcsXdpj1bBqQkEDPqzzZPfBKwo/SI3nW5r4vyVU240IulzvcQOvQ5FLpkCLPwPUhdDRs0dGzhW3VQyWQjAjktxQ6FbmHS6dY0bEhbG6hkPAIxF3rEfHyQpnmuCflUGUC0HWxtr8LNx1oiUzOSrdrMhLuCLvWi01mvoc7vnsES6K97wbg1AUfun/Z2062CHFXbUcQYyr1KBLwVs13Y6FWr+e3Ruyb5EaftqQOSbtSRg8ZP1zE1aj05qY4tWBlG7hCIfl00xq6n6Zv0q6p9WrbkPdUv6/Gw==
ctl01$TimeOffset: 
ctl01$MainContent$hidPassExpression: /^.*(?=.*\d)(?=.*[a-z])(?=.*[@#$%^*!_=?:|,()-]).*$/
ctl01$MainContent$hidPassLength: 8
ctl01$MainContent$hidPassCode: 
ctl01$MainContent$tbxPerson: abcdef@efghij.com
ctl01$MainContent$tbxPassword: a@@@@@@@@1
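This is an ASP.NET page, so there is a lot to collect. The proper approach is to look at the whole login page and match up the elements. A quick (but dirty) way to find out what the fields look like is to have bs4 grab all the input tags: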

import bs4
import requests

headers = {"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"}

# Fetch the login page (no credentials yet) and print every <input> tag on it.
r = requests.get("https://ags.aspengrove.net/Library/Security/Login.aspx?ReturnUrl=%2fProperty%2fPropertySummary.aspx%3fPropertyID%3d1366919&PropertyID=1366919", headers=headers)
soup = bs4.BeautifulSoup(r.text, "html.parser")
itags = soup.find_all(name="input")
for tag in itags:
    print(tag)
The output looks like:

<input id="ctl01_ctlScriptManager_HiddenField" name="ctl01_ctlScriptManager_HiddenField" type="hidden" value=""/>
<input id="__EVENTTARGET" name="__EVENTTARGET" type="hidden" value=""/>
<input id="__EVENTARGUMENT" name="__EVENTARGUMENT" type="hidden" value=""/>
<input id="__VIEWSTATE__" name="__VIEWSTATE__" type="hidden" value="RU7PS8MwGKVZQ91AcitebHPYYQOVzR842M2hFx2MKl5L2nztwtJEk9S5v17TMvHyvsd773u8n4CcBTjJ85VWzmhpM/hshYGNtu6BlbtnOOR5HOC0dHI2H6+ZUF0SlBuX210GDTQFmDUQPLqc3y9mi+ubu1sSh8noSRjrXnQtFMYVkxaS0wwqMEaoesNq4DGqiAc06DH0GPb8BAWkN2OURO/CikJCzb0VoxR5Ev2RYf9yDHcdAel+wjf4dvji0a809KBbQ6FhQlLGuQFrKVOcfjBr99pwWoDU+yvOjyuC/550AF7GvTAk3UkirUopyh0+N+Bao+ikcOqVfUG+6uSJ2wo7nS75Lw=="/>
<input id="__VIEWSTATE" name="__VIEWSTATE" type="hidden" value=""/>
<input id="__EVENTVALIDATION" name="__EVENTVALIDATION" type="hidden" value="Nw7wmof2VXeD0/HsHnbqEV3JYs/jUm1FUFYbO2NxwJVUOXSdi+ulpjvZ501wLkSCJVkUlTOMNkaCw9d+fr74I9lkObY9N2zwbqbcEcac6af8hP5vblYExcMszLJNqOrAuNPqRUjsV91y5/PPekrgOuvM1O1ep5kvpzMfljrCLngSTNYbU9iEruOYL29RwQPz4+521uAjowigFf7fCEYTaqfuJZrML5WYNKW7eu7KxyxeEXpjG1K+Ufxxs7X1PTU3XoYw+qkUYp1RexvoCgdFlCkbZstCiOpU8PI5TA=="/>
<input id="ctl01_TimeOffset" name="ctl01$TimeOffset" type="hidden"/>
<input id="ctl01_MainContent_hidPassExpression" name="ctl01$MainContent$hidPassExpression" type="hidden" value="/^.*(?=.*\d)(?=.*[a-z])(?=.*[@#$%^*!_=?:|,()-]).*$/"/>
<input id="ctl01_MainContent_hidPassLength" name="ctl01$MainContent$hidPassLength" type="hidden" value="8"/>
<input id="ctl01_MainContent_hidPassCode" name="ctl01$MainContent$hidPassCode" type="hidden"/>
<input class="TextBox" id="ctl01_MainContent_tbxPerson" name="ctl01$MainContent$tbxPerson" size="50" type="text"/>
<input class="TextBox" id="ctl01_MainContent_tbxPassword" name="ctl01$MainContent$tbxPassword" size="30" type="password"/>
<input class="button" id="ctl01_MainContent_btnLogin" name="ctl01$MainContent$btnLogin" onclick="this.disabled=true; this.value = 'Logging In';__doPostBack('ctl01$MainContent$btnLogin','')" type="button" value="Login"/>
<input id="ctl01_MainContent_chkRememberMe" name="ctl01$MainContent$chkRememberMe" type="checkbox"/>
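Instead of copying these values by hand, the same dump can be turned into a name-to-value dictionary resembling what the browser posts. This sketch uses the standard library's html.parser rather than bs4 so it runs on its own (looping over soup.find_all("input") would be equivalent). InputCollector and input_fields are hypothetical names; skipping type="button" inputs and unchecked checkboxes follows normal HTML form submission rules:

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name -> value for the <input> tags a browser would submit."""

    # Plain buttons are never submitted; the login button is type="button".
    SKIP_TYPES = {"button", "submit", "image", "reset"}

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        name = a.get("name")
        itype = (a.get("type") or "text").lower()
        if not name or itype in self.SKIP_TYPES:
            return
        if itype in ("checkbox", "radio"):
            # Only checked boxes are posted; "on" is the default value.
            if "checked" in a:
                self.fields[name] = a.get("value") or "on"
            return
        # Missing value attributes (e.g. ctl01$TimeOffset) post as "".
        self.fields[name] = a.get("value") or ""


def input_fields(html):
    parser = InputCollector()
    parser.feed(html)
    parser.close()
    return parser.fields
```

For the page above, input_fields(r.text) would return every hidden field with its current value, plus empty entries for the e-mail and password boxes.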
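The only values you don't already have are the email/password and __EVENTTARGET, which is just the name of the login button input (ctl01$MainContent$btnLogin, the name its onclick handler passes to __doPostBack).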
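Given a dictionary of the scraped fields (name/value pairs taken from the input tags above), filling in the missing values might look like this. The field names are copied from the input dump; build_login_payload is a hypothetical helper name:

```python
# Names copied from the <input> dump of the login page.
BUTTON_NAME = "ctl01$MainContent$btnLogin"
USER_FIELD = "ctl01$MainContent$tbxPerson"
PASS_FIELD = "ctl01$MainContent$tbxPassword"


def build_login_payload(fields, username, password):
    """Return a copy of the scraped fields with the login values filled in."""
    payload = dict(fields)  # leave the caller's dict untouched
    # The Login button is type="button" and calls
    # __doPostBack('ctl01$MainContent$btnLogin',''), which copies the
    # button name into __EVENTTARGET before the page submits itself.
    payload["__EVENTTARGET"] = BUTTON_NAME
    payload["__EVENTARGUMENT"] = ""
    payload[USER_FIELD] = username
    payload[PASS_FIELD] = password
    return payload
```

Everything else (__VIEWSTATE__, __EVENTVALIDATION, the hidPass* fields) is passed through unchanged from the page you just fetched.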


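To print just the field names, which are the keys your POST data needs:

for tag in itags:
    print(tag.get("name"))

From there, you should be able to submit the correct login POST data.

Comments:
Why is there no form? Thanks, I'll give it a try and get back to you!
I did it. Thank you so much.
Learn something new every day.
You got it, mate!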
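Putting the pieces together, a minimal sketch of the whole flow, assuming the login page keeps the field names shown above and accepts a POST back to its own URL. Using one session for the initial GET, the POST and the final GET likely matters, so the cookies and the freshly issued __VIEWSTATE__/__EVENTVALIDATION values belong together. login_and_fetch and _InputCollector are hypothetical names; session can be anything with requests.Session-style get/post methods, and the parser is the standard library's so the block runs without bs4:

```python
from html.parser import HTMLParser

LOGIN_URL = "https://ags.aspengrove.net/Library/Security/Login.aspx?ReturnUrl=%2fIndex.aspx"
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"}


class _InputCollector(HTMLParser):
    """Gather name -> value for named <input> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # Skip buttons and checkboxes (Remember Me is unchecked by default).
        if tag == "input" and a.get("name") and a.get("type") not in ("button", "submit", "checkbox"):
            self.fields[a["name"]] = a.get("value") or ""


def login_and_fetch(session, username, password, target_url, login_url=LOGIN_URL):
    # 1. GET the login page to obtain cookies and the current hidden fields.
    page = session.get(login_url, headers=HEADERS)
    parser = _InputCollector()
    parser.feed(page.text)
    payload = parser.fields
    # 2. Fill in the values the page does not provide.
    payload["__EVENTTARGET"] = "ctl01$MainContent$btnLogin"
    payload["ctl01$MainContent$tbxPerson"] = username
    payload["ctl01$MainContent$tbxPassword"] = password
    # 3. POST back to the login page, then request the protected page.
    session.post(login_url, data=payload, headers=HEADERS)
    return session.get(target_url, headers=HEADERS)
```

With a real session this could be used as: with requests.Session() as s: page = login_and_fetch(s, username, password, scrape_url), after which page.content can go to BeautifulSoup as in the question.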