Crawling after logging in with Python

python python-3.x web-crawler

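I am learning web crawling with Python. My goal is to download a file from a site. Right now I am learning how to log in, and it is difficult: for example, I need to log in before I can download the file from this site.

I have looked through various resources, but the site I want seems to work a little differently. I can scrape most sites that do not require a login, but not ones that do, so I really want to learn this part.

My goal is to log in and then view the page's HTML from my crawler. Below is my code. Is this the right approach?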

from requests import session

# Example credentials: ID = abcd / PW = 1234
payload = {
    'ctl00$ContentPlaceHolder1$tbxLoginID': 'abcd',
    'ctl00$ContentPlaceHolder1$tbxLoginPW': '1234',
}

with session() as c:
    c.post('http://www.kif.re.kr/kif2/login/login.aspx', data=payload)
    response = c.get('What should I write here?')
    # response = c.get('http://example.com/protected_page.php')
    print(response.headers)
    print(response.text)

You are missing some of the login form data. Here is what the payload should look like:

payload = {
    '__LASTFOCUS': '',  # empty
    '__VIEWSTATE': 'get this value from the login page source',
    '__VIEWSTATEGENERATOR': 'get this value from the login page source',
    '__EVENTTARGET': '',  # empty
    '__EVENTARGUMENT': '',  # empty
    '__EVENTVALIDATION': 'get this value from the login page source',
    'ctl00$agentPlatform': '1',
    'ctl00$menu_nav1$tbxSearchWord': '',  # empty
    'ctl00$ContentPlaceHolder1$radiobutton': '0',
    'ctl00$ContentPlaceHolder1$tbxLoginID': 'abcd',
    'ctl00$ContentPlaceHolder1$tbxLoginPW': '1234',
    # I think these are the mouse cursor coordinates at the moment the login
    # button was clicked; not sure whether they are necessary.
    'ctl00$ContentPlaceHolder1$ibtnLogin.x': '36',
    'ctl00$ContentPlaceHolder1$ibtnLogin.y': '25',
}
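
The "get this value from the login page source" entries change on every page load, so they are usually read from the login page right before posting. The following is only a sketch of one way to do that, using the standard library's html.parser (BeautifulSoup would work just as well). The helper names extract_hidden_fields and build_login_payload are made up for illustration, and it assumes those values live in hidden input elements, which is typical for ASP.NET pages but not verified for this site.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Attributes without a value come back as None, so fall back to "".
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of hidden form fields found in the given HTML (hypothetical helper)."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


def build_login_payload(login_page_html, user_id, password):
    """Merge the page's hidden fields with the credentials (hypothetical helper).

    The field names are the ones quoted in the answer's payload. Other
    non-hidden fields from that payload may need to be added by hand.
    """
    payload = extract_hidden_fields(login_page_html)
    payload.update({
        "ctl00$ContentPlaceHolder1$tbxLoginID": user_id,
        "ctl00$ContentPlaceHolder1$tbxLoginPW": password,
        "ctl00$ContentPlaceHolder1$ibtnLogin.x": "36",
        "ctl00$ContentPlaceHolder1$ibtnLogin.y": "25",
    })
    return payload


# Possible usage with requests (not run here, needs network access):
#   import requests
#   LOGIN_URL = 'http://www.kif.re.kr/kif2/login/login.aspx'
#   with requests.Session() as c:
#       page = c.get(LOGIN_URL)
#       c.post(LOGIN_URL, data=build_login_payload(page.text, 'abcd', '1234'))
```

Fetching the login page and posting from the same session matters, because the cookies set by the first request are sent along with the login request.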

response = c.get('What should I write here?')

Write the URL of the protected page there! If you can fetch it successfully, you are logged in.
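
"Fetched successfully" can be hard to judge by eye, since many sites answer with status 200 and simply show the login form again. As a rough heuristic (an assumption, not something this site is confirmed to do), you could check that the response did not end up back on the login page. The function name looks_logged_in and its checks are hypothetical.

```python
# Name of the login ID field, taken from the payload above.
LOGIN_FIELD = "ctl00$ContentPlaceHolder1$tbxLoginID"


def looks_logged_in(status_code, final_url, body, login_path="login.aspx"):
    """Heuristic check that a protected page was actually served.

    Returns False if the request failed, if we were redirected back to the
    login page, or if the returned HTML still contains the login form field.
    """
    if status_code != 200:
        return False
    if login_path.lower() in final_url.lower():
        return False
    return LOGIN_FIELD not in body
```

With a requests response this could be called as looks_logged_in(response.status_code, response.url, response.text).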