Python 3.x: How do I log in to a website with the Python requests module?
I have been reading up on this and have tried several different approaches, but I am stuck on web authentication.
Testing site: http://testing-ground.scraping.pro/login
Username: admin
Password: 12345
Here is my sample code:
>>> import requests, re
>>> url = 'http://testing-ground.scraping.pro/login'
>>> username = 'admin'
>>> password = '12345'
>>> requests.get(url)
<Response [200]>
Without authentication:
>>> print(requests.get(url).text)
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
<!--[if IE 7]> <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
<!--[if IE 8]> <html class="no-js lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>Web Scraper Testing Ground</title>
<meta name="description" content="">
<meta name="viewport" content="width=device-width">
<link rel="stylesheet" href="/css/normalize.css">
<link rel="stylesheet" href="/css/main.css">
<script src="/js/vendor/modernizr-2.6.1.min.js"></script>
<script src="/js/vendor/jquery-1.9.1.min.js"></script>
<script src="/js/vendor/jquery-ui-1.10.2.min.js"></script>
<script src="/js/plugins.js"></script>
<script src="/js/main.js"></script>
<link rel="stylesheet" href="/css/QapTcha.jquery.css" />
<script src="/js/QapTcha.jquery.js"></script>
<link rel="stylesheet" href="/fancy-captcha/captcha.css" />
<script src="/fancy-captcha/jquery.captcha.js"></script>
</head>
<body>
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-4436411-8']);
_gaq.push(['_setDomainName', 'extract-web-data.com']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
<!--[if lt IE 7]>
<p class="chromeframe">You are using an outdated browser. <a href="http://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p>
<![endif]-->
<div id="topbar"></div>
<a href="/" style="text-decoration: none">
<div id="title">WEB SCRAPER TESTING GROUND</div>
<div id="logo"></div>
</a>
<div id="content">
<h1>LOGIN</h1>
<div id="caseinfo">Often in order to reach the desired information you need to be logged in to the website. Most of today's websites use so-called form-based authentication which implies sending user credentials using POST method, authenticating it on the server and storing user's session in a cookie.</p>
<p>This simple test shows scraper's ability to:</p>
<ol>
<li>Send user credentials via POST method</li>
<li>Receive, Keep and Return a session cookie</li>
<li>Process HTTP redirect (302)</li>
</ol>
<p>How to test:</p>
<ol>
<li>Enter <b>admin</b> and <b>12345</b> in the form below and press <b>Login</b></li>
<li>If you see <span class="success">WELCOME :)</span> then the user credentials were sent, the cookie was passed and HTTP redirect was processed</li>
<li>If you see <span class="error">ACCESS DENIED!</span> then either you entered wrong credentials or they were not sent to the server properly</li>
<li>If you see <span class="error">THE SESSION COOKIE IS MISSING OR HAS A WRONG VALUE!</span> then the user credentials were properly sent but the session cookie was not properly stored or passed</li>
<li>If you see <span class="success">REDIRECTING...</span> then the user credentials were properly sent but HTTP redirection was not processed</li>
<li>Click <b>GO BACK</b> to start again</li>
</ol>
</div>
<hr/>
<div id="case_login">
<h3>Please, login:</h3>
<form action="login?mode=login" method="POST">
<label for="usr">User name:</label>
<input id="usr" name="usr" type="text" placeholder="enter 'admin' here">
<label for="pwd">Password:</label>
<input id="pwd" name="pwd" type="text" placeholder="enter '12345' here">
<input type="submit" value="Login">
</form>
</div>
<br/><br/><br/>
</div>
</body>
</html>
>>>
With authentication:
>>> print(requests.get(url, auth=(username, password)).text)
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
<!--[if IE 7]> <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
<!--[if IE 8]> <html class="no-js lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>Web Scraper Testing Ground</title>
<meta name="description" content="">
<meta name="viewport" content="width=device-width">
<link rel="stylesheet" href="/css/normalize.css">
<link rel="stylesheet" href="/css/main.css">
<script src="/js/vendor/modernizr-2.6.1.min.js"></script>
<script src="/js/vendor/jquery-1.9.1.min.js"></script>
<script src="/js/vendor/jquery-ui-1.10.2.min.js"></script>
<script src="/js/plugins.js"></script>
<script src="/js/main.js"></script>
<link rel="stylesheet" href="/css/QapTcha.jquery.css" />
<script src="/js/QapTcha.jquery.js"></script>
<link rel="stylesheet" href="/fancy-captcha/captcha.css" />
<script src="/fancy-captcha/jquery.captcha.js"></script>
</head>
<body>
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-4436411-8']);
_gaq.push(['_setDomainName', 'extract-web-data.com']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
<!--[if lt IE 7]>
<p class="chromeframe">You are using an outdated browser. <a href="http://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p>
<![endif]-->
<div id="topbar"></div>
<a href="/" style="text-decoration: none">
<div id="title">WEB SCRAPER TESTING GROUND</div>
<div id="logo"></div>
</a>
<div id="content">
<h1>LOGIN</h1>
<div id="caseinfo">Often in order to reach the desired information you need to be logged in to the website. Most of today's websites use so-called form-based authentication which implies sending user credentials using POST method, authenticating it on the server and storing user's session in a cookie.</p>
<p>This simple test shows scraper's ability to:</p>
<ol>
<li>Send user credentials via POST method</li>
<li>Receive, Keep and Return a session cookie</li>
<li>Process HTTP redirect (302)</li>
</ol>
<p>How to test:</p>
<ol>
<li>Enter <b>admin</b> and <b>12345</b> in the form below and press <b>Login</b></li>
<li>If you see <span class="success">WELCOME :)</span> then the user credentials were sent, the cookie was passed and HTTP redirect was processed</li>
<li>If you see <span class="error">ACCESS DENIED!</span> then either you entered wrong credentials or they were not sent to the server properly</li>
<li>If you see <span class="error">THE SESSION COOKIE IS MISSING OR HAS A WRONG VALUE!</span> then the user credentials were properly sent but the session cookie was not properly stored or passed</li>
<li>If you see <span class="success">REDIRECTING...</span> then the user credentials were properly sent but HTTP redirection was not processed</li>
<li>Click <b>GO BACK</b> to start again</li>
</ol>
</div>
<hr/>
<div id="case_login">
<h3>Please, login:</h3>
<form action="login?mode=login" method="POST">
<label for="usr">User name:</label>
<input id="usr" name="usr" type="text" placeholder="enter 'admin' here">
<label for="pwd">Password:</label>
<input id="pwd" name="pwd" type="text" placeholder="enter '12345' here">
<input type="submit" value="Login">
</form>
</div>
<br/><br/><br/>
</div>
</body>
</html>
>>>
Since the output still contains the web login form, I think the authentication is not working as expected:
<h3>Please, login:</h3>
<form action="login?mode=login" method="POST">
<label for="usr">User name:</label>
<input id="usr" name="usr" type="text" placeholder="enter 'admin' here">
<label for="pwd">Password:</label>
<input id="pwd" name="pwd" type="text" placeholder="enter '12345' here">
<input type="submit" value="Login">
</form>
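The URL the form submits to and the field names the server expects can be read directly from this form. A minimal sketch using only the standard library's html.parser; the FormInspector class is a hypothetical helper written for illustration, not part of any library:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

PAGE_URL = "http://testing-ground.scraping.pro/login"

# The login form excerpt from the page source, copied verbatim.
FORM_HTML = """
<form action="login?mode=login" method="POST">
<label for="usr">User name:</label>
<input id="usr" name="usr" type="text" placeholder="enter 'admin' here">
<label for="pwd">Password:</label>
<input id="pwd" name="pwd" type="text" placeholder="enter '12345' here">
<input type="submit" value="Login">
</form>
"""


class FormInspector(HTMLParser):
    """Hypothetical helper: records the form's method, action and named inputs."""

    def __init__(self):
        super().__init__()
        self.method = None
        self.action = None
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # HTML forms default to GET when no method attribute is given.
            self.method = (attrs.get("method") or "get").upper()
            self.action = attrs.get("action", "")
        elif tag == "input" and attrs.get("name"):
            # Only named inputs are submitted; the unnamed submit button is skipped.
            self.fields.append(attrs["name"])


inspector = FormInspector()
inspector.feed(FORM_HTML)
post_url = urljoin(PAGE_URL, inspector.action)  # resolve the relative action URL

print(inspector.method, post_url, inspector.fields)
# POST http://testing-ground.scraping.pro/login?mode=login ['usr', 'pwd']
```

In other words, the form expects a POST to login?mode=login with fields named usr and pwd, which is not what either of the GET requests above sends.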
What is wrong in this case, and what should I do to fix it?

You should POST the credentials to the login page instead:
>>> import requests
>>> url = 'http://testing-ground.scraping.pro/login?mode=login'
>>> username = 'admin'
>>> password = '12345'
>>> requests.post(url, data={'usr': username, 'pwd': password})
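To see why the auth=(username, password) attempt changed nothing, it helps to compare what each call would put on the wire. The sketch below only prepares the two requests with requests.Request(...).prepare(), so nothing is sent over the network; the field names usr and pwd come from the form in the page source:

```python
import requests

url = "http://testing-ground.scraping.pro/login"
username, password = "admin", "12345"

# The question's attempt: a GET carrying an HTTP Basic "Authorization" header.
basic = requests.Request("GET", url, auth=(username, password)).prepare()

# What the login form submits: a POST whose url-encoded body holds usr and pwd.
form = requests.Request(
    "POST", url + "?mode=login", data={"usr": username, "pwd": password}
).prepare()

# .prepare() only builds the requests; nothing is sent.
print(basic.method, basic.headers["Authorization"], basic.body)
# GET Basic YWRtaW46MTIzNDU= None
print(form.method, form.headers["Content-Type"], form.body)
# POST application/x-www-form-urlencoded usr=admin&pwd=12345
```

The first request has no body at all, only a header. Since this page uses form-based authentication, it apparently ignores that header and returns the login form again, which matches the identical output of both GET requests in the question.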
Thanks @avloss, I had overlooked the HTTP method. It should have been POST, but I had been trying it with GET.
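The test page checks three things: credentials sent via POST, a session cookie kept and returned, and a 302 redirect followed. That flow can be exercised end to end without touching the real site. The sketch below assumes a hypothetical local stand-in server (FakeLoginHandler, the cookie name session, and the /welcome path are all invented for illustration); only the client side mirrors the answer, with a requests.Session added so the cookie is reused on later requests:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

import requests

COOKIE = "session=demo123"  # hypothetical cookie name/value, not the real site's


class FakeLoginHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for the test page: a correct POST gets a
    session cookie plus a 302 redirect; a GET succeeds only with that cookie."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        fields = parse_qs(self.rfile.read(length).decode())
        if fields.get("usr") == ["admin"] and fields.get("pwd") == ["12345"]:
            self.send_response(302)
            self.send_header("Set-Cookie", COOKIE + "; Path=/")
            self.send_header("Location", "/welcome")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self._reply("ACCESS DENIED!")

    def do_GET(self):
        if COOKIE in self.headers.get("Cookie", ""):
            self._reply("WELCOME :)")
        else:
            self._reply("Please, login:")

    def _reply(self, text):
        body = text.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def login(session, base_url, username, password):
    """POST the form fields; requests follows the 302 and the session stores the cookie."""
    return session.post(f"{base_url}/login?mode=login",
                        data={"usr": username, "pwd": password})


server = HTTPServer(("127.0.0.1", 0), FakeLoginHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

with requests.Session() as session:
    session.trust_env = False  # ignore proxy settings from the environment
    first = login(session, base, "admin", "12345")
    later = session.get(f"{base}/welcome")  # the stored cookie is sent again

with requests.Session() as other:
    other.trust_env = False
    denied = login(other, base, "admin", "wrong")
    anonymous = other.get(f"{base}/welcome")  # no cookie was ever set

server.shutdown()
server.server_close()

print(first.status_code, first.text, [r.status_code for r in first.history])
print(later.text, "|", denied.text, "|", anonymous.text)
```

requests follows the 302 and carries the cookie from the redirect response into the redirected request even without an explicit Session, which is why the one-line requests.post in the answer works. The Session matters once further pages have to be fetched while logged in. Against the real site, the equivalent success check is whether WELCOME :) appears in response.text, as the page's own instructions describe.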