Python, authentication not recognised: urllib2, requests, asp.net
Tags: python, asp.net, passwords, python-requests, robobrowser

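I'm not particularly advanced at this, but I've had some success using urllib2, requests and scrapy. This one has me stumped, though. So, after a lot of searching and banging my head against the keyboard, I'm going to go ahead and ask.

I want to get the HTML source of a website, but after supplying my username and password I keep getting back a page saying that my username and password are wrong. They work fine in the browser, and once logged in the source is readily available (via the browser). But I can't seem to achieve the same result from Python/the terminal. Below are some of my attempts (gleaned from these helpful pages):

Using urllib2: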

req = Request(website, headers={ 'User-Agent': 'Mozilla/5.0' })
base64string = base64.encodestring('%s:%s' % (username, password)).replace('\n', '')
req.add_header("Authorization", "Basic %s" % base64string)
readweb = urlopen(req).read()
Another version:

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, theurl, username, password)

authhandler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(authhandler)

pagehandle = opener.open(theurl)
return pagehandle.read()
And an attempt using requests:

r = requests.session()
try:
    r.post(theurl, data={'username' : 'username', 'password' : 'password', 'remember':'1'})
except:
    print('Sorry, Unable to...')
result = r.get(theurl)
return result.text
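I have also tried scrapy, but whichever library I use, it returns the HTML of a page saying that my password/details are wrong. I'm guessing this is related to the headers/authorization (?) I'm sending, but I'm not sure. Any help is much appreciated, and please let me know what other details I can add (I've been up half the night on this, so forgive me if this post doesn't make sense!)

Edit:

Here is the traceback from Prashant's answer (with passwords etc. removed):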

  File "/Users/Hatsaw/newpy/pras.py", line 3, in <module>
    r = requests.get(URL, auth=('username','password'))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.9.0-py2.7.egg/requests/api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.9.0-py2.7.egg/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.9.0-py2.7.egg/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.9.0-py2.7.egg/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.9.0-py2.7.egg/requests/adapters.py", line 437, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='website', port=80): Max retries exceeded with url: /dashboard/ (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))

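Edit:

OK, I'm now using mechanize (as recommended below), and here is the feedback I get (I'm not sure whether this is another instance of my underlying problem, or whether I just can't use mechanize!):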

  File "/Users/Hatsaw/newpy/pras2.py", line 13, in <module>
    browser.form['email'] = 'email address'
  File "build/bdist.macosx-10.6-intel/egg/mechanize/_form.py", line 2780, in __setitem__
  File "build/bdist.macosx-10.6-intel/egg/mechanize/_form.py", line 3101, in find_control
  File "build/bdist.macosx-10.6-intel/egg/mechanize/_form.py", line 3185, in _find_control
mechanize._form.ControlNotFoundError: no control matching name 'email'

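Edit:

Still struggling with this, so here is one last effort before the time for this project runs out and I have to fetch all the HTML manually! Fingers crossed.

OK, following barny's suggestion I've gone back to requests, and I'm trying to supply the cookie information that I obtained from a successful browser login. I'm not sure I'm doing this correctly, but I'm using: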

cookies = {'PHPSESSID':'5udcifi6p43ma3h1fnpfqghiu0'}
result = sess.get(the_url, cookies=cookies)
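Now I get an Internal Server Error response. After some research, it looks like the aspnet form is the problem:

I just wanted to check first whether there is something wrong with my request, and then perhaps I'll explore BeautifulSoup/robobrowser as suggested by Martijn Pieters in the SO link above.

Here is the HTML form section, as requested: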

<form name="aspnetForm" method="post" action="" id="aspnetForm">
<div>
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__LASTFOCUS" id="__LASTFOCUS" value="" />
<input type="hidden" name="__VIEWSTATEFIELDCOUNT" id="__VIEWSTATEFIELDCOUNT" value="2" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKLTkwNzg1NTQ3OA9kFgJmD2QWAmYPZBYGAgetc." />
<input type="hidden" name="__VIEWSTATE1" id="__VIEWSTATE1"     value="ZyBBIEhvbWUVIE5lZ290aWF0ZSBBZ3JlZW1lbnRzEiBSZetc." />
</div>

<script type="text/javascript">
//<![CDATA[
var theForm = document.forms['aspnetForm'];
if (!theForm) {
theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
    theForm.__EVENTTARGET.value = eventTarget;
    theForm.__EVENTARGUMENT.value = eventArgument;
    theForm.submit();
}
}
//]]>
</script>


<script src="/WebResource.axd?d=t2SAOwDGkbrEfkmUaMOR9sPLXqgxfeenNayRja3DNK2R8JEcH-StTTuiaqXpzp--PAISn3vzVbWQ7biREwPkibCmbAE1&amp;t=635586505120000000" type="text/javascript"></script>


<script src="/ScriptResource.axd?d=EL6tXtJfNfGSoQwhYtVnYEqw4oKvuwBBI4etc."     type="text/javascript"></script>
<script type="text/javascript">
//<![CDATA[
if (typeof(Sys) === 'undefined') throw new Error('ASP.NET Ajax client-side framework failed to load.');
//]]>
</script>

<script src="/ScriptResource.axd?d=qCmNMcECQa0tfmMcZdwJeeOdcyetc." type="text/javascript"></script>
<div>

<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="FC5C7135" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEdABB2xJRvPLCcg6GsBqRFCtw6Xg91QEu10etc." />
</div>
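Sorry, this has become rather long; let me know if I need to split it into several posts. What I thought was a simple problem at the outset has turned into something else.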

import requests
URL = "http://www.facebook.com"
r = requests.get(URL, auth=('username','password'))
source = r.text
print source
-----Edit-----

Victory

OK, thanks to Prashant and barny for their replies, and heartfelt thanks to Martijn Pieters via this post:

I have found my salvation.

Here is the code:

from robobrowser import RoboBrowser

the_url = 'the_url'
login = the_url + '/login'
content = the_url + '/content'
username = 'username'
password = 'password'

def fetch_content_source():
    # Function name chosen for illustration; the snippet ends in a 'return',
    # so it has to live inside a function.
    browser = RoboBrowser(parser='lxml')

    browser.open(login)
    forms = browser.get_forms()

    # You can use '.get_form()' for a specific form, but I find it easier to
    # use '.get_forms()' to get all the forms, as I'm only interested
    # in the first one:

    form = forms[0]
    print(form)    # this shows the field names you need in order to
                   # enter your login details:

    form['the_user'].value = username
    form['the_pass'].value = password

    browser.submit_form(form)

    # and then, because I'm after the HTML of certain content pages:

    browser.open(content)
    source = str(browser.parsed)
    return source

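Cool, I hadn't heard of mechanize. I've downloaded it now and tried it out (I assume the second half of your code needs to go inside a function? or a class?). It hasn't printed any response yet, but I'll have a play and see what I can do. Cheers Prashant, I'll update soon.

OK, so I think mechanize is running properly, but I get an error (see above). Is this to do with the form number you mentioned? I left it at 0.

Log in to the website, look at your cookies and copy the value of PHPSESSID. Then paste it into config.ini: cookie= and set keepsignedin=1

Hi Prashant, thanks for the help. I've now got the PHPSESSID, but I'm a bit lost as to where to paste it. Is config.ini a mechanize thing? Or a web browser thing? I've searched but I'm still not sure..

If you use a requests session, it will store cookies and supply them automatically on subsequent requests.
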
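
The last comment's point, that a session keeps cookies between requests, can be tried out without touching a real site. The sketch below is a stand-in under stated assumptions, not the commenter's code: it uses only the Python 3 standard library (`http.cookiejar` with `urllib.request` in place of `requests.Session`), and the local test server, its `/login` and `/content` paths and the `SESSIONID` cookie are all made up for the demonstration.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /login sets a cookie, other paths echo it back."""

    def do_GET(self):
        if self.path == "/login":
            body = b"logged in"
            self.send_response(200)
            self.send_header("Set-Cookie", "SESSIONID=abc123; Path=/")
        else:
            cookie = self.headers.get("Cookie", "")
            body = ("cookie seen: " + cookie).encode()
            self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


# Port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

# One cookie jar shared by every request made through this opener.
jar = CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),          # ignore any proxy settings
    urllib.request.HTTPCookieProcessor(jar),  # store and resend cookies
)

opener.open(base + "/login").read()                      # server sets the cookie
content = opener.open(base + "/content").read().decode() # cookie is sent back

server.shutdown()
server.server_close()
print(content)
```

With requests, the equivalent is to reuse one `requests.Session()` object for both the login request and the content request, rather than pasting in a cookie value copied from the browser.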
import requests

the_url = 'the_url'
login = the_url + '/login'
content = the_url + '/content'
username = 'username'
password = 'password'

sess = requests.Session()
sess.auth = ('username', 'password')
sess.get(the_url)

payload = {'ctl00$cphMain$tbUsername': username, 'ctl00$cphMain$tbPassword': password}
r_login = sess.post(login, data=payload)

cookies = {'PHPSESSID':'5udcifi6p43ma3h1fnpfqghiu0', 'ASP.NET_SessionId':'aspnet', 'BRO_LOGIN':'bro_login'}
r_data = sess.get(content, cookies=cookies, data=payload)

print r_data.text
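
The login POST above sends only the username and password fields, whereas the form in the question also carries hidden inputs such as `__VIEWSTATE` and `__EVENTVALIDATION`. ASP.NET WebForms pages generally expect those values to come back with the postback, which may be why the hand-built request was rejected. The following is a minimal sketch, assuming Python 3 and only the standard library, of collecting the hidden fields from the login page's HTML and merging them with the credentials. The helper names `HiddenFieldCollector` and `build_login_payload` and the shortened sample HTML are made up for illustration; the `ctl00$cphMain$...` field names are the ones used in the snippet above.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags are routed here as well.
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_login_payload(login_html, username, password):
    """Return the hidden form fields plus the two credential fields."""
    collector = HiddenFieldCollector()
    collector.feed(login_html)
    payload = dict(collector.fields)
    # Field names taken from the snippet above; they differ from site to site.
    payload["ctl00$cphMain$tbUsername"] = username
    payload["ctl00$cphMain$tbPassword"] = password
    return payload


# Shortened stand-in for the login page; the values are placeholders.
sample = '''
<form name="aspnetForm" method="post" action="" id="aspnetForm">
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz" />
<input type="text" name="ctl00$cphMain$tbUsername" />
</form>
'''

payload = build_login_payload(sample, "username", "password")
print(sorted(payload))
```

In a real run the HTML would come from a GET of the login page made with the same session, and the resulting `payload` would be passed as `data=` to the login POST. Some forms also expect the submit button's own name/value pair, so the payload may need one more entry. This is roughly the bookkeeping that RoboBrowser's `submit_form` takes care of.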
import mechanize
browser = mechanize.Browser()
browser.set_handle_robots(False)
cookies = mechanize.CookieJar()
browser.set_cookiejar(cookies)
browser.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.41 Safari/534.7')]
browser.set_handle_refresh(False)

url = 'http://www.facebook.com/login.php'
browser.open(url)
browser.select_form(nr = 0)       # This is the login/password form -> nr = number = 0
browser.form['email'] = YourLogin     # YourLogin: placeholder for your login
browser.form['pass'] = YourPassw      # YourPassw: placeholder for your password
response = browser.submit()
print response.read()