How do I scrape Factiva data with Python Scrapy? (authentication, login, scrapy, python-3.5)


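I am using Python 3.5.2 to fetch data from Factiva. I have to use my school login to see the data.

I followed this post and tried to create a login spider.

However, I got this error:

Here is my code: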

# Test Login Spider
import scrapy
from scrapy.selector import HtmlXPathSelector
from scrapy.http import Request


login_url = "https://login.proxy.lib.sfu.ca/login?qurl=https%3a%2f%2fglobal.factiva.com%2fen%2fsess%2flogin.asp%3fXSID%3dS002sbj1svr2sVo5DEs5DEpOTAvNDAoODZyMHn0YqYvMq382rbRQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQQAA"
user_name = b"[my_user_name]"
pswd = b"[my_password]"
response_page = "https://global-factiva-com.proxy.lib.sfu.ca/hp/printsavews.aspx?pp=Save&hc=All"


class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        return [scrapy.FormRequest(login_url,
                               formdata={'user': user_name, 'pass': pswd},
                               callback=self.logged_in)]

    def logged_in(self, response):
        # login failed
        if "authentication failed" in response.body:
            print ("Login failed")
        # login succeeded
        else:
            print ('login succeeded')
            # return Request(url=response_page,
            #        callback=self.parse_responsepage)

    def parse_responsepage(self, response):
        hxs = HtmlXPathSelector(response)
        yum = hxs.select('//span/@enHeadline')


def main():
    test_spider = MySpider(scrapy.Spider)
    test_spider.start_requests()

if __name__ == "__main__":
    main()

To run this code, I used the following terminal command from the project's top-level directory:

scrapy runspider [my_file_path]/auth_spider.py

Do you know how to deal with the error here?

When you use Python 3.x, "authentication failed" is a str, whereas response.body is of type bytes, so the in test compares a str against bytes.
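The mismatch can be reproduced without Scrapy. The following is a minimal sketch in which the variable body is a hypothetical stand-in for response.body; only its type (bytes) matters here.

```python
# Hypothetical stand-in for response.body, which Scrapy exposes as bytes.
body = b"<html><body>authentication failed</body></html>"
needle = "authentication failed"  # a plain str literal, as in the question


def mixed_type_check(text, raw):
    """Try the str-in-bytes membership test and report what happens."""
    try:
        return text in raw
    except TypeError as exc:
        # Python 3 refuses to search for a str inside a bytes object.
        return "TypeError: {}".format(exc)


print(type(needle).__name__, type(body).__name__)
print(mixed_type_check(needle, body))
```

On Python 3 the second print shows a TypeError message rather than True or False, which is the kind of failure the spider's logged_in callback runs into.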

To fix this, perform the test on str:
if "authentication failed" in response.body_as_unicode():

or on bytes:
if b"authentication failed" in response.body:
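As a rough illustration of the bytes variant, the check can be pulled into a small helper. This is only a sketch: login_failed and FAILURE_MARKER are hypothetical names, and the SimpleNamespace objects stand in for Scrapy responses, modelling nothing but the .body attribute.

```python
from types import SimpleNamespace

# Bytes literal, so it has the same type as response.body.
FAILURE_MARKER = b"authentication failed"


def login_failed(response):
    """Return True if the failure marker appears in the raw response body."""
    return FAILURE_MARKER in response.body


# Hypothetical stand-ins for Scrapy responses; only .body is modelled.
rejected = SimpleNamespace(body=b"<p>authentication failed</p>")
accepted = SimpleNamespace(body=b"<p>Welcome</p>")

print(login_failed(rejected))
print(login_failed(accepted))
```

Inside the spider, the logged_in callback could call such a helper in place of the inline test. Keeping the marker as bytes avoids having to guess the page's encoding.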

Oh my goodness, it shows login succeeded. I thought I would never solve this... Thank you so much!!