After logging in successfully with MechanicalSoup, the site returns the login page again when scraping?

html python-3.x web-scraping

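As part of a project, I am trying to scrape some data from Twitter with BeautifulSoup. To scrape the "following" section I need to log in first, so I tried MechanicalSoup. I know the login succeeds, because I receive an email saying so, but when I then go to another page of the same site to scrape data, it redirects me to the login page again.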

import mechanicalsoup
browser = mechanicalsoup.StatefulBrowser(soup_config={'features': 'lxml'},
    raise_on_404=True,
    user_agent='MyBot/0.1: mysite.example.com/bot_info',)
login_page = browser.get("https://twitter.com/login")
login_form = login_page.soup.findAll("form")
login_form = login_form[2]
login_form.find("input", {"name": "session[username_or_email]"})["value"] = "puturusername"
login_form.find("input", {"name": "session[password]"})["value"] = "puturpassword"
login_response = browser.submit(login_form, login_page.url)
login_response.soup()

This sends me a successful-login email. I then try:

page_html = browser.open('https://twitter.com/MKBHD/following').text
page_soup = soup(page_html, "html.parser")
page_soup

I get back a page containing

https://twitter.com/login?redirect_after_login=%2FMKBHD%2Ffollowing&

instead of the actual "following" page.
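A quick way to confirm that a response was bounced to the login page is to inspect the URL itself. The following is a minimal sketch using only the standard library; the helper name `login_redirect_target` is hypothetical, and it assumes the site signals the bounce with a `/login` path and a `redirect_after_login` query parameter, as in the URL above.

```python
from urllib.parse import urlsplit, parse_qs


def login_redirect_target(url):
    """Return the path the site intended to send us back to after login,
    or None if the URL is not a login redirect of the form shown above."""
    parts = urlsplit(url)
    if parts.path.rstrip("/") != "/login":
        return None
    # parse_qs percent-decodes values and ignores the empty pair left by a trailing "&"
    targets = parse_qs(parts.query).get("redirect_after_login")
    return targets[0] if targets else None


url = "https://twitter.com/login?redirect_after_login=%2FMKBHD%2Ffollowing&"
print(login_redirect_target(url))  # /MKBHD/following
```

If the URL comes from a response object, the final URL after HTTP redirects is usually the one worth checking.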

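If I try the code below instead of `browser.open("").text`:

# verify we are now logged in
page = browser.get_current_page()
print(page)
messages = page.find("div", class_="flash-messages")
if messages:
    print(messages.text)
assert page.select(".logout-form")
print(page.title.text)
# verify we remain logged in (thanks to cookies) as we browse the rest of
# the site
page3 = browser.open("https://github.com/MechanicalSoup/MechanicalSoup")
assert page3.soup.select(".logout-form")

I get the output: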

----> 4 messages = page.find("div", class_="flash-messages")
AttributeError: 'NoneType' object has no attribute 'find'

Update:

login_response.soup()

gives me the following:

 </style>, <body>
 <noscript>
 <center>If you’re not redirected soon, please <a href="/">use this link</a>.</center>
 </noscript>
 <script nonce="O1gf092z/sXmKkH64mLOzQ==">

       document.cookie = "app_shell_visited=1;path=/;max-age=5";

       location.replace(location.href.split("#")[0]);
     </script>
 </body>, <noscript>
 <center>If you’re not redirected soon, please <a href="/">use this link</a>.</center>
 </noscript>, <center>If you’re not redirected soon, please <a href="/">use this link</a>.</center>, <a href="/">use this link</a>, <script nonce="O1gf092z/sXmKkH64mLOzQ==">

       document.cookie = "app_shell_visited=1;path=/;max-age=5";

       location.replace(location.href.split("#")[0]);
     </script>]
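The output above is a JavaScript redirect stub: the real navigation happens in the `<script>` via `location.replace(...)`, which an HTTP-only client never executes. As a rough sketch, assuming that checking for `location.replace(` inside script tags is a good enough heuristic, such a stub can be detected with the standard library alone. The names `ScriptCollector` and `looks_like_js_redirect` are made up for this illustration.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so append to the last entry
        if self._in_script:
            self.scripts[-1] += data


def looks_like_js_redirect(html):
    """Heuristic: True if any script in the page calls location.replace()."""
    collector = ScriptCollector()
    collector.feed(html)
    return any("location.replace(" in script for script in collector.scripts)


sample = """<body>
<noscript><center>If you're not redirected soon, please <a href="/">use this link</a>.</center></noscript>
<script>
document.cookie = "app_shell_visited=1;path=/;max-age=5";
location.replace(location.href.split("#")[0]);
</script>
</body>"""
print(looks_like_js_redirect(sample))  # True
```

A positive result suggests the page depends on JavaScript, so the HTML returned to a script will differ from what a real browser ends up showing.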


To avoid getting the redirect page, you can use a StatefulBrowser() object instead of Browser().

I wrote a short article about it:

Source:

Does the site work in a browser if you disable JavaScript?
Possible duplicate?
@MatthieuMoy I just tried, and it doesn't work... is there any way around this?
Read the documentation ;-) ?

import mechanicalsoup

if __name__ == "__main__":

    URL = "https://twitter.com/login"
    LOGIN = "your_login"
    PASSWORD = "your_password"
    TWITTER_NAME = "displayed_name" # Displayed username on Twitter

    # Create a browser object
    browser = mechanicalsoup.StatefulBrowser()

    # request Twitter login page
    browser.open(URL)

    # we grab the login form
    browser.select_form('form[action="https://twitter.com/sessions"]')

    # print form inputs
    browser.get_current_form().print_summary()

    # specify username and password
    browser["session[username_or_email]"] = LOGIN
    browser["session[password]"] = PASSWORD

    # submit form
    response = browser.submit_selected()

    # get current page output
    response_after_login = browser.get_current_page()

    # verify we are now logged in (look for an img whose alt text is the username)
    # if you find a better way to check, let me know. Since Twitter generates all of its classes dynamically,
    # it is pretty complicated to get better information
    # the attribute value is quoted so that display names containing spaces still form a valid selector
    user_element = response_after_login.select_one('img[alt="' + TWITTER_NAME + '"]')

    # if username is in the img field, it means the user is successfully connected
    if TWITTER_NAME in str(user_element):
        print("You're connected as " + TWITTER_NAME)
    else:
        print("Not connected")
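One detail worth care in the script above is the `img[alt=...]` selector, since a display name may contain spaces or quotes. Below is a small sketch of building a quoted attribute selector; the helper name `img_alt_selector` and the example names are hypothetical, and it only handles backslashes and double quotes, not every character CSS allows you to escape.

```python
def img_alt_selector(display_name):
    """Build a CSS selector matching <img> tags whose alt equals display_name.

    The value is wrapped in double quotes, with backslashes and double
    quotes escaped, so that names containing spaces stay valid.
    """
    escaped = display_name.replace("\\", "\\\\").replace('"', '\\"')
    return f'img[alt="{escaped}"]'


print(img_alt_selector("Display Name"))  # img[alt="Display Name"]
```

The returned string can then be passed to `select_one` in place of the concatenated selector.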