Python requests can't keep a logged-in session
Tags: python, parsing, session, beautifulsoup, python-requests
I'm trying to scrape some emails that are only available to logged-in users, but my attempt fails: what I get back is the page as it appears when logged out. The code itself:
import requests
from bs4 import BeautifulSoup
import traceback
login_data = {'form[email]': 'xxxxxxx@gmail.com', 'form[password]': 'xxxxxxxxx', 'remember': 1,}
base_url = 'http://www.mdpi.com'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; rv:40.0) Gecko/20100101 Firefox/40.0'}
session = requests.Session()
session.headers = headers
# log_in
s = session.post('https://susy.mdpi.com/user/login', data=login_data)
print(s.text)
print(session.cookies)
def make_soup(url):
    # Fetch a page through the shared session and parse it; returns None on failure.
    try:
        r = session.get(url)
        soup = BeautifulSoup(r.content, 'lxml')
        return soup
    except Exception:
        traceback.print_exc()
        return None
example_link = 'http://www.mdpi.com/search?journal=medsci&year_from=1996&year_to=2017&page_count=200&sort=relevance&view=default'
def article_finder(soup):
    one_page_articles_divs = soup.find_all('div', class_='article-content')
    for article_div in one_page_articles_divs:
        a_link = article_div.find('a', class_='title-link')
        link = base_url + a_link.get('href')
        print(link)
        article_soup = make_soup(link)
        grab_author_info(article_soup)
def grab_author_info(article_soup):
    # title of the article
    article_title = article_soup.find('h1', class_="title").text
    print(article_title)
    # affiliation
    affiliations_div = article_soup.find('div', class_='art-affiliations')
    affiliation_dict = {}
    aff_indexes = affiliations_div.find_all('div', class_='affiliation-item')
    aff_values = affiliations_div.find_all('div', class_='affiliation-name')
    for i, index in enumerate(aff_indexes):  # 0, 1
        affiliation_dict[int(index.text)] = aff_values[i].text
    # authors names
    authors_div = article_soup.find('div', class_='art-authors')
    authors_spans = authors_div.find_all('span', class_='inlineblock')
    for span in authors_spans:
        name_and_email = span.find_all('a')  # name and email
        name = name_and_email[0].text
        # email
        email = name_and_email[1].get('href')[7:]
        # affiliation_index
        affiliation_index = span.find('sup').text
        indexes = set()
        if len(affiliation_index) > 2:
            for i in affiliation_index.strip():
                try:
                    ind = int(i)
                    indexes.add(ind)
                except ValueError:
                    pass
        print(name)
        for index in indexes:
            print('affiliation =>', affiliation_dict[index])
        print('email: {}'.format(email))
if __name__ == '__main__':
    article_finder(make_soup(example_link))
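What should I do to get what I want?

Ah, this is simple: you are not logging in correctly. If you look at the response from your first call, you will see that it returns the login page HTML rather than the "My Profile" page. The reason is that you are not submitting the hidden token from the form.

Solution: request the login page, then use lxml or BeautifulSoup to parse the hidden input "form[_token]". Take its value and add it to your login_data payload. Then submit your login request and you will be logged in.

Uh, did you put your actual login and password in the code? Don't you want to hide them?
Yes, they are. I edited your question so that random people can't access your account.
You're welcome. Maybe you should change your password now.
Yes. I added:
login_data['form[_token]'] = bs.find('input', id='form_token').get('value')
and it works. Thanks.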
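
The fix described in the answer (fetch the login page, read the hidden token, add it to the payload, then post) can be sketched roughly as follows. This is a minimal sketch, not the answerer's exact code. The helper names `HiddenInputParser`, `extract_hidden_fields`, and `login_with_token` are hypothetical. It uses the standard library's `html.parser` in place of BeautifulSoup or lxml so that it has no extra dependencies, and it looks the token up by the field name "form[_token]" mentioned in the answer. It assumes `session` behaves like `requests.Session` and that the login page still exposes the token as a hidden input.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name -> value for every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.hidden_fields[attributes["name"]] = attributes.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of all hidden form fields found in the HTML string."""
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.hidden_fields


def login_with_token(session, login_url, login_data, token_field="form[_token]"):
    """GET the login page, copy its hidden token into the payload, then POST.

    `session` is assumed to offer get()/post() like requests.Session.
    The caller's login_data dict is not modified.
    """
    login_page = session.get(login_url)
    hidden_fields = extract_hidden_fields(login_page.text)
    if token_field not in hidden_fields:
        raise ValueError("hidden field {!r} not found on login page".format(token_field))
    payload = dict(login_data)
    payload[token_field] = hidden_fields[token_field]
    return session.post(login_url, data=payload)


# Usage with the names from the question (this performs real network requests):
#   session = requests.Session()
#   response = login_with_token(session, 'https://susy.mdpi.com/user/login', login_data)
#   print(response.text)  # inspect this to confirm it is no longer the login page
```

Using the same session object for both the GET and the POST matters here, because any cookies set alongside the token on the login page are then sent back with the login request.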