Simulating cookies with a Python web crawler
I need some help. I'm trying to build a web crawler with the requests library and BeautifulSoup4, but for it to work I first have to visit a link that activates specific cookies, which then allow me to search the site for the content of my query.
import requests
from bs4 import BeautifulSoup


def web_spider(max_pages, query):
    page = 1
    while page <= max_pages:
        url = r'http://website.com/search/index?page=' + str(page) + '&q=' + query
        source_code = requests.get(url)
        plain_text = source_code.text
        # Name the parser explicitly so the result does not depend on what is installed
        soup = BeautifulSoup(plain_text, 'html.parser')
        # Every <a> tag whose class is "comments_link"
        for link in soup.find_all('a', {'class': 'comments_link'}):
            href = 'http://website.com/' + link.get('href')
            print(href)
        page += 1
Use a session; with one, cookies are handled automatically:
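import requests
from bs4 import BeautifulSoup

# One Session is shared by every request, so cookies set by the site persist
session = requests.Session()


def web_spider(max_pages, query):
    page = 1
    while page <= max_pages:
        url = 'http://website.com/search/index'
        # requests builds and URL-encodes the query string from params
        params = {'page': page, 'q': query}
        source_code = session.get(url, params=params)
        plain_text = source_code.content
        soup = BeautifulSoup(plain_text, 'html.parser')
        # CSS selector: <a class="comments_link"> elements that have an href
        for link in soup.select('a.comments_link[href]'):
            href = 'http://website.com/' + link['href']
            print(href)
        page += 1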
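
A side note on the link-building line in the loop: joining the host and the href by string concatenation produces a double slash whenever the href already starts with `/`. The sketch below shows one possible alternative using the standard library's `urljoin`. The helper name `absolute_link` and the `BASE_URL` constant are made up for illustration, and the host is the placeholder from the question.

```python
from urllib.parse import urljoin

# Placeholder host taken from the question, not a real endpoint
BASE_URL = 'http://website.com/'


def absolute_link(href, base=BASE_URL):
    """Resolve an href found on the page against the site's base URL."""
    # urljoin handles relative paths, root-relative paths, and
    # hrefs that are already absolute URLs.
    return urljoin(base, href)
```

Inside the loop this would replace the concatenation, for example `href = absolute_link(link['href'])`.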
When I type `session = Session()` I get an error saying "Unresolved reference 'Session'". I changed it to `session = requests.Session()`; is that OK?
@ThatBenderGuy: Yes, sorry, my mistake.
One last question: where do I place `session.get('http://website.com/cookieToggle')`? Or better yet (sorry for all the questions), how do I view the cookies Python currently has stored for that session?
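
A minimal sketch addressing the two questions in the last comment, under some assumptions: the toggle link only needs to be visited once, before any search request, and the cookies it sets end up in the session's cookie jar. The function names (`activate_cookies`, `current_cookies`, `search_page`, `main`) and the 10-second timeout are made up for illustration. The URLs are the placeholders used in the thread.

```python
import requests

TOGGLE_URL = 'http://website.com/cookieToggle'   # URL from the comment above
SEARCH_URL = 'http://website.com/search/index'   # URL from the answer


def activate_cookies(session, toggle_url=TOGGLE_URL):
    """Visit the toggle link so the session stores whatever cookies it sets."""
    response = session.get(toggle_url, timeout=10)
    response.raise_for_status()  # fail loudly if the toggle request did not succeed
    return response


def current_cookies(session):
    """Return the cookies currently stored in the session as a plain dict."""
    return session.cookies.get_dict()


def search_page(session, query, page):
    """Fetch one results page; the session sends its stored cookies automatically."""
    params = {'page': page, 'q': query}
    return session.get(SEARCH_URL, params=params, timeout=10)


def main():
    session = requests.Session()
    activate_cookies(session)          # once, before the crawl loop starts
    print(current_cookies(session))    # inspect what the site stored
    response = search_page(session, 'example query', 1)
    print(response.status_code)
```

`main()` is defined but not called here, because it performs real network requests. In the crawler from the answer, the equivalent placement would be a single `session.get(...)` for the toggle link just before the `while` loop, with `session.cookies.get_dict()` printed wherever you want to check the stored cookies.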