Simulating cookies with a Python web crawler

Tags: python, cookies, python-3.x

I need some help. I'm trying to build a web crawler with the requests library and BeautifulSoup 4, but to do this successfully I first have to visit a link that activates certain cookies, which then allow me to search for the content of my query.

import requests
from bs4 import BeautifulSoup

def web_spider(max_pages, query):
    page = 1
    while page <= max_pages:
        # Build the search URL for the current page of results
        url = 'http://website.com/search/index?page=' + str(page) + '&q=' + query
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        # Collect every link tagged with the comments_link class
        for link in soup.find_all('a', {'class': 'comments_link'}):
            href = 'http://website.com/' + link.get('href')
            print(href)
        page += 1
Use a requests.Session object instead of calling requests.get directly; when your requests go through a session, cookies are handled automatically:

session = requests.Session()

def web_spider(max_pages, query):
    page = 1
    while page <= max_pages:
        url = 'http://website.com/search/index'
        params = {'page': page, 'q': query}
        # The session resends cookies it has already received and stores any new ones
        source_code = session.get(url, params=params)
        plain_text = source_code.content
        soup = BeautifulSoup(plain_text, 'html.parser')
        for link in soup.select('a.comments_link[href]'):
            href = 'http://website.com/' + link['href']
            print(href)
        page += 1
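
A quick usage sketch (the page count and query below are placeholders): because the session is created at module level, any cookies set on an earlier request are reused by every later page fetch.

if __name__ == '__main__':
    # All pages are fetched through the same session, so cookies persist between requests
    web_spider(max_pages=3, query='python')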

When I type 'session = Session()' I get an error that says "Unresolved reference 'Session'". I changed it to 'session = requests.Session()'; is that OK?

@ThatBenderGuy: Yes, sorry, my mistake.

One last question: where do I place the session.get('http://website.com/cookieToggle') call? Or, even better (sorry for so many questions), how can I see the current cookies Python has stored for this session?
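
A minimal sketch of both points raised in the comments, assuming the cookie-activation link is the http://website.com/cookieToggle URL mentioned above: fetch it once through the session before crawling, and inspect session.cookies (a RequestsCookieJar) to see what has been stored.

import requests

session = requests.Session()

# Hit the cookie-activation link once, before any search requests;
# the cookies it sets are stored on the session automatically.
session.get('http://website.com/cookieToggle')

# Inspect what the session is currently holding.
print(session.cookies.get_dict())   # cookies as a plain dict
for cookie in session.cookies:      # or iterate the RequestsCookieJar
    print(cookie.name, cookie.value, cookie.domain)

# Later requests made through this same session (for example the
# session.get call inside web_spider above) resend those cookies.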