Python BS4: Google next page fails with "Only the following pseudo-classes are implemented: nth-of-type"


python web-scraping


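The script scrapes the first page successfully, but it fails when it tries to move on to the second page. Note that I do not want to use Selenium for this.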

import requests
from bs4 import BeautifulSoup


url = 'https://google.com/search?q=In+order+to&hl=en'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

page = 1
while True:
    print()
    print('Page {}...'.format(page))
    print('-' * 80)

    soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
    for h in soup.select('h3'):
        print(h.get_text(strip=True))

    next_link = soup.select_one('a:contains("Next")')
    if not next_link:
        break

    url = 'https://google.com' + next_link['href']
    page += 1

Output:

Page 1...
--------------------------------------------------------------------------------
In order to Synonyms, In order to Antonyms | Thesaurus.com
In order to - English Grammar Today - Cambridge Dictionary
in order to - Wiktionary
What is another word for "in order to"? - WordHippo
In Order For (someone or something) To | Definition of In ...
In Order For | Definition of In Order For by Merriam-Webster
In order to definition and meaning | Collins English Dictionary
Using "in order to" in English - English Study Page
IN ORDER (FOR SOMEONE / SOMETHING ) TO DO ...
262 In Order To synonyms - Other Words for In Order To
Searches related to In order to
Only the following pseudo-classes are implemented: nth-of-type.

The error comes from this line:

next_link = soup.select_one('a:contains("Next")')
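
The failing call can be checked in isolation, without contacting Google. The sketch below is not from the original post: selector_supported is a hypothetical helper and the sample markup is made up. It prints the installed Beautiful Soup version and reports whether select_one accepts the :contains selector on a small local document, which may help narrow down whether the parser or the installed library is the limiting factor.

```python
import warnings

import bs4
from bs4 import BeautifulSoup

# Made-up markup standing in for a results page with a "Next" link.
SAMPLE_HTML = '<a href="/search?start=0">Previous</a><a href="/search?start=10">Next</a>'


def selector_supported(selector, parser='html.parser'):
    """Return True if select_one() accepts `selector`, False if it raises."""
    soup = BeautifulSoup(SAMPLE_HTML, parser)
    try:
        with warnings.catch_warnings():
            # Some selector engines warn that :contains is deprecated; silence that here.
            warnings.simplefilter('ignore')
            soup.select_one(selector)
    except Exception:
        # Any failure (e.g. an unimplemented pseudo-class) counts as "not supported".
        return False
    return True


print('bs4 version:', bs4.__version__)
print(':contains supported:', selector_supported('a:contains("Next")'))
```

If the second line prints False for every parser you try, the selector itself is the problem in that environment rather than the page content.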

You can use lxml as the parser instead of html.parser. Install it with pip install lxml, then pass 'lxml' to BeautifulSoup:

import requests
from bs4 import BeautifulSoup


url = 'https://google.com/search?q=In+order+to&hl=en'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

page = 1
while True:
    print()
    print('Page {}...'.format(page))
    print('-' * 80)

    soup = BeautifulSoup(requests.get(url, headers=headers).content, 'lxml')
    for h in soup.select('h3'):
        print(h.get_text(strip=True))

    next_link = soup.select_one('a:contains("Next")')
    if not next_link:
        break

    url = 'https://google.com' + next_link['href']
    page += 1

Try a different parser; I know lxml can handle this.
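
If the selector error persists after switching parsers, one possible workaround is to avoid the :contains pseudo-class altogether and match the link text in plain Python. This is a minimal sketch, not the method from the answer above: find_next_url is a hypothetical helper, the sample markup is invented for illustration, and it assumes the results page still exposes a link whose text contains "Next". It also uses urllib.parse.urljoin instead of string concatenation, so that both relative and absolute href values resolve correctly.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_next_url(soup, current_url, label='Next'):
    """Return the absolute URL of the first link whose text contains `label`, or None."""
    for a in soup.find_all('a', href=True):
        # get_text() also covers text nested in child tags such as <span>.
        if label in a.get_text():
            return urljoin(current_url, a['href'])
    return None


# Invented markup standing in for the pagination area of a results page.
sample_html = (
    '<a href="/search?start=0">Previous</a>'
    '<a href="/search?start=10"><span>Next</span></a>'
)
soup = BeautifulSoup(sample_html, 'html.parser')
print(find_next_url(soup, 'https://google.com/search?q=In+order+to&hl=en'))
```

Inside the scraping loop, this would replace the select_one call and the URL concatenation: assign url = find_next_url(soup, url) and break when it returns None.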