Python BS4: Google next page: "Only the following pseudo-classes are implemented: nth-of-type"
I can scrape the first page successfully, but the script fails when it tries to move on to the second page. Please note that I do not want to use Selenium for this.
import requests
from bs4 import BeautifulSoup

url = 'https://google.com/search?q=In+order+to&hl=en'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

page = 1
while True:
    print()
    print('Page {}...'.format(page))
    print('-' * 80)
    soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
    for h in soup.select('h3'):
        print(h.get_text(strip=True))
    next_link = soup.select_one('a:contains("Next")')
    if not next_link:
        break
    url = 'https://google.com' + next_link['href']
    page += 1
Result:
Page 1...
--------------------------------------------------------------------------------
In order to Synonyms, In order to Antonyms | Thesaurus.com
In order to - English Grammar Today - Cambridge Dictionary
in order to - Wiktionary
What is another word for "in order to"? - WordHippo
In Order For (someone or something) To | Definition of In ...
In Order For | Definition of In Order For by Merriam-Webster
In order to definition and meaning | Collins English Dictionary
Using "in order to" in English - English Study Page
IN ORDER (FOR SOMEONE / SOMETHING ) TO DO ...
262 In Order To synonyms - Other Words for In Order To
Searches related to In order to
Only the following pseudo-classes are implemented: nth-of-type.
The error is raised by this line:
next_link = soup.select_one('a:contains("Next")')
You can use lxml as the parser instead of html.parser. Install it with pip install lxml.
import requests
from bs4 import BeautifulSoup

url = 'https://google.com/search?q=In+order+to&hl=en'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

page = 1
while True:
    print()
    print('Page {}...'.format(page))
    print('-' * 80)
    soup = BeautifulSoup(requests.get(url, headers=headers).content, 'lxml')
    for h in soup.select('h3'):
        print(h.get_text(strip=True))
    next_link = soup.select_one('a:contains("Next")')
    if not next_link:
        break
    url = 'https://google.com' + next_link['href']
    page += 1
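The loop builds the next URL by concatenating 'https://google.com' with the link's href. As a small sketch, assuming the href could be either relative or already absolute, the standard library's urllib.parse.urljoin can resolve it in both cases. The helper name resolve_next_url and the sample href values are illustrative and do not come from the original post.

```python
from urllib.parse import urljoin

BASE_URL = 'https://google.com'

def resolve_next_url(href, base=BASE_URL):
    """Resolve a link's href against the base URL (hypothetical helper)."""
    return urljoin(base, href)

# A relative href is joined onto the base URL.
relative = resolve_next_url('/search?q=In+order+to&start=10')
# An href that is already absolute is returned unchanged.
absolute = resolve_next_url('https://google.com/search?q=In+order+to&start=10')

print(relative)
print(absolute)
```

In the loop, this would take the place of the string concatenation, e.g. url = resolve_next_url(next_link['href']).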
Try a different parser; I know lxml can do this.
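In case the :contains pseudo-class is still rejected on a given installation, one option is to locate the link without any CSS pseudo-class, using a plain filter function with find. The following is only a sketch under stated assumptions: the helper name find_next_link and the sample HTML are made up for illustration, and the real Google markup may differ.

```python
from bs4 import BeautifulSoup

def find_next_link(soup, label='Next'):
    """Return the first <a> tag whose visible text contains `label`, or None.

    Hypothetical helper: it uses a filter function instead of a CSS selector.
    """
    return soup.find(lambda tag: tag.name == 'a' and label in tag.get_text())

# Made-up markup standing in for a results page's pagination links.
sample_html = '''
<div>
  <a href="/search?q=In+order+to&amp;start=0">Previous</a>
  <a href="/search?q=In+order+to&amp;start=10"><span>Next</span></a>
</div>
'''

soup = BeautifulSoup(sample_html, 'html.parser')
next_link = find_next_link(soup)
print(next_link['href'] if next_link else None)
```

In the scraping loop, next_link = find_next_link(soup) would replace the select_one call; the rest of the loop stays the same.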