How to scrape multiple Google pages using Python and BeautifulSoup


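I wrote code that scrapes Google News search results, but it only ever scrapes the first page. How can I write a loop that lets me scrape the first 2, 3, ..., n pages?

I know that I need to add a page parameter to the url and put everything inside a for loop, but I do not know how.

This code gives me the headlines, snippets, and dates from the first results page: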
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

term = 'usa'
url = 'https://www.google.com/search?q={0}&source=lnms&tbm=nws'.format(term)  # I know I need to add a page parameter here, but I do not know how

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

headline_text = soup.find_all('h3', class_= "r dO0Ag")

snippet_text = soup.find_all('div', class_='st')

news_date = soup.find_all('div', class_='slp')

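Also, can this logic for Google News and Google web search be applied to, for example, Bing News or Yahoo News? I mean, can I use the same parameters, or are the URLs different?

I think you need to change your URL. Try the code below and see whether it works: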
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

term = 'usa'
page=0


# 'start' is a result offset, so it grows by 10 for each page
while True:
    url = 'https://www.google.com/search?q={}&tbm=nws&sxsrf=ACYBGNTx2Ew_5d5HsCvjwDoo5SC4U6JBVg:1574261023484&ei=H1HVXf-fHfiU1fAP65K6uAU&start={}&sa=N&ved=0ahUKEwi_q9qog_nlAhV4ShUIHWuJDlcQ8tMDCF8&biw=1280&bih=561&dpr=1.5'.format(term,page)
    print(url)

    # certificate verification stays on (verify=False removed)
    response = requests.get(url, headers=headers)
    # stop when Google blocks the request or returns an error
    if response.status_code!=200:
        break
    soup = BeautifulSoup(response.text, 'html.parser')

    headline_text = soup.find_all('h3', class_= "r dO0Ag")
    # stop once a page has no headlines, otherwise the loop never ends
    if not headline_text:
        break

    snippet_text = soup.find_all('div', class_='st')

    news_date = soup.find_all('div', class_='slp')
    page=page+10
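
The long URL above carries session-specific values copied from a browser. As a minimal sketch, assuming Google only needs q, tbm, and start (an assumption, not something confirmed in this thread), the URL can be built from a small dictionary. The names build_search_url, scrape_pages, and RESULTS_PER_PAGE are hypothetical helpers introduced for illustration:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/search"
RESULTS_PER_PAGE = 10  # assumption: same step of 10 as the loop above


def build_search_url(term, page_index):
    """Return the news-search URL for a zero-based page index."""
    params = {
        "q": term,  # urlencode escapes spaces and special characters
        "tbm": "nws",  # news tab, as in the original URL
        "start": page_index * RESULTS_PER_PAGE,  # result offset, not a page number
    }
    return BASE_URL + "?" + urlencode(params)


def scrape_pages(term, n_pages, headers):
    """Fetch up to n_pages result pages and return one parsed soup per page."""
    # Imported here so build_search_url works without these packages installed.
    import requests
    from bs4 import BeautifulSoup

    soups = []
    for page_index in range(n_pages):
        url = build_search_url(term, page_index)
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code != 200:
            break  # blocked or failed: stop instead of retrying
        soups.append(BeautifulSoup(response.text, "html.parser"))
    return soups
```

Calling scrape_pages('usa', 3, headers) would request the offsets 0, 10, and 20. Each returned soup can then be searched with the same find_all calls as above, provided those class names still match Google's markup.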

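Be careful, because Google has some strong anti-scraping measures and you may get blocked. If you do not want to develop a very robust scraper (IP rotation, imitation of human behaviour, etc.), you could consider using Google's API to get your data.

You can do url = 'https://www.google.com/search?q={0}&source=lnms&tbm=nws&page={1}'.format(term, page)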
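
One modest precaution against being blocked is to pause between requests and stop at the first refusal. The sketch below is only an illustration: fetch_politely is a hypothetical helper, the delay bounds are arbitrary assumptions, and nothing here guarantees that Google will not block the scraper anyway.

```python
import random
import time


def fetch_politely(urls, fetch, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval between calls.

    fetch is any callable that returns a (status_code, text) tuple.
    Fetching stops at the first response whose status code is not 200.
    """
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # Random pause so the requests are not evenly spaced.
            sleep(random.uniform(min_delay, max_delay))
        status_code, text = fetch(url)
        if status_code != 200:
            break  # likely blocked or rate limited: give up rather than retry
        pages.append(text)
    return pages
```

With requests, fetch could be a small function that calls requests.get(url, headers=headers, timeout=10) and returns (response.status_code, response.text).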

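Take a look.

I have already tried that, but it always returns the first page. I can enter any number for page and it always returns the content of page 1.

It works now, but is there a way to avoid such a long URL? I mean, if I only want page 1, the URL would be 3 times shorter. Also, what would this URL look like for Yahoo and Bing?

In that case requests may not help you. You would have to use a browser automation tool (such as Selenium WebDriver) and click each pagination link to get the new page values.

@taga: If you have done any new research and run into a problem, please post a new question and mention your goal. If I don't, someone else will surely help you. Thanks.

Hey, can you help me?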