Python: scraping pages with Beautiful Soup, following every link and extracting information

python web-scraping

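I'm trying to open the page of every company listed on Stack Overflow's companies page and extract specific information (such as the full description). Is there a simple way to do this with Beautiful Soup? At the moment I'm only collecting the company links from the first page: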

import requests
from bs4 import BeautifulSoup

r = requests.get('https://stackoverflow.com/jobs/companies')
src = r.content
soup = BeautifulSoup(src, 'lxml')
urls = []

# Each company name is an <h2> that wraps a link to the company page
for h2_tag in soup.find_all("h2"):
    a_tag = h2_tag.find('a')
    if a_tag is not None:  # skip headings that contain no link
        urls.append(a_tag.attrs['href'])

print(urls)
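
The links collected this way are site-relative paths, so they have to be turned into absolute URLs before they can be requested. A minimal sketch of that step, assuming the `<h2><a href=...>` layout used above: `SAMPLE_HTML` and `extract_company_urls` are hypothetical names introduced for illustration, and the markup is a stand-in rather than the real page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the listing page; the real page may differ.
SAMPLE_HTML = """
<h2><a href="/jobs/companies/example-one">Example One</a></h2>
<h2>Heading without a link</h2>
<h2><a href="https://example.com/absolute">Example Two</a></h2>
"""


def extract_company_urls(html, page_url):
    """Return an absolute URL for every <h2> that contains a link."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for h2_tag in soup.find_all("h2"):
        a_tag = h2_tag.find("a", href=True)
        if a_tag is None:  # heading with no link inside
            continue
        # urljoin resolves relative paths and leaves absolute URLs untouched
        urls.append(urljoin(page_url, a_tag["href"]))
    return urls


company_urls = extract_company_urls(
    SAMPLE_HTML, "https://stackoverflow.com/jobs/companies"
)
print(company_urls)
```

Using `urljoin` rather than string concatenation avoids doubled slashes and handles links that are already absolute.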

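Alternatively, you could process the first page and then use Selenium to move to the second page by clicking the page-2 button, passing the page source to Beautiful Soup each time. I think that should work.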
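
A rough sketch of the Selenium idea, under several assumptions: `collect_page_sources` and `main` are hypothetical helpers, the `a.next-page` selector is a placeholder that has not been checked against the real page, and a fixed `time.sleep` stands in for a proper wait. Each returned string can then be handed to `BeautifulSoup(...)` in the usual way.

```python
import time


def collect_page_sources(driver, start_url, next_selector, max_pages=5, pause=1.0):
    """Load start_url, then keep clicking the "next page" control.

    `driver` is expected to be a Selenium WebDriver. `next_selector` is a
    CSS selector for the pagination link (an assumption; inspect the page
    to find the real one).
    """
    sources = []
    driver.get(start_url)
    while True:
        sources.append(driver.page_source)
        if len(sources) >= max_pages:
            break
        # "css selector" is the string value of selenium's By.CSS_SELECTOR
        next_links = driver.find_elements("css selector", next_selector)
        if not next_links:  # no further page to visit
            break
        next_links[0].click()
        time.sleep(pause)  # crude wait; WebDriverWait would be more robust
    return sources


def main():
    # Imported lazily so the helper above can be exercised without a browser.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        pages = collect_page_sources(
            driver,
            "https://stackoverflow.com/jobs/companies",
            "a.next-page",  # assumed selector, not taken from the real page
        )
        print(len(pages), "pages collected")
    finally:
        driver.quit()
```

Calling `main()` needs Selenium and a local Chrome driver. When the page number is available as a query parameter, plain `requests` is usually simpler.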

import requests
from bs4 import BeautifulSoup as bsoup

base_url = "https://stackoverflow.com"

# Walk listing pages 1 to 5
for i in range(1, 6):
    site_source = requests.get(
        f"{base_url}/jobs/companies?pg={i}"
    ).content
    soup = bsoup(site_source, "html.parser")
    company_list = soup.find("div", class_="company-list")
    if company_list is None:  # layout changed or the request was blocked
        continue
    company_block = company_list.find_all("div", class_="grid--cell fl1 text")
    for company in company_block:
        link = company.find("a")
        if link is None:
            continue
        # Follow the relative link to the company's own page
        company_source = requests.get(base_url + link.attrs["href"]).content
        company_soup = bsoup(company_source, "html.parser")
        company_info = company_soup.find("div", id="company-name-tagline")
        if company_info is None:
            continue
        print("Name: ", company_info.find("h1").text)
        print("Info: ", company_info.find("p").text)
        print()

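I'm basically looping over pages 1 to 5, collecting each company's link, then visiting the company's page and printing its name and description.

My output: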

Name:  BigCommerce
Info:  Think BIG

Name:  Facebook
Info:  Our mission is to give people the power to build community and bring the world closer together.   

Name:  trivago N.V.
Info:  A diverse team of talents that make a blazing fast accommodation search powered by cutting-edge tech and entrepreneurial innovation. 

Name:  General Dynamics UK
Info:  General Dynamics UK is one of the UK’s leading defence companies, and an important supplier to the UK Ministry of Defence (MoD).   

Name:  EDF
Info:  EDF is leading the transition to a cleaner, low emission electric future, tackling climate change and helping Britain reach net zero.

Name:  Radix DLT
Info:  Delivering Scalable Trust.
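
The loop in the answer downloads and parses in the same place, which makes it hard to check the selectors without hitting the site. One possible restructuring, offered as a sketch rather than a drop-in replacement: the parsing functions take HTML text, and the download step is passed in as a `fetch` callable. The function names are hypothetical; the CSS classes and the `company-name-tagline` id are the ones used in the answer and may no longer match the live page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://stackoverflow.com"


def parse_listing(html):
    """Return absolute company URLs found on one listing page."""
    soup = BeautifulSoup(html, "html.parser")
    company_list = soup.find("div", class_="company-list")
    if company_list is None:
        return []
    urls = []
    for block in company_list.find_all("div", class_="grid--cell fl1 text"):
        link = block.find("a", href=True)
        if link is not None:
            urls.append(urljoin(BASE_URL, link["href"]))
    return urls


def parse_company(html):
    """Return the name and tagline from a company page, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    info = soup.find("div", id="company-name-tagline")
    if info is None:
        return None
    name = info.find("h1")
    tagline = info.find("p")
    return {
        "name": name.get_text(strip=True) if name else "",
        "info": tagline.get_text(strip=True) if tagline else "",
    }


def crawl(fetch, pages):
    """Visit each listing page, then each company page it links to.

    `fetch(url)` must return the page's HTML as text.
    """
    results = []
    for page in pages:
        listing_html = fetch(f"{BASE_URL}/jobs/companies?pg={page}")
        for company_url in parse_listing(listing_html):
            company = parse_company(fetch(company_url))
            if company is not None:
                results.append(company)
    return results


def fetch_with_requests(url):
    """Download a page with requests, failing loudly on HTTP errors."""
    import requests  # imported here so the parsers can be tested offline

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text
```

A live run would be `crawl(fetch_with_requests, range(1, 6))`. For testing, any function that maps a URL to saved HTML can be passed as `fetch` instead.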

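Thanks for your answer. What I meant is that I want to open each company's link, get the information I need, and then go back and do the same for the next company (I already know how to loop through the pages). Thanks anyway! :)

You can check my updated answer. I'm doing what you mentioned, which is opening each company's link to get the information.

Thank you so much for your help!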