Python: web scraping cannot find a div class


python web-scraping


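I am trying to scrape information from chambers.com, more specifically from the law-firm page requested in the code below. The information I want is the different departments and bands under the "UK" section of the "Ranked Departments" tab.

The problem I am currently having is with Beautiful Soup's find_all, and I assume the parser is the cause. I want to find all the div elements that the code below searches for.

The code I have so far is: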

import requests
from bs4 import BeautifulSoup
url_to_scrape = 'https://chambers.com/law-firm/allen-overy-llp-global-2:7'

plain_html_text = requests.get(url_to_scrape)

soup = BeautifulSoup(plain_html_text.content, "lxml")

search = soup.find_all("div", {"class": "mb-3"})

print(search)

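The list comes back empty. I took the class from the HTML using the browser's inspector.

I have tried pasting the HTML directly into the Python file, and I have also tried using html.parser, but still nothing is returned.

Any help would be greatly appreciated, even if it is only a suggestion of what to look for.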

Instead of writing

soup.find_all("div", {"class": "mb-3"})

use

soup.find_all("div", class_="mb-3")

Check the page source and you will see that there is no such element in the page. Scrape the API instead:

import requests

url = 'https://api.chambers.com/api/organisations/7/ranked-departments?publicationTypeGroupId=2'
response = requests.get(url).json()
for location in response['locations']:
    if location['description'] == 'UK':
        for info in location['rankedEntities']:
            print(info["displayName"], info['rankings'][0]['rankingDescription'], sep="\n", end="\n\n")

Prints:

Banking & Finance: Borrowers
Band 1

Banking & Finance: Lenders
Band 1

Banking & Finance: Sponsors
Band 2

Capital Markets: Debt
Band 1

Capital Markets: Derivatives
Band 1

Capital Markets: Equity
Band 1

Capital Markets: Securitisation
Band 1

Capital Markets: Structured Finance
Band 1

Competition Law
Band 2

Corporate M&A (International & Cross-Border)
Band 1

Dispute Resolution: International Arbitration
Band 2

Dispute Resolution: Litigation
Band 1

Disputes (International & Cross-Border)
Band 1

Employment
Band 2

Energy & Natural Resources: Oil & Gas
Band 1

Energy & Natural Resources: Power
Band 1

Energy & Natural Resources: Renewables & Alternative Energy
Band 1

Energy Sector (International & Cross-Border)
Band 1

Finance & Capital Markets (International & Cross-Border)
Band 1

Insurance: Mainly Policyholders
Band 1

Intellectual Property
Band 2

Intellectual Property: Patent Litigation
Band 1

Investigations & Enforcement (International & Cross-Border)
Band 2

Investment Funds & Asset Management (International & Cross-Border)
Band 2

Life Sciences & Pharmaceutical Sector (International & Cross-Border)
Band 2

Projects
Band 1

Restructuring/Insolvency
Band 1
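
The loop in the answer above indexes rankings[0] directly, which would raise an IndexError if an entity ever came back without a ranking. As a minimal sketch, the same extraction can be wrapped in a small generator that skips such entries. The field names are the ones used in the answer; the helper name iter_rankings, the "USA" region, and the sample payload are hypothetical and only illustrate the shape of the data.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_rankings(payload: Dict[str, Any], region: str = "UK") -> Iterator[Tuple[str, str]]:
    """Yield (department, band) pairs for one region of the API response."""
    for location in payload.get("locations", []):
        if location.get("description") != region:
            continue
        for entity in location.get("rankedEntities", []):
            rankings = entity.get("rankings") or []
            if not rankings:
                # Skip entities without a ranking instead of raising IndexError.
                continue
            yield entity["displayName"], rankings[0]["rankingDescription"]


# Hypothetical sample, shaped like the response used in the answer above.
sample = {
    "locations": [
        {"description": "UK", "rankedEntities": [
            {"displayName": "Employment", "rankings": [{"rankingDescription": "Band 2"}]},
            {"displayName": "Projects", "rankings": []},
        ]},
        {"description": "USA", "rankedEntities": [
            {"displayName": "Projects", "rankings": [{"rankingDescription": "Band 1"}]},
        ]},
    ]
}

for department, band in iter_rankings(sample):
    print(department, band, sep="\n", end="\n\n")
```

With a live request, the payload would be the result of requests.get(url).json() from the answer, passed in place of sample.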

Comments:

One of the biggest problems in web scraping is client-side rendering. Do you know for certain that no JavaScript loads this information after the document has been loaded in the web browser? You may need to use a library such as Selenium. See the article for an example.

Thanks for your comment, Caleb. I do not know whether JavaScript loads this information; is there a way around this? I will take a look at the article you attached. Thanks again.

I would look at plain_html_text.content and build the search query based on its contents.

OK, they use Angular, which is usually client-side. I also used curl to request the page, and the data you are looking for is not returned, so you will need some kind of tool that can scrape client-side rendered websites. Hope this link helps, good luck.

Actually, this makes no difference. You can check it yourself: search is still an empty list.

OK, I will try it on my system.

Great, thank you!
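
To follow up on the client-side rendering point in the comments: one quick check is to count how often the class from the browser inspector appears in the HTML the server actually sends. The sketch below uses only the standard library so that it runs without extra dependencies; the names ClassCounter and count_in_static_html and both HTML snippets are hypothetical stand-ins, not the real chambers.com markup.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags of a given name that carry a given CSS class."""

    def __init__(self, tag: str, css_class: str) -> None:
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_in_static_html(html: str, tag: str, css_class: str) -> int:
    """Return how many <tag> elements with css_class appear in html."""
    counter = ClassCounter(tag, css_class)
    counter.feed(html)
    counter.close()
    return counter.count


# Hypothetical example: what a client-side rendered app might send ...
server_html = "<html><body><app-root></app-root><script src='main.js'></script></body></html>"
# ... versus what the browser inspector shows after JavaScript has run.
rendered_html = "<html><body><div class='mb-3 row'>Employment</div><div class='mb-3'>Projects</div></body></html>"

print(count_in_static_html(server_html, "div", "mb-3"))    # 0
print(count_in_static_html(rendered_html, "div", "mb-3"))  # 2
```

In practice the first argument would be requests.get(url_to_scrape).text. A count of zero there, while the inspector shows the elements, suggests the content is rendered by JavaScript, and either a browser-driving tool or the underlying API is needed.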