Python web scraping: cannot find a div by its class

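I am trying to scrape information from chambers.com, more specifically the law-firm page used in the code below. The information I want is the departments and their bands listed under the "UK" section of the "Ranked Departments" tab. (The original post included a screenshot of that section.)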

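The problem I am currently having is with Beautiful Soup's find_all and, I assume, the parser. I want to find all div elements with the class mb-3.

The code I have so far is: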

import requests
from bs4 import BeautifulSoup
url_to_scrape = 'https://chambers.com/law-firm/allen-overy-llp-global-2:7'

plain_html_text = requests.get(url_to_scrape)

soup = BeautifulSoup(plain_html_text.content, "lxml")

search = soup.find_all("div", {"class": "mb-3"})

print(search)
The list comes back empty. I took the class name from the HTML shown in the browser's inspector.

I have tried pasting the HTML directly into the Python file, and I have also tried html.parser instead of lxml, but still nothing is returned.


Any help would be greatly appreciated, even if it is only a suggestion of what to look for.

Instead of writing
soup.find_all("div", {"class": "mb-3"})
use

soup.find_all("div", class_="mb-3")

Check the page source and you will see that the page contains no such element. Scrape the API instead:

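import requests

# JSON endpoint used in place of the rendered HTML page.
url = 'https://api.chambers.com/api/organisations/7/ranked-departments?publicationTypeGroupId=2'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error
data = response.json()

for location in data['locations']:
    if location['description'] == 'UK':
        for info in location['rankedEntities']:
            # Department name, then its first ranking (for example "Band 1").
            print(info["displayName"], info['rankings'][0]['rankingDescription'], sep="\n", end="\n\n")
Prints: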

Banking & Finance: Borrowers
Band 1

Banking & Finance: Lenders
Band 1

Banking & Finance: Sponsors
Band 2

Capital Markets: Debt
Band 1

Capital Markets: Derivatives
Band 1

Capital Markets: Equity
Band 1

Capital Markets: Securitisation
Band 1

Capital Markets: Structured Finance
Band 1

Competition Law
Band 2

Corporate M&A (International & Cross-Border)
Band 1

Dispute Resolution: International Arbitration
Band 2

Dispute Resolution: Litigation
Band 1

Disputes (International & Cross-Border)
Band 1

Employment
Band 2

Energy & Natural Resources: Oil & Gas
Band 1

Energy & Natural Resources: Power
Band 1

Energy & Natural Resources: Renewables & Alternative Energy
Band 1

Energy Sector (International & Cross-Border)
Band 1

Finance & Capital Markets (International & Cross-Border)
Band 1

Insurance: Mainly Policyholders
Band 1

Intellectual Property
Band 2

Intellectual Property: Patent Litigation
Band 1

Investigations & Enforcement (International & Cross-Border)
Band 2

Investment Funds & Asset Management (International & Cross-Border)
Band 2

Life Sciences & Pharmaceutical Sector (International & Cross-Border)
Band 2

Projects
Band 1

Restructuring/Insolvency
Band 1
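
If the rankings are needed as data rather than printed text, the same traversal can be wrapped in a small function. This is only a sketch: the function name extract_rankings and the sample payload are made up for illustration, and the payload reuses only the keys that appear in the answer above (locations, description, rankedEntities, displayName, rankings, rankingDescription). The real response presumably carries more fields.

```python
def extract_rankings(payload: dict, location_name: str) -> list:
    """Return (department, band) pairs for one location in the payload."""
    results = []
    for location in payload.get("locations", []):
        if location.get("description") != location_name:
            continue
        for entity in location.get("rankedEntities", []):
            # Guard against an entity that has no rankings at all.
            rankings = entity.get("rankings") or []
            band = rankings[0].get("rankingDescription", "") if rankings else ""
            results.append((entity.get("displayName", ""), band))
    return results


# Hand-made sample, NOT a real API response; it only mirrors the key names.
sample_payload = {
    "locations": [
        {
            "description": "UK",
            "rankedEntities": [
                {"displayName": "Banking & Finance: Borrowers",
                 "rankings": [{"rankingDescription": "Band 1"}]},
                {"displayName": "Employment",
                 "rankings": [{"rankingDescription": "Band 2"}]},
            ],
        },
        {
            "description": "Elsewhere",  # hypothetical second location
            "rankedEntities": [
                {"displayName": "Projects", "rankings": []},
            ],
        },
    ]
}

for department, band in extract_rankings(sample_payload, "UK"):
    print(department, band, sep="\n", end="\n\n")
```

Against the live endpoint, the parsed JSON from requests.get(url, timeout=10).json() would be passed in place of sample_payload. Using .get() with defaults means a missing key gives an empty result instead of a KeyError, which may or may not be what you want when the API changes.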

Comments:

One of the biggest problems in web scraping is client-side rendering. Do you know for certain that no JavaScript loads this information after the document has loaded in the browser? You may need to use a library such as Selenium. See the linked article for an example.

Thanks for your comment, Caleb. I don't know whether JavaScript is loading this information. Is there a way around it? I'll have a look at the article you attached. Thanks again.

I would look at plain_html_text.content and build the search query from what it actually contains.

OK, they use Angular, which is usually rendered client-side. I also requested the page with curl, and the data you are looking for is not in the response, so you will need some kind of tool that can scrape client-side rendered websites. Hope this link helps, good luck.

Actually, this makes no difference. You can check for yourself: search is still an empty list.

OK, I'll try it on my system.

Great, thank you!
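
The check suggested in the comments, looking at what the server actually returns before building a query, can be sketched with the standard library alone. The helper name count_in_raw_html and both HTML strings below are hypothetical and exist only to illustrate the idea; in practice the first argument would be the text of the response (for example plain_html_text.text).

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags with a given name that carry a given CSS class."""

    def __init__(self, tag: str, css_class: str) -> None:
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; class names are space separated.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.count += 1


def count_in_raw_html(html: str, tag: str, css_class: str) -> int:
    """Return how many matching elements exist in the HTML as delivered."""
    counter = ClassCounter(tag, css_class)
    counter.feed(html)
    counter.close()
    return counter.count


# Made-up documents for illustration, not taken from chambers.com.
server_rendered = '<div class="mb-3 row">Banking</div><div class="mb-3">Projects</div>'
client_rendered = '<html><body><app-root></app-root><script src="main.js"></script></body></html>'

print(count_in_raw_html(server_rendered, "div", "mb-3"))  # 2
print(count_in_raw_html(client_rendered, "div", "mb-3"))  # 0
```

A count of zero on the raw response, while the browser inspector shows the elements, is a strong hint that the content is added by JavaScript after the page loads. In that case no choice of parser or find_all syntax will find it.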