Unable to get a div tag with Beautiful Soup in Python


I am trying to download all the Pokémon images from the official website, because I want high-quality images. Here is the code I wrote:

from bs4 import BeautifulSoup as bs4
import requests
request = requests.get('https://www.pokemon.com/us/pokedex/')
soup = bs4(request.text, 'html')
print(soup.findAll('div',{'class':'container       pokedex'}))

The output is:

[]

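Am I doing something wrong? Also, is it legal to scrape content from the official website? Is there a tag or something similar that indicates this? Thanks.

P.S. I am new to BS and HTML.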

The images are loaded dynamically, so you have to use selenium to scrape them. Here is the complete code to do this:

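from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import requests

driver = webdriver.Chrome()

driver.get('https://www.pokemon.com/us/pokedex/')

# Give the page's JavaScript time to render the Pokémon list.
time.sleep(4)

# Drop the last three matches (the images below the "Load more" button).
# The find_element(s)_by_* helpers were removed in Selenium 4.3; use By instead.
li_tags = driver.find_elements(By.CLASS_NAME, 'animating')[:-3]

li_num = 1

for li in li_tags:
    img_link = li.find_element(By.XPATH, './/img').get_attribute('src')
    name = li.find_element(By.XPATH, f'/html/body/div[4]/section[5]/ul/li[{li_num}]/div/h5').text

    r = requests.get(img_link)

    # Save each image under the Pokémon's name.
    with open(f"D:\\{name}.png", "wb") as f:
        f.write(r.content)

    li_num += 1

driver.close()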

Output:

12 Pokémon images. (The first two downloaded images were shown here.)
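To illustrate why the question's snippet can print an empty list, here is a minimal sketch. It assumes the server sends only an empty container that JavaScript fills in later; the HTML string below is made up for illustration and is not the real markup of pokemon.com. It also uses a CSS selector, which matches each class separately and so does not depend on the spacing inside the class attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical static HTML, standing in for what requests.get(...).text might
# contain before any JavaScript has run. This markup is an assumption.
static_html = """
<html><body>
  <div class="container pokedex">
    <ul class="results"></ul>
  </div>
  <script>/* items are inserted here at runtime */</script>
</body></html>
"""

soup = BeautifulSoup(static_html, "html.parser")

# Match a div that carries both classes, regardless of attribute spacing.
containers = soup.select("div.container.pokedex")

# The images do not exist yet, because no script has run.
images = soup.select("div.container.pokedex img")

print(len(containers))
print(len(images))
```

Beautiful Soup only parses the text it is given; it never executes scripts, which is why a browser driven by selenium sees elements that requests does not.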

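Also, I noticed that there is a Load More button at the bottom of the page. Clicking it loads more images, and after clicking it we have to keep scrolling down to load the rest. If I'm not mistaken, there are 893 images on the site in total. To scrape all 893 images, you can use the following code: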

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import requests

driver = webdriver.Chrome()

driver.get('https://www.pokemon.com/us/pokedex/')

time.sleep(3)

# Click the "Load more" button through JavaScript.
# The find_element(s)_by_* helpers were removed in Selenium 4.3; use By instead.
load_more = driver.find_element(By.XPATH, '//*[@id="loadMore"]')

driver.execute_script("arguments[0].click();", load_more)

# Scroll to the bottom repeatedly until the page height stops growing.
scroll_js = "window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;"
lenOfPage = driver.execute_script(scroll_js)
match = False
while not match:
    lastCount = lenOfPage
    time.sleep(1.5)
    lenOfPage = driver.execute_script(scroll_js)
    if lastCount == lenOfPage:
        match = True

# Drop the last three matches (the images below the "Load more" button).
li_tags = driver.find_elements(By.CLASS_NAME, 'animating')[:-3]

li_num = 1

for li in li_tags:
    img_link = li.find_element(By.XPATH, './/img').get_attribute('src')
    name = li.find_element(By.XPATH, f'/html/body/div[4]/section[5]/ul/li[{li_num}]/div/h5').text

    r = requests.get(img_link)

    # Save each image under the Pokémon's name.
    with open(f"D:\\{name}.png", "wb") as f:
        f.write(r.content)

    li_num += 1

driver.close()
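The scrolling loop above can be isolated into a small helper, which makes the stopping rule easier to see and to test without a browser. This is only a sketch: `scroll_until_stable`, `scroll_and_measure`, and `max_rounds` are names introduced here for illustration, and the safety cap is an addition that the original loop does not have.

```python
import time
from typing import Callable


def scroll_until_stable(scroll_and_measure: Callable[[], int],
                        pause: float = 1.5,
                        max_rounds: int = 500) -> int:
    """Call scroll_and_measure until the height it returns stops changing."""
    last_height = scroll_and_measure()
    for _ in range(max_rounds):  # safety cap so the loop cannot run forever
        time.sleep(pause)        # give the page time to load the next batch
        height = scroll_and_measure()
        if height == last_height:
            break                # nothing new was loaded, so we are done
        last_height = height
    return last_height


# Stand-in for a browser: page heights reported on successive scrolls.
heights = iter([1000, 2000, 3000, 3000])
final = scroll_until_stable(lambda: next(heights), pause=0)
print(final)

# With Selenium, the callable could be (untested here, needs a live driver):
# lambda: driver.execute_script(
#     "window.scrollTo(0, document.body.scrollHeight);"
#     "return document.body.scrollHeight;")
```

Passing the scroll action in as a callable keeps the waiting logic separate from the browser, so the same function works with a fake in tests and with a real driver.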

This would have been much easier if you had looked at the Network tab first:

import time
import requests

endpoint = "https://www.pokemon.com/us/api/pokedex/kalos"
# contains all metadata
data = requests.get(endpoint).json()
# collect the keys needed to save each picture
items = [{"name": item["name"], "link": item["ThumbnailImage"]} for item in data]
# remove duplicates
d = [dict(t) for t in {tuple(d.items()) for d in items}]
assert len(d) == 893
for pokemon in d:
    response = requests.get(pokemon["link"])
    time.sleep(1)
    with open(f"{pokemon['name']}.png", "wb") as f:
        f.write(response.content)
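The de-duplication line in the code above is compact, so here is a small sketch of what it does. The sample records and example.com links are made up for illustration. Dictionaries are not hashable, so each one is converted to a tuple of its items, which can go into a set. A set does not keep the original order; the second variant, an alternative not used in the answer, keeps the first occurrence of each record in order.

```python
# Made-up sample records in the same shape as `items` above.
items = [
    {"name": "Bulbasaur", "link": "https://example.com/001.png"},
    {"name": "Ivysaur", "link": "https://example.com/002.png"},
    {"name": "Bulbasaur", "link": "https://example.com/001.png"},  # duplicate
]

# Same technique as the answer: dicts become hashable tuples, the set drops
# duplicates, and each tuple is turned back into a dict. Order is not kept.
unique_unordered = [dict(t) for t in {tuple(d.items()) for d in items}]

# Alternative: dict.fromkeys keeps insertion order, so the first occurrence
# of each record survives in its original position.
unique_ordered = [dict(t) for t in dict.fromkeys(tuple(d.items()) for d in items)]

print(len(unique_unordered))
print([d["name"] for d in unique_ordered])
```

Both variants treat two records as duplicates only when every key and value matches, and they rely on the values being hashable, which holds for the strings used here.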

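Hi Sushil. When I run the code on my machine, it also downloads the three images below the "Load more Pokémon" button.

I can't reproduce your result. Don't you want the last three images? I have updated the code so that it no longer downloads those three images. Take a look. Also, the edited code saves each image under the Pokémon's name rather than a number. I don't think scraping images from their website is illegal as long as you use them only for personal purposes. If you use them for commercial purposes, then it is definitely illegal, because you would be using their images without their permission.

That's great.

@Sushil Take a look at my answer to see how you could do this without selenium, using the Network tab! It's easier.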
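On the question of whether a site signals what may be scraped: many sites publish a robots.txt file that states which paths crawlers are asked to avoid. The sketch below parses a made-up set of rules with Python's standard `urllib.robotparser`. The rules and the example.com URLs are assumptions for illustration, not the real robots.txt of any site, and robots.txt expresses the site's crawling preferences rather than legal permission.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration; NOT the real robots.txt of any site.
sample_rules = """
User-agent: *
Disallow: /private/
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(sample_rules)

# The first matching rule decides whether a path may be fetched.
allowed_page = parser.can_fetch("*", "https://example.com/us/pokedex/")
blocked_page = parser.can_fetch("*", "https://example.com/private/data")

print(allowed_page)
print(blocked_page)
```

For a live site, `parser.set_url(...)` followed by `parser.read()` would download the file instead of parsing an inline string.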