Python: extracting data with BeautifulSoup and requests

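I want to loop through Unsplash and download pictures of dogs. However, when I use BeautifulSoup to access the divs, only some iterations show the URL inside the div class. Is there any way to fix this?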

My code is as follows:


import requests
from bs4 import BeautifulSoup as soup
import os

res = requests.get('https://unsplash.com/s/photos/shiba')

doggo_soup = soup(res.text,'html.parser')

containers = doggo_soup.findAll('div',{'class','IEpfq'})

if not os.path.exists('shiba'):
    os.makedirs('shiba')

os.chdir('shiba')

index = 1

for container in containers:
    img_tag = container.img
    source = requests.get(img_tag)
    with open('shiba-'+str(index)+'jpg','wb') as output:
        output.write(source.content)

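When I inspect the div class IEpfq in the developer console, every div with class IEpfq contains the picture's URL.

However, when I run the code, from the 4th picture onward it only shows partial information (no URL) under the same div class (as shown in the output above). Any help would be much appreciated.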
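The developer console shows the page after its JavaScript has run, while `requests` only receives the HTML the server sent and executes no JavaScript. If the page fills in image URLs lazily (an assumption, not confirmed in this thread), the later `<img>` tags may simply have no URL in the raw HTML. A minimal sketch for checking what the script actually received: list every `<img>` with its `src` (or `None`) and its attribute names. `describe_images` and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def describe_images(html):
    """List each <img> in raw HTML as (src or None, sorted attribute names)."""
    soup = BeautifulSoup(html, "html.parser")
    report = []
    for img in soup.find_all("img"):
        # img.get returns None when the attribute is absent; "or None" also maps "" to None
        src = img.get("src") or None
        report.append((src, sorted(img.attrs)))
    return report


if __name__ == "__main__":
    # Hypothetical markup: one image with a URL, one placeholder without one
    sample = (
        '<div class="IEpfq"><img src="https://example.com/a.jpg" alt="a"></div>'
        '<div class="IEpfq"><img alt="b"></div>'
    )
    for src, attrs in describe_images(sample):
        print(src, attrs)
```

Against the live page you would pass `res.text` from the question's `requests.get(...)` call; many `None` entries where the browser shows pictures would be consistent with the URLs being added client-side, in which case they are not in the response to parse.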

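There are a few issues with your code; try this, it works for me. I added an exception handler so the process continues if any image download fails, and your code was not updating the `index` counter on each iteration. It also passes the `src` URL (rather than the tag itself) to `requests.get` and adds the missing dot in `.jpg`: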

import os

import requests
from bs4 import BeautifulSoup as soup

res = requests.get('https://unsplash.com/s/photos/shiba')

doggo_soup = soup(res.text, 'html.parser')

# Every <div> whose class list includes IEpfq
containers = doggo_soup.find_all('div', class_='IEpfq')

os.makedirs('shiba', exist_ok=True)
os.chdir('shiba')

index = 1

for container in containers:
    try:
        img_tag = container.img
        # Pass the URL string from src, not the Tag object, to requests
        source = requests.get(img_tag.get('src'))
        with open('shiba-' + str(index) + '.jpg', 'wb') as output:
            output.write(source.content)
        index += 1  # advance only after a successful save
    except Exception:
        # No <img>, no src, or a failed download: skip this container
        pass
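The fix above comes down to two things: advance `index` only after a file has actually been written, and don't let one failed download stop the loop. A minimal sketch of that loop shape, with the network call injected as a hypothetical `fetch(url)` callable (standing in for `requests.get(url).content`) so it runs offline. Unlike the answer's catch-all, it narrows the `try` to the fetch and reports what it skipped.

```python
import os
import tempfile


def download_all(urls, fetch, directory):
    """Save each image as shiba-<n>.jpg in directory; return the paths written.

    fetch(url) must return the image bytes or raise on failure. It is a
    hypothetical hook so the loop can be exercised without network access.
    """
    os.makedirs(directory, exist_ok=True)
    saved = []
    index = 1
    for url in urls:
        try:
            data = fetch(url)
        except Exception as exc:  # skip this image but keep going
            print(f"skipping {url!r}: {exc}")
            continue
        path = os.path.join(directory, f"shiba-{index}.jpg")
        with open(path, "wb") as output:
            output.write(data)
        saved.append(path)
        index += 1  # only advance after a successful save, so names stay contiguous
    return saved


if __name__ == "__main__":
    def fake_fetch(url):
        # Hypothetical stand-in for requests.get(url).content
        if url is None:
            raise ValueError("container had no image URL")
        return b"fake image bytes"

    with tempfile.TemporaryDirectory() as tmp:
        print(download_all(["a", None, "b"], fake_fetch, tmp))
```

Because the counter only moves on success, a skipped container does not leave a gap in the file numbering.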

Here is a slightly modified version of the code. It downloaded 20 photos for me:

import os

import requests
from bs4 import BeautifulSoup as soup

res = requests.get('https://unsplash.com/s/photos/shiba')

doggo_soup = soup(res.text, 'html.parser')

# div wrapping each photo's "Download photo" link (classes found with Chrome's Inspect tool)
containers = doggo_soup.find_all('div', class_='_2BSIe _3pmDG')

os.makedirs('shiba', exist_ok=True)

index = 1

for container in containers:
    link = container.find('a')
    if link is None or not link.get('href'):
        continue  # no download link in this container
    img_url = link['href']
    # stream=True lets iter_content read the body in chunks
    source = requests.get(img_url, stream=True)
    with open(os.path.join('shiba', str(index) + '.jpg'), 'wb') as image_file:
        for chunk in source.iter_content(1000000):
            image_file.write(chunk)
    index += 1
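The loop above streams each download to disk in 1 MB chunks. A minimal sketch of that step as a reusable function, assuming only that the response object has an `iter_content(chunk_size)` method (as a `requests.Response` does). `save_in_chunks` and `FakeResponse` are hypothetical names; the fake lets the function run without a network.

```python
import os
import tempfile


def save_in_chunks(response, path, chunk_size=1_000_000):
    """Write response.iter_content(chunk_size) to path; return bytes written."""
    written = 0
    with open(path, "wb") as image_file:  # closed even if a write fails
        for chunk in response.iter_content(chunk_size):
            if chunk:  # skip empty chunks
                image_file.write(chunk)
                written += len(chunk)
    return written


class FakeResponse:
    """Hypothetical stand-in for requests.Response, for offline use."""

    def __init__(self, data):
        self._data = data

    def iter_content(self, chunk_size=1):
        for start in range(0, len(self._data), chunk_size):
            yield self._data[start:start + chunk_size]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "1.jpg")
        print(save_in_chunks(FakeResponse(b"not really a jpeg"), target, chunk_size=4))
```

With the real library this would be called as `with requests.get(img_url, stream=True, timeout=30) as response:` followed by `response.raise_for_status()` and `save_in_chunks(response, path)`; the `with` block releases the connection once the body has been read.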

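Can you give an example of the partial information you get?

When I run the script, the output for the fourth container is shown as follows. However, when I inspect the element, it shows the picture's download URL.

Please add it to the question body so that the question contains all the information needed to answer it.

Hi, I tried your code and it works, but it only extracts the first 3 pictures on the page. When I inspect the elements, the URLs of the pictures after the fourth one can be found in the developer console. However, when I run the script, it cannot detect or retrieve those URLs.

Thanks! It works for me too! May I know how you found this class? I tried to find it with Inspect but had no luck.

I used the Inspect tool in Chrome. Use "Pick element" and point at the "Download photo" button on a picture.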
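One detail when reusing class names found with Inspect: `class_='_2BSIe _3pmDG'` matches the attribute's exact string, so a div carrying the same two classes in a different order is missed, while a CSS selector such as `div._2BSIe._3pmDG` matches each class independently. Names like these also look machine-generated and may change between site builds (an assumption). A minimal sketch with made-up markup; the function names and `SAMPLE_HTML` are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same two classes in both orders, plus a single-class div
SAMPLE_HTML = (
    '<div class="_2BSIe _3pmDG"><a href="https://example.com/1">1</a></div>'
    '<div class="_3pmDG _2BSIe"><a href="https://example.com/2">2</a></div>'
    '<div class="_2BSIe"><a href="https://example.com/3">3</a></div>'
)


def links_by_exact_class(html):
    """class_ with a space-separated string compares the whole class value."""
    soup = BeautifulSoup(html, "html.parser")
    return [div.a["href"] for div in soup.find_all("div", class_="_2BSIe _3pmDG")]


def links_by_both_classes(html):
    """A CSS selector requires both classes, in any order."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select("div._2BSIe._3pmDG a[href]")]


if __name__ == "__main__":
    print(links_by_exact_class(SAMPLE_HTML))
    print(links_by_both_classes(SAMPLE_HTML))
```

Neither form matches the third div, which has only one of the two classes.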