Python Beautiful Soup 4 findAll() does not match elements in <img> tags


python python-3.x


I'm trying to use BeautifulSoup4 to help me download images from Imgur, although I doubt the Imgur part is relevant. For example, this is the web page I'm using:

My code is as follows:

import webbrowser, time, sys, requests, os, bs4      # Not all libraries are used in this code snippet
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("https://imgur.com/t/lenovo/mLwnorj")
res = requests.get("https://imgur.com/t/lenovo/mLwnorj")
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, features="html.parser")
imageElement = soup.findAll('img', {'class': 'post-image-placeholder'})
print(imageElement)

The HTML on the Imgur page contains the following section:

<img alt="" src="//i.imgur.com/JfLsH5y.jpg" class="post-image-placeholder" style="max-width: 100%; min-height: 546px;" original-title="">

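Just as a test, searching for a different element did make the print function return a match, which makes me wonder whether the problem has something to do with the <img> tag: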

[<h1 class="post-title">Cable management increases performance. </h1>]


Thanks for your time and effort.

If the website inserts elements after the page has loaded, you need to use Selenium instead of requests.

from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://imgur.com/t/lenovo/mLwnorj'
browser = webdriver.Firefox()
browser.get(url)
soup = BeautifulSoup(browser.page_source, 'html.parser')
images = soup.find_all('img', {'class': 'post-image-placeholder'})

for image in images:
    print(image['src'])

# //i.imgur.com/JfLsH5yr.jpg
# //i.imgur.com/lLcKMBzr.jpg
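The src values printed above are protocol-relative (they start with //), so they need a scheme before they can be downloaded, which was the goal in the question. A minimal sketch using only the standard library, assuming plain HTTPS downloads are acceptable to Imgur; to_absolute and download are hypothetical helper names, and writing requests.get(url).content to a file would work just as well as urllib.request.urlretrieve:

```python
import os
import urllib.request
from urllib.parse import urljoin, urlparse

PAGE_URL = "https://imgur.com/t/lenovo/mLwnorj"


def to_absolute(src, page_url=PAGE_URL):
    """Resolve a src such as //i.imgur.com/x.jpg against the page URL (adds https:)."""
    return urljoin(page_url, src)


def download(url, dest_dir="."):
    """Hypothetical helper: save url into dest_dir, named after the last path segment."""
    filename = os.path.basename(urlparse(url).path)
    path = os.path.join(dest_dir, filename)
    urllib.request.urlretrieve(url, path)
    return path


# src values as printed by the answer above
srcs = ["//i.imgur.com/JfLsH5yr.jpg", "//i.imgur.com/lLcKMBzr.jpg"]
for src in srcs:
    print(to_absolute(src))
# https://i.imgur.com/JfLsH5yr.jpg
# https://i.imgur.com/lLcKMBzr.jpg
```

Calling download(to_absolute(src)) inside that loop would then save each image into the current directory. urljoin is used rather than string concatenation so that src values which are already absolute pass through unchanged.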

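The basic problem here seems to be that the actual <img> elements don't exist when the page first loads. In my opinion, the best solution is to use Selenium WebDriver to grab the images. Selenium lets the page render properly (JavaScript and all) and then locates whatever elements you care about.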

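For example:

I can't say I've tested this code, but the general concept should work.

Update:

I went ahead and tested it, fixed a few bugs in the code, and got the results I was hoping to see: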

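Have you run print(res.text) to verify that the images are actually in the HTML when the page is first requested? It's very common for a website to load the page and then insert elements with JavaScript.

@Spenced Ah, I just ran it and couldn't find any image tags. Thanks for pointing that out! Do you know how I can get the updated HTML? Thanks a lot.

Yes, give me a minute and I'll write an answer along those lines.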
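The check suggested in the comment can be turned into a small helper that runs the same find_all call on any HTML string, so the raw requests response (res.text) and Selenium's rendered browser.page_source can be compared directly. A minimal sketch; count_matching_images is a hypothetical name, and the two sample strings are illustrative stand-ins (the first is the <img> tag quoted in the question, the second mimics a page where that tag has not been inserted yet):

```python
from bs4 import BeautifulSoup


def count_matching_images(html, css_class="post-image-placeholder"):
    """Count <img> tags that carry css_class in an HTML string (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches if css_class is any one of the tag's classes
    return len(soup.find_all("img", class_=css_class))


# The <img> tag from the rendered page matches the selector...
rendered = '<img alt="" src="//i.imgur.com/JfLsH5y.jpg" class="post-image-placeholder">'
# ...while HTML in which JavaScript has not yet inserted it has nothing to find.
initial = '<h1 class="post-title">Cable management increases performance. </h1>'

print(count_matching_images(rendered))  # 1
print(count_matching_images(initial))   # 0
```

A count of 0 for res.text but a nonzero count for browser.page_source points to JavaScript-inserted elements rather than a wrong selector.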


import webbrowser, time, sys, requests, os, bs4      # Not all libraries are used in this code snippet
from selenium import webdriver
from selenium.webdriver.common.by import By

# For pretty debugging output
import pprint


browser = webdriver.Firefox()
browser.get("https://imgur.com/t/lenovo/mLwnorj")

# Give the page up to 10 seconds of a grace period to finish rendering
# before complaining about images not being found.
browser.implicitly_wait(10)

# Find elements via Selenium's search
# (find_elements(By.CSS_SELECTOR, ...) replaces find_elements_by_css_selector, removed in Selenium 4.3)
selenium_image_elements = browser.find_elements(By.CSS_SELECTOR, 'img.post-image-placeholder')
pprint.pprint(selenium_image_elements)

# Use page source to attempt to find them with BeautifulSoup 4.
# The source is read after the lookup above has waited for the images to appear.
soup = bs4.BeautifulSoup(browser.page_source, features="html.parser")

soup_image_elements = soup.find_all('img', {'class': 'post-image-placeholder'})
pprint.pprint(soup_image_elements)