How to return the image with the largest dimensions in Python


python opencv web-scraping computer-vision


I have been able to filter all the image URLs from a page and display them one by one:

import shutil

import cv2
import numpy as np
import requests
from bs4 import BeautifulSoup


article_URL = "https://medium.com/bhavaniravi/build-your-1st-python-web-app-with-flask-b039d11f101c"
response = requests.get(article_URL)
soup = BeautifulSoup(response.text, 'html.parser')
images = soup.find('body').find_all('img')
i = 0
image_url = []
for im in images:
    print(im)
    i+=1
    url = im.get('src')
    image_url.append(url)
    print('Downloading: ', url) 
    try:
        response = requests.get(url, stream=True)
        with open(str(i) + '.jpg', 'wb') as out_file:
            shutil.copyfileobj(response.raw, out_file)
            del response
    except:
        print('Could not download: ', url)

new = [x for x in image_url if x is not None]
for url in new:
    resp = requests.get(url, stream=True).raw
    image = np.asarray(bytearray(resp.read()), dtype="uint8")
    image = cv2.imdecode(image, cv2.IMREAD_COLOR)
#     height, width, channels = image.shape
    height, width, _ = image.shape
    dimension = []
    for items in height, width:
        dimension.append(items)
#     print(height, width)
    print(dimension)

I want to print the image with the largest dimensions from the list of URLs.

This is the output I get so far, which is not what I need:

[72, 72]
[95, 96]
[13, 60]
[227, 973]
[17, 60]
[229, 771]

I see two problems.

You create

dimension = []

inside the loop, so it removes the previous values. You have to create

dimension = []

before the loop, and inside the loop use

dimension.append( (width, height) )

After the loop you can use

max(dimension)

to get the entry with the largest width.

In dimension you keep only the width and height, so you don't know which file has this dimension. You should keep all the information:

dimension.append( (width, height, url, filename) )

My version

I use a list of dictionaries, data, to keep all the information:

data.append({
    'url': url,
    'path': filename,
    'width': width,
    'height': height,
})

Then I use key in max() to get the largest width:

max(data, key=lambda x:x['width'])

In the same way I could use x['height'] or x['width']*x['height'].
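The sketch below shows why the list has to be created before the loop and how key changes which image counts as "largest". It runs offline. It reuses the sizes printed in the question, and the filenames are placeholders standing in for downloaded files, so no network access or OpenCV is needed.

```python
# Sizes copied from the question's printed [height, width] output;
# the filenames are placeholders, not real downloads.
measured = [
    (72, 72, '1.png'),
    (95, 96, '2.png'),
    (13, 60, '3.png'),
    (227, 973, '4.png'),
    (17, 60, '5.png'),
    (229, 771, '6.png'),
]

# Create the list once, BEFORE the loop, so earlier entries survive.
data = []
for height, width, filename in measured:
    data.append({'path': filename, 'width': width, 'height': height})

# Tuples compare element by element: width first, then height, then path.
dimension = [(d['width'], d['height'], d['path']) for d in data]
print(max(dimension))  # (973, 227, '4.png')

# With dictionaries, `key` chooses which measure counts as "largest".
widest = max(data, key=lambda x: x['width'])
tallest = max(data, key=lambda x: x['height'])
largest_area = max(data, key=lambda x: x['width'] * x['height'])
print(widest['path'], tallest['path'], largest_area['path'])  # 4.png 6.png 4.png
```

In this data the widest image and the tallest image are different files. You therefore need to decide up front whether "largest" means width, height, or area.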

import requests
from bs4 import BeautifulSoup
import shutil
import cv2

article_URL = "https://medium.com/bhavaniravi/build-your-1st-python-web-app-with-flask-b039d11f101c"

response = requests.get(article_URL)
soup = BeautifulSoup(response.text, 'html.parser')
images = soup.find('body').find_all('img')

# --- loop --- 

data = []
i = 0

for img in images:
    print('HTML:', img)
    
    url = img.get('src')

    if url:  # skip `url` with `None`
        print('Downloading:', url) 
        try:
            response = requests.get(url, stream=True)

            i += 1
            url = url.rsplit('?', 1)[0]  # remove ?opt=20 after filename
            ext = url.rsplit('.', 1)[-1] # .png, .jpg, .jpeg
            filename = f'{i}.{ext}' 
            print('Filename:', filename)

            with open(filename, 'wb') as out_file:
                shutil.copyfileobj(response.raw, out_file)

            image = cv2.imread(filename)
            height, width = image.shape[:2]

            data.append({
                'url': url,
                'path': filename,
                'width': width,
                'height': height,
            })

        except Exception as ex:
            print('Could not download: ', url)
            print('Exception:', ex)

    print('---')

# --- after loop ---

print('max:', max(data, key=lambda x:x['width']))

all_sorted = sorted(data, key=lambda x:x['width'], reverse=True)

print('Top 3:', all_sorted[:3])
# or
for item in all_sorted[:3]:
    print(item['width'], item['url'])
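The answer builds the filename with url.rsplit('?', 1)[0] and url.rsplit('.', 1)[-1]. That works for URLs that end in an extension. For a URL without one, such as https://example.com/avatar, the "extension" becomes com/avatar, and open() will normally fail because the filename then contains a directory part. Below is a more defensive sketch that uses only urllib.parse.urlsplit and os.path.splitext from the standard library. The helper name local_filename and the jpg fallback are my own assumptions and are not part of the answer.

```python
import os
from urllib.parse import urlsplit


def local_filename(url, index, default_ext='jpg'):
    """Build '<index>.<ext>' for an image URL (hypothetical helper).

    urlsplit() separates the query string (e.g. '?q=20') from the path,
    and os.path.splitext() takes the extension from the last path segment
    only. If there is no extension, default_ext is used (an assumption).
    """
    path = urlsplit(url).path
    ext = os.path.splitext(path)[1].lstrip('.').lower()
    return f'{index}.{ext or default_ext}'


print(local_filename('https://example.com/max/60/photo.png?q=20', 1))  # 1.png
print(local_filename('https://example.com/avatar', 2))                  # 2.jpg
```

Inside the answer's loop, filename = local_filename(url, i) could replace the lines that compute ext and filename.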


By the way: to get only the img tags that have a src attribute, use

 .find_all('img', {'src': True})
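The {'src': True} filter skips img tags without a src attribute, for example lazy-loaded images that only carry data-src. The sketch below illustrates the same idea using the standard library's html.parser.HTMLParser. It is a swapped-in, dependency-free illustration and not how BeautifulSoup implements find_all. This sketch also skips src attributes with an empty value, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src of every <img> tag that actually has one."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        src = dict(attrs).get('src')
        if src:  # skip <img> without src (or with an empty one)
            self.sources.append(src)


page = '''
<body>
  <img src="https://example.com/a.png?q=20">
  <img data-src="lazy.png">
  <p>text</p>
  <img src="https://example.com/b.jpg">
</body>
'''
parser = ImgSrcCollector()
parser.feed(page)
print(parser.sources)  # ['https://example.com/a.png?q=20', 'https://example.com/b.jpg']
```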


After the line that builds new (the list of non-None URLs), make the following changes in your code:
images = []
for url in new:
    resp = requests.get(url, stream=True).raw
    image = np.asarray(bytearray(resp.read()), dtype="uint8")
    image = cv2.imdecode(image, cv2.IMREAD_COLOR)
    if image is None:  # cv2.imdecode() returns None for data it cannot decode
        continue
    images.append((image.shape, image))
# sort images by area (largest to smallest)
images.sort(key=lambda x: x[0][0] * x[0][1], reverse=True)

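The largest image is now at index 0 and can be accessed with images[0][1]; its shape can be printed with images[0][0]. You can also change the lambda function to x[0][0] (sort by height) or x[0][1] (sort by width).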
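The sketch below shows what the three keys select. It uses plain tuples in place of real (image.shape, image) pairs. A colour image decoded by OpenCV has shape (height, width, channels), and the second item here is just a placeholder label instead of a pixel array.

```python
# Stand-ins for (image.shape, image) pairs: shape is (height, width, channels);
# the labels replace the NumPy pixel arrays.
images = [
    ((72, 72, 3), 'img-a'),
    ((227, 973, 3), 'img-b'),
    ((229, 771, 3), 'img-c'),
    ((13, 60, 3), 'img-d'),
]

by_area = sorted(images, key=lambda x: x[0][0] * x[0][1], reverse=True)
by_height = sorted(images, key=lambda x: x[0][0], reverse=True)
by_width = sorted(images, key=lambda x: x[0][1], reverse=True)

print(by_area[0][1], by_area[0][0])     # img-b (227, 973, 3)
print(by_height[0][1], by_width[0][1])  # img-c img-b
```

Passing key also matters with real images. A plain images.sort() without key would compare the NumPy arrays whenever two shapes are equal, and that comparison raises ValueError.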
Use max() to get the largest width, height, or width*height. I don't understand what you are doing with for items in height, width: dimension.append(items). Why not simply dimension.append((height, width)) or dimension.append((height*width, height, width)), and then max(dimension) after the for-loop? You have to create dimension = [] before the for-loop.

I don't understand why you download the same images again. If you have already downloaded them and saved them on disk, then reading them from disk with cv2.imread() should be faster.

I'm not downloading the images, I'm just analyzing them directly. I tried max before, but the result was a single array.

But first you use response = requests.get(url, stream=True) and open(str(i) + '.jpg', 'wb') to download the files, and later you use resp = requests.get(url, stream=True).raw to get the same images from the server again, even though you already have them on disk.

Nice one. Could you help me print out the actual image instead of the pixel values?

I'm not quite sure what you mean by printing the image. Do you want to display it? For that you can use plt.imshow(image) after importing matplotlib, or cv2.imshow('image', image) followed by cv2.waitKey(0). If you want to write it to disk, you can use cv2.imwrite().

Besides returning the largest URL, can we also return the top 3 URLs?

You could use sorted(data, key=lambda x: x['width'], reverse=True) instead of max(), and then use [:3] to get the top 3 URLs.
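The sketch below implements the top-3 suggestion, ranking by area instead of width. It also shows heapq.nlargest from the standard library as a swapped-in alternative that returns the same result without sorting the whole list. The records mimic the answer's data list, but the URLs are made up.

```python
import heapq

# Placeholder records shaped like the answer's `data` list (made-up URLs).
data = [
    {'url': 'https://example.com/1.png', 'path': '1.png', 'width': 72, 'height': 72},
    {'url': 'https://example.com/2.png', 'path': '2.png', 'width': 96, 'height': 95},
    {'url': 'https://example.com/3.png', 'path': '3.png', 'width': 60, 'height': 13},
    {'url': 'https://example.com/4.png', 'path': '4.png', 'width': 973, 'height': 227},
    {'url': 'https://example.com/5.png', 'path': '5.png', 'width': 60, 'height': 17},
    {'url': 'https://example.com/6.png', 'path': '6.png', 'width': 771, 'height': 229},
]


def area(item):
    """Ranking measure: width * height in pixels."""
    return item['width'] * item['height']


# What the comment suggests: sort everything, then slice.
top3_sorted = sorted(data, key=area, reverse=True)[:3]

# Standard-library alternative that avoids a full sort.
top3_heap = heapq.nlargest(3, data, key=area)

for item in top3_heap:
    print(item['width'], item['height'], item['url'])
```

To rank by width as in the comment, pass key=lambda x: x['width'] instead of area.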