Python – Need help storing <img> src's in CSV, downloading images from CSV list

Tags: python, pandas, csv, python-requests, urllib

I need help. This code currently gets all the src attributes from all the sites. The writer block below writes all the URLs to a CSV (with a comma after each one), but the download block only saves the image from the first URL; I intend for the code to download all images from all URLs. My goal is to store all <img> src URLs in a CSV, then download the images from the URLs in that CSV. Thanks very much.

from bs4 import BeautifulSoup
from time import sleep
import urllib.request
import pandas as pd
import requests
import urllib
import base64
import csv
import time





# Get site (note: headers is defined but never used here; the page comes from Selenium below)
headers = {
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Methods': 'GET',
    'Access-Control-Allow-Headers': 'Content-Type',
    'Access-Control-Max-Age': '3600',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'
    }
page = driver.page_source  # driver: a Selenium WebDriver instance created elsewhere
soup = BeautifulSoup(page, 'html.parser')
# Gets srcs from all <img> from site 
srcs = [img['src'] for img in soup.findAll('img')]




# BELOW code = Writer writes all urls WITH comma after them

print ('Downloading URLs to file')
sleep(1)
with open('output.csv', 'w', newline='\n', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(srcs)



# Below is the code that only downloads the image from the first url. I intend for the code to download all images from all urls

print ('Downloading images to folder')
sleep(1)

filename = "output"

with open("{0}.csv".format(filename), 'r') as csvfile:
    # iterate on all lines
    i = 0
    for line in csvfile:
        splitted_line = line.split(',')
        # check if we have an image URL
        if splitted_line[1] != '' and splitted_line[1] != "\n":
            urllib.request.urlretrieve(splitted_line[1], "img_" + str(i) + ".png")
            print ("Image saved for {0}".format(splitted_line[0]))
            i += 1
        else:
            print ("No result for {0}".format(splitted_line[0]))
Here is a CSV-less solution:

import os
import requests
import urllib.request
from bs4 import BeautifulSoup

page = requests.get('https://igromania.ru').text
soup = BeautifulSoup(page, 'html.parser')
tags = soup.findAll('img')

for tag in tags:
    url = tag['src']
    try:
        urllib.request.urlretrieve(url, os.path.basename(url))
        print(f'Image downloaded: {url}')
    except ValueError:
        print(f'Error downloading: {url}')

Sample output:

Error downloading: //cdn.igromania.ru/-Engine-/SiteTemplates/igromania/images/logo_mania.png
Image downloaded: https://cdn.igromania.ru/mnt/mainpage_promo/b/8/b/2904/preview/3d0a4043f5dfd3e9443ce0b27d2a8329_400x225.jpg
Image downloaded: https://cdn.igromania.ru/mnt/mainpage_promo/7/c/7/3124/preview/8df8f4505157e4928187b5450c03e82b_400x225.jpg
Image downloaded: https://cdn.igromania.ru/mnt/mainpage_promo/c/6/8/2912/preview/4a70f416181b77f6b543053ea8e5d300_400x225.jpg
Image downloaded: https://cdn.igromania.ru/mnt/mainpage_promo/2/e/0/3123/preview/0eb2f280f1b9e089d5a12bc0df1120bc_400x225.jpg
Image downloaded: https://cdn.igromania.ru/mnt/mainpage_promo/c/9/2/3130/preview/29e962c5444f67fa95b3714c7ae7683f_400x225.jpg
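
The one failure above is a scheme-relative URL (//cdn.igromania.ru/...): urlretrieve raises ValueError because the URL has no scheme. A minimal sketch of a fix, assuming the page URL is available to resolve such srcs against with urllib.parse.urljoin:

import os
import urllib.request
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

base = 'https://igromania.ru'
page = requests.get(base).text
soup = BeautifulSoup(page, 'html.parser')

for tag in soup.findAll('img'):
    src = tag.get('src')
    if not src:
        continue  # skip <img> tags without a src attribute
    # urljoin resolves scheme-relative ('//cdn...') and relative srcs
    # against the page URL, so urlretrieve always gets an absolute URL
    url = urljoin(base, src)
    # take the filename from the path only, dropping any query string
    name = os.path.basename(urlparse(url).path)
    try:
        urllib.request.urlretrieve(url, name)
        print(f'Image downloaded: {url}')
    except (ValueError, OSError) as e:
        print(f'Error downloading {url}: {e}')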

Here is another solution that keeps the CSV. The resulting output.csv is shown after the code:
from bs4 import BeautifulSoup
from time import sleep
import urllib.request
import pandas as pd
import requests
import urllib
import base64
import csv
import time


# Get site
headers = {
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Methods': 'GET',
    'Access-Control-Allow-Headers': 'Content-Type',
    'Access-Control-Max-Age': '3600',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'
    }
    
#page = driver.page_source
page = "https://unsplash.com/"
r = requests.get(page, headers=headers)  # pass the headers (User-Agent) defined above
soup = BeautifulSoup(r.text, "html.parser")
# Gets srcs from all <img> from site 
srcs = [img['src'] for img in soup.findAll('img')]

# BELOW code = Writer writes all urls WITH comma after them

print ('Downloading URLs to file')
sleep(1)
with open('output.csv', 'w', newline='\n', encoding='utf-8') as csvfile:
#    writer = csv.writer(csvfile)  # replaced: writerow(srcs) put every URL on one row
    for i, s in enumerate(srcs):  # each image number and URL
        csvfile.write(str(i) + ',' + s + '\n')  # one "index,URL" line per image

# Below is the code that only downloads the image from the first url. I intend for the code to download all images from all urls

print ('Downloading images to folder')
sleep(1)

filename = "output"

with open("{0}.csv".format(filename), 'r') as csvfile:
    # iterate on all lines
    i = 0
    for line in csvfile:
        splitted_line = line.split(',')
        url = splitted_line[1].strip()  # drop the trailing newline before downloading
        # check if we have an image URL
        if url != '':
            urllib.request.urlretrieve(url, "img_" + str(i) + ".png")
            print ("Image saved for {0}".format(splitted_line[0]))
            i += 1
        else:
            print ("No result for {0}".format(splitted_line[0]))
0,https://sb.scorecardresearch.com/p?c1=2&c2=32343279&cv=2.0&cj=1
1,https://images.unsplash.com/photo-1597523565663-916cf059f524?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format%2Ccompress&fit=crop&w=1000&h=1000
2,https://images.unsplash.com/profile-1574526450714-e5d331168827image?auto=format&fit=crop&w=32&h=32&q=60&crop=faces&bg=fff
3,https://images.unsplash.com/photo-1599687350404-88b32c067289?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
4,https://images.unsplash.com/profile-1583427783052-3da8ceab5579image?auto=format&fit=crop&w=32&h=32&q=60&crop=faces&bg=fff
5,https://images.unsplash.com/photo-1600181957705-92f267a2740e?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
6,https://images.unsplash.com/profile-1545567671893-842f479b15e2?auto=format&fit=crop&w=32&h=32&q=60&crop=faces&bg=fff
7,https://images.unsplash.com/photo-1600187723541-04457a98cc47?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
8,https://images.unsplash.com/photo-1599687350404-88b32c067289?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
9,https://images.unsplash.com/photo-1600181957705-92f267a2740e?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
10,https://images.unsplash.com/photo-1600187723541-04457a98cc47?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
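
If you would rather keep the csv module in the loop instead of writing lines by hand, here is a sketch of the same round trip with csv.writer and csv.reader (same output.csv layout, index in column 0 and URL in column 1; srcs is the list scraped above):

import csv
import urllib.request

# srcs comes from the BeautifulSoup scrape above
# write: one row per image, [index, url]
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    for i, s in enumerate(srcs):
        writer.writerow([i, s])

# read back and download: csv.reader does the splitting and
# strips the line endings, so no manual split(',') is needed
with open('output.csv', 'r', newline='', encoding='utf-8') as f:
    for index, url in csv.reader(f):
        if url:
            urllib.request.urlretrieve(url, 'img_' + index + '.png')
            print('Image saved for ' + index)
        else:
            print('No result for ' + index)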