Python – Need help storing <img> src's in CSV, downloading images from CSV list

Tags: python, pandas, csv, python-requests, urllib

I need help. This code currently gets all the src attributes from all the sites. The writer block below writes all the URLs to a CSV (with a comma after each one), but the download block only saves the image from the first URL; I intend for the code to download all images from all URLs. My goal is to store all <img> src URLs in a CSV, then download the images from the URLs in that CSV. Thanks very much.

from bs4 import BeautifulSoup
from time import sleep
import urllib.request
import pandas as pd
import requests
import urllib
import base64
import csv
import time





# Get site (note: headers is defined but never used here; the page comes from Selenium below)
headers = {
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Methods': 'GET',
    'Access-Control-Allow-Headers': 'Content-Type',
    'Access-Control-Max-Age': '3600',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'
    }
page = driver.page_source  # driver: a Selenium WebDriver instance created elsewhere
soup = BeautifulSoup(page, 'html.parser')
# Gets srcs from all <img> from site 
srcs = [img['src'] for img in soup.findAll('img')]




# BELOW code = Writer writes all urls WITH comma after them

print ('Downloading URLs to file')
sleep(1)
with open('output.csv', 'w', newline='\n', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(srcs)



# Below is the code that only downloads the image from the first url. I intend for the code to download all images from all urls

print ('Downloading images to folder')
sleep(1)

filename = "output"

with open("{0}.csv".format(filename), 'r') as csvfile:
    # iterate on all lines
    i = 0
    for line in csvfile:
        splitted_line = line.split(',')
        # check if we have an image URL
        if splitted_line[1] != '' and splitted_line[1] != "\n":
            urllib.request.urlretrieve(splitted_line[1], "img_" + str(i) + ".png")
            print ("Image saved for {0}".format(splitted_line[0]))
            i += 1
        else:
            print ("No result for {0}".format(splitted_line[0]))
Here is a CSV-less solution:

import os
import requests
import urllib.request
from bs4 import BeautifulSoup

page = requests.get('https://igromania.ru').text
soup = BeautifulSoup(page, 'html.parser')
tags = soup.findAll('img')

for tag in tags:
    url = tag['src']
    try:
        urllib.request.urlretrieve(url, os.path.basename(url))
        print(f'Image downloaded: {url}')
    except ValueError:
        print(f'Error downloading: {url}')

Sample output:

Error downloading: //cdn.igromania.ru/-Engine-/SiteTemplates/igromania/images/logo_mania.png
Image downloaded: https://cdn.igromania.ru/mnt/mainpage_promo/b/8/b/2904/preview/3d0a4043f5dfd3e9443ce0b27d2a8329_400x225.jpg
Image downloaded: https://cdn.igromania.ru/mnt/mainpage_promo/7/c/7/3124/preview/8df8f4505157e4928187b5450c03e82b_400x225.jpg
Image downloaded: https://cdn.igromania.ru/mnt/mainpage_promo/c/6/8/2912/preview/4a70f416181b77f6b543053ea8e5d300_400x225.jpg
Image downloaded: https://cdn.igromania.ru/mnt/mainpage_promo/2/e/0/3123/preview/0eb2f280f1b9e089d5a12bc0df1120bc_400x225.jpg
Image downloaded: https://cdn.igromania.ru/mnt/mainpage_promo/c/9/2/3130/preview/29e962c5444f67fa95b3714c7ae7683f_400x225.jpg
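
The one failure above is a scheme-relative URL (//cdn.igromania.ru/...): urlretrieve raises ValueError because the URL has no scheme. A minimal sketch of a fix, assuming the page URL is available to resolve such srcs against with urllib.parse.urljoin:

import os
import urllib.request
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

base = 'https://igromania.ru'
page = requests.get(base).text
soup = BeautifulSoup(page, 'html.parser')

for tag in soup.findAll('img'):
    src = tag.get('src')
    if not src:
        continue  # skip <img> tags without a src attribute
    # urljoin resolves scheme-relative ('//cdn...') and relative srcs
    # against the page URL, so urlretrieve always gets an absolute URL
    url = urljoin(base, src)
    # take the filename from the path only, dropping any query string
    name = os.path.basename(urlparse(url).path)
    try:
        urllib.request.urlretrieve(url, name)
        print(f'Image downloaded: {url}')
    except (ValueError, OSError) as e:
        print(f'Error downloading {url}: {e}')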

Here is another solution that keeps the CSV. The resulting output.csv is shown after the code:
from bs4 import BeautifulSoup
from time import sleep
import urllib.request
import pandas as pd
import requests
import urllib
import base64
import csv
import time


# Get site
headers = {
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Methods': 'GET',
    'Access-Control-Allow-Headers': 'Content-Type',
    'Access-Control-Max-Age': '3600',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'
    }
    
#page = driver.page_source
page = "https://unsplash.com/"
r = requests.get(page, headers=headers)  # pass the headers (User-Agent) defined above
soup = BeautifulSoup(r.text, "html.parser")
# Gets srcs from all <img> from site 
srcs = [img['src'] for img in soup.findAll('img')]

# BELOW code = Writer writes all urls WITH comma after them

print ('Downloading URLs to file')
sleep(1)
with open('output.csv', 'w', newline='\n', encoding='utf-8') as csvfile:
#    writer = csv.writer(csvfile)  # replaced: writerow(srcs) put every URL on one row
    for i, s in enumerate(srcs):  # each image number and URL
        csvfile.write(str(i) + ',' + s + '\n')  # one "index,URL" line per image

# Below is the code that only downloads the image from the first url. I intend for the code to download all images from all urls

print ('Downloading images to folder')
sleep(1)

filename = "output"

with open("{0}.csv".format(filename), 'r') as csvfile:
    # iterate on all lines
    i = 0
    for line in csvfile:
        splitted_line = line.split(',')
        url = splitted_line[1].strip()  # drop the trailing newline before downloading
        # check if we have an image URL
        if url != '':
            urllib.request.urlretrieve(url, "img_" + str(i) + ".png")
            print ("Image saved for {0}".format(splitted_line[0]))
            i += 1
        else:
            print ("No result for {0}".format(splitted_line[0]))
0,https://sb.scorecardresearch.com/p?c1=2&c2=32343279&cv=2.0&cj=1
1,https://images.unsplash.com/photo-1597523565663-916cf059f524?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format%2Ccompress&fit=crop&w=1000&h=1000
2,https://images.unsplash.com/profile-1574526450714-e5d331168827image?auto=format&fit=crop&w=32&h=32&q=60&crop=faces&bg=fff
3,https://images.unsplash.com/photo-1599687350404-88b32c067289?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
4,https://images.unsplash.com/profile-1583427783052-3da8ceab5579image?auto=format&fit=crop&w=32&h=32&q=60&crop=faces&bg=fff
5,https://images.unsplash.com/photo-1600181957705-92f267a2740e?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
6,https://images.unsplash.com/profile-1545567671893-842f479b15e2?auto=format&fit=crop&w=32&h=32&q=60&crop=faces&bg=fff
7,https://images.unsplash.com/photo-1600187723541-04457a98cc47?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
8,https://images.unsplash.com/photo-1599687350404-88b32c067289?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
9,https://images.unsplash.com/photo-1600181957705-92f267a2740e?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
10,https://images.unsplash.com/photo-1600187723541-04457a98cc47?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
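
If you would rather keep the csv module in the loop instead of writing lines by hand, here is a sketch of the same round trip with csv.writer and csv.reader (same output.csv layout, index in column 0 and URL in column 1; srcs is the list scraped above):

import csv
import urllib.request

# srcs comes from the BeautifulSoup scrape above
# write: one row per image, [index, url]
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    for i, s in enumerate(srcs):
        writer.writerow([i, s])

# read back and download: csv.reader does the splitting and
# strips the line endings, so no manual split(',') is needed
with open('output.csv', 'r', newline='', encoding='utf-8') as f:
    for index, url in csv.reader(f):
        if url:
            urllib.request.urlretrieve(url, 'img_' + index + '.png')
            print('Image saved for ' + index)
        else:
            print('No result for ' + index)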