Python: downloading photos from URLs in a CSV file; urllib.error.HTTPError: HTTP Error 403: Forbidden


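My script below is supposed to download a set of images from a list of URLs, but it keeps running into the following error:

HTTP Error 403: Forbidden

The error is raised from:

raise HTTPError(req.full_url, code, msg, hdrs, fp) urllib.error.HTTPError: HTTP Error 403: Forbidden

I'm not sure what to do about it. You can run the script yourself; I have provided everything below.

Any help would be greatly appreciated (:

The goal is to download a set of images from the list of URLs in the CSV without hitting error 403.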

from bs4 import BeautifulSoup
from time import sleep
import urllib.request
import requests
import praw
import csv

r = praw.Reddit(client_id=client_id,
                client_secret=client_secret, 
                user_agent=user_agent,
                username=username,
                password=password)

subred = r.subreddit("partyparrot")
top = subred.top(limit = 780)
type(top)
x = next(top)
dir(x)

with open("output_reddit.csv", 'r') as csvfile:

    headers = {
                'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
                'Accept-Encoding': 'none',
                'Accept-Language': 'en-US,en;q=0.8',
                'Connection': 'keep-alive',
                'Access-Control-Allow-Origin': '*',
                'Access-Control-Allow-Methods': 'GET',
                'Access-Control-Allow-Headers': 'Content-Type',
                'Access-Control-Max-Age': '3600',
                'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'
                }

    for line in csvfile:
        splitted_line = line.split('||')
        if splitted_line[2] != '' and splitted_line[2] != "\n" and ".png" in splitted_line[2]:
            urllib.request.urlretrieve(splitted_line[2], filename=("img_" + splitted_line[0] + ".png")) 
            print ("Image saved for {0}".format(splitted_line[0]))

        elif splitted_line[2] != '' and splitted_line[2] != "\n" and ".jpg" in splitted_line[2]:
            urllib.request.urlretrieve(splitted_line[2], filename=("img_" + splitted_line[0] + ".jpg")) 
            print ("Image saved for {0}".format(splitted_line[0]))

        elif splitted_line[2] != '' and splitted_line[2] != "\n" and "v.redd.it" in splitted_line[2]:

            urllib.request.urlretrieve(splitted_line[2].rstrip() + "/DASH_720.mp4", filename=("img_" + splitted_line[0] + ".mp4")) 
            print ("Image saved for {0}".format(splitted_line[0]))

        else:
            print ("No result for {0}".format(splitted_line[0]))

Here is the output_reddit.csv file for reference:

2||I tried the no pet challenge... she wasn't having it||https://v.redd.it/da60x1qizgs51
3||My trip to the salon went horribly wrong.||https://v.redd.it/tfzc1vye6ds51
4||A few sketches of my macaw buddy from work. Haven't seen this silly girl in six months due to quarantine, I miss her.||https://i.redd.it/jjkb3b5ntis51.jpg
5||Thermals of the party girl!||https://i.imgur.com/rfGChUQ.jpg
6||I present you with Lorena. After rescue, I found out shes an old bird and mostly blind. Once allowed out of her cage to roam free and was given plenty of wonderful fruits and veggies, she became very warm and cuddly. Shes a very sweet regal lady and definitely a queen.||https://v.redd.it/saojuaycnds51
7||A day in the life of the OG Party Parrot. Credit: Ranger Sarah Little.||https://i.redd.it/wjwvl3u01js51.jpg
8||Party game||https://v.redd.it/8myoampepgs51
9||Here I present to you the Christmas loving partyparrot named Felix. He loved sitting in the tree but never chewed on it. Now rest in peace, little friend we will allways love and remember you.||https://i.redd.it/wengcned7is51.jpg
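Each row above has the shape id||title||url, and the script picks a file extension from the URL. A minimal sketch of that parsing step, kept separate from the download so it can be checked without network access, might look like the following. The helper names `parse_row` and `target_for` are hypothetical and not part of the original script, and unlike the original this version looks at the URL path suffix rather than a substring match.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def parse_row(line: str) -> Optional[Tuple[str, str, str]]:
    """Split one 'id||title||url' line; return None if it is malformed."""
    parts = line.rstrip("\n").split("||")
    if len(parts) < 3:
        return None
    post_id = parts[0].strip()
    url = parts[-1].strip()
    # Re-join the middle in case the title itself contains '||'.
    title = "||".join(parts[1:-1]).strip()
    if not post_id or not url:
        return None
    return post_id, title, url


def target_for(url: str) -> Optional[Tuple[str, str]]:
    """Return (download_url, extension), or None for unsupported URLs."""
    parsed = urlparse(url)
    path = parsed.path.lower()
    if path.endswith(".png"):
        return url, ".png"
    if path.endswith(".jpg"):
        return url, ".jpg"
    if "v.redd.it" in parsed.netloc.lower():
        # Same suffix the original script appends; whether this file
        # exists for every video is an assumption.
        return url.rstrip("/") + "/DASH_720.mp4", ".mp4"
    return None
```

Keeping the URL decision in one function also makes it easy to print the exact URL that is about to be requested.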

Here is the full log for reference:

Matts-MacBook-Pro-5:Download matt$ python run.py
Image saved for 2
Image saved for 3
Image saved for 4
Image saved for 5
Traceback (most recent call last):
  File "run.py", line 107, in <module>
    urllib.request.urlretrieve(splitted_line[2].rstrip() + "/DASH_720.mp4", filename=("img_" + splitted_line[0] + ".mp4")) 
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
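The log shows four successful downloads before the failure, and the traceback points at the v.redd.it branch that appends "/DASH_720.mp4". One way to find out exactly which URL is rejected, without stopping the whole run, is to catch `urllib.error.HTTPError` around each download. This is only a sketch: `try_download` is a hypothetical helper, and the `retrieve` parameter exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def try_download(url, filename, retrieve=urllib.request.urlretrieve):
    """Download url to filename.

    Returns True on success and False if the server answers with an
    HTTP error status such as 403, after printing the failing URL.
    """
    try:
        retrieve(url, filename=filename)
    except urllib.error.HTTPError as err:
        # err.code is the numeric status, err.reason the status text.
        print("HTTP {0} for {1}: {2}".format(err.code, url, err.reason))
        return False
    return True
```

In the loop, each `urllib.request.urlretrieve(...)` call could be swapped for `try_download(...)`, so a single forbidden URL is reported and the remaining rows are still processed.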

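Can you provide one URL from the list? That would make it easier to test.

@Sushil I added "output_reddit.csv" above; the URL is referenced as splitted_line[2] in the script.

I tried it with one image and it worked. Please see my edit.

Did you try checking which URL is actually being accessed? Do you understand what the error message means? Can you pick a URL that doesn't work and try it in a web browser?

@KarlKnechtel Yes, the URLs are above, so you can check them as well. Each one links to the actual source of the image or video. Error 403 means the requested URL is forbidden, which is mostly a client-side issue, but I'm not sure how to fix it. It may be a header problem.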
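On the header theory: the script in the question defines a `headers` dict but never passes it anywhere, and `urllib.request.urlretrieve` has no parameter for request headers. The `Access-Control-*` entries are also response headers and have no effect when sent by a client. A minimal sketch of sending a custom User-Agent with `urllib.request.Request` follows; `build_request` and `download` are hypothetical names. Whether this removes the 403 is an assumption, since the log shows earlier downloads succeeding without it, so the failing URL itself may be the cause.

```python
import shutil
import urllib.request

# Same User-Agent string as in the question's headers dict.
USER_AGENT = ("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) "
              "Gecko/20100101 Firefox/52.0")


def build_request(url, user_agent=USER_AGENT):
    """Build a Request object that carries an explicit User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, filename, timeout=30):
    """Fetch url with the custom header and stream the body to filename."""
    request = build_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(filename, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
```

A call such as `download(splitted_line[2].rstrip(), "img_" + splitted_line[0] + ".jpg")` would then take the place of the matching `urlretrieve` call.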