Python: Upload an image available at a public URL to S3 using boto
I work in a Python web environment and I can upload a file from the filesystem to S3 using boto's key.set_contents_from_filename(path/to/file). However, I'd like to upload an image that is already on the web. Should I download the image to the filesystem, upload it to S3 with boto as usual, and then delete the image? Ideally there would be a way to use boto's key.set_contents_from_file, or some other command, that accepts a URL and streams the image to S3 without explicitly downloading a copy of the file to my server.
Tags: python, django, amazon-s3, boto
import boto
from boto.s3.key import Key

def upload(url):
    try:
        conn = boto.connect_s3(settings.AWS_ACCESS_KEY_ID, settings.AWS_SECRET_ACCESS_KEY)
        bucket_name = settings.AWS_STORAGE_BUCKET_NAME
        bucket = conn.get_bucket(bucket_name)
        k = Key(bucket)
        k.key = "test"
        k.set_contents_from_file(url)  # fails: url is a string, not a file object
        k.make_public()
        return "Success?"
    except Exception as e:
        return e
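As written above, using set_contents_from_file, I get a "string object has no attribute 'tell'" error. Using set_contents_from_filename with the URL, I get a "No such file or directory" error. The documentation I found covers uploading local files but does not mention uploading files stored remotely.
Unfortunately, there really is no way to do this, at least not at the moment. We could add a method to boto, say set_contents_from_url, but that method would still have to download the file to the local machine and then upload it. It might still be a convenient method, but it wouldn't save you anything.
To do what you really want, we would need some capability in the S3 service itself that lets us pass it a URL and have it fetch and store the content in a bucket for us. That sounds like a pretty useful feature. You might want to post it to the S3 forums.
OK, from @garnaat, it sounds like S3 does not currently allow uploads by URL. I managed to upload remote images to S3 by reading them into memory only. This works: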
import urllib2
import StringIO

import boto
from boto.s3.key import Key
from django.conf import settings  # assumed: the question is tagged django

def upload(url):
    try:
        conn = boto.connect_s3(settings.AWS_ACCESS_KEY_ID, settings.AWS_SECRET_ACCESS_KEY)
        bucket_name = settings.AWS_STORAGE_BUCKET_NAME
        bucket = conn.get_bucket(bucket_name)
        k = Key(bucket)
        k.key = url.split('/')[::-1][0]  # In my situation, ids at the end are unique
        file_object = urllib2.urlopen(url)  # 'Like' a file object
        fp = StringIO.StringIO(file_object.read())  # Wrap object
        k.set_contents_from_file(fp)
        return "Success"
    except Exception as e:
        return e
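The snippet above relies on the Python 2 modules urllib2 and StringIO. As a rough Python 3 sketch of the same in-memory idea, the helpers below read a URL into a seekable buffer. The names `key_from_url` and `fetch_to_buffer` are hypothetical and not part of boto, and the `opener` parameter is an assumption added only so the helper can be tried without network access. The two `hasattr` checks illustrate why passing the URL string itself fails.

```python
import io
import urllib.request


def key_from_url(url):
    """Use the last path segment of the URL as the S3 key name."""
    return url.split('/')[-1]


def fetch_to_buffer(url, opener=urllib.request.urlopen):
    """Read the whole response body into memory and wrap it in a seekable buffer."""
    with opener(url) as response:
        return io.BytesIO(response.read())


# A plain string has no tell()/seek(); a BytesIO buffer does.
print(hasattr("http://example.com/a.png", "tell"))  # False
print(hasattr(io.BytesIO(b""), "tell"))             # True

# Usage with boto (assumes `k` is a boto Key, as in the answer above):
#     k.key = key_from_url(url)
#     k.set_contents_from_file(fetch_to_buffer(url))
```

The whole image is held in memory, so this suits images of modest size rather than very large files.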
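Here is a 2017-relevant answer to this question, using the official 'boto3' package (instead of the old 'boto' package from the original answer). Python 3.5. If you are on a clean Python install, install both packages with pip first:
pip install boto3
pip install requests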
import boto3
import requests
# Uses the creds in ~/.aws/credentials
s3 = boto3.resource('s3')
bucket_name_to_upload_image_to = 'photos'
s3_image_filename = 'test_s3_image.png'
internet_image_url = 'https://docs.python.org/3.7/_static/py.png'
# Do this as a quick and easy check to make sure your S3 access is OK
good_to_go = False  # must be initialised, otherwise the check below raises NameError
for bucket in s3.buckets.all():
    if bucket.name == bucket_name_to_upload_image_to:
        print('Good to go. Found the bucket to upload the image into.')
        good_to_go = True
if not good_to_go:
    print('Not seeing your s3 bucket, might want to double check permissions in IAM')
# Given an Internet-accessible URL, download the image and upload it to S3,
# without needing to persist the image to disk locally
req_for_image = requests.get(internet_image_url, stream=True)
file_object_from_req = req_for_image.raw
req_data = file_object_from_req.read()
# Do the actual upload to s3
s3.Bucket(bucket_name_to_upload_image_to).put_object(Key=s3_image_filename, Body=req_data)
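The put_object call above does not set a content type, so the stored object may not be served with an image MIME type. The following is only a sketch of one way to pick a value for put_object's ContentType parameter. `content_type_for` is a hypothetical helper name, and falling back to 'application/octet-stream' is an assumption.

```python
import mimetypes


def content_type_for(url, header_value=None):
    """Prefer the server's Content-Type header; otherwise guess from the URL."""
    if header_value:
        # Drop parameters such as "; charset=utf-8".
        return header_value.split(';')[0].strip()
    guessed, _encoding = mimetypes.guess_type(url)
    return guessed or 'application/octet-stream'


print(content_type_for('https://docs.python.org/3.7/_static/py.png'))  # image/png

# Usage with the script above:
#     s3.Bucket(bucket_name_to_upload_image_to).put_object(
#         Key=s3_image_filename,
#         Body=req_data,
#         ContentType=content_type_for(internet_image_url,
#                                      req_for_image.headers.get('Content-Type')),
#     )
```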
Using the boto3 upload_fileobj method, you can stream a file to an S3 bucket without saving it to disk. Here is my function:
import contextlib
from io import BytesIO

import boto3
import requests

def upload(url):
    # Get the service client
    s3 = boto3.client('s3')
    # Remember to set stream=True.
    # Note: verify=False disables TLS certificate checks; drop it if you can.
    with contextlib.closing(requests.get(url, stream=True, verify=False)) as response:
        # Wrap the response body in an in-memory file object.
        # response.content reads the whole body into memory first.
        fp = BytesIO(response.content)
        # Upload data to S3
        s3.upload_fileobj(fp, 'my-bucket', 'my-dir/' + url.split('/')[-1])
I tried the following with boto3 and it worked well:
import contextlib
from io import BytesIO

import boto3
import requests

s3 = boto3.resource('s3')
s3Client = boto3.client('s3')
for bucket in s3.buckets.all():
    print(bucket.name)
url = "@resource url"
with contextlib.closing(requests.get(url, stream=True, verify=False)) as response:
    # Set up file stream from response content.
    fp = BytesIO(response.content)
    # Upload data to S3
    s3Client.upload_fileobj(fp, 'aws-books', 'reviews_Electronics_5.json.gz')
Here is the method I use. The key is to set stream=True when making the initial request, and to upload to S3 with the upload_fileobj() method:
import requests
import boto3
url = "https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg"
r = requests.get(url, stream=True)
session = boto3.Session()
s3 = session.resource('s3')
bucket_name = 'your-bucket-name'
key = 'your-key-name' # key is the name of file on your bucket
bucket = s3.Bucket(bucket_name)
bucket.upload_fileobj(r.raw, key)
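One caveat with r.raw: it exposes the bytes as they arrive on the wire, so if the server applies a Content-Encoding such as gzip, the object stored in S3 may contain the compressed bytes rather than the image. A hedged sketch of a variant that asks urllib3 to decode the body and checks the HTTP status first is shown below. `stream_url_to_s3` is a hypothetical name, and the `s3_client` and `http_get` parameters are assumptions added so the function can be tried with stand-in objects.

```python
def stream_url_to_s3(url, bucket_name, key, s3_client=None, http_get=None):
    """Stream the body of `url` into s3://bucket_name/key without touching disk."""
    # Imported lazily so the function can be exercised with stand-in objects.
    if http_get is None:
        import requests
        http_get = requests.get
    if s3_client is None:
        import boto3
        s3_client = boto3.client('s3')

    response = http_get(url, stream=True)
    try:
        # Fail early instead of uploading an HTML error page as the "image".
        response.raise_for_status()
        # Ask urllib3 to undo gzip/deflate Content-Encoding while reading.
        response.raw.decode_content = True
        s3_client.upload_fileobj(response.raw, bucket_name, key)
    finally:
        response.close()


# Usage (bucket and key names are placeholders):
#     stream_url_to_s3("https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg",
#                      "your-bucket-name", "your-key-name")
```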
A simple 3-line implementation that works on a Lambda out of the box:
import boto3
import requests
s3_object = boto3.resource('s3').Object(bucket_name, object_key)
with requests.get(url, stream=True) as r:
    s3_object.put(Body=r.content)
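The .get part follows the pattern shown in the requests documentation.
S3 does not currently seem to support remote uploads. You can use the class below to upload an image to S3. The upload method first downloads the image and keeps it in memory until it has been uploaded. To be able to connect to S3, you have to install the AWS CLI with the command pip install awscli, then enter your credentials with the command aws configure: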
import uuid
from io import BytesIO
from pathlib import Path

import boto3
import urllib3

from errors import custom_exceptions as cex  # project-local exception classes

BUCKET_NAME = "xxx.yyy.zzz"
POSTERS_BASE_PATH = "assets/wallcontent"
CLOUDFRONT_BASE_URL = "https://xxx.cloudfront.net/"

class S3(object):
    def __init__(self):
        self.client = boto3.client('s3')
        self.bucket_name = BUCKET_NAME
        self.posters_base_path = POSTERS_BASE_PATH

    def __download_image(self, url):
        manager = urllib3.PoolManager()
        try:
            res = manager.request('GET', url)
        except Exception:
            print("Could not download the image from URL: ", url)
            raise cex.ImageDownloadFailed
        return BytesIO(res.data)  # any file-like object that implements read()

    def upload_image(self, url):
        try:
            image_file = self.__download_image(url)
        except cex.ImageDownloadFailed:
            raise cex.ImageUploadFailed
        extension = Path(url).suffix
        id = uuid.uuid1().hex + extension
        final_path = self.posters_base_path + "/" + id
        try:
            self.client.upload_fileobj(image_file,
                                       self.bucket_name,
                                       final_path
                                       )
        except Exception:
            print("Image Upload Error for URL: ", url)
            raise cex.ImageUploadFailed
        return CLOUDFRONT_BASE_URL + id
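A detail worth checking in the class above: Path(url).suffix treats the whole URL as a filesystem path, so a query string ends up inside the extension. A minimal sketch that parses the URL first might look like this. `extension_from_url` and `build_key` are hypothetical helper names, and the example URL is made up.

```python
import posixpath
import uuid
from urllib.parse import urlparse


def extension_from_url(url):
    """Return the file extension of the URL's path, ignoring query and fragment."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1]


def build_key(base_path, url):
    """Build a unique object key under `base_path`, keeping the original extension."""
    return base_path + "/" + uuid.uuid1().hex + extension_from_url(url)


print(extension_from_url("https://example.com/img/poster.jpg?size=large"))  # .jpg
```

In upload_image, `final_path = build_key(self.posters_base_path, url)` would be one way to use it.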
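Are you only trying to avoid writing to disk? Or are you trying to avoid transferring the file to your machine at all?
Well, ideally the URL could be passed to S3 so that my server wouldn't have to write to disk or load the file into memory. I suppose that is not a reasonable expectation of the S3 service. If my server has to handle it, I'd prefer not to write to disk.
Thanks, good to know I wasn't missing a potentially useful S3 feature. I logged a feature request in the forum.
This can be done by streaming the request content with stream=True into boto's upload_fileobj(). See the answer below for details.
I'm not 100% sure, but I believe url.split('/')[::-1][0] can simply be rewritten as url.split('/')[-1]. I mean, I can't think of any case where the result would differ.
I got an exception with the above method: S3 upload exception: _send_request() takes 5 positional arguments but 6 were given
@ifti Looks like you may have hit this bug. It looks like it has been fixed now.
I'm learning boto and getting more familiar with AWS. Can you tell me in layman's terms why you can't just do s3 = boto3.resource('s3')? Is a default session already started?
@heartmo The discussion here gives a detailed overview of the differences between client, session, and resource.
Which file types did you try? My jpg files are corrupted when I open them from S3.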
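The equivalence suggested in the comment about url.split('/')[::-1][0] can be checked quickly: reversing a list and taking its first item is the same as taking its last item. The URLs below are made up for illustration.

```python
urls = [
    "https://example.com/images/12345",
    "https://example.com/a/b/c.png",
    "no-slashes-at-all",
    "https://example.com/trailing/",
]

for url in urls:
    parts = url.split('/')
    # Reversing and taking the first item is the same as taking the last item.
    assert parts[::-1][0] == parts[-1]

# A trailing slash yields an empty last segment, and therefore an empty key.
print(urls[3].split('/')[-1] == "")  # True
```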
from io import BytesIO

import boto3
import requests

def send_image_to_s3(url, name):
    print("sending image")
    bucket_name = 'XXX'
    AWS_SECRET_ACCESS_KEY = "XXX"
    AWS_ACCESS_KEY_ID = "XXX"
    s3 = boto3.client('s3', aws_access_key_id=AWS_ACCESS_KEY_ID,
                      aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
    # Download the whole image into memory and wrap it in a file-like object
    response = requests.get(url)
    img = BytesIO(response.content)
    file_name = f'path/{name}'
    print('sending {}'.format(file_name))
    # upload_fileobj returns None, so there is nothing useful to capture
    s3.upload_fileobj(img, bucket_name, file_name)
    return file_name