Downloading from S3 with direct requests in Python

Tags: python, amazon-s3, urllib

Given

bucket = 'mybucket'
aws_id = '.....'
aws_secret_key = '........'

file_key = '/some/file/key'
range = '40-2000'
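I want to send a request from Python that fetches the corresponding part of the file.

I adapted the first (EC2) example from:

I waded through the boto3 source code trying to understand the mechanics of such a direct request, but I could not narrow down what is needed to make the snippet work with requests/urllib.

Can anyone point out what is missing to complete the adaptation?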

If you just want to download the file contents for use in Python, here is a short version of my code:

import boto3

aws = boto3.session.Session(profile_name='maintenance')
s3 = aws.client('s3', region_name='us-west-2')

data = s3.get_object(
    Bucket='my_bucket_name',
    Key='/path/of/s3/key'
)['Body'].read()
Now you have the whole file in memory and can process it like any other data in your code.
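Since the question asks for only part of the file, the same call can be narrowed with the `Range` parameter of `get_object`. This is a minimal sketch, assuming credentials are already configured; `byte_range` and `get_object_range` are hypothetical helper names, not part of boto3.

```python
def byte_range(start, end):
    """Build an HTTP Range value for the inclusive byte range start..end."""
    if start < 0 or end < start:
        raise ValueError('expected 0 <= start <= end')
    return 'bytes={}-{}'.format(start, end)


def get_object_range(bucket, key, start, end, profile_name=None, region_name=None):
    """Fetch only bytes start..end of an S3 object (hypothetical helper)."""
    import boto3  # imported here so byte_range() is usable without boto3 installed

    session = boto3.session.Session(profile_name=profile_name, region_name=region_name)
    s3 = session.client('s3')
    # Range takes the same syntax as the HTTP Range header
    response = s3.get_object(Bucket=bucket, Key=key, Range=byte_range(start, end))
    return response['Body'].read()
```

With the values from the question, `get_object_range('mybucket', 'some/file/key', 40, 2000)` would request bytes 40 through 2000 only.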

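Edit: It sounds like you have not set up credentials or anything yet. boto3 (and most Amazon CLI products) expects a credentials file in the following format:

File name: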
~/.aws/credentials

[default]
aws_secret_access_key = 9087OKLJHAFWSKLDJGHNAKLJHWR34K (random keys typed by me)
aws_access_key_id = MORERANDOMKEYSTOFILLTHESPACE

Create that file and I think you will be all set.
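If you go the hand-rolled route instead, the same file can supply the raw keys. The credentials file uses INI syntax, so the standard library can parse it. A small sketch, where `read_aws_credentials` and the sample values are made up for illustration:

```python
import configparser


def read_aws_credentials(text, profile='default'):
    """Return (access_key_id, secret_access_key) for one profile of an INI-style credentials file."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[profile]  # raises KeyError if the profile is missing
    return section['aws_access_key_id'], section['aws_secret_access_key']


# Placeholder content in the same layout as ~/.aws/credentials
SAMPLE = """\
[default]
aws_access_key_id = EXAMPLEACCESSKEYID
aws_secret_access_key = exampleSecretAccessKey
"""

access_key, secret_key = read_aws_credentials(SAMPLE)
```

For the real file, pass in the text read from `os.path.expanduser('~/.aws/credentials')`.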

The request is an HTTP GET. I looked at what one looks like on the wire, and this is what is needed:

s3_message_parts = ['GET {} HTTP/1.1',
                        'Host: {}',
                        'Connection: keep-alive',
                        'Accept-Encoding: gzip, deflate',
                        'Accept: */*',
                        'User-Agent: ssup',
                        'X-Amz-Content-Sha256: UNSIGNED-PAYLOAD',
                        'Range: bytes={}-{}',
                        'X-Amz-Date: {}',
                        'Authorization: {}',
                        '\r\n']
Two tricky parts:

  1. Given a bucket and a key, find the host/endpoint to talk to
  2. Fill in the Authorization header correctly
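For the first part, one option is virtual-hosted-style addressing, where the bucket name becomes part of the host name. The sketch below assumes the bucket's region is already known (it does not discover it) and that the bucket name contains no dots, which would not match the wildcard TLS certificate. `s3_endpoint` is a hypothetical helper name.

```python
from urllib.parse import quote


def s3_endpoint(bucket, key, region='us-east-1'):
    """Return (host, path) for a virtual-hosted-style S3 request."""
    host = '{}.s3.{}.amazonaws.com'.format(bucket, region)
    # Percent-encode the key but keep '/' so the key hierarchy stays readable
    path = '/' + quote(key.lstrip('/'), safe='/')
    return host, path
```

The returned path is also what has to go into the canonical request when signing, so that the signed path and the requested path agree.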

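I did not solve 1; I simply supplied the endpoint I had found beforehand for my bucket.

For 2, I learned the signing process by reading through an excellent library.

The whole contraption is in the full listing at the end of this page.

As long as sha256 and hmac implementations are available, the logic is fairly portable. Hope this comes in handy.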
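The core of the signing process is a chain of HMAC-SHA256 operations that turns the secret key into a date- and region-scoped signing key, which then signs the string to sign. A stripped-down sketch of just that step, with hypothetical function names and using only `hashlib` and `hmac`:

```python
import hashlib
import hmac


def hmac_sha256(key, msg):
    """HMAC-SHA256 of a text message under a bytes key, returned as raw bytes."""
    return hmac.new(key, msg.encode('utf-8'), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service='s3'):
    """Derive the Signature Version 4 signing key; date_stamp is YYYYMMDD."""
    k_date = hmac_sha256(('AWS4' + secret_key).encode('utf-8'), date_stamp)
    k_region = hmac_sha256(k_date, region)
    k_service = hmac_sha256(k_region, service)
    return hmac_sha256(k_service, 'aws4_request')


def sign_string(signing_key, string_to_sign):
    """Return the hex signature that goes into the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode('utf-8'), hashlib.sha256).hexdigest()
```

Because the date and region are baked into the key, a signing key derived for one day or region will not validate requests for another.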

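Are you just trying to get the contents of a file stored in S3?

Yes, a specific part each time.

You are going about this the extremely hard way. You tagged the post with boto3 but are not using it. It is exactly what you should be looking at. See my answer below.

So you tagged it with boto3, told us you had tried boto3 and could not get it to work, and now you say you have other reasons not to use it? Sorry, bud. You are on your own trying to cobble this together by hand. Unusual and obscure requirements are something you have to explain early in the question text.

boto3 is obviously the library to use here. If the requirement is not to use boto3, then the question should not be tagged with boto3.

boto3 is a great library, and it has the "Range=" I need here to download part of a file. However, I would like to know how to do it in "vanilla" Python. Appreciate the help.

@Jay Saying you want to do this in "vanilla" Python is like saying you want to do it in binary machine code. It is more complicated, more error-prone, likely to break in the future, and hard to implement. You use Python because it is easier than writing raw machine code. For the same reason, you should use the boto3 library, which is provided and supported by AWS.

So you have no specific requirement for it; you just want to write the interaction with the REST API yourself. You could have said so. The same goes for the vanilla examples. AWS ...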
    import hashlib, hmac, socket, ssl
    from datetime import datetime
    
    try:
        from urlparse import urlsplit  # Python 2
    except ImportError:
        from urllib.parse import urlsplit  # Python 3
    
    
    ALGORTHM = 'AWS4-HMAC-SHA256'
    sign = lambda key, msg: hmac.new(key, msg.encode('utf-8'), hashlib.sha256).digest()
    
    
    def sign_headers(headers, url, access_key, secret_key, region = 'us-east-1'):
    
        method = 'GET'
    
        # Get host and parsed datetime and date used by AWS
        parsed_url = urlsplit(url)
        host = parsed_url.netloc
        date = datetime.utcnow()
        aws_datetime = date.strftime("%Y%m%dT%H%M%SZ")
        aws_date = date.strftime("%Y%m%d")
    
        # Generate scope and scoped credential strings, and the signing key
        scope = '/'.join([aws_date, region, 's3', 'aws4_request'])
        credential = '/'.join([access_key, scope])
        signing_key = sign(sign(sign(sign(('AWS4' + secret_key).encode('utf-8'), aws_date), region), 's3'), 'aws4_request') 
    
        # Fill up all headers except 'Authorization'
        headers['Host'] = host
        headers['X-Amz-Date'] = aws_datetime
        headers['X-Amz-Content-Sha256'] = u'UNSIGNED-PAYLOAD'
    
        # Format header keys and data for the upcoming AWS strings (sorted by lowercase header name)
        sorted_names = sorted(headers, key=lambda name: name.lower())
        sorted_headers_string = ';'.join([header.lower().strip() for header in sorted_names])
        canonical_header_list = [header.lower().strip() + ':' + str(headers[header]).strip() for header in sorted_names]
    
        # Generate canonical request and string to be signed
        prefix = [method, parsed_url.path, parsed_url.query]
        suffix = ['', sorted_headers_string, u'UNSIGNED-PAYLOAD']  # '' to allow 2 '\n'
        canonical_req = '\n'.join(prefix + canonical_header_list + suffix)
        string_to_sign = '\n'.join([ALGORTHM, aws_datetime, scope, hashlib.sha256(canonical_req.encode('utf-8')).hexdigest()])
        signature = hmac.new(signing_key, string_to_sign.encode('utf-8'), hashlib.sha256).hexdigest()
    
        # Finally generate the Authorization header from the credential scope, signed header names and signature
        headers['Authorization'] = ALGORTHM + ' Credential=' + credential + ', ' + 'SignedHeaders=' + sorted_headers_string + ', ' + 'Signature=' + signature
    
        return headers
    
    
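    def download_s3_chunk(bucket, key, start, end, access_key, secret_key, endpoint = 'https://s3.amazonaws.com', region = 'us-east-1'):
        ''' Download part of an S3 stored file using vanilla Python '''

        # Path-style request path; the same path must be both signed and sent
        path = '/' + bucket + '/' + key.lstrip('/')

        headers = {'Range': 'bytes={}-{}'.format(start, end), 'User-Agent': 'ssup'}
        headers = sign_headers(headers, endpoint + path, access_key, secret_key, region)

        # Raw message to send via socket ('Connection: close' lets the read loop below terminate)
        s3_message_parts = ['GET {} HTTP/1.1',
                            'Host: {}',
                            'Connection: close',
                            'Accept-Encoding: gzip, deflate',
                            'Accept: */*',
                            'User-Agent: ssup',
                            'X-Amz-Content-Sha256: UNSIGNED-PAYLOAD',
                            'Range: bytes={}-{}',
                            'X-Amz-Date: {}',
                            'Authorization: {}',
                            '\r\n']
        message_params = path, headers['Host'], start, end, headers['X-Amz-Date'], headers['Authorization']

        # Unpack the tuple so that each placeholder receives its own value
        s3_download_message = '\r\n'.join(s3_message_parts).format(*message_params)

        # ssl.wrap_socket() was removed in Python 3.12; wrap through an SSLContext instead
        context = ssl.create_default_context()
        s = context.wrap_socket(socket.socket(), server_hostname=headers['Host'])
        s.connect((headers['Host'], 443))
        s.sendall(s3_download_message.encode('utf-8'))

        # Read until the server closes the connection, then split the header from the body
        response = b''
        while True:
            data = s.recv(4096)
            if not data:
                break
            response += data
        s.close()

        header, _, chunk = response.partition(b'\r\n\r\n')
        return header, chunk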
    
    if __name__=='__main__':
    
        # Adjust to get arguments from command prompt
        from sys import argv as args
    
        # Credentials
        access_key = 'access'
        secret_key = 'secret'
    
        # Bucket, key and location info
        bucket = 'my_bucket'
        key = 'my_key'
    
        # Chunk of key to download
        start = 20
        end = 100
    
        header, chunk = download_s3_chunk(bucket, key, start, end, access_key, secret_key)
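Since the question mentions requests/urllib, the raw socket can in principle be swapped for `urllib.request` once the headers are signed. A sketch under that assumption: `build_range_request` and `fetch` are hypothetical helpers, and `signed_headers` stands for the dict returned by `sign_headers` in the listing above when it is called with the full object URL, so that the signed path matches the requested one.

```python
from urllib.request import Request, urlopen


def build_range_request(url, signed_headers):
    """Wrap an object URL and already-signed headers in a GET request object."""
    return Request(url, headers=dict(signed_headers), method='GET')


def fetch(request, timeout=30):
    """Send the request; a successful ranged GET normally answers 206 Partial Content."""
    with urlopen(request, timeout=timeout) as response:
        return response.status, response.read()
```

urllib adds a few headers of its own, but only the headers listed in SignedHeaders take part in signature verification, so the signature computed beforehand should still apply.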