
Python: ignore the .csv.metadata file in Athena output


I have a lambda that queries Athena and puts the query output into the bucket I want. The Athena output contains both a .csv and a .csv.metadata file. I only want the .csv file, not the .metadata file, so the lambda deletes the .metadata file. Here is my code:

def wait_for_result(athena, query_id):
    state = athena.get_query_execution(QueryExecutionId=query_id)['QueryExecution']['Status']['State']
    while state != 'SUCCEEDED':
        print('Query state: {}'.format(state))
        time.sleep(5)
        state = athena.get_query_execution(QueryExecutionId=query_id)['QueryExecution']['Status']['State']


def lambda_handler(event, context):
    short_date = event['record']['short_date']

    bucket = 'test-rod-us-east-1-orders'
    s3_output = 's3://{0}/arda-orders/f=csv/short_date={1}'.format(bucket, short_date)

    query = 'query_here'.format(short_date)

    boto_session = assume_role('arn:aws:iam::account-id:role/test-contr-etl-ec2-role')
    session = assume_role('arn:aws:iam::account-id:role/test-xacct-rod-consumer', boto_session)

    athena = session.client('athena')
    s3 = session.client('s3')
    s3_bucket = session.resource('s3').Bucket(bucket)

    response = athena.start_query_execution(QueryString=query,
                                        QueryExecutionContext={
                                            'Database': 'datapond'
                                        },
                                        ResultConfiguration={
                                            'OutputLocation': s3_output
                                        })

    query_id = response['QueryExecutionId']
    wait_for_result(athena, query_id)

    # print ('short_date: {}'.format(short_date))
    for key in s3.list_objects(Bucket=bucket)['Contents']:
        if short_date in key['Key']:
            s3.put_object_acl(ACL='bucket-owner-full-control', Bucket=bucket, Key=key['Key'])
            print('set \'bucket-owner-full-control\' for {}'.format(key['Key']))
            if '.csv.metadata' in key['Key']:
                s3_bucket.delete_objects(
                    Delete={
                        'Objects': [
                            {'Key': key['Key']},
                        ]
                    }
                )
                print('deleted {}'.format(key['Key']))

    sqs.delete_message(
        QueueUrl=sqs_queue_url,
        ReceiptHandle=event['receipt_handler']
    )

    print('Complete process for short_date: {}'.format(short_date))

All I get in the logs is the "deleted key" message, but I still find the .csv.metadata files in the S3 bucket. Please help.

You should not delete these objects; Athena uses them to look up metadata when you call get_query_results. Instead of deleting them, copy the .csv files to another location.

Can you assign the result of s3_bucket.delete_objects to a name (e.g. response) and print(response['Errors'])?

How do I get at the result of the deleted objects in my code? Even with the error in the lambda, every time I still see a new .csv and a new .csv.metadata file in the bucket. I guess it fails at s3.put_object_acl.

Maybe use response = s3_bucket.delete_objects(… to capture the result of the delete request, then output it with print(response['Errors']). What does that output show?
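A small sketch of the check suggested in these comments: capture the delete_objects response and surface any per-key errors instead of printing "deleted" unconditionally. The helper name is hypothetical; the response shape follows the S3 DeleteObjects API (`Deleted` and `Errors` lists):

```python
def report_delete(response):
    """Print the outcome of an S3 delete_objects call; return the keys that failed."""
    for deleted in response.get('Deleted', []):
        print('deleted {}'.format(deleted['Key']))
    errors = response.get('Errors', [])
    for err in errors:
        print('failed {}: {} {}'.format(err['Key'], err['Code'], err['Message']))
    return [err['Key'] for err in errors]
```

If the returned list is non-empty (e.g. AccessDenied entries), that would explain why the .csv.metadata files are still in the bucket despite the log messages.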