Python: ignore .csv.metadata files in Athena output
I have a Lambda that queries Athena and writes the query output to the bucket I want. The Athena output contains a .csv file and a .csv.metadata file. I don't want the .metadata file to come along with the .csv file, so I have the Lambda delete it. Here is my code:
import time

def wait_for_result(athena, query_id):
    # Poll Athena until the query leaves the QUEUED/RUNNING states
    state = athena.get_query_execution(QueryExecutionId=query_id)['QueryExecution']['Status']['State']
    while state in ('QUEUED', 'RUNNING'):
        print('Query state: {}'.format(state))
        time.sleep(5)
        state = athena.get_query_execution(QueryExecutionId=query_id)['QueryExecution']['Status']['State']
    if state != 'SUCCEEDED':
        # FAILED or CANCELLED: stop here instead of polling forever
        raise RuntimeError('Query {} ended in state {}'.format(query_id, state))

def lambda_handler(event, context):
    short_date = event['record']['short_date']
    bucket = 'test-rod-us-east-1-orders'
    s3_output = 's3://{0}/arda-orders/f=csv/short_date={1}'.format(bucket, short_date)
    query = 'query_here'.format(short_date)
    # assume_role, sqs and sqs_queue_url are defined elsewhere (not shown)
    boto_session = assume_role('arn:aws:iam::account-id:role/test-contr-etl-ec2-role')
    session = assume_role('arn:aws:iam::account-id:role/test-xacct-rod-consumer', boto_session)
    athena = session.client('athena')
    s3 = session.client('s3')
    s3_bucket = session.resource('s3').Bucket(bucket)
    response = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={
            'Database': 'datapond'
        },
        ResultConfiguration={
            'OutputLocation': s3_output
        })
    query_id = response['QueryExecutionId']
    wait_for_result(athena, query_id)
    # print('short_date: {}'.format(short_date))
    for key in s3.list_objects(Bucket=bucket)['Contents']:
        if short_date in key['Key']:
            # Give the bucket owner full control over each result file
            s3.put_object_acl(ACL='bucket-owner-full-control', Bucket=bucket, Key=key['Key'])
            print('set \'bucket-owner-full-control\' for {}'.format(key['Key']))
            # Try to remove the .csv.metadata file Athena writes next to the .csv
            if '.csv.metadata' in key['Key']:
                s3_bucket.delete_objects(
                    Delete={
                        'Objects': [
                            {'Key': key['Key']},
                        ]
                    }
                )
                print('deleted {}'.format(key['Key']))
    sqs.delete_message(
        QueueUrl=sqs_queue_url,
        ReceiptHandle=event['receipt_handler']
    )
    print('Complete process for short_date: {}'.format(short_date))
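The loop above calls `list_objects` on the whole bucket. A single `list_objects` call returns at most 1,000 keys, so on a large bucket some result files may never be visited. The sketch below assumes the results live under the `s3_output` prefix used above. It pages through only that prefix with `list_objects_v2` and splits the keys by suffix, so the `.csv.metadata` files can simply be ignored rather than deleted. `split_result_keys` is a hypothetical helper name, and `s3` is assumed to be a boto3 S3 client.

```python
def split_result_keys(s3, bucket, prefix):
    """Return (csv_keys, metadata_keys) for the objects under `prefix`.

    `s3` is assumed to be a boto3 S3 client; split_result_keys is a
    hypothetical helper, not part of boto3.
    """
    csv_keys, metadata_keys = [], []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from a page when no object matches the prefix
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # Match on the suffix instead of a substring anywhere in the key
            if key.endswith(".csv.metadata"):
                metadata_keys.append(key)
            elif key.endswith(".csv"):
                csv_keys.append(key)
    return csv_keys, metadata_keys
```

In the handler this would be called with a prefix such as `'arda-orders/f=csv/short_date={}/'.format(short_date)`. You would then apply `put_object_acl` (or a copy) only to `csv_keys`.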
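I only get the "deleted <key>" message in the logs, but the .csv.metadata file is still in the S3 bucket. Please help.

You shouldn't delete these objects: Athena uses them to look up the metadata when you call `get_query_results`. Instead of deleting them, copy the .csv file to another location.

Can you assign the result of `s3_bucket.delete_objects` to a name (e.g. `response`) and `print(response['Errors'])`?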
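The suggestion to copy the result instead of deleting the metadata can be sketched as follows. The code copies only the query's result `.csv` to a destination prefix and leaves the `.csv.metadata` file where Athena wrote it. It assumes a SELECT query, for which Athena names the result file `<QueryExecutionId>.csv` directly under the `OutputLocation`. The helper names and the `dest_bucket`/`dest_prefix` parameters are hypothetical, and `s3` is assumed to be a boto3 S3 client. The canned ACL mirrors the `put_object_acl` call in the question.

```python
from urllib.parse import urlparse


def result_csv_location(output_location, query_id):
    """Split an s3:// OutputLocation into (bucket, key of the result .csv).

    Assumes Athena wrote a SELECT query's result as <query_id>.csv
    directly under output_location.
    """
    parsed = urlparse(output_location)
    prefix = parsed.path.strip("/")
    key = "{}/{}.csv".format(prefix, query_id) if prefix else "{}.csv".format(query_id)
    return parsed.netloc, key


def copy_result_csv(s3, output_location, query_id, dest_bucket, dest_prefix):
    """Copy only the result .csv (never the .csv.metadata) to dest_bucket."""
    src_bucket, src_key = result_csv_location(output_location, query_id)
    filename = src_key.rsplit("/", 1)[-1]
    dest_prefix = dest_prefix.strip("/")
    dest_key = "{}/{}".format(dest_prefix, filename) if dest_prefix else filename
    s3.copy_object(
        Bucket=dest_bucket,
        Key=dest_key,
        CopySource={"Bucket": src_bucket, "Key": src_key},
        # Same canned ACL the question applies with put_object_acl
        ACL="bucket-owner-full-control",
    )
    return dest_key
```

A single `copy_object` request handles objects up to 5 GB. Larger results would need a multipart copy, for example boto3's managed `copy` method.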
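How do I get the result of the delete in my code? Even when the Lambda fails, I see a new .csv and a new .csv.metadata file in the bucket every time. I guess it's at `s3.put_object_acl`

Try `response = s3_bucket.delete_objects(...)` to capture the result of the delete request, then `print(response['Errors'])` to output it. What does that print?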
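The check suggested in these comments can be sketched as below. `delete_objects` reports per-key failures in its response instead of raising an exception. A successful call followed by `print('deleted ...')` therefore does not prove the object was removed. The sketch assumes `s3` is a boto3 S3 client. The same call exists on the `Bucket` resource used in the question, without the `Bucket=` argument. The function name is hypothetical. Because the `Errors` key may be absent when nothing failed, the code uses `.get` instead of `response['Errors']`.

```python
def delete_and_report(s3, bucket, keys):
    """Delete `keys` from `bucket` and return the list of per-key errors.

    `s3` is assumed to be a boto3 S3 client. One delete_objects request
    accepts at most 1,000 keys.
    """
    if not keys:
        # An empty Objects list is rejected by S3, so skip the request
        return []
    response = s3.delete_objects(
        Bucket=bucket,
        Delete={"Objects": [{"Key": key} for key in keys]},
    )
    for deleted in response.get("Deleted", []):
        print("deleted {}".format(deleted["Key"]))
    # Failures such as permission errors are reported here, not raised
    errors = response.get("Errors", [])
    for error in errors:
        print("could not delete {}: {} {}".format(error["Key"], error["Code"], error["Message"]))
    return errors
```

If the returned list is non-empty, its `Code` and `Message` fields show why S3 kept the `.csv.metadata` object.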