Counting repeated occurrences of a variable in an AWS S3 bucket using an S3 Select query
I am running a Python script that queries an AWS S3 bucket with the S3 Select tool. I import a variable from a txt file and want to pass it into the S3 Select query. I would also like to count how many times the imported variable occurs (in a specified column) by querying an entire S3 directory rather than a single file. This is what I have so far:
import boto3
from boto3.session import Session

with open('txtfile.txt', 'r') as myfile:
    variable = myfile.read()

ACCESS_KEY='accessKey'
SECRET_KEY='secredtKey'
session = Session(aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)
s3b = session.client('s3')

r = s3b.select_object_content(
    Bucket='s3BucketName',
    Key='directory/fileName',
    ExpressionType='SQL',
    Expression="'select count(*)from S3Object s where s.columnName = %s;', [variable]",
    InputSerialization={'CSV': {"FileHeaderInfo": "Use"}},
    OutputSerialization={'CSV': {}},
)

for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
    elif 'Stats' in event:
        statsDetails = event['Stats']['Details']
        print("Stats details bytesScanned: ")
When I run this script, it returns the following error:
Traceback (most recent call last):
  File "s3_query.py", line 20, in <module>
    OutputSerialization={'CSV': {}},
  File "/root/anaconda3/lib/python3.6/site-packages/botocore/client.py", line 314, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/root/anaconda3/lib/python3.6/site-packages/botocore/client.py", line 612, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ParseUnexpectedToken) when calling the SelectObjectContent operation: Unexpected token found COMMA:',' at line 1, column 67.
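This line looks odd:
Expression="'select count(*)from S3Object s where s.columnName = %s;', [variable]"
That is neither valid SQL nor valid Python string formatting. Everything between the outer double quotes, including the ", [variable]" part, is sent to S3 Select as literal text, which is why the parser complains about a comma.
You should probably format the value into the string in Python instead. Because the value is a string, it also needs single quotes inside the SQL:
Expression="select count(*) from S3Object s where s.columnName = '%s';" % variable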
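For the second part of the question (counting across a whole "directory"), note that SelectObjectContent works on one object per call, so one way to get a total is to list the keys under a prefix and sum the per-object counts. The following is only a minimal sketch under some assumptions: the helper names (build_count_expression, count_in_object, count_under_prefix) are made up for illustration, every object under the prefix is assumed to be a CSV file with a header row, and the value read from the txt file is assumed to be plain text that may carry a trailing newline.

```python
def build_count_expression(column, value):
    """Build the S3 Select SQL that counts rows where `column` equals `value`.

    Hypothetical helper, not part of boto3.
    """
    # file.read() usually leaves a trailing newline; remove surrounding whitespace.
    cleaned = value.strip()
    # Double any single quote so the value stays inside the SQL string literal.
    escaped = cleaned.replace("'", "''")
    return "select count(*) from S3Object s where s.{} = '{}'".format(column, escaped)


def count_in_object(client, bucket, key, expression):
    """Run `expression` against one CSV object and return the count as an int."""
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={"CSV": {"FileHeaderInfo": "Use"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    for event in response["Payload"]:
        # Only Records events carry result bytes; Stats and End events are skipped.
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return int(b"".join(chunks).decode("utf-8").strip())


def count_under_prefix(client, bucket, prefix, column, value):
    """Sum the per-object counts for every object whose key starts with `prefix`."""
    expression = build_count_expression(column, value)
    paginator = client.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            # Skip zero-byte "folder" placeholder keys that end with a slash.
            if obj["Key"].endswith("/"):
                continue
            total += count_in_object(client, bucket, obj["Key"], expression)
    return total
```

With the client from the question, this could be called as count_under_prefix(s3b, 's3BucketName', 'directory/', 'columnName', variable). Each object is scanned in its own request, so the cost and run time grow with the number of files under the prefix.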