Counting how many times a variable recurs in an AWS S3 bucket using an S3 Select query


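I am running a Python script that queries an AWS S3 bucket with S3 Select. I read a variable from a txt file and want to pass it into the S3 Select query. I would also like to count every occurrence of the imported variable (in a specified column) across the whole S3 directory rather than in a single file.

This is what I have so far: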

import boto3
from boto3.session import Session

with open('txtfile.txt', 'r') as myfile:
    variable = myfile.read()

ACCESS_KEY='accessKey'
SECRET_KEY='secredtKey'

session = Session(aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)
s3b = session.client('s3')

r = s3b.select_object_content(
    Bucket='s3BucketName',
    Key='directory/fileName',
    ExpressionType='SQL',
    Expression="'select count(*)from S3Object s where s.columnName = %s;', [variable]",
    InputSerialization={'CSV': {"FileHeaderInfo": "Use"}},
    OutputSerialization={'CSV': {}},
)

for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
    elif 'Stats' in event:
        statsDetails = event['Stats']['Details']
        print("Stats details bytesScanned: ")

Running this script returns the following error:

Traceback (most recent call last):
  File "s3_query.py", line 20, in <module>
    OutputSerialization={'CSV': {}},
  File "/root/anaconda3/lib/python3.6/site-packages/botocore/client.py", line 314, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/root/anaconda3/lib/python3.6/site-packages/botocore/client.py", line 612, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ParseUnexpectedToken) when     calling the SelectObjectContent operation: Unexpected token found COMMA:',' at line     1, column 67.

This line looks odd:

Expression="'select count(*)from S3Object s where s.columnName = %s;', [variable]"
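
That is not valid SQL or Python syntax. The whole value is a single Python string, so S3 Select receives the inner quotes and the trailing ", [variable]" literally, and its parser stops at the unexpected comma.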

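You should probably use something like this, which substitutes the value itself (not a list) and quotes it as a SQL string literal: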

Expression="select count(*) from S3Object s where s.columnName = '%s'" % variable.strip()
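
As for counting across the whole directory: S3 Select runs against one object per call, so one possible approach is to list the keys under a prefix and add up the per-object counts. The following is only a rough sketch under that assumption. The helper names (build_count_expression, count_in_object, count_under_prefix) are made up for illustration, the column is assumed to hold plain string values, and the client argument is expected to be a boto3 S3 client such as s3b from the question.

```python
def build_count_expression(column, value):
    """Build a COUNT(*) query comparing one column to a string literal."""
    # Strip the trailing newline left by file.read() and double any single
    # quotes so the value stays inside the SQL string literal.
    literal = value.strip().replace("'", "''")
    return "select count(*) from S3Object s where s.%s = '%s'" % (column, literal)


def count_in_object(client, bucket, key, expression):
    """Run the query against a single CSV object and return the count."""
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType='SQL',
        Expression=expression,
        InputSerialization={'CSV': {'FileHeaderInfo': 'Use'}},
        OutputSerialization={'CSV': {}},
    )
    # The result arrives as a stream of events; collect the record chunks.
    chunks = [event['Records']['Payload']
              for event in response['Payload'] if 'Records' in event]
    text = b''.join(chunks).decode('utf-8').strip()
    return int(text) if text else 0


def count_under_prefix(client, bucket, prefix, column, value):
    """Sum the per-object counts for every key under the given prefix."""
    expression = build_count_expression(column, value)
    paginator = client.get_paginator('list_objects_v2')
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching keys have no 'Contents' entry.
        for obj in page.get('Contents', []):
            total += count_in_object(client, bucket, obj['Key'], expression)
    return total
```

With the names from the question, this might be called as count_under_prefix(s3b, 's3BucketName', 'directory/', 'columnName', variable). Every object under the prefix is assumed to be a CSV file with a header row; other files would need to be filtered out first.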