How to access the message ID from Google Pub/Sub using Apache Beam in Python?

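I have been testing Apache Beam with the 2.13.0 SDK on Python 2.7.16, pulling simple messages from Google Pub/Sub in streaming mode and writing them to a Google BigQuery table. As part of this I am trying to use the Pub/Sub message ID for deduplication, but I can't seem to get it out at all.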

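The documentation suggests that service-generated KVs such as `id_label` should be returned as part of the `attributes` property, but they don't appear to be returned.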

Note that the `id_label` parameter is only supported when using the Dataflow runner.
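
For reference, the `ReadFromPubSub` docstring describes `id_label` as the name of an attribute on incoming messages whose value uniquely identifies the record, so `id_label='message_id'` asks Dataflow to look for a publisher-set attribute literally called `message_id` rather than pointing at the ID the service assigns. The sketch below illustrates the deduplication idea in plain Python so it runs without Beam. `FakePubsubMessage`, `record_id` and `dedupe` are hypothetical names, and the `message_id` field is an assumption modelled on what later Beam SDKs expose on `PubsubMessage` (2.13.0 appears to offer only `data` and `attributes`); check it against your SDK version.

```python
from collections import namedtuple

# Hypothetical stand-in for PubsubMessage. The message_id field is an assumption:
# later Beam SDKs expose it, while 2.13.0 appears to offer only data and attributes.
FakePubsubMessage = namedtuple('FakePubsubMessage',
                               ['data', 'attributes', 'message_id'])


def record_id(message, id_label='message_id'):
    """ID to deduplicate on: the service-assigned message_id when present,
    otherwise the publisher-set attribute named by id_label."""
    if getattr(message, 'message_id', None):
        return message.message_id
    return (message.attributes or {}).get(id_label)


def dedupe(messages, id_label='message_id'):
    """Yield only the first message seen for each ID.

    The set of seen IDs lives in memory, so this only suits a bounded batch.
    Messages without any ID are passed through unchanged.
    """
    seen = set()
    for message in messages:
        key = record_id(message, id_label)
        if key is None:
            yield message
        elif key not in seen:
            seen.add(key)
            yield message


batch = [
    FakePubsubMessage(b'{"rownumber": 1}', {'source': 'python'}, '101'),
    FakePubsubMessage(b'{"rownumber": 1}', {'source': 'python'}, '101'),  # redelivery
    FakePubsubMessage(b'{"rownumber": 2}', {'source': 'python'}, '102'),
]
print([m.message_id for m in dedupe(batch)])  # ['101', '102']
```

Because the seen set is held in memory, this only makes sense for a bounded batch; an unbounded streaming pipeline would need runner-side deduplication or keyed, windowed state instead.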

Code used to publish the messages:

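```python
import time
import json
from datetime import datetime

from google.cloud import pubsub_v1

project_id = "[YOUR PROJECT]"
topic_name = "test-apache-beam"

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_name)


def callback(message_future):
    # Runs when a publish completes; result() is the service-assigned message ID.
    if message_future.exception(timeout=30):
        print('Publishing message on {} threw an Exception {}.'.format(
            topic_name, message_future.exception()))
    else:
        print(message_future.result())


for n in range(1, 11):
    data = {'rownumber': n}
    jsondata = json.dumps(data)
    # The payload must be bytes; the keyword arguments become message attributes.
    message_future = publisher.publish(
        topic_path, data=jsondata.encode('utf-8'), source='python',
        timestamp=datetime.now().strftime("%Y-%b-%d (%H:%M:%S:%f)"))
    message_future.add_done_callback(callback)

print('Published message IDs:')
```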
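
Since `id_label` reads a publisher-supplied attribute, one option is to attach a stable ID at publish time. A minimal sketch, assuming a hypothetical attribute named `record_id` (any name works as long as the pipeline passes the same one, e.g. `id_label='record_id'`): the hypothetical helper `build_publish_args` returns the bytes payload and an attribute dict that could be unpacked into `publisher.publish(topic_path, data, **attributes)`. It uses only the standard library, so it runs without a Pub/Sub connection.

```python
import json
from datetime import datetime


def build_publish_args(n):
    """Return (data, attributes) for publisher.publish(topic_path, data, **attributes).

    'record_id' is a hypothetical attribute name; the pipeline would then read
    with ReadFromPubSub(..., id_label='record_id') to deduplicate on it.
    """
    data = json.dumps({'rownumber': n}).encode('utf-8')  # payload must be bytes
    attributes = {
        'source': 'python',
        'timestamp': datetime.now().strftime('%Y-%b-%d (%H:%M:%S:%f)'),
        # Derived from the record itself, so re-publishing row n reuses the same ID.
        'record_id': 'rownumber-{}'.format(n),
    }
    return data, attributes


data, attributes = build_publish_args(3)
print(attributes['record_id'])  # rownumber-3
```

Deriving the ID from `rownumber` instead of a random `uuid.uuid4()` means a retried publish of the same row carries the same ID, which is what deduplication needs; a random ID would make every retry look like a new record.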

Beam pipeline code:

```python
from __future__ import absolute_import

import argparse
import logging
import re
import json
import time
import datetime
import base64
import pprint

from past.builtins import unicode

import apache_beam as beam
from apache_beam.io import ReadFromText
from apache_beam.io import ReadFromPubSub
from apache_beam.io import ReadStringsFromPubSub
from apache_beam.io import WriteToText
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions
from apache_beam.options.pipeline_options import StandardOptions
from apache_beam.transforms.trigger import AfterProcessingTime
from apache_beam.transforms.trigger import AccumulationMode


def format_message_element(message, timestamp=beam.DoFn.TimestampParam):
    # With with_attributes=True, each element is a PubsubMessage.
    data = json.loads(message.data)  # loads, not load: data is bytes, not a file
    attribs = message.attributes

    fullmessage = {'data': data,
                   'attributes': attribs,
                   'attribstring': str(message.attributes)}

    return fullmessage


def run(argv=None):
    parser = argparse.ArgumentParser()
    input_group = parser.add_mutually_exclusive_group(required=True)
    input_group.add_argument(
        '--input_subscription',
        dest='input_subscription',
        help=('Input PubSub subscription of the form '
              '"projects/<PROJECT>/subscriptions/<SUBSCRIPTION>."'))
    input_group.add_argument(
        '--test_input',
        action='store_true',
        default=False)

    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument(
        '--output_table',
        dest='output_table',
        help=('Output BigQuery table for results specified as: PROJECT:DATASET.TABLE '
              'or DATASET.TABLE.'))
    group.add_argument(
        '--output_file',
        dest='output_file',
        help='Output file to write results to')
    known_args, pipeline_args = parser.parse_known_args(argv)

    options = PipelineOptions(pipeline_args)
    options.view_as(SetupOptions).save_main_session = True

    if known_args.input_subscription:
        options.view_as(StandardOptions).streaming = True

    # Leaving the "with" block runs the pipeline and waits for it to finish,
    # so a separate p.run() / wait_until_finish() would run it a second time.
    with beam.Pipeline(options=options) as p:

        from apache_beam.io.gcp.internal.clients import bigquery

        table_schema = bigquery.TableSchema()

        # attributes: RECORD {source STRING, timestamp STRING}
        attribfield = bigquery.TableFieldSchema()
        attribfield.name = 'attributes'
        attribfield.type = 'record'
        attribfield.mode = 'nullable'

        attribsource = bigquery.TableFieldSchema()
        attribsource.name = 'source'
        attribsource.type = 'string'
        attribsource.mode = 'nullable'

        attribtimestamp = bigquery.TableFieldSchema()
        attribtimestamp.name = 'timestamp'
        attribtimestamp.type = 'string'
        attribtimestamp.mode = 'nullable'

        attribfield.fields.append(attribsource)
        attribfield.fields.append(attribtimestamp)
        table_schema.fields.append(attribfield)

        # data: RECORD {rownumber INTEGER}
        datafield = bigquery.TableFieldSchema()
        datafield.name = 'data'
        datafield.type = 'record'
        datafield.mode = 'nullable'

        datanumberfield = bigquery.TableFieldSchema()
        datanumberfield.name = 'rownumber'
        datanumberfield.type = 'integer'
        datanumberfield.mode = 'nullable'

        datafield.fields.append(datanumberfield)
        table_schema.fields.append(datafield)

        # attribstring: STRING (the attributes dict rendered with str())
        attribstringfield = bigquery.TableFieldSchema()
        attribstringfield.name = 'attribstring'
        attribstringfield.type = 'string'
        attribstringfield.mode = 'nullable'

        table_schema.fields.append(attribstringfield)

        if known_args.input_subscription:
            messages = (p
                        | 'Read From Pub Sub' >> ReadFromPubSub(
                            subscription=known_args.input_subscription,
                            with_attributes=True,
                            id_label='message_id')
                        | 'Format Message' >> beam.Map(format_message_element))

            output = (messages | 'write' >> beam.io.WriteToBigQuery(
                known_args.output_table,
                schema=table_schema,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))


if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    run()
```
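
A sketch of how `format_message_element` and the table schema in the pipeline could carry a `message_id` column if the SDK exposes one. This rests on assumptions: `FakePubsubMessage` and `OldPubsubMessage` are hypothetical stand-ins so the code runs without Beam, the unused `timestamp=beam.DoFn.TimestampParam` parameter is dropped because it needs Beam, `getattr(message, 'message_id', None)` falls back to `None` on a `PubsubMessage` without that field (presumably the 2.13.0 one), and `TABLE_SCHEMA` restates the `TableSchema` from the pipeline in BigQuery's JSON schema layout (a dict with a `fields` list). Later `WriteToBigQuery` releases accept such a dict as `schema`; whether 2.13.0 does is unverified.

```python
import json
from collections import namedtuple

# Hypothetical stand-ins so this runs without Beam installed.
FakePubsubMessage = namedtuple('FakePubsubMessage',
                               ['data', 'attributes', 'message_id'])
OldPubsubMessage = namedtuple('OldPubsubMessage', ['data', 'attributes'])


def format_message_element(message):
    """Same row layout as the pipeline's formatter, plus a message_id column.

    getattr(..., None) yields None when the message type has no message_id
    field (assumed to be the case for the 2.13.0 PubsubMessage).
    """
    attribs = dict(message.attributes or {})
    return {
        'data': json.loads(message.data),
        'attributes': attribs,
        'attribstring': str(attribs),
        'message_id': getattr(message, 'message_id', None),
    }


# The pipeline's TableSchema in BigQuery's JSON schema layout, with one extra
# nullable STRING column for the message ID.
TABLE_SCHEMA = {
    'fields': [
        {'name': 'attributes', 'type': 'RECORD', 'mode': 'NULLABLE', 'fields': [
            {'name': 'source', 'type': 'STRING', 'mode': 'NULLABLE'},
            {'name': 'timestamp', 'type': 'STRING', 'mode': 'NULLABLE'},
        ]},
        {'name': 'data', 'type': 'RECORD', 'mode': 'NULLABLE', 'fields': [
            {'name': 'rownumber', 'type': 'INTEGER', 'mode': 'NULLABLE'},
        ]},
        {'name': 'attribstring', 'type': 'STRING', 'mode': 'NULLABLE'},
        {'name': 'message_id', 'type': 'STRING', 'mode': 'NULLABLE'},
    ]
}

new_row = format_message_element(
    FakePubsubMessage(b'{"rownumber": 5}', {'source': 'python'}, '42'))
old_row = format_message_element(OldPubsubMessage(b'{"rownumber": 5}', None))
print(new_row['message_id'], old_row['message_id'])
```

Keeping the column nullable means the same table works whether or not the running SDK populates `message_id`.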

And the code to run it: