Python 3.x: Writing error data to a new BigQuery table from a GCP Dataflow pipeline in Python


python-3.x google-cloud-platform google-bigquery google-cloud-pubsub dataflow


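I am trying to build a Dataflow job in Python that writes data from Pub/Sub to BigQuery. The code works fine, but I'm struggling to handle errors and load them into a new BigQuery table. Could you suggest a way to handle errors in the run function and load the source message into a new table?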

This function runs the Dataflow pipeline and loads the data into a BigQuery table:

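    import argparse
    import json
    import logging

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions


    def parse_pubsub(message):
        """Placeholder: the question does not show parse_pubsub.

        Assumes each Pub/Sub message is a UTF-8 JSON object whose keys match
        the BigQuery columns.
        """
        return json.loads(message.decode('utf-8'))


    def run(argv=None):
        """Build and run the pipeline."""
        parser = argparse.ArgumentParser()
        parser.add_argument(
            '--input_topic', dest='input_topic', required=True,
            help='Input Pub/Sub topic of the form "projects/<PROJECT>/topics/<TOPIC>".')
        parser.add_argument(
            '--output_table', dest='output_table', required=True,
            help='BigQuery table name.')
        parser.add_argument(
            '--output_dataset', dest='output_dataset', required=True,
            help='BigQuery dataset name.')

        known_args, pipeline_args = parser.parse_known_args(argv)
        # Pub/Sub is an unbounded source, so run the pipeline in streaming mode.
        options = PipelineOptions(pipeline_args, streaming=True)

        with beam.Pipeline(options=options) as p:
            # Read raw message bytes from the Pub/Sub topic
            lines = p | beam.io.ReadFromPubSub(topic=known_args.input_topic)
            # Convert each message into a JSON-style dict matching the BQ table
            lines = lines | beam.Map(parse_pubsub)
            # Write to a BQ table; CREATE_IF_NEEDED also needs schema=... if the
            # table does not exist yet
            lines | beam.io.WriteToBigQuery(
                table=known_args.output_table,
                dataset=known_args.output_dataset,
                project='test-project',
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED)


    if __name__ == '__main__':
        logging.getLogger().setLevel(logging.INFO)
        run()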
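A common way to catch bad messages at the parse step is a dead-letter pattern: wrap the parsing in try/except and send the raw message, together with the error, to a second output instead of letting the exception fail the work item (which a streaming Dataflow job would keep retrying). The sketch below is only the per-element logic in plain Python, so it can be tested without Beam. The function name parse_or_error, the tags 'rows' and 'errors', and the error-table columns payload, error and failed_at are assumptions, not part of the question.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Tuple

# Hypothetical output tags: 'rows' for good records, 'errors' for dead letters.
ROWS_TAG = 'rows'
ERRORS_TAG = 'errors'

# Hypothetical error-table schema in the "name:TYPE,..." string form that
# WriteToBigQuery's schema argument accepts.
ERROR_TABLE_SCHEMA = 'payload:STRING,error:STRING,failed_at:TIMESTAMP'


def parse_or_error(message: bytes) -> Tuple[str, Dict[str, Any]]:
    """Parse one Pub/Sub payload, routing failures to the error output.

    Returns (ROWS_TAG, row) when the payload is a UTF-8 JSON object, or
    (ERRORS_TAG, error_row) otherwise. The error row keeps the raw message
    so it can be loaded into a separate error table.
    """
    try:
        row = json.loads(message.decode('utf-8'))
        if not isinstance(row, dict):
            raise ValueError('expected a JSON object, got %s' % type(row).__name__)
        return ROWS_TAG, row
    except ValueError as exc:
        # Also catches UnicodeDecodeError and json.JSONDecodeError,
        # since both are subclasses of ValueError.
        return ERRORS_TAG, {
            'payload': message.decode('utf-8', errors='replace'),
            'error': repr(exc),
            'failed_at': datetime.now(timezone.utc).isoformat(),
        }
```

In the pipeline, this logic would sit in a DoFn whose process method yields the row for the main output and beam.pvalue.TaggedOutput('errors', error_row) for failures. Applying it with beam.ParDo(...).with_outputs('errors', main='rows') gives two PCollections; the errors one can then go to a second beam.io.WriteToBigQuery with schema=ERROR_TABLE_SCHEMA.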


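What problem are you running into? It looks like you just took an example from the internet and posted a question. Google has good Dataflow tutorials; work through a few examples to understand the concepts.

I need to handle errors that occur while parsing or writing to BQ, and load the error together with the actual message into a new table in BQ. This is not an example from the internet.

Beam doesn't support this without a lot of work copying/modifying the Beam source code yourself. Are you in a streaming or a batch pipeline? What is causing the write errors in BigQuery?

Streaming data. My Dataflow job works fine and loads the BQ table; I want to handle the case where the source data is bad, acknowledge that message, and load it into an error table.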
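Rows that parse as valid JSON can still be rejected by BigQuery itself (for example, a wrong type or an unknown column). With streaming inserts, Beam's Python WriteToBigQuery returns rejected rows as an extra output (result[BigQueryWriteFn.FAILED_ROWS] in older releases, failed_rows or failed_rows_with_errors on the returned result in newer ones), and insert_retry_strategy=RetryStrategy.RETRY_NEVER stops it from retrying them; check these names against your Beam version. The sketch below assumes each failed element is a (destination, row) tuple, optionally followed by the insert errors, and converts it into a row for an error table. The function name and the columns payload, error and failed_at are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Sequence


def failed_insert_to_error_row(failed: Sequence[Any]) -> Dict[str, Any]:
    """Convert one rejected BigQuery insert into a row for an error table.

    Assumes `failed` is (destination, row) or (destination, row, errors);
    verify the element shape for your Beam version.
    """
    destination, row, *rest = failed
    # Keep the attached insert errors if present; otherwise use a generic note.
    reason = json.dumps(rest[0], default=str) if rest else 'rejected by BigQuery insert'
    return {
        # Store the original row as JSON text so any row shape fits one column.
        'payload': json.dumps(row, sort_keys=True, default=str),
        'error': '%s: %s' % (destination, reason),
        'failed_at': datetime.now(timezone.utc).isoformat(),
    }
```

In a pipeline this would be applied as beam.Map(failed_insert_to_error_row) on the failed-rows PCollection, followed by a WriteToBigQuery into the error table.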