Python 2.7: Writing Pub/Sub messages to BigQuery with Dataflow (python-2.7, google-cloud-platform, google-bigquery, google-cloud-dataflow, apache-beam-io)


I'm getting the error below when inserting messages from PubSubIO into BigQuery.

How do I insert records from Pub/Sub into BQ? Can we convert the PCollection into a list, or is there an alternative?

AttributeError: 'PCollection' object has no attribute 'split'

Here is my code:

def create_record(columns):
    # record_ids is the PCollection defined below -- calling .split on it
    # is what raises the AttributeError
    col_value = record_ids.split('|')
    col_name = columns.split(",")
    schema_dict = {}
    for i in range(len(col_name)):
        schema_dict[col_name[i]] = col_value[i]
    return schema_dict

schema = 'tungsten_opcode:STRING,tungsten_seqno:INTEGER'
columns = "tungsten_opcode,tungsten_seqno"
lines = (p | 'Read PubSub' >> beam.io.ReadStringsFromPubSub(INPUT_TOPIC)
           | beam.WindowInto(window.FixedWindows(15)))
record_ids = lines | 'Split' >> (beam.FlatMap(split_fn).with_output_types(unicode))
records = record_ids | 'CreateRecords' >> beam.Map(create_record(columns))
records | 'BqInsert' >> beam.io.WriteToBigQuery(
    OUTPUT,
    schema=schema,
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)

This needs to be done as a transform; you cannot access the data inside a PCollection directly.

Write a DoFn class that performs the split on each record, takes the schema as a side input, and builds a dict from the column names and values, e.g.

class CreateRecord(beam.DoFn):
  def process(self, element, schema):
    # split the element into values and pair them with the field
    # names parsed from the "name:TYPE" pairs in the schema string
    cols = element.split(',')
    header = map(lambda x: x.split(':')[0], schema.split(','))
    return [dict(zip(header, cols))]
Apply the transform like this:

schema = 'tungsten_opcode:STRING,tungsten_seqno:INTEGER'
records = record_ids | 'CreateRecords' >> beam.ParDo(CreateRecord(), schema)
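
The per-element logic of the DoFn can be checked outside of a Beam pipeline. Below is a minimal sketch of that logic as a plain function; the sample element 'I,42' is hypothetical and only illustrates the comma-delimited format the answer's code expects:

```python
# Standalone version of the DoFn's per-element logic, for illustration only.
def create_record(element, schema):
    cols = element.split(',')                              # raw field values
    header = [f.split(':')[0] for f in schema.split(',')]  # names from "name:TYPE" pairs
    return dict(zip(header, cols))

schema = 'tungsten_opcode:STRING,tungsten_seqno:INTEGER'
print(create_record('I,42', schema))
# {'tungsten_opcode': 'I', 'tungsten_seqno': '42'}
```

Each returned dict maps BigQuery column names to string values, which is the row format WriteToBigQuery accepts.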

Comment: Could you please format the code properly? Is there another way to load the Pub/Sub messages into BQ using Dataflow? Any help is much appreciated.