Google Cloud Platform: how to run an Apache Beam write to BigQuery based on a condition

Tags: google-cloud-platform, google-bigquery, google-cloud-dataflow, apache-beam

I am trying to read values from Google Pub/Sub and Google Cloud Storage and insert them into BigQuery based on a count condition: if a value already exists, it should not be inserted; otherwise it can be inserted.

My code looks like this:

import logging

import apache_beam as beam
from apache_beam.pvalue import AsList

p_bq = beam.Pipeline(options=pipeline_options1)

logging.info('Start')

# Create builds a PCollection from what we read from Cloud Storage.
test = p_bq | beam.Create(data)

# The pipeline then reads from Pub/Sub and combines the Pub/Sub
# messages with the Cloud Storage data, passed in as a side input.
BQ_data1 = (p_bq
            | 'readFromPubSub' >> beam.io.ReadFromPubSub('mytopic')
            | beam.Map(parse_pubsub, param=AsList(test)))
Here, data holds the values from Google Cloud Storage, while the data read from Pub/Sub comes from Google Analytics. parse_pubsub returns two values: one is a dictionary and the other is a count (indicating whether the value already exists in the table).
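For reference, a minimal sketch of what parse_pubsub might look like; the real function is not shown in the question, so the payload format and the 'id' field are assumptions:

import json

def parse_pubsub(message, param):
    # Hypothetical sketch: decode a Pub/Sub message (bytes by default) and
    # check its 'id' against the side-input list from Cloud Storage.
    record = json.loads(message.decode('utf-8'))
    count = 1 if record.get('id') in param else 0  # 1 = already in the table
    return record, count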

Since the values are in a PCollection, how can I apply a condition to the BigQuery insert?


New edit:

from apache_beam import pvalue

class Process(beam.DoFn):

    def process(self, element, trans):
        if element['id'] in trans:
            # Emit elements whose id is already in the list to the 'present' output.
            yield pvalue.TaggedOutput('present', element)
        else:
            # Emit elements whose id is not in the list to the 'absent' output.
            yield pvalue.TaggedOutput('absent', element)

test1 = p_bq | "TransProcess" >> beam.Create(trans)
where trans is the list.

BQ_data2 = (BQ_data1
            | beam.ParDo(Process(), trans=AsList(test1))
              .with_outputs('present', 'absent'))
present_value = BQ_data2.present
absent_value = BQ_data2.absent
Thanks in advance.

You can use

beam.Filter(lambda_function)

after the beam.Map step to filter out the elements for which the lambda_function returns False.
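For instance, assuming parse_pubsub emits (record, count) tuples where count marks records that already exist, a minimal sketch of that filter step might be:

# Keep only the records not yet in the table (count == 0),
# then drop the count before the BigQuery write.
BQ_new = (BQ_data1
          | 'filterAbsent' >> beam.Filter(lambda rec_count: rec_count[1] == 0)
          | 'dropCount' >> beam.Map(lambda rec_count: rec_count[0]))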

You can split the PCollection in a ParDo function based on your condition.

Don't forget to provide output tags for the ParDo function with .with_outputs().

When writing elements of the PCollection to a specific output, use pvalue.TaggedOutput().

Then select the PCollection you need and write it to BigQuery.
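Combined with the Process DoFn and tagged outputs from the question's edit, a sketch of selecting the 'absent' output and writing it to BigQuery could look like this; the table spec and schema are placeholders, not from the question:

# Only the 'absent' elements (not yet in the table) are inserted.
(BQ_data2.absent
 | 'writeToBQ' >> beam.io.WriteToBigQuery(
       'my-project:my_dataset.my_table',  # placeholder table spec
       schema='id:STRING,value:STRING',   # placeholder schema
       create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
       write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))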

Hi @vdolez, I am trying the tagged outputs. But even then I get UnpicklingError: invalid load key ''. I am not even using pickle in the code. I just want to use details – SriHari.M

@SriHari.M Maybe the unpickling error is related to how you read the input. I am not a Python expert, but it looks like a race condition issue to me... – vdolez