How to pass dynamic parameters in a Google Cloud Dataflow pipeline

google-cloud-platform google-cloud-dataflow

I have written code that ingests a CSV file from GCS into BigQuery, with the project ID, dataset, table name, and GCS temp and staging locations hardcoded.

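I am looking for code that reads the project ID, dataset, table name, and GCS temp and staging location parameters from a BigQuery table (dynamic parameters).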

Code:

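import com.google.api.services.bigquery.model.TableReference;
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class DemoPipeline {

    // Destination table; project, dataset and table are hardcoded here.
    public static TableReference getGCDSTableReference() {
        TableReference ref = new TableReference();
        ref.setProjectId("myprojectbq");
        ref.setDatasetId("DS_Emp");
        ref.setTableId("emp");
        return ref;
    }

    // Converts one CSV line into a TableRow. Assumes every line has at
    // least two comma-separated fields and that the file has no header row.
    static class TransformToTable extends DoFn<String, TableRow> {
        @ProcessElement
        public void processElement(ProcessContext c) {
            String input = c.element();
            String[] s = input.split(",");
            TableRow row = new TableRow();
            row.set("id", s[0]);
            row.set("name", s[1]);
            c.output(row);
        }
    }

    // Placeholder for custom options; currently declares none.
    public interface MyOptions extends PipelineOptions {
        /*
         * Param
         *
         */
    }

    public static void main(String[] args) {
        MyOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(MyOptions.class);
        // Hardcoded GCS temp location.
        options.setTempLocation("gs://demo-xxxxxx/temp");
        Pipeline p = Pipeline.create(options);

        // Hardcoded source file.
        PCollection<String> lines = p.apply("Read From Storage", TextIO.read().from("gs://demo-xxxxxx/student.csv"));

        PCollection<TableRow> rows = lines.apply("Transform To Table", ParDo.of(new TransformToTable()));

        // Append to an existing table; CREATE_NEVER means no schema is required.
        rows.apply("Write To Table", BigQueryIO.writeTableRows().to(getGCDSTableReference())
                //.withSchema(BQTableSemantics.getGCDSTableSchema())
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER));

        p.run();
    }
}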

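Even if you read the other values (project ID, dataset, table names) from an initial table, the location of that table still has to be hardcoded somewhere. A properties file, as Haris recommended, is a good approach. Consider the following suggestions: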

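A properties file. Use one when parameters have to be changed or tuned, which generally means changes that do not require recompiling. The file has to live alongside, or be attached to, your Java classes. Reading it from GCS is feasible but an odd choice.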

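Pipeline execution parameters. Custom parameters can solve your problem; check the documentation on custom pipeline options to see how to implement them.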
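In Beam, custom options are declared as getter/setter pairs on an interface that extends `PipelineOptions` (such as `MyOptions` in the question), and `PipelineOptionsFactory.fromArgs(args)` fills them from `--name=value` arguments, so a `getBqProject()`/`setBqProject(String)` pair would be set with `--bqProject=...`. To stay runnable without the Beam SDK on the classpath, the sketch below illustrates only that argument convention in plain Java. The class `LaunchArgs` and the option names `bqProject`, `bqDataset` and `bqTable` are hypothetical and not part of Beam.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the --name=value convention; not part of Beam. */
final class LaunchArgs {
    private final Map<String, String> values = new LinkedHashMap<>();

    /** Parses arguments such as --bqProject=myprojectbq into name/value pairs. */
    static LaunchArgs parse(String[] args) {
        LaunchArgs parsed = new LaunchArgs();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            // Reject anything that is not --name=value with a non-empty name.
            if (!arg.startsWith("--") || eq < 3) {
                throw new IllegalArgumentException("Expected --name=value but got: " + arg);
            }
            parsed.values.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        return parsed;
    }

    /** Returns the value of a required option or fails with a clear message. */
    String require(String name) {
        String value = values.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("Missing required option: --" + name);
        }
        return value;
    }

    /** Builds a BigQuery table spec of the form project:dataset.table. */
    String tableSpec() {
        return require("bqProject") + ":" + require("bqDataset") + "." + require("bqTable");
    }
}
```

Launched with `--bqProject=myprojectbq --bqDataset=DS_Emp --bqTable=emp`, `tableSpec()` yields `myprojectbq:DS_Emp.emp`, the string form that `BigQueryIO.writeTableRows().to(String)` accepts in place of a hardcoded `TableReference`.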

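I don't quite understand the question. Do you want to use BigQuery as a source and load from a specific table and/or dataset based on elements processed from another source? Or use it as a sink and write to a specific table and/or dataset based on elements processed from another source?

Thanks for the reply, Alex. My requirement is to load a CSV file from GCS into BigQuery without hardcoding the project ID/dataset/table names in the Java code. I want to read these parameters from external storage or from dynamic parameters (templates). Please advise.

@Kannan just use a config file

@HarisNadeem, it would be appreciated if you could provide some examples and show how to read a config file from GCS. My requirement is to read the source CSV file from GCS, compare it with a config CSV file in GCS (where I will maintain the column names), and then load it into BigQuery. Thanks in advance.

You can find an example of a config file here. You can then package the config file with your job.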
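Packaging a config file with the job could look roughly like the sketch below, which uses only `java.util.Properties` from the JDK. The class `PipelineConfig` and the keys `bq.project`, `bq.dataset`, `bq.table` and `gcs.tempLocation` are hypothetical names chosen for illustration; nothing in the thread prescribes them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/** Hypothetical holder for the settings the question hardcodes. */
final class PipelineConfig {
    final String projectId;
    final String datasetId;
    final String tableId;
    final String tempLocation;

    private PipelineConfig(Properties props) {
        this.projectId = require(props, "bq.project");
        this.datasetId = require(props, "bq.dataset");
        this.tableId = require(props, "bq.table");
        this.tempLocation = require(props, "gcs.tempLocation");
    }

    /** Fails fast when a key is absent or blank instead of passing null on. */
    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required property: " + key);
        }
        return value.trim();
    }

    /** Reads key=value pairs from any Reader. */
    static PipelineConfig load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        return new PipelineConfig(props);
    }

    /** Parses properties text held in memory; convenient for tests. */
    static PipelineConfig parse(String text) {
        try (Reader reader = new StringReader(text)) {
            return load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads a file packaged with the job, for example under src/main/resources. */
    static PipelineConfig loadFromClasspath(String resourceName) throws IOException {
        InputStream in = PipelineConfig.class.getClassLoader().getResourceAsStream(resourceName);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + resourceName);
        }
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            return load(reader);
        }
    }
}
```

With a file such as `pipeline.properties` (a hypothetical name) on the classpath, `PipelineConfig.loadFromClasspath("pipeline.properties")` would supply the values passed to `setProjectId`, `setDatasetId`, `setTableId` and `setTempLocation` in `DemoPipeline`, so changing them no longer requires recompiling.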