Maven: invoking a Dataflow job from a jar file fails with "PipelineOptions missing a property named 'gcpTempLocation'" (maven, google-cloud-dataflow, gcloud, apache-beam)

We are trying to run a Dataflow job from an
executable jar file.

Process followed:
  • Created the application with SDK 2.2.0, following the documentation
  • Built the jar file with the Maven command
    mvn package
  • Executed the jar file with this command
    java -jar DataFlow-jobs-0.1.jar --tempLocation=gs://events-DataFlow/tmp --gcpTempLocation=gs://events-DataFlow/tmp --project=google-project-id --runner=DataflowRunner --BQQuery='select t1.user_id from google-project-id.deve.user_info t1'

Output

Code

pom.xml

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <appendAssemblyId>false</appendAssemblyId>
    <archive>
      <manifest>
        <mainClass>org.customerlabs.beam.WriteFromBQtoES</mainClass>
      </manifest>
    </archive>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <id>make-executable-jar</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
WriteFromBQtoES.java
public class WriteFromBQtoES {
    private static DateTimeFormatter fmt =
        DateTimeFormat.forPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
    private static final Logger LOG = LoggerFactory.getLogger(WriteFromBQtoES.class);
    private static final ObjectMapper mapper = new ObjectMapper();

    public interface Options extends PipelineOptions {
        @Description("Bigquery query to fetch data")
        @Required
        String getBQQuery();
        void setBQQuery(String value);
    }

    public static void main(String[] args) throws IOException{
        PipelineOptionsFactory.register(Options.class);
        Options options = PipelineOptionsFactory.fromArgs(args).withValidation().create().as(Options.class);

        Pipeline p = Pipeline.create(options);
        PCollection<TableRow> tableRows = p.apply(BigQueryIO.read().fromQuery(options.getBQQuery()).usingStandardSql());

        tableRows.apply("WriteToCSV", ParDo.of(new DoFn<TableRow, String>() {
            // process WriteToCSV
        }));
        p.run();
    }
}

public static void main(String[] args) throws IOException{
   PipelineOptionsFactory.register(Options.class);
   Options options = PipelineOptionsFactory.fromArgs(args).withValidation().create().as(Options.class);
   String query = options.getBQQuery();
   Pipeline p = Pipeline.create(options);
   .....
   ..... pipeline operations.....
   .....
}

I am not sure what we are missing; we get this error even though we pass the gcpTempLocation argument on the command line. Please help us figure out this issue. Thanks in advance.

I think that instead of plain PipelineOptions you need:

public interface Options extends DataflowPipelineOptions { ... }

gcpTempLocation is defined in GcpOptions, which DataflowPipelineOptions extends.
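Applied to the code in the question, this means the custom interface should extend DataflowPipelineOptions rather than PipelineOptions, so that gcpTempLocation and the other GCP options are registered. A minimal sketch, keeping the BQQuery option from the question (assumes the Beam Dataflow runner module is on the classpath):

```java
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.Validation;

// Extending DataflowPipelineOptions pulls in GcpOptions,
// which is where --gcpTempLocation is defined and parsed.
public interface Options extends DataflowPipelineOptions {
    @Description("Bigquery query to fetch data")
    @Validation.Required
    String getBQQuery();
    void setBQQuery(String value);
}
```

The rest of the main method stays unchanged; PipelineOptionsFactory.fromArgs(args) will now recognise --gcpTempLocation instead of rejecting it as unknown.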

I ran into the same problem, except that I used the maven-shade-plugin to create an uber jar containing all the dependencies the application needs. Executing the jar file with the parameters Apache Beam requires produced the same error, where --gcpTempLocation was not found. Adding the following block to pom.xml will let you package the uber jar with maven-shade and fixes the missing-parameter problem:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>${maven-shade-plugin.version}</version>
  <executions>
    <!-- Run shade goal on package phase -->
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Required to ensure Beam Pipeline options can be passed properly. Without this, pipeline options will not be recognised -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"></transformer>
          <!-- add Main-Class to manifest file -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>NAME-OF-YOUR-MAIN-CLASS</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
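With that block in place, packaging and running the uber jar might look like the following; the jar name, project id, and GCS paths are taken from the question and are placeholders for your own values:

```shell
# build the shaded (uber) jar; the shade goal runs in the package phase
mvn clean package

# run it directly; --gcpTempLocation is now recognised because the
# ServicesResourceTransformer merged Beam's META-INF/services entries
java -jar target/DataFlow-jobs-0.1.jar \
  --runner=DataflowRunner \
  --project=google-project-id \
  --gcpTempLocation=gs://events-DataFlow/tmp \
  --BQQuery='select t1.user_id from google-project-id.deve.user_info t1'
```

Without the ServicesResourceTransformer, shading overwrites Beam's service registration files and PipelineOptionsFactory cannot discover the registered options, which produces the same "missing property" error.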

Hi @Slava, we replaced PipelineOptions with DataflowPipelineOptions and got a new error: Exception in thread "main" java.lang.IllegalArgumentException: Unknown 'runner' specified 'DataflowRunner', supported pipeline runners [DirectRunner].
We need DataflowRunner; how do we get it? I am new to the Java platform.

You are probably not building the jar with the right dependencies to include DataflowRunner. Look at the example mvn command in the quickstart guide: mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="--runner=DataflowRunner --project=[your-project] --gcpTempLocation=gs://[your-bucket]/tmp --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://[your-bucket]/counts" -Pdataflow-runner. Specifically, I think you need to pass the "-Pdataflow-runner" profile to mvn.

We have tried this command and the Dataflow job launches successfully. But the only problem is that we need a jar file to run it and pass arguments.
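The -Pdataflow-runner profile in the quickstart's pom essentially adds the Dataflow runner as a runtime dependency. If you always run from a jar, a reasonable approach is to declare that dependency unconditionally in pom.xml instead of behind a profile; a sketch, with the version matched to the SDK 2.2.0 mentioned in the question:

```xml
<!-- makes DataflowRunner available at runtime; without it,
     only DirectRunner is found on the classpath -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
  <version>2.2.0</version>
  <scope>runtime</scope>
</dependency>
```

Once this dependency is packaged into the uber jar, --runner=DataflowRunner resolves at startup and the "Unknown 'runner'" error goes away.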
java -jar target/[your-jar-name].jar \
--runner=org.apache.beam.runners.dataflow.DataflowRunner \
--tempLocation=[GCS temp folder path] \
--stagingLocation=[GCS staging folder path]