Why doesn't CustomOptions in Apache Beam inherit the default properties of DataflowPipelineOptions?

Tags: google-cloud-dataflow, apache-beam, dataflow, apache-beam-io


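I'm new to Apache Beam and am trying to run a sample read-and-write program with both DirectRunner and DataflowRunner. My use case takes a few CLI arguments, so I created an interface, CustomOptions.java, that extends PipelineOptions.

With DirectRunner the program runs fine, but with DataflowRunner it fails with an error saying that interface CustomOptions is missing a property named "project".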

pom.xml

<dependencies>
    <dependency>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>3.2.0</version>
        <type>maven-plugin</type>
    </dependency>

    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-core</artifactId>
        <version>2.16.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
        <version>2.16.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-runners-direct-java</artifactId>
        <version>2.16.0</version>
    </dependency>

</dependencies>
WordCount.java

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class WordCount {

    public static void main(String args[]) {
        PipelineOptionsFactory.register(CustomOptions.class);
        CustomOptions options = PipelineOptionsFactory.fromArgs(args).as(CustomOptions.class);
        Pipeline p = Pipeline.create(options);

        p.apply("Read", TextIO.read().from(options.getInput()))
                .apply("Write", TextIO.write().to(options.getOutput()));

        p.run();
    }
}
Commands:

DirectRunner (Working) : java -cp jarPath WordCount --input=inputPath --output=outputPath
DataflowRunner (Not Working) : java -cp jarPath WordCount --input=inputPath --output=outputPath --runner=DataflowRunner --stagingLocation=gs://<tmp_path> --project=<projectId>
The second thing I tried was having CustomOptions extend DataflowPipelineOptions instead of PipelineOptions. With that change I get this error:

Exception in thread "main" java.lang.IllegalArgumentException: No filesystem found for scheme gs
    at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:463)
    at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:533)
    at org.apache.beam.sdk.io.FileBasedSink.convertToFileResourceIfPossible(FileBasedSink.java:215)
    at org.apache.beam.sdk.io.TextIO$TypedWrite.to(TextIO.java:734)
    at org.apache.beam.sdk.io.TextIO$Write.to(TextIO.java:1069)
    at WordCount.main(WordCount.java:15)

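This second attempt also raises a problem: the same code cannot be run with both DirectRunner and DataflowRunner, because in the second case "projectId" is a mandatory argument, and it would not be specified for DirectRunner.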

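After some trial and error, I think I got it working. I'm using the same Java classes as in the question, that is, CustomOptions.java extends PipelineOptions. The only change I made is in pom.xml.

I now use the maven-shade-plugin, with little extra configuration, instead of the maven-assembly-plugin. With this I achieved the following:
1. The same jar can be used with DirectRunner or DataflowRunner.
2. I can specify the main class to execute from the command line.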
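A likely reason the shade plugin helps is its ServicesResourceTransformer, which merges files under META-INF/services instead of keeping a single copy. Beam looks up file systems (and registered options interfaces) through java.util.ServiceLoader, so if a fat-jar build keeps only one copy of a service file, providers such as the one for the gs scheme can go missing. The sketch below is a simplified model of that difference in plain Java, not Beam or Maven code; the helper names and the sample jar contents are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ServiceFileMergeSketch {

    // Service file that java.util.ServiceLoader would read for Beam's FileSystemRegistrar.
    static final String SERVICE_FILE =
            "META-INF/services/org.apache.beam.sdk.io.FileSystemRegistrar";

    /**
     * Keeps a single copy per path. In this sketch the last jar wins;
     * which copy survives in a real build depends on the packaging tool.
     */
    static Map<String, List<String>> keepOneCopy(List<Map<String, List<String>>> jars) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map<String, List<String>> jar : jars) {
            out.putAll(jar);
        }
        return out;
    }

    /** Concatenates the provider lines from every jar, dropping duplicates. */
    static Map<String, List<String>> mergeServiceFiles(List<Map<String, List<String>>> jars) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (Map<String, List<String>> jar : jars) {
            for (Map.Entry<String, List<String>> e : jar.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new LinkedHashSet<>())
                      .addAll(e.getValue());
            }
        }
        Map<String, List<String>> out = new LinkedHashMap<>();
        merged.forEach((path, providers) -> out.put(path, new ArrayList<>(providers)));
        return out;
    }

    /** Hypothetical sample: two dependency jars that both ship the same service file. */
    static List<Map<String, List<String>>> sampleJars() {
        Map<String, List<String>> gcpJar = Map.of(SERVICE_FILE,
                List.of("org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar"));
        Map<String, List<String>> coreJar = Map.of(SERVICE_FILE,
                List.of("org.apache.beam.sdk.io.LocalFileSystemRegistrar"));
        return List.of(gcpJar, coreJar);
    }

    public static void main(String[] args) {
        System.out.println("one copy: " + keepOneCopy(sampleJars()).get(SERVICE_FILE));
        System.out.println("merged:   " + mergeServiceFiles(sampleJars()).get(SERVICE_FILE));
    }
}
```

In this model the single-copy result lists only the local file system provider, while the merged result lists both providers, which matches the behavior described above: the shaded jar works with gs:// paths on either runner.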

Previous pom.xml:

<build>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.2.0</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id> <!-- this is used for inheritance merges -->
                    <phase>package</phase> <!-- bind to the packaging phase -->
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>

        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.0</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                            <!-- add Main-Class to manifest file -->
                            <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>com.dh.WordCount</mainClass>
                            </transformer>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>

    </plugins>
</build>

<dependencies>
    <dependency>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>3.2.0</version>
        <type>maven-plugin</type>
    </dependency>

    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-core</artifactId>
        <version>2.16.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
        <version>2.16.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-runners-direct-java</artifactId>
        <version>2.16.0</version>
    </dependency>

</dependencies>

Current pom.xml:

<build>
    <plugins>

        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.0</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"/>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>

    </plugins>
</build>

<dependencies>

    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-core</artifactId>
        <version>2.16.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
        <version>2.16.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-runners-direct-java</artifactId>
        <version>2.16.0</version>
    </dependency>

</dependencies>


Comments:

In the first case, can you remove --project=?

Just to clarify: when you saw the "No filesystem found for scheme gs" error, were you extending DataflowPipelineOptions and running on DataflowRunner? If you are extending DataflowPipelineOptions, I would not expect that error. Could you clarify (1) which of the two command lines you used, and (2) which options class you were extending when you saw the error?

I'm not sure whether you can use DataflowPipelineOptions with DirectRunner. If it requires you to pass arguments such as --project under DirectRunner, it might work if you pass an unused placeholder value. That said, I think the --project argument is used by sources and sinks if they read or write data to GCP services, in which case you need to specify a valid value. If that fails, you could have two main programs that swap the options class, one for DataflowRunner and one for DirectRunner.

@JayadeepJayaraman If I remove --project=, it throws another exception for the key --stagingLocation=. It states that CustomOptions.java has no key "stagingLocation".

@AlexAmato Yes, you are right: the "No filesystem found for scheme gs" error appears when I extend DataflowPipelineOptions. I use both commands, one for DirectRunner and the other for DataflowRunner.
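
The behavior described in these comments (removing --project only moves the failure to --stagingLocation) is what one would expect if every command-line flag is checked against the properties of the options interfaces that are known at parse time. The following is a simplified model of such a strict check in plain Java. It is not Beam's implementation; BaseOptions, CloudOptions, and the helper methods are hypothetical stand-ins, and the order in which unknown flags are reported is a detail of this sketch.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashSet;
import java.util.Set;

public class StrictFlagCheckSketch {

    /** Hypothetical stand-ins for options interfaces; these are not Beam's real types. */
    interface BaseOptions { String getRunner(); }
    interface CustomOptions extends BaseOptions { String getInput(); String getOutput(); }
    interface CloudOptions extends BaseOptions { String getProject(); String getStagingLocation(); }

    /** Collects property names (getProject -> "project") from the getters of the given interfaces. */
    static Set<String> knownProperties(Class<?>... interfaces) {
        Set<String> names = new LinkedHashSet<>();
        for (Class<?> iface : interfaces) {
            for (Method m : iface.getMethods()) {
                String n = m.getName();
                if (n.startsWith("get") && n.length() > 3 && m.getParameterCount() == 0) {
                    names.add(Character.toLowerCase(n.charAt(3)) + n.substring(4));
                }
            }
        }
        return names;
    }

    /** Returns the first --flag whose name matches no known property, or null if all match. */
    static String firstUnknownFlag(String[] args, Set<String> known) {
        for (String arg : args) {
            if (!arg.startsWith("--")) {
                continue;
            }
            int eq = arg.indexOf('=');
            String name = eq < 0 ? arg.substring(2) : arg.substring(2, eq);
            if (!known.contains(name)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] flags = {
            "--input=in.txt", "--output=out", "--runner=DataflowRunner",
            "--project=my-project", "--stagingLocation=gs://bucket/tmp"
        };
        // Only the custom interface is known: the cloud-specific flags are rejected.
        System.out.println(firstUnknownFlag(flags, knownProperties(CustomOptions.class)));
        // Both interfaces are known: every flag matches a property.
        System.out.println(firstUnknownFlag(flags,
                knownProperties(CustomOptions.class, CloudOptions.class)));
    }
}
```

Under this model, the fix is to make the runner's options interface visible at parse time rather than to change CustomOptions, which is consistent with the answer's observation that only the packaging had to change.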