Java - Error while staging packages when launching a Dataflow job from a fat jar


I created a Maven project to run a pipeline. If I run the main class directly, the pipeline works fine. If I build a fat jar and execute that instead, I get two different errors: one when running under Windows and another when running under Linux.

Under Windows:

Exception in thread "main" java.lang.RuntimeException: Error while staging packages
    at org.apache.beam.runners.dataflow.util.PackageUtil.stageClasspathElements(PackageUtil.java:364)
    at org.apache.beam.runners.dataflow.util.PackageUtil.stageClasspathElements(PackageUtil.java:261)
    at org.apache.beam.runners.dataflow.util.GcsStager.stageFiles(GcsStager.java:66)
    at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:517)
    at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:170)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:303)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:289)
    at ....
Caused by: java.nio.file.InvalidPathException: Illegal char <:> at index 2: gs://MY_BUCKET/staging
    at sun.nio.fs.WindowsPathParser.normalize(Unknown Source)
    at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
    at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
    at sun.nio.fs.WindowsPath.parse(Unknown Source)
    at sun.nio.fs.WindowsFileSystem.getPath(Unknown Source)
    at java.nio.file.Paths.get(Unknown Source)
    at org.apache.beam.sdk.io.LocalFileSystem.matchNewResource(LocalFileSystem.java:196)
    at org.apache.beam.sdk.io.LocalFileSystem.matchNewResource(LocalFileSystem.java:78)
    at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:563)
    at org.apache.beam.runners.dataflow.util.PackageUtil$PackageAttributes.forFileToStage(PackageUtil.java:452)
    at org.apache.beam.runners.dataflow.util.PackageUtil$1.call(PackageUtil.java:147)
    at org.apache.beam.runners.dataflow.util.PackageUtil$1.call(PackageUtil.java:138)
    at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
    at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
    at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Here is my pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>xxxxxxxxxxx</groupId>
  <artifactId>xxxxxxxxx</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
        <!-- https://mvnrepository.com/artifact/com.google.cloud.dataflow/google-cloud-dataflow-java-sdk-all -->
        <dependency>
            <groupId>com.google.cloud.dataflow</groupId>
            <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
          <groupId>com.fasterxml.jackson.core</groupId>
          <artifactId>jackson-core</artifactId>
          <version>2.9.3</version>
        </dependency>

        <dependency>
          <groupId>com.fasterxml.jackson.core</groupId>
          <artifactId>jackson-databind</artifactId>
          <version>2.9.3</version>
        </dependency>
        <dependency>
            <groupId>com.google.appengine</groupId>
            <artifactId>appengine-api-1.0-sdk</artifactId>
            <version>1.9.60</version>
        </dependency>
        <dependency>
            <groupId>com.google.cloud</groupId>
            <artifactId>google-cloud-datastore</artifactId>
            <version>1.15.0</version>
        </dependency>
        <dependency>
            <groupId>javax.servlet</groupId>
            <artifactId>javax.servlet-api</artifactId>
            <version>4.0.0</version>
        </dependency>

    </dependencies>
    <build>
        <finalName>myFatJar</finalName>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.0.0</version>
                <configuration>
                    <transformers>
                        <transformer implementation= "org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                            <mainClass>com.myclass.MyClass</mainClass>
                        </transformer>
                    </transformers>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

        </plugins>
    </build>
</project>
I tried changing the temp location using gcpTempLocation, but when I do I get the following error:

java.lang.IllegalArgumentException: BigQueryIO.Write needs a GCS temp location to store temp files.
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
        at org.apache.beam.sdk.io.gcp.bigquery.BatchLoads.validate(BatchLoads.java:191)
        at org.apache.beam.sdk.Pipeline$ValidateVisitor.enterCompositeTransform(Pipeline.java:621)
        at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:651)
        at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:655)
        at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:655)
        at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:311)
        at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:245)
        at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:446)
        at org.apache.beam.sdk.Pipeline.validate(Pipeline.java:563)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:302)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:289)
        at ...
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
        at java.lang.Thread.run(Thread.java:748)
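
For reference, here is a minimal sketch of how these locations are typically set on Beam's standard DataflowPipelineOptions (the class name, project id, and bucket below are placeholders, not values from the question). The values themselves look plausible here; the stack traces suggest the shaded jar simply cannot resolve gs:// paths at all:

import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class Launcher {
    public static void main(String[] args) {
        DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
                .withValidation()
                .as(DataflowPipelineOptions.class);
        options.setRunner(DataflowRunner.class);
        options.setProject("my-gcp-project");                 // placeholder project id
        options.setStagingLocation("gs://MY_BUCKET/staging"); // where classpath jars are staged
        options.setTempLocation("gs://MY_BUCKET/temp");       // BigQueryIO batch loads write temp files here
        Pipeline pipeline = Pipeline.create(options);
        // ... apply transforms here ...
        pipeline.run();
    }
}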

What should I do?

This comment solved my problem:

Have you tried explicitly adding the Apache Beam artifacts for the DataflowRunner to your pom.xml? – Andrew
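
In practice that means replacing the single google-cloud-dataflow-java-sdk-all dependency with the individual Apache Beam artifacts. A sketch of what that could look like on the same 2.2.0 release train (the exact artifact set depends on which runners and IOs the pipeline uses):

<!-- Sketch: explicit Beam artifacts instead of google-cloud-dataflow-java-sdk-all -->
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
    <version>2.2.0</version>
</dependency>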


Adding a second answer here that addresses this more broadly. I have a hunch that S.M.'s approach of pulling the dependencies up to the top level of the pom file above happens to work around the problem of not combining the shade plugin's ServicesResourceTransformer with the ManifestResourceTransformer.

However, without seeing S.M.'s final pom file, I can't be sure.

In any case, here is the shade build plugin configuration that worked for me:

<build>
    <pluginManagement>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.0.0</version>
                <executions>
                    <execution>
                        <id>generate-runner</id>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <finalName>${project.artifactId}${runner.suffix}</finalName>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/LICENSE</exclude>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>${runner.class}</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </pluginManagement>
</build>
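
The ServicesResourceTransformer is the key piece: Beam discovers its FileSystem implementations, including the gs:// handler, through ServiceLoader entries under META-INF/services, and each SDK jar ships its own org.apache.beam.sdk.io.FileSystemRegistrar service file. Without this transformer, shading keeps only one of those files, so gs:// paths fall through to LocalFileSystem, which is exactly what both stack traces above show.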


  • I got it working on both Windows and Linux


  • Based on the stack trace from the Linux environment, it looks like you are not providing a valid GCS path: '/home/USER/gs:/MY_BUCKET/temp/staging/'. Somehow the application is resolving the path against the root of the user's home directory. Do you have any indication of why that happens? Have you tried explicitly adding the Apache Beam artifacts for the DataflowRunner to your pom.xml? Thanks! Replacing google-cloud-dataflow-java-sdk-all with all of its Apache Beam dependencies solved my problem. @S.M. Which version of the google-cloud-dataflow-java-sdk-all dependency did you include? Version 2.2.0? @Max Yes, as you can see from my pom.xml I previously included version 2.2.0, but I solved my problem with beam-runners-google-cloud-dataflow-java and the other Apache Beam dependencies. Could you post the resulting/complete pom here?
    <dependency>
        <groupId>com.google.cloud.dataflow</groupId>
        <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
        <version>2.5.0</version>
    </dependency>