Scala ApacheFlink:如何将数据流导入Google云存储文件系统
我正试图在Google云存储文件系统中将一些数据流写入一个文件,如下所示(使用Flink 1.8和Scala 2.11): 但我面临以下错误:Scala ApacheFlink:如何将数据流导入Google云存储文件系统,scala,google-cloud-platform,google-cloud-storage,apache-flink,Scala,Google Cloud Platform,Google Cloud Storage,Apache Flink,我正试图在Google云存储文件系统中将一些数据流写入一个文件,如下所示(使用Flink 1.8和Scala 2.11): 但我面临以下错误: Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed. at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(J
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:638)
at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:123)
at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:654)
Caused by: java.lang.RuntimeException: Error while creating FileSystem when initializing the state of the BucketingSink.
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:379)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:278)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'gs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:403)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.createHadoopFileSystem(BucketingSink.java:1227)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initFileSystem(BucketingSink.java:432)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:376)
... 8 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Cannot support file system for 'gs' via Hadoop, because Hadoop is not in the classpath, or some classes are missing from the classpath.
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:179)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:399)
... 11 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/HdfsConfiguration
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:85)
... 12 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.HdfsConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 13 more
我看到了一些关于这方面的问题,我也看到了:
环境变量:
-FLINK_CONF_DIR
文件flink-conf.yaml:
-fs.hdfs.hadoopconf:src/main/resources/core-site.xml
core-site.xml:
> <property>
> <name>fs.gs.impl</name>
> <value>com.google.cloud.hadoop.fs.gcs.
> GoogleHadoopFileSystem</value>
> <description>The FileSystem for gs: (GCS) uris.</description>
> </property>
> <property>
> <name>fs.AbstractFileSystem.gs.impl</name>
> <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
> <description>The AbstractFileSystem for gs: (GCS)
> uris.</description>
>
>fs.gs.impl
>com.google.cloud.hadoop.fs.gcs。
>GoogleHadoop文件系统
>gs:(GCS)URI的文件系统。
>
>
>fs.AbstractFileSystem.gs.impl
>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS
>gs的抽象文件系统:(GCS)
>URI。
以下是我的pom依赖项:
<dependencies>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-scala -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-scala_2.11</artifactId>
<version>1.8.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-scala -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_2.11</artifactId>
<version>1.8.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/commons-lang/commons-lang -->
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.6</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.google.cloud.bigdataoss/gcs-connector -->
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>gcs-connector</artifactId>
<version>hadoop3-1.9.16</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-hadoop-fs -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-hadoop-fs</artifactId>
<version>1.8.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-filesystem -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-filesystem_2.11</artifactId>
<version>1.8.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>3.2.0</version>
</dependency>
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-storage</artifactId>
<version>1.35.0</version>
</dependency>
</dependencies>
org.apache.flink
flink-scala_2.11
1.8.0
org.apache.flink
flink-streaming-scala_2.11
1.8.0
公地郎
公地郎
2.6
com.google.cloud.bigdataoss
地面军事系统连接器
Hadoop 3-1.9.16
org.apache.flink
flink hadoop fs
1.8.0
org.apache.hadoop
hadoop内核
1.2.1
org.apache.flink
flink-connector-U 2.11
1.8.0
org.apache.hadoop
hadoop hdfs
3.2.0
com.google.cloud
谷歌云存储
1.35.0
有什么帮助吗?根据您发布的堆栈跟踪,我发现您在尝试使用Flink和Scala写入GCS容器时遇到问题 所以问题解决的地方也有类似的问题,请检查一下 如果您还有其他问题,请不要犹豫
<dependencies>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-scala -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-scala_2.11</artifactId>
<version>1.8.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-scala -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_2.11</artifactId>
<version>1.8.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/commons-lang/commons-lang -->
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.6</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.google.cloud.bigdataoss/gcs-connector -->
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>gcs-connector</artifactId>
<version>hadoop3-1.9.16</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-hadoop-fs -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-hadoop-fs</artifactId>
<version>1.8.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-connector-filesystem -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-filesystem_2.11</artifactId>
<version>1.8.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>3.2.0</version>
</dependency>
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-storage</artifactId>
<version>1.35.0</version>
</dependency>
</dependencies>