Apache Flink: Shaded Hadoop S3 file system still requires hdfs-default and hdfs-site configuration paths


I am trying to configure S3 as my state backend with Flink 1.6.0.

flink-conf.yaml
state.backend: filesystem
state.checkpoints.dir: s3://***/flink-checkpoints
state.savepoints.dir: s3://***/flink-savepoints

s3.access-key: *******
s3.secret-key: *******

I have moved flink-s3-fs-hadoop-1.6.0.jar into the lib directory. The documentation does not mention any need for Hadoop configuration files with this particular approach. However, I am facing this error complaining about missing Hadoop configuration paths:

2018-08-24 23:25:17,829 INFO org.apache.flink.streaming.runtime.tasks.StreamTask           - State backend is set to heap memory (checkpoints to filesystem "s3://***/flink-checkpoints")
2018-08-24 23:25:17,831 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.AbstractFileSystemFactory  - Creating Hadoop file system (backed by Hadoop s3a file system)
2018-08-24 23:25:17,831 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.AbstractFileSystemFactory  - Loading Hadoop configuration for Hadoop s3a file system
2018-08-24 23:25:17,872 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.util.HadoopUtils  - Cannot find hdfs-default configuration-file path in Flink config.
2018-08-24 23:25:17,873 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.util.HadoopUtils  - Cannot find hdfs-site configuration-file path in Flink config.
2018-08-24 23:25:17,873 DEBUG org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.util.HadoopUtils  - Could not find Hadoop configuration via any of the supported methods (Flink configuration, environment variables).
2018-08-24 23:25:17,878 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Custom Source -> Map -> Sink: Print to Std. Out (1/1) (ee0eeb00ea0f01043d90f6b8d3c0cc2e) switched from RUNNING to FAILED.
javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
        at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
        at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
        at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
        at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2565)
        at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2541)
        at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2424)
        at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.set(Configuration.java:1149)
        at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.set(Configuration.java:1121)
        at org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.HadoopConfigLoader.loadHadoopConfigFromFlink(HadoopConfigLoader.java:101)
        at org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.HadoopConfigLoader.getOrLoadHadoopConfig(HadoopConfigLoader.java:80)
        at org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs.AbstractFileSystemFactory.create(AbstractFileSystemFactory.java:55)
        at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:395)
        at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:318)
        at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
        at org.apache.flink.runtime.state.filesystem.FsCheckpointStorage.<init>(FsCheckpointStorage.java:61)
        at org.apache.flink.runtime.state.filesystem.FsStateBackend.createCheckpointStorage(FsStateBackend.java:443)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:257)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
        at java.lang.Thread.run(Thread.java:748)
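For reference, the DEBUG lines above refer to lookup paths that can be supplied explicitly if Flink ever genuinely needs a Hadoop configuration. A minimal sketch, assuming the `fs.hdfs.hadoopconf` key and a `/etc/hadoop/conf` directory (both are assumptions; neither should be required for the shaded S3 filesystem according to the docs):

```yaml
# flink-conf.yaml — optionally point Flink at a Hadoop configuration directory
# (hypothetical path; adjust to where core-site.xml / hdfs-site.xml live)
fs.hdfs.hadoopconf: /etc/hadoop/conf
```

Alternatively, exporting `HADOOP_CONF_DIR` before starting the cluster covers the "environment variables" method the log mentions.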

Am I missing something? Any help is appreciated.

I had messed up my dependencies, and that is what caused this unrelated exception. I was experimenting with the Bucketing and Rolling Sink connectors, which require Hadoop dependencies. With those added in Maven's provided scope I could not run the job from IntelliJ IDEA, so I switched them to compile scope and left them that way. They ended up packaged into the artifact jar, which caused this problem.

Lesson learned: never add Hadoop dependencies in the default (compile) scope. IntelliJ IDEA has an option in the run configuration to include dependencies declared in provided scope.
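A minimal sketch of the dependency setup that avoids the issue — the artifact IDs and versions here are assumptions for a Flink 1.6.0 / Scala 2.11 build; adjust them to your own project:

```xml
<!-- Sketch: keep Hadoop classes out of the fat jar by declaring them
     "provided" so they never shade-clash with flink-s3-fs-hadoop. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-filesystem_2.11</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <!-- hypothetical Hadoop artifact; whatever Hadoop dependency you
         experiment with belongs in provided scope, not compile -->
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.8.4</version>
    <scope>provided</scope>
</dependency>
```

With provided scope the Hadoop classes stay out of the packaged job jar; to still run the job locally, enable the "Include dependencies with Provided scope" option in the IntelliJ IDEA run configuration.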