Spark Streaming throws FileNotFoundException


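Spark Streaming in cluster mode throws FileNotFoundException when the input is on the Linux file system (GFS, a file system shared by all nodes), but works fine when HDFS is the input. The data is in fact accessible at that path from all worker nodes.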

JavaPairInputDStream<Text, Text> myDStream =
    jssc.fileStream(path, Text.class, Text.class, customInputFormat.class, new Function<Path, Boolean>() {
      @Override
      public Boolean call(Path v1) throws Exception {
        return Boolean.TRUE;
      }
    }, false);

Note: the Spark shell works with this shared file system.

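How can I solve this?

My guess is that it may be a permissions issue.

Make sure that the user running the job (on the master node, or on the machine from which the job is submitted) has sufficient permissions to ssh to the worker nodes and has r/w/x access on the worker file system.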
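One way to test the permissions theory is to run a small check as the job's user on each worker node. The sketch below uses only the JDK; PathAccessCheck and describeAccess are hypothetical names, and the directory to test is passed on the command line. It reports what the current user can do with the directory, which is not necessarily what a Spark executor running as a different user would see.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reports the current user's access to a directory.
final class PathAccessCheck {
    private PathAccessCheck() {}

    // Returns "missing" if the path does not exist, otherwise a three-character
    // string such as "rwx" or "r-x" for read / write / execute access.
    static String describeAccess(Path dir) {
        if (!Files.exists(dir)) {
            return "missing";
        }
        StringBuilder flags = new StringBuilder();
        flags.append(Files.isReadable(dir) ? 'r' : '-');
        flags.append(Files.isWritable(dir) ? 'w' : '-');
        flags.append(Files.isExecutable(dir) ? 'x' : '-');
        return flags.toString();
    }

    public static void main(String[] args) {
        // Defaults to the working directory when no argument is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(dir.toAbsolutePath() + " " + describeAccess(dir));
    }
}
```

If every worker reports rwx for the input directory, permissions are unlikely to be the cause and the path resolution itself is worth checking next.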

Resolved after prefixing the directory path with file://. All nodes have r/w/x access to the path.
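A likely explanation, offered as an assumption rather than something the thread confirms: Hadoop resolves a path that has no scheme against the cluster's default file system (fs.defaultFS), which is usually HDFS in cluster mode, so the shared mount is never consulted. The sketch below shows one way to qualify such a path before handing it to fileStream. LocalPathUri and withFileScheme are hypothetical names, and /mnt/shared/input and the namenode address are placeholders.

```java
// Hypothetical helper: makes sure a local (shared-mount) path carries file://.
final class LocalPathUri {
    private LocalPathUri() {}

    // Returns the path unchanged if it already has a scheme (hdfs://, file://, ...),
    // otherwise prefixes an absolute path with file://.
    static String withFileScheme(String path) {
        if (path.matches("^[A-Za-z][A-Za-z0-9+.-]*:.*")) {
            return path; // already qualified, leave it alone
        }
        if (!path.startsWith("/")) {
            // "file://" + a relative path would turn the first segment into a host name
            throw new IllegalArgumentException("expected an absolute path: " + path);
        }
        return "file://" + path;
    }

    public static void main(String[] args) {
        System.out.println(withFileScheme("/mnt/shared/input"));         // file:///mnt/shared/input
        System.out.println(withFileScheme("hdfs://namenode:8020/data")); // hdfs://namenode:8020/data
    }
}
```

The returned string would then be passed as the path argument of the jssc.fileStream(...) call from the question, leaving the rest of that call unchanged.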