
Java: Spark checkpointing error when joining a static dataset with a DStream

Tags: java, apache-spark, spark-streaming, hadoop2

I am working on a Spark Streaming application in Java. My Spark application reads a continuous feed from Hadoop every 1 minute using textFileStream(). I need to perform a Spark aggregation (group-by) operation on the incoming DStream. After the aggregation, I join the aggregated DStream with an RDD created from a static dataset read from a Hadoop directory.

The problem appears when checkpointing is enabled. With an empty checkpoint directory, it runs fine. After running 2-3 batches I shut it down with Ctrl+C and run it again. On the second run it immediately throws the Spark exception "SPARK-5063".

Here is the code block from the Spark application:

private void compute(JavaSparkContext sc, JavaStreamingContext ssc) {

   JavaRDD<String> distFile = sc.textFile(MasterFile);      
   JavaDStream<String> file = ssc.textFileStream(inputDir);             

   // Read Master file
   JavaRDD<MasterParseLog> masterLogLines = distFile.flatMap(EXTRACT_MASTER_LOGLINES);
   final JavaPairRDD<String, String> masterRDD = masterLogLines.mapToPair(MASTER_KEY_VALUE_MAPPER);

   // Continuous Streaming file
   JavaDStream<ParseLog> logLines = file.flatMap(EXTRACT_CKT_LOGLINES);

   // calculate the sum of required field and generate group sum RDD
   JavaPairDStream<String, Summary> sumRDD = logLines.mapToPair(CKT_GRP_MAPPER);
   JavaPairDStream<String, Summary> grpSumRDD = sumRDD.reduceByKey(CKT_GRP_SUM);

   //GROUP BY Operation
   JavaPairDStream<String, Summary> grpAvgRDD = grpSumRDD.mapToPair(CKT_GRP_AVG);

   // Join master RDD with the DStream. This is the block causing the error (without it the code works fine)
   JavaPairDStream<String, Tuple2<String, String>> joinedStream = grpAvgRDD.transformToPair(

       new Function2<JavaPairRDD<String, String>, Time, JavaPairRDD<String, Tuple2<String, String>>>() {

           private static final long serialVersionUID = 1L;

           public JavaPairRDD<String, Tuple2<String, String>> call(
               JavaPairRDD<String, String> rdd, Time v2) throws Exception {
               return masterRDD.value().join(rdd);
           }
       }
   );
   joinedStream.print(10);
}

public static void main(String[] args) {

   JavaStreamingContextFactory contextFactory = new JavaStreamingContextFactory() {
        public JavaStreamingContext create() {

           // Create the context with a 60 second batch size
           SparkConf sparkConf = new SparkConf();
           final JavaSparkContext sc = new JavaSparkContext(sparkConf);
           JavaStreamingContext ssc1 = new JavaStreamingContext(sc, Durations.seconds(duration));               

           app.compute(sc, ssc1);

           ssc1.checkpoint(checkPointDir);                       
           return ssc1;
        }
   };

   JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(checkPointDir, contextFactory);

   // start the streaming server
   ssc.start();
   logger.info("Streaming server started...");

   // wait for the computations to finish
   ssc.awaitTermination();
   logger.info("Streaming server stopped...");
}
I understand that the block joining the static dataset with the DStream is what causes the error, but it was taken from the Spark Streaming programming guide on the Apache Spark website (under "Join Operations", subsection "stream-dataset joins"). Please help me get it working, even if there is a different way of doing it. I need checkpointing enabled in my streaming application.

Environment details:

  • CentOS 6.5: 2-node cluster
  • Java: 1.8
  • Spark: 1.4.1
  • Hadoop: 2.7.1
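
The usual reading of SPARK-5063 in this setup is that the anonymous Function2 captures masterRDD from the enclosing scope, so a reference to an RDD becomes part of the checkpointed DStream lineage; after a restart that reference cannot be restored and the recovered job fails. A minimal sketch of one possible workaround, assuming the master file is small enough to re-read each batch: rebuild the static pair RDD inside the transform function from the batch RDD's own context, so that no RDD defined outside the streaming operation is captured. This is an illustrative rework of the question's code, not a confirmed fix:

// Assumes the same imports as the original file, plus
// org.apache.spark.api.java.function.Function
JavaPairDStream<String, Tuple2<Summary, String>> joinedStream = grpAvgRDD.transformToPair(

    new Function<JavaPairRDD<String, Summary>, JavaPairRDD<String, Tuple2<Summary, String>>>() {

        private static final long serialVersionUID = 1L;

        public JavaPairRDD<String, Tuple2<Summary, String>> call(
            JavaPairRDD<String, Summary> rdd) throws Exception {
            // Recreate the static dataset from this batch's own context instead
            // of referencing an RDD defined outside the streaming operation
            JavaSparkContext jsc = JavaSparkContext.fromSparkContext(rdd.context());
            JavaPairRDD<String, String> master = jsc.textFile(MasterFile)
                .flatMap(EXTRACT_MASTER_LOGLINES)
                .mapToPair(MASTER_KEY_VALUE_MAPPER);
            return rdd.join(master);
        }
    }
);

Re-reading MasterFile every minute is only reasonable for small files; for a larger static dataset, a broadcast-based lookup (sketched after the comments below) avoids the per-batch re-read.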

The example in the latest documentation uses transform rather than transformToPair, have you tried that?
I have tried transform(), but it did not work for me; using transformToPair() was what was suggested.
Sounds like a similar issue:
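
Since re-reading the master file per batch can be costly, a hedged alternative sketch, assuming the master dataset fits in driver memory: replace the RDD-to-RDD join with a map-side lookup against a broadcast map. Broadcast variables are likewise not recovered from checkpoints, so the broadcast is held in a lazily instantiated singleton and rebuilt after a restart. MasterBroadcast is an illustrative name; EXTRACT_MASTER_LOGLINES, MASTER_KEY_VALUE_MAPPER and MasterFile are the names from the question:

import java.util.HashMap;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

class MasterBroadcast {

    private static volatile Broadcast<HashMap<String, String>> instance = null;

    // Rebuilds the broadcast after a checkpoint-recovery restart instead of
    // attempting to restore it from the checkpoint
    static Broadcast<HashMap<String, String>> getInstance(JavaSparkContext jsc, String path) {
        if (instance == null) {
            synchronized (MasterBroadcast.class) {
                if (instance == null) {
                    HashMap<String, String> master = new HashMap<String, String>(
                        jsc.textFile(path)
                           .flatMap(EXTRACT_MASTER_LOGLINES)
                           .mapToPair(MASTER_KEY_VALUE_MAPPER)
                           .collectAsMap());
                    instance = jsc.broadcast(master);
                }
            }
        }
        return instance;
    }
}

Inside transformToPair, the broadcast can then be obtained with MasterBroadcast.getInstance(JavaSparkContext.fromSparkContext(rdd.context()), MasterFile), and the join replaced by a mapToPair that looks each key up in the broadcast's value().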