
Kafka Spark Streaming: java.lang.InterruptedException

Tags: java, apache-spark, spark-streaming

I am new to Spark and big data processing. I am writing a Spark job in which I consume a stream from Kafka and perform several aggregations on that stream; sometimes I also have to create windowed streams from the Kafka direct stream.

At the moment I only have two aggregations to perform, but in the future I may run many different aggregations over windows with several different intervals.

The batch interval is 60 seconds.

Below is my code:

val streamingContext = new StreamingContext(sc, Seconds(60));

val rawStream = KafkaUtils.createDirectStream[String, String](
  streamingContext,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams));

// transform creates a new RDD, so I thought that if I create a new RDD here,
// each of the different aggregations will have its own RDD to perform actions on
val rViewStream = rawStream.transform((rdd) => rdd);

val rawStreamValues = rViewStream.map(x => x.value());

val windowStream = rawStreamValues.window(Minutes(10), Seconds(60));

windowStream.foreachRDD { windowRDD =>
  val windowDf = sqlContext.read.json(windowRDD);
  // perform some aggregation on this data frame and push results to redis
}

rawStream.foreachRDD { x =>
   // perform some aggregation/transformation on this stream and save to hdfs
   val stringRDD = x.map(cr => cr.value());
   val rawDf = sqlContext.read.json(stringRDD);    // <---- exception here
}
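The marked line fails inside DataFrameReader.json, which runs a schema-inference job for every batch (the InferSchema frame in the trace further down). I am not sure this is the root cause, but two mitigations I intend to try are sketched here: passing an explicit schema so no inference job runs inside foreachRDD, and stopping the streaming context gracefully so in-flight batch jobs are not interrupted. The schema field names below are placeholders, not my real data model.

import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Placeholder schema: the field names are invented for illustration only.
val recordSchema = StructType(Seq(
  StructField("eventType", StringType),
  StructField("payload", StringType)
));

rawStream.foreachRDD { x =>
  val stringRDD = x.map(cr => cr.value());
  if (!stringRDD.isEmpty()) {
    // With an explicit schema, read.json no longer has to run a
    // schema-inference job for every batch.
    val rawDf = sqlContext.read.schema(recordSchema).json(stringRDD);
    // perform some aggregation/transformation and save to hdfs
  }
}

// On shutdown, stop gracefully so running batch jobs can finish instead of
// being interrupted mid-flight.
streamingContext.stop(stopSparkContext = true, stopGracefully = true);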
  • How do I fix this error?
  • I am also looking for recommendations on how to implement the desired scenario in the best way, following best practices (a rough sketch of what I have in mind follows this list).
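For the second point, the layout I currently have in mind is roughly the following: derive every windowed stream from the same mapped value stream and cache it, so the Kafka records are not re-evaluated for each aggregation. This is only a sketch; the window and slide durations are illustrative and have to stay multiples of the 60-second batch interval, and Minutes/Seconds are the org.apache.spark.streaming durations already used above.

// Sketch only: all windows derived from one cached value stream.
rawStreamValues.cache();

val tenMinuteWindow = rawStreamValues.window(Minutes(10), Seconds(60));
val oneHourWindow = rawStreamValues.window(Minutes(60), Minutes(5));

tenMinuteWindow.foreachRDD { rdd =>
  // aggregate this window and push results to redis
}

oneHourWindow.foreachRDD { rdd =>
  // aggregate this window and save to hdfs
}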

  • At first when I read this I thought you were saying "I'm a native with Spark", but it was just a typo, with completely the opposite meaning! FYI, "naive" is an adjective. :)
    Caused by: java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1988)
        at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1089)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
        at org.apache.spark.rdd.RDD.fold(RDD.scala:1083)
        at org.apache.spark.sql.execution.datasources.json.InferSchema$.infer(InferSchema.scala:69)
        at org.apache.spark.sql.DataFrameReader$$anonfun$3.apply(DataFrameReader.scala:329)
        at org.apache.spark.sql.DataFrameReader$$anonfun$3.apply(DataFrameReader.scala:329)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:328)