Scala: Error while parsing data from a data stream to ElasticSearch
Tags: scala, elasticsearch, apache-spark, spark-streaming
I have been trying to parse a stream of data coming from a Spark stream (TCP) and send it to Elasticsearch. I am getting this error:
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [127.0.0.1:9200] returned Bad Request(400) - failed to parse; Bailing out..
The following is my code:
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.SparkContext
import org.apache.spark.serializer.KryoSerializer;
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
import org.elasticsearch.spark.rdd.EsSpark
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.spark.TaskContext
import org.elasticsearch.common.transport.InetSocketTransportAddress;
object Test {
  case class createRdd(Message: String, user: String)

  def main(args: Array[String]) {
    val mapper = new ObjectMapper()
    val SparkConf = new SparkConf().setAppName("NetworkWordCount").setMaster("local[*]")
    SparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    SparkConf.set("es.nodes", "localhost:9200")
    SparkConf.set("es.index.auto.create", "true")
    // Create a local StreamingContext with a batch interval of 10 seconds
    val ssc = new StreamingContext(SparkConf, Seconds(10))
    /* Create a DStream that will connect to hostname and port, like localhost 9999.
       As stated earlier, the DStream is created from the StreamingContext,
       which in turn is created from the SparkContext. */
    val lines = ssc.socketTextStream("localhost", 9998)
    // Using this DStream (lines) we will perform a transformation or output operation.
    val words = lines.map(_.split(" "))
    words.foreachRDD(_.saveToEs("spark/test"))
    ssc.start()            // Start the computation
    ssc.awaitTermination() // Wait for the computation to terminate
  }
}
The error is as follows:
16/10/17 11:02:30 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
16/10/17 11:02:30 INFO BlockManager: Found block input-0-1476682349200 locally
16/10/17 11:02:30 INFO Version: Elasticsearch Hadoop v5.0.0.BUILD.SNAPSHOT [4282a0194a]
16/10/17 11:02:30 INFO EsRDDWriter: Writing to [spark/test]
16/10/17 11:02:30 ERROR TaskContextImpl: Error in TaskCompletionListener
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [127.0.0.1:9200] returned Bad Request(400) - failed to parse; Bailing out..
at org.elasticsearch.hadoop.rest.RestClient.processBulkResponse(RestClient.java:250)
at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:202)
at org.elasticsearch.hadoop.rest.RestRepository.tryFlush(RestRepository.java:220)
at org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:242)
at org.elasticsearch.hadoop.rest.RestRepository.close(RestRepository.java:267)
at org.elasticsearch.hadoop.rest.RestService$PartitionWriter.close(RestService.java:120)
at org.elasticsearch.spark.rdd.EsRDDWriter$$anonfun$write$1.apply(EsRDDWriter.scala:42)
at org.elasticsearch.spark.rdd.EsRDDWriter$$anonfun$write$1.apply(EsRDDWriter.scala:42)
at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:123)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:97)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:95)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:95)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I am writing the code in Scala. I can't find the cause of the error. Please help me resolve this exception.
Thanks.

Comments:
You are trying to push strings to Elasticsearch. Spark can only push pair RDDs or DataFrames to Elasticsearch.
Hi, I am trying to send a stream of lines to Spark and split it into words. I just don't know how to separate out the words, map each one to a key, and push it to ES.
@anshul_cached Damn, you saved my day. Thanks a million times.
Can you post your updated code, @habitats?
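
The updated code was never posted in this thread, so the following is only a minimal sketch of one way to act on the comment above, not the asker's actual fix. In the question, `lines.map(_.split(" "))` produces a `DStream[Array[String]]`, so each element handed to `saveToEs` is a bare array rather than a document with named fields. The sketch turns every word into a `Map` with keys before saving. The object name `StreamToEs`, the helper `toDocs`, and the field names `word` and `length` are assumptions made for illustration; the host, port, and `spark/test` resource are taken from the question.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.elasticsearch.spark._ // brings saveToEs into scope for RDDs

object StreamToEs {
  // Hypothetical helper: split a line on whitespace and wrap each word in a
  // Map, so every element is a document with named fields.
  // The field names "word" and "length" are illustrative assumptions.
  def toDocs(line: String): Seq[Map[String, Any]] =
    line.split("\\s+")
      .filter(_.nonEmpty)
      .map(w => Map[String, Any]("word" -> w, "length" -> w.length))
      .toSeq

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("StreamToEs")
      .setMaster("local[*]")
      .set("es.nodes", "localhost:9200")
      .set("es.index.auto.create", "true")

    val ssc = new StreamingContext(conf, Seconds(10))
    val lines = ssc.socketTextStream("localhost", 9998)

    // flatMap yields a DStream of Map documents instead of Array[String].
    val docs = lines.flatMap(toDocs)

    // Skip empty batches so no empty bulk request is attempted.
    docs.foreachRDD { rdd =>
      if (!rdd.isEmpty()) rdd.saveToEs("spark/test")
    }

    ssc.start()            // Start the computation
    ssc.awaitTermination() // Wait for the computation to terminate
  }
}
```

If the incoming lines are already JSON documents, `saveJsonToEs` from the same `org.elasticsearch.spark._` import may be a better fit than building a `Map` by hand.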