Apache Spark: Spark Streaming from Kafka to ES

Tags: apache-spark, elasticsearch, spark-structured-streaming

I have a Spark streaming job that reads data from Kafka and writes it to Elasticsearch via HTTP requests.

I want to validate each record coming from Kafka, modify the payload according to business needs, and then write it to Elasticsearch.

I am currently using ES HTTP requests to push the data into Elasticsearch. Can anyone guide me on how to write the data to ES through a DataFrame?

Code snippet:

val dfInput = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "test")
  .option("startingOffsets", "latest")
  .option("group.id", sourceTopicGroupId)
  .option("failOnDataLoss", "false")
  .option("maxOffsetsPerTrigger", maxOffsetsPerTrigger)
  .load()

import spark.implicits._

val resultDf = dfInput
  .withColumn("value", $"value".cast("string"))
  .select("value")

resultDf.writeStream
  .foreach(new ForeachWriter[Row] {
    // Called once per partition and epoch; returning true accepts the partition.
    override def open(partitionId: Long, version: Long): Boolean = true

    // One HTTP request to Elasticsearch per record.
    override def process(value: Row): Unit = {
      processEventsData(value.get(0).asInstanceOf[String], deviceIndex, msgIndex,
        retryOnConflict, auth, refreshInterval, deviceUrl, messageUrl, spark)
    }

    override def close(errorOrNull: Throwable): Unit = {}
  })
  .trigger(Trigger.ProcessingTime(triggerPeriod)) // triggerPeriod is e.g. "1 second"
  .start()
  .awaitTermination()
With this approach we are not able to achieve the performance we need.

Is there any way to do this better?

  • Spark version: 2.3.2
  • Kafka partitions: 20
  • ES version: 7.7.0
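
One way to cut the per-record HTTP overhead in the ForeachWriter above is to buffer each partition's documents and send them in a single _bulk request. A minimal sketch, assuming the value column already holds JSON strings; the bulk URL and index name below are placeholders, not from the original post:

import java.net.{HttpURLConnection, URL}
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.{ForeachWriter, Row}

// Buffers the JSON documents seen by one partition/epoch and sends them to
// Elasticsearch as a single _bulk request instead of one HTTP call per record.
class BulkEsWriter(bulkUrl: String, indexName: String) extends ForeachWriter[Row] {
  private val buffer = ArrayBuffer.empty[String]

  override def open(partitionId: Long, version: Long): Boolean = {
    buffer.clear()
    true
  }

  // Collect instead of sending one HTTP request per record.
  override def process(value: Row): Unit = {
    buffer += value.getString(0) // assumes the row carries one JSON string
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (errorOrNull == null && buffer.nonEmpty) {
      // The _bulk API expects an action line before each document (NDJSON framing).
      val action = s"""{"index":{"_index":"$indexName"}}"""
      val payload = buffer.map(doc => action + "\n" + doc).mkString("", "\n", "\n")
      val conn = new URL(bulkUrl).openConnection().asInstanceOf[HttpURLConnection]
      conn.setRequestMethod("POST")
      conn.setRequestProperty("Content-Type", "application/x-ndjson")
      conn.setDoOutput(true)
      conn.getOutputStream.write(payload.getBytes("UTF-8"))
      conn.getResponseCode // force the call; check and log the response in real code
      conn.disconnect()
    }
  }
}

It could then be plugged into the existing query as resultDf.writeStream.foreach(new BulkEsWriter("http://localhost:9200/_bulk", "test")).start().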

You can use elasticsearch-spark-20_2.11; it is quite simple to use. See the ES-Hadoop connector documentation for more information.
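
For example, a minimal sketch using the connector's Structured Streaming sink (assuming the elasticsearch-spark-20_2.11 artifact matching your ES version is on the classpath; the host, checkpoint path, and index name are placeholders):

import org.apache.spark.sql.streaming.Trigger

// The ES-Hadoop connector registers an "es" sink for Structured Streaming,
// so resultDf can be written directly, without a custom ForeachWriter.
val query = resultDf.writeStream
  .format("es")
  .option("es.nodes", "localhost")
  .option("es.port", "9200")
  .option("es.input.json", "true")                    // the value column already holds serialized JSON
  .option("checkpointLocation", "/tmp/es-checkpoint") // streaming sinks require a checkpoint
  .trigger(Trigger.ProcessingTime("1 second"))
  .start("test")                                      // target index

query.awaitTermination()

The es.input.json option tells the connector to pass the strings through as documents rather than serializing the Row; check the ES-Hadoop docs for the options supported by your connector version.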


Please provide more details: which Spark version are you using, how many Kafka partitions, etc.?
Spark version: 2.3.2, Kafka partitions: 20, ES version: 7.7.0.
Is there a reason for not using Kafka Connect?
I want to insert nested JSON into Elasticsearch through a Spark DataFrame.
EsSpark.saveJsonToEs(rdd, index, conf)
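
EsSpark.saveJsonToEs comes from the same connector's RDD-based API; it indexes raw JSON strings, so nested documents are preserved as-is. A minimal sketch with placeholder host, index name, and sample document:

import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.rdd.EsSpark

val spark = SparkSession.builder().appName("es-json-demo").getOrCreate()

// An RDD of raw JSON strings; the connector parses each one, keeping the nesting.
val rdd = spark.sparkContext.parallelize(Seq(
  """{"device":{"id":"d1","status":"ok"},"ts":1620000000}"""
))

val conf = Map("es.nodes" -> "localhost", "es.port" -> "9200")
EsSpark.saveJsonToEs(rdd, "test", conf)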