Spark Structured Streaming Kafka Avro producer

Tags: apache-spark, kafka-producer-api, spark-structured-streaming

I have a DataFrame, say:

val someDF = Seq(
  (8, "bat"),
  (64, "mouse"),
  (-27, "horse")
).toDF("number", "word")
I want to send this DataFrame to a Kafka topic using Avro serialization and the Schema Registry. I believe I'm almost there, but I can't get past the "Task not serializable" error. I know Kafka has a sink, but it doesn't talk to the Schema Registry, which is a requirement.

import java.util.Properties

import io.confluent.kafka.schemaregistry.client.rest.RestService
import io.confluent.kafka.serializers.KafkaAvroSerializer
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object Holder extends Serializable {
  def prop(): java.util.Properties = {
    val props = new Properties()
    props.put("schema.registry.url", schemaRegistryURL)
    props.put("key.serializer", classOf[KafkaAvroSerializer].getCanonicalName)
    props.put("value.serializer", classOf[KafkaAvroSerializer].getCanonicalName)
    props.put("bootstrap.servers", brokers)
    props
  }

  def vProps(props: java.util.Properties): kafka.utils.VerifiableProperties = {
    new kafka.utils.VerifiableProperties(props)
  }

  def messageSchema(vProps: kafka.utils.VerifiableProperties): org.apache.avro.Schema = {
    // Fetch the latest schema version for the subject from the Schema Registry.
    val avroSchema = new RestService(schemaRegistryURL).getLatestVersion(subjectValueName)
    new Schema.Parser().parse(avroSchema.getSchema)
  }

  def avroRecord(messageSchema: org.apache.avro.Schema): org.apache.avro.generic.GenericData.Record = {
    val avroRecord = new GenericData.Record(messageSchema)
    avroRecord
  }

  def ProducerRecord(avroRecord:org.apache.avro.generic.GenericData.Record): org.apache.kafka.clients.producer.ProducerRecord[org.apache.avro.generic.GenericRecord,org.apache.avro.generic.GenericRecord] = {
    val record = new ProducerRecord[GenericRecord, GenericRecord](topicWrite, avroRecord)
    record
  }

  def producer(props: java.util.Properties): KafkaProducer[GenericRecord, GenericRecord] = {
    val producer = new KafkaProducer[GenericRecord, GenericRecord](props)
    producer
  }
}

val prod: (String, String) => String = (number: String, word: String) => {
  // Build everything inside the closure so that nothing non-serializable
  // is captured from the driver.
  val prop = Holder.prop()
  val vProps = Holder.vProps(prop)
  val mSchema = Holder.messageSchema(vProps)
  val aRecord = Holder.avroRecord(mSchema)
  aRecord.put("number", number)
  aRecord.put("word", word)
  val record = Holder.ProducerRecord(aRecord)
  val producer = Holder.producer(prop)
  try {
    producer.send(record)
    producer.flush()
  } finally {
    producer.close()
  }
  "sent"
}

val prodUDF: org.apache.spark.sql.expressions.UserDefinedFunction =
  udf((number: String, word: String) => prod(number, word))


val testDF = someDF.withColumn("sent", prodUDF(col("number"), col("word")))

KafkaProducer is not serializable.
Create the KafkaProducer inside prod() instead of outside it.
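The answer above can also be expressed with a common Spark pattern for non-serializable clients like KafkaProducer: keep the client in a @transient lazy val inside a serializable holder, so the field is skipped when the closure is shipped to executors and each executor rebuilds its own instance on first use. A minimal, self-contained sketch (FakeProducer, ProducerHolder, and SerializationDemo are hypothetical names; Java serialization stands in for what Spark does to closures):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Stand-in for a non-serializable client such as KafkaProducer.
class FakeProducer {
  def send(msg: String): String = s"sent: $msg"
}

// Serializable wrapper: the producer field is @transient, so it is skipped
// during serialization, and lazy, so each JVM that deserializes the wrapper
// builds its own instance on first access.
class ProducerHolder extends Serializable {
  @transient lazy val producer: FakeProducer = new FakeProducer
}

object SerializationDemo {
  // Serialize and deserialize an object, mimicking Spark shipping a
  // closure from the driver to an executor.
  def roundTrip[T <: java.io.Serializable](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    in.readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    // Serializing the holder succeeds even though FakeProducer is not
    // Serializable, because the @transient field is never written.
    val copy = roundTrip(new ProducerHolder)
    // The deserialized copy works: the producer is rebuilt on first use.
    println(copy.producer.send("hello"))  // prints: sent: hello
  }
}
```

Compared to building a producer inside the UDF on every row, this amortizes the construction cost to once per executor JVM, which matters for heavyweight clients like KafkaProducer.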

Did you ever solve this? — Honestly, I don't remember. I can take a look.