Apache Spark: How to parse a Protobuf message received from a Kafka stream in Spark Streaming (Spark 1.6)

Hi, I am working on a Spark Streaming project. In this project I want to receive and parse data from a Kafka stream (Protobuf messages).

I don't know how to parse the Protobuf message coming from Kafka.

I am trying to understand the code below as a starting point for parsing the Protobuf message:

def main(args: Array[String]) {

}

Could someone show me some examples of how to parse a Protobuf message step by step?
I just need some references on how to use this in a Spark Streaming application.

I use Structured Streaming this way:

import MessagesProto  # your generated protobuf Python module (proto.py)
from datetime import datetime as dt
from pyspark.sql import SparkSession
from pyspark.sql.types import *
from pyspark.sql.functions import *
from pyspark.sql.functions import udf


def message_proto(value):
    # Parse the raw Kafka value (protobuf wire format) into a message
    # object and return its fields as a dict matching the UDF schema.
    # The original returned undefined names y and w; m.x and m.z are
    # the evident intent, assuming the message has fields x and z.
    m = MessagesProto.message_x()
    m.ParseFromString(value)
    return {'x': m.x,
            'z': m.z}

schema_impressions = StructType() \
    .add("x", StringType()) \
    .add("z", TimestampType())

proto_udf = udf(message_proto, schema_impressions)

class StructuredStreaming():

    def structured_streaming(self):

        stream = self.spark.readStream \
          .format("kafka") \
          .option("kafka.bootstrap.servers", self.kafka_bootstrap_servers) \
          .option("subscribe", self.topic) \
          .option("startingOffsets", self.startingOffsets) \
          .option("max.poll.records", self.max_poll_records) \
          .option("auto.commit.interval.ms", self.auto_commit_interval_ms) \
          .option("session.timeout.ms", self.session_timeout_ms) \
          .option("key.deserializer", self.key_deserializer) \
          .option("value.deserializer", self.value_deserializer) \
          .load()

        self.query = stream \
            .select(col("value")) \
            .select(proto_udf("value").alias("value_udf")) \
            .select("value_udf.x", "value_udf.z")

In your example, Student.parseFrom(_) does the parsing. There are no further steps, so I am not sure what you are asking. Alternatively, you can create a deserializer and pass it in the Kafka configuration; you will then receive Student objects back instead of byte arrays. For references on Protobuf, see:
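
For the Spark 1.6 DStream case the question actually asks about, here is a minimal PySpark sketch of that "pass a deserializer" approach. The broker address, topic name, and the MessagesProto.Student class are hypothetical placeholders; the valueDecoder parameter of KafkaUtils.createDirectStream is the Spark 1.6 hook for plugging in a custom deserializer:

import MessagesProto  # hypothetical generated protobuf module
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils


def proto_decoder(raw_bytes):
    # Applied to every record value: parse the protobuf wire format
    # into a Student object, so the stream carries parsed messages
    # instead of byte arrays.
    if raw_bytes is None:
        return None
    student = MessagesProto.Student()
    student.ParseFromString(raw_bytes)
    return student


sc = SparkContext(appName="ProtoBufFromKafka")
ssc = StreamingContext(sc, 10)  # 10-second batches

stream = KafkaUtils.createDirectStream(
    ssc, ["students"], {"metadata.broker.list": "localhost:9092"},
    valueDecoder=proto_decoder)

stream.map(lambda kv: kv[1]).pprint()  # values are Student objects now

ssc.start()
ssc.awaitTermination()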