Windows PySpark: parsing Kafka CSV-delimited data into columns with from_csv

I am new to Kafka Structured Streaming. I am trying to convert delimited data from Kafka into a DataFrame in PySpark using a schema and from_csv:

from pyspark.sql.functions import from_csv
from pyspark.sql.types import StructType, StructField, StringType, LongType

# Schema for the CSV-delimited Kafka payload
kafkaDataSchema = StructType([
  StructField("sid", StringType()), StructField("timestamp", LongType()),
  StructField("sensor", StringType()), StructField("value", StringType()),
])

# Read the Kafka topic as a streaming DataFrame; the payload arrives in the
# binary "value" column, so cast it to a string first
kafkaStream = spark.readStream \
            .format("kafka") \
            .option("kafka.bootstrap.servers", self.config.get('kafka-config', 'bootstrap-servers')) \
            .option("subscribe", self.config.get('kafka-config', 'topic-list-input')) \
            .option("startingOffsets", self.config.get('kafka-config', 'startingOffsets')) \
            .load() \
            .selectExpr("CAST(value AS STRING)")

# Parse the CSV string into the schema's columns
formattedStream = kafkaStream.select(from_csv(kafkaStream.value, kafkaDataSchema))
I get the following error:

Traceback (most recent call last):
  File "main.py", line 43, in <module>
    formattedStream = KafkaSource.readData(spark)
  File "src.zip/src/main/sources/KafkaSource.py", line 31, in readData
  File "src.zip/src/main/sources/KafkaSource.py", line 36, in formatKafkaData
  File "/spark-3.1.1-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/functions.py", line 4082, in from_csv
TypeError: schema argument should be a column or string

How can I fix this?

Try passing the schema to from_csv as a string instead of a StructType. As the TypeError says, from_csv expects its schema argument to be a string (or a Column), so convert the StructType first:

from_csv(kafkaStream.value, kafkaDataSchema.simpleString())
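In context, a minimal sketch of the corrected parsing step (the alias "data" for the parsed struct column is an assumed name, not from the original code):

# simpleString() renders the StructType as a schema string that from_csv accepts
formattedStream = kafkaStream.select(
    from_csv(kafkaStream.value, kafkaDataSchema.simpleString()).alias("data")
)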
Thanks, that works. One more question: how do I select only the "sid" value from the DataFrame?
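A minimal sketch, assuming the parsed struct column was aliased as "data" as above: from_csv returns a single struct column, and individual fields can be pulled out with dot notation.

from pyspark.sql.functions import col

# Keep only the "sid" field from the parsed struct
sidStream = formattedStream.select(col("data.sid"))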