Python: How do I get and print one row from Kafka with pyspark? "Queries with streaming sources must be executed with writeStream.start()"

I'm trying to read some data from Kafka to see what's in it.

I wrote:

from pyspark.sql import SparkSession

builder = SparkSession.builder \
    .appName("PythonTest01")

spark = builder.getOrCreate()

# Subscribe to 1 topic
df = spark \
  .readStream \
  .format("kafka") \
  .option("kafka.bootstrap.servers", config["kafka"]["bootstrap.servers"]) \
  .option("subscribe", dataFlowTopic) \
  .load()

# df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

df.printSchema()

df = df.first()

query = df \
    .writeStream \
    .outputMode('complete') \
    .format('console') \
    .start()

query.awaitTermination()
Unfortunately, it complains:

pyspark.sql.utils.AnalysisException: Queries with streaming sources must be executed with writeStream.start();
What does it want, and how do I satisfy it?


If I remove first(), it complains:

pyspark.sql.utils.AnalysisException: Complete output mode not supported when there are no streaming aggregations on streaming DataFrames/Datasets;

So I write:

#df = df.first()

query = df \
    .writeStream \
    .outputMode('append') \
    .format('console') \
    .start()

query.awaitTermination()
Instead of printing the first row it prints the last row, and it does not terminate.

"and it does not terminate"

This is a stream; it is not meant to terminate.
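
If you only want to peek at the stream and then exit, one option (a minimal sketch, not part of the original answer, reusing the query object from the code above) is to wait with a timeout and stop the query yourself:

# Run the query for a bounded time, then stop it ourselves.
# awaitTermination(timeout) returns False if the query is still
# running after `timeout` seconds.
finished = query.awaitTermination(10)  # wait up to 10 seconds
if not finished:
    query.stop()  # shut the streaming query down manually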

"it prints the last row instead of the first"

See the startingOffsets option. Its default is latest, so the query only sees messages that arrive after it starts.
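
For example (a sketch reusing the config and dataFlowTopic names from the question), setting startingOffsets to earliest makes the query read the topic from the beginning, so the rows already in it, including the first one, reach the console sink:

# Read the topic from the beginning instead of only new messages
df = spark \
  .readStream \
  .format("kafka") \
  .option("kafka.bootstrap.servers", config["kafka"]["bootstrap.servers"]) \
  .option("subscribe", dataFlowTopic) \
  .option("startingOffsets", "earliest") \
  .load() \
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

query = df \
    .writeStream \
    .outputMode('append') \
    .format('console') \
    .start()

query.awaitTermination()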