Python: how do I fetch and print one row from Kafka with pyspark? "Queries with streaming sources must be executed with writeStream.start()"
I'm trying to read some data from Kafka to see what is inside. I wrote:
builder = SparkSession.builder \
    .appName("PythonTest01")
spark = builder.getOrCreate()

# Subscribe to 1 topic
df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", config["kafka"]["bootstrap.servers"]) \
    .option("subscribe", dataFlowTopic) \
    .load()

# df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

df.printSchema()

df = df.first()

query = df \
    .writeStream \
    .outputMode('complete') \
    .format('console') \
    .start()

query.awaitTermination()
Unfortunately, it complains:
pyspark.sql.utils.AnalysisException: Queries with streaming sources must be executed with writeStream.start();
Complete output mode not supported when there are no streaming aggregations on streaming DataFrames/Datasets;
What does it want, and how do I satisfy it?
If I remove first(), it complains:
pyspark.sql.utils.AnalysisException: Queries with streaming sources must be executed with writeStream.start();
Complete output mode not supported when there are no streaming aggregations on streaming DataFrames/Datasets;
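Actions such as first() (or show(), collect()) are not supported on a streaming DataFrame, which is why the query fails before writeStream.start() is ever reached. If the goal is simply to peek at what is in the topic, one option is a plain batch read with spark.read, which does support first(). A minimal sketch, assuming a hypothetical broker at localhost:9092 and a topic named dataFlowTopic (and that the spark-sql-kafka connector package is on the classpath):

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("PeekAtKafka")
         .getOrCreate())

# Batch read (spark.read, not spark.readStream): supports first()/show().
# Broker address and topic name are placeholders for illustration.
df = (spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "dataFlowTopic")
      .option("startingOffsets", "earliest")
      .option("endingOffsets", "latest")
      .load())

# Kafka key/value columns are binary; cast them to strings before printing.
row = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").first()
print(row)
```

This reads a fixed offset range once and returns a normal DataFrame, so ordinary batch actions work on it.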
If I instead write
# df = df.first()

query = df \
    .writeStream \
    .outputMode('append') \
    .format('console') \
    .start()

query.awaitTermination()
it prints not the first row but the last one, and it does not terminate.
As for it not terminating: this is a stream; it is not meant to terminate.

As for it printing the last row instead of the first: see the startingOffsets option. The default is latest.
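To keep the streaming query but still see rows already in the topic, setting startingOffsets to earliest makes the stream replay the topic from the beginning instead of waiting for new records. A sketch of the readStream variant under the same assumptions (placeholder broker and topic, connector package available):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("StreamFromStart").getOrCreate()

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
      .option("subscribe", "dataFlowTopic")                 # placeholder topic
      .option("startingOffsets", "earliest")                # replay from the beginning
      .load())

query = (df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
         .writeStream
         .outputMode("append")  # no aggregation, so 'complete' is not allowed
         .format("console")
         .start())

# A stream runs until stopped; call query.stop() once you've seen enough
# rather than expecting awaitTermination() to return on its own.
query.awaitTermination()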