
Python: DataFrame returned by getBatch from MQTTTextStreamSource did not have isStreaming=true


I am trying to use MQTT with PySpark Structured Streaming.

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode
from pyspark.sql.functions import split

spark = SparkSession \
    .builder \
    .appName("Test") \
    .master("local[4]") \
    .getOrCreate()

# Custom Structured Streaming receiver
lines = spark\
             .readStream\
             .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")\
             .option("topic","uwb/distances")\
             .option('brokerUrl', 'tcp://127.0.0.1:1883')\
             .load()

# Split the lines into words
words = lines.select(explode(split(lines.value, ' ')).alias('word'))

# Generate running word count
wordCounts = words.groupBy('word').count()

# Start running the query that prints the running counts to the console
query = wordCounts \
    .writeStream \
    .outputMode('complete') \
    .format('console') \
    .start()

query.awaitTermination()
Error message:

Logical Plan:
Aggregate [word#7], [word#7, count(1) AS count#11L]
+- Project [word#7]
   +- Generate explode(split(value#2,  )), false, [word#7]
      +- StreamingExecutionRelation org.apache.bahir.sql.streaming.mqtt.MQTTTextStreamSource@383ccec1, [value#2, timestamp#3]
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
Caused by: java.lang.AssertionError: assertion failed: DataFrame returned by getBatch from org.apache.bahir.sql.streaming.mqtt.MQTTTextStreamSource@383ccec1 did not have isStreaming=true

I do not understand what is wrong with my code. Moreover, Structured Streaming in Spark 2.1.0 is reportedly supported by Bahir MQTT. I also tried Spark 2.2.1 and ran into the same problem.

This is how I run the code:

spark-submit \
--jars lib/spark-streaming-mqtt_2.11-2.2.1.jar \
lib/spark-sql-streaming-mqtt_2.11-2.2.1.jar \
lib/org.eclipse.paho.client.mqttv3-1.2.0.jar \
TestSpark.py

How can I solve this issue?

I downloaded the Spark 2.2.0 binaries and executed the code as follows:

# --jars takes one comma-separated list with no whitespace between entries
~/Downloads/spark-2.2.1-bin-hadoop2.7/bin/spark-submit \
    --jars lib/spark-streaming-mqtt_2.11-2.2.1.jar,lib/spark-sql-streaming-mqtt_2.11-2.2.1.jar,lib/org.eclipse.paho.client.mqttv3-1.2.0.jar \
    TestSpark.py
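That solved the problem. Previously I was only changing the version of the MQTT jar files (for example spark-streaming-mqtt_2.11-2.2.1.jar) while still running them on a different Spark installation, and apparently that is not enough: the Spark binaries themselves have to match the Bahir jars.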
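
A rough way to catch this kind of mismatch before submitting is to compare the version encoded in each jar name with the Spark version you actually run. The sketch below is only an illustration: the helper names (jar_version, mismatched_jars, jars_argument) are made up for this example, it assumes the usual Scala artifact naming scheme <artifact>_<scalaVersion>-<version>.jar, and it treats "exactly the same version string" as the rule, which is a conservative assumption rather than something Bahir documents.

```python
import re

# Assumed naming scheme for Scala artifacts: <artifact>_<scalaVersion>-<version>.jar
_SCALA_JAR = re.compile(
    r"^(?P<artifact>.+)_(?P<scala>\d+\.\d+)-(?P<version>\d+(?:\.\d+)*)\.jar$"
)


def jar_version(jar_path):
    """Return the version encoded in a Scala-style jar name, or None."""
    name = jar_path.rsplit("/", 1)[-1]
    match = _SCALA_JAR.match(name)
    return match.group("version") if match else None


def mismatched_jars(spark_version, jar_paths):
    """List jars whose encoded version differs from spark_version.

    Jars without a Scala-style name (e.g. the Paho client) are skipped,
    because their version is unrelated to the Spark release.
    """
    return [p for p in jar_paths if jar_version(p) not in (None, spark_version)]


def jars_argument(jar_paths):
    """Join jar paths into the single comma-separated value --jars expects."""
    return ",".join(jar_paths)


jars = [
    "lib/spark-streaming-mqtt_2.11-2.2.1.jar",
    "lib/spark-sql-streaming-mqtt_2.11-2.2.1.jar",
    "lib/org.eclipse.paho.client.mqttv3-1.2.0.jar",
]

# Hypothetical scenario: the jars are 2.2.1 but the installed Spark is 2.1.0.
print(mismatched_jars("2.1.0", jars))
print(jars_argument(jars))
```

In a real job the first argument would come from the running installation, for instance spark.version on the SparkSession or pyspark.__version__, instead of a hard-coded string.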