Apache Spark: PySpark MQTT Structured Streaming with apache-bahir

Tags: apache-spark, pyspark, mqtt, spark-structured-streaming, apache-bahir

I am using Spark 2.4, and I launch pyspark like this:

./bin/pyspark --packages org.apache.bahir:spark-sql-streaming-mqtt_2.11:2.3.2
pyspark starts successfully.
(But when I run spark-sql-streaming-mqtt_2.11:2.4.0-SNAPSHOT, I get an error.)

I am trying to get data from an MQTT broker using Structured Streaming, so I ran this:

>>> from pyspark.sql import SparkSession
>>> from pyspark.sql.functions import explode
>>> from pyspark.sql.functions import split
>>> spark = SparkSession \
...     .builder \
...     .appName("Test") \
...     .getOrCreate()
>>> lines = spark.readStream\
...     .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")\
...     .option("topic", "/sensor")\
...     .option("brokerUrl", "tcp://localhost:1883")\
...     .load()
The error is:

2019-03-22 01:24:43 WARN  MQTTUtils:51 - If `clientId` is not set, a random value is picked up.
Recovering from failure is not supported in such a case.
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/opt/spark/python/pyspark/sql/streaming.py", line 400, in load
    return self._df(self._jreader.load())
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o43.load.
: MqttException (0)
    at org.eclipse.paho.client.mqttv3.persist.MqttDefaultFilePersistence.checkIsOpen(MqttDefaultFilePersistence.java:130)
    at org.eclipse.paho.client.mqttv3.persist.MqttDefaultFilePersistence.getFiles(MqttDefaultFilePersistence.java:247)
    at org.eclipse.paho.client.mqttv3.persist.MqttDefaultFilePersistence.close(MqttDefaultFilePersistence.java:142)
    at org.apache.bahir.sql.streaming.mqtt.MQTTStreamSource.stop(MQTTStreamSource.scala:228)
    at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:190)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
I have been trying to stream MQTT data for a week, but I can't find a way around this problem and it's really frustrating. Is there really no way to solve it?
Thanks.

Try setting the persistence option.

For example:

val lines = spark.readStream
  .format("datasource.mqtt.MQTTStreamSourceProvider")
  .option("topic", topic)
  .option("persistence", "memory")
  .option("brokerUrl", broker)
  .option("cleanSession", "true")
  .load()
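
Since the question uses PySpark, a minimal sketch of the same fix in Python could look like this. The provider class, topic, and broker URL are taken from the question; the persistence and cleanSession options come from the answer above:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Test") \
    .getOrCreate()

# Keep the MQTT message store in memory so the Paho client does not
# open the default file-based persistence store that fails above.
lines = spark.readStream \
    .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider") \
    .option("topic", "/sensor") \
    .option("persistence", "memory") \
    .option("brokerUrl", "tcp://localhost:1883") \
    .option("cleanSession", "true") \
    .load()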


Is the "(But when I run spark-sql-streaming-mqtt_2.11:2.4.0-SNAPSHOT, I get an error)" part related? It looks similar to ...
If I want to persist it to both memory and disk, how do I do that?
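
Regarding persisting to both memory and disk: as far as I know, the Paho client behind this source uses a single persistence store per connection, so you pick either in-memory or file-based persistence rather than both. A hedged sketch of file-based persistence, assuming the Bahir MQTT source accepts a localStorage option for the storage directory (check the Bahir documentation for your version; the path below is only an example):

# Omit persistence=memory and point the file-based store at a
# directory the Spark process can write to (hypothetical path).
lines = spark.readStream \
    .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider") \
    .option("topic", "/sensor") \
    .option("brokerUrl", "tcp://localhost:1883") \
    .option("localStorage", "/tmp/mqtt-persistence") \
    .load()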