Apache Spark: schema problem with an Apache Bahir streaming connector on Apache Spark Structured Streaming

Tags: apache-spark, mqtt, spark-structured-streaming, watson-iot, apache-bahir

I am trying to connect Apache Spark Structured Streaming to an MQTT topic (in this case, the IBM Watson IoT Platform on IBM Bluemix).

I am creating the structured stream as follows:

val df = spark.readStream
  .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
  .option("username", "username")
  .option("password", "password")
  .option("clientId", "a:vy0z2s:a-vy0z2s-zfzzckrnqf")
  .option("topic", "iot-2/type/WashingMachine/id/Washier02/evt/voltage/fmt/json")
  .load("tcp://vy0z2s.messaging.internetofthings.ibmcloud.com:1883")
So far so good. In the REPL, I get this df object:

df: org.apache.spark.sql.DataFrame = [value: string, timestamp: timestamp]
But if I start reading from the stream with these lines:

val query = df.writeStream
  .outputMode("append")
  .format("console")
  .start()
I get the following error:

scala> 17/02/03 07:32:23 ERROR StreamExecution: Query query-1
terminated with error java.lang.ClassCastException: scala.Tuple2
cannot be cast to scala.runtime.Nothing$    at
org.apache.bahir.sql.streaming.mqtt.MQTTTextStreamSource$$anonfun$getBatch$1$$anonfun$3.apply(MQTTStreamSource.scala:156)
    at
org.apache.bahir.sql.streaming.mqtt.MQTTTextStreamSource$$anonfun$getBatch$1$$anonfun$3.apply(MQTTStreamSource.scala:156)
    at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)  at
scala.collection.concurrent.TrieMap.getOrElse(TrieMap.scala:633)    at
org.apache.bahir.sql.streaming.mqtt.MQTTTextStreamSource$$anonfun$getBatch$1.apply$mcZI$sp(MQTTStreamSource.scala:156)
    at
org.apache.bahir.sql.streaming.mqtt.MQTTTextStreamSource$$anonfun$getBatch$1.apply(MQTTStreamSource.scala:155)
    at
org.apache.bahir.sql.streaming.mqtt.MQTTTextStreamSource$$anonfun$getBatch$1.apply(MQTTStreamSource.scala:155)
    at scala.collection.immutable.Range.foreach(Range.scala:160)    at
org.apache.bahir.sql.streaming.mqtt.MQTTTextStreamSource.getBatch(MQTTStreamSource.scala:155)
    at
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$5.apply(StreamExecution.scala:332)
    at
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$5.apply(StreamExecution.scala:329)
    at
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)  at
scala.collection.AbstractIterator.foreach(Iterator.scala:1336)  at
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)  at
org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
    at
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at
org.apache.spark.sql.execution.streaming.StreamProgress.flatMap(StreamProgress.scala:25)
    at
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch(StreamExecution.scala:329)
    at
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:194)
    at
org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:43)
    at
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:184)
    at
org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:120)
17/02/03 07:32:24 WARN MQTTTextStreamSource: Connection to mqtt server
lost. Connection lost (32109) - java.io.EOFException    at
org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(CommsReceiver.java:146)
    at java.lang.Thread.run(Thread.java:745) Caused by:
java.io.EOFException    at
java.io.DataInputStream.readByte(DataInputStream.java:267)  at
org.eclipse.paho.client.mqttv3.internal.wire.MqttInputStream.readMqttWireMessage(MqttInputStream.java:65)
    at
org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(CommsReceiver.java:107)
    ... 1 more 17/02/03 07:32:28 WARN MQTTTextStreamSource: Connection to
mqtt server lost.
My gut feeling told me that something is wrong with the schema, so I added one:

import org.apache.spark.sql.types._

val schema = StructType(
  StructField("count", LongType, true) ::
  StructField("flowrate", LongType, true) ::
  StructField("fluidlevel", StringType, true) ::
  StructField("frequency", LongType, true) ::
  StructField("hardness", LongType, true) ::
  StructField("speed", LongType, true) ::
  StructField("temperature", LongType, true) ::
  StructField("ts", LongType, true) ::
  StructField("voltage", LongType, true) :: Nil)

val df = spark.readStream
  .schema(schema)
  .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
  .option("username", "username")
  .option("password", "password")
  .option("clientId", "a:vy0z2s:a-vy0z2s-zfzzckrnqf")
  .option("topic", "iot-2/type/WashingMachine/id/Washier02/evt/voltage/fmt/json")
  .load("tcp://vy0z2s.messaging.internetofthings.ibmcloud.com:1883")

But that didn't help. Any ideas?
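(Note on the schema attempt: judging from the REPL output, this MQTT source always emits a fixed `[value: string, timestamp: timestamp]` schema, so a schema passed to `readStream` has nothing to bind to. One alternative is to parse the JSON payload after reading, e.g. with `from_json` (available from Spark 2.1 onward). A minimal sketch, assuming the device events are flat JSON objects with the field names from the schema above:)

```scala
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._
import spark.implicits._

// Subset of the schema above; whether the events are flat JSON
// objects with exactly these keys is an assumption.
val payloadSchema = StructType(
  StructField("voltage", LongType, true) ::
  StructField("ts", LongType, true) :: Nil)

// Parse the raw MQTT payload string into typed columns.
val parsed = df
  .select(from_json($"value", payloadSchema).as("data"), $"timestamp")
  .select("data.*", "timestamp")
```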

Your problem seems to be that you are reusing the same client ID on subsequent connections:

Closing TCP connection:   ClientID="a:vy0z2s:a-vy0z2s-xxxxxxxxxx" Protocol=mqtt4-tcp Endpoint="mqtt"   RC=288 Reason="The client ID was reused."  
Only one unique connection is allowed per clientId; you cannot have two concurrent connections using the same ID.


Check your client IDs and make sure that multiple instances of the same application use unique client IDs. Applications can share the same API key, but MQTT requires the client ID to always be unique.
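To illustrate, one way to avoid the collision is to derive a fresh client ID for every connection attempt, e.g. with a random suffix. A sketch, assuming Watson IoT accepts the `a:orgId:appId` application client-ID format with an arbitrary appId:

```scala
import java.util.UUID

// Build a client ID that is unique per connection attempt, so a
// restarted stream never reuses the ID of a still-registered session.
val orgId = "vy0z2s"
val clientId = s"a:$orgId:spark-${UUID.randomUUID().toString.take(8)}"

val df = spark.readStream
  .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
  .option("username", "username")
  .option("password", "password")
  .option("clientId", clientId)
  .option("topic", "iot-2/type/WashingMachine/id/Washier02/evt/voltage/fmt/json")
  .load("tcp://vy0z2s.messaging.internetofthings.ibmcloud.com:1883")
```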


Comments:

- Looks like a versioning problem to me. Which versions of MQTT and Spark are you using?
- Spark-2.0.0-bin-hadoop2.7; Watson IoT uses MQTT V3.1.1, IMHO.
- You seem to have published your username and password; make sure you revoke those credentials immediately, since anyone can use them now.
- I changed the PW before posting :) I have now tried it with mosquitto, and that works. So this seems to be a problem with the IBM Watson IoT broker... any idea how to debug this?