Scala: Scheduling in an Apache Spark receiver
I have implemented a receiver that should connect to a WebSocket stream and fetch the messages to process. Here is my implementation so far:
    class WebSocketReader (wsConfig: WebSocketConfig, stringMessageHandler: String => Option[String],
        storageLevel: StorageLevel) extends Receiver[String] (storageLevel) {

      // TODO: avoid using a var
      private var wsClient: WebSocketClient = _

      def sendRequest(isRequest: Boolean, msgCount: Int) = {
        while (isRequest) {
          wsClient.send(msgCount.toString)
          Thread.sleep(1000)
        }
      }

      // TODO: avoid using Synchronization...
      private def connect(): Unit = {
        Try {
          wsClient = createWsClient
        } match {
          case Success(_) =>
            wsClient.connect().map {
              case result if result.isSuccess =>
                sendRequest(true, 10)
              case _ =>
                connect()
            }
          case Failure(ex) =>
            // TODO: how to signal a failure so that it is tried the next time....
            ex.printStackTrace()
        }
      }

      def onStart(): Unit = {
        new Thread(getClass.getSimpleName) {
          override def run() { connect() }
        }.start()
      }

      override def onStop(): Unit =
        if (wsClient != null) wsClient.disconnect()

      private def createWsClient = {
        new DefaultHookupClient(new HookupClientConfig(new URI(wsConfig.wsUrl))) {
          override def receive: Receive = {
            case Disconnected(_) =>
              // TODO: use Logging framework, try reconnecting....
              println(s"the web socket is disconnected")
            case TextMessage(message) =>
              stringMessageHandler(message).foreach(store)
            case JsonMessage(jsValue) =>
              stringMessageHandler(jsValue.toString).foreach(store)
          }
        }
      }
    }
How does this receiver get run? Does it run on a worker node or on the driver node? Is this approach of putting the thread to sleep correct?
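The reason I am doing this is that the server exposing the WebSocket endpoint expects a count of the messages I want to receive. Say I ask the server for 100 messages: it gives me 100 messages, and so on. So I need a way to dispatch this request to the server periodically. At the moment I am using Thread.sleep for that. Is this a sensible thing to do? What are the alternatives?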
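
To make the question about alternatives concrete, here is a minimal sketch of one option I could imagine: handing the periodic request to a java.util.concurrent scheduler instead of looping with Thread.sleep. The class name PeriodicRequester and its parameters are hypothetical, and send is only a stand-in for wsClient.send, so this is an illustration under those assumptions rather than a tested part of the receiver.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ScheduledFuture, TimeUnit}
import scala.util.control.NonFatal

// Hypothetical helper (not part of the receiver above): asks the server for
// `msgCount` messages every `periodMillis` milliseconds.
// `send` is an assumed stand-in for wsClient.send.
class PeriodicRequester(send: String => Unit, msgCount: Int, periodMillis: Long) {

  // A single daemon-less scheduler thread replaces the sleeping while-loop.
  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor()

  // Schedules the request at a fixed rate, starting immediately.
  def start(): ScheduledFuture[_] =
    scheduler.scheduleAtFixedRate(
      new Runnable {
        def run(): Unit =
          // An uncaught exception would cancel all later runs, so catch it here.
          try send(msgCount.toString)
          catch { case NonFatal(ex) => ex.printStackTrace() }
      },
      0L, periodMillis, TimeUnit.MILLISECONDS)

  // Intended to be called from onStop(): no further requests are sent afterwards.
  def stop(): Unit = scheduler.shutdownNow()
}
```

With something like this, the success branch of connect() would call start() on a requester built as new PeriodicRequester(wsClient.send(_), 10, 1000L), and onStop() would call stop() before disconnecting. Unlike the while loop, the request can then be cancelled, and the thread that handles the connection result is not blocked.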