UnknownHostException error in Spark Streaming

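I want my code to read a JSON text file that is generated every minute (it is the station feed data from Citibike), and I am trying to use Spark Streaming for this. But I keep getting an UnknownHostException error.

My code: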

String url = "http://citibikenyc.com/stations/json";
SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("Streaming");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaStreamingContext jssc = new JavaStreamingContext(sc, new Duration(60000));
JavaDStream<String> lines = jssc.socketTextStream(url, 9999);
lines.print();
jssc.start();
jssc.awaitTermination();
The error is:

14/11/22 15:32:54 ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0: Restarting        receiver with delay 2000ms: Error receiving data - java.net.UnknownHostException: http://citibikenyc.com/stations/json
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at java.net.Socket.<init>(Socket.java:425)
    at java.net.Socket.<init>(Socket.java:208)
    at org.apache.spark.streaming.dstream.SocketReceiver.receive(SocketInputDStream.scala:71)
    at org.apache.spark.streaming.dstream.SocketReceiver$$anon$2.run(SocketInputDStream.scala:57)
14/11/22 15:32:54 INFO receiver.ReceiverSupervisorImpl: Stopped receiver 0

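.socketTextStream serves an entirely different purpose: it opens a plain TCP connection to the given hostname and port, so passing a full URL as the hostname fails with UnknownHostException. Spark Streaming does not have any receiver that periodically fetches a URL.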
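To make the cause concrete, here is a minimal sketch using only the JDK (the class and method names are hypothetical, for illustration). It splits the URL from the question into the hostname and port that a raw socket would need. Keep in mind this only illustrates why the whole URL is not a valid hostname; connecting .socketTextStream to the web server's host and port would still not work, because a raw socket never sends an HTTP request.

```java
import java.net.URI;

// Hypothetical helper, not part of Spark: shows which parts of a URL
// a plain TCP socket could actually use.
final class SocketTarget {

    // Returns only the hostname part of the URL, e.g. "citibikenyc.com".
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    // Returns the explicit port if the URL has one, otherwise the
    // default port for the scheme (443 for https, 80 for anything else).
    static int portOf(String url) {
        URI uri = URI.create(url);
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    public static void main(String[] args) {
        String url = "http://citibikenyc.com/stations/json";
        // The socket in the stack trace received the whole string above as
        // its "hostname"; these are the pieces it would have needed instead.
        System.out.println(hostOf(url) + ":" + portOf(url));
    }
}
```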

You need to write a separate program that periodically fetches the URL and feeds it to Spark Streaming. You have several options:

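  • Write a shell script that periodically downloads the URL into a directory, then use a separate tool to read the files in that directory and send them to Spark Streaming. An integration guide is available for this setup.
  • Write your own Spark Streaming receiver.
  • In your Spark application, start a thread that periodically fetches the URL and opens a socket to send the content, then connect to that socket (e.g. .socketTextStream("127.0.0.1", 19999)).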
There are many variations and some more advanced solutions, but I think these are the most convenient.
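The third option could be sketched roughly as follows, using only the JDK. Everything here is an assumption made for illustration: the class name UrlSocketBridge, its constructor parameters, and the choice to flatten each response onto a single line (so that each fetch arrives as one text line on the socket). It is a starting point under those assumptions, not a production-ready implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

// Hypothetical helper: serves periodically fetched text to one local TCP
// client at a time, one line per fetch.
final class UrlSocketBridge implements AutoCloseable {
    private final ServerSocket server;
    private final long periodMillis;
    private final Supplier<String> fetcher;
    private final Thread worker;

    // port 0 lets the OS pick a free port; see port().
    UrlSocketBridge(int port, long periodMillis, Supplier<String> fetcher) throws IOException {
        this.server = new ServerSocket(port, 1, InetAddress.getLoopbackAddress());
        this.periodMillis = periodMillis;
        this.fetcher = fetcher;
        this.worker = new Thread(this::serve, "url-socket-bridge");
        this.worker.setDaemon(true);
    }

    void start() {
        worker.start();
    }

    int port() {
        return server.getLocalPort();
    }

    private void serve() {
        while (!server.isClosed()) {
            try (Socket client = server.accept();
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true)) {
                // Keep sending until the client disconnects.
                while (!out.checkError()) {
                    try {
                        // Collapse line breaks so one fetch is one line.
                        out.println(fetcher.get().replaceAll("\\R", " "));
                    } catch (RuntimeException e) {
                        System.err.println("fetch failed, will retry: " + e);
                    }
                    Thread.sleep(periodMillis);
                }
            } catch (IOException e) {
                // Client went away or the server was closed; the loop re-checks.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    @Override
    public void close() throws IOException {
        worker.interrupt();
        server.close();
    }

    // Downloads the body of the given URL as a UTF-8 string.
    static String fetch(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under these assumptions, the driver would create new UrlSocketBridge(19999, 60000, () -> UrlSocketBridge.fetch(url)), call start() on it before jssc.start(), and read from it with jssc.socketTextStream("127.0.0.1", 19999). Taking the fetcher as a Supplier keeps the socket logic testable without network access.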

Have you tried googling "java.net.UnknownHostException"?