Apache Spark: Spark Streaming application hangs when using YARN mode


I have a problem with Spark Streaming on YARN.

When I launch the script in local mode, it works fine: I can receive and print events from Flume.

from pyspark.streaming.flume import FlumeUtils
from pyspark.streaming import StreamingContext
from pyspark.storagelevel import StorageLevel
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("Test").setMaster('local[*]')
sc = SparkContext(conf=conf)

ssc = StreamingContext(sc, 1)  # batch interval of 1 second
hostname = 'myhost.com'
port = 6668
addresses = [(hostname, port)]

# StorageLevel(True, True, False, False, 2) == MEMORY_AND_DISK_2
flumeStream = FlumeUtils.createPollingStream(ssc, addresses, \
                                         storageLevel=StorageLevel(True, True, False, False, 2), \
                                         maxBatchSize=1000, parallelism=5)

flumeStream.pprint()

ssc.start() # Start the computation
ssc.awaitTermination() # Wait for the computation to terminate
Output:

-------------------------------------------
Time: 2017-03-03 00:49:34
-------------------------------------------

-------------------------------------------
Time: 2017-03-03 00:49:35
-------------------------------------------

17/03/03 00:49:35 WARN storage.BlockManager: Block input-0-1488476966735 replicated to only 0 peer(s) instead of 1 peers
-------------------------------------------
Time: 2017-03-03 00:49:36
-------------------------------------------
({u'timestamp': u'1488476971253', u'Severity': u'4', u'Facility': u'3'}, u'<28>[2017-03-03 00:49:31.262000]; 2; 3341678; 3279.39; 97')
({u'timestamp': u'1488476971265', u'Severity': u'4', u'Facility': u'3'}, u'<28>[2017-03-03 00:49:31.274000]; 4; 2690399; 69.24; 27')
({u'timestamp': u'1488476971276', u'Severity': u'4', u'Facility': u'3'}, u'<28>[2017-03-03 00:49:31.285000]; 6; 7266957; 514.57; 25')
({u'timestamp': u'1488476971286', u'Severity': u'4', u'Facility': u'3'}, u'<28>[2017-03-03 00:49:31.296000]; 8; 9220339; 3189.55; 5')
({u'timestamp': u'1488476971298', u'Severity': u'4', u'Facility': u'3'}, u'<28>[2017-03-03 00:49:31.307000]; 10; 2897030; 1029.84; 56')
({u'timestamp': u'1488476971308', u'Severity': u'4', u'Facility': u'3'}, u'<28>[2017-03-03 00:49:31.317000]; 12; 4710976; 1125.88; 35')
({u'timestamp': u'1488476971340', u'Severity': u'4', u'Facility': u'3'}, u'<28>[2017-03-03 00:49:31.349000]; 14; 4894562; 707.43; 50')
({u'timestamp': u'1488476971371', u'Severity': u'4', u'Facility': u'3'}, u'<28>[2017-03-03 00:49:31.380000]; 16; 7370409; 1056.91; 1')
({u'timestamp': u'1488476971402', u'Severity': u'4', u'Facility': u'3'}, u'<28>[2017-03-03 00:49:31.411000]; 18; 6669529; 2868.7; 56')
({u'timestamp': u'1488476971433', u'Severity': u'4', u'Facility': u'3'}, u'<28>[2017-03-03 00:49:31.442000]; 20; 7823207; 791.02; 15')
...

-------------------------------------------
Time: 2017-03-03 00:49:37
-------------------------------------------

Then my application (apparently) enters an endless loop, probably waiting for something. The screen just hangs like this:

-------------------------------------------
Time: 2017-03-03 00:59:34
-------------------------------------------
Spark jobs without Spark Streaming work fine on YARN as well as in local mode.

My infrastructure:

  • First node: JobHistoryServer and ResourceManager; the Flume agent also runs here

  • Second and third nodes: NodeManagers

I launch the Spark Streaming application from node 2. From the logs, I can see that the connection between node 3 and the ResourceManager works fine.
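For reference, a YARN-mode submission would look roughly like this (the resource numbers and the script name `streaming_test.py` are hypothetical, and the `setMaster('local[*]')` call would have to be dropped from the script so that `--master yarn` takes effect):

```shell
# Hypothetical spark-submit for running the script above on YARN.
# The Flume integration jar must be shipped with the job, e.g. via --packages.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --num-executors 3 \
  --executor-cores 2 \
  --executor-memory 1g \
  --packages org.apache.spark:spark-streaming-flume_2.11:2.1.0 \
  streaming_test.py
```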


Any ideas? Any help would be greatly appreciated! Thanks.

The problem was solved by adding one more node to the cluster and running with this configuration:

conf = SparkConf().setAppName("Test").set("spark.executor.memory", "1g").set("spark.driver.memory", "2g")
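One plausible explanation for why adding a node helped: a receiver-based stream such as `FlumeUtils.createPollingStream` pins one executor core per receiver for the lifetime of the job, so if YARN grants too few total executor cores, no cores remain for batch processing and the application appears to hang while batches queue forever. A minimal sketch of that capacity rule (the helper function and the numbers are hypothetical, not part of the Spark API):

```python
def has_processing_capacity(total_executor_cores: int, num_receivers: int) -> bool:
    """Each streaming receiver permanently occupies one executor core;
    at least one additional core must remain free to process batches."""
    return total_executor_cores > num_receivers

# With 2 executor cores and 2 receivers, every core is pinned by a
# receiver, so no batch is ever processed and the app seems stuck:
print(has_processing_capacity(2, 2))  # False
# Adding a node (more cores) leaves spare capacity for processing:
print(has_processing_capacity(4, 2))  # True
```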