Spark Streaming not working


I have a basic Spark Streaming word count, and it just doesn't work.

import sys
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName='streaming', master="local[*]")
scc = StreamingContext(sc, batchDuration=5)

lines = scc.socketTextStream("localhost", 9998)
words = lines.flatMap(lambda line: line.split())
counts = words.map(lambda word: (word, 1)).reduceByKey(lambda x, y: x + y)

counts.pprint()

print('Listening')
scc.start()
scc.awaitTermination()

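In another terminal I ran

nc -lk 9998

and pasted some text. It prints the typical logs (no exceptions), but it ends up queuing the jobs for some weird amount of time (45 yrs) and keeps printing this log: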

15/06/19 18:53:30 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:874
15/06/19 18:53:30 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (PythonRDD[7] at RDD at PythonRDD.scala:43)
15/06/19 18:53:30 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
15/06/19 18:53:35 INFO JobScheduler: Added jobs for time 1434754415000 ms
15/06/19 18:53:40 INFO JobScheduler: Added jobs for time 1434754420000 ms
15/06/19 18:53:45 INFO JobScheduler: Added jobs for time 1434754425000 ms
...
...

What am I doing wrong?

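Spark Streaming requires multiple executors to work. Try using local[4] as the master. The socket receiver permanently occupies one thread, so in local mode the master needs at least two threads; otherwise nothing is left to process the received batches, and the jobs are only ever queued.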
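As a rough sketch of that suggestion, the word count from the question could be run with local[4] and a small sanity check on the master URL. The helpers local_thread_count and has_enough_threads are hypothetical names introduced here, not part of PySpark, and they only understand the plain forms local, local[N] and local[*]. The Spark documentation describes local[*] as one worker thread per logical core, so on a machine, VM or container that exposes a single core it may amount to local[1]; that could explain the behavior in the question, though nothing in this thread confirms it.

```python
import multiprocessing
import re


def local_thread_count(master):
    """Return the number of worker threads a local master URL asks for.

    Hypothetical helper, not part of PySpark. Handles "local", "local[N]"
    and "local[*]" only; returns None for any other URL (e.g. a cluster).
    """
    if master == "local":
        return 1
    match = re.match(r"^local\[(\*|\d+)\]$", master)
    if match is None:
        return None
    if match.group(1) == "*":
        # local[*] means one thread per logical core; approximated here
        # with Python's own view of the core count (an assumption).
        return multiprocessing.cpu_count()
    return int(match.group(1))


def has_enough_threads(master, num_receivers=1):
    """True if at least one thread is left over to process batches."""
    threads = local_thread_count(master)
    # Unknown (non-local) masters are not checked by this sketch.
    return threads is None or threads > num_receivers


def main():
    # PySpark is imported here so the helpers above work without it.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    master = "local[4]"  # as suggested in the answer
    if not has_enough_threads(master):
        raise SystemExit("master %r leaves no thread for processing" % master)

    sc = SparkContext(appName="streaming", master=master)
    ssc = StreamingContext(sc, batchDuration=5)

    # One socket receiver: it holds one of the four threads for its lifetime.
    lines = ssc.socketTextStream("localhost", 9998)
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda x, y: x + y))
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()
```

Calling main() starts the job; with nc -lk 9998 running in another terminal, each 5-second batch should then be processed instead of only being queued. The check is deliberately conservative: it only guards against the case where every local thread is taken by a receiver.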

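Thanks. I thought local[*] would automatically allocate executors based on the cores available locally? (It did solve my problem, but I'm curious why local[*] doesn't work.)

It's not clear from the documentation, but I think running with local[*] is similar to local, which creates only one thread for the receiver and none for the executors.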