Python 2.7: how to get records as strings from socketTextStream
I am trying to read each record from a socket stream, and I want each record to be a string holding one line. How can I write this in Python? Thanks.
Tags: python-2.7, apache-spark, spark-streaming
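# Define the handler first so the name exists before it is referenced below.
def processRecord(record):
    # Each record delivered by socketTextStream is one line of text.
    print("test")
    # ...

model = pipeline.PipelineModel.read().load(model_path)
sc = spark.sparkContext
ssc = StreamingContext(sc, 1)
lines = ssc.socketTextStream(sys.argv[1], int(sys.argv[2]))
if lines is not None:
    # Run processRecord on every record of every batch RDD.
    lines.foreachRDD(lambda rdd: rdd.foreach(processRecord))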
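For the string question itself, here is a minimal sketch of a handler that normalises each record to text before using it. It assumes records may arrive either as byte strings or as already-decoded text, and that UTF-8 is the right encoding; the helper name `as_text` is hypothetical and not part of PySpark.

```python
from __future__ import print_function


def as_text(record, encoding="utf-8"):
    """Return the record as a text string, decoding byte strings if needed."""
    # Assumption: anything that is not a byte string is already text.
    if isinstance(record, bytes):
        return record.decode(encoding)
    return record


def processRecord(record):
    # Normalise the record and drop a trailing line ending, if there is one.
    line = as_text(record).rstrip("\r\n")
    print(line)
    return line
```

Since it only touches built-in types, the handler can be tried on plain values before it is passed to `rdd.foreach`.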
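Thanks. The records are not of string type. I have added more code to the question; please check what is wrong with it. Thanks.
The function processRecord is called before it is defined.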
from __future__ import print_function
import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

if __name__ == "__main__":
    sc = SparkContext(appName="Demo")
    ssc = StreamingContext(sc, 1)  # 1-second batch interval
    # record = ssc.socketTextStream("localhost", 9999)
    record = ssc.socketTextStream(sys.argv[1], int(sys.argv[2]))
    # print out each single word
    record.flatMap(lambda line: line.split(" ")).pprint()
    # start streaming
    ssc.start()
    # block until the streaming context is stopped or fails
    ssc.awaitTermination()
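The splitting step can be checked without a running stream by applying the same lambda to an ordinary list of lines. This is only a sketch: `flat_map` is a hypothetical stand-in for the flattening that `flatMap` performs, and the sample lines are made up.

```python
from __future__ import print_function


def flat_map(func, records):
    """Apply func to each record and flatten the results into one list."""
    out = []
    for record in records:
        out.extend(func(record))
    return out


# Same function that is passed to flatMap in the streaming job above.
split_line = lambda line: line.split(" ")

# Made-up sample batch: each element plays the role of one record (one line).
batch = ["hello spark streaming", "one record per line"]
words = flat_map(split_line, batch)
print(words)
```

Each element of `words` is an individual string, which is what `pprint()` shows for every batch in the streaming version.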