PySpark prediction with a Kafka direct stream

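I am trying to pull Kafka data into Spark Streaming, load an already-built model from HDFS, and then use the Kafka messages to make predictions.

I have tried several approaches, but I am stuck at model.predict with a TypeError: the type cannot be converted into Vector.

The data received from Kafka is a string of comma-separated floats.

Here is my code: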

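from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LinearRegressionModel

sc = SparkContext(appName="PythonStreamingKafkaForecast")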
ssc = StreamingContext(sc, 10)

# Create stream to get kafka messages
directKafkaStream = KafkaUtils.createDirectStream(ssc, ["my_topic"], {"metadata.broker.list": "kafka_ip"})

features = directKafkaStream.foreachRDD(lambda rdd: rdd.map(lambda s: Vectors.dense(s[1].split(","))))

model = LinearRegressionModel.load(sc, "hdfs://hadoop_ip/model.model")

#Predict
predicted = model.predict(features)
I also tried:

lines = directKafkaStream.map(lambda x: x[1])
features = lines.map(lambda data: Vectors.dense([float(c) for c in data.split(',')]))
But this time the type of features is TransformedDStream, which does not work with model.predict either.

Can you tell me what I am doing wrong?

Thanks for your help.
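
For context, both attempts fail for a type reason: `foreachRDD` returns `None`, and `map` on a DStream returns another DStream, whereas `model.predict` expects a single vector or an RDD of vectors. The parsing step itself can be checked without Spark. The following is only a minimal sketch, assuming each Kafka record arrives as a (key, value) tuple whose value is a comma-separated string of floats; `parse_record` is a hypothetical helper name that does not appear in the original post.

```python
def parse_record(record):
    """Turn one Kafka (key, value) record into a list of floats.

    Returns None for missing or blank values so callers can filter them out.
    `parse_record` is a hypothetical helper used only for illustration.
    """
    _key, value = record
    if value is None or not value.strip():
        return None
    # float() tolerates surrounding whitespace, so "1.0, 2.5" also parses
    return [float(field) for field in value.split(",")]


if __name__ == "__main__":
    print(parse_record((None, "1.0,2.5,3")))  # [1.0, 2.5, 3.0]
```

In the streaming job, the resulting list could then be wrapped with `Vectors.dense(...)` inside `rdd.map`.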

OK, the problem was that I was trying to read data from Kafka even when the topic was empty.

This solved my problem:

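def predict(rdd):
    # Skip batches that contain no Kafka messages
    count = rdd.count()
    if count > 0:
        # s is a (key, value) pair; the value holds comma-separated floats
        features = rdd.map(lambda s: Vectors.dense(s[1].split(",")))
        return features
    else:
        print("No data received")

directKafkaStream.foreachRDD(lambda rdd: predict(rdd))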
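
Note that `foreachRDD` ignores the callback's return value, so the prediction itself still has to happen inside the callback. Below is a sketch of one way to do that, not the poster's final code. It assumes `model` is the loaded `LinearRegressionModel` and that `to_vector` is `Vectors.dense` in the real job; `make_batch_handler`, `FakeRDD` and `SumModel` are hypothetical names introduced here. The handler uses only `RDD.isEmpty()`, `RDD.map()`, `RDD.collect()` and `model.predict(rdd)`, so small stand-in classes are enough to run it without a cluster.

```python
def make_batch_handler(model, to_vector, sink):
    """Build a foreachRDD callback that predicts on each non-empty batch.

    model     -- object whose predict() accepts an RDD of vectors
    to_vector -- callable turning a list of floats into a vector
                 (Vectors.dense in a real job; an assumption of this sketch)
    sink      -- callable receiving the list of predictions for one batch
    """
    def handle(rdd):
        if rdd.isEmpty():  # skip batches with no Kafka messages
            print("No data received")
            return
        features = rdd.map(
            lambda kv: to_vector([float(x) for x in kv[1].split(",")])
        )
        # collect() is acceptable for small batches only
        sink(model.predict(features).collect())
    return handle


# Stand-ins so the sketch runs without Spark; these are NOT pyspark classes.
class FakeRDD:
    def __init__(self, items):
        self.items = list(items)

    def isEmpty(self):
        return not self.items

    def map(self, f):
        return FakeRDD(f(x) for x in self.items)

    def collect(self):
        return list(self.items)


class SumModel:
    """Toy model: the 'prediction' is simply the sum of the features."""
    def predict(self, rdd):
        return rdd.map(sum)


results = []
handler = make_batch_handler(SumModel(), list, results.append)
handler(FakeRDD([(None, "1.0,2.0"), (None, "0.5,0.5")]))
handler(FakeRDD([]))  # prints "No data received"
print(results)  # [[3.0, 1.0]]
```

In the real job this would presumably be wired up as `directKafkaStream.foreachRDD(make_batch_handler(model, Vectors.dense, print))`.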