Python Spark Streaming reduceByWindow: need mean, median, max, standard deviation, IQR


python apache-spark mapreduce


I am sending data to Spark Streaming (Python) over a TCP socket.

I use a windowed stream with windowLength = 4 seconds and slideInterval = 2 seconds.
Within one window, my RDD looks like this:

[1,2,3,4]    
[2,2,2,2]    
[5,6,7,8]    
[1,2,1,1]    
[8,7,6,5]

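How can I find the mean, median, max, standard deviation, and IQR of the "corresponding" values, that is, of each column?
For example: mean = [(1+2+5+1+8)/5, (2+2+6+2+7)/5, (3+2+7+1+6)/5, (4+2+8+1+5)/5]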
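
The target result can be checked by hand on the sample window. Below is a minimal sketch in plain Python (no Spark involved), assuming the five rows shown above are the full contents of one window; the names rows, columns, and column_means are illustrative only.

```python
# Sample window from the question: five rows of four values each.
rows = [
    [1, 2, 3, 4],
    [2, 2, 2, 2],
    [5, 6, 7, 8],
    [1, 2, 1, 1],
    [8, 7, 6, 5],
]

# zip(*rows) transposes the rows, so each tuple holds one column
# (the "corresponding" values).
columns = list(zip(*rows))

# Mean of each column: sum of the column divided by the number of rows.
column_means = [sum(col) / len(col) for col in columns]

print(column_means)  # [3.4, 3.8, 3.8, 4.0]
```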

My code so far:

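import numpy as np
from pyspark import SparkContext
from pyspark.streaming import StreamingContext


def importData():
    sc = SparkContext(appName="test1")
    ssc = StreamingContext(sc, 2)  # 2-second batch interval
    RowsData = ssc.socketTextStream("localhost", 9999)
    RowsData = RowsData.map(lambda x: x.split(","))
    # Parse each comma-separated line into a list of ints
    RowsDataLIST = RowsData.map(lambda mylist: [int(strTono) for strTono in mylist])
    print("Print the Rows Data List with windows 4,2")

    # windowLength = 4 seconds, slideInterval = 2 seconds
    RowsDataLIST = RowsDataLIST.window(4, 2)
    # Averages two rows at a time; this is the step that gives the wrong result
    TheMean = RowsDataLIST.reduce(lambda x, y: list(map(np.mean, zip(x, y))))

    TheMean.pprint()

    ssc.start()
    ssc.awaitTermination()


def main():
    importData()


if __name__ == "__main__":
    main()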

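The output for the mean is [1.0, 0.75, 4.0, 2.25], which is obviously wrong. I understand that .reduce(lambda x, y: ...) takes two rows at a time and averages them. However, if I need the mean of all corresponding elements across the RDD within the window, what approach should I take?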
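
The pairwise approach fails because averaging two rows at a time is not associative: every newly combined row gets weight 1/2, earlier rows are progressively discounted, and the result depends on the order in which rows are combined. The sketch below uses functools.reduce as a stand-in for the RDD reduce and (a + b) / 2 in place of np.mean. Spark may combine rows in a different order across partitions, so the exact wrong numbers it produces can differ from these.

```python
from functools import reduce

rows = [
    [1, 2, 3, 4],
    [2, 2, 2, 2],
    [5, 6, 7, 8],
    [1, 2, 1, 1],
    [8, 7, 6, 5],
]


def pairwise_mean(x, y):
    """Element-wise mean of two rows, as in the reduce lambda of the question."""
    return [(a + b) / 2 for a, b in zip(x, y)]


# Folding the same rows in two different orders gives two different answers.
left_to_right = reduce(pairwise_mean, rows)
right_to_left = reduce(pairwise_mean, list(reversed(rows)))

# The actual column means, for comparison.
true_mean = [sum(col) / len(col) for col in zip(*rows)]

print(left_to_right)  # [5.0625, 5.0, 4.4375, 4.125]
print(right_to_left)  # [2.1875, 2.8125, 3.3125, 3.875]
print(true_mean)      # [3.4, 3.8, 3.8, 4.0]
```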

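One way is to take the sum and divide by the count, but I would like to know whether there is a different way.
Also, how can I compute the other statistics for the corresponding elements of the lists?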
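
The sum-and-count idea works with a reduce because element-wise addition, unlike averaging, is associative and commutative. A hedged sketch in plain Python follows; add_pairs is an illustrative helper, not part of any library. On the windowed DStream the same function could be used by first mapping each row to (row, 1) and then reducing with add_pairs.

```python
from functools import reduce

rows = [
    [1, 2, 3, 4],
    [2, 2, 2, 2],
    [5, 6, 7, 8],
    [1, 2, 1, 1],
    [8, 7, 6, 5],
]


def add_pairs(a, b):
    """Combine two (column_sums, row_count) pairs element-wise."""
    sums_a, count_a = a
    sums_b, count_b = b
    return [x + y for x, y in zip(sums_a, sums_b)], count_a + count_b


# Tag every row with a count of 1, then fold the pairs together.
total_sums, total_count = reduce(add_pairs, [(row, 1) for row in rows])

# Divide once at the end, after all rows have been summed.
column_means = [s / total_count for s in total_sums]

print(total_sums, total_count)  # [17, 19, 19, 20] 5
print(column_means)             # [3.4, 3.8, 3.8, 4.0]
```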
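
For the other statistics, an exact median or IQR needs all values of a column at once, so they cannot be built up by a simple pairwise reduce the way a sum can. One possible approach, sketched below under the assumption that a window is small enough to collect to the driver, is to gather the rows of each window and compute the statistics per column with the standard statistics module. The names column_stats and print_window_stats are hypothetical helpers. The standard deviation here is the population form and the quartiles use the "inclusive" method; both are assumptions, since the question does not say which definitions are wanted.

```python
import statistics


def column_stats(rows):
    """Per-column statistics for a list of equal-length numeric rows (needs >= 2 rows)."""
    stats = {"mean": [], "median": [], "max": [], "std": [], "iqr": []}
    for col in zip(*rows):  # zip(*rows) yields one tuple per column
        # Quartiles by linear interpolation between data points ("inclusive" method).
        q1, _, q3 = statistics.quantiles(col, n=4, method="inclusive")
        stats["mean"].append(sum(col) / len(col))
        stats["median"].append(statistics.median(col))
        stats["max"].append(max(col))
        stats["std"].append(statistics.pstdev(col))  # population std (assumption)
        stats["iqr"].append(q3 - q1)
    return stats


def print_window_stats(windowed_stream):
    """Hypothetical wiring: print the statistics for every window of a DStream of rows."""
    def handle(rdd):
        rows = rdd.collect()  # assumes the window fits in driver memory
        if len(rows) >= 2:
            print(column_stats(rows))

    windowed_stream.foreachRDD(handle)


rows = [
    [1, 2, 3, 4],
    [2, 2, 2, 2],
    [5, 6, 7, 8],
    [1, 2, 1, 1],
    [8, 7, 6, 5],
]
stats = column_stats(rows)
print(stats["median"])  # [2, 2, 3, 4]
print(stats["max"])     # [8, 7, 7, 8]
```

In the code from the question, print_window_stats would be called on the stream returned by window(4, 2), before ssc.start().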

I am new to Spark Streaming; please guide me.
