PySpark: getting only the minimum values (Python, Apache Spark, PySpark, RDD)


I only want to get the minimum values.

import pyspark as ps

spark = ps.sql.SparkSession.builder.master('local[4]')\
    .appName('some-name-here').getOrCreate()
sc = spark.sparkContext

sc.textFile('path-to.csv')\
    .map(lambda x: x.replace('"', '').split(','))\
    .filter(lambda x: not x[0].startswith('player_id'))\
    .map(lambda x: (x[2] + " " + x[1], int(x[8]) if x[8] else 0))\
    .reduceByKey(lambda value1, value2: value1 + value2)\
    .sortBy(lambda price: price[1], ascending=True).collect()
This is what I get:

[('Cedric Ceballos', 0), ('Maurcie Cheeks', 0), ('James Foster', 0), ('Billy Gabor', 0), ('Julius Keye', 0), ('Anthony Mason', 0), ('Chuck Noble', 0), ('Theo Ratliff', 0), ('Austin Carr', 0), ('Mark Eaton', 0), ('A.C.Green', 0), ('Darrall Imhoff', 0), ('John Johnson', 0), ('Neil Johnson', 0), ('Jim King', 0), ('Max Zaslofsky', 1), ('Don Barksdale', 1), ('Curtis Rowe', 1), ('Caron Butler', 2), ('Chris Gatling', 2)]


As you can see, many keys have the value 0, which is the minimum. How can I keep only those entries?

You can collect the minimum value into a variable and then filter for equality against it:

rdd = sc.textFile('path-to.csv')\
    .map(lambda x: x.replace('"', '').split(','))\
    .filter(lambda x: not x[0].startswith('player_id'))\
    .map(lambda x: (x[2] + " " + x[1], int(x[8]) if x[8] else 0))\
    .reduceByKey(lambda value1, value2: value1 + value2)\
    .sortBy(lambda price: price[1], ascending=True)

minval = rdd.take(1)[0][1]
rdd2 = rdd.filter(lambda x: x[1] == minval)
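
Sorting is not strictly required to find the minimum. As a minimal sketch, assuming the input is a non-empty pair RDD like the output of reduceByKey above, the minimum can be read with values().min() and then used in the same equality filter. The helper name keep_minimum is hypothetical and not part of PySpark.

```python
def keep_minimum(pairs):
    """Return only the (key, value) pairs whose value is the smallest one.

    `pairs` is assumed to be a non-empty pair RDD, e.g. the result of
    reduceByKey. The function name is illustrative, not a PySpark API.
    """
    # values() drops the keys; min() is an action that returns the
    # smallest value to the driver without sorting the whole RDD.
    minval = pairs.values().min()

    # minval is a plain Python number, so it travels to the executors
    # inside the lambda's closure.
    return pairs.filter(lambda kv: kv[1] == minval)
```

With the rdd from the answer, keep_minimum(rdd).collect() would return every player whose total equals the minimum. The sortBy step can be dropped in that case unless the sorted order is needed elsewhere.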

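Your data is already sorted. Use take(1) instead of collect() to get the first element, which is the minimum value.

After take() returns the value to the driver, does it make sense to broadcast minval before the filter?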