'DataFrame' object is not callable in PySpark
I want the names of employees whose salary is above their department's average salary in PySpark. My attempt:
filt = df3.select('SALARY','Dept_name','First_name','Last_name')
filt.filter(filt('SALARY').geq(filt.groupBy('Dept_name').agg(F.mean('SALARY')))).show()
First, create a sample DataFrame:
from pyspark.sql import functions as F
from pyspark.sql.window import Window
data = [[200, 'Marketing', 'Jane', 'Smith'],
        [140, 'Marketing', 'Jerry', 'Soreky'],
        [120, 'Marketing', 'Justin', 'Sauren'],
        [170, 'Sales', 'Joe', 'Statham'],
        [190, 'Sales', 'Jeremy', 'Sage'],
        [220, 'Sales', 'Jay', 'Sawyer']]
columns = ['SALARY', 'Dept_name', 'First_name', 'Last_name']
df = spark.createDataFrame(data, columns)
df.show()
+------+---------+----------+---------+
|SALARY|Dept_name|First_name|Last_name|
+------+---------+----------+---------+
| 200|Marketing| Jane| Smith|
| 140|Marketing| Jerry| Soreky|
| 120|Marketing| Justin| Sauren|
| 170| Sales| Joe| Statham|
| 190| Sales| Jeremy| Sage|
| 220| Sales| Jay| Sawyer|
+------+---------+----------+---------+
Then build the query: compute each department's average salary with a window function and keep only the rows whose salary exceeds it:
w = Window.partitionBy("Dept_name")
df.withColumn("Average_Salary", F.avg("SALARY").over(w)) \
  .filter(F.col("SALARY") > F.col("Average_Salary")) \
  .select("SALARY", "Dept_name", "First_name", "Last_name") \
  .show()
+------+---------+----------+---------+
|SALARY|Dept_name|First_name|Last_name|
+------+---------+----------+---------+
| 220| Sales| Jay| Sawyer|
| 200|Marketing| Jane| Smith|
+------+---------+----------+---------+