
PySpark DataFrame floor division: unsupported operand type


My dataset looks like this:

I am grouping by age and averaging the number of friends for each age group:

from pyspark.sql import SparkSession
from pyspark.sql import Row
import pyspark.sql.functions as F

def parseInput(line):
    # Pull age (column 2) and number of friends (column 3) out of a CSV line
    fields = line.split(',')
    return Row(age = int(fields[2]), numFriends = int(fields[3]))

spark = SparkSession.builder.appName("FriendsByAge").getOrCreate()
lines = spark.sparkContext.textFile("data/fakefriends.csv")
friends = lines.map(parseInput)
friendDataset = spark.createDataFrame(friends)
counts = friendDataset.groupBy("age").count()
total = friendDataset.groupBy("age").sum('numFriends')
# '//' between two Columns is what raises the TypeError below
res = total.join(counts, "age") \
    .withColumn("Friend By Age", (F.col("sum(numFriends)") // F.col("count"))) \
    .drop('sum(numFriends)', 'count')
I get the following error:

TypeError: unsupported operand type(s) for //: 'Column' and 'Column'
Normally I use `//` in Python 3 and it returns an integer value, which is what I expected here. In a PySpark DataFrame, however, `//` does not work; only `/` does. Is there a reason it is not supported? Do we have to use a rounding function to get an integer value?
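The error itself is just Python's operator protocol at work: `a // b` calls `a.__floordiv__(b)`, and (at least in the PySpark versions where this fails) `Column` does not define that method, while it does define `__truediv__` for `/`. A minimal plain-Python sketch, no Spark required (`FakeColumn` is a made-up illustration class, not part of any real API):

```python
# Python translates `a // b` into a.__floordiv__(b). A class that
# overloads `/` but not `//` raises the same TypeError pyspark's
# Column does.
class FakeColumn:
    def __init__(self, name):
        self.name = name

    def __truediv__(self, other):
        # `/` is supported: build a new "expression" column
        return FakeColumn(f"({self.name} / {other.name})")

a, b = FakeColumn("sum"), FakeColumn("count")
print((a / b).name)   # works: (sum / count)
try:
    a // b            # no __floordiv__ defined
except TypeError as e:
    print(e)          # unsupported operand type(s) for //: ...
```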

`//` (floor division) is not supported on pyspark Columns. Try the alternative below:

counts = friendDataset.groupBy("age").count()
total = friendDataset.groupBy("age").agg(F.sum('numFriends').alias('sum'))
res = total.join(counts, "age") \
    .withColumn("Friend By Age", F.floor(F.col("sum") / F.col("count"))) \
    .drop('sum', 'count')

Not sure of the reason, but you can cast to int or use the floor function:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
tst = spark.createDataFrame([(1, 7, 9), (1, 8, 4), (1, 5, 10), (5, 1, 90), (7, 6, 18), (0, 3, 11)],
                            schema=['col1', 'col2', 'col3'])
tst1 = tst.withColumn("div", (F.col('col1') / F.col('col2')).cast('int'))
tst2 = tst.withColumn("div", F.floor(F.col('col1') / F.col('col2')))
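One caveat when choosing between the two: casting to int truncates toward zero, while `floor` rounds toward negative infinity, so the results differ whenever the quotient is negative (they agree for non-negative values, such as friend counts here). A plain-Python sketch of the same distinction, since `int()` truncates and `math.floor` rounds down just like the two Spark variants are expected to:

```python
import math

q = -7 / 2              # -3.5
print(int(q))           # -3: truncation toward zero, like cast('int')
print(math.floor(q))    # -4: rounding down, like F.floor
```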