Displaying Markdown with PySpark - Fatal编程技术网



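My DataFrame has two columns with many distinct values (ethnicity and status). For each column I want to find the least and most frequent values and display them neatly, basically like this: LeastFreq ethnicity (occurrences), MostFreq ethnicity (occurrences), LeastFreq status (occurrences), MostFreq status (occurrences).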
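To make the target layout concrete, here is a minimal plain-Python sketch that uses `collections.Counter` as a local stand-in for the Spark `groupBy(...).agg(count(...))` step and then builds the one-row Markdown table. The helper names `least_and_most` and `build_table` and the sample rows are hypothetical. Ties are resolved by first occurrence, which is not guaranteed to match what Spark's `orderBy(...).first()` returns.

```python
from collections import Counter


def least_and_most(values):
    """Return ((least_value, count), (most_value, count)) for an iterable.

    Local stand-in for groupBy(col).agg(count(lit(1))) followed by
    orderBy(asc/desc).first(); ties go to the value seen first.
    """
    counts = Counter(values)
    if not counts:
        raise ValueError("no values to count")
    items = list(counts.items())  # insertion order = first occurrence
    least = min(items, key=lambda kv: kv[1])
    most = max(items, key=lambda kv: kv[1])
    return least, most


def build_table(headers, cells):
    """Render a one-row Markdown table from header labels and cell strings."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "----|" * len(headers),
        "| " + " | ".join(cells) + " |",
    ]
    return "\n".join(lines)


# Hypothetical sample rows: (ethnicity, status)
rows = [("asian", "single"), ("white", "single"), ("white", "married"),
        ("black", "single"), ("white", "single")]

least_eth, most_eth = least_and_most(r[0] for r in rows)
least_st, most_st = least_and_most(r[1] for r in rows)

headers = ["leastFreqEthnicity", "mostFreqEthnicity", "leastFreqStatus", "mostFreqStatus"]
# One %s for the value and one %d for its count in every cell
cells = ["%s (%d occurrences)" % pair for pair in (least_eth, most_eth, least_st, most_st)]
table = build_table(headers, cells)
print(table)
```

In a Jupyter notebook, `display(Markdown(table))` from `IPython.display` renders the string as a table.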

Here is my code, but it fails with `TypeError: %d format: a number is required, not str`:

```python
from pyspark.sql.functions import col, count, lit
from IPython.display import display, Markdown

print("Most and least frequent occurrences for ethnicity and status columns:")
# Count rows per distinct value
ethnicDF = datingDF.groupBy("ethnicity").agg(count(lit(1)).alias("Total"))
statusDF = datingDF.groupBy("status").agg(count(lit(1)).alias("Total"))

# Sort by count and take the first row of each ordering
leastFreqEthnicity = ethnicDF.orderBy(col("Total").asc()).first()
mostFreqEthnicity  = ethnicDF.orderBy(col("Total").desc()).first()
leastFreqStatus    = statusDF.orderBy(col("Total").asc()).first()
mostFreqStatus     = statusDF.orderBy(col("Total").desc()).first()

# The TypeError is raised by the cell expressions below: each format string
# has a single %d but receives two arguments, and the first one is a str.
display(Markdown("""
| %s | %s | %s | %s |
|----|----|----|----|
| %s | %s | %s | %s |
""" % ("leastFreqEthnicity", "mostFreqEthnicity", "leastFreqStatus", "mostFreqStatus",
       " (%d occurrences)" % (leastFreqEthnicity["ethnicity"], leastFreqEthnicity["Total"]),
       " (%d occurrences)" % (mostFreqEthnicity["ethnicity"], mostFreqEthnicity["Total"]),
       " (%d occurrences)" % (leastFreqStatus["status"], leastFreqStatus["Total"]),
       " (%d occurrences)" % (mostFreqStatus["status"], mostFreqStatus["Total"]))))
```

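The error comes from the cell expressions rather than from Spark: `" (%d occurrences)"` has one `%d` placeholder but is given two arguments, and the first (the ethnicity or status name) is a string, so the `%d` conversion fails. A minimal plain-Python sketch reproduces this and shows one possible fix, adding a `%s` for the name. The `row` dict is a hypothetical stand-in for a Spark `Row`, whose fields are read by name the same way (`row["Total"]`).

```python
# Hypothetical stand-in for a Spark Row such as leastFreqEthnicity
row = {"ethnicity": "asian", "Total": 1}

# Original cell format: a single %d, but two arguments (a str, then an int)
error_message = None
try:
    " (%d occurrences)" % (row["ethnicity"], row["Total"])
except TypeError as exc:
    error_message = str(exc)  # the %d conversion rejects the str

# One placeholder per argument: %s for the name, %d for the count
cell = "%s (%d occurrences)" % (row["ethnicity"], row["Total"])

# Equivalent f-string version
cell_f = f"{row['ethnicity']} ({row['Total']} occurrences)"

print(error_message)
print(cell)
```

Using the `"%s (%d occurrences)"` pattern for each of the four cells in the `display(Markdown(...))` call should render the table as intended.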
If you add a schema, you may need to cast the "Total" value to `IntegerType`.