Displaying Markdown with PySpark


pyspark markdown


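My DataFrame has two columns with many unique values (ethnicity, status). For each column I want to find the least and most frequent value and display the results neatly, roughly like this: LeastFreq ethnicity (occurrences), MostFreq ethnicity (occurrences), LeastFreq status (occurrences), MostFreq status (occurrences).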

Here is my code, but it doesn't work; it fails with `TypeError: %d format: a number is required, not str`:

```python
from IPython.display import Markdown, display
from pyspark.sql.functions import col, count, lit

print("Most and least frequent occurrences for ethnicity and status columns:")
# Count how many rows share each distinct value (datingDF is the existing DataFrame)
ethnicDF = datingDF.groupBy("ethnicity").agg(count(lit(1)).alias("Total"))
statusDF = datingDF.groupBy("status").agg(count(lit(1)).alias("Total"))

# first() returns a single Row holding the value and its count
leastFreqEthnicity = ethnicDF.orderBy(col("Total").asc()).first()
mostFreqEthnicity = ethnicDF.orderBy(col("Total").desc()).first()
leastFreqStatus = statusDF.orderBy(col("Total").asc()).first()
mostFreqStatus = statusDF.orderBy(col("Total").desc()).first()

# This call raises: TypeError: %d format: a number is required, not str
display(Markdown("""
| %s | %s | %s | %s |
|----|----|----|----|
| %s | %s | %s | %s |
""" % ("leastFreqEthnicity", "mostFreqEthnicity", "leastFreqStatus", "mostFreqStatus",
       " (%d occurrences)" % (leastFreqEthnicity["ethnicity"], leastFreqEthnicity["Total"]),
       " (%d occurrences)" % (mostFreqEthnicity["ethnicity"], mostFreqEthnicity["Total"]),
       " (%d occurrences)" % (leastFreqStatus["status"], leastFreqStatus["Total"]),
       " (%d occurrences)" % (mostFreqStatus["status"], mostFreqStatus["Total"]))))
```
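The error appears to come from the four cell formats: `" (%d occurrences)"` contains a single `%d` placeholder but is given two values, and the first of them (the ethnicity or status name) is a string. A minimal plain-Python sketch of the failure and of one possible fix, adding a `%s` for the value in front of the `%d` for the count (the value `"asian"` and the count `12` are made-up sample data):

```python
# One placeholder, two values: "%d" receives the string first and fails.
error = None
try:
    " (%d occurrences)" % ("asian", 12)
except TypeError as exc:
    error = str(exc)

print(error)  # %d format: a number is required, not str

# Possible fix: "%s" for the value, "%d" for the count.
fixed = "%s (%d occurrences)" % ("asian", 12)
print(fixed)  # asian (12 occurrences)
```

In the question's code the same change would apply to all four cells, e.g. `"%s (%d occurrences)" % (leastFreqEthnicity["ethnicity"], leastFreqEthnicity["Total"])`.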

If you add a schema, you may need to cast the "Total" values to IntegerType.
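Writing the Markdown table template by hand makes it easy for the placeholders and the values to drift apart. A sketch of a small helper that builds the header row, separator and value row from a list of `(header, row, key)` entries; `freq_cell` and `freq_table` are hypothetical names, and plain dicts stand in for the Spark `Row` objects returned by `first()` (a `Row` supports the same `row["Total"]` lookup). The values `group_a`/`group_b` and their counts are made up:

```python
def freq_cell(row, key):
    """Format one cell as 'value (N occurrences)' from a row with a 'Total' count."""
    return "%s (%d occurrences)" % (row[key], row["Total"])


def freq_table(entries):
    """Build a one-row Markdown table from (header, row, key) entries."""
    headers = [header for header, _, _ in entries]
    cells = [freq_cell(row, key) for _, row, key in entries]
    return "\n".join([
        "| " + " | ".join(headers) + " |",
        "|" + "----|" * len(headers),  # one separator cell per column
        "| " + " | ".join(cells) + " |",
    ])


# Hypothetical dicts standing in for the Rows returned by first().
least_eth = {"ethnicity": "group_a", "Total": 3}
most_eth = {"ethnicity": "group_b", "Total": 250}

table = freq_table([
    ("leastFreqEthnicity", least_eth, "ethnicity"),
    ("mostFreqEthnicity", most_eth, "ethnicity"),
])
print(table)
```

In a notebook, the string would then be passed to `display(Markdown(table))`, with the two status rows added as further entries.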