Python: plotting a bar chart with an additional column that contains many categories

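I have a PySpark DataFrame with 3 columns: Violation_Location, Violation_Code, and Ticket_Frequency. However, both the Violation_Code and Violation_Location columns contain many categories (more than 100 each).

I want to get the top 10 violation locations and violation codes, ranked by ticket frequency.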

Precint = spark.sql("SELECT Violation_Location, Violation_Code, Count(*) as Ticket_Frequency from table_view2 group by Violation_Location, Violation_Code order by Ticket_Frequency desc")
Precint.show()

So far I have only been able to plot the top 10 violation locations by ticket frequency. Any help would be appreciated, thanks.


import matplotlib.pyplot as plt

# Convert the aggregated Spark result to pandas for plotting
precintplot = Precint.toPandas()

# Remove rows with a missing Violation_Location first
precintplotnomiss = precintplot.dropna(subset=['Violation_Location'])

# DataFrame.plot creates its own figure, so the size is passed here
# rather than through a separate plt.figure() call
precintplotnomiss.head(10).plot(x='Violation_Location', y='Ticket_Frequency', kind='bar', figsize=(10, 6))
plt.title("Violations by Precinct (top 10)")
plt.xlabel('Precinct')
plt.ylabel('Ticket Frequency')

plt.show()


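Maybe I missed it, but what is your question?

So far I have plotted the top 10 violation locations by ticket frequency. I want to add the violation code as a category within the Violation_Location column.

Could you add an image showing the chart you expect?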
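One way to read the clarification above is to keep the top locations on the x-axis and stack each bar by violation code. The following is a minimal sketch under that assumption, using the column names from the query. The function names, the sample rows, and the choice to lump less frequent codes into an "Other" group are illustrative assumptions, not part of the original post.

```python
import pandas as pd


def location_by_code_table(df, n_locations=10, n_codes=5):
    """Build a location x code table of ticket counts for a stacked bar chart."""
    df = df.dropna(subset=["Violation_Location", "Violation_Code"])

    # Rank locations by their total ticket count across all codes
    top_locations = (
        df.groupby("Violation_Location")["Ticket_Frequency"]
        .sum()
        .nlargest(n_locations)
        .index
    )
    subset = df[df["Violation_Location"].isin(top_locations)].copy()

    # Keep only the most frequent codes as separate groups; label the rest "Other"
    top_codes = (
        subset.groupby("Violation_Code")["Ticket_Frequency"]
        .sum()
        .nlargest(n_codes)
        .index
    )
    subset["Code_Group"] = (
        subset["Violation_Code"]
        .astype(str)
        .where(subset["Violation_Code"].isin(top_codes), "Other")
    )

    table = subset.pivot_table(
        index="Violation_Location",
        columns="Code_Group",
        values="Ticket_Frequency",
        aggfunc="sum",
        fill_value=0,
    )
    # Order rows from the busiest location downwards
    return table.loc[top_locations]


def plot_stacked(table):
    """Draw one stacked bar per location (needs matplotlib and a display)."""
    import matplotlib.pyplot as plt

    ax = table.plot(kind="bar", stacked=True, figsize=(10, 6))
    ax.set_xlabel("Precinct")
    ax.set_ylabel("Ticket Frequency")
    ax.set_title("Violations by precinct and violation code")
    plt.tight_layout()
    plt.show()


# Made-up rows standing in for Precint.toPandas()
sample = pd.DataFrame(
    {
        "Violation_Location": ["19", "19", "14", "14", "70", None],
        "Violation_Code": [21, 36, 21, 38, 36, 21],
        "Ticket_Frequency": [50, 30, 40, 10, 5, 99],
    }
)
table = location_by_code_table(sample, n_locations=2, n_codes=1)
```

With the real data, something like `plot_stacked(location_by_code_table(Precint.toPandas()))` would draw the chart. Capping the number of codes keeps the legend readable when there are more than 100 categories.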

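If the goal is instead the top 10 (location, code) combinations, one option is to join the two columns into a single label and plot one bar per pair. This is a sketch only: `top_pairs`, `plot_pairs`, the `Pair` column, and the sample rows are hypothetical names and data introduced for illustration.

```python
import pandas as pd


def top_pairs(df, n=10):
    """Return the n most ticketed (location, code) pairs with a combined label."""
    df = df.dropna(subset=["Violation_Location", "Violation_Code"])
    top = df.nlargest(n, "Ticket_Frequency").copy()
    # One label per bar, e.g. "19 / 21" for location 19 and code 21
    top["Pair"] = (
        top["Violation_Location"].astype(str)
        + " / "
        + top["Violation_Code"].astype(str)
    )
    return top[["Pair", "Ticket_Frequency"]].reset_index(drop=True)


def plot_pairs(top):
    """Horizontal bars keep long labels readable (needs matplotlib and a display)."""
    import matplotlib.pyplot as plt

    ax = top.plot(x="Pair", y="Ticket_Frequency", kind="barh", legend=False, figsize=(10, 6))
    ax.invert_yaxis()  # largest bar at the top
    ax.set_xlabel("Ticket Frequency")
    ax.set_ylabel("Violation location / violation code")
    ax.set_title("Top location and code combinations")
    plt.tight_layout()
    plt.show()


# Made-up rows standing in for Precint.toPandas()
sample = pd.DataFrame(
    {
        "Violation_Location": ["19", "19", "14", "14", "70", None],
        "Violation_Code": [21, 36, 21, 38, 36, 21],
        "Ticket_Frequency": [50, 30, 40, 10, 5, 99],
    }
)
top = top_pairs(sample, n=3)
```

Because the query already groups by both columns, each row is one pair, so `nlargest` on `Ticket_Frequency` is enough and no further aggregation is needed.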