Python: how to create a count plot for a nested DataFrame in matplotlib
I want to use matplotlib to visualize the pandas DataFrame below, as shown in the sketch.
The sketch only shows what is needed in general; the layout does not have to match it exactly.
How can I do this with matplotlib?
import pandas as pd
df = pd.DataFrame({'a': [0, 0, 0, 0, 0, 1, 1],
                   'b': [7, 7, 3, 3, 1, 2, 3],
                   'c': [102, 102, -50, -50, 30, 10, 10]})
df
   a  b    c
0  0  7  102
1  0  7  102
2  0  3  -50
3  0  3  -50
4  0  1   30
5  1  2   10
6  1  3   10
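Before visualizing anything, I'd suggest reshaping the data so that the nesting levels are explicit, and precomputing the frequencies. For example: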
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.gridspec as gridspec

temp_df = pd.concat([
    df.groupby(["a"])["b"].value_counts()
      .reset_index(name="count")
      .rename(columns={"b": "value"})
      .assign(level_2="b"),
    df.groupby(["a"])["c"].value_counts()
      .reset_index(name="count")
      .rename(columns={"c": "value"})
      .assign(level_2="c"),
])

final_df = (temp_df
            .rename(columns={"a": "level_1"})
            [["level_1", "level_2", "value", "count"]]
            .sort_values(["level_1", "level_2"]))
The resulting DataFrame looks like this:
   level_1  level_2  value  count
0        0        b      3      2
1        0        b      7      2
2        0        b      1      1
0        0        c    -50      2
1        0        c    102      2
2        0        c     30      1
3        1        b      2      1
4        1        b      3      1
3        1        c     10      2
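If you prefer a single pipeline, the same long-format counts can also be built with `DataFrame.melt` instead of concatenating one `value_counts` result per column. This is a swapped-in alternative, not the approach used above, sketched under the assumption that `b` and `c` are the only inner columns. One visible difference: rows come out sorted by value within each group rather than by count.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [0, 0, 0, 0, 0, 1, 1],
    "b": [7, 7, 3, 3, 1, 2, 3],
    "c": [102, 102, -50, -50, 30, 10, 10],
})

# Stack columns b and c into a single "value" column; "level_2" records
# which column each value came from, and "a" is kept as the outer key.
long_df = df.melt(id_vars="a", value_vars=["b", "c"],
                  var_name="level_2", value_name="value")

# Count each (level_1, level_2, value) combination.
counts = (long_df
          .rename(columns={"a": "level_1"})
          .groupby(["level_1", "level_2", "value"])
          .size()
          .reset_index(name="count"))
print(counts)
```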
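Now, to plot the values and their counts in this nested way, you can use GridSpec to define the layout according to the number of values under each nesting level. For this toy dataset I hard-coded the values for demonstration, but for real data you would want to derive them programmatically.
There are 9 values, so the GridSpec gets 9 columns. There are two nesting levels, so the two bottom rows are reserved for the nesting labels, and a few more rows (rows 0-7 here) "host" the bar charts.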
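The hard-coded column slices follow from simple arithmetic: each group gets as many columns as it has distinct values, and groups are laid out left to right. A minimal sketch of deriving those slices from the counts table; `column_spans` is a hypothetical helper, and the table is re-typed from the output above so the snippet runs on its own.

```python
import pandas as pd

# Same columns as final_df above, re-typed so this snippet is self-contained.
final_df = pd.DataFrame({
    "level_1": [0, 0, 0, 0, 0, 0, 1, 1, 1],
    "level_2": ["b", "b", "b", "c", "c", "c", "b", "b", "c"],
    "value":   [3, 7, 1, -50, 102, 30, 2, 3, 10],
    "count":   [2, 2, 1, 2, 2, 1, 1, 1, 2],
})


def column_spans(frame, keys):
    """Hypothetical helper: map each group of `keys` to a (start, stop) column range.

    Each group is given one GridSpec column per row (i.e. per distinct value),
    and groups are placed left to right in sorted key order.
    """
    sizes = frame.groupby(keys, sort=True).size()
    spans = {}
    start = 0
    for key, n in sizes.items():
        spans[key] = (start, start + int(n))
        start += int(n)
    return spans, start  # start now equals the total number of columns


level_2_spans, n_cols = column_spans(final_df, ["level_1", "level_2"])
level_1_spans, _ = column_spans(final_df, "level_1")
print(n_cols, level_2_spans, level_1_spans)
```

Each `(start, stop)` pair can then be used as `grid[row, start:stop]` in place of a hard-coded slice such as `grid[8, 0:3]`.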
f = plt.figure(figsize=(10, 4), dpi=300)
grid = gridspec.GridSpec(10, 9, figure=f)

mpl.rcParams["axes.edgecolor"] = "gainsboro"

# Use context manager to set mpl parameters for nested axs
with mpl.rc_context({"xtick.major.bottom": False, "ytick.major.left": False}):
    # Level 1 axs (label, ax)
    ax_level_1_0 = ("0", f.add_subplot(grid[9, 0:6]))
    ax_level_1_1 = ("1", f.add_subplot(grid[9, 6:]))
    level_1_axs = [ax_level_1_0, ax_level_1_1]

    # Level 2 axs (label, ax)
    ax_level_2_0b = ("B", f.add_subplot(grid[8, 0:3]))
    ax_level_2_0c = ("C", f.add_subplot(grid[8, 3:6]))
    ax_level_2_1b = ("B", f.add_subplot(grid[8, 6:8]))
    ax_level_2_1c = ("C", f.add_subplot(grid[8, 8:]))
    level_2_axs = [ax_level_2_0b, ax_level_2_0c, ax_level_2_1b, ax_level_2_1c]

# Actual count plot axs (level_1, level_2, ax)
ax_0b = (0, "b", f.add_subplot(grid[0:8, 0:3]))
ax_0b[2].set_ylabel("Frequency")

# Hide y-ticks, and share the y-axis with ax_0b: its tick labels are the
# only ones shown, so all bar heights must be on the same scale
with mpl.rc_context({"ytick.major.left": False}):
    ax_0c = (0, "c", f.add_subplot(grid[0:8, 3:6], sharey=ax_0b[2]))
    ax_1b = (1, "b", f.add_subplot(grid[0:8, 6:8], sharey=ax_0b[2]))
    ax_1c = (1, "c", f.add_subplot(grid[0:8, 8:], sharey=ax_0b[2]))

count_axs = [ax_0b, ax_0c, ax_1b, ax_1c]

# Remove white space between subplots
plt.subplots_adjust(wspace=0, hspace=0)

# Add label text to Level 1 and 2 axs
for label, ax in level_1_axs + level_2_axs:
    ax.text(0.5, 0.5, label, horizontalalignment="center",
            verticalalignment="center", transform=ax.transAxes)

# One bar per value; the value is drawn in white inside its bar as the tick label
for l1, l2, ax in count_axs:
    subset = final_df.query(f'(level_1 == {l1}) & (level_2 == "{l2}")')
    y = subset["count"]
    labels = subset["value"]
    x = range(len(y))
    ax.bar(x, y, color="steelblue")
    ax.set_xticks(x)
    ax.set_xticklabels(labels)
    ax.tick_params(
        axis="x", direction="in", bottom=False, pad=-20,
        colors="white", labelsize=15)

plt.show()
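For real data, the whole layout can be driven by the counts table instead of hard-coded axes. A minimal sketch, assuming the outer key is column `a` and the inner columns are `b` and `c`; `nested_count_plot` and `label_cell` are hypothetical names, not matplotlib API. It uses the same GridSpec idea as above (one column per bar, two bottom rows for labels) and shares the y-axis across bar panels so heights are comparable.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import pandas as pd

df = pd.DataFrame({
    "a": [0, 0, 0, 0, 0, 1, 1],
    "b": [7, 7, 3, 3, 1, 2, 3],
    "c": [102, 102, -50, -50, 30, 10, 10],
})


def label_cell(fig, spec, text):
    """Hypothetical helper: a tickless axes showing one centred label."""
    ax = fig.add_subplot(spec)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.text(0.5, 0.5, text, ha="center", va="center", transform=ax.transAxes)
    return ax


def nested_count_plot(df, outer="a", inner=("b", "c"), bar_rows=8):
    """Sketch: one bar per distinct value, grouped by inner column, then by outer key."""
    # Long format: one row per (outer key, inner column, value) with its count
    counts = (df.melt(id_vars=outer, value_vars=list(inner),
                      var_name="inner", value_name="value")
                .groupby([outer, "inner", "value"]).size()
                .reset_index(name="count"))

    # One GridSpec column per bar; the two bottom rows hold the nested labels
    fig = plt.figure(figsize=(10, 4))
    grid = gridspec.GridSpec(bar_rows + 2, len(counts), figure=fig,
                             wspace=0, hspace=0)

    bar_axes = []
    col = 0
    for outer_key, outer_group in counts.groupby(outer):
        outer_start = col
        for inner_key, group in outer_group.groupby("inner"):
            stop = col + len(group)
            # Share y with the first bar panel so heights are comparable
            ax = fig.add_subplot(grid[:bar_rows, col:stop],
                                 sharey=bar_axes[0] if bar_axes else None)
            x = list(range(len(group)))
            ax.bar(x, group["count"], color="steelblue")
            ax.set_xticks(x)
            ax.set_xticklabels(group["value"].astype(str))
            # Value labels in white inside the bars, as in the answer above
            ax.tick_params(axis="x", bottom=False, pad=-20,
                           colors="white", labelsize=12)
            if bar_axes:
                ax.tick_params(axis="y", left=False, labelleft=False)
            else:
                ax.set_ylabel("Frequency")
            bar_axes.append(ax)
            label_cell(fig, grid[bar_rows, col:stop], str(inner_key).upper())
            col = stop
        label_cell(fig, grid[bar_rows + 1, outer_start:col], str(outer_key))

    return fig, bar_axes


fig, bar_axes = nested_count_plot(df)
```

Calling `fig.savefig("nested_counts.png")` (the filename is only an example) writes the figure to disk; remove the `matplotlib.use("Agg")` line to show it interactively with `plt.show()` instead.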