Python DataFrame: plotting a bar chart from a data frame grouped by column (at least two columns)
I've been struggling to recreate this Excel graph in Python using matplotlib. The data is in a DataFrame, and I'm trying to automate the process of generating this chart.

I've tried unstacking my DataFrame and using subplots, but I haven't managed to create the "Zone" index that looks so elegant in Excel. I have successfully plotted the graph without this "Zone" index, but that's not really what I want to do.

Here is my code:
import pandas as pd

data = pd.DataFrame(
    {
        'Factory Zone':
            ["AMERICAS", "APAC", "APAC", "APAC", "APAC", "APAC", "EMEA", "EMEA", "EMEA", "EMEA"],
        'Factory Name':
            ["Chocolate Factory", "Crayon Factory", "Jobs Ur Us", "Gibberish US", "Lil Grey",
             "Toys R Us", "Food Inc.", "Pet Shop", "Bonbon Factory", "Carrefour"],
        'Production Day 1':
            [24, 1, 9, 29, 92, 79, 4, 90, 42, 35],
        'Production Day 2':
            [2, 43, 17, 5, 31, 89, 44, 49, 34, 84]
    })
df = pd.DataFrame(data)
print(df)

# Without 'Factory Zone', it works:
df = df.drop(['Factory Zone'], axis=1)
image = df.plot(kind="bar")
The data looks like this:
   Unnamed: 0 FactoryZone       Factory Name  Production Day 1  Production Day 2
0           1    AMERICAS  Chocolate Factory                24                43
1           2    AMERICAS     Crayon Factory                 1                17
2           3        EMEA           Pet Shop                 9                 5
3           4        EMEA     Bonbon Factory                29                31
4           5        APAC           Lil Grey                92                89
5           6    AMERICAS         Jobs Ur Us                79                44
6           7        APAC          Toys R Us                 4                49
7           8        EMEA          Carrefour                90                34
8           9    AMERICAS       Gibberish US                42                84
9          10        APAC          Food Inc.                35                62
One way to get a tight plot is to draw each `Factory Zone` in its own adjacent subplot:
import matplotlib.pyplot as plt

# `df` here still contains the 'Factory Zone' column (i.e. before the drop above)

# setting up the subplots
fig, axes = plt.subplots(1, len(df['Factory Zone'].unique()),
                         figsize=(12, 4),
                         sharex=True, sharey=True,
                         gridspec_kw={'wspace': 0},
                         subplot_kw={'frameon': False})

# use groupby to loop through the `Factory Zone`
for (k, d), ax in zip(df.groupby('Factory Zone'), axes):
    # plot the data into subplot
    d.plot.bar(x='Factory Name', ax=ax)

    # set label to the `Factory Zone`
    ax.set_xlabel(k)

    # remove the extra legend in each subplot
    legend = ax.legend()
    handlers = ax.get_legend_handles_labels()
    ax.legend().remove()
    ax.grid(True, axis='y')

# reinstall the last legend
ax.legend(*handlers)
Output: (figure omitted)
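With `sharex=True`, matplotlib shares the tick locator and formatter across the subplots, so every panel can end up showing the factory names of the last zone plotted. The sketch below is one possible workaround, assuming that sharing is what causes repeated labels: it leaves `sharex` at its default and reuses the question's data. The `labels_per_zone` dictionary is a hypothetical helper added only to inspect the result, and the `Agg` backend line is there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "Factory Zone": ["AMERICAS", "APAC", "APAC", "APAC", "APAC",
                     "APAC", "EMEA", "EMEA", "EMEA", "EMEA"],
    "Factory Name": ["Chocolate Factory", "Crayon Factory", "Jobs Ur Us",
                     "Gibberish US", "Lil Grey", "Toys R Us", "Food Inc.",
                     "Pet Shop", "Bonbon Factory", "Carrefour"],
    "Production Day 1": [24, 1, 9, 29, 92, 79, 4, 90, 42, 35],
    "Production Day 2": [2, 43, 17, 5, 31, 89, 44, 49, 34, 84],
})

groups = list(df.groupby("Factory Zone"))

# sharex is deliberately left at its default (False) so that each subplot
# keeps its own tick positions and tick labels
fig, axes = plt.subplots(1, len(groups), figsize=(12, 4), sharey=True,
                         gridspec_kw={"wspace": 0})

for (zone, d), ax in zip(groups, axes):
    d.plot.bar(x="Factory Name", ax=ax, legend=False)
    ax.set_xlabel(zone)
    ax.grid(True, axis="y")

# Show a single legend, on the last subplot
axes[-1].legend()

# Hypothetical helper: collect the tick labels actually shown per zone
fig.canvas.draw()
labels_per_zone = {
    zone: [t.get_text() for t in ax.get_xticklabels()]
    for (zone, _), ax in zip(groups, axes)
}
print(labels_per_zone)
```

The subplots here all have the same width, so bars in zones with fewer factories look wider; scaling each subplot by its number of factories is a separate step.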
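You can create this chart by first building a MultiIndex for your hierarchical dataset, where level 0 is the factory zone and level 1 is the factory name.

As Quang Hoang has proposed, you can create a subplot for each zone and stick them together. The width of each subplot must be corrected according to the number of factories, using the `width_ratios` argument in the `gridspec_kw` dictionary, so that all the columns have the same width. After that, the formatting choices are limitless.

In the following example, I chose to show separation lines only between zones, using the minor tick marks for this purpose. Also, because the figure width is limited to 10 inches here, I rewrite the longer labels on two lines.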
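The two-level index described above could be built along these lines. This is a minimal sketch rather than the answer's own code: it uses `set_index` as one way to obtain the MultiIndex, and it assumes the zone column is corrected to four AMERICAS, three APAC and three EMEA factories, which is the grouping that matches the data table.

```python
import pandas as pd

# Assumption: zones corrected to 4 AMERICAS / 3 APAC / 3 EMEA
data = {
    "Factory Zone": ["AMERICAS", "AMERICAS", "AMERICAS", "AMERICAS",
                     "APAC", "APAC", "APAC",
                     "EMEA", "EMEA", "EMEA"],
    "Factory Name": ["Chocolate Factory", "Crayon Factory", "Jobs Ur Us",
                     "Gibberish US", "Lil Grey", "Toys R Us", "Food Inc.",
                     "Pet Shop", "Bonbon Factory", "Carrefour"],
    "Production Day 1": [24, 1, 9, 29, 92, 79, 4, 90, 42, 35],
    "Production Day 2": [2, 43, 17, 5, 31, 89, 44, 49, 34, 84],
}

# Level 0 of the index is the factory zone, level 1 is the factory name
df = pd.DataFrame(data).set_index(["Factory Zone", "Factory Name"])

# The unique zones are then available as the first index level,
# and df.xs(zone) returns the rows of one zone indexed by factory name
zones = df.index.levels[0]
print(list(zones))
print(df.xs("EMEA"))
```

With the index in this shape, `df.xs(zone).index.size` gives the number of factories per zone, which is what the width ratios in the plotting code rely on.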
import matplotlib.pyplot as plt

# Create figure with a subplot for each factory zone with a relative width
# proportionate to the number of factories
zones = df.index.levels[0]
nplots = zones.size
plots_width_ratios = [df.xs(zone).index.size for zone in zones]
fig, axes = plt.subplots(nrows=1, ncols=nplots, sharey=True, figsize=(10, 4),
                         gridspec_kw=dict(width_ratios=plots_width_ratios, wspace=0))

# Loop through array of axes to create grouped bar chart for each factory zone
alpha = 0.3  # used for grid lines, bottom spine and separation lines between zones
for zone, ax in zip(zones, axes):
    # Create bar chart with grid lines and no spines except bottom one
    df.xs(zone).plot.bar(ax=ax, legend=None, zorder=2)
    ax.grid(axis='y', zorder=1, color='black', alpha=alpha)
    for spine in ['top', 'left', 'right']:
        ax.spines[spine].set_visible(False)
    ax.spines['bottom'].set_alpha(alpha)

    # Set and place x labels for factory zones
    ax.set_xlabel(zone)
    ax.xaxis.set_label_coords(x=0.5, y=-0.2)

    # Format major tick labels for factory names: note that because this figure is
    # only about 10 inches wide, I choose to rewrite the long names on two lines.
    ticklabels = [name.replace(' ', '\n') if len(name) > 10 else name
                  for name in df.xs(zone).index]
    ax.set_xticklabels(ticklabels, rotation=0, ha='center')
    ax.tick_params(axis='both', length=0, pad=7)

    # Set and format minor tick marks for separation lines between zones: note
    # that except for the first subplot, only the right tick mark is drawn to avoid
    # duplicate overlapping lines so that when an alpha different from 1 is chosen
    # (like in this example) all the lines look the same.
    # ax.is_first_col() was removed in matplotlib 3.6; query the SubplotSpec instead.
    if ax.get_subplotspec().is_first_col():
        ax.set_xticks([*ax.get_xlim()], minor=True)
    else:
        ax.set_xticks([ax.get_xlim()[1]], minor=True)
    ax.tick_params(which='minor', length=55, width=0.8, color=[0, 0, 0, alpha])

# Add legend using the labels and handles from the last subplot
fig.legend(*ax.get_legend_handles_labels(), frameon=False, loc=(0.08, 0.77))
fig.suptitle('Production Quantity by Zone and Factory on both days', y=1.02, size=14)
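Reference: Quang Hoang's answer.

Thanks, Quang Hoang! Do you have any suggestion on how to make the duplicates disappear? For example, Pet Shop and Bonbon Factory only have factories in EMEA, but on this chart they appear in AMERICAS and APAC.

You can find an alternative solution in [...].

Thank you so much for your answer, this is perfect! :)

To match the data table and the Excel chart, the dictionary for the data frame should be edited as follows:

'Factory Zone': ["AMERICAS", "AMERICAS", "AMERICAS", "AMERICAS", "APAC", "APAC", "APAC", "EMEA", "EMEA", "EMEA"]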
df

#                                 Production Day 1  Production Day 2
# Factory Zone Factory Name
# AMERICAS     Chocolate Factory                24                 2
#              Crayon Factory                    1                43
#              Jobs Ur Us                        9                17
#              Gibberish US                     29                 5
# APAC         Lil Grey                         92                31
#              Toys R Us                        79                89
#              Food Inc.                         4                44
# EMEA         Pet Shop                         90                49
#              Bonbon Factory                   42                34
#              Carrefour                        35                84