Python: Making a categorical or grouped bar chart with a secondary-axis line plot


python pandas matplotlib plot


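I need to compare different daily data sets across 4 shifts (categories/groups) using a bar chart and a line chart. I've searched everywhere and haven't found a working solution that doesn't involve building new pivot tables and so on.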

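I've tried both matplotlib and seaborn. I can make either chart on its own (bars or lines in a different color for each shift), but as soon as I add the other one, either one of them disappears or something else goes wrong, such as only a single point being plotted. Every example I found shows a single data series on both chart types, but none of them work for multiple categories or grouped data at the same time.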

Sample data:

report_date  wh_id  shift  Head_Count  UTL_R
3/17/19      55     A      72          25%
3/18/19      55     A      71          10%
3/19/19      55     A      76          20%
3/20/19      55     A      59          33%
3/21/19      55     A      65          10%
3/22/19      55     A      54          20%
3/23/19      55     A      66          14%
3/17/19      55     1      11          10%
3/17/19      55     2      27          13%
3/17/19      55     3      18          25%
3/18/19      55     1      23          100%
3/18/19      55     2      16          25%
3/18/19      55     3      12          50%
3/19/19      55     1      28          10%
3/19/19      55     2      23          50%
3/19/19      55     3      14          33%
3/20/19      55     1      29          25%
3/20/19      55     2      29          25%
3/20/19      55     3      10          50%
3/21/19      55     1      17          20%
3/21/19      55     2      29          14%
3/21/19      55     3      30          17%
3/22/19      55     1      12          14%
3/22/19      55     2      10          100%
3/22/19      55     3      17          14%
3/23/19      55     1      16          10%
3/23/19      55     2      11          100%
3/23/19      55     3      13          10%
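The answers below assume the sample table is already loaded into a DataFrame (called df, or tm_daily_df in the first answer), with shift stored as text and UTL_R converted from percentage strings to fractions. One way to get there, as a sketch assuming the table is pasted as whitespace-separated text:

```python
import io

import pandas as pd

# The sample table from the question, as whitespace-separated text
RAW = """report_date wh_id shift Head_Count UTL_R
3/17/19 55 A 72 25%
3/18/19 55 A 71 10%
3/19/19 55 A 76 20%
3/20/19 55 A 59 33%
3/21/19 55 A 65 10%
3/22/19 55 A 54 20%
3/23/19 55 A 66 14%
3/17/19 55 1 11 10%
3/17/19 55 2 27 13%
3/17/19 55 3 18 25%
3/18/19 55 1 23 100%
3/18/19 55 2 16 25%
3/18/19 55 3 12 50%
3/19/19 55 1 28 10%
3/19/19 55 2 23 50%
3/19/19 55 3 14 33%
3/20/19 55 1 29 25%
3/20/19 55 2 29 25%
3/20/19 55 3 10 50%
3/21/19 55 1 17 20%
3/21/19 55 2 29 14%
3/21/19 55 3 30 17%
3/22/19 55 1 12 14%
3/22/19 55 2 10 100%
3/22/19 55 3 17 14%
3/23/19 55 1 16 10%
3/23/19 55 2 11 100%
3/23/19 55 3 13 10%
"""

# Read shift as text so 'A', '1', '2' and '3' are all strings
df = pd.read_csv(io.StringIO(RAW), sep=r'\s+', dtype={'shift': str})

# Turn "25%" into 0.25 so UTL_R can be drawn on a numeric axis
df['UTL_R'] = df['UTL_R'].str.rstrip('%').astype(float) / 100
```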

This is the closest code I could find. Note that even with stacked=False, the bars are still stacked. I changed the setting to True, but nothing changed.

All I need is for the bars to sit next to each other, using the same color scheme to represent each shift.

Chart: (image not included)

How about this?

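import pandas as pd
import matplotlib.pyplot as plt

# Convert "25%" strings to fractions so UTL_R is numeric
tm_daily_df['UTL_R'] = tm_daily_df['UTL_R'].str.replace('%', '').astype('float') / 100
# One row per date, one (value, shift) column for every combination
pivoted = tm_daily_df.pivot_table(values=['Head_Count', 'UTL_R'],
                                  index='report_date',
                                  columns='shift')
pivoted  # displays the table in a notebook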

#             Head_Count             UTL_R
# shift                1   2   3   A     1     2     3     A
# report_date
# 3/17/19             11  27  18  72  0.10  0.13  0.25  0.25
# 3/18/19             23  16  12  71  1.00  0.25  0.50  0.10
# 3/19/19             28  23  14  76  0.10  0.50  0.33  0.20
# 3/20/19             29  29  10  59  0.25  0.25  0.50  0.33
# 3/21/19             17  29  30  65  0.20  0.14  0.17  0.10
# 3/22/19             12  10  17  54  0.14  1.00  0.14  0.20
# 3/23/19             16  11  13  66  0.10  1.00  0.10  0.14

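fig, ax = plt.subplots()
# One group of side-by-side bars per date, one bar per shift
pivoted['Head_Count'].plot.bar(ax=ax)
# UTL_R lines on a secondary (right-hand) y-axis
pivoted['UTL_R'].plot.line(ax=ax, legend=False, secondary_y=True, marker='D')
ax.legend(loc='upper left', title='shift')
plt.show()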
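Because pivot_table gives each shift its own column, DataFrame.plot.bar draws the shifts side by side within each date instead of stacking them, which is the layout the question asks for. A self-contained sketch of the same idea on a two-day subset of the sample data (the Agg backend line only renders off-screen and can be dropped in a notebook):

```python
import matplotlib
matplotlib.use('Agg')  # off-screen rendering; remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Two days of the sample data in long format, UTL_R already as fractions
tm_daily_df = pd.DataFrame({
    'report_date': ['3/17/19'] * 4 + ['3/18/19'] * 4,
    'shift': ['A', '1', '2', '3'] * 2,
    'Head_Count': [72, 11, 27, 18, 71, 23, 16, 12],
    'UTL_R': [0.25, 0.10, 0.13, 0.25, 0.10, 1.00, 0.25, 0.50],
})

pivoted = tm_daily_df.pivot_table(values=['Head_Count', 'UTL_R'],
                                  index='report_date', columns='shift')

fig, ax = plt.subplots()
# One bar container per shift column; the bars of a date are placed next to each other
pivoted['Head_Count'].plot.bar(ax=ax)
# The lines go on a separate right-hand axis
pivoted['UTL_R'].plot.line(ax=ax, legend=False, secondary_y=True, marker='D')
ax.legend(loc='upper left', title='shift')
```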

Here are two solutions (stacked and unstacked). Based on your question, we will:
- plot Head_Count on the left y-axis
- plot UTL_R on the right y-axis
- use report_date as the x-axis
- use shift as the hue of the graph
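The pandas versions below rely on one reshaping step: set_index(['report_date', 'shift']) followed by unstack(), which turns the long table into one row per report_date (the x-axis) and one column per shift (the hue). A minimal sketch on a two-day subset of the sample data:

```python
import pandas as pd

# Two days of the sample data in long format
df = pd.DataFrame({
    'report_date': ['3/17/19'] * 4 + ['3/18/19'] * 4,
    'shift': ['A', '1', '2', '3'] * 2,
    'Head_Count': [72, 11, 27, 18, 71, 23, 16, 12],
})

# Index by (date, shift), then move the shift level into the columns
dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0, 1])
wide = dfg['Head_Count'].unstack()
print(wide)
```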

The stacked version uses the default pandas plotting functions, and the unstacked version uses seaborn.

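Edit
At your request, I added a 100% stacked graph. It isn't exactly what you asked for in the comments, but the type of graph you asked for can be confusing to read (the reader has to work out whether each value is given by the top of its stack or by the size of its own segment). A 100% stacked graph may be an alternative.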

Stacked

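import pandas as pd
import matplotlib.pyplot as plt

# Assumes df holds the sample data with UTL_R already converted to fractions
dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])

fig, ax = plt.subplots(figsize=(12,6))

# Second y-axis sharing the x-axis, for the UTL_R lines
ax2 = ax.twinx()

# One stacked bar per date, one segment per shift
dfg['Head_Count'].unstack().plot.bar(stacked=True, ax=ax, alpha=0.6)
# One line per shift on the right-hand axis
dfg['UTL_R'].unstack().plot(kind='line', ax=ax2, marker='o', legend=False)

ax.set_title('My Graph')
plt.show()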


Stacked 100%

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Assumes df holds the sample data, with shift stored as strings ('A', '1', '2', '3')
dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])

# Create `Head_Count_Pct`: each shift's head count divided by shift A's on the same date
for date in dfg.index.get_level_values('report_date').unique():
    for shift in dfg.loc[date, :].index.get_level_values('shift').unique():
        dfg.loc[(date, shift), 'Head_Count_Pct'] = dfg.loc[(date, shift), 'Head_Count'].sum() / dfg.loc[(date, 'A'), 'Head_Count'].sum()

fig, ax = plt.subplots(figsize=(12,6))

ax2 = ax.twinx()
# Shared palette so shifts 1-3 get the same colors in the bars and the lines
pal = sns.color_palette("Set1")

# Stack only shifts 1, 2 and 3 (A is the 100% reference)
dfg[dfg.index.get_level_values('shift').isin(['1','2','3'])]['Head_Count_Pct'].unstack().plot.bar(stacked=True, ax=ax, alpha=0.5, color=pal)
dfg['UTL_R'].unstack().plot(kind='line', ax=ax2, marker='o', legend=False, color=pal)

ax.set_title('My Graph')
plt.show()
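The nested loop in the 100% version divides each shift's head count by shift A's head count on the same date. The same column can be computed without looping; a minimal sketch on a two-day subset of the sample data, assuming shift is stored as strings:

```python
import pandas as pd

# Two days of the sample data in long format
df = pd.DataFrame({
    'report_date': ['3/17/19'] * 4 + ['3/18/19'] * 4,
    'shift': ['A', '1', '2', '3'] * 2,
    'Head_Count': [72, 11, 27, 18, 71, 23, 16, 12],
})

dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0, 1])

# Head count of shift A (the whole site) for each date
a_counts = dfg.xs('A', level='shift')['Head_Count']

# Repeat A's value for every row of the same date, then divide row by row
dates = dfg.index.get_level_values('report_date')
dfg['Head_Count_Pct'] = dfg['Head_Count'].to_numpy() / a_counts.reindex(dates).to_numpy()
```

Shift A's own rows come out as 1.0, which is why the plotting code filters the stack down to shifts 1, 2 and 3.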

Unstacked


import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Assumes df holds the sample data with UTL_R already converted to fractions
dfg = df.set_index(['report_date', 'shift']).sort_index(level=[0,1])

fig, ax = plt.subplots(figsize=(15,6))

ax2 = ax.twinx()

# Grouped (side-by-side) bars: hue splits each date into one bar per shift
sns.barplot(x=dfg.index.get_level_values('report_date'),
            y=dfg.Head_Count,
            hue=dfg.index.get_level_values('shift'), ax=ax, alpha=0.7)

# One line per shift on the right-hand axis
sns.lineplot(x=dfg.index.get_level_values('report_date'),
             y=dfg.UTL_R,
             hue=dfg.index.get_level_values('shift'), ax=ax2, marker='o', legend=False)

ax.set_title('My Graph')
plt.show()

Edit #2

This is the graph from the second request (stacked, but stack n+1 does not start where stack n ends).

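Because we're doing several things, it's a bit more involved:
- We need to manually assign a color to each shift in df.
- Once the colors are assigned, we iterate over each date and 1) sort the Head_Count values in descending order (so that our largest bar ends up at the back when we plot), and 2) plot the data and assign a color to each stack.
- Then we can create the second y-axis and plot the UTL_R values.
- Finally, we need to give the legend labels the correct colors.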
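The key step is the draw order: within a date, bars are drawn from tallest to shortest, so each shorter bar is drawn later and stays visible in front of the taller ones. A minimal sketch of just that ordering, using the 3/17/19 rows of the sample data; the helper name draw_order is hypothetical:

```python
import pandas as pd


def draw_order(day: pd.DataFrame) -> list:
    """Return the shifts of one date in the order their bars should be drawn.

    Tallest first, so every shorter bar is drawn later and ends up in front.
    """
    return day.sort_values(by='Head_Count', ascending=False)['shift'].tolist()


# The 3/17/19 rows of the sample data
day = pd.DataFrame({'shift': ['A', '1', '2', '3'], 'Head_Count': [72, 11, 27, 18]})
print(draw_order(day))  # ['A', '2', '3', '1']
```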

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# Assumes df holds the sample data, with shift stored as strings ('A', '1', '2', '3')
# and UTL_R already converted to fractions

# Map a color to each shift (matplotlib's single-letter color codes are lowercase)
color_map = {'A': 'r', '1': 'b', '2': 'g', '3': 'y'}
df['color'] = df['shift'].map(color_map)

fig, ax = plt.subplots(figsize=(15,6))

# Plot our Head_Count values: all shifts of a date share one x position,
# drawn tallest first so each shorter bar stays visible in front
for date in df.report_date.unique():
    d = df[df.report_date == date].sort_values(by='Head_Count', ascending=False)
    ax.bar(date, d.Head_Count.values, color=d.color.values)

# Plot our UTL_R values on a second y-axis, using the same shift colors
ax2 = ax.twinx()
sns.lineplot(x=df.report_date, y=df.UTL_R, hue=df['shift'], palette=color_map,
             marker='o', legend=False, ax=ax2)

# Build the legend from explicit handles so each label shows its shift's color
handles = [Patch(color=color, label=shift) for shift, color in color_map.items()]
ax.legend(handles=handles, loc='upper right')

plt.show()

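Thanks for posting on SO, and welcome to the site. To make sure I understand your question: you're trying to plot head_count as a stacked bar chart where each stack represents a shift, and UTL_R as a line chart where each line represents a shift?

Thanks for the welcome! I want the head counts unstacked, because I want to compare each shift's head count for each day (A stands for the whole site, i.e. all shifts combined). Each day needs the 4 bars next to each other. But seeing how clean this looks, I wouldn't mind stacking them (not stacking the full values on top of each other, but more like the chart I showed: every bar starts at y=0 and they overlap each other), as long as each shift in the stack can be told apart, ideally by the same colors assigned to the shifts in the line chart. The unstacked chart is exactly what I set out to make. Now, is there a way for the bars to sit "on top of" each other without being stacked? In the example chart I posted, the bars are layered over each other, but they aren't stacked. That looks much cleaner. Is it possible?

@ricsilo, I'm not sure this type of graph represents the values best, because the reader has to work out whether a value is given by the top of the stack or by the size of its own segment. For your specific problem, the unstacked graph seems the most appropriate. However, if you want to go the stacked route, you could use a 100% stacked graph, which I added below. The values in the 100% stacked graph are only accurate if A represents 100% of the shifts, which doesn't seem to be the case in your data (for example, on 3/17/2019, A=72, but 1+2+3=56). Anyway, I added the code so you can see how to create it, in case you need it.

So if it's a 100% stack, is there a way to multiply the ratios by A's value, so that the values would range from 0 to 80 (or whatever the head count is)? Also, good point, but the data I provided is random so I wouldn't share private data. I know it doesn't add up, but the real data does...

This is exactly what I wanted! I was also looking for a solution that uses c=df['color'].apply(lambda x: colors[x]), i.e. mapping a color column from shift and applying it.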
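The approach mentioned in the last comment, building a color column by looking each shift up in a dict, can be sketched as below. The dict name colors and its color values are assumptions; Series.map with the same dict gives the same result without a lambda:

```python
import pandas as pd

# Hypothetical shift-to-color mapping (matplotlib single-letter codes are lowercase)
colors = {'A': 'r', '1': 'b', '2': 'g', '3': 'y'}

df = pd.DataFrame({'shift': ['A', '1', '2', '3', 'A']})

# The approach from the comment: look each shift up in the dict
df['color'] = df['shift'].apply(lambda x: colors[x])

# Equivalent vectorized form
assert df['color'].tolist() == df['shift'].map(colors).tolist()
```

One difference worth knowing: for a shift missing from the dict, the apply version raises a KeyError, while map silently produces NaN.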