Python 3.x: dynamic histogram subplots with a line marking the target

Tags: python-3.x, pandas, matplotlib, subplot

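I have been trying to get several similar solutions to work, with no luck.

I am trying to get a histogram of Cost for every Step No in our manufacturing process. Each part has a different number of steps, so I want a set of histograms on one plot/image for each part.

My real data contains many parts, so ideally this would loop over the parts and save the figures.

In addition, we have a target cost for each step, which I want to overlay on the histogram. The targets are held in a separate data frame. I am stuck on looping over the subplots, so I have not attempted this part yet.

Below is the closest thing I could find to what the histogram for each step should look like:

Here is my code so far: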

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel('Dist_Example.xlsx')
df1 = df[~df['Cost Type'].isin(['Material'])]
number_of_subplots = len(df1['Step No'].unique())
steps = df1['Step No'].unique()
fig, axs = plt.subplots(1, number_of_subplots, sharey = True, tight_layout=True)
for step in steps:
    df2 = df1[df1['Step No'].isin([step])]
    axs[step].hist(df2['Cost'])
plt.show()
Thanks in advance for your help.

Here are the target costs that I want to show as vertical lines on the histograms:

PartNo  StepNo  TargetCost
ABC     10      12
ABC     20      20
ABC     30      13
Here is some sample historical data that should be sorted into the bins of the histograms:

PartNo  SerialNo    StepNo  CostType    Cost
ABC      123        10      Labor       11
ABC      123        10      Material    16
ABC      456        10      Labor       21
ABC      456        10      Material    26
ABC      789        10      Labor       21
ABC      789        10      Material    16
ABC      1011       10      Labor       11
ABC      1011       10      Material    6
ABC      1112       10      Labor       1
ABC      1112       10      Material    -4
ABC      123        20      Labor       11
ABC      123        20      Material    19
ABC      456        20      Labor       24
ABC      456        20      Material    29
ABC      789        20      Labor       24
ABC      789        20      Material    19
ABC      1011       20      Labor       14
ABC      1011       20      Material    9
ABC      1112       20      Labor       4
ABC      1112       20      Material    -1
ABC      123        30      Labor       11
ABC      123        30      Material    13
ABC      456        30      Labor       18
ABC      456        30      Material    23
ABC      789        30      Labor       18
ABC      789        30      Material    13
ABC      1011       30      Labor       8
ABC      1011       30      Material    3
ABC      1112       30      Labor       -2
ABC      1112       30      Material    -7
And a second sample data set:

PartNo  SerialNo    StepNo  CostType    Cost
DEF     Aplha       10  Labor   2
DEF     Zed         10  Labor   3
DEF     Kelly       10  Labor   4
DEF     Aplha       20  Labor   3
DEF     Zed         20  Labor   2
DEF     Kelly       20  Labor   5
DEF     Aplha       30  Labor   6
DEF     Zed         30  Labor   7
DEF     Kelly       30  Labor   5
DEF     Aplha       40  Labor   3
DEF     Zed         40  Labor   4
DEF     Kelly       40  Labor   2
DEF     Aplha       50  Labor   8
DEF     Zed         50  Labor   9
DEF     Kelly       50  Labor   7

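You will not find a histogram function that solves this for your data set directly. You need to aggregate the data in a way that suits your needs, and then present the result as a bar chart.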
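As a rough illustration of "aggregate first, then draw bars", the sketch below sums Cost per StepNo for the second sample data set from the question (part DEF) and plots the totals. Summing is only an assumption about the aggregation you need, and the axis labels are placeholders.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Second sample data set from the question (part DEF, Labor rows only)
df = pd.DataFrame({
    "PartNo": ["DEF"] * 15,
    "SerialNo": ["Aplha", "Zed", "Kelly"] * 5,
    "StepNo": [10] * 3 + [20] * 3 + [30] * 3 + [40] * 3 + [50] * 3,
    "Cost": [2, 3, 4, 3, 2, 5, 6, 7, 5, 3, 4, 2, 8, 9, 7],
})

# Aggregate: one total per step (assumed aggregation)
totals = df.groupby("StepNo")["Cost"].sum()

# One bar per step; the step numbers are used as category labels
fig, ax = plt.subplots()
ax.bar(totals.index.astype(str), totals.to_numpy())
ax.set_xlabel("StepNo")
ax.set_ylabel("Total cost")
# Call plt.show() or fig.savefig(...) to view the chart
```

Swapping `sum()` for `mean()` or `count()` changes what each bar represents without touching the plotting code.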

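I found your goal and your data a little confusing, but I think I have understood what you are after, given the following assumptions:

  • You want to aggregate the costs for each StepNo
  • The cost type is irrelevant
  • The total target cost has to be calculated, because you are aggregating all costs within each StepNo
  • Plot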

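    Edit

    This was not what the OP wanted. After some back and forth, we arrived at a solution that seems to work.

    (From the question) I am trying to get a histogram of the costs for every StepNo

    (From the comments) I actually want a histogram representing the sum of costs for each serial number within each step

    Since a histogram must have a count or a frequency on the y-axis, the data has to be aggregated in some meaningful way. The code below sums the cost of each serial number within each step and plots those sums in a histogram with a chosen number of bins.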
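To make that aggregation concrete, here is a minimal sketch on the ABC sample from the question: it sums Cost per serial number within each step and then counts how many serial numbers fall into each bin. The bin edges (width 10, from -10 to 60) are an assumption chosen to cover the sample values, and `hist_counts` is a name introduced only for this illustration.

```python
import numpy as np
import pandas as pd

serials = [123, 456, 789, 1011, 1112]
# (Labor, Material) cost pairs per serial number, copied from the ABC sample
costs = {
    10: [(11, 16), (21, 26), (21, 16), (11, 6), (1, -4)],
    20: [(11, 19), (24, 29), (24, 19), (14, 9), (4, -1)],
    30: [(11, 13), (18, 23), (18, 13), (8, 3), (-2, -7)],
}
rows = [
    ("ABC", serial, step, cost_type, cost)
    for step, pairs in costs.items()
    for serial, pair in zip(serials, pairs)
    for cost_type, cost in zip(("Labor", "Material"), pair)
]
df1 = pd.DataFrame(rows, columns=["PartNo", "SerialNo", "StepNo", "CostType", "Cost"])

# One value per serial number within each step: Labor + Material
per_serial = df1.groupby(["PartNo", "StepNo", "SerialNo"], as_index=False)["Cost"].sum()

# Explicit bin edges shared by all steps (assumed width of 10)
edges = np.arange(-10, 61, 10)

# Number of serial numbers per bin, for each step
hist_counts = {
    step: np.histogram(grp["Cost"], bins=edges)[0].tolist()
    for step, grp in per_serial.groupby("StepNo")
}
print(hist_counts)
```

Passing the same `edges` array as `bins=` to `ax.hist` keeps the bins comparable across steps, which a plain bin count such as `bins=4` does not guarantee.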

    Code:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Load the data in two steps. Copy each table to the clipboard,
    # then run the matching line:
    # df1 = pd.read_clipboard(sep='\\s+')
    # PartNo  SerialNo    StepNo  CostType    Cost
    # ABC      123        10      Labor       11
    # ABC      123        10      Material    16
    # ABC      456        10      Labor       21
    # ABC      456        10      Material    26
    # ...

    # df2 = pd.read_clipboard(sep='\\s+')
    # PartNo  StepNo  TargetCost
    # ABC     10      12
    # ABC     20      20
    # ABC     30      13

    # CostType is irrelevant: all cost types are summed per serial number
    df11 = df1.drop(['CostType'], axis=1)

    # Aggregate by PartNo, StepNo and SerialNo: total cost and number of cost rows
    df12 = df11.groupby(['PartNo', 'StepNo', 'SerialNo']).agg(['sum', 'count']).reset_index()
    df12.columns = ['PartNo', 'StepNo', 'SerialNo', 'Cost', 'Count']

    # Attach the target cost of each step
    df3 = pd.merge(df2, df12, how='left', on=['PartNo', 'StepNo'])

    # Total target cost: the step target times the number of cost rows summed
    df3['TargetTotal'] = df3['TargetCost'] * df3['Count']

    def multiHist(x_data, x_label, bins, target):

        # Histogram setup
        fig, ax = plt.subplots()
        ax.hist(x_data, bins=bins, color='blue', alpha=0.5,
                histtype='stepfilled', label='Cost per serial number')

        # Vertical line at the target (passed in instead of read from a global)
        ax.axvline(target, color='red', linewidth=2, label='Target')

        # Annotation
        ax.annotate('Target: {:0.2f}'.format(target), xy=(target, 1), xytext=(-15, 15),
                xycoords=('data', 'axes fraction'), textcoords='offset points',
                horizontalalignment='left', verticalalignment='center',
                arrowprops=dict(arrowstyle='-|>', fc='white', shrinkA=0, shrinkB=0,
                                connectionstyle='angle,angleA=0,angleB=90,rad=10'),)

        # Labels (the legend needs the label= arguments given above)
        ax.set_xlabel(x_label, color='grey')
        ax.legend(loc='upper left')
        plt.show()

    # Identify and plot the data for each StepNo
    for step in df3['StepNo'].unique():
        dfs = df3[df3['StepNo'] == step]

        # Data to plot
        cost = dfs['Cost']
        labels = 'Part: ' + dfs['PartNo'].iloc[0] + ', Step: ' + str(step)

        # Plot
        multiHist(x_data=cost, x_label=labels, bins=4, target=dfs['TargetTotal'].iloc[0])
    

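    I have taken the liberty of editing the column names so that the data sets can be loaded more easily with pd.read_clipboard(sep='\\s+').

    Thanks for your help and for the edit. What I actually want is a histogram representing the sum of costs for each serial number within each step. The step order should be kept, with all the steps of a part in one large plot. So Step 10, Step 20 and Step 30 would each get a histogram subplot with shared axes, plus a vertical line at the cost from the cost model. Sorry the original question was not clear enough.

    I would be happy to take another look. A screenshot of a chart similar to what you want would really help!

    But I am still a bit confused. I have added a screenshot showing the cost of each serial number for StepNo=10, Part=ABC.

    Very close! The x-axis is the cost range of each bin (i.e. 10-20, 20-30, etc.) and the y-axis is the count of serial numbers in that bin (i.e. 3, 4, 2, etc.). I will need to experiment with the bins on the real data to get it right. Does that make sense? The vertical line is where the cost from the cost model sits... I forgot to mention that.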
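One possible way to get the layout described in these comments (one figure per part, one histogram subplot per step with shared axes, and a vertical line at the target) is sketched below. It rests on several assumptions: Material rows are left out, as in the question's own code; the unscaled TargetCost is drawn as the line; and the helper `plot_part` and the output file name are hypothetical. Pairing each Axes with a step via `zip` also avoids indexing `axs` by the step number itself (10, 20, ...), which runs past the end of the array.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_part(part, costs, targets, bins=4):
    """Hypothetical helper: one figure per part, one histogram per step."""
    steps = sorted(costs["StepNo"].unique())
    fig, axs = plt.subplots(1, len(steps), sharex=True, sharey=True,
                            squeeze=False, tight_layout=True)
    # squeeze=False keeps axs two-dimensional even for a single step
    for ax, step in zip(axs[0], steps):
        ax.hist(costs.loc[costs["StepNo"] == step, "Cost"], bins=bins)
        target = targets.loc[targets["StepNo"] == step, "TargetCost"]
        if not target.empty:
            # Vertical line at the target cost of this step
            ax.axvline(target.iloc[0], color="red", linewidth=2)
        ax.set_title(f"Step {step}")
    fig.suptitle(f"Part {part}")
    return fig

# Labor rows of the ABC sample from the question
labor = pd.DataFrame({
    "PartNo": "ABC",
    "SerialNo": [123, 456, 789, 1011, 1112] * 3,
    "StepNo": [10] * 5 + [20] * 5 + [30] * 5,
    "Cost": [11, 21, 21, 11, 1, 11, 24, 24, 14, 4, 11, 18, 18, 8, -2],
})
targets = pd.DataFrame({"PartNo": "ABC", "StepNo": [10, 20, 30],
                        "TargetCost": [12, 20, 13]})

# Sum of costs per serial number within each step
per_serial = labor.groupby(["PartNo", "StepNo", "SerialNo"], as_index=False)["Cost"].sum()

# Loop over parts: one figure each
figures = {}
for part, grp in per_serial.groupby("PartNo"):
    figures[part] = plot_part(part, grp, targets[targets["PartNo"] == part])
    # figures[part].savefig(f"{part}_histograms.png")  # hypothetical file name
```

With the real data, a list of explicit bin edges can be passed as `bins` instead of a count, so that every step uses the same cost ranges.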