Python 3.x dynamic histogram subplots with a line marking the target
I have been trying to get a few similar solutions to work, with no luck.
I am trying to get histograms of the cost for every step number in our manufacturing process. Each part has a different number of steps, so I want a set of histograms on one plot/image per part.
My real data has many parts, so it would be ideal if this could loop through the parts and save each figure.
We also have a target cost for each step, which I would like to overlay on the histograms. The targets are held in a separate DataFrame. I am stuck on looping over the subplots, so I have not attempted this part yet.
Below is the closest thing I could find to what each step's histogram should look like:
Here is my code so far:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel('Dist_Example.xlsx')
df1 = df[~df['Cost Type'].isin(['Material'])]
number_of_subplots = len(df1['Step No'].unique())
steps = df1['Step No'].unique()
fig, axs = plt.subplots(1, number_of_subplots, sharey = True, tight_layout=True)
for step in steps:
    df2 = df1[df1['Step No'].isin([step])]
    axs[step].hist(df2['Cost'])
plt.show()
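One thing that stops the loop above from working: `axs` holds one Axes per step, so with three steps its valid positions are 0, 1 and 2, while `axs[step]` asks for position 10, 20 or 30 and raises an IndexError. A minimal sketch of one way around that, pairing each Axes with its step by position. The small DataFrame is a stand-in for the Excel sheet (a few rows taken from the sample data, with the question's column names), and the `Agg` backend is only there so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for pd.read_excel('Dist_Example.xlsx'): a few rows of the sample data
df = pd.DataFrame({
    "Step No": [10, 10, 10, 20, 20, 20, 30, 30, 30],
    "Cost Type": ["Labor", "Material", "Labor"] * 3,
    "Cost": [11, 16, 21, 11, 19, 24, 11, 13, 18],
})
df1 = df[df["Cost Type"] != "Material"]
steps = sorted(df1["Step No"].unique())

# squeeze=False always returns a 2-D array of Axes, even for a single step
fig, axs = plt.subplots(1, len(steps), sharey=True, tight_layout=True, squeeze=False)

# Pair each Axes with a step by position instead of indexing with the step label
for ax, step in zip(axs[0], steps):
    ax.hist(df1.loc[df1["Step No"] == step, "Cost"])
    ax.set_title(f"Step {step}")
```

Calling `plt.show()` or `fig.savefig(...)` afterwards displays or stores the figure as before.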
Thanks in advance for your help.
Here are the target costs, which I would like to show as vertical lines on the histograms:
PartNo StepNo TargetCost
ABC 10 12
ABC 20 20
ABC 30 13
Here is some sample historical data that should be placed into the bins of the histograms:
PartNo SerialNo StepNo CostType Cost
ABC 123 10 Labor 11
ABC 123 10 Material 16
ABC 456 10 Labor 21
ABC 456 10 Material 26
ABC 789 10 Labor 21
ABC 789 10 Material 16
ABC 1011 10 Labor 11
ABC 1011 10 Material 6
ABC 1112 10 Labor 1
ABC 1112 10 Material -4
ABC 123 20 Labor 11
ABC 123 20 Material 19
ABC 456 20 Labor 24
ABC 456 20 Material 29
ABC 789 20 Labor 24
ABC 789 20 Material 19
ABC 1011 20 Labor 14
ABC 1011 20 Material 9
ABC 1112 20 Labor 4
ABC 1112 20 Material -1
ABC 123 30 Labor 11
ABC 123 30 Material 13
ABC 456 30 Labor 18
ABC 456 30 Material 23
ABC 789 30 Labor 18
ABC 789 30 Material 13
ABC 1011 30 Labor 8
ABC 1011 30 Material 3
ABC 1112 30 Labor -2
ABC 1112 30 Material -7
And a second sample data set:
PartNo SerialNo StepNo CostType Cost
DEF Aplha 10 Labor 2
DEF Zed 10 Labor 3
DEF Kelly 10 Labor 4
DEF Aplha 20 Labor 3
DEF Zed 20 Labor 2
DEF Kelly 20 Labor 5
DEF Aplha 30 Labor 6
DEF Zed 30 Labor 7
DEF Kelly 30 Labor 5
DEF Aplha 40 Labor 3
DEF Zed 40 Labor 4
DEF Kelly 40 Labor 2
DEF Aplha 50 Labor 8
DEF Zed 50 Labor 9
DEF Kelly 50 Labor 7
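You will not find a histogram function that solves this for your data set directly. You need to aggregate the data in a way that fits your needs and then represent your findings with a bar chart.
I found your goal and your data a bit confusing, but I think I have understood what you are after, given the following assumption:
Histograms show counts or frequencies, so the data has to be aggregated in some meaningful way. Below you will see the bin counts of the aggregated cost for each serial number within each step.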
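To make the aggregation step concrete, here is a small sketch using only the StepNo 10 rows of the sample data. It sums the cost per serial number and then counts how many serial numbers land in each bin. Using `np.histogram` with four equal-width bins is an assumption for illustration; the real data may call for different bin edges.

```python
import io

import numpy as np
import pandas as pd

# StepNo 10 rows of the sample data, read from whitespace-separated text
raw = """PartNo SerialNo StepNo CostType Cost
ABC 123 10 Labor 11
ABC 123 10 Material 16
ABC 456 10 Labor 21
ABC 456 10 Material 26
ABC 789 10 Labor 21
ABC 789 10 Material 16
ABC 1011 10 Labor 11
ABC 1011 10 Material 6
ABC 1112 10 Labor 1
ABC 1112 10 Material -4
"""
df1 = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# One row per (part, step, serial number) holding the summed cost
totals = df1.groupby(["PartNo", "StepNo", "SerialNo"], as_index=False)["Cost"].sum()

# Number of serial numbers per cost bin; edges has one more entry than counts
counts, edges = np.histogram(totals["Cost"], bins=4)
print(totals)
print(counts, edges)
```

The `counts` array is what the bars of the histogram show, and `edges` gives the cost range covered by each bar.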
Code:
import pandas as pd
import matplotlib.pyplot as plt

# Load data in two steps (copy each table, then read it from the clipboard):
# df1 = pd.read_clipboard(sep='\\s+')
# PartNo SerialNo StepNo CostType Cost
# ABC 123 10 Labor 11
# ABC 123 10 Material 16
# ABC 456 10 Labor 21
# ABC 456 10 Material 26
# ...
# df2 = pd.read_clipboard(sep='\\s+')
# PartNo StepNo TargetCost
# ABC 10 12
# ABC 20 20
# ABC 30 13

# The cost type is irrelevant once costs are summed
df11 = df1.drop(['CostType'], axis=1)

# Aggregate by part, step and serial number: total cost and number of cost rows
df12 = df11.groupby(['PartNo', 'StepNo', 'SerialNo']).agg(['sum', 'count']).reset_index()
df12.columns = ['PartNo', 'StepNo', 'SerialNo', 'Cost', 'Count']
df3 = pd.merge(df2, df12, how='left', on=['PartNo', 'StepNo'])

# Calculate total target cost
df3['TargetTotal'] = df3['TargetCost'] * df3['Count']

def multiHist(x_data, x_label, bins, target):
    # Histogram setup
    fig, ax = plt.subplots()
    ax.hist(x_data, bins=bins, color='blue', alpha=0.5, histtype='stepfilled', label='Cost')
    # Vertical line at the target (passed in rather than read from a global)
    ax.axvline(target, color='red', linewidth=2)
    # Annotation
    ax.annotate('Target: {:0.2f}'.format(target), xy=(target, 1), xytext=(-15, 15),
                xycoords=('data', 'axes fraction'), textcoords='offset points',
                horizontalalignment='left', verticalalignment='center',
                arrowprops=dict(arrowstyle='-|>', fc='white', shrinkA=0, shrinkB=0,
                                connectionstyle='angle,angleA=0,angleB=90,rad=10'))
    # Labels
    ax.set_xlabel(x_label, color='grey')
    ax.legend(loc='upper left')
    plt.show()

# Identify and plot data for each StepNo
for step in df3['StepNo'].unique():
    dfs = df3[df3['StepNo'] == step]
    # Data to plot
    cost = dfs['Cost']
    label = 'Part: ' + dfs['PartNo'].iloc[0] + ', Step: ' + str(dfs['StepNo'].iloc[0])
    # Plot
    multiHist(x_data=cost, x_label=label, bins=4, target=dfs['TargetTotal'].iloc[0])
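I have taken the liberty of editing the column names so that the data sets can be loaded more easily with pd.read_clipboard(sep='\\s+').

Thanks for the help and the edits. What I actually want is a histogram that represents the sum of cost for each serial number at each step. The step order should be preserved, with all steps of a part in one large plot. So step 10, step 20 and step 30 would each get a histogram subplot with shared axes, plus a vertical line at the cost model's cost. Sorry the original question was not clear enough.

I will happily take another look. A screenshot of a chart similar to what you want would really help! I am still a bit confused, though. I have added a screenshot showing the cost of each serial number for StepNo = 10, Part = ABC.

Very close! The x-axis is the cost range of each bin (i.e. 10-20, 20-30, etc.), and the y-axis is the count of serial numbers in that bin (i.e. 3, 4, 2, etc.). I need to use bins to do this properly with the real data. Does that make sense? The vertical line is where the cost model's cost sits... I forgot to mention that.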
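Putting the clarified request together, here is a rough sketch: one figure per part, one subplot per step in step order, shared axes, and a vertical line at each step's target. It is not the code from the answer above. The function name `plot_part`, the bin count, the figure size and drawing the line at the unscaled `TargetCost` are all assumptions made for illustration, and the data is a subset of the sample tables.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

costs = pd.read_csv(io.StringIO("""PartNo SerialNo StepNo CostType Cost
ABC 123 10 Labor 11
ABC 123 10 Material 16
ABC 456 10 Labor 21
ABC 456 10 Material 26
ABC 789 10 Labor 21
ABC 789 10 Material 16
ABC 123 20 Labor 11
ABC 123 20 Material 19
ABC 456 20 Labor 24
ABC 456 20 Material 29
ABC 789 20 Labor 24
ABC 789 20 Material 19
ABC 123 30 Labor 11
ABC 123 30 Material 13
ABC 456 30 Labor 18
ABC 456 30 Material 23
ABC 789 30 Labor 18
ABC 789 30 Material 13
"""), sep=r"\s+")

targets = pd.read_csv(io.StringIO("""PartNo StepNo TargetCost
ABC 10 12
ABC 20 20
ABC 30 13
"""), sep=r"\s+")


def plot_part(part, totals, targets, bins=5):
    """Hypothetical helper: one histogram subplot per step of a single part."""
    part_totals = totals[totals["PartNo"] == part]
    steps = sorted(part_totals["StepNo"].unique())
    # Common bin edges, so a bar covers the same cost range in every subplot
    edges = np.histogram_bin_edges(part_totals["Cost"], bins=bins)
    fig, axs = plt.subplots(1, len(steps), sharex=True, sharey=True,
                            squeeze=False, figsize=(4 * len(steps), 3))
    for ax, step in zip(axs[0], steps):
        step_costs = part_totals.loc[part_totals["StepNo"] == step, "Cost"]
        ax.hist(step_costs, bins=edges, color="blue", alpha=0.5)
        match = targets[(targets["PartNo"] == part) & (targets["StepNo"] == step)]
        if not match.empty:
            # Assumption: the line sits at the unscaled per-step target cost
            ax.axvline(match["TargetCost"].iloc[0], color="red", linewidth=2,
                       label="Target")
            ax.legend(loc="upper left")
        ax.set_title(f"Step {step}")
        ax.set_xlabel("Cost per serial number")
    axs[0, 0].set_ylabel("Serial numbers")
    fig.suptitle(f"Part {part}")
    fig.tight_layout()
    return fig


# Sum the cost per serial number within each part and step, then plot each part
totals = costs.groupby(["PartNo", "StepNo", "SerialNo"], as_index=False)["Cost"].sum()
figures = {part: plot_part(part, totals, targets) for part in totals["PartNo"].unique()}
```

Each figure can then be written to disk with something like `figures["ABC"].savefig("ABC.png")`. If only labor cost should be compared with the target, filter `costs` on `CostType` before the `groupby`.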