Python: getting a 'sales window' for each product category in pandas?
python, pandas, analytics

My dataframe holds the sales details of many products over several years, and a plot of it looks like this:

I am trying to find the sales window for each product.

What I have tried so far:

The approach I came up with is to take the minimum, median and maximum date values over six-month intervals for each year, and to declare minimum to median the worst selling period and median to maximum the best selling window for that product. The code I am using now works on six-month intervals, but I would also like to get the result for a whole year. Whichever method works best:
def dater(date):
    print(date)
    if type(date)==float:
        return '-'
    months = ['','Jan', 'Feb', 'Mar', 'Apr', 'May','Jun', 'Jul', 'Aug','Sep', 'Oct', 'Nov', 'Dec']
    period = ['Start', 'Mid', 'End','End']
    return months[date.month]+' '+period[date.day//10]

def grpRes(grp):
    return pd.Series([grp.Date.min(), grp.Date.max(), grp.Amount.mean()],
                     index=['start', 'end', 'value'])

best_windows = pd.DataFrame(columns = df.select_dtypes(exclude='object').columns)
for col in df.select_dtypes(exclude='object').columns:
    for year in ['2017', '2018', '2019', '2020']:
        print(f'For year {year} and category {col}')
        temp = df.loc[year,col][df[col]>=df[col].quantile(0.7)]
        print('temp created')
        if len(temp)>0:
            du = temp.reset_index().rename(columns = {'order_start_date': 'Date', col:'Amount'})
            res = du.groupby(du.Date.diff().dt.days.fillna(1, downcast='infer')
                             .gt(20).cumsum()).apply(grpRes)
            res.index.name = 'chunk'
            for row in res.iterrows():
                print(row)
                best_windows.loc[year+' Window: '+str(row[0]+1)+' start',col] = row[1].start.date().strftime('%d-%m-%Y')
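I then define the window from the values across all years, as a start range and an end range for the window. This seems like a terrible way to do it. Still, it gives me date ranges for the different years, as follows: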
Windows found per product for 2017 to 2020, each written as start to end (DD-MM-YYYY). The original columns are '<year> Window: <n> start' and '<year> Window: <n> end', with up to four windows per year.

B: 08-11-2019 to 31-12-2019; 01-01-2020 to 09-01-2020; 11-02-2020 to 31-07-2020
D: 13-05-2017 to 12-06-2017; 16-08-2017 to 14-10-2017; 24-05-2018 to 13-06-2018; 11-07-2018 to 20-08-2018; 27-09-2018 to 03-11-2018; 22-10-2019 to 10-11-2019; 28-12-2019 to 31-12-2019; 01-01-2020 to 31-07-2020
H: 23-03-2018 to 06-04-2018; 27-06-2018 to 09-08-2018; 02-11-2018 to 16-11-2018; 21-04-2019 to 25-05-2019; 12-07-2019 to 15-08-2019; 30-10-2019 to 31-12-2019; 01-01-2020 to 31-07-2020
J: 15-01-2017 to 12-02-2017; 25-10-2017 to 31-12-2017; 01-01-2018 to 11-02-2018; 12-10-2018 to 31-12-2018; 01-01-2019 to 24-02-2019; 10-10-2019 to 31-12-2019; 01-01-2020 to 04-02-2020
L: 03-11-2018 to 08-11-2018; 06-12-2018 to 31-12-2018; 01-01-2019 to 07-03-2019; 24-04-2019 to 01-05-2019; 02-09-2019 to 31-12-2019; 01-01-2020 to 06-03-2020; 10-04-2020 to 19-04-2020; 10-05-2020 to 14-05-2020; 26-07-2020 to 31-07-2020
LO: 06-09-2017 to 31-12-2017; 01-01-2018 to 03-01-2018; 23-09-2018 to 31-12-2018; 01-01-2019 to 10-02-2019; 25-09-2019 to 31-12-2019; 01-01-2020 to 11-02-2020
M: 15-01-2017 to 11-09-2017; 03-07-2018 to 15-10-2018; 22-04-2019 to 02-05-2019; 18-11-2019 to 24-11-2019; 28-03-2020 to 13-05-2020; 21-06-2020 to 23-07-2020
P: 21-01-2017 to 03-05-2017; 11-08-2017 to 19-10-2017; 31-01-2018 to 23-04-2018; 02-08-2018 to 10-10-2018; 23-02-2019 to 23-04-2019; 04-09-2019 to 06-10-2019; 29-02-2020 to 04-04-2020
S: 24-03-2017 to 26-07-2017; 25-03-2018 to 01-07-2018; 18-04-2019 to 01-05-2019; 23-05-2019 to 10-08-2019; 01-04-2020 to 31-07-2020
SH: 07-05-2017 to 12-08-2017; 05-05-2018 to 11-08-2018; 01-05-2019 to 10-08-2019; 29-04-2020 to 31-07-2020
SK: 12-12-2019 to 31-12-2019; 01-01-2020 to 01-01-2020; 24-05-2020 to 31-07-2020
SKO: 01-05-2017 to 26-09-2017; 03-05-2018 to 19-09-2018; 09-07-2019 to 25-07-2019; 04-05-2020 to 31-07-2020
SL: 24-05-2017 to 10-06-2017; 06-05-2018 to 06-05-2018; 31-05-2018 to 16-07-2018; 12-03-2019 to 01-08-2019; 16-02-2020 to 31-07-2020
U: 18-04-2019 to 17-05-2019; 10-06-2019 to 24-06-2019; 27-03-2020 to 01-06-2020; 25-06-2020 to 31-07-2020
V: 15-01-2017 to 13-02-2017; 14-09-2017 to 31-12-2017; 01-01-2018 to 05-03-2018; 25-09-2018 to 31-12-2018; 01-01-2019 to 19-02-2019; 22-10-2019 to 31-12-2019; 01-01-2020 to 22-01-2020
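Now I can use the dater function I wrote to convert these dates into a month plus a part-of-month label:

best_windows = best_windows.transpose().applymap(dater)

But this gives me results spread over the whole year rather than a single sales window.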
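Ideally, what I want to achieve is:

A best-selling window and a worst-selling window for each product per year, so that I can say: at this time every year, this product is popular (for example, product A sells best from late March to mid-June). The windows would be loosely defined by the peaks and troughs of the % sales curve shown in the plot, ideally together with the transition periods, and would give a better intuition for each product's sales window.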
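Data sample:

My data looks like the following. Note that the values are percentages: the share of total sales that each category represents. Suppose total sales are $10, of which product A accounts for $5, B for $3 and C for $2. Then the % values are A = 50%, B = 30%, C = 20%. Of course, this only works when there is more than one product. I have tried to include a full year of data because it better captures the seasonality in my data, which cannot be detected in smaller samples.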
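As a small illustration of the percentage definition above, the share of total sales can be computed row by row. This is only a sketch on made-up numbers: the frame `sales` and its values are hypothetical and are not the asker's data.

```python
import pandas as pd

# Hypothetical daily sales amounts per product (not the asker's data)
sales = pd.DataFrame(
    {"A": [5.0, 2.0], "B": [3.0, 2.0], "C": [2.0, 4.0]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02"]),
)

# Divide each row by its own total so that every row sums to 100
share = sales.div(sales.sum(axis=1), axis=0) * 100
print(share.round(1))
```

The first row reproduces the $10 example: A, B and C come out at 50%, 30% and 20%.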
Link:

How about something like this:
import numpy as np
import pandas as pd

# using sin to generate seasonal data
period = 365 * 4
dates = pd.date_range('2016-01-01', periods=period)
np.random.seed(42)
pure = np.sin(np.linspace(6, 30, period))
noise = np.random.normal(0, 1, period)
signal = pure + 20 + noise
df = pd.DataFrame({'date': dates, 'signal': signal}).set_index('date')
# 30-day rolling mean to smooth out the daily noise
df['smoothed'] = df['signal'].rolling(30).mean()

# get best/worst selling months
# rolling max/min method
threshold = 0.97
window = 320
# keep the smoothed value only where it is within 3% of the rolling max (or min)
df['best'] = df['smoothed'].where(
    df['smoothed'] > df['smoothed'].rolling(window).max() * threshold, other=np.nan)
df['worst'] = df['smoothed'].where(
    df['smoothed'] < df['smoothed'].rolling(window).min() / threshold, other=np.nan)
# skip the first year, where the rolling window is not yet full
df.iloc[365:, 1:].plot(figsize=(14,10))
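The plot shows the windows visually. If explicit start and end dates are wanted, the contiguous non-NaN runs of a column such as `best` can be collapsed into one row per window. The following is a minimal sketch, not part of the original answer; the helper name `runs_to_windows` and the tiny example series are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def runs_to_windows(flagged: pd.Series) -> pd.DataFrame:
    """Return one row (start, end) per contiguous non-NaN run in a date-indexed series."""
    mask = flagged.notna()
    # A new run id starts wherever the mask differs from the previous row
    run_id = mask.ne(mask.shift(fill_value=False)).cumsum()
    dates = flagged.index.to_series()[mask]
    runs = dates.groupby(run_id[mask]).agg(["min", "max"])
    return runs.rename(columns={"min": "start", "max": "end"}).reset_index(drop=True)

# Tiny made-up series standing in for df['best']
idx = pd.date_range("2020-01-01", periods=8)
best = pd.Series([np.nan, 1.0, 1.2, np.nan, np.nan, 0.9, 1.1, 1.3], index=idx)
windows = runs_to_windows(best)
print(windows)
```

Applied to the answer's frame, this would be called as `runs_to_windows(df['best'])` and `runs_to_windows(df['worst'])`.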
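The rolling max/min approach is not perfect, but it is necessary if the annual max/min changes significantly from year to year. With this method you also have to ignore the first year of data.

The next method addresses these issues by first pulling the annual max/min separately: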
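# annual max/min method
threshold = 0.97
# note: as written, .max() and .min() are taken over the whole series,
# so every year is compared against the same two reference values
df['max'], df['min'] = df['smoothed'].max(), df['smoothed'].min()
df['best'] = df['smoothed'].where(df['smoothed'] > df['max'] * threshold, other=np.nan)
df['worst'] = df['smoothed'].where(df['smoothed'] < df['min'] / threshold, other=np.nan)
# drop the helper 'max'/'min' columns from the plot
df.iloc[365:, 1:-2].plot(figsize=(14,10))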
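If the level of the series drifts from year to year, one possible variant is to compare each day against the max/min of its own calendar year. This is a sketch of that idea under stated assumptions, not the answerer's code: the drifting series below is synthetic, and the names `year_max` and `year_min` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic series whose level drifts upward from year to year (made-up data)
dates = pd.date_range("2017-01-01", "2018-12-31")
t = np.arange(len(dates))
smoothed = pd.Series(20 + np.sin(2 * np.pi * t / 365.25) + 0.01 * t, index=dates)

threshold = 0.97
# Broadcast each calendar year's own max/min back onto the rows of that year
year_max = smoothed.groupby(smoothed.index.year).transform("max")
year_min = smoothed.groupby(smoothed.index.year).transform("min")
best = smoothed.where(smoothed > year_max * threshold)
worst = smoothed.where(smoothed < year_min / threshold)
```

With a single global maximum, the lower year in this synthetic series would have no 'best' days at all; grouping by year gives every year its own window.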
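I think the first thing to consider is whether you want a static model or one that updates itself.

My suggestion is to fit a static model on all the data accumulated so far to get each product's best-selling and worst-selling windows, and to use those as the recommendation for the coming year. After that year you can update your recommendations again.

Next, you need to decide what counts as good and what counts as bad. For example, the top 20% of values could be good and the bottom 20% bad. Let's call this the threshold percentile T.

Now for the main part. Your assumption is that there are fixed windows every year in which a product's share of sales is high (above the top threshold) or low (below the bottom threshold).

So first we need the average for each day of the year. (You could also fit a regression model instead of averaging, which would smooth things out and make your predictions more robust.)

Then, wherever the averaged or predicted sales curve crosses the T percentile, we start an interval, and we stop it when the curve crosses back.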
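The averaging step described above could be sketched as follows. The data is made up, the product names "A" and "B" are placeholders, and the 7-day smoothing is an extra assumption added here to stand in for the regression idea; the name `dfg` is reused only because the code after this refers to a frame by that name.

```python
import numpy as np
import pandas as pd

# Made-up daily % share for two hypothetical products over three years
dates = pd.date_range("2017-01-01", "2019-12-31")
rng = np.random.default_rng(0)
phase = 2 * np.pi * dates.dayofyear.to_numpy() / 365.25
df = pd.DataFrame(
    {
        "A": 50 + 10 * np.sin(phase) + rng.normal(0, 2, len(dates)),
        "B": 30 + 5 * np.cos(phase) + rng.normal(0, 2, len(dates)),
    },
    index=dates,
)

# Average every calendar day across the years: one row per day of year
dfg = df.groupby(df.index.dayofyear).mean()
# Light smoothing so that single noisy days do not create spurious crossings
dfg = dfg.rolling(7, center=True, min_periods=1).mean()
```

The result has one row per day of the year, which is the shape the threshold-crossing step works on.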
import numpy as np

def get_thresh_crossing_intervals(arr):
    # np.sign gives +1 above the zero line and -1 below it, so a diff of
    # +2 marks an upward crossing and -2 a downward crossing
    crossings = np.diff(np.sign(arr))
    # You might also want to wrap arrays to cover spans around end of year
    starts = np.where(crossings == 2)[0]
    ends = np.where(crossings == -2)[0]
    # Drop a leading end that has no start before it (series begins above the line)
    if len(starts) and len(ends) and ends[0] < starts[0]:
        ends = ends[1:]
    starts = starts[:len(ends)]
    return list(zip(starts, ends))

def post_process_intervals(intervals):
    # Keep only intervals that last at least a week
    return [(p, q) for p, q in intervals if q-p>=7]

def get_col_intervals(df, col, top_thresh=0.2, bot_thresh=0.2):
    # Get quantile based thresholds
    top_qnt = df[col].quantile(1 - top_thresh)
    bot_qnt = df[col].quantile(bot_thresh)
    # Make threshold as zero line: positive means "inside the window"
    top_df = df[col] - top_qnt
    # Sign flipped so that values below the bottom threshold are positive
    bot_df = bot_qnt - df[col]
    # Get top crossings and intervals
    top_intervals = get_thresh_crossing_intervals(top_df)
    bot_intervals = get_thresh_crossing_intervals(bot_df)
    # Some post processings (e.g. only keep intervals with more than a week)
    top_intervals = post_process_intervals(top_intervals)
    bot_intervals = post_process_intervals(bot_intervals)
    return {'top_intervals': top_intervals, 'bot_intervals': bot_intervals}

# dfg: assumed to hold the per-day-of-year averages described above
product_intervals = {}
for col in ["A", "B"]:
    product_intervals[col] = get_col_intervals(dfg, col)
product_intervals
In addition, we keep only intervals longer than a certain length; shorter ones are dropped.
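To turn the resulting intervals into statements like "late March to mid-June", the positions can be mapped back to month labels in the same style as the dater function from the question. This is a minimal sketch: `doy_to_label`, `label_intervals`, the non-leap reference year and the `offset` convention are all assumptions introduced here for illustration.

```python
import pandas as pd

MONTHS = ['', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
PERIODS = ['Start', 'Mid', 'End', 'End']

def doy_to_label(day_of_year: int, ref_year: int = 2019) -> str:
    """Map a 1-based day of year to a label such as 'Mar End'."""
    date = pd.Timestamp(year=ref_year, month=1, day=1) + pd.Timedelta(days=day_of_year - 1)
    return f"{MONTHS[date.month]} {PERIODS[date.day // 10]}"

def label_intervals(intervals, offset=1):
    # offset=1 assumes the interval values are 0-based row positions in a
    # frame indexed by day of year 1..365
    return [(doy_to_label(p + offset), doy_to_label(q + offset)) for p, q in intervals]

print(label_intervals([(83, 165)]))  # [('Mar End', 'Jun Mid')]
```

Intervals that wrap around the end of the year are not handled here and would need the array wrapping mentioned in the code comment above.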