Python: find annual and quarterly calculations in one function
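This is a continuation of a question I posted earlier. The answer there was amazing, but I quickly ran into a problem because I also want to use the same logic for quarter-over-quarter calculations. The dataframe is shown below. Please don't mind the length; I don't know how to collapse it (I'd be grateful if anyone could tell me how to hide a long df).

Since there have been some changes, let me explain the problem again. I want to find the year-over-year difference for each product in each market for each time period (that's a mouthful!). For example, product A in the USA market with TIMEPERIOD QTR and date 2020-06-01 has a value of 100. This simply means that in Q2 2020 we made $100 in revenue. I want the growth rate of Q2 2020 compared with Q2 2019, which is simply (100 - 300) / 300 ≈ -66.7%. The same logic applies to the MAT (moving annual total) time period when calculating year-over-year growth.

Now I also want to find quarter-over-quarter growth. Note that this calculation does not apply to the MAT time period, and my code below handles that (though I'm not sure it does so correctly). My modified function produces output, but at the cost of readability: the annual reference date in each row no longer matches the quarterly reference date. Since I ultimately need to use this output for further analysis, can this be improved?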
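The growth rate described above is a plain percentage change between two periods. A minimal Python sketch of that arithmetic follows; the name `growth_rate` and the choice to return `None` for a zero or missing baseline are assumptions for illustration, not part of the question.

```python
def growth_rate(current, previous):
    """Period-over-period growth: (current - previous) / previous.

    Returns None when there is no usable baseline (previous is 0 or None);
    returning None rather than 0 in that case is an assumption of this sketch.
    """
    if previous is None or previous == 0:
        return None
    return (current - previous) / previous


# Q2 2020 vs. Q2 2019 for product A in the USA (QTR values 100 and 300)
print(f"{growth_rate(100, 300):.1%}")  # -66.7%
```

The same function covers MAT year-over-year and QTR quarter-over-quarter; only the choice of which earlier period to pass as `previous` changes.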
```
    MARKET  PRODUCT  TIMEPERIOD  DATE        VALUES
0   USA     A        QTR         2018-06-01     300
1   USA     A        QTR         2019-06-01     300
2   USA     A        QTR         2020-03-01     100
3   USA     A        QTR         2020-06-01     100
4   USA     A        MAT         2018-06-01    2000
5   USA     A        MAT         2019-06-01    2000
6   USA     A        MAT         2020-06-01    1000
7   USA     B        QTR         2018-06-01     100
8   USA     B        QTR         2019-06-01     100
9   USA     B        QTR         2020-03-01     300
10  USA     B        QTR         2020-06-01     200
11  USA     B        MAT         2018-06-01    3000
12  USA     B        MAT         2019-06-01    3000
13  USA     B        MAT         2020-06-01    5000
14  UK      C        QTR         2018-06-01     500
15  UK      C        QTR         2019-06-01     500
16  UK      C        QTR         2020-03-01     200
17  UK      C        QTR         2020-06-01     200
18  UK      C        MAT         2018-06-01     300
19  UK      C        MAT         2019-06-01     300
20  UK      C        MAT         2020-06-01    5000
21  UK      D        QTR         2018-06-01      50
22  UK      D        QTR         2019-06-01      50
23  UK      D        QTR         2020-03-01     200
24  UK      D        QTR         2020-06-01     200
25  UK      D        MAT         2018-06-01     500
26  UK      D        MAT         2019-06-01     500
27  UK      D        MAT         2020-06-01    5000
```
My code is as follows:
```python
import numpy as np
import pandas as pd
from itertools import combinations
from dateutil.relativedelta import relativedelta


def get_annual_growth(grp):
    # Get all possible pairs of dates in the group, each pair sorted (earlier, later)
    year_comb_lists = np.sort([sorted(comb) for comb in combinations(grp.Date, 2)])
    # Keep pairs exactly 12 months apart (annual) or 3 months apart (quarterly)
    new_year_comb_lists = [comb_dates for comb_dates in year_comb_lists
                           if comb_dates[0] == comb_dates[1] - relativedelta(months=12)]
    quarter_comb_lists = [comb_dates for comb_dates in year_comb_lists
                          if comb_dates[0] == comb_dates[1] - relativedelta(months=3)]
    # Label each pair by its later date
    year_comb_strings = [comb[1] for comb in new_year_comb_lists]
    quarter_comb_strings = [comb[1] for comb in quarter_comb_lists]
    # Create a sub-dataframe to be concatenated afterwards by pandas `groupby`
    subdf = pd.DataFrame(columns=['Annual_Reference', 'Annual_Growth',
                                  'Quarterly_Reference', 'Quarterly_Growth'])
    for i, years in enumerate(new_year_comb_lists):  # for each annual pair ...
        actual_value = grp[grp['Date'] == years[1]].Values.mean()
        last_value = grp[grp['Date'] == years[0]].Values.mean()
        growth = (actual_value - last_value) / last_value  # annual growth
        subdf.loc[i, ['Annual_Reference', 'Annual_Growth']] = [year_comb_strings[i], growth]
    for i, quarters in enumerate(quarter_comb_lists):  # for each quarterly pair ...
        actual_value = grp[grp['Date'] == quarters[1]].Values.mean()
        last_value = grp[grp['Date'] == quarters[0]].Values.mean()
        growth = (actual_value - last_value) / last_value  # quarterly growth
        # Row i is shared with the i-th annual pair, which is why the two
        # reference dates on a row need not match
        subdf.loc[i, ['Quarterly_Reference', 'Quarterly_Growth']] = [quarter_comb_strings[i], growth]
    return subdf


gr_products = df_2.groupby(['TIMEPERIOD', 'MARKET', 'PRODUCT']).apply(get_annual_growth)
gr_products = gr_products.reset_index()
gr_products['Annual_Reference'] = pd.to_datetime(gr_products['Annual_Reference'])
gr_products['Quarterly_Reference'] = pd.to_datetime(gr_products['Quarterly_Reference'])
```
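The pairing step at the heart of this function can be looked at on its own. The sketch below uses a hypothetical helper, `pairs_with_gap`, and swaps `pd.DateOffset` in for `relativedelta`; it applies the same "exactly N months apart" filter to the QTR dates for USA / product A. It finds two 12-month pairs but only one 3-month pair. Since both loops in the function write their i-th result into row i of `subdf`, the first annual pair (2018-06-01 to 2019-06-01) lands on the same row as the only quarterly pair (2020-03-01 to 2020-06-01), which is the mismatch described in the question.

```python
import pandas as pd
from itertools import combinations

# QTR dates for USA / product A, as in the table above
dates = pd.to_datetime(["2018-06-01", "2019-06-01", "2020-03-01", "2020-06-01"])


def pairs_with_gap(dates, months):
    """Return (earlier, later) date pairs that are exactly `months` apart."""
    gap = pd.DateOffset(months=months)
    return [
        (a, b)
        for a, b in (sorted(c) for c in combinations(dates, 2))
        if a + gap == b
    ]


annual = pairs_with_gap(dates, 12)
quarterly = pairs_with_gap(dates, 3)
print(len(annual), len(quarterly))  # 2 1
```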
For anyone who wants to reproduce this, the data can be created as follows:
```python
import pandas as pd

df_list = [['USA', 'A', 'QTR', '2020-06-01', 100], ['USA', 'A', 'MAT', '2020-06-01', 1000],
           ['USA', 'B', 'QTR', '2020-06-01', 200], ['USA', 'B', 'MAT', '2020-06-01', 5000],
           ['USA', 'A', 'QTR', '2020-03-01', 500], ['USA', 'B', 'QTR', '2020-03-01', 300],
           ['USA', 'A', 'QTR', '2019-06-01', 300], ['USA', 'A', 'MAT', '2019-06-01', 2000],
           ['USA', 'B', 'QTR', '2019-06-01', 100], ['USA', 'B', 'MAT', '2019-06-01', 3000],
           ['USA', 'A', 'QTR', '2018-06-01', 300], ['USA', 'A', 'MAT', '2018-06-01', 2000],
           ['USA', 'B', 'QTR', '2018-06-01', 100], ['USA', 'B', 'MAT', '2018-06-01', 3000],
           ['UK', 'C', 'QTR', '2020-06-01', 200], ['UK', 'C', 'MAT', '2020-06-01', 5000],
           ['UK', 'C', 'QTR', '2020-03-01', 100], ['UK', 'D', 'QTR', '2020-03-01', 50],
           ['UK', 'D', 'QTR', '2020-06-01', 200], ['UK', 'D', 'MAT', '2020-06-01', 5000],
           ['UK', 'C', 'QTR', '2019-06-01', 500], ['UK', 'C', 'MAT', '2019-06-01', 300],
           ['UK', 'D', 'QTR', '2019-06-01', 50], ['UK', 'D', 'MAT', '2019-06-01', 500],
           ['UK', 'C', 'QTR', '2018-06-01', 500], ['UK', 'C', 'MAT', '2018-06-01', 300],
           ['UK', 'D', 'QTR', '2018-06-01', 50], ['UK', 'D', 'MAT', '2018-06-01', 500]]
column_names = ['MARKET', 'PRODUCT', 'TIMEPERIOD', 'Date', 'Values']
df_2 = pd.DataFrame(df_list, columns=column_names)
df_2['Date'] = pd.to_datetime(df_2['Date'])
df_2 = df_2.sort_values(by=['PRODUCT', 'TIMEPERIOD', 'Date']).reset_index(drop=True)
```
I managed to find a way myself to produce the output at the same date level, as shown below. It's not the most elegant, but it works for now:
```python
import numpy as np
import pandas as pd
from itertools import combinations
from dateutil.relativedelta import relativedelta


def get_annual_growth(grp):
    # All possible pairs of dates in the group
    year_comb_lists = np.sort([sorted(comb) for comb in combinations(grp.Date, 2)])
    # Keep only pairs exactly 12 months apart (drops e.g. 2018-2020)
    new_year_comb_lists = [comb_dates for comb_dates in year_comb_lists
                           if comb_dates[0] == comb_dates[1] - relativedelta(months=12)]
    new_year_comb_lists = sorted(new_year_comb_lists, key=lambda x: x[1])
    # Keep only pairs exactly 3 months apart (the 2018 filter is specific to this dataset)
    quarter_comb_lists = [comb_dates for comb_dates in year_comb_lists
                          if comb_dates[0] == comb_dates[1] - relativedelta(months=3)
                          and comb_dates[1].year != 2018]
    quarter_comb_lists = sorted(quarter_comb_lists, key=lambda x: x[1])
    # Label each pair by its later date
    year_comb_strings = [comb[1] for comb in new_year_comb_lists]
    quarter_comb_strings = [comb[1] for comb in quarter_comb_lists]
    # Create a sub-dataframe to be concatenated afterwards by pandas `groupby`
    subdf = pd.DataFrame(columns=['Annual_Reference', 'Annual_Growth',
                                  'Quarterly_Reference', 'Quarterly_Growth'])
    # Note: zip() stops at the shorter list, so a group with no 3-month pairs
    # (such as MAT) produces no rows, and extra annual pairs are dropped
    for i, (years, quarters) in enumerate(zip(new_year_comb_lists, quarter_comb_lists)):
        curr_year_val = grp[grp['Date'] == years[1]].Values.mean()
        prev_year_val = grp[grp['Date'] == years[0]].Values.mean()
        curr_qtr_val = grp[grp['Date'] == quarters[1]].Values.mean()
        prev_qtr_val = grp[grp['Date'] == quarters[0]].Values.mean()
        # Dividing NumPy floats by zero returns inf/NaN instead of raising
        # ZeroDivisionError, so check the denominator and fall back to 0
        year_gr = (curr_year_val - prev_year_val) / prev_year_val if prev_year_val else 0
        qtr_gr = (curr_qtr_val - prev_qtr_val) / prev_qtr_val if prev_qtr_val else 0
        subdf.loc[i, ['Annual_Reference', 'Annual_Growth',
                      'Quarterly_Reference', 'Quarterly_Growth']] = [
            year_comb_strings[i], year_gr, quarter_comb_strings[i], qtr_gr]
    return subdf
```
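One way to keep the annual and quarterly figures on the same row, sketched here as an alternative technique rather than a fix to the function above: instead of pairing dates and filling rows by position, join every row to the row 12 (or 3) months earlier in the same MARKET/PRODUCT/TIMEPERIOD group with a self-merge on a shifted date. Each output row then keeps its own `Date`, and both growth columns refer to that date. The helper `add_growth`, the `KEYS` constant and the temporary `_prev` column are hypothetical names; the sketch assumes one row per group and date (aggregate with `.mean()` first if that does not hold), and a zero baseline yields `inf`/`NaN` rather than an exception.

```python
import pandas as pd

KEYS = ["MARKET", "PRODUCT", "TIMEPERIOD"]


def add_growth(df, months, name):
    """Add column `name`: growth vs. the same group's row `months` earlier.

    Assumes one row per (MARKET, PRODUCT, TIMEPERIOD, Date). Rows without a
    matching earlier date get NaN instead of being dropped.
    """
    prev = df[KEYS + ["Date", "Values"]].copy()
    # Move each row forward so it lands on the date it is the baseline for
    prev["Date"] = prev["Date"] + pd.DateOffset(months=months)
    prev = prev.rename(columns={"Values": "_prev"})
    out = df.merge(prev, on=KEYS + ["Date"], how="left")
    out[name] = (out["Values"] - out["_prev"]) / out["_prev"]
    return out.drop(columns="_prev")


# A few rows from the question's df_list (USA, product A)
df = pd.DataFrame(
    [["USA", "A", "QTR", "2020-06-01", 100], ["USA", "A", "MAT", "2020-06-01", 1000],
     ["USA", "A", "QTR", "2020-03-01", 500], ["USA", "A", "QTR", "2019-06-01", 300],
     ["USA", "A", "MAT", "2019-06-01", 2000], ["USA", "A", "QTR", "2018-06-01", 300],
     ["USA", "A", "MAT", "2018-06-01", 2000]],
    columns=["MARKET", "PRODUCT", "TIMEPERIOD", "Date", "Values"],
)
df["Date"] = pd.to_datetime(df["Date"])

out = add_growth(df, 12, "Annual_Growth")
out = add_growth(out, 3, "Quarterly_Growth")
# Quarter-over-quarter is not meaningful for MAT, so blank it out explicitly
out.loc[out["TIMEPERIOD"] == "MAT", "Quarterly_Growth"] = float("nan")
print(out.sort_values(KEYS + ["Date"]))
```

Rows with no matching earlier period (2020-03-01 here, for both growth types) keep `NaN` instead of disappearing, so MAT rows survive even though they have no quarterly counterpart.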
You can try the following:
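```python
# pct_change compares each row with the previous row in its group,
# so rows must be sorted by Date within each group (df_2 above is)
df_2['month'] = df_2['Date'].dt.month
df_2['change'] = df_2.groupby(['MARKET', 'PRODUCT', 'TIMEPERIOD', 'month']).Values.pct_change()
```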
This should work as long as no year is missing from your data.
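To make the missing-year caveat concrete, here is the answer's year-over-year approach applied to the QTR rows for USA / product A from the question's `df_list`, plus a guard that is not part of the answer: it blanks the result unless the previous row in the group is exactly one year earlier. Without the guard, dropping the 2019-06-01 row would make 2020-06-01 silently compare against 2018-06-01. The function name `yoy_change` is illustrative.

```python
import pandas as pd

# QTR rows for USA / product A, values taken from the question's df_list
df = pd.DataFrame({
    "MARKET": "USA",
    "PRODUCT": "A",
    "TIMEPERIOD": "QTR",
    "Date": pd.to_datetime(["2018-06-01", "2019-06-01", "2020-03-01", "2020-06-01"]),
    "Values": [300, 300, 500, 100],
})


def yoy_change(frame):
    """Year-over-year change via the answer's month grouping, plus a guard."""
    out = frame.sort_values("Date").copy()  # pct_change relies on row order
    out["month"] = out["Date"].dt.month
    keys = ["MARKET", "PRODUCT", "TIMEPERIOD", "month"]
    grouped = out.groupby(keys)
    # The answer's step: compare with the previous row of the same month
    out["change"] = grouped["Values"].pct_change()
    # Hypothetical guard: keep the value only if that previous row is exactly
    # one year earlier; otherwise a missing year would go unnoticed
    prev_date = grouped["Date"].shift()
    out["change"] = out["change"].where(prev_date + pd.DateOffset(years=1) == out["Date"])
    return out


full = yoy_change(df)
missing = yoy_change(df[df["Date"] != pd.Timestamp("2019-06-01")])
print(full[["Date", "Values", "change"]])
print(missing[["Date", "Values", "change"]])
```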
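Edit: the above is for year-over-year. For quarter-over-quarter, don't group by month. Again, this should work as long as no quarter is missing from your data.

Great answer! It needs some cleanup, but it's the easiest to work with!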
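The quarter-over-quarter variant is more exposed to gaps in this particular data. The QTR series for USA / product A has 2018-06-01, 2019-06-01, 2020-03-01 and 2020-06-01, so a plain `pct_change` without the month key also produces values for 2019-06-01 (against 2018-06-01) and 2020-03-01 (against 2019-06-01), neither of which is a quarter-on-quarter comparison. The sketch below keeps the raw result for comparison and applies a hypothetical guard with a 3-month offset; the function `add_qoq_change` and the `raw_change`/`qoq_change` column names are assumptions. It filters to QTR rows first, since the question treats quarter-over-quarter as not applicable to MAT.

```python
import pandas as pd

# QTR rows for USA / product A, values taken from the question's df_list
df = pd.DataFrame({
    "MARKET": "USA",
    "PRODUCT": "A",
    "TIMEPERIOD": "QTR",
    "Date": pd.to_datetime(["2018-06-01", "2019-06-01", "2020-03-01", "2020-06-01"]),
    "Values": [300, 300, 500, 100],
})


def add_qoq_change(frame):
    """Quarter-over-quarter change per group, blanked unless the previous
    row is exactly 3 months earlier. Intended for QTR rows only."""
    out = frame.sort_values("Date").copy()
    keys = ["MARKET", "PRODUCT", "TIMEPERIOD"]  # no month key for QoQ
    grouped = out.groupby(keys)
    out["raw_change"] = grouped["Values"].pct_change()
    prev_date = grouped["Date"].shift()
    is_prev_quarter = prev_date + pd.DateOffset(months=3) == out["Date"]
    out["qoq_change"] = out["raw_change"].where(is_prev_quarter)
    return out


result = add_qoq_change(df[df["TIMEPERIOD"] == "QTR"])
print(result[["Date", "Values", "raw_change", "qoq_change"]])
```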