Python: computing year-over-year and quarter-over-quarter growth in one function


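This is a continuation of a question I posted earlier. The answer there was amazing, but I quickly ran into a problem, because I also want to do quarter-over-quarter calculations using the same logic.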

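The dataframe is shown below. Please don't mind the length; I really don't know how to hide it (if anyone can tell me how to collapse a long df, I would appreciate it).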

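Because there have been some modifications, let me explain the problem again. I want to find the year-over-year difference for each product in each market for each time period (that's a lot of "each"!). For example, for product A in the USA market with time period QTR and date 2020-06-01, the value is 100. This simply means that in Q2 2020 we made 100 dollars of revenue. I want to find the growth rate of Q2 2020 compared with Q2 2019, which is simply (100-300)/300 ≈ -66.7%. The same logic applies to time period MAT (moving annual total) when calculating year-over-year growth.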

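Now I also want to find the quarter-over-quarter growth. Note that this calculation does not apply to time period MAT, so my code below tries to handle that (I am not sure it handles it correctly). My modified function produces working output, but at the cost of readability, because the annual reference date in each row no longer matches the quarterly reference date. Since I ultimately need to use this output for further analysis, can it be improved in any way?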

    MARKET  PRODUCT TIMEPERIOD  DATE        VALUES
0   USA     A       QTR         2018-06-01  300
1   USA     A       QTR         2019-06-01  300
2   USA     A       QTR         2020-03-01  100
3   USA     A       QTR         2020-06-01  100
4   USA     A       MAT         2018-06-01  2000
5   USA     A       MAT         2019-06-01  2000
6   USA     A       MAT         2020-06-01  1000
7   USA     B       QTR         2018-06-01  100
8   USA     B       QTR         2019-06-01  100
9   USA     B       QTR         2020-03-01  300
10  USA     B       QTR         2020-06-01  200
11  USA     B       MAT         2018-06-01  3000
12  USA     B       MAT         2019-06-01  3000
13  USA     B       MAT         2020-06-01  5000
14  UK      C       QTR         2018-06-01  500
15  UK      C       QTR         2019-06-01  500
16  UK      C       QTR         2020-03-01  200
17  UK      C       QTR         2020-06-01  200
18  UK      C       MAT         2018-06-01  300
19  UK      C       MAT         2019-06-01  300
20  UK      C       MAT         2020-06-01  5000
21  UK      D       QTR         2018-06-01  50
22  UK      D       QTR         2019-06-01  50
23  UK      D       QTR         2020-03-01  200
24  UK      D       QTR         2020-06-01  200
25  UK      D       MAT         2018-06-01  500
26  UK      D       MAT         2019-06-01  500
27  UK      D       MAT         2020-06-01  5000
My code is as follows:

import numpy as np
import pandas as pd
from itertools import combinations
from dateutil.relativedelta import relativedelta  # needed for the date comparisons below

def get_annual_growth(grp):
    # Get all possible combination of the years from dataset
    year_comb_lists = np.sort([sorted(comb) for comb in combinations(grp.Date, 2)])
    new_year_comb_lists = [comb_dates for comb_dates in year_comb_lists if comb_dates[0]==comb_dates[1]-relativedelta(months=12)]
    quarter_comb_lists = [comb_dates for comb_dates in year_comb_lists if comb_dates[0]==comb_dates[1]-relativedelta(months=3)]
    # Get year-combination labels
    year_comb_strings = [comb[1] for comb in new_year_comb_lists]
    quarter_comb_strings = [comb[1] for comb in quarter_comb_lists]

    # Create a sub-dataframe per group; pandas `groupby` concatenates them afterwards
    subdf = pd.DataFrame(columns=['Annual_Reference', 'Annual_Growth', "Quarterly_Reference",'Quarterly_Growth'])
    for i,years in enumerate(new_year_comb_lists): # for each year combination ...
        actual_value, last_value = grp[grp['Date']==years[1]].Values.mean(), grp[grp['Date']==years[0]].Values.mean()
        growth = (actual_value - last_value) / last_value # calculate the annual growth
        subdf.loc[i, ['Annual_Reference', 'Annual_Growth']] = [year_comb_strings[i], growth] 
    for i, quarters in enumerate(quarter_comb_lists):
        actual_value, last_value = grp[grp['Date']==quarters[1]].Values.mean(), grp[grp['Date']==quarters[0]].Values.mean()
        growth = (actual_value - last_value) / last_value
        subdf.loc[i, ["Quarterly_Reference",'Quarterly_Growth']] = [quarter_comb_strings[i], growth] 
    return subdf

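# Keep the result in its own frame so the source data in df_2 is not overwritten
gr_products = df_2.groupby(['TIMEPERIOD','MARKET', 'PRODUCT']).apply(get_annual_growth)
gr_products = gr_products.reset_index()
gr_products['Annual_Reference'] = pd.to_datetime(gr_products['Annual_Reference'])
gr_products['Quarterly_Reference'] = pd.to_datetime(gr_products['Quarterly_Reference'])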
For anyone who wants to reproduce this, the data can be built as follows:

df_list = [['USA', 'A', 'QTR', '2020-06-01', 100], ['USA', 'A', 'MAT', '2020-06-01', 1000],
           ['USA', 'B', 'QTR', '2020-06-01', 200],  ['USA', 'B', 'MAT', '2020-06-01', 5000], 
           ['USA', 'A', 'QTR', '2020-03-01', 500], ['USA', 'B', 'QTR', '2020-03-01', 300],        
           ['USA', 'A', 'QTR', '2019-06-01', 300],  ['USA', 'A', 'MAT', '2019-06-01', 2000],
           ['USA', 'B', 'QTR', '2019-06-01', 100],  ['USA', 'B', 'MAT', '2019-06-01', 3000],
           ['USA', 'A', 'QTR', '2018-06-01', 300],  ['USA', 'A', 'MAT', '2018-06-01', 2000],
           ['USA', 'B', 'QTR', '2018-06-01', 100],  ['USA', 'B', 'MAT', '2018-06-01', 3000],
           ['UK', 'C', 'QTR', '2020-06-01', 200],  ['UK', 'C', 'MAT', '2020-06-01', 5000], 
           ['UK', 'C', 'QTR', '2020-03-01', 100],  ['UK', 'D', 'QTR', '2020-03-01', 50], 
           ['UK', 'D', 'QTR', '2020-06-01', 200],    ['UK', 'D', 'MAT', '2020-06-01', 5000],
           ['UK', 'C', 'QTR', '2019-06-01', 500],  ['UK', 'C', 'MAT', '2019-06-01', 300], 
           ['UK', 'D', 'QTR', '2019-06-01', 50],    ['UK', 'D', 'MAT', '2019-06-01', 500],
           ['UK', 'C', 'QTR', '2018-06-01', 500],  ['UK', 'C', 'MAT', '2018-06-01', 300], 
           ['UK', 'D', 'QTR', '2018-06-01', 50],    ['UK', 'D', 'MAT', '2018-06-01', 500]]

column_names = ['MARKET', 'PRODUCT', 'TIMEPERIOD', 'Date', 'Values']
df_2 = pd.DataFrame(df_list, columns = column_names)
df_2['Date']= pd.to_datetime(df_2['Date'])
df_2 = df_2.sort_values(by=['PRODUCT', 'TIMEPERIOD', 'Date']).reset_index(drop=True)

I found a way myself to get the output aligned on the same date level, as shown below. It is not the most elegant, but it works for now.

def get_annual_growth(grp):
    # All possible pairs of dates in this group
    year_comb_lists = np.sort([sorted(comb) for comb in combinations(grp.Date, 2)])
    # Keep only pairs that are exactly 12 months apart (drops, for example, 2018-2020)
    new_year_comb_lists = [comb_dates for comb_dates in year_comb_lists 
                           if comb_dates[0]==comb_dates[1]-relativedelta(months=12)]
    new_year_comb_lists=sorted(new_year_comb_lists,key=lambda x: x[1]) 
    quarter_comb_lists = [comb_dates for comb_dates in year_comb_lists 
                          if comb_dates[0]==comb_dates[1]-relativedelta(months=3)
                          and comb_dates[1].year != 2018]
    quarter_comb_lists=sorted(quarter_comb_lists, key=lambda x: x[1])
    # Get combination labels
    year_comb_strings = [comb[1] for comb in new_year_comb_lists]
    quarter_comb_strings = [comb[1] for comb in quarter_comb_lists]
    
    # Create a sub-dataframe per group; pandas `groupby` concatenates them afterwards
    subdf = pd.DataFrame(columns=['Annual_Reference', 'Annual_Growth', "Quarterly_Reference",'Quarterly_Growth'])

    for i , (years, quarters) in enumerate(zip(new_year_comb_lists, quarter_comb_lists)): # for each year combination ...
        try:
            curr_year_val, prev_year_val = grp[grp['Date']==years[1]].Values.mean(), grp[grp['Date']==years[0]].Values.mean()
            curr_qtr_val, prev_qtr_val = grp[grp['Date']==quarters[1]].Values.mean(), grp[grp['Date']==quarters[0]].Values.mean()
            year_gr = (curr_year_val - prev_year_val) / prev_year_val # calculate the annual growth
            qtr_gr = (curr_qtr_val - prev_qtr_val) / prev_qtr_val
            subdf.loc[i, ['Annual_Reference', 'Annual_Growth',
                          'Quarterly_Reference','Quarterly_Growth']] = [year_comb_strings[i], year_gr, quarter_comb_strings[i], qtr_gr] 
            
        except ZeroDivisionError:
            year_gr = 0
            qtr_gr = 0

    return subdf
You can try the following:

df_2['month'] = df_2['Date'].dt.month
df_2['change'] = df_2.groupby(['MARKET','PRODUCT','TIMEPERIOD','month']).Values.pct_change()
This should work as long as no year is missing from your data.
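To make the behaviour concrete, here is a minimal, self-contained sketch of the same idea on a few rows taken from the question's USA / A data. The explicit sort is an assumption added here: `pct_change` compares each row with the previous row in its group, so the rows need to be in date order for the result to be a year-over-year figure.

```python
import pandas as pd

# A few rows from the question's sample data (USA, product A only).
df = pd.DataFrame(
    [
        ["USA", "A", "QTR", "2018-06-01", 300],
        ["USA", "A", "QTR", "2019-06-01", 300],
        ["USA", "A", "QTR", "2020-03-01", 500],
        ["USA", "A", "QTR", "2020-06-01", 100],
        ["USA", "A", "MAT", "2019-06-01", 2000],
        ["USA", "A", "MAT", "2020-06-01", 1000],
    ],
    columns=["MARKET", "PRODUCT", "TIMEPERIOD", "Date", "Values"],
)
df["Date"] = pd.to_datetime(df["Date"])

# pct_change looks at the previous row within each group, so order by date first.
df = df.sort_values(["MARKET", "PRODUCT", "TIMEPERIOD", "Date"]).reset_index(drop=True)

# Grouping by calendar month lines up the same quarter across consecutive years.
df["month"] = df["Date"].dt.month
df["change"] = (
    df.groupby(["MARKET", "PRODUCT", "TIMEPERIOD", "month"])["Values"].pct_change()
)

print(df)
```

The first observation of each group has no earlier row to compare against, so its `change` is NaN. If a year were missing, the comparison would silently span two years, which is why the caveat above matters.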


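Edit: the above works for year-over-year. For quarter-over-quarter, do not group by month. Again, this should work as long as no quarter is missing from your data.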
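The sample data in the question does skip quarters (it jumps from 2019-06-01 to 2020-03-01), so a plain `pct_change` would compare periods that are not adjacent. One possible alternative, sketched below and not part of the original answer, is to join each row to the row dated exactly N months earlier, which yields NaN whenever that earlier period is absent. The helper name `growth_vs_lag` and the `prev` column are hypothetical names chosen for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["USA", "A", "QTR", "2019-06-01", 300],
        ["USA", "A", "QTR", "2020-03-01", 500],
        ["USA", "A", "QTR", "2020-06-01", 100],
    ],
    columns=["MARKET", "PRODUCT", "TIMEPERIOD", "Date", "Values"],
)
df["Date"] = pd.to_datetime(df["Date"])

keys = ["MARKET", "PRODUCT", "TIMEPERIOD"]


def growth_vs_lag(frame, months, name):
    """Hypothetical helper: growth versus the row dated exactly `months` earlier."""
    prev = frame[keys + ["Date", "Values"]].copy()
    # Shift the earlier rows forward so they line up with the date they are compared to.
    prev["Date"] = prev["Date"] + pd.DateOffset(months=months)
    prev = prev.rename(columns={"Values": "prev"})
    # Left join: rows without a matching earlier period get NaN in `prev`.
    out = frame.merge(prev, on=keys + ["Date"], how="left")
    out[name] = (out["Values"] - out["prev"]) / out["prev"]
    return out.drop(columns="prev")


result = growth_vs_lag(df, 12, "Annual_Growth")
result = growth_vs_lag(result, 3, "Quarterly_Growth")
print(result)
```

Both growth figures end up on the same row and share a single reference date, which addresses the readability concern raised in the question. Since quarter-over-quarter growth is not meaningful for MAT rows, those values could be blanked afterwards, for example with `result.loc[result["TIMEPERIOD"] == "MAT", "Quarterly_Growth"] = float("nan")`.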

Great answer. It needs a little cleanup, but this is by far the easiest to work with!