Summarizing the last 12 months of DataFrame data by category with Python

python pandas date datetime

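I'm trying to create a summary, by category, of the last 12 months of data (excluding the current month). The code below summarizes the previous 3 months, but extending it to 12 months this way seems cumbersome. Is there a more efficient way to slice the data dynamically for the last 12 months? df1 is the full dataset, loaded from a DB connection with a SQL query. I use .drop() to cut away the columns I don't need, leaving only the counts.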

import pandas as pd
import datetime

df1.Start_Date = pd.DatetimeIndex(df1.Start_Date)

# Use a pandas Timestamp so comparisons against the datetime64 column are well-defined.
today = pd.Timestamp.today().normalize()
currentfirst = today.replace(day=1)
thirdMonth = currentfirst - pd.offsets.MonthBegin(3)
secondMonth = currentfirst - pd.offsets.MonthBegin(2)
firstMonth = currentfirst - pd.offsets.MonthBegin(1)

fst_label = firstMonth.strftime('%B')
snd_label = secondMonth.strftime('%B')
thd_label = thirdMonth.strftime('%B')

def monthly_vol(df, label, start_date, end_date):
    """Slice df to [start_date, end_date) and count the rows in each change class."""
    # Filter the frame that was passed in, not the global df1.
    if start_date is not None:
        df = df[df.Start_Date >= start_date]
    if end_date is not None:
        df = df[df.Start_Date < end_date]
    # count() yields one count per remaining column; drop the redundant ones.
    df_count = df.groupby('Change Class').count().drop(['Start_Date', 'Risk Level', 'Change Coordinator', 'Change Coordinator Group'], axis=1)
    return df_count

fst_month = monthly_vol(df1, fst_label, firstMonth, currentfirst)
snd_month = monthly_vol(df1, snd_label, secondMonth, firstMonth)
thd_month = monthly_vol(df1, thd_label, thirdMonth, secondMonth)

def month_merge(df1, df2, df3):
    """Merges monthly dataframes together."""
    new_df = pd.merge(df1, df2, left_index=True, right_index=True).merge(df3, left_index=True, right_index=True)
    new_df.columns = [fst_label, snd_label, thd_label]
    print(new_df)
    return new_df

# Store the result under a new name so the monthly_vol function is not overwritten.
monthly_summary = month_merge(fst_month, snd_month, thd_month)

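Bonus question: it would be even better to get the average of the total volume for each category in the same dataframe. Something like this: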

              May  MayAVG  April  AprilAVG  March  MarchAVG
Change Class
Emergency      36    7.33     36      8.65     32      6.84
Expedited      17    3.46     24      5.77     35      7.48
Normal        182   37.07    146     35.10    134     28.63
Standard      256   52.14    210     50.48    267     57.05

Any help would be greatly appreciated.
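The AVG figures in the sample output appear to be each class's percentage of that month's total (36 of the 491 changes in May is 7.33%). A minimal sketch of that calculation, assuming a counts table shaped like the merged result; the `counts`, `pct` and `summary` names are illustrative, and the numbers are the May and March columns from the sample:

```python
import pandas as pd

# Monthly counts copied from the May and March columns of the sample table.
counts = pd.DataFrame(
    {"May": [36, 17, 182, 256], "March": [32, 35, 134, 267]},
    index=pd.Index(["Emergency", "Expedited", "Normal", "Standard"], name="Change Class"),
)

# Each class's percentage of that month's total volume.
pct = (counts.div(counts.sum(axis=0), axis=1) * 100).round(2).add_suffix("AVG")

# Interleave count and percentage columns: May, MayAVG, March, MarchAVG.
ordered = [name for month in counts.columns for name in (month, month + "AVG")]
summary = pd.concat([counts, pct], axis=1)[ordered]
print(summary)
```

Dividing by the column sums keeps the calculation independent of how many months the table holds, so the same lines would apply to a 12-month table.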

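Why not try a dictionary? A dictionary holds key-value pairs, for example {"3": "March", "4": "April"}. So wherever you are maintaining a pair of values, you can use a dictionary instead. Populate these dictionaries in a loop. See below.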

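# Example of what the dictionary might hold when the script runs in June
# (key: months back from the current month, value: month name).
month_dict = {"3": "March", "2": "April", "1": "May"}

thirdMonth = currentfirst - pd.offsets.MonthBegin(3)
secondMonth = currentfirst - pd.offsets.MonthBegin(2)
firstMonth = currentfirst - pd.offsets.MonthBegin(1)

# Fill this with the month labels in a loop instead of using separate variables.
label_dict = {}

fst_label = firstMonth.strftime('%B')
snd_label = secondMonth.strftime('%B')
thd_label = thirdMonth.strftime('%B')

# Likewise, fill this with the monthly_vol() result for each month.
vol_month = {}

fst_month = monthly_vol(df1, fst_label, firstMonth, currentfirst)
snd_month = monthly_vol(df1, snd_label, secondMonth, firstMonth)
thd_month = monthly_vol(df1, thd_label, thirdMonth, secondMonth)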
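The answer stops short of showing the loop itself. One way it could look is sketched below. This is an assumption about what the answerer had in mind rather than their code: `monthly_counts` is a hypothetical helper, the sample rows are synthetic, and `currentfirst` is assumed to be the first day of the current month.

```python
import pandas as pd

def monthly_counts(df, currentfirst, n_months=12):
    """Count rows per 'Change Class' for each of the n_months before currentfirst."""
    labels, counts = [], []
    end = pd.Timestamp(currentfirst)  # exclusive upper bound of the newest month
    for n in range(1, n_months + 1):
        start = pd.Timestamp(currentfirst) - pd.offsets.MonthBegin(n)
        in_month = df[(df["Start_Date"] >= start) & (df["Start_Date"] < end)]
        labels.append(start.strftime("%B %Y"))
        counts.append(in_month.groupby("Change Class").size())
        end = start  # the next (older) month ends where this one starts
    # Outer-align the months on class; a class absent in a month counts as 0.
    return pd.concat(counts, axis=1, keys=labels).fillna(0).astype(int)

# Tiny synthetic stand-in for df1 (hypothetical rows, not the asker's data).
df1 = pd.DataFrame({
    "Start_Date": pd.to_datetime(
        ["2019-05-03", "2019-05-20", "2019-04-11", "2019-03-07", "2019-06-02"]
    ),
    "Change Class": ["Normal", "Standard", "Normal", "Emergency", "Normal"],
})
summary = monthly_counts(df1, currentfirst="2019-06-01", n_months=3)
print(summary)
```

Using `pd.concat` with `keys` instead of chained inner merges also means a class with no rows in one month is kept rather than dropped from the result.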

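What is your question? Also, please provide a minimal example rather than the whole program. You can supply the input data, the expected output, and the minimal code that produces your current output.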

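datetime has some attributes that make it easy to work with months. For example, df.date_col.dt.month gives you the month. You can group by that and easily compute the mean, sum, size, and so on.

My question is: "Apart from creating a firstMonth = currentfirst - pd.offsets.MonthBegin(1) variable for each month and passing it to the function, what is the best way to compile output like the above?" The firstMonth variable changes every month when I run the script, i.e. the first month is May right now, but next month it will be June.

That doesn't change the logic. Each time you run the script, you populate the month dictionary. The months are not hard-coded; I only gave an example to show what it looks like.

How does the dictionary interact with pd.offsets.MonthBegin()? Those offsets produce datetimes 1, 2, or 3 months earlier than the currentfirst variable.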
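The grouping idea from the comments can be sketched without any per-month variables. Since .dt.month alone would merge the same month from different years, this sketch assumes a monthly period (.dt.to_period("M")) as the grouping key instead. `last_n_months_by_class` is a hypothetical helper, and the sample rows are synthetic.

```python
import pandas as pd

def last_n_months_by_class(df, today, n_months=12):
    """Counts per 'Change Class' (rows) and month (columns), excluding the current month."""
    current = pd.Timestamp(today).to_period("M")
    first, last = current - n_months, current - 1
    # The n_months complete months before the current one, newest first.
    months = pd.period_range(start=first, end=last, freq="M")[::-1]

    tagged = df.assign(Month=df["Start_Date"].dt.to_period("M"))
    recent = tagged[(tagged["Month"] >= first) & (tagged["Month"] <= last)]
    table = recent.groupby(["Change Class", "Month"]).size().unstack("Month", fill_value=0)

    # Reindex so months without any rows still get a column of zeros.
    table = table.reindex(columns=months, fill_value=0)
    table.columns = [m.strftime("%B %Y") for m in months]
    return table

# Synthetic rows: one in the current month and one 13 months back, both excluded.
df1 = pd.DataFrame({
    "Start_Date": pd.to_datetime(
        ["2019-05-03", "2019-05-20", "2019-04-11", "2019-03-07", "2019-06-02", "2018-05-15"]
    ),
    "Change Class": ["Normal", "Standard", "Normal", "Emergency", "Normal", "Normal"],
})
table = last_n_months_by_class(df1, today="2019-06-14", n_months=12)
print(table)
```

In a scheduled script, `today` would presumably be `pd.Timestamp.today()`; it is a parameter here only so the example gives the same result whenever it is run.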