Python: cumulative sum over a period of time
My DataFrame has the following structure:
import datetime as dt
import numpy as np
import pandas as pd

date_today = dt.datetime.now()
size = 20
df = pd.DataFrame({"usd": pd.Series(np.random.randint(1, 100, size)) * 10,
                   "sent": dt.datetime.now(),
                   "temp": np.random.randint(0, 15, size=size)
                   })
# sent: now + 0-14 days
df.sent += pd.to_timedelta(df.temp, unit="D")
# reminder: sent + 10-24 days
df.temp = np.random.randint(10, 25, size=size)
df["reminder"] = df.sent + pd.to_timedelta(df.temp, unit="D")
# completed: reminder + 1-64 days
df.temp = np.random.randint(1, 65, size=size)
df["completed"] = df.reminder + pd.to_timedelta(df.temp, unit="D")
# Blank out some reminder/completed dates; NaT (not "") keeps the columns datetime64
df.loc[df['temp'] % 3 == 0, 'reminder'] = pd.NaT
df.loc[df['temp'] % 2 == 0, 'completed'] = pd.NaT
df = df[["usd", "sent", "reminder", "completed"]]
df_result = pd.DataFrame(columns=["date", "sent_amount", "reminder_amount", "completed_amount"])
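usd is the amount of money I requested (a number). The other columns are datetimes: when I sent the request, when I sent a reminder, and when I received the money. The last two columns can be empty.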
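A minimal sketch, assuming a small hypothetical three-row frame rather than the random data above: an empty reminder or completed date can be stored as pd.NaT (what pd.to_datetime produces for None). That keeps every date column a datetime64 dtype, and after melt a dropna removes the events that have not happened yet.

```python
import pandas as pd

# Hypothetical sample: only the first request got a reminder and was paid.
df = pd.DataFrame({
    "usd": [100, 200, 300],
    "sent": pd.to_datetime(["2019-03-01", "2019-03-05", "2019-03-10"]),
    "reminder": pd.to_datetime(["2019-03-15", None, None]),
    "completed": pd.to_datetime(["2019-04-02", None, None]),
})

# None is parsed as NaT, so all three date columns stay datetime64.
print(df.dtypes)

# Long format: one row per (request, event type); rows whose date is NaT
# are events that have not happened yet and are dropped.
long = df.melt("usd", value_name="date").dropna(subset=["date"])
print(long)
```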
I also built the following list of dates that splits each month into quarters:
date_index = []
previous_date = None
for m in range(0, 14):
    month = (m % 12) + 1
    year = m // 12
    current_date = dt.date(2019 + year, month, 1)
    if previous_date:
        # Split [previous_date, current_date) into quarters;
        # date + timedelta ignores the fractional part of a day
        timedelta = current_date - previous_date
        date_index.append(previous_date + 1 * timedelta / 4)
        date_index.append(previous_date + 2 * timedelta / 4)
        date_index.append(previous_date + 3 * timedelta / 4)
        date_index.append(current_date)
    previous_date = current_date
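Python's date + timedelta ignores the seconds and microseconds of the timedelta, so a quarter point that lands part-way through a day is truncated to a whole day. A minimal sketch with a hypothetical helper, quarter_points, that performs one iteration of the loop above for March 2019 (31 days, so the offsets are 7.75, 15.5 and 23.25 days):

```python
import datetime as dt


def quarter_points(start, end):
    """Return the 1/4, 2/4 and 3/4 points of [start, end), followed by end.

    date + timedelta drops the timedelta's seconds, so 7.75 days -> 7 days.
    """
    span = end - start
    return [start + k * span / 4 for k in (1, 2, 3)] + [end]


print(quarter_points(dt.date(2019, 3, 1), dt.date(2019, 4, 1)))
# [datetime.date(2019, 3, 8), datetime.date(2019, 3, 16), datetime.date(2019, 3, 24), datetime.date(2019, 4, 1)]
```

These are the same whole-day dates (2019-03-16, 2019-03-24, 2019-04-08, ...) that appear in the answer's output further down.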
I would like to get a DataFrame with the following structure:
df_result = pd.DataFrame(columns=["date","sent_amount","reminder_amount","completed_amount"])
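Here the df_result.date column is the date_index sequence (each date counting from the previous point), and sent_amount is the running usd total for the df.sent column (and likewise reminder_amount and completed_amount for the other two columns).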
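One way to pin down the target numbers is a brute-force reference: for each boundary date d, add up the usd of every row whose event date is on or before d. This is a sketch under assumptions (the helper cumulative_amounts and the sample data are hypothetical). Note that pd.cut, as used in the answer below, builds right-closed bins (previous, current], so events on or before the first boundary fall outside every bin; the two approaches agree on the dates the answer lists whenever every event is later than the first boundary.

```python
import pandas as pd


def cumulative_amounts(df, boundaries, columns=("sent", "reminder", "completed")):
    """For each boundary date d, total the usd of rows whose event date is <= d.

    Brute force (one pass over df per boundary); meant as a reference check.
    """
    boundaries = pd.to_datetime(boundaries)
    out = pd.DataFrame(index=pd.Index(boundaries, name="date"))
    for col in columns:
        # NaT never satisfies <=, so events that have not happened are skipped.
        out[f"{col}_amount"] = [df.loc[df[col] <= d, "usd"].sum() for d in boundaries]
    return out


# Hypothetical sample data and boundaries.
sample = pd.DataFrame({
    "usd": [100, 200, 300],
    "sent": pd.to_datetime(["2019-03-02", "2019-03-10", "2019-03-20"]),
    "reminder": pd.to_datetime(["2019-03-18", None, "2019-03-30"]),
    "completed": pd.to_datetime(["2019-04-05", None, None]),
})
res = cumulative_amounts(sample, ["2019-03-08", "2019-03-16", "2019-03-24",
                                  "2019-04-01", "2019-04-08"])
print(res)
```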
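You can melt the DataFrame, cut the dates into the date ranges defined by date_index, group by the combination of variable (completed/reminder/sent) and date, sum up the usd amounts, unstack the result back into columns, and take the cumsum to get the cumulative amounts: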
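x = df.melt('usd', value_name='date')
# Map each date to the right edge of its (previous, current] date_index bin
x['date'] = pd.cut(x['date'], pd.to_datetime(date_index)).apply(lambda x: x.right)
x['variable'] += '_amount'
# Sum usd per (event, date), unstack events into columns (missing -> 0), then running total
df_result = (x.dropna().groupby(['variable', 'date'], observed=True)['usd'].sum()
             .unstack(level=0, fill_value=0).sort_index().cumsum())
print(df_result)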
Output:
variable completed_amount reminder_amount sent_amount
date
2019-03-16 0 0 3180
2019-03-24 0 0 8840
2019-04-01 0 1700 10350
2019-04-08 0 3230 10350
2019-04-16 0 6200 10350
2019-04-23 320 6860 10350
2019-05-01 1170 6860 10350
2019-05-16 2300 6860 10350
2019-06-01 5130 6860 10350
2019-06-08 5710 6860 10350
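The answer's chain can be traced step by step on a small fixed sample. The question's generator starts from dt.datetime.now(), so its dates only fall inside date_index (after 2019-01-08 and up to 2020-02-01) when run in that period; the hypothetical data and boundaries below avoid that. One deliberate variation, an assumption rather than the answer's exact code: the right edge of each bin is looked up from pd.cut's category codes (.cat.codes, which is -1 for NaT and for dates outside the bins) instead of .apply(lambda x: x.right), so the grouping key is an ordinary datetime column rather than a categorical one.

```python
import pandas as pd

# Hypothetical sample: three requests, NaT = event has not happened yet.
df = pd.DataFrame({
    "usd": [100, 200, 300],
    "sent": pd.to_datetime(["2019-03-02", "2019-03-10", "2019-03-20"]),
    "reminder": pd.to_datetime(["2019-03-18", None, "2019-03-30"]),
    "completed": pd.to_datetime(["2019-04-05", None, None]),
})
bins = pd.to_datetime(["2019-03-01", "2019-03-08", "2019-03-16",
                       "2019-03-24", "2019-04-01", "2019-04-08"])

# 1. melt: one row per (request, event type) with its usd and date.
x = df.melt("usd", value_name="date")
x["variable"] += "_amount"

# 2. cut: find the right-closed bin (b[i-1], b[i]] of each date and drop
#    rows with code -1 (NaT or outside the bins).
codes = pd.cut(x["date"], bins).cat.codes
x = x[codes >= 0].copy()
# Label each remaining row with the right edge b[i] of its bin.
x["date"] = bins[1:][codes[codes >= 0].to_numpy()]

# 3. sum per (event type, bin), 4. unstack event types into columns,
# 5. cumsum down the dates to get running totals.
df_result = (x.groupby(["variable", "date"])["usd"].sum()
             .unstack(level=0, fill_value=0)
             .sort_index()
             .cumsum())
print(df_result)
```

As in the answer's output, only dates that received at least one event get a row; adding .reindex(bins[1:], fill_value=0) before .cumsum() would give every boundary its own row.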