Python: splitting yearly data into monthly data with a custom function
I am trying to split yearly subscriptions into monthly subscriptions based on the fee. Sample dataset:
import numpy as np
import pandas as pd
df = pd.DataFrame({
'Customer_ID': [1, 2, 3, 4, 5],
'Plan' : ['Yearly', 'Monthly', 'Monthly', 'Yearly', 'Yearly'],
'Join_Date': ['1/10/2020', '1/15/2020', '2/21/2020', '2/21/2020', '3/09/2020'],
'Fee' : [120, 12, 18, 86, 144]
})
df['Join_Date'] = pd.to_datetime(df['Join_Date'])
df
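Here, customer 1 has a yearly subscription with a fee of $120 that runs from January 2020 to January 2021. I want my dataframe to break that fee down into $10 ($120 / 12 months) for each month between 2020-01 and 2020-12, showing the monthly fee ($10) for every month of that year.
I have tried many resampling approaches, but none of them worked. One example: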
def atom(row):
    if df.Plan=='Yearly':
        return (df.Fee/12)
df.groupby(pd.Grouper(key='Join_Date', freq='1M')).apply(atom)
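One reason the attempt above fails, regardless of the grouping: comparing a whole column with `==` gives a boolean Series, and `if` cannot reduce that to a single True or False, so pandas raises a `ValueError`. The function also ignores its `row` argument and reads the global `df`. A minimal sketch of the problem and of the usual mask-based alternative follows; the names `plan`, `fee`, and `monthly` are made up for illustration.

```python
import pandas as pd

plan = pd.Series(["Yearly", "Monthly"])
fee = pd.Series([120, 12])

try:
    # The comparison yields a boolean Series; `if` needs one True/False value.
    if plan == "Yearly":
        pass
    raised = False
except ValueError as exc:
    raised = True
    print(type(exc).__name__, "-", exc)

# Element-wise alternative: keep the fee for non-yearly plans,
# and use fee / 12 wherever the plan is "Yearly".
monthly = fee.where(plan != "Yearly", fee / 12)
print(monthly.tolist())
```

This only computes a per-row monthly fee; it does not yet create one row per month.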
Expected output for the first customer:
Is there another way to do this?
Are you looking for something like this?
import pandas as pd
df = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 4, 5],
    'Plan' : ['Yearly', 'Monthly', 'Monthly', 'Yearly', 'Yearly'],
    'Join_Date': ['1/10/2020', '1/15/2020', '2/21/2020', '2/21/2020', '3/09/2020'],
    'Fee' : [120, 12, 18, 86, 144]
})
df['Join_Date'] = pd.to_datetime(df['Join_Date'])
# start from the full fee, as float so that fractional values fit the column
df['Monthly_Fee'] = df['Fee'].astype(float)
# for yearly plans, overwrite it with one twelfth of the fee
df.loc[df['Plan'] == 'Yearly', 'Monthly_Fee'] = (df.Fee/12).round(2)
print(df)
The result will be:
   Customer_ID     Plan  Join_Date  Fee  Monthly_Fee
0            1   Yearly 2020-01-10  120        10.00
1            2  Monthly 2020-01-15   12        12.00
2            3  Monthly 2020-02-21   18        18.00
3            4   Yearly 2020-02-21   86         7.17
4            5   Yearly 2020-03-09  144        12.00
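First expand the yearly records with np.repeat(). Then, selectively on the rows where df1["Plan"] == "Yearly", do the following:
- The monthly fee can be computed directly.
- The month increment can be obtained with groupby cumcount and mapped to pd.DateOffset. This approach triggers a PerformanceWarning, which can be suppressed (omitted in the code).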
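The suppression of the PerformanceWarning is left out of the answer's code. A minimal sketch of one way to do it, assuming the warning comes from adding an object-dtype Series of offsets to a datetime Series; the toy `dates` and `offsets` values below are made up for illustration.

```python
import warnings

import pandas as pd
from pandas.errors import PerformanceWarning

# Three copies of one join date, standing in for a repeated yearly record.
dates = pd.Series(pd.to_datetime(["2020-01-10", "2020-01-10", "2020-01-10"]))
# Hypothetical month increments 0, 1, 2 expressed as DateOffset objects.
offsets = pd.Series([pd.DateOffset(months=i) for i in range(3)])

with warnings.catch_warnings():
    # Silence only pandas' PerformanceWarning, and only inside this block.
    warnings.simplefilter("ignore", category=PerformanceWarning)
    shifted = pd.to_datetime(dates + offsets)

print(shifted.dt.strftime("%Y-%m-%d").tolist())
```

Scoping the filter with `warnings.catch_warnings()` keeps other warnings visible in the rest of the program.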
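Should your final dataframe have one row per month? Can you add your desired result to the question?
Can't you simply do
df.loc[df['Plan'] == 'Yearly', 'new_col'] = df.Fee/12
@JoeFerndz No, that does not output a monthly value for each month between date X and date Y. I am trying to show $10 for every month between January 2020 and January 2021 in the same dataframe.
Can you post the desired output so we know what you are looking for?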
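# cast Fee to float first so the fractional monthly fee fits the column dtype
df1 = df.astype({"Fee": "float64"})
# expand the Yearly records (12 rows per yearly plan, 1 per monthly plan)
df1 = df1.loc[np.repeat(df1.index, df1["Plan"].map({"Yearly": 12, "Monthly": 1}))]
# compute monthly fee and join date
df1.loc[df1["Plan"] == "Yearly", "Fee"] /= 12
# cumcount gives 0..11 within each customer; map it to a month offset
df1.loc[df1["Plan"] == "Yearly", "Join_Date"] += \
    df1.groupby(["Customer_ID", "Plan"]).cumcount()\
       .loc[df1["Plan"] == "Yearly"]\
       .map(lambda i: pd.DateOffset(months=i))
print(df1)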
   Customer_ID     Plan  Join_Date        Fee
0            1   Yearly 2020-01-10  10.000000
0            1   Yearly 2020-02-10  10.000000
0            1   Yearly 2020-03-10  10.000000
0            1   Yearly 2020-04-10  10.000000
0            1   Yearly 2020-05-10  10.000000
0            1   Yearly 2020-06-10  10.000000
0            1   Yearly 2020-07-10  10.000000
0            1   Yearly 2020-08-10  10.000000
0            1   Yearly 2020-09-10  10.000000
0            1   Yearly 2020-10-10  10.000000
0            1   Yearly 2020-11-10  10.000000
0            1   Yearly 2020-12-10  10.000000
1            2  Monthly 2020-01-15  12.000000
2            3  Monthly 2020-02-21  18.000000
3            4   Yearly 2020-02-21   7.166667
3            4   Yearly 2020-03-21   7.166667
3            4   Yearly 2020-04-21   7.166667
3            4   Yearly 2020-05-21   7.166667
3            4   Yearly 2020-06-21   7.166667
3            4   Yearly 2020-07-21   7.166667
3            4   Yearly 2020-08-21   7.166667
3            4   Yearly 2020-09-21   7.166667
3            4   Yearly 2020-10-21   7.166667
3            4   Yearly 2020-11-21   7.166667
3            4   Yearly 2020-12-21   7.166667
3            4   Yearly 2021-01-21   7.166667
4            5   Yearly 2020-03-09  12.000000
4            5   Yearly 2020-04-09  12.000000
4            5   Yearly 2020-05-09  12.000000
4            5   Yearly 2020-06-09  12.000000
4            5   Yearly 2020-07-09  12.000000
4            5   Yearly 2020-08-09  12.000000
4            5   Yearly 2020-09-09  12.000000
4            5   Yearly 2020-10-09  12.000000
4            5   Yearly 2020-11-09  12.000000
4            5   Yearly 2020-12-09  12.000000
4            5   Yearly 2021-01-09  12.000000
4            5   Yearly 2021-02-09  12.000000
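If only the calendar month matters and not the exact day, a variant that avoids DateOffset objects (and with them the warning) might look like the sketch below. It is not taken from the answers above: the column names `Monthly_Fee` and `Month` are made up for illustration, and it assumes a yearly plan covers exactly 12 months starting with the join month.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer_ID": [1, 2, 3, 4, 5],
    "Plan": ["Yearly", "Monthly", "Monthly", "Yearly", "Yearly"],
    "Join_Date": ["1/10/2020", "1/15/2020", "2/21/2020", "2/21/2020", "3/09/2020"],
    "Fee": [120, 12, 18, 86, 144],
})
df["Join_Date"] = pd.to_datetime(df["Join_Date"])

# Assumption: a yearly plan covers 12 months, a monthly plan covers 1.
n_months = df["Plan"].map({"Yearly": 12, "Monthly": 1})
df["Monthly_Fee"] = df["Fee"] / n_months

# Repeat each row once per covered month and give the result a fresh index.
out = df.loc[df.index.repeat(n_months.to_numpy())].reset_index(drop=True)

# 0, 1, 2, ... within each customer: the number of months after joining.
step = out.groupby("Customer_ID").cumcount()

# Month-level periods support vectorized integer addition.
out["Month"] = out["Join_Date"].dt.to_period("M") + step.to_numpy()

print(out.loc[out["Customer_ID"] == 1, ["Customer_ID", "Month", "Monthly_Fee"]])
```

The trade-off is that `Month` holds monthly periods, so the day of the month from `Join_Date` is not carried into the new column.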