Python pandas: given a start and end date, add a column for each day in between, then fill in the values?


Here is my data:

import pandas as pd

df = pd.DataFrame([
    {'start_date': '2019/12/01', 'end_date': '2019/12/05', 'spend': 10000, 'campaign_id': 1},
    {'start_date': '2019/12/05', 'end_date': '2019/12/09', 'spend': 50000, 'campaign_id': 2},
    {'start_date': '2019/12/01', 'end_date': '', 'spend': 10000, 'campaign_id': 3},  # no end date yet
    {'start_date': '2019/12/01', 'end_date': '2019/12/01', 'spend': 50, 'campaign_id': 4},
])
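For every row, I need to add one column per day since 2019/12/01 and fill it with that campaign's spend for the day, which I get by dividing the campaign's spend by the total number of days it ran.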

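So here I would add a column for each day between December 1 and today, December 10. For row 1, the five columns for December 1 through 5 would contain 2000, and the five columns for December 6 through 10 would contain zero.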


I know pandas is well suited to this kind of problem, but I don't know where to start.

This doesn't look like a straightforward task to me. First, convert the date columns if you haven't already:

df["start_date"] = pd.to_datetime(df["start_date"])
df["end_date"] = pd.to_datetime(df["end_date"])
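As a small aside on the conversion above: the empty `end_date` string in the third row should come out as `NaT` (pandas' missing timestamp), which is what the later steps test with `notnull()`. A minimal sketch, using a stand-alone series named `end_dates` that is only an illustration and not part of the original code:

```python
import pandas as pd

# Hypothetical stand-alone copy of the question's end_date column.
end_dates = pd.Series(["2019/12/05", "2019/12/09", "", "2019/12/01"])

# The empty string is parsed as a missing timestamp (NaT).
parsed = pd.to_datetime(end_dates)

print(parsed.isna().tolist())
```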
Then create a helper function for the resampling:

def resampler(data, daterange):
    temp = (data.set_index('start_date').groupby('campaign_id')
                 .apply(daterange)
                 .drop("campaign_id",axis=1)
                 .reset_index().rename(columns={"level_1":"start_date"}))
    return temp
From here it is a three-step process. First, resample the data up to each group's end date:

df1 = resampler(df, lambda d: d.reindex(pd.date_range(min(d.index),max(d["end_date"]),freq="D")) if d["end_date"].notnull().all() else d)

df1["spend"] = df1.groupby("campaign_id")["spend"].transform(lambda x: x.mean()/len(x))
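The `x.mean()/len(x)` expression above can look odd at first. A minimal sketch of why it seems to give the per-day spend, assuming that after the reindex only the start-date row of a campaign still carries a spend value and the other rows are `NaN` (the series `spend` below is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical spend column for one campaign after reindexing to five daily rows:
# only the original start-date row has a value.
spend = pd.Series([10000, np.nan, np.nan, np.nan, np.nan])

# mean() skips NaN, so it returns the original spend (10000.0);
# len() counts all five rows, i.e. the number of campaign days.
daily = spend.mean() / len(spend)

print(daily)  # 2000.0
```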
After computing the per-day average, resample again up to the current date:

dates = pd.date_range(min(df["start_date"]),pd.Timestamp.today(),freq="D")

df1 = resampler(df1,lambda d: d.reindex(dates))
Finally, reshape the DataFrame so that each date becomes a column:

df1 = pd.concat([df1.drop("end_date",axis=1).set_index(["campaign_id","start_date"]).unstack(),
                 df1.groupby("campaign_id")["end_date"].min()], axis=1)
df1.columns = [*dates,"end_date"]

print(df1)

#
             2019-12-01 00:00:00  2019-12-02 00:00:00  2019-12-03 00:00:00  2019-12-04 00:00:00  2019-12-05 00:00:00  2019-12-06 00:00:00  2019-12-07 00:00:00  2019-12-08 00:00:00  2019-12-09 00:00:00  2019-12-10 00:00:00   end_date
campaign_id                                                                                                                                                                                                                             
1                         2000.0               2000.0               2000.0               2000.0               2000.0                  NaN                  NaN                  NaN                  NaN                  NaN 2019-12-05
2                            NaN                  NaN                  NaN                  NaN              10000.0              10000.0              10000.0              10000.0              10000.0                  NaN 2019-12-09
3                        10000.0                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN        NaT
4                           50.0                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN 2019-12-01

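How do you want to handle the missing end date in the third row? Can you give the expected output?
@AdibP Sorry, I should have specified that: it should be set to today's date.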
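
Given that clarification, here is one possible alternative sketch rather than the answer's method: it fills a missing end date with "today", spreads each campaign's spend evenly over its inclusive date range, and writes 0 for inactive days, as the question asks. The names `today`, `long`, `wide` and `all_days` are illustrative, and `today` is pinned to 2019-12-10 (the date used in the question) only so the result is reproducible.

```python
import pandas as pd

df = pd.DataFrame([
    {"start_date": "2019/12/01", "end_date": "2019/12/05", "spend": 10000, "campaign_id": 1},
    {"start_date": "2019/12/05", "end_date": "2019/12/09", "spend": 50000, "campaign_id": 2},
    {"start_date": "2019/12/01", "end_date": "", "spend": 10000, "campaign_id": 3},
    {"start_date": "2019/12/01", "end_date": "2019/12/01", "spend": 50, "campaign_id": 4},
])

# Assumption: "today" is fixed for reproducibility; for live data one could use
# pd.Timestamp.today().normalize() instead.
today = pd.Timestamp("2019-12-10")

df["start_date"] = pd.to_datetime(df["start_date"])
# A missing end date (NaT) is treated as "still running", i.e. ends today.
df["end_date"] = pd.to_datetime(df["end_date"]).fillna(today)

# Build one row per campaign per active day, with the spend split evenly.
rows = []
for row in df.itertuples(index=False):
    days = pd.date_range(row.start_date, row.end_date, freq="D")
    rows.append(pd.DataFrame({
        "campaign_id": row.campaign_id,
        "day": days,
        "daily_spend": row.spend / len(days),
    }))
long = pd.concat(rows, ignore_index=True)

# Wide table: one column per day from the earliest start to "today", 0 when inactive.
all_days = pd.date_range(df["start_date"].min(), today, freq="D")
wide = (long.pivot(index="campaign_id", columns="day", values="daily_spend")
            .reindex(columns=all_days)
            .fillna(0))

print(wide)
```

With this approach campaign 3 is spread over all ten days instead of showing its full spend on the first day only.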