Python: fill all datetime columns until a specific date
Tags: python, pandas, datetime
I have a DataFrame that represents the daily demand for different products in different stores:
SKU Store F LeadTime Date Qty Value Price Level
0 504777 1 135828 11 2018-01-22 1 3.99 3.99 45
1 504777 1 135828 11 2018-01-23 0 0.00 0.00 45
2 504777 1 135828 11 2018-01-24 3 11.97 3.99 42
3 504777 1 135828 11 2018-01-25 1 3.99 3.99 41
4 504777 1 135828 11 2018-01-26 0 0.00 0.00 41
300 704777 2 135828 11 2018-01-22 1 4.99 3.99 45
301 704777 2 135828 11 2018-01-23 0 0.00 0.00 47
302 704777 2 135828 11 2018-01-24 4 12.97 3.99 48
303 704777 2 135828 11 2018-01-25 1 3.99 3.99 49
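In this example, I am trying to complete the dataset up to 2018-01-31 under the following conditions:
- The columns SKU, Store, F, LeadTime, Date and Level should be filled with the last value.
- The columns Qty, Value and Price should be filled with 0.
The expected result: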
SKU Store F LeadTime Date Qty Value Price Level
0 504777 1 135828 11 2018-01-22 1 3.99 3.99 45
1 504777 1 135828 11 2018-01-23 0 0.00 0.00 45
2 504777 1 135828 11 2018-01-24 3 11.97 3.99 42
3 504777 1 135828 11 2018-01-25 1 3.99 3.99 41
4 504777 1 135828 11 2018-01-26 0 0.00 0.00 41
5 504777 1 135828 11 2018-01-27 0 0.00 0.00 41
6 504777 1 135828 11 2018-01-28 0 0.00 0.00 41
7 504777 1 135828 11 2018-01-29 0 0.00 0.00 41
8 504777 1 135828 11 2018-01-30 0 0.00 0.00 41
9 504777 1 135828 11 2018-01-31 0 0.00 0.00 41
300 704777 2 135828 11 2018-01-22 1 4.99 3.99 45
301 704777 2 135828 11 2018-01-23 0 0.00 0.00 47
302 704777 2 135828 11 2018-01-24 4 12.97 3.99 48
303 704777 2 135828 11 2018-01-25 1 3.99 3.99 49
304 704777 2 135828 11 2018-01-26 0 0.00 0.00 49
305 704777 2 135828 11 2018-01-27 0 0.00 0.00 49
306 704777 2 135828 11 2018-01-28 0 0.00 0.00 49
307 704777 2 135828 11 2018-01-29 0 0.00 0.00 49
308 704777 2 135828 11 2018-01-30 0 0.00 0.00 49
309 704777 2 135828 11 2018-01-31 0 0.00 0.00 49
I tried this:
df = df.set_index('Date').groupby(['SKU', 'Store']).date_range(end = '2018-01-31', freq='D').agg({
    'F':'last',
    'LeadTime':'last',
    'Price':0,
    'Value':0,
    'Qty':0,
    'Level':'last'}).reset_index()
But this is not the right approach. It fails with:
'DataFrameGroupBy' object has no attribute 'date_range'
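PS: each product has a different start date.
First, group by SKU and Store.
At the same time, you can take start as the maximum date in df and end as 2018-01-31 to create a date range for each group.
Note that I use a list comprehension here for a win in terms of speed.
Then select the columns that should be filled with 0.
Finally, concatenate all the groupby DataFrames and carry the last value forward in the remaining columns.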
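The code for the steps above did not survive on this page, so the following is only a minimal sketch of how they might look, not the answerer's original code. It assumes the sample rows from the question, starts each range at the group's earliest date so that `reindex` keeps the existing rows (the text above mentions the maximum date, so this is a deliberate simplification), and uses the illustrative names `END`, `zero_cols`, `parts` and `result`.

```python
import pandas as pd

# Sample rows copied from the question's table.
df = pd.DataFrame({
    "SKU":      [504777] * 5 + [704777] * 4,
    "Store":    [1] * 5 + [2] * 4,
    "F":        [135828] * 9,
    "LeadTime": [11] * 9,
    "Date":     pd.to_datetime(
        ["2018-01-22", "2018-01-23", "2018-01-24", "2018-01-25", "2018-01-26",
         "2018-01-22", "2018-01-23", "2018-01-24", "2018-01-25"]),
    "Qty":      [1, 0, 3, 1, 0, 1, 0, 4, 1],
    "Value":    [3.99, 0.00, 11.97, 3.99, 0.00, 4.99, 0.00, 12.97, 3.99],
    "Price":    [3.99, 0.00, 3.99, 3.99, 0.00, 3.99, 0.00, 3.99, 3.99],
    "Level":    [45, 45, 42, 41, 41, 45, 47, 48, 49],
})

END = pd.Timestamp("2018-01-31")      # assumed fixed cut-off from the question
zero_cols = ["Qty", "Value", "Price"]  # columns that get 0 on the added days

# One reindexed frame per (SKU, Store) group, built in a list comprehension.
parts = [
    g.set_index("Date").reindex(pd.date_range(g["Date"].min(), END, freq="D"))
    for _, g in df.groupby(["SKU", "Store"])
]

filled = []
for part in parts:
    part = part.copy()
    # Added days have NaN everywhere: zero the demand columns first ...
    part[zero_cols] = part[zero_cols].fillna(0)
    # ... then carry the last known value forward in the remaining columns.
    filled.append(part.ffill())

# Stack the groups and turn the date index back into a regular column.
result = pd.concat(filled).rename_axis("Date").reset_index()
```

Filling each group before concatenating keeps values from one (SKU, Store) pair from leaking into the next one.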
I suggest reindexing each group. Then create a list to store each group, and build a DataFrame from that list:
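import pandas as pd
df['Date'] = pd.to_datetime(df['Date'])
dfs = []
for _, d in df.groupby(['SKU', 'Store']):
    # First date of this (SKU, Store) group; assumes rows are sorted by Date
    start_date = d.Date.iloc[0]
    # Last day of that month; MonthEnd(0) keeps a date that is already a month end
    end_date = start_date + pd.offsets.MonthEnd(0)
    # Reindex to one row per day; days without data become all-NaN rows
    d = d.set_index('Date').reindex(pd.date_range(start_date, end_date))
    dfs.append(d)
new_df = pd.concat(dfs)
new_df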
SKU Store F LeadTime Qty Value Price Level
2018-01-22 504777.0 1.0 135828.0 11.0 1.0 3.99 3.99 45.0
2018-01-23 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 45.0
2018-01-24 504777.0 1.0 135828.0 11.0 3.0 11.97 3.99 42.0
2018-01-25 504777.0 1.0 135828.0 11.0 1.0 3.99 3.99 41.0
2018-01-26 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 41.0
2018-01-27 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-28 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-29 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-30 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-31 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-22 704777.0 2.0 135828.0 11.0 1.0 4.99 3.99 45.0
2018-01-23 704777.0 2.0 135828.0 11.0 0.0 0.00 0.00 47.0
2018-01-24 704777.0 2.0 135828.0 11.0 4.0 12.97 3.99 48.0
2018-01-25 704777.0 2.0 135828.0 11.0 1.0 3.99 3.99 49.0
2018-01-26 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-27 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-28 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-29 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-30 NaN NaN NaN NaN NaN NaN NaN NaN
2018-01-31 NaN NaN NaN NaN NaN NaN NaN NaN
Then fill the NaN values: use fillna(0) for Price, Qty and Value, and ffill for the other columns:
new_df = pd.concat(dfs)
new_df[['Price', 'Qty', 'Value']] = new_df[['Price', 'Qty', 'Value']].fillna(0)
new_df.ffill(inplace=True)
new_df
Out[17]:
SKU Store F LeadTime Qty Value Price Level
2018-01-22 504777.0 1.0 135828.0 11.0 1.0 3.99 3.99 45.0
2018-01-23 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 45.0
2018-01-24 504777.0 1.0 135828.0 11.0 3.0 11.97 3.99 42.0
2018-01-25 504777.0 1.0 135828.0 11.0 1.0 3.99 3.99 41.0
2018-01-26 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 41.0
2018-01-27 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 41.0
2018-01-28 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 41.0
2018-01-29 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 41.0
2018-01-30 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 41.0
2018-01-31 504777.0 1.0 135828.0 11.0 0.0 0.00 0.00 41.0
2018-01-22 704777.0 2.0 135828.0 11.0 1.0 4.99 3.99 45.0
2018-01-23 704777.0 2.0 135828.0 11.0 0.0 0.00 0.00 47.0
2018-01-24 704777.0 2.0 135828.0 11.0 4.0 12.97 3.99 48.0
2018-01-25 704777.0 2.0 135828.0 11.0 1.0 3.99 3.99 49.0
2018-01-26 704777.0 2.0 135828.0 11.0 0.0 0.00 0.00 49.0
2018-01-27 704777.0 2.0 135828.0 11.0 0.0 0.00 0.00 49.0
2018-01-28 704777.0 2.0 135828.0 11.0 0.0 0.00 0.00 49.0
2018-01-29 704777.0 2.0 135828.0 11.0 0.0 0.00 0.00 49.0
2018-01-30 704777.0 2.0 135828.0 11.0 0.0 0.00 0.00 49.0
2018-01-31 704777.0 2.0 135828.0 11.0 0.0 0.00 0.00 49.0
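The output above shows SKU, Store, F, LeadTime, Qty and Level as floats, because reindexing introduces NaN and pandas upcasts integer columns to float to hold it. If integers are wanted back once every NaN has been filled, something like the following might do. The two-row `new_df` here is a hypothetical stand-in with the same column layout, and `int_cols` and `tidy` are names chosen for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the filled new_df: every column is float.
new_df = pd.DataFrame(
    {
        "SKU": [504777.0, 504777.0],
        "Store": [1.0, 1.0],
        "F": [135828.0, 135828.0],
        "LeadTime": [11.0, 11.0],
        "Qty": [0.0, 0.0],
        "Value": [0.0, 0.0],
        "Price": [0.0, 0.0],
        "Level": [41.0, 41.0],
    },
    index=pd.to_datetime(["2018-01-30", "2018-01-31"]),
)

# Columns that were integers before the reindex step.
int_cols = ["SKU", "Store", "F", "LeadTime", "Qty", "Level"]

# Cast back; this only works once no NaN is left in these columns.
new_df = new_df.astype({col: "int64" for col in int_cols})

# Optionally move the dates from the index back into a "Date" column.
tidy = new_df.rename_axis("Date").reset_index()
```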
I need to use groupby(['SKU','Store']). Where should I put it?
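One way to read the question above: the groupby is what the loop iterates over, so each (SKU, Store) pair is extended on its own. The sketch below illustrates that under assumptions. The three-row `df`, the cut-off `END`, and the helper `extend_group` are all hypothetical, and the sketch forward-fills every column for brevity rather than zeroing Qty, Value and Price.

```python
import pandas as pd

# Tiny hypothetical frame with two (SKU, Store) groups.
df = pd.DataFrame({
    "SKU": [504777, 504777, 704777],
    "Store": [1, 1, 2],
    "Date": pd.to_datetime(["2018-01-22", "2018-01-23", "2018-01-22"]),
    "Level": [45, 45, 45],
})

END = pd.Timestamp("2018-01-31")  # assumed cut-off date

def extend_group(group, end=END):
    """Reindex one (SKU, Store) group to daily rows through `end`."""
    dates = pd.date_range(group["Date"].min(), end, freq="D")
    return group.set_index("Date").reindex(dates).ffill()

# Iterating a groupby on two columns yields ((sku, store), sub_frame) pairs.
pieces = {key: extend_group(g) for key, g in df.groupby(["SKU", "Store"])}
```

`pd.concat(pieces.values())` would then stack the extended groups into one frame.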