Python pandas: continuous dates per product
I am starting to work with pandas and have run into a problem I don't know how to solve. I have a DataFrame with dates, products, stock and sales. Some dates and products are missing. I would like to get a time series for each product over a date range. For example:
product udsStock udsSales
date
2019-12-26 14 161 848
2019-12-27 14 1340 914
2019-12-30 14 856 0
2019-12-25 4 3132 439
2019-12-27 4 3177 616
2020-01-01 4 500 883
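The range must be the same for all products, even if a product does not appear on some date within the range.
If I want the range 2019-12-25 to 2020-01-01, the final DataFrame should look like this: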
product udsStock udsSales
date
2019-12-25 14 NaN NaN
2019-12-26 14 161 848
2019-12-27 14 1340 914
2019-12-28 14 NaN NaN
2019-12-29 14 NaN NaN
2019-12-30 14 856 0
2019-12-31 14 NaN NaN
2020-01-01 14 NaN NaN
2019-12-25 4 3132 439
2019-12-26 4 NaN NaN
2019-12-27 4 3177 616
2019-12-28 4 NaN NaN
2019-12-29 4 NaN NaN
2019-12-30 4 NaN NaN
2019-12-31 4 NaN NaN
2020-01-01 4 500 883
I tried reindexing by the date range, but it does not work because the index contains duplicate dates:
idx = pd.date_range('25-12-2019', '01-01-2020')
df = df.reindex(idx)
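I also tried indexing by date and product and then reindexing, but I don't know how to add the missing products.
Any other ideas?
Thanks in advance.

Convert the index to datetime objects: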
df2.index = pd.to_datetime(df2.index)
Create the unique combinations of dates and products:
import itertools
idx = pd.date_range("2019-12-25", "2020-01-01")  # ISO format avoids day/month ambiguity
product = df2["product"].unique()
temp = itertools.product(idx, product)
temp = pd.MultiIndex.from_tuples(temp, names=["date", "product"])
temp
MultiIndex([('2019-12-25', 14),
('2019-12-25', 4),
('2019-12-26', 14),
('2019-12-26', 4),
('2019-12-27', 14),
('2019-12-27', 4),
('2019-12-28', 14),
('2019-12-28', 4),
('2019-12-29', 14),
('2019-12-29', 4),
('2019-12-30', 14),
('2019-12-30', 4),
('2019-12-31', 14),
('2019-12-31', 4),
('2020-01-01', 14),
('2020-01-01', 4)],
names=['date', 'product'])
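As a side note, pandas can build the same Cartesian product directly with `pd.MultiIndex.from_product`, which avoids the `itertools` step. A minimal sketch, assuming the two product codes from the question (the `products` list is hard-coded here purely for illustration, and `full_index` is a made-up name):

```python
import pandas as pd

# Date range requested in the question, written in ISO format.
idx = pd.date_range("2019-12-25", "2020-01-01", freq="D")

# Hard-coded for illustration; in the answer this comes from df2["product"].unique().
products = [14, 4]

# Every (date, product) pair, in the same order as itertools.product(idx, products).
full_index = pd.MultiIndex.from_product([idx, products], names=["date", "product"])

print(len(full_index))  # 8 dates x 2 products
```

The result can be passed to `reindex` in place of `temp`.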
Reindex the DataFrame:
df2.set_index("product", append=True).reindex(temp).sort_index(
level=1, ascending=False
).reset_index(level="product")
product udsStock udsSales
date
2020-01-01 14 NaN NaN
2019-12-31 14 NaN NaN
2019-12-30 14 856.0 0.0
2019-12-29 14 NaN NaN
2019-12-28 14 NaN NaN
2019-12-27 14 1340.0 914.0
2019-12-26 14 161.0 848.0
2019-12-25 14 NaN NaN
2020-01-01 4 500.0 883.0
2019-12-31 4 NaN NaN
2019-12-30 4 NaN NaN
2019-12-29 4 NaN NaN
2019-12-28 4 NaN NaN
2019-12-27 4 3177.0 616.0
2019-12-26 4 NaN NaN
2019-12-25 4 3132.0 439.0
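The output above is sorted in descending order, whereas the question asks for ascending dates within each product. One way to get that layout, sketched here end to end, is to build the target index product-first so that no sorting is needed afterwards. The sample data is copied from the question; `full_index` and `result` are illustrative names, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df2 = pd.DataFrame(
    {
        "product": [14, 14, 14, 4, 4, 4],
        "udsStock": [161, 1340, 856, 3132, 3177, 500],
        "udsSales": [848, 914, 0, 439, 616, 883],
    },
    index=pd.to_datetime(
        ["2019-12-26", "2019-12-27", "2019-12-30",
         "2019-12-25", "2019-12-27", "2020-01-01"]
    ),
)
df2.index.name = "date"

idx = pd.date_range("2019-12-25", "2020-01-01", freq="D")
products = df2["product"].unique()  # order of first appearance: 14, then 4

# Product-major order, so each product's dates stay together and ascending.
full_index = pd.MultiIndex.from_product([products, idx], names=["product", "date"])

result = (
    df2.set_index("product", append=True)  # index becomes (date, product)
    .swaplevel()                           # now (product, date), matching full_index
    .reindex(full_index)                   # missing pairs become NaN rows
    .reset_index(level="product")          # move product back to a column
)
print(result)
```

Because `product` lives in the index during the reindex, it is never blanked out and keeps its integer dtype; only `udsStock` and `udsSales` become floats to hold `NaN`.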
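In R, specifically the tidyverse, this can be done with a dedicated method. In Python, a package offers something similar, but a few issues still need to be worked out (a PR has already been submitted for this). We can use pd.date_range and groupby.reindex to achieve your result: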
date_range = pd.date_range(start='2019-12-25', end='2020-01-01', freq='D')
df = df.groupby('product', sort=False).apply(lambda x: x.reindex(date_range))
df['product'] = df.groupby(level=0)['product'].ffill().bfill()
df = df.droplevel(0)
product udsStock udsSales
2019-12-25 14.0 NaN NaN
2019-12-26 14.0 161.0 848.0
2019-12-27 14.0 1340.0 914.0
2019-12-28 14.0 NaN NaN
2019-12-29 14.0 NaN NaN
2019-12-30 14.0 856.0 0.0
2019-12-31 14.0 NaN NaN
2020-01-01 14.0 NaN NaN
2019-12-25 4.0 3132.0 439.0
2019-12-26 4.0 NaN NaN
2019-12-27 4.0 3177.0 616.0
2019-12-28 4.0 NaN NaN
2019-12-29 4.0 NaN NaN
2019-12-30 4.0 NaN NaN
2019-12-31 4.0 NaN NaN
2020-01-01 4.0 500.0 883.0
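The `ffill().bfill()` step is needed because reindexing blanks out the `product` column on the new rows, and it also turns the product codes into floats (`14.0`, `4.0`). A possible variation, sketched below with the question's sample data, reindexes each group without the `product` column and lets `pd.concat` restore it from the group keys. The names `frames`, `products` and `out` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "product": [14, 14, 14, 4, 4, 4],
        "udsStock": [161, 1340, 856, 3132, 3177, 500],
        "udsSales": [848, 914, 0, 439, 616, 883],
    },
    index=pd.to_datetime(
        ["2019-12-26", "2019-12-27", "2019-12-30",
         "2019-12-25", "2019-12-27", "2020-01-01"]
    ),
)
df.index.name = "date"

date_range = pd.date_range(start="2019-12-25", end="2020-01-01", freq="D")

products, frames = [], []
for product, group in df.groupby("product", sort=False):
    products.append(product)
    # Drop the key column before reindexing so it is never filled with NaN.
    frames.append(group.drop(columns="product").reindex(date_range))

# The keys become the outer index level; move them back into a column.
out = pd.concat(frames, keys=products, names=["product", "date"]).reset_index(
    level="product"
)
print(out)
```

This keeps `product` as an integer column and avoids relying on fill order across group boundaries.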