Python: how do I get the missing dates within a given range of datetime64 values?
I have the following DataFrame df in pandas:
dti id_n
2016-07-27 13:55:00 1
2016-07-29 13:50:07 1
2016-07-29 14:50:08 1
2016-07-30 23:50:01 2
2016-08-01 12:50:00 3
2016-08-02 12:50:00 3
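The dtype of dti is datetime64.
I would like to get a new DataFrame result containing the dates that are missing between the minimum and maximum of dti:
result =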
2016-07-28
2016-07-31
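How can I get it?
One approach is to strip the time component, build the full range of days between the minimum and maximum, and take the days that do not appear in the data.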
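The code for this first approach did not survive in the text, so the following is only a minimal sketch of the steps just described. It assumes they map to `Series.dt.normalize`, `pd.date_range`, and `Index.difference`; that pairing, and the variable names `days`, `all_days`, and `missing`, are assumptions rather than the original answer's code.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    'dti': pd.to_datetime(['2016-07-27 13:55:00', '2016-07-29 13:50:07',
                           '2016-07-29 14:50:08', '2016-07-30 23:50:01',
                           '2016-08-01 12:50:00', '2016-08-02 12:50:00']),
    'id_n': [1, 1, 1, 2, 3, 3],
})

# Drop the time of day so every timestamp sits at midnight of its date.
days = df['dti'].dt.normalize()

# Every calendar day between the first and last observed date.
all_days = pd.date_range(days.min(), days.max(), freq='D')

# Days in the full range that never occur in the data.
missing = all_days.difference(pd.DatetimeIndex(days))
print(missing)
```

Because `difference` returns a sorted `DatetimeIndex`, the result can be used directly for indexing or turned into a column.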
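Another solution is to downsample to daily frequency with resample, aggregate with mean, and take the index labels where the result is NaN:
# df is the DataFrame from the question, with dti already of dtype datetime64
a = df.resample('D', on='dti').mean()
print(a)
            id_n
dti
2016-07-27   1.0
2016-07-28   NaN
2016-07-29   1.0
2016-07-30   2.0
2016-07-31   NaN
2016-08-01   3.0
2016-08-02   3.0
# days without any rows have a NaN mean, so their index labels are the missing dates
b = a.index[a['id_n'].isnull()]
print(b)
DatetimeIndex(['2016-07-28', '2016-07-31'], dtype='datetime64[ns]', name='dti', freq=None)
Here is another solution, for comparison. I use normalize to drop the times and then take a set difference:
import pandas as pd
df = pd.DataFrame([['2016-07-27 13:55:00', 1], ['2016-07-29 13:50:07', 1],
                   ['2016-07-29 14:50:08', 1], ['2016-07-30 23:50:01', 2],
                   ['2016-08-01 12:50:00', 3], ['2016-08-02 12:50:00', 3]],
                  columns=['dti', 'id_n'])
df['dti'] = pd.to_datetime(df['dti'])
# every day from the first to the last date, at midnight
full = set(pd.date_range(df['dti'].min(), df['dti'].max(), normalize=True))
# the days that actually occur in the data, with the times removed
select = set(df['dti'].dt.normalize())
full - select
# {Timestamp('2016-07-28 00:00:00', freq='D'),
# Timestamp('2016-07-31 00:00:00', freq='D')}
# (newer pandas versions print these Timestamps without freq='D')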
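The question asks for a DataFrame named result, while the approaches above yield an index or a set. A small sketch of one way to wrap the missing days in a DataFrame follows. It assumes a single-column frame is wanted; the column name `missing_date` is hypothetical and does not come from the original post.

```python
import pandas as pd

# Timestamps from the question, as a DatetimeIndex.
dti = pd.to_datetime(['2016-07-27 13:55:00', '2016-07-29 13:50:07',
                      '2016-07-29 14:50:08', '2016-07-30 23:50:01',
                      '2016-08-01 12:50:00', '2016-08-02 12:50:00'])

# Observed days with the time of day removed.
days = dti.normalize()

# Full daily range between the first and last observed day.
full = pd.date_range(days.min(), days.max(), freq='D')

# Keep only the days that are absent from the data.
# 'missing_date' is an illustrative column name, not from the original post.
result = pd.DataFrame({'missing_date': full[~full.isin(days)]})
print(result)
```

The column keeps the datetime64 dtype, so it can be merged or compared against dti without further conversion.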