Setting the datetime format in a sub-dataset in Python
Tags: python, pandas, numpy, datetime

I have data df:
          Id                timestamp  data        Date
27585  27826  2020-01-02 08:55:46.297  19.0  2020-01-02
27586  27827  2020-01-02 08:55:46.397  20.0  2020-01-02
27587  27828  2020-01-02 08:55:47.283  20.0  2020-01-02
27588  27829  2020-01-02 08:55:47.383  21.5  2020-01-02
27589  27830  2020-01-02 08:55:48.287  21.5  2020-01-02
I want to find the mean of data between 12pm and 4pm for each unique Date.

I tried:
for date in df['Date'].unique():
    df_date = df[df['Date'] == date]
    start_date = pd.to_datetime('12:00:00')
    end_date = pd.to_datetime('16:00:00')
    df_date1 = df_date.loc[(df_date['timestamp'].dt.time >= start_date) &
                           (df_date['timestamp'].dt.time <= end_date)]
    df.set_index(["data"], inplace=True)
    df = df.sort_index()
    df = df.resample('1S').fillna('ffill')
    df['data'].mean()

I think you need a DatetimeIndex to select the rows between the 2 times, and then aggregate with mean:
# data sample changed so that some rows fall inside the time window
print (df)
          Id                timestamp  data        Date
27585  27826  2020-01-02 11:55:46.297  19.0  2020-01-02
27586  27827  2020-01-02 12:55:46.397  25.0  2020-02-02
27587  27828  2020-01-02 13:55:47.283  20.0  2020-02-02
27588  27829  2020-01-02 14:55:47.383  21.5  2020-03-02
27589  27830  2020-01-02 08:55:48.287  21.5  2020-04-02

df['timestamp'] = pd.to_datetime(df['timestamp'])

print (df.set_index('timestamp')
         .between_time('12:00:00','16:00:00'))
                            Id  data        Date
timestamp
2020-01-02 12:55:46.397  27827  25.0  2020-02-02
2020-01-02 13:55:47.283  27828  20.0  2020-02-02
2020-01-02 14:55:47.383  27829  21.5  2020-03-02

df1 = (df.set_index('timestamp')
         .between_time('12:00:00','16:00:00')
         .groupby('Date')['data']
         .mean())
print (df1)
Date
2020-02-02    22.5
2020-03-02    21.5
Name: data, dtype: float64
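
For comparison, the filter can also be written as a boolean mask that leaves timestamp as a regular column. This is only a sketch, not part of the original answer: it rebuilds the changed data sample by hand and compares the time-of-day values against datetime.time objects. Comparing .dt.time against the result of pd.to_datetime('12:00:00'), as in the question, mixes time objects with a full Timestamp, which is most likely why the attempt did not work.

```python
from datetime import time

import pandas as pd

# Rebuild the changed data sample (values copied from the printout above).
df = pd.DataFrame({
    "Id": [27826, 27827, 27828, 27829, 27830],
    "timestamp": pd.to_datetime([
        "2020-01-02 11:55:46.297",
        "2020-01-02 12:55:46.397",
        "2020-01-02 13:55:47.283",
        "2020-01-02 14:55:47.383",
        "2020-01-02 08:55:48.287",
    ]),
    "data": [19.0, 25.0, 20.0, 21.5, 21.5],
    "Date": ["2020-01-02", "2020-02-02", "2020-02-02", "2020-03-02", "2020-04-02"],
})

# Time-of-day bounds as datetime.time objects, not Timestamps.
start, end = time(12, 0), time(16, 0)

# .dt.time yields datetime.time values, so the comparison is like-for-like.
times = df["timestamp"].dt.time
mask = (times >= start) & (times <= end)

# Mean of data per Date, restricted to rows inside the window.
mean_by_date = df.loc[mask].groupby("Date")["data"].mean()
print(mean_by_date)
```

Both bounds are inclusive here, which matches the default behaviour of between_time.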

If you also need to resample the timestamps, use groupby with resample:
df1 = (df.set_index('timestamp')
         .between_time('12:00:00','16:00:00')
         .groupby('Date')['data']
         .resample('1S')
         .ffill())
print (df1)
Date        timestamp
2020-02-02  2020-01-02 12:55:46     NaN
            2020-01-02 12:55:47    25.0
            2020-01-02 12:55:48    25.0
            2020-01-02 12:55:49    25.0
            2020-01-02 12:55:50    25.0
                                   ...
            2020-01-02 13:55:44    25.0
            2020-01-02 13:55:45    25.0
            2020-01-02 13:55:46    25.0
            2020-01-02 13:55:47    25.0
2020-03-02  2020-01-02 14:55:47     NaN
Name: data, Length: 3603, dtype: float64

Then it is possible to compute the mean per the first index level (Date):
df1 = (df.set_index('timestamp')
         .between_time('12:00:00','16:00:00')
         .groupby('Date')['data']
         .resample('1S')
         .ffill()
         # mean(level=0) was removed in pandas 2.0; group by the first index level instead
         .groupby(level=0)
         .mean()
         .reset_index())
print (df1)
         Date  data
0  2020-02-02  25.0
1  2020-03-02   NaN
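
The NaN values above come from how forward fill interacts with the resampled grid: each 1-second label is floored to the whole second, so the first label falls just before the first observation and has nothing to fill from. The sketch below illustrates this on two small hand-made series (the values and timestamps are made up for illustration); it uses the lowercase '1s' alias, on the assumption that a recent pandas version is installed, since newer releases warn about the uppercase '1S'.

```python
import pandas as pd

# Two observations with sub-second timestamps (illustrative values only).
s = pd.Series(
    [25.0, 20.0],
    index=pd.to_datetime(["2020-01-02 12:55:46.397", "2020-01-02 12:55:48.283"]),
)

# Upsample to a 1-second grid and forward fill.
up = s.resample("1s").ffill()
print(up)

# The first grid label (12:55:46) precedes the first observation, so it is NaN.
# The value 20.0 at 12:55:48.283 lies after the last label and never appears.
print(up.mean())  # mean() skips NaN by default

# A group with a single observation produces one label that is NaN,
# so its mean is NaN as well.
single = pd.Series([21.5], index=pd.to_datetime(["2020-01-02 14:55:47.383"]))
single_up = single.resample("1s").ffill()
print(single_up.mean())
```

If the forward-filled mean should include the very first observation, one option is to floor the timestamps to whole seconds before resampling, but that changes the data and is a judgment call.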
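
Can I add another step to resample the data with ffill?

@nilsinelabore - Sure ;)

@nilsinelabore - Changed the data sample. But I am not sure about the mean. Do you need to resample the timestamps by seconds for each group?

Yes, if possible. I would like to: 1) group by Date; 2) filter timestamp from 12pm to 4pm for each unique Date; 3) resample with forward fill at 1-second frequency; 4) calculate the mean of data for each Date. Is it possible to do all of this without a for loop?

@nilsinelabore - Use df.loc[df.groupby('Date')['Date'].idxmax()] - because Date is a column.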
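
The four steps listed in the comments can be chained without a for loop. The following is a minimal sketch under a few assumptions: the data frame is rebuilt by hand from the changed data sample in the answer, the lowercase '1s' alias is used because recent pandas versions warn about '1S', and groupby(level=...) replaces mean(level=0), which no longer exists in pandas 2.0 and later.

```python
import pandas as pd

# Rebuild the changed data sample from the answer.
df = pd.DataFrame({
    "Id": [27826, 27827, 27828, 27829, 27830],
    "timestamp": pd.to_datetime([
        "2020-01-02 11:55:46.297",
        "2020-01-02 12:55:46.397",
        "2020-01-02 13:55:47.283",
        "2020-01-02 14:55:47.383",
        "2020-01-02 08:55:48.287",
    ]),
    "data": [19.0, 25.0, 20.0, 21.5, 21.5],
    "Date": ["2020-01-02", "2020-02-02", "2020-02-02", "2020-03-02", "2020-04-02"],
})

resampled = (
    df.set_index("timestamp")
      .sort_index()                            # resampling expects ordered timestamps
      .between_time("12:00:00", "16:00:00")    # step 2: keep 12pm to 4pm
      .groupby("Date")["data"]                 # step 1: one group per Date
      .resample("1s")                          # step 3: 1-second grid per group
      .ffill()                                 #         forward fill onto the grid
)

# Step 4: mean of the resampled data per Date (first index level).
result = resampled.groupby(level="Date").mean().reset_index()
print(result)
```

On this sample the result agrees with the output shown in the answer: 25.0 for 2020-02-02 and NaN for 2020-03-02, whose only observation falls after its single 1-second label.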