Grouping data from a large dataset into weekly, monthly, and yearly periods using Python?
I have a dataset, stored as a DataFrame, that records 20 years of "X" values. X is recorded as 3-hour averages; a sample of the data is shown below.
Time_stamp X
1992-01-01 03:00:00 10.2
1992-01-01 06:00:00 10.4
1992-01-01 09:00:00 11.8
1992-01-01 12:00:00 12.0
1992-01-01 15:00:00 10.4
1992-01-01 18:00:00 9.4
1992-01-01 21:00:00 10.4
1992-01-02 00:00:00 13.6
1992-01-02 03:00:00 13.2
1992-01-02 06:00:00 11.8
1992-01-02 09:00:00 12.0
1992-01-02 12:00:00 12.8
1992-01-02 15:00:00 12.6
1992-01-02 18:00:00 11.0
1992-01-02 21:00:00 12.2
1992-01-03 00:00:00 13.8
1992-01-03 03:00:00 14.0
1992-01-03 06:00:00 13.4
1992-01-03 09:00:00 14.2
1992-01-03 12:00:00 16.2
1992-01-03 15:00:00 13.2
1992-01-03 18:00:00 13.4
1992-01-03 21:00:00 13.8
1992-01-04 00:00:00 14.8
1992-01-04 03:00:00 13.8
1992-01-04 06:00:00 7.6
1992-01-04 09:00:00 5.8
1992-01-04 12:00:00 4.4
1992-01-04 15:00:00 5.6
1992-01-04 18:00:00 6.0
1992-01-04 21:00:00 7.0
1992-01-05 00:00:00 6.8
1992-01-05 03:00:00 3.4
1992-01-05 06:00:00 5.8
1992-01-05 09:00:00 10.6
1992-01-05 12:00:00 9.2
1992-01-05 15:00:00 10.6
1992-01-05 18:00:00 9.8
1992-01-05 21:00:00 11.2
1992-01-06 00:00:00 12.0
1992-01-06 03:00:00 10.2
1992-01-06 06:00:00 9.0
1992-01-06 09:00:00 9.0
1992-01-06 12:00:00 8.6
1992-01-06 15:00:00 8.4
1992-01-06 18:00:00 8.2
1992-01-06 21:00:00 8.8
1992-01-07 00:00:00 10.0
1992-01-07 03:00:00 9.6
1992-01-07 06:00:00 8.0
1992-01-07 09:00:00 9.6
1992-01-07 12:00:00 10.8
1992-01-07 15:00:00 10.2
1992-01-07 18:00:00 9.8
1992-01-07 21:00:00 10.2
1992-01-08 00:00:00 9.4
1992-01-08 03:00:00 11.4
1992-01-08 06:00:00 12.6
1992-01-08 09:00:00 12.8
1992-01-08 12:00:00 10.4
1992-01-08 15:00:00 11.2
1992-01-08 18:00:00 9.0
1992-01-08 21:00:00 10.2
1992-01-09 00:00:00 8.2
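I would like to create separate DataFrames that compute and record the yearly, weekly, and daily averages of the given dataset. I am new to Python and have only just started working with time-series data. I found some related questions here on Stack Overflow, but none with a suitable answer or any idea of where to start. Any help?
This is the code I have written so far: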
import pandas as pd
import numpy as np
datasets['date_minus_time'] = df["Time_stamp"].apply( lambda df :
datetime.datetime(year=datasets.year, month=datasets.month,
day=datasets.day))
datasets.set_index(df["date_minus_time"],inplace=True)
df['count'].resample('D', how='sum')
df['count'].resample('W', how='sum')
df['count'].resample('M', how='sum')
But I don't know how to handle the fact that the data is recorded every 3 hours. What should I do next to get the result I want?

You can use:
df['Time_stamp'] = pd.to_datetime(df['Time_stamp'], format='%Y-%m-%d %H:%M:%S')
df.set_index('Time_stamp',inplace=True)
df_monthly = df.resample('M').mean()
Output of df_monthly:
X
Time_stamp
1992-01-31 10.403125
For daily averages, use df_daily = df.resample('D').mean(), which outputs:
X
Time_stamp
1992-01-01 10.657143
1992-01-02 12.400000
1992-01-03 14.000000
1992-01-04 8.125000
1992-01-05 8.425000
1992-01-06 9.275000
1992-01-07 9.775000
1992-01-08 10.875000
1992-01-09 8.200000
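
The question also asks for weekly and yearly averages, which the output above does not show. A minimal sketch of both, assuming the timestamps are already the index: the data here is a made-up stand-in (two years of 3-hourly values), and the yearly step groups on the calendar year of the index rather than using a resample alias, because the year-end alias differs between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: two years of 3-hourly readings.
idx = pd.date_range("1992-01-01 03:00:00", "1993-12-31 21:00:00",
                    freq=pd.Timedelta(hours=3))
df = pd.DataFrame({"X": np.arange(len(idx), dtype=float)}, index=idx)
df.index.name = "Time_stamp"

# Weekly means: 'W' bins end on Sunday and are labelled with that Sunday.
df_weekly = df.resample("W").mean()

# Yearly means: group on the calendar year taken from the DatetimeIndex.
df_yearly = df.groupby(df.index.year).mean()

print(df_weekly.head())
print(df_yearly)
```

Each result is its own DataFrame, so the weekly and yearly tables can be saved or plotted independently of the daily and monthly ones.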
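Use pd.to_datetime to convert the column to datetimes for better performance, then use DataFrame.resample with the on parameter to specify the datetime column: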
df['Time_stamp'] = pd.to_datetime(df['Time_stamp'])
df_daily = df.resample('D', on='Time_stamp').mean()
df_monthly = df.resample('M', on='Time_stamp').mean()
df_weekly = df.resample('W', on='Time_stamp').mean()
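
If month labels such as 1992-01 are preferred over month-end dates, one alternative (not taken from the answers here) is to keep Time_stamp as a column and group on a monthly period derived from it. A minimal sketch with made-up data; the values and the variable names month and df_monthly are only for illustration.

```python
import pandas as pd

# Hypothetical stand-in: 32 readings, 3 hours apart, spanning two months.
stamps = pd.date_range("1992-01-30 00:00:00", periods=32,
                       freq=pd.Timedelta(hours=3))
df = pd.DataFrame({"Time_stamp": stamps,
                   "X": [float(i) for i in range(32)]})

# Derive a monthly period (e.g. 1992-01) for every row, then average per period.
month = df["Time_stamp"].dt.to_period("M")
df_monthly = df.groupby(month)["X"].mean().to_frame()

print(df_monthly)
```

The result is indexed by period rather than by timestamp, which can be easier to read in reports but is a different index type from what resample returns.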
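Have you tried anything?
Thank you very much. Just one question about your answer: does the fact that the data is recorded every 3 hours affect the calculations you showed?
@AnkitaDebnath - hmmm, pandas is usually fast for roughly 1k-1M rows, so with more rows it should be slower. It depends on the data.
Thanks for the explanation.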
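
On the question of the 3-hour spacing: resample averages whatever rows fall inside each bin, so the spacing itself needs no special handling, but bins with fewer readings are averaged over fewer values. A small sketch using the first 15 rows of the sample from the question, reporting the count next to the mean; the variable name daily is only illustrative.

```python
import pandas as pd

# First 15 readings from the sample: 1992-01-01 03:00 through 1992-01-02 21:00.
values = [10.2, 10.4, 11.8, 12.0, 10.4, 9.4, 10.4,
          13.6, 13.2, 11.8, 12.0, 12.8, 12.6, 11.0, 12.2]
idx = pd.date_range("1992-01-01 03:00:00", periods=len(values),
                    freq=pd.Timedelta(hours=3))
df = pd.DataFrame({"X": values}, index=idx)
df.index.name = "Time_stamp"

# Daily mean together with the number of readings that went into it.
daily = df["X"].resample("D").agg(["mean", "count"])
print(daily)
```

The first day has 7 readings because the record starts at 03:00, while the second has the full 8. Checking the count column is a quick way to spot partial days, weeks, or months before comparing their averages.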