Python pandas: split hourly data into 15-minute interval data
I have a CSV file containing hourly temperature and humidity data for every day over 2 years. I want to split this into 15-minute interval data by taking the difference in temperature and humidity between consecutive hours and dividing that difference by 4. How can I do this in pandas? Here is a sample of the data:
Location,Temperature,Humidity,Date,Hour
WA,70.403,73.493,2019-03-01,0
WA,71.593,73.153,2019-03-01,1
NY,73.131,74.93,2019-03-01,0
NY,73.085,73.161,2019-03-01,1
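To reproduce the answers below, the sample can be loaded into a DataFrame named df. This is only a sketch: it reads the rows from an in-memory string, whereas the real data would come from pd.read_csv on the actual file, whose name is not given in the question.

```python
import io

import pandas as pd

# Sample rows from the question, held in memory instead of a file on disk.
csv_text = """Location,Temperature,Humidity,Date,Hour
WA,70.403,73.493,2019-03-01,0
WA,71.593,73.153,2019-03-01,1
NY,73.131,74.93,2019-03-01,0
NY,73.085,73.161,2019-03-01,1
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)
```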
First, resample() your df:
import pandas as pd
# Build a full timestamp from the Date string and the integer Hour.
df['Date'] = df['Date'] + ' ' + df['Hour'].astype(str) + ':00:00'
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
# Insert empty rows at 15-minute spacing.
df = df.resample('15min').asfreq()
Next, use interpolate():
df['Temperature'] = df['Temperature'].interpolate()
(!) Note, however, that you need to process each location separately.
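The caveat about locations can be handled by grouping before resampling. A minimal sketch, assuming the column names from the question; the helper name to_quarter_hourly is made up for illustration, and the timestamp is built with pd.to_timedelta rather than string concatenation:

```python
import pandas as pd


def to_quarter_hourly(df: pd.DataFrame) -> pd.DataFrame:
    """Upsample hourly rows to 15-minute rows, interpolating per Location."""
    out = df.copy()
    # Combine the date and the hour of day into one timestamp.
    out["Datetime"] = pd.to_datetime(out["Date"]) + pd.to_timedelta(out["Hour"], unit="h")
    out = out.set_index("Datetime")[["Location", "Temperature", "Humidity"]]

    pieces = []
    for location, group in out.groupby("Location"):
        values = group[["Temperature", "Humidity"]]
        # Insert empty 15-minute rows, then fill them linearly.
        values = values.resample("15min").asfreq().interpolate(method="linear")
        values.insert(0, "Location", location)
        pieces.append(values)
    return pd.concat(pieces)


hourly = pd.DataFrame({
    "Location": ["WA", "WA", "NY", "NY"],
    "Temperature": [70.403, 71.593, 73.131, 73.085],
    "Humidity": [73.493, 73.153, 74.93, 73.161],
    "Date": ["2019-03-01"] * 4,
    "Hour": [0, 1, 0, 1],
})
print(to_quarter_hourly(hourly))
```

Because each location is resampled on its own, the index within a group is unique and values from one location never leak into another.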
An out-of-the-box solution: use concat with assign to repeat every row for minutes 00, 15, 30 and 45, create a DatetimeIndex, then sort by the Location column and the index, and finally divide both columns by 4:
df = pd.concat([df.assign(minute='00'),
                df.assign(minute='15'),
                df.assign(minute='30'),
                df.assign(minute='45')])
# Zero-pad the hour and use explicit separators so that, for example,
# hour 0 + minute 15 is not misread as 01:05.
df.index = pd.to_datetime(df['Date'].astype(str) + ' ' +
                          df['Hour'].astype(str).str.zfill(2) + ':' +
                          df['minute'], format='%Y-%m-%d %H:%M')
df = df.rename_axis('datetimes').sort_values(['Location','datetimes'])
df[['Temperature','Humidity']] /= 4
print(df)
                    Location  Temperature  Humidity        Date  Hour minute
datetimes
2019-03-01 00:00:00       NY     18.28275  18.73250  2019-03-01     0     00
2019-03-01 00:15:00       NY     18.28275  18.73250  2019-03-01     0     15
2019-03-01 00:30:00       NY     18.28275  18.73250  2019-03-01     0     30
2019-03-01 00:45:00       NY     18.28275  18.73250  2019-03-01     0     45
2019-03-01 01:00:00       NY     18.27125  18.29025  2019-03-01     1     00
2019-03-01 01:15:00       NY     18.27125  18.29025  2019-03-01     1     15
2019-03-01 01:30:00       NY     18.27125  18.29025  2019-03-01     1     30
2019-03-01 01:45:00       NY     18.27125  18.29025  2019-03-01     1     45
2019-03-01 00:00:00       WA     17.60075  18.37325  2019-03-01     0     00
2019-03-01 00:15:00       WA     17.60075  18.37325  2019-03-01     0     15
2019-03-01 00:30:00       WA     17.60075  18.37325  2019-03-01     0     30
2019-03-01 00:45:00       WA     17.60075  18.37325  2019-03-01     0     45
2019-03-01 01:00:00       WA     17.89825  18.28825  2019-03-01     1     00
2019-03-01 01:15:00       WA     17.89825  18.28825  2019-03-01     1     15
2019-03-01 01:30:00       WA     17.89825  18.28825  2019-03-01     1     30
2019-03-01 01:45:00       WA     17.89825  18.28825  2019-03-01     1     45
If the last hour of each group should not be followed by 15-, 30- and 45-minute rows, start again from the original hourly df, resample each Location group and forward-fill with ffill:
df.index = pd.to_datetime(df['Date'].astype(str) + df['Hour'].astype(str),
                          format='%Y-%m-%d%H')
df = (df.groupby('Location').resample('15min')[['Temperature','Humidity']]
        .ffill()
        .rename_axis(['Location','Datetime'])
        .reset_index(level=0))
df[['Temperature','Humidity']] /= 4
print(df)
                    Location  Temperature  Humidity
Datetime
2019-03-01 00:00:00       NY     18.28275  18.73250
2019-03-01 00:15:00       NY     18.28275  18.73250
2019-03-01 00:30:00       NY     18.28275  18.73250
2019-03-01 00:45:00       NY     18.28275  18.73250
2019-03-01 01:00:00       NY     18.27125  18.29025
2019-03-01 00:00:00       WA     17.60075  18.37325
2019-03-01 00:15:00       WA     17.60075  18.37325
2019-03-01 00:30:00       WA     17.60075  18.37325
2019-03-01 00:45:00       WA     17.60075  18.37325
2019-03-01 01:00:00       WA     17.89825  18.28825
Thanks for the suggestion; here is a solution with interpolate, again starting from the original hourly df:
df.index = pd.to_datetime(df['Date'].astype(str) + df['Hour'].astype(str),
                          format='%Y-%m-%d%H')
# Add empty 15-minute rows per Location; the result has a (Location, datetime) MultiIndex.
df = (df.groupby('Location').resample('15min')[['Temperature','Humidity']]
        .asfreq())
# Interpolate within each Location and each calendar day.
df = (df.groupby(['Location', pd.Grouper(freq='D', level=1)])
        .transform(lambda x: x.interpolate()))
print(df)
                              Temperature  Humidity
Location
NY       2019-03-01 00:00:00      73.1310  74.93000
         2019-03-01 00:15:00      73.1195  74.48775
         2019-03-01 00:30:00      73.1080  74.04550
         2019-03-01 00:45:00      73.0965  73.60325
         2019-03-01 01:00:00      73.0850  73.16100
WA       2019-03-01 00:00:00      70.4030  73.49300
         2019-03-01 00:15:00      70.7005  73.40800
         2019-03-01 00:30:00      70.9980  73.32300
         2019-03-01 00:45:00      71.2955  73.23800
         2019-03-01 01:00:00      71.5930  73.15300
"Split this data into 15-minute interval data by taking the difference in temperature and humidity between hours and dividing it by 4": so basically you want linear interpolation?
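As a quick check of that reading, dividing the hourly difference by 4 and adding it step by step gives the same values as linear interpolation. A small sketch using the WA temperatures from the sample, with NumPy as an assumed extra dependency:

```python
import numpy as np

start, end = 70.403, 71.593        # WA temperature at hour 0 and hour 1
step = (end - start) / 4           # change per 15 minutes

# Add the step repeatedly: 00:00, 00:15, 00:30, 00:45, 01:00.
by_steps = start + step * np.arange(5)

# Linear interpolation over the same five time points (in minutes).
by_interp = np.interp([0, 15, 30, 45, 60], [0, 60], [start, end])

print(by_steps)
print(np.allclose(by_steps, by_interp))
```

The intermediate values agree with the WA rows in the interpolate output (70.7005, 70.9980, 71.2955).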