Pandas: set values based on a time range
I want to set every value that occurs within some time period (e.g. 1 hour) of any value above some threshold (e.g. 7) to some fixed value (e.g. 999). I have some hacky non-vectorized methods, but there must be a better, more pandastic way to do this.
For example, set up a random DataFrame:
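import numpy as np
import pandas as pd
# hourly timestamps; newer pandas releases prefer the alias 'h' over 'H'
hr_rng = pd.date_range(start='7/1/2014 00:00:00', end='7/1/2014 10:00:00', freq='H')
df = pd.DataFrame(hr_rng, columns=['date_time'])
df.set_index(pd.DatetimeIndex(df['date_time']), inplace=True)
df['val0'] = np.random.randint(1, 10, df.shape[0])  # random integers from 1 to 9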
Random output:
date_time val0
date_time
2014-07-01 00:00:00 2014-07-01 00:00:00 4
2014-07-01 01:00:00 2014-07-01 01:00:00 8
2014-07-01 02:00:00 2014-07-01 02:00:00 4
2014-07-01 03:00:00 2014-07-01 03:00:00 7
2014-07-01 04:00:00 2014-07-01 04:00:00 2
2014-07-01 05:00:00 2014-07-01 05:00:00 4
2014-07-01 06:00:00 2014-07-01 06:00:00 4
2014-07-01 07:00:00 2014-07-01 07:00:00 9
2014-07-01 08:00:00 2014-07-01 08:00:00 1
2014-07-01 09:00:00 2014-07-01 09:00:00 9
2014-07-01 10:00:00 2014-07-01 10:00:00 5
What I want to get is:
date_time val0
date_time
2014-07-01 00:00:00 2014-07-01 00:00:00 999
2014-07-01 01:00:00 2014-07-01 01:00:00 999
2014-07-01 02:00:00 2014-07-01 02:00:00 999
2014-07-01 03:00:00 2014-07-01 03:00:00 7
2014-07-01 04:00:00 2014-07-01 04:00:00 2
2014-07-01 05:00:00 2014-07-01 05:00:00 4
2014-07-01 06:00:00 2014-07-01 06:00:00 999
2014-07-01 07:00:00 2014-07-01 07:00:00 999
2014-07-01 08:00:00 2014-07-01 08:00:00 999
2014-07-01 09:00:00 2014-07-01 09:00:00 999
2014-07-01 10:00:00 2014-07-01 10:00:00 999
Another random example:
date_time val0
date_time
2014-07-01 00:00:00 2014-07-01 00:00:00 5
2014-07-01 01:00:00 2014-07-01 01:00:00 6
2014-07-01 02:00:00 2014-07-01 02:00:00 3
2014-07-01 03:00:00 2014-07-01 03:00:00 2
2014-07-01 04:00:00 2014-07-01 04:00:00 9
2014-07-01 05:00:00 2014-07-01 05:00:00 7
2014-07-01 06:00:00 2014-07-01 06:00:00 6
2014-07-01 07:00:00 2014-07-01 07:00:00 8
2014-07-01 08:00:00 2014-07-01 08:00:00 6
2014-07-01 09:00:00 2014-07-01 09:00:00 7
2014-07-01 10:00:00 2014-07-01 10:00:00 3
should become:
date_time val0
date_time
2014-07-01 00:00:00 2014-07-01 00:00:00 5
2014-07-01 01:00:00 2014-07-01 01:00:00 6
2014-07-01 02:00:00 2014-07-01 02:00:00 3
2014-07-01 03:00:00 2014-07-01 03:00:00 999
2014-07-01 04:00:00 2014-07-01 04:00:00 999
2014-07-01 05:00:00 2014-07-01 05:00:00 999
2014-07-01 06:00:00 2014-07-01 06:00:00 999
2014-07-01 07:00:00 2014-07-01 07:00:00 999
2014-07-01 08:00:00 2014-07-01 08:00:00 999
2014-07-01 09:00:00 2014-07-01 09:00:00 999
2014-07-01 10:00:00 2014-07-01 10:00:00 999
Is your question about finding and updating selected values, or about putting observations into one-hour buckets? Here is one approach, IIUC:
import pandas as pd
import numpy as np
np.random.seed(42)
hr_rng = pd.date_range(start='7/1/2014 00:00:00',
                       end='7/1/2014 10:00:00',
                       freq='H')
df = pd.DataFrame(hr_rng, columns=['date_time'])
df.set_index(pd.DatetimeIndex(df['date_time']), inplace=True)
df['val0'] = np.random.randint(1, 10, df.shape[0])
Now, update the rows that are at or above the threshold:
threshold = 7
# start from a copy of the original values
df['test'] = df['val0']
# True for rows whose value is at or above the threshold
mask = df['val0'] >= threshold
df.loc[mask, 'test'] = 999
print(df.head())
date_time val0 test
date_time
2014-07-01 00:00:00 2014-07-01 00:00:00 7 999
2014-07-01 01:00:00 2014-07-01 01:00:00 4 4
2014-07-01 02:00:00 2014-07-01 02:00:00 8 999
2014-07-01 03:00:00 2014-07-01 03:00:00 5 5
2014-07-01 04:00:00 2014-07-01 04:00:00 7 999
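The mask above flags only the rows that themselves reach the threshold. A minimal sketch of widening it to neighbouring rows follows, assuming evenly spaced hourly rows (so one hour equals one row on each side) and a strict `>` comparison, which is what the first expected output in the question implies. The name `steps` is hypothetical, and the data is the question's first example.

```python
import pandas as pd

# Data from the question's first example, on an hourly index.
vals = [4, 8, 4, 7, 2, 4, 4, 9, 1, 9, 5]
idx = pd.date_range("2014-07-01", periods=len(vals), freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"val0": vals}, index=idx)

threshold = 7
steps = 1  # hypothetical: rows on each side that fall inside the window

# Rows strictly above the threshold (assumption: strict comparison).
hit = df["val0"] > threshold

# A centered rolling max over 2*steps+1 rows spreads each hit to its neighbours;
# min_periods=1 keeps the first and last rows from becoming NaN.
near = (
    hit.astype(int)
    .rolling(window=2 * steps + 1, center=True, min_periods=1)
    .max()
    .astype(bool)
)

# Keep the original value where no hit is nearby, otherwise write 999.
df["test"] = df["val0"].where(~near, 999)
print(df)
```

With these inputs the `test` column matches the first desired output in the question. The second desired output only comes out if 7 itself counts as a hit (`>=`), so the comparison operator is a choice the question leaves open.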
Can you be more specific about why row 3 is not set when the others are? Where is the threshold of 7 applied? Could you also post your non-vectorized solution?
Hmm, this is the wrong solution if you use the OP's data. It seems the OP needs something else.
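For timestamps that are not evenly spaced, counting rows no longer matches counting time. One possible sketch, not taken from the thread, compares every timestamp against every above-threshold timestamp with NumPy broadcasting. The helper `mask_near_hits` and the sample data are hypothetical, and the comparison is assumed to be strict (`>`). The pairwise matrix costs memory proportional to rows times hits, so this suits modest sizes only.

```python
import numpy as np
import pandas as pd


def mask_near_hits(series, threshold, window):
    """Return a boolean array: True where a row lies within `window`
    of any row whose value is strictly above `threshold`."""
    times = series.index.values                      # datetime64 array
    hit_times = times[(series > threshold).to_numpy()]
    # Pairwise absolute time gaps, shape (n_rows, n_hits).
    gaps = np.abs(times[:, None] - hit_times[None, :])
    return (gaps <= window.to_timedelta64()).any(axis=1)


# Hypothetical, irregularly spaced sample data.
idx = pd.to_datetime([
    "2014-07-01 00:00", "2014-07-01 00:20", "2014-07-01 01:10",
    "2014-07-01 02:30", "2014-07-01 03:45", "2014-07-01 06:00",
])
s = pd.Series([2, 9, 3, 4, 8, 1], index=idx)

near = mask_near_hits(s, threshold=7, window=pd.Timedelta(hours=1))
result = s.where(~near, 999)
print(result)
```

Here the hits are at 00:20 and 03:45, so 02:30 (75 minutes from the nearest hit) and 06:00 keep their original values while the other rows become 999.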