Python: creating a boolean column based on a timestamp index (pandas)

python pandas

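I have a time-series DataFrame, and I need to create a boolean column that indicates whether each row's time falls between certain hours of the day. I can get an array of the matching index positions, but how do I convert that into a boolean column? And is indexer_between_time the fastest way to do this calculation?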

aapl.csv

Datetime,Open,High,Low,Close,Volume,Dividends,Stock Splits
2020-10-26 04:15:00-04:00,113.7,113.78,113.5,113.5,0,0,0
2020-10-26 04:16:00-04:00,113.5,113.72,113.5,113.72,0,0,0
2020-10-26 04:17:00-04:00,113.69,113.79,113.65,113.74,0,0,0
2020-10-26 04:18:00-04:00,113.65,113.65,113.59,113.6,0,0,0
2020-10-26 04:19:00-04:00,113.55,113.59,113.4,113.54,0,0,0
2020-10-26 04:20:00-04:00,113.5,113.68,113.5,113.68,0,0,0
2020-10-26 04:21:00-04:00,113.71,113.71,113.6,113.6,0,0,0
2020-10-26 04:22:00-04:00,113.68,113.68,113.67,113.68,0,0,0


>>> import pandas as pd
>>> df = pd.read_csv("aapl.csv", index_col="Datetime", parse_dates=True)
>>> df.head()
                             Open    High     Low   Close  Volume  Dividends  Stock Splits
Datetime                                                                                  
2020-10-26 04:15:00-04:00  113.70  113.78  113.50  113.50       0          0             0
2020-10-26 04:16:00-04:00  113.50  113.72  113.50  113.72       0          0             0
2020-10-26 04:17:00-04:00  113.69  113.79  113.65  113.74       0          0             0
2020-10-26 04:18:00-04:00  113.65  113.65  113.59  113.60       0          0             0
2020-10-26 04:19:00-04:00  113.55  113.59  113.40  113.54       0          0             0


df.index.indexer_between_time('9:30','15:59')
array([ 264,  265,  266, ..., 4166, 4167, 4168])


df['rth'] = ...  # wanted: 1 if the row's position is in the array above, else 0

Below are some alternatives I have tried. The apply method does not work on the index, so I first have to copy the index to a column:

df['rth'] = df['bar_start'].apply(lambda dt: '0' if dt.time() < datetime.time(9,30) or dt.time() > datetime.time(15,59) else '1')

The loc method is slow:

for i in range(len(df)):
    dt = df.index[i]
    if dt.time() < datetime.time(9,30) or dt.time() > datetime.time(15,59):
        df.loc[dt, 'rth'] = 0  # .loc takes the row label first, then the column
    else:
        df.loc[dt, 'rth'] = 1

Using np.where, which is faster:

df['rth'] = np.where( (df['bar_start'] < datetime.time(9,30)) | ( df['bar_start'] > datetime.time(15,59)),False, True)

Please try this:

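df = df.reset_index()  # reset the index so that 'Datetime' becomes a column

# Format Datetime as an "HH:MM" string (%M is minutes; %m would be the month),
# then test whether it falls inside the range
df['status'] = pd.to_datetime(df['Datetime']).dt.strftime('%H:%M').between('09:30', '15:59').astype(int)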

You can use:

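df = df.reset_index()
# Drop the trailing UTC offset (e.g. "-04:00") and parse the local wall-clock time
d = pd.to_datetime(df['Datetime'].astype(str).str[:-6])
# Minutes since midnight, so that 09:30 to 15:59 is one contiguous range
minutes = d.dt.hour * 60 + d.dt.minute
m = minutes.between(9 * 60 + 30, 15 * 60 + 59)
df['rth'] = np.where(m, 1, 0)
df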

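There are a few problems with this. I first have to copy the index to a column with

df['rth'] = df.index

and Python then complains about comparing a timezone-aware datetime with a time.

You can reset the index before doing this.

Is there a way to return 1/0 integers instead of booleans?

df['status'] = df['status'].astype(int)

works; I was just wondering whether I could do it all in one line of code.

Please see my edit. Does that solve your problem?

Yes, I think that works. I thought I had tried it before commenting and it had no effect, but I must have made a typo.

For some reason 09:30 is being pushed to 10:00: the column only turns True from 10:00 to 15:59. (In a strftime format, `%m` is the month and `%M` is the minutes.) I also tried to edit 15:39 to 15:59 in your answer, but edits must be at least 6 characters.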
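The "09:30 becomes 10:00" symptom described in the comments is consistent with a format string that uses `%m` (month) where `%M` (minute) was intended. The sketch below illustrates the difference on three made-up October timestamps; the series `ts` and the variable names are assumptions for illustration, not part of the original data.

```python
import pandas as pd

# Hypothetical timestamps in October, the same month as the sample data.
ts = pd.Series(pd.to_datetime([
    "2020-10-26 09:45",
    "2020-10-26 10:05",
    "2020-10-26 15:50",
]))

# "%m" is the zero-padded month, so every string ends in ":10" in October.
wrong = ts.dt.strftime("%H:%m")
# "%M" is the zero-padded minute, which is what the range test needs.
right = ts.dt.strftime("%H:%M")

# Zero-padded "HH:MM" strings sort in time order, so a string range test works.
in_wrong = wrong.between("09:30", "15:59")
in_right = right.between("09:30", "15:59")

print(pd.DataFrame({"wrong": wrong, "in_wrong": in_wrong,
                    "right": right, "in_right": in_right}))
```

With `%m`, the 09:45 bar is formatted as "09:10" and falls outside the range, while every bar from hour 10 onward looks like "HH:10" and passes.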

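import datetime
df = df.reset_index()
# .dt.time gives the local wall-clock time, so plain time objects are compared
df['rth'] = pd.to_datetime(df['Datetime']).dt.time.between(datetime.time(9, 30), datetime.time(15, 59)).astype(int)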
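For the original question of turning the positions returned by indexer_between_time into a column, one possible approach is to scatter them into a boolean mask. The following is a minimal sketch under stated assumptions: the frame is a synthetic stand-in for aapl.csv (one bar per minute over a single day, fixed -04:00 offset), and the names `mask`, `minutes` and `rth_alt` are made up for illustration. It makes no claim about which variant is fastest on real data.

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical stand-in for aapl.csv: one bar per minute for a full day,
# using a fixed -04:00 offset like the sample rows.
tz = datetime.timezone(datetime.timedelta(hours=-4))
idx = pd.date_range("2020-10-26 00:00", periods=24 * 60, freq="min", tz=tz)
df = pd.DataFrame({"Close": np.arange(len(idx), dtype=float)}, index=idx)
df.index.name = "Datetime"

# Option 1: scatter the integer positions into a boolean mask.
pos = df.index.indexer_between_time("9:30", "15:59")  # both ends inclusive
mask = np.zeros(len(df), dtype=bool)
mask[pos] = True
df["rth"] = mask.astype(int)

# Option 2: work on the index directly, without reset_index.
# hour and minute are taken in the index's own (local) time zone.
minutes = df.index.hour * 60 + df.index.minute
df["rth_alt"] = ((minutes >= 9 * 60 + 30) & (minutes <= 15 * 60 + 59)).astype(int)

print(df["rth"].sum(), (df["rth"] == df["rth_alt"]).all())
```

Both options avoid comparing timezone-aware datetimes with plain time objects, which is the error mentioned in the comments above.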