Grouping by datetime in Python (pandas) when the timestamp falls between two dates
I am trying to do the following with Pandas (Python). I have a DataFrame with these columns: Building, Door_Color, Door_Time_Open, Door_Time_Close, Opening_Width. I want to group the data by date and time so that, for every second, I get the number of doors that are open and the sum of their opening widths. For example:
Data:
Building, Door_Color, Door_Time_Open, Door_Time_Close, Opening_Width
A , Red , 2000-01-01 00:00:00, 2000-01-01 00:00:05, 10
A , Red , 2000-01-01 00:00:02, 2000-01-01 00:00:04, 5
Result:
Date, Building, Door_Color, Door_Count, Sum_Opening_Width
2000-01-01 00:00:00, A, Red, 1 , 10
2000-01-01 00:00:01, A, Red, 1 , 10
2000-01-01 00:00:02, A, Red, 2 , 15
2000-01-01 00:00:03, A, Red, 2 , 15
2000-01-01 00:00:04, A, Red, 2 , 15
2000-01-01 00:00:05, A, Red, 1 , 10
2000-01-01 00:00:06, A, Red, 0 , 0
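I know how to do an ordinary group-by on several columns and aggregate different columns separately, but I don't know how to check whether the timestamp I am grouping on falls between the two dates stored in each row.
Any help would be greatly appreciated.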
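Edit 1: the data is fairly large, about 6 million rows.

If the data is not too large (that is, it does not span a very long period), you can do a cross merge: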
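import pandas as pd

# The range comparison in .query() needs real datetimes, not strings
df['Door_Time_Open'] = pd.to_datetime(df['Door_Time_Open'])
df['Door_Time_Close'] = pd.to_datetime(df['Door_Time_Close'])

# One row per second between the earliest opening and the latest closing
times = pd.DataFrame({'Date': pd.date_range(df['Door_Time_Open'].min(),
                                            df['Door_Time_Close'].max(), freq='s'),
                      'dummy': 1})

# Cross join through the constant 'dummy' key, then keep only the seconds
# that fall inside each door's open interval (both ends inclusive)
(df.assign(dummy=1)
   .merge(times, on='dummy')
   .query('Door_Time_Open <= Date <= Door_Time_Close')
   .groupby(['Date', 'Building', 'Door_Color'])
   ['Opening_Width'].agg(['count', 'sum'])
   .reset_index()
)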
Expand each row into the seconds it covers, then group:
import pandas as pd

def news(r):
    # Expand one door record into one row per second the door was open
    df1 = pd.DataFrame()
    df1['Date'] = pd.date_range(r['Door_Time_Open'], r['Door_Time_Close'], freq='s')
    for idx in ['Building', 'Door_Color', 'Opening_Width']:
        df1[idx] = r[idx]
    return df1

df['Door_Time_Open'] = pd.to_datetime(df['Door_Time_Open'])
df['Door_Time_Close'] = pd.to_datetime(df['Door_Time_Close'])

# Build one small frame per input row, then stack them and aggregate
df_list = []
for idx, row in df.iterrows():
    df_list.append(news(row))

data = pd.concat(df_list).groupby(['Date', 'Building', 'Door_Color'])['Opening_Width'].agg(['count', 'sum'])
print(data)
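32 GB of RAM is not enough to run this answer on less than 1 GB of data. It refuses to run and just gives me a "MemoryError". I do appreciate the effort, though!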
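
Both approaches above create one row per door per second, which is probably why memory runs out on 6 million rows. A possible alternative, not taken from the answers above, is to record only the changes: +1 (and +width) at the second a door opens, -1 (and -width) one second after it closes, and then take a running total per group. The sketch below assumes timestamps are whole seconds (they are floored to the second), that the closing second counts as open, as in the expected result in the question, and the function name per_second_counts and the output column names are only illustrative.

```python
import pandas as pd

def per_second_counts(df, keys=("Building", "Door_Color")):
    """Count open doors and sum their widths per second, per group (sketch)."""
    keys = list(keys)
    opens = pd.to_datetime(df["Door_Time_Open"]).dt.floor("s")
    closes = pd.to_datetime(df["Door_Time_Close"]).dt.floor("s")
    one_sec = pd.Timedelta(seconds=1)

    # Two events per door: it starts counting at the opening second and
    # stops counting one second after the closing second (close is inclusive).
    start = df[keys].assign(Date=opens, Door_Count=1,
                            Sum_Opening_Width=df["Opening_Width"])
    stop = df[keys].assign(Date=closes + one_sec, Door_Count=-1,
                           Sum_Opening_Width=-df["Opening_Width"])
    events = pd.concat([start, stop], ignore_index=True)

    # Net change per group and second, then a running total inside each group.
    value_cols = ["Door_Count", "Sum_Opening_Width"]
    deltas = events.groupby(keys + ["Date"], sort=True)[value_cols].sum()
    running = deltas.groupby(level=keys).cumsum()

    # Seconds without any event keep the previous totals: reindex each group
    # onto a full per-second range and forward-fill.
    def fill_seconds(g):
        g = g.droplevel(keys)
        full = pd.date_range(g.index.min(), g.index.max(), freq="s", name="Date")
        return g.reindex(full).ffill()

    out = running.groupby(level=keys, group_keys=True).apply(fill_seconds)
    out["Door_Count"] = out["Door_Count"].astype("int64")
    return out.reset_index()

# The two sample rows from the question
sample = pd.DataFrame({
    "Building": ["A", "A"],
    "Door_Color": ["Red", "Red"],
    "Door_Time_Open": ["2000-01-01 00:00:00", "2000-01-01 00:00:02"],
    "Door_Time_Close": ["2000-01-01 00:00:05", "2000-01-01 00:00:04"],
    "Opening_Width": [10, 5],
})
result = per_second_counts(sample)
print(result)
```

The intermediate frames hold two rows per door rather than one row per door per second. The final fill step still produces one row per second per group, so if that is too large as well, the running totals before the fill step already describe every change point.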