Pandas: day-level groupby aggregation on a datetime column
I have a dataframe as shown below. It contains doctor appointment data.
Doctor Appointment Show
A 2020-01-18 12:00:00 Yes
A 2020-01-18 12:30:00 Yes
A 2020-01-18 13:00:00 No
A 2020-01-18 13:30:00 Yes
B 2020-01-18 12:00:00 Yes
B 2020-01-18 12:30:00 Yes
B 2020-01-18 13:00:00 No
B 2020-01-18 13:30:00 Yes
B 2020-01-18 16:00:00 No
B 2020-01-18 16:30:00 Yes
A 2020-01-19 12:00:00 Yes
A 2020-01-19 12:30:00 Yes
A 2020-01-19 13:00:00 No
A 2020-01-19 13:30:00 Yes
A 2020-01-19 14:00:00 Yes
A 2020-01-19 14:30:00 No
A 2020-01-19 16:00:00 No
A 2020-01-19 16:30:00 Yes
B 2020-01-19 12:00:00 Yes
B 2020-01-19 12:30:00 Yes
B 2020-01-19 13:00:00 No
B 2020-01-19 13:30:00 Yes
B 2020-01-19 14:00:00 No
B 2020-01-19 14:30:00 Yes
B 2020-01-19 15:00:00 No
B 2020-01-18 15:30:00 Yes
From the dataframe above, I would like to create a function in pandas that outputs the following.
I tried the following:
def Doctor_date_summary(doctor, date):
    Number of slots = df.groupby([doctor, date] ).sum()
Expected output:
Doctor_date_summary(Doctor, date)
If Doctor = A, date = 2020-01-19
Number of slots = 8
Number of show up = 5
show up percentage = 62.5
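If "Number of show up = 5" means the number of Yes values in the Show column for that doctor and date, you can first create a day column from the Appointment column, and then use boolean indexing: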
import numpy as np

def Doctor_date_summary(Doctor, date):
    # Each boolean mask marks matching rows; summing it counts them.
    number_of_show_up = np.sum((df['Doctor']==Doctor) & (df['day']==date) & (df['Show']=='Yes'))
    number_of_slots = np.sum((df['Doctor']==Doctor) & (df['day']==date))
    return number_of_show_up, number_of_slots, 100*number_of_show_up/number_of_slots
Finally:
number_of_show_up, number_of_slots, percentage = Doctor_date_summary('A', '2020-01-19')
print("Number of slots = {}".format(number_of_slots))
print("Number of show up = {}".format(number_of_show_up))
print("show up percentage = {:.1f}".format(percentage))
Number of slots = 8
Number of show up = 5
show up percentage = 62.5
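The function above relies on a day column in df, but the step that creates it is not shown. A minimal sketch of one way to build it follows, assuming the day is stored as a 'YYYY-MM-DD' string so that it can be compared with a string such as '2020-01-19'. The small dataframe is only a subset of the data in the question (doctor A on 2020-01-19), and strftime is an assumed choice, not necessarily what the answer used.

```python
import pandas as pd

# Illustrative subset of the question's data: doctor A on 2020-01-19.
df = pd.DataFrame({
    "Doctor": ["A"] * 8,
    "Appointment": [
        "2020-01-19 12:00:00", "2020-01-19 12:30:00",
        "2020-01-19 13:00:00", "2020-01-19 13:30:00",
        "2020-01-19 14:00:00", "2020-01-19 14:30:00",
        "2020-01-19 16:00:00", "2020-01-19 16:30:00",
    ],
    "Show": ["Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"],
})

# Assumption: keep the day as a 'YYYY-MM-DD' string so that
# df['day'] == '2020-01-19' is a plain string comparison.
df["day"] = pd.to_datetime(df["Appointment"]).dt.strftime("%Y-%m-%d")
```

With the column stored as strings, the date argument must use the same 'YYYY-MM-DD' format for the comparison to match.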
You can create each mask separately inside the function, chain them with & (bitwise AND), and count the matching rows with sum:
df['Appointment'] = pd.to_datetime(df['Appointment'])

def Doctor_date_summary(doctor, date):
    m1 = df['Doctor'] == doctor
    # normalize() sets the time to midnight, so only the day is compared
    m2 = df['Appointment'].dt.normalize() == date
    m3 = df['Show'] == 'Yes'
    show_up = (m1 & m2 & m3).sum()
    no = (m1 & m2).sum()  # number of slots
    return show_up, no

up, no = Doctor_date_summary('A', '2020-01-19')
Finally, use f-strings for the output:
print(f"Number of slots = {no}")
print(f"Number of show up = {up}")
print(f"show up percentage = {up/no*100}")
Number of slots = 8
Number of show up = 5
show up percentage = 62.5
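One question: do you need to count over all the data, as in my question, and then look up the result by date and doctor? Or do you only need to select some values and count them, as in the other answer?
I only need to select some values and count them, as in the other answer. Not everything, just a selection.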
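For the other case raised in the comment, counting over all the data first and then looking up one doctor and date, a groupby-based sketch is given below. It is not part of either answer. The data is a small subset of the question's data, and the names shown, summary, slots, show_up and percentage are chosen here for illustration.

```python
import pandas as pd

# Illustrative subset of the question's data: doctor A on two days.
df = pd.DataFrame({
    "Doctor": ["A"] * 12,
    "Appointment": [
        "2020-01-18 12:00:00", "2020-01-18 12:30:00",
        "2020-01-18 13:00:00", "2020-01-18 13:30:00",
        "2020-01-19 12:00:00", "2020-01-19 12:30:00",
        "2020-01-19 13:00:00", "2020-01-19 13:30:00",
        "2020-01-19 14:00:00", "2020-01-19 14:30:00",
        "2020-01-19 16:00:00", "2020-01-19 16:30:00",
    ],
    "Show": ["Yes", "Yes", "No", "Yes",
             "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"],
})
df["Appointment"] = pd.to_datetime(df["Appointment"])

# One row per (Doctor, day): size counts the slots, and the sum of a
# boolean column counts the True values, i.e. the show-ups.
summary = (
    df.assign(day=df["Appointment"].dt.normalize(),
              shown=df["Show"].eq("Yes"))
      .groupby(["Doctor", "day"])["shown"]
      .agg(slots="size", show_up="sum")
)
summary["percentage"] = 100 * summary["show_up"] / summary["slots"]

# Look up a single doctor and day in the precomputed table.
row = summary.loc[("A", pd.Timestamp("2020-01-19"))]
print(f"Number of slots = {int(row['slots'])}")
print(f"Number of show up = {int(row['show_up'])}")
print(f"show up percentage = {row['percentage']}")
```

This computes every doctor and day in one pass, which may be preferable when many lookups are needed. For a single lookup, the mask-based functions above do less work.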