在Python中的for循环中追加多个时间序列数据_Python_Pandas_Numpy_For Loop_Time Series

在Python中的for循环中追加多个时间序列数据

python pandas numpy for-loop

在Python中的for循环中追加多个时间序列数据,python,pandas,numpy,for-loop,time-series,Python,Pandas,Numpy,For Loop,Time Series,我想在下面的时间序列数据框td1中循环查找一些活动的起点和终点活动的定义： Type列由medium和low值组成（您可以将medium和low视为两个独立的时间序列）。对于相同的X，如果a将1转换为Type的任一值（例如，对于X==18，a变为1，而Type==medium或a变为1而Type==low），则标志着活动的开始，我想在此时间戳处分别记下Id和Timestamp，作为Start\u Id和StartTime 一旦活动开始，它就处于持续状态。活动正在进行时，如果a为两个类型值（即中

我想在下面的时间序列数据框

td1

中循环查找一些活动的起点和终点

活动的定义：

Type

列由

medium

和

low

值组成（您可以将

medium

和

low

视为两个独立的时间序列）。对于相同的

，如果

将

转换为

Type

的任一值（例如，对于

X==18

，

变为

，而

Type==medium

或

变为

而

Type==low

），则标志着活动的开始，我想在此时间戳处分别记下

Id

和

Timestamp

，作为

Start\u Id

和

StartTime

一旦活动开始，它就处于持续状态。活动正在进行时，如果

为两个
类型值（即中值和低值）旋转0 ，则表示活动结束（例如，对于X==18 ，a 变为0 ，而Type==medium 和a 变为0 ，而Type==low ，跟随时间序列）。我想把Id 和Timestamp 分别记为End\u Id 和EndTime 最后，在以下情况下，收集每个活动期间的所有b 值：类型==中等；以及 a==1 放入名为list\u容器的列表中 td1: Timestamp X Y a b Type Id 0 2000-10-26 10:08:27.060 18 14 0.0 24.5 medium 18 1 2000-10-26 10:39:24.310 18 13 1.0 24.0 low 18 Start 2 2000-10-26 11:50:48.190 18 14 1.0 23.5 medium 18 ---- collect `b` value in `list_container` 1 3 2000-10-26 17:18:07.610 18 14 1.0 23.5 medium 18 ---- collect `b` value in `list_container` 1 4 2000-10-26 17:18:09.610 18 14 0.0 23.5 medium 18 5 2000-10-26 17:29:10.610 18 14 0.0 26.5 medium 18 6 2000-10-26 17:29:10.770 18 14 1.0 26.5 medium 18 ---- collect `b` value in `list_container` 1 7 2000-10-26 17:29:12.610 18 14 1.0 53.5 medium 18 ---- collect `b` value in `list_container` 1 8 2000-10-26 17:29:14.610 18 14 1.0 62.0 medium 18 ---- collect `b` value in `list_container` 1 9 2000-10-26 17:29:14.770 18 13 1.0 24.0 low 18 10 2000-10-26 17:29:16.610 18 14 1.0 64.5 medium 18 ---- collect `b` value in `list_container` 1 11 2000-10-26 17:29:18.770 18 14 0.0 64.5 medium 18 12 2000-10-26 17:29:18.770 18 13 0.0 24.0 low 18 End 13 2000-10-26 17:29:28.770 18 14 0.0 63.5 medium 18 14 2000-10-26 17:29:34.770 19 14 0.0 62.0 medium 19 15 2000-10-26 17:29:40.770 19 14 1.0 61.0 medium 19 Start 16 2000-10-26 17:29:46.770 19 14 1.0 60.0 medium 19 ---- collect `b` value in `list_container` 2 17 2000-10-26 17:32:01.180 19 13 1.0 25.0 low 19 18 2000-10-26 17:32:01.180 19 14 0.0 51.5 low 19 19 2000-10-26 17:32:35.180 19 13 0.0 50.0 medium 19 End 可复制示例： td1 = pd.DataFrame({'Timestamp': {0: Timestamp('2000-10-26 10:08:27.060000'), 1: Timestamp('2000-10-26 10:39:24.310000'), 2: Timestamp('2000-10-26 11:50:48.190000'), 3: Timestamp('2000-10-26 17:18:07.610000'), 4: Timestamp('2000-10-26 17:18:09.610000'), 5: Timestamp('2000-10-26 17:29:10.610000'), 6: Timestamp('2000-10-26 17:29:10.770000'), 7: Timestamp('2000-10-26 17:29:12.610000'), 8: Timestamp('2000-10-26 17:29:14.610000'), 9: Timestamp('2000-10-26 17:29:14.770000'), 10: Timestamp('2000-10-26 17:29:16.610000'), 11: Timestamp('2000-10-26 17:29:18.770000'), 12: Timestamp('2000-10-26 17:29:18.770000'), 13: Timestamp('2000-10-26 17:29:28.770000'), 14: Timestamp('2000-10-26 17:29:34.770000'), 15: Timestamp('2000-10-26 17:29:40.770000'), 16: Timestamp('2000-10-26 17:29:46.770000'), 17: Timestamp('2000-10-26 17:32:01.180000'), 18: Timestamp('2000-10-26 17:32:01.180000'), 19: Timestamp('2000-10-26 17:32:35.180000')}, 'X': {0: 18, 1: 18, 2: 18, 3: 18, 4: 18, 5: 18, 6: 18, 7: 18, 8: 18, 9: 18, 10: 18, 11: 18, 12: 18, 13: 18, 14: 19, 15: 19, 16: 19, 17: 19, 18: 19, 19: 19}, 'Y': {0: 14, 1: 13, 2: 14, 3: 14, 4: 14, 5: 14, 6: 14, 7: 14, 8: 14, 9: 13, 10: 14, 11: 14, 12: 13, 13: 14, 14: 14, 15: 14, 16: 14, 17: 13, 18: 14, 19: 13}, 'a': {0: 0.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 0.0, 5: 0.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 0.0, 12: 0.0, 13: 0.0, 14: 0.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 0.0, 19: 0.0}, 'b': {0: 24.5, 1: 24.0, 2: 23.5, 3: 23.5, 4: 23.5, 5: 26.5, 6: 26.5, 7: 53.5, 8: 62.0, 9: 24.0, 10: 64.5, 11: 64.5, 12: 24.0, 13: 63.5, 14: 62.0, 15: 61.0, 16: 60.0, 17: 25.0, 18: 51.5, 19: 50.0}, 'Type': {0: 'medium', 1: 'low', 2: 'medium', 3: 'medium', 4: 'medium', 5: 'medium', 6: 'medium', 7: 'medium', 8: 'medium', 9: 'low', 10: 'medium', 11: 'medium', 12: 'low', 13: 'medium', 14: 'medium', 15: 'medium', 16: 'medium', 17: 'low', 18: 'low', 19: 'medium'}, 'Id': {0: 18, 1: 18, 2: 18, 3: 18, 4: 18, 5: 18, 6: 18, 7: 18, 8: 18, 9: 18, 10: 18, 11: 18, 12: 18, 13: 18, 14: 19, 15: 19, 16: 19, 17: 19, 18: 19, 19: 19}}) td1 预期产出： Start_Id StartTime End_Id EndTime list_container 18 2000-10-26 10:39:24.310 18 2000-10-26 17:29:18.770 [23.5, 23.5, 26.5, 53.5, 62.0, 64.5] 19 2000-10-26 17:29:40.770 19 2000-10-26 17:32:35.180 [60.0] 通过分析每次迭代前后状态的可能组合，我尝试了以下for循环： def combined_func(td1): td1['Timestamp'] = pd.to_datetime(td1['Timestamp']) td1 = td1.sort_values(by=['Id','Timestamp']) td1 = td1.reset_index(drop=True) low_on = 0 # Flag to indicate status of low medium_on = 0 # Flag to indicate status of medium my_list = [] container_list = [] data = [] time_start = None start_Id = None time_end = None end_Id = None for i in range(1, len(td1.index)-1): if (td1.loc[i, 'Id'] == td1.loc[i-1, 'Id']) & (td1.loc[i, 'Id'] == td1.loc[i+1, 'Id']): if ((not low_on) & (not medium_on)): if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')): b13 = td1.loc[i, 'b'] my_list.append(b13) medium_on = 1 time_start = td1.loc[i, 'Timestamp'] start_Id = td1.loc[i, 'Id'] print(f"This is start case 1 (start with medium), start_Id: {start_Id}, time_start: {time_start}") elif ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'low')): time_start = td1.loc[i, 'Timestamp'] start_Id = td1.loc[i, 'Id'] print(f'This is start case 2 (start with low), start_Id: {start_Id}, time_start: {time_start}') low_on = 1 else: continue elif ((not low_on) & (medium_on)): if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')): b5 = td1.loc[i, 'b'] my_list.append(b5) if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'low')): low_on = 1 if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'medium')): b7 = td1.loc[i, 'b'] my_list.append(b7) list_container = my_list my_list = [] medium_on = 0 time_end = td1.loc[i, 'Timestamp'] end_Id = td1.loc[i, 'Id'] print(f"This is end case 1 (end with medium), end_Fid: {end_Id}, time_end: {time_end}, container_list is {container_list}") else: continue elif ((low_on) & (not medium_on)): if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')): b11 = td1.loc[i, 'b'] my_list.append(b11) medium_on = 1 if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'low')): time_end = td1.loc[i, 'Timestamp'] end_Id = td1.loc[i, 'Id'] low_on = 0 print(f"This is end case 2 (end with low), end_Id: {end_Id}, time_end: {time_end}, container_list is {my_list}") else: continue elif ((low_on) & (medium_on)): if (td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium'): b1 = td1.loc[i, 'b'] my_list.append(b1) if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'low')): low_on = 0 if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'medium')): b3 = td1.loc[i, 'b'] my_list.append(b3) list_container = my_list my_list = [] medium_on = 0 else: continue data.append([start_Id, time_start, end_Id, time_end, list_container]) else: continue else: continue data_table1 = pd.DataFrame(data, columns= ["Start_Id", "StartTime", "End_Id", "EndTime", "list_container"]) return data_table1 output = combined_func(td1) output 它返回： This is start case 2 (start with low), start_Id: 18, time_start: 2000-10-26 10:39:24.310000 This is end case 2 (end with low), end_Id: 18, time_end: 2000-10-26 17:29:18.770000, container_list is [] This is start case 1 (start with medium), start_Id: 19, time_start: 2000-10-26 17:29:40.770000 Start_Id StartTime End_Id EndTime list_container 0 18 2000-10-26 10:39:24.310 None None [23.5, 23.5, 23.5] 1 18 2000-10-26 10:39:24.310 None None [26.5, 53.5, 62.0, 64.5, 64.5] 不知何故，End\u Id 和EndTime 丢失，列表容器值也关闭。我不确定哪些步骤出错。非常感谢您的任何建议。我找不到比按X 分组并根据您的描述为每个返回值创建特定逻辑更好的方法开 def times(df): start_time = df.loc[df.a == 1, 'Timestamp'].iloc[0] end_time = pd.NaT if(df.loc[df.a == 0, 'Type'].nunique() == 2): end_time = ( df.loc[df.a == 0, ['Timestamp', 'Type']] .drop_duplicates('Type', keep='last') .Timestamp .iloc[-1] ) if (pd.notnull([start_time, end_time]).all()): temp = df[(df.Timestamp > start_time) & (df.Timestamp < end_time)] start_id, end_id = temp.Id.iloc[[0, -1]].to_list() list_container = temp[temp.a == 1].b.to_list() return pd.Series({ 'Start_Id': start_id, 'StartTime': start_time, 'End_Id': end_id, 'EndTime': end_time, 'list_container': list_container }) results = td1.groupby('X').apply(times) results # Start_Id StartTime End_Id EndTime list_container # X # 18 18 2000-10-26 10:39:24.310 18 2000-10-26 17:29:28.770 [23.5, 23.5, 26.5, 53.5, 62.0, 24.0, 64.5] # 19 19 2000-10-26 17:29:40.770 19 2000-10-26 17:32:35.180 [60.0, 25.0] def时间（df）：开始时间=df.loc[df.a==1，'时间戳'].iloc[0] 结束时间=pd.NaT 如果（df.loc[df.a==0，'键入'].nunique（）==2）：结束时间=( df.loc[df.a==0，['Timestamp'，'Type']] .删除重复项（'Type'，keep='last'） .时间戳 .iloc[-1] ) if（pd.notnull（[开始时间，结束时间]）.all（）： temp=df[（df.Timestamp>开始时间）和（df.Timestamp<结束时间）] 开始\u id，结束\u id=temp.id.iloc[[0，-1]]。到\u列表（） list_container=temp[temp.a==1].b.to_list（）返回pd系列({ “开始Id”：开始Id， “开始时间”：开始时间， “结束Id”：结束Id， “结束时间”：结束时间， “列表容器”：列表容器 }) 结果=td1.groupby（'X'）。应用（次）结果 #开始\u Id开始时间结束\u Id结束时间列表\u容器 #X # 18 18 2000-10-26 10:39:24.310 18 2000-10-26 17:29:28.770 [23.5, 23.5, 26.5, 53.5, 62.0, 24.0, 64.5] # 19 19 2000-10-26 17:29:40.770 19 2000-10-26 17:32:35.180 [60.0, 25.0]