Appending multiple time-series data in a for loop in Python

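I want to loop through the time-series DataFrame `td1` below and find the start and end points of some activities.

Definition of an activity: the `Type` column consists of `medium` and `low` values (you can think of `medium` and `low` as two separate time series). For the same `X`, if `a` turns `1` for either value of `Type` (e.g., for `X==18`, `a` turns `1` while `Type==medium`, or `a` turns `1` while `Type==low`), that marks the start of an activity, and I want to record `Id` and `Timestamp` at that timestamp as `Start_Id` and `StartTime`, respectively.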

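Once an activity has started, it stays ongoing. While it is ongoing, if `a` turns `0` for both `Type` values (i.e., `medium` and `low`), that marks the end of the activity (e.g., for `X==18`, `a` turns `0` while `Type==medium` and `a` turns `0` while `Type==low`, following the time series). I want to record `Id` and `Timestamp` at that point as `End_Id` and `EndTime`, respectively.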
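
To make the start/end rule concrete, here is a minimal sketch on a hypothetical toy frame (`toy` is not `td1`, and its values are made up for illustration). It assumes the rows are already in chronological order within each `X` and that a `Type` that has not appeared yet counts as `0`. The idea is to carry forward the last known `a` for each `Type` and call a row "active" when either one is `1`.

```python
import pandas as pd

# Hypothetical toy data: one X group, rows already in time order.
toy = pd.DataFrame({
    'X':    [1, 1, 1, 1, 1, 1],
    'Type': ['medium', 'low', 'medium', 'medium', 'low', 'medium'],
    'a':    [0, 1, 1, 0, 0, 0],
})

# Last known `a` per Type at every row: keep `a` only on rows of that Type,
# forward-fill within each X group, and treat "not seen yet" as 0.
state = pd.DataFrame({t: toy['a'].where(toy['Type'] == t) for t in ('medium', 'low')})
state = state.groupby(toy['X']).ffill().fillna(0)

# Active while at least one of the two series is 1.
active = state.eq(1).any(axis=1)
prev_active = active.groupby(toy['X']).shift(fill_value=False).astype(bool)

toy['start'] = active & ~prev_active   # inactive -> active
toy['end'] = ~active & prev_active     # active -> inactive (both series at 0)
print(toy)
```

In this toy frame the activity starts on the second row (`low` turns `1`) and ends on the fifth row, where `low` turns `0` after `medium` has already turned `0`.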

Finally, collect all `b` values during each activity into a list named `list_container` when:

  • `Type==medium`; and
  • `a==1`

`td1`:

        Timestamp                X  Y   a   b       Type    Id
    0   2000-10-26 10:08:27.060 18  14  0.0 24.5    medium  18  
    1   2000-10-26 10:39:24.310 18  13  1.0 24.0    low     18  Start
    2   2000-10-26 11:50:48.190 18  14  1.0 23.5    medium  18  ---- collect `b` value in `list_container` 1
    3   2000-10-26 17:18:07.610 18  14  1.0 23.5    medium  18  ---- collect `b` value in `list_container` 1
    4   2000-10-26 17:18:09.610 18  14  0.0 23.5    medium  18
    5   2000-10-26 17:29:10.610 18  14  0.0 26.5    medium  18
    6   2000-10-26 17:29:10.770 18  14  1.0 26.5    medium  18  ---- collect `b` value in `list_container` 1
    7   2000-10-26 17:29:12.610 18  14  1.0 53.5    medium  18  ---- collect `b` value in `list_container` 1
    8   2000-10-26 17:29:14.610 18  14  1.0 62.0    medium  18  ---- collect `b` value in `list_container` 1
    9   2000-10-26 17:29:14.770 18  13  1.0 24.0    low     18
    10  2000-10-26 17:29:16.610 18  14  1.0 64.5    medium  18  ---- collect `b` value in `list_container` 1
    11  2000-10-26 17:29:18.770 18  14  0.0 64.5    medium  18
    12  2000-10-26 17:29:18.770 18  13  0.0 24.0    low     18  End
    13  2000-10-26 17:29:28.770 18  14  0.0 63.5    medium  18
    14  2000-10-26 17:29:34.770 19  14  0.0 62.0    medium  19
    15  2000-10-26 17:29:40.770 19  14  1.0 61.0    medium  19  Start
    16  2000-10-26 17:29:46.770 19  14  1.0 60.0    medium  19  ---- collect `b` value in `list_container` 2
    17  2000-10-26 17:32:01.180 19  13  1.0 25.0    low     19
    18  2000-10-26 17:32:01.180 19  14  0.0 51.5    low     19
    19  2000-10-26 17:32:35.180 19  13  0.0 50.0    medium  19  End
    
Reproducible example:

    import pandas as pd
    from pandas import Timestamp

    td1 = pd.DataFrame({'Timestamp': {0: Timestamp('2000-10-26 10:08:27.060000'),
      1: Timestamp('2000-10-26 10:39:24.310000'),
      2: Timestamp('2000-10-26 11:50:48.190000'),
      3: Timestamp('2000-10-26 17:18:07.610000'),
      4: Timestamp('2000-10-26 17:18:09.610000'),
      5: Timestamp('2000-10-26 17:29:10.610000'),
      6: Timestamp('2000-10-26 17:29:10.770000'),
      7: Timestamp('2000-10-26 17:29:12.610000'),
      8: Timestamp('2000-10-26 17:29:14.610000'),
      9: Timestamp('2000-10-26 17:29:14.770000'),
      10: Timestamp('2000-10-26 17:29:16.610000'),
      11: Timestamp('2000-10-26 17:29:18.770000'),
      12: Timestamp('2000-10-26 17:29:18.770000'),
      13: Timestamp('2000-10-26 17:29:28.770000'),
      14: Timestamp('2000-10-26 17:29:34.770000'),
      15: Timestamp('2000-10-26 17:29:40.770000'),
      16: Timestamp('2000-10-26 17:29:46.770000'),
      17: Timestamp('2000-10-26 17:32:01.180000'),
      18: Timestamp('2000-10-26 17:32:01.180000'),
      19: Timestamp('2000-10-26 17:32:35.180000')},
     'X': {0: 18,
      1: 18,
      2: 18,
      3: 18,
      4: 18,
      5: 18,
      6: 18,
      7: 18,
      8: 18,
      9: 18,
      10: 18,
      11: 18,
      12: 18,
      13: 18,
      14: 19,
      15: 19,
      16: 19,
      17: 19,
      18: 19,
      19: 19},
     'Y': {0: 14,
      1: 13,
      2: 14,
      3: 14,
      4: 14,
      5: 14,
      6: 14,
      7: 14,
      8: 14,
      9: 13,
      10: 14,
      11: 14,
      12: 13,
      13: 14,
      14: 14,
      15: 14,
      16: 14,
      17: 13,
      18: 14,
      19: 13},
     'a': {0: 0.0,
      1: 1.0,
      2: 1.0,
      3: 1.0,
      4: 0.0,
      5: 0.0,
      6: 1.0,
      7: 1.0,
      8: 1.0,
      9: 1.0,
      10: 1.0,
      11: 0.0,
      12: 0.0,
      13: 0.0,
      14: 0.0,
      15: 1.0,
      16: 1.0,
      17: 1.0,
      18: 0.0,
      19: 0.0},
     'b': {0: 24.5,
      1: 24.0,
      2: 23.5,
      3: 23.5,
      4: 23.5,
      5: 26.5,
      6: 26.5,
      7: 53.5,
      8: 62.0,
      9: 24.0,
      10: 64.5,
      11: 64.5,
      12: 24.0,
      13: 63.5,
      14: 62.0,
      15: 61.0,
      16: 60.0,
      17: 25.0,
      18: 51.5,
      19: 50.0},
     'Type': {0: 'medium',
      1: 'low',
      2: 'medium',
      3: 'medium',
      4: 'medium',
      5: 'medium',
      6: 'medium',
      7: 'medium',
      8: 'medium',
      9: 'low',
      10: 'medium',
      11: 'medium',
      12: 'low',
      13: 'medium',
      14: 'medium',
      15: 'medium',
      16: 'medium',
      17: 'low',
      18: 'low',
      19: 'medium'},
     'Id': {0: 18,
      1: 18,
      2: 18,
      3: 18,
      4: 18,
      5: 18,
      6: 18,
      7: 18,
      8: 18,
      9: 18,
      10: 18,
      11: 18,
      12: 18,
      13: 18,
      14: 19,
      15: 19,
      16: 19,
      17: 19,
      18: 19,
      19: 19}})
    
    td1
    
Expected output:

    Start_Id  StartTime                End_Id      EndTime                  list_container
    18        2000-10-26 10:39:24.310  18          2000-10-26 17:29:18.770  [23.5, 23.5, 26.5, 53.5, 62.0, 64.5]
    19        2000-10-26 17:29:40.770  19          2000-10-26 17:32:35.180  [60.0]
    
By analysing the possible combinations of states before and after each iteration, I tried the following for loop:

    def combined_func(td1):
    
        td1['Timestamp'] = pd.to_datetime(td1['Timestamp'])
        td1 = td1.sort_values(by=['Id','Timestamp'])
        td1 = td1.reset_index(drop=True)
    
        low_on = 0     # Flag to indicate status of low
        medium_on = 0  # Flag to indicate status of medium
        my_list = []
        container_list = []
        data = []
        time_start = None
        start_Id = None
        time_end = None  
        end_Id = None  
    
        for i in range(1, len(td1.index)-1):
    
            if  (td1.loc[i, 'Id'] == td1.loc[i-1, 'Id']) & (td1.loc[i, 'Id'] == td1.loc[i+1, 'Id']): 
                
                if ((not low_on) & (not medium_on)):
                    if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')):
                        b13 = td1.loc[i, 'b']
                        my_list.append(b13)
                        medium_on = 1
    
                        time_start = td1.loc[i, 'Timestamp']
                        start_Id =  td1.loc[i, 'Id']
                        print(f"This is start case 1 (start with medium), start_Id: {start_Id}, time_start: {time_start}")
    
                    elif ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'low')):
    
                        time_start = td1.loc[i, 'Timestamp']
                        start_Id =  td1.loc[i, 'Id']
    
                        print(f'This is start case 2 (start with low), start_Id: {start_Id}, time_start: {time_start}')
                        low_on = 1
    
                    else:
                        continue
    
                elif ((not low_on) & (medium_on)):     
                    if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')):
                        b5 = td1.loc[i, 'b']
                        my_list.append(b5)
    
                    if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'low')):
                        low_on = 1
    
                    if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'medium')):
                        b7 = td1.loc[i, 'b']
    
                        my_list.append(b7)
                        list_container = my_list
                        my_list = []
                        medium_on = 0
    
                        time_end = td1.loc[i, 'Timestamp']
                        end_Id =  td1.loc[i, 'Id']
                        
                        print(f"This is end case 1 (end with medium), end_Fid: {end_Id}, time_end: {time_end}, container_list is {container_list}")
    
                    else:
                        continue
    
                elif ((low_on) & (not medium_on)):
                    if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')):
                        b11 = td1.loc[i, 'b']
                        my_list.append(b11)
                        medium_on = 1
            
                    if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'low')):
                        
                        time_end = td1.loc[i, 'Timestamp']
                        end_Id =  td1.loc[i, 'Id']
                        
                        low_on = 0
                        print(f"This is end case 2 (end with low), end_Id: {end_Id}, time_end: {time_end}, container_list is {my_list}")
    
                    else:
                        continue
    
                elif ((low_on) & (medium_on)):
    
                    if (td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium'):
                        b1 = td1.loc[i, 'b']
                        my_list.append(b1)
    
                    if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'low')):
                        low_on = 0
    
                    if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'medium')):
                        b3 = td1.loc[i, 'b']
                        my_list.append(b3)
                        list_container = my_list
                        my_list = []
                        medium_on = 0
    
                    else:
                        continue
    
                    data.append([start_Id, time_start, end_Id, time_end,  list_container])
    
                else:
                    continue
            else:
                continue
    
        data_table1 = pd.DataFrame(data, columns= ["Start_Id", "StartTime",  "End_Id", "EndTime", "list_container"])
        
        return data_table1
    
    output = combined_func(td1)
    output
    
It returns:

    This is start case 2 (start with low), start_Id: 18, time_start: 2000-10-26 10:39:24.310000
    This is end case 2 (end with low), end_Id: 18, time_end: 2000-10-26 17:29:18.770000, container_list is []
    This is start case 1 (start with medium), start_Id: 19, time_start: 2000-10-26 17:29:40.770000
    
        Start_Id    StartTime                   End_Id  EndTime list_container
    0   18          2000-10-26 10:39:24.310     None    None    [23.5, 23.5, 23.5]
    1   18          2000-10-26 10:39:24.310     None    None    [26.5, 53.5, 62.0, 64.5, 64.5]
    

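Somehow `End_Id` and `EndTime` are missing, and the `list_container` values are also off. I am not sure which steps went wrong. Any suggestions would be greatly appreciated.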

I could not find a better way than grouping by `X` and, following your description, writing specific logic for each returned value:

    def times(df):
        
        start_time = df.loc[df.a == 1, 'Timestamp'].iloc[0]
        end_time = pd.NaT
        
        if(df.loc[df.a == 0, 'Type'].nunique() == 2):
            end_time = (
                df.loc[df.a == 0, ['Timestamp', 'Type']]
                .drop_duplicates('Type', keep='last')
                .Timestamp
                .iloc[-1]
            )
            
        if (pd.notnull([start_time, end_time]).all()):
            temp = df[(df.Timestamp > start_time) & (df.Timestamp < end_time)]
            start_id, end_id = temp.Id.iloc[[0, -1]].to_list()
            list_container = temp[temp.a == 1].b.to_list()
            
            return pd.Series({
                'Start_Id': start_id,
                'StartTime': start_time,
                'End_Id': end_id,
                'EndTime': end_time,
                'list_container': list_container
            })
        
    results = td1.groupby('X').apply(times)
    results
    
    #       Start_Id  StartTime                 End_Id  EndTime                 list_container
    # X                 
    # 18    18        2000-10-26 10:39:24.310   18      2000-10-26 17:29:28.770 [23.5, 23.5, 26.5, 53.5, 62.0, 24.0, 64.5]
    # 19    19        2000-10-26 17:29:40.770   19      2000-10-26 17:32:35.180 [60.0, 25.0]
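
Note that the output above does not quite match the expected output in the question: `EndTime` for `X==18` is `17:29:28.770` rather than `17:29:18.770`, and the lists include `b` values from `low` rows (`24.0`, `25.0`). A loop-based sketch that remembers the last `a` seen for each `Type` may get closer. It rests on a few assumptions that are not stated in the question: rows are already in chronological order within each `X`, a `Type` not seen yet counts as `0`, and the row that starts an activity is not itself collected (inferred from the expected output, where `61.0` is absent). The name `find_activities` is hypothetical, and the data is a compact rebuild of `td1` without the unused `Y` column.

```python
import pandas as pd

# Compact rebuild of the sample td1 (Y omitted because the logic does not use it).
times = ['10:08:27.060', '10:39:24.310', '11:50:48.190', '17:18:07.610',
         '17:18:09.610', '17:29:10.610', '17:29:10.770', '17:29:12.610',
         '17:29:14.610', '17:29:14.770', '17:29:16.610', '17:29:18.770',
         '17:29:18.770', '17:29:28.770', '17:29:34.770', '17:29:40.770',
         '17:29:46.770', '17:32:01.180', '17:32:01.180', '17:32:35.180']
ids = [18] * 14 + [19] * 6
td1 = pd.DataFrame({
    'Timestamp': pd.to_datetime(['2000-10-26 ' + t for t in times]),
    'X': ids,
    'a': [0., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0.,
          0., 1., 1., 1., 0., 0.],
    'b': [24.5, 24.0, 23.5, 23.5, 23.5, 26.5, 26.5, 53.5, 62.0, 24.0,
          64.5, 64.5, 24.0, 63.5, 62.0, 61.0, 60.0, 25.0, 51.5, 50.0],
    'Type': [{'m': 'medium', 'l': 'low'}[c] for c in 'mlmmmmmmmlmmlmmmmllm'],
    'Id': ids,
})


def find_activities(df):
    """Return one row per completed activity (hypothetical helper)."""
    rows = []
    for _, grp in df.groupby('X', sort=False):
        state = {'medium': 0, 'low': 0}  # last seen `a` for each Type
        start = None                     # (Id, Timestamp) of the open activity
        collected = []
        for rec in grp.itertuples(index=False):
            was_active = any(state.values())
            state[rec.Type] = int(rec.a)
            is_active = any(state.values())
            if not was_active and is_active:
                # Either series turned 1: the activity starts on this row.
                start = (rec.Id, rec.Timestamp)
                collected = []
            elif was_active and not is_active:
                # Both series are now 0: the activity ends on this row.
                rows.append([start[0], start[1], rec.Id, rec.Timestamp, collected])
                start = None
            elif is_active and rec.Type == 'medium' and rec.a == 1:
                # Ongoing activity: collect b from medium rows with a == 1.
                collected.append(rec.b)
    return pd.DataFrame(
        rows,
        columns=['Start_Id', 'StartTime', 'End_Id', 'EndTime', 'list_container'],
    )


result = find_activities(td1)
print(result)
```

On the sample data this yields two activities whose start and end times and collected lists agree with the expected output in the question. An activity that is still open when its `X` group ends is silently dropped here; whether that is the desired behaviour is left open.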
    