Append multiple time series data in a for loop in Python
I would like to loop through the time series dataframe `td1` below to find the start and end points of some activities.

Definition of an activity:

The `Type` column consists of `medium` and `low` values (you can think of `medium` and `low` as two independent time series). For the same `X`, if `a` turns to `1` for either `Type` value (e.g., for `X == 18`, `a` becomes `1` while `Type == medium`, or `a` becomes `1` while `Type == low`), that marks the start of an activity, and I want to record the `Id` and `Timestamp` at this point as `Start_Id` and `StartTime`, respectively.

Once an activity has started, it stays ongoing. While it is ongoing, if `a` turns to `0` for both `Type` values (i.e., `medium` and `low`), that marks the end of the activity (e.g., for `X == 18`, `a` becomes `0` while `Type == medium` and `a` becomes `0` while `Type == low`, following the time series). I want to record the `Id` and `Timestamp` at this point as `End_Id` and `EndTime`, respectively.

Finally, during each activity, I want to collect all `b` values where `Type == medium` and `a == 1` into a list called `list_container`.
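One way to read this definition: within each `X`, keep the latest `a` value seen so far for each `Type`, and treat the activity as ongoing whenever either of those two states is 1. A minimal sketch of that idea, assuming the rows are already in time order within each `X` and that a `Type` with no earlier row counts as off (0). The frame holds only the `X == 19` rows of `td1` (rows 14-19 of the table below), and the column names `medium_on`, `low_on` and `active` are illustrative, not part of the question:

```python
import pandas as pd

# Only the X == 19 rows of td1 (rows 14-19), already in time order.
df = pd.DataFrame({
    "X":    [19, 19, 19, 19, 19, 19],
    "a":    [0.0, 1.0, 1.0, 1.0, 0.0, 0.0],
    "Type": ["medium", "medium", "medium", "low", "low", "medium"],
})

for t in ("medium", "low"):
    # Keep `a` only on rows of this Type, then carry the last value forward
    # within each X; before the first row of this Type the state is 0 (off).
    own = df["a"].where(df["Type"] == t)
    df[f"{t}_on"] = own.groupby(df["X"]).ffill().fillna(0).astype(int)

# The activity is ongoing while either Type is on.
df["active"] = (df["medium_on"] | df["low_on"]).astype(int)
print(df)
```

Here `active` switches on at the second row (row 15 of `td1`, 17:29:40.770) and off again at the last row (row 19, 17:32:35.180), which are the `StartTime` and `EndTime` expected for `X == 19`.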
td1:
Timestamp X Y a b Type Id
0 2000-10-26 10:08:27.060 18 14 0.0 24.5 medium 18
1 2000-10-26 10:39:24.310 18 13 1.0 24.0 low 18 Start
2 2000-10-26 11:50:48.190 18 14 1.0 23.5 medium 18 ---- collect `b` value in `list_container` 1
3 2000-10-26 17:18:07.610 18 14 1.0 23.5 medium 18 ---- collect `b` value in `list_container` 1
4 2000-10-26 17:18:09.610 18 14 0.0 23.5 medium 18
5 2000-10-26 17:29:10.610 18 14 0.0 26.5 medium 18
6 2000-10-26 17:29:10.770 18 14 1.0 26.5 medium 18 ---- collect `b` value in `list_container` 1
7 2000-10-26 17:29:12.610 18 14 1.0 53.5 medium 18 ---- collect `b` value in `list_container` 1
8 2000-10-26 17:29:14.610 18 14 1.0 62.0 medium 18 ---- collect `b` value in `list_container` 1
9 2000-10-26 17:29:14.770 18 13 1.0 24.0 low 18
10 2000-10-26 17:29:16.610 18 14 1.0 64.5 medium 18 ---- collect `b` value in `list_container` 1
11 2000-10-26 17:29:18.770 18 14 0.0 64.5 medium 18
12 2000-10-26 17:29:18.770 18 13 0.0 24.0 low 18 End
13 2000-10-26 17:29:28.770 18 14 0.0 63.5 medium 18
14 2000-10-26 17:29:34.770 19 14 0.0 62.0 medium 19
15 2000-10-26 17:29:40.770 19 14 1.0 61.0 medium 19 Start
16 2000-10-26 17:29:46.770 19 14 1.0 60.0 medium 19 ---- collect `b` value in `list_container` 2
17 2000-10-26 17:32:01.180 19 13 1.0 25.0 low 19
18 2000-10-26 17:32:01.180 19 14 0.0 51.5 low 19
19 2000-10-26 17:32:35.180 19 13 0.0 50.0 medium 19 End
Reproducible example:
import pandas as pd
from pandas import Timestamp

td1 = pd.DataFrame({'Timestamp': {0: Timestamp('2000-10-26 10:08:27.060000'),
1: Timestamp('2000-10-26 10:39:24.310000'),
2: Timestamp('2000-10-26 11:50:48.190000'),
3: Timestamp('2000-10-26 17:18:07.610000'),
4: Timestamp('2000-10-26 17:18:09.610000'),
5: Timestamp('2000-10-26 17:29:10.610000'),
6: Timestamp('2000-10-26 17:29:10.770000'),
7: Timestamp('2000-10-26 17:29:12.610000'),
8: Timestamp('2000-10-26 17:29:14.610000'),
9: Timestamp('2000-10-26 17:29:14.770000'),
10: Timestamp('2000-10-26 17:29:16.610000'),
11: Timestamp('2000-10-26 17:29:18.770000'),
12: Timestamp('2000-10-26 17:29:18.770000'),
13: Timestamp('2000-10-26 17:29:28.770000'),
14: Timestamp('2000-10-26 17:29:34.770000'),
15: Timestamp('2000-10-26 17:29:40.770000'),
16: Timestamp('2000-10-26 17:29:46.770000'),
17: Timestamp('2000-10-26 17:32:01.180000'),
18: Timestamp('2000-10-26 17:32:01.180000'),
19: Timestamp('2000-10-26 17:32:35.180000')},
'X': {0: 18,
1: 18,
2: 18,
3: 18,
4: 18,
5: 18,
6: 18,
7: 18,
8: 18,
9: 18,
10: 18,
11: 18,
12: 18,
13: 18,
14: 19,
15: 19,
16: 19,
17: 19,
18: 19,
19: 19},
'Y': {0: 14,
1: 13,
2: 14,
3: 14,
4: 14,
5: 14,
6: 14,
7: 14,
8: 14,
9: 13,
10: 14,
11: 14,
12: 13,
13: 14,
14: 14,
15: 14,
16: 14,
17: 13,
18: 14,
19: 13},
'a': {0: 0.0,
1: 1.0,
2: 1.0,
3: 1.0,
4: 0.0,
5: 0.0,
6: 1.0,
7: 1.0,
8: 1.0,
9: 1.0,
10: 1.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 1.0,
16: 1.0,
17: 1.0,
18: 0.0,
19: 0.0},
'b': {0: 24.5,
1: 24.0,
2: 23.5,
3: 23.5,
4: 23.5,
5: 26.5,
6: 26.5,
7: 53.5,
8: 62.0,
9: 24.0,
10: 64.5,
11: 64.5,
12: 24.0,
13: 63.5,
14: 62.0,
15: 61.0,
16: 60.0,
17: 25.0,
18: 51.5,
19: 50.0},
'Type': {0: 'medium',
1: 'low',
2: 'medium',
3: 'medium',
4: 'medium',
5: 'medium',
6: 'medium',
7: 'medium',
8: 'medium',
9: 'low',
10: 'medium',
11: 'medium',
12: 'low',
13: 'medium',
14: 'medium',
15: 'medium',
16: 'medium',
17: 'low',
18: 'low',
19: 'medium'},
'Id': {0: 18,
1: 18,
2: 18,
3: 18,
4: 18,
5: 18,
6: 18,
7: 18,
8: 18,
9: 18,
10: 18,
11: 18,
12: 18,
13: 18,
14: 19,
15: 19,
16: 19,
17: 19,
18: 19,
19: 19}})
td1
Expected output:
Start_Id StartTime End_Id EndTime list_container
18 2000-10-26 10:39:24.310 18 2000-10-26 17:29:18.770 [23.5, 23.5, 26.5, 53.5, 62.0, 64.5]
19 2000-10-26 17:29:40.770 19 2000-10-26 17:32:35.180 [60.0]
By analyzing the possible combinations of states before and after each iteration, I tried the following for loop:
def combined_func(td1):
    td1['Timestamp'] = pd.to_datetime(td1['Timestamp'])
    td1 = td1.sort_values(by=['Id','Timestamp'])
    td1 = td1.reset_index(drop=True)
    low_on = 0 # Flag to indicate status of low
    medium_on = 0 # Flag to indicate status of medium
    my_list = []
    container_list = []
    data = []
    time_start = None
    start_Id = None
    time_end = None
    end_Id = None
    for i in range(1, len(td1.index)-1):
        if (td1.loc[i, 'Id'] == td1.loc[i-1, 'Id']) & (td1.loc[i, 'Id'] == td1.loc[i+1, 'Id']):
            if ((not low_on) & (not medium_on)):
                if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')):
                    b13 = td1.loc[i, 'b']
                    my_list.append(b13)
                    medium_on = 1
                    time_start = td1.loc[i, 'Timestamp']
                    start_Id = td1.loc[i, 'Id']
                    print(f"This is start case 1 (start with medium), start_Id: {start_Id}, time_start: {time_start}")
                elif ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'low')):
                    time_start = td1.loc[i, 'Timestamp']
                    start_Id = td1.loc[i, 'Id']
                    print(f'This is start case 2 (start with low), start_Id: {start_Id}, time_start: {time_start}')
                    low_on = 1
                else:
                    continue
            elif ((not low_on) & (medium_on)):
                if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')):
                    b5 = td1.loc[i, 'b']
                    my_list.append(b5)
                if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'low')):
                    low_on = 1
                if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'medium')):
                    b7 = td1.loc[i, 'b']
                    my_list.append(b7)
                    list_container = my_list
                    my_list = []
                    medium_on = 0
                    time_end = td1.loc[i, 'Timestamp']
                    end_Id = td1.loc[i, 'Id']
                    print(f"This is end case 1 (end with medium), end_Id: {end_Id}, time_end: {time_end}, container_list is {container_list}")
                else:
                    continue
            elif ((low_on) & (not medium_on)):
                if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')):
                    b11 = td1.loc[i, 'b']
                    my_list.append(b11)
                    medium_on = 1
                if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'low')):
                    time_end = td1.loc[i, 'Timestamp']
                    end_Id = td1.loc[i, 'Id']
                    low_on = 0
                    print(f"This is end case 2 (end with low), end_Id: {end_Id}, time_end: {time_end}, container_list is {my_list}")
                else:
                    continue
            elif ((low_on) & (medium_on)):
                if (td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium'):
                    b1 = td1.loc[i, 'b']
                    my_list.append(b1)
                if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'low')):
                    low_on = 0
                if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'medium')):
                    b3 = td1.loc[i, 'b']
                    my_list.append(b3)
                    list_container = my_list
                    my_list = []
                    medium_on = 0
                else:
                    continue
                data.append([start_Id, time_start, end_Id, time_end, list_container])
            else:
                continue
        else:
            continue
    data_table1 = pd.DataFrame(data, columns= ["Start_Id", "StartTime", "End_Id", "EndTime", "list_container"])
    return data_table1
output = combined_func(td1)
output
It returns:
This is start case 2 (start with low), start_Id: 18, time_start: 2000-10-26 10:39:24.310000
This is end case 2 (end with low), end_Id: 18, time_end: 2000-10-26 17:29:18.770000, container_list is []
This is start case 1 (start with medium), start_Id: 19, time_start: 2000-10-26 17:29:40.770000
Start_Id StartTime End_Id EndTime list_container
0 18 2000-10-26 10:39:24.310 None None [23.5, 23.5, 23.5]
1 18 2000-10-26 10:39:24.310 None None [26.5, 53.5, 62.0, 64.5, 64.5]
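Somehow `End_Id` and `EndTime` are missing, and the `list_container` values are also off. I'm not sure which steps went wrong. Any suggestions would be much appreciated.

I couldn't find a better way than grouping by `X` and writing specific logic for each returned value based on your description: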
def times(df):
    # Start: first row in this X where a == 1
    start_time = df.loc[df.a == 1, 'Timestamp'].iloc[0]
    end_time = pd.NaT
    # End: requires a == 0 rows for both Types; take whichever Type's
    # last a == 0 row comes later in the group
    if df.loc[df.a == 0, 'Type'].nunique() == 2:
        end_time = (
            df.loc[df.a == 0, ['Timestamp', 'Type']]
            .drop_duplicates('Type', keep='last')
            .Timestamp
            .iloc[-1]
        )
    # Defaults so the Series below can be built even without a complete activity
    start_id = end_id = None
    list_container = []
    if pd.notnull([start_time, end_time]).all():
        temp = df[(df.Timestamp > start_time) & (df.Timestamp < end_time)]
        start_id, end_id = temp.Id.iloc[[0, -1]].to_list()
        list_container = temp[temp.a == 1].b.to_list()
    return pd.Series({
        'Start_Id': start_id,
        'StartTime': start_time,
        'End_Id': end_id,
        'EndTime': end_time,
        'list_container': list_container
    })
results = td1.groupby('X').apply(times)
results
# Start_Id StartTime End_Id EndTime list_container
# X
# 18 18 2000-10-26 10:39:24.310 18 2000-10-26 17:29:28.770 [23.5, 23.5, 26.5, 53.5, 62.0, 24.0, 64.5]
# 19 19 2000-10-26 17:29:40.770 19 2000-10-26 17:32:35.180 [60.0, 25.0]
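The output above differs from the expected output in two ways: for `X == 18` the `EndTime` is 17:29:28.770 rather than 17:29:18.770, because `times` takes the last `a == 0` row of each `Type` instead of the first row at which both types are off, and `list_container` also contains `b` values from `low` rows (24.0 and 25.0), because it filters only on `a == 1`. The question's loop has related problems: `data.append` is reached only when `medium` switches off while `low` is still on, so `time_end` is still `None` at that point; `my_list` is flushed every time `medium` switches off, which splits one activity into several rows; and `range(1, len(td1.index)-1)` together with the neighbouring-`Id` check never visits the first and last row of each `Id`, so the row that ends the `X == 19` activity is never seen.

Below is a minimal single-pass loop sketch that keeps one on/off state per `Type` within each `X` and reproduces the expected output on this sample. It assumes the rows are already in time order within each `X`, that only rows strictly between the start and end rows are collected (the expected `[60.0]` for `X == 19` leaves out the start row's 61.0), and that an activity with no end row is not reported. `find_activities` is a hypothetical helper name, and `td1` is rebuilt without the `Y` column so the block runs on its own:

```python
import pandas as pd

# td1 from the question, rebuilt compactly (same 20 rows, same order, no Y).
td1 = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2000-10-26 10:08:27.060", "2000-10-26 10:39:24.310",
        "2000-10-26 11:50:48.190", "2000-10-26 17:18:07.610",
        "2000-10-26 17:18:09.610", "2000-10-26 17:29:10.610",
        "2000-10-26 17:29:10.770", "2000-10-26 17:29:12.610",
        "2000-10-26 17:29:14.610", "2000-10-26 17:29:14.770",
        "2000-10-26 17:29:16.610", "2000-10-26 17:29:18.770",
        "2000-10-26 17:29:18.770", "2000-10-26 17:29:28.770",
        "2000-10-26 17:29:34.770", "2000-10-26 17:29:40.770",
        "2000-10-26 17:29:46.770", "2000-10-26 17:32:01.180",
        "2000-10-26 17:32:01.180", "2000-10-26 17:32:35.180",
    ]),
    "X": [18] * 14 + [19] * 6,
    "a": [0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0],
    "b": [24.5, 24.0, 23.5, 23.5, 23.5, 26.5, 26.5, 53.5, 62.0, 24.0,
          64.5, 64.5, 24.0, 63.5, 62.0, 61.0, 60.0, 25.0, 51.5, 50.0],
    "Type": ["medium", "low", "medium", "medium", "medium", "medium",
             "medium", "medium", "medium", "low", "medium", "medium",
             "low", "medium", "medium", "medium", "medium", "low",
             "low", "medium"],
})
td1["Id"] = td1["X"]  # Id equals X in this sample


def find_activities(df):
    """One output row per activity: start/end Id and time plus collected b values."""
    rows = []
    # groupby keeps the original row order inside each group
    for _, grp in df.groupby("X", sort=False):
        state = {"medium": 0, "low": 0}  # latest a per Type within this X
        start = None                     # start row of the ongoing activity
        collected = []
        for _, r in grp.iterrows():
            was_active = any(state.values())
            state[r["Type"]] = int(r["a"])
            is_active = any(state.values())
            if not was_active and is_active:
                # a turned 1 for either Type: the activity starts on this row
                start, collected = r, []
            elif was_active and not is_active:
                # a is now 0 for both Types: the activity ends on this row
                rows.append([start["Id"], start["Timestamp"],
                             r["Id"], r["Timestamp"], collected])
                start = None
            elif is_active and r["Type"] == "medium" and r["a"] == 1:
                # medium row with a == 1 strictly between start and end
                collected.append(r["b"])
        # an activity still open here has no end row and is not reported
    return pd.DataFrame(rows, columns=["Start_Id", "StartTime", "End_Id",
                                       "EndTime", "list_container"])


print(find_activities(td1))
```

The sketch relies on the given row order rather than re-sorting by `Timestamp`: rows 17 and 18 share a timestamp, and if they were swapped the `low` state would still be on at row 19, so the `X == 19` activity would never end.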