
How to insert new rows in pandas based on a time-difference criterion

I have the following DataFrame:

  Matricule Startdate   Starthour   Enddate     Endhour
0   5357    2019-01-08  14:21:06    2019-01-08  14:34:42
1   5357    2019-01-08  15:29:23    2019-01-08  15:33:43
2   5357    2019-01-08  19:51:11    2019-01-08  20:02:48
3   5357    2019-03-08  20:05:49    2019-03-08  21:04:52
4   aaaa    2019-01-08  14:17:51    2019-01-08  14:32:10
5   aaaa    2019-01-08  18:21:16    2019-01-08  18:39:26
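I am trying to build a table in which a new row is inserted between two consecutive rows whenever the gap between the end time of the first row and the start time of the second row is greater than 30 minutes. The inserted row has the same attributes as the previous row. Here is an example: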

     Matricule  Startdate   Starthour   Enddate     Endhour
    0   5357    2019-01-08  14:21:06    2019-01-08  14:34:42
    1   5357    2019-01-08  14:34:42    2019-01-08  15:04:42
    2   5357    2019-01-08  15:29:23    2019-01-08  15:33:43
    3   5357    2019-01-08  15:33:43    2019-01-08  16:03:43
    4   5357    2019-01-08  19:51:11    2019-01-08  20:02:48
    5   5357    2019-03-08  20:05:49    2019-03-08  21:04:52
    6   aaaa    2019-01-08  14:17:51    2019-01-08  14:32:10
    7   aaaa    2019-01-08  14:32:10    2019-01-08  15:02:10
    8   aaaa    2019-01-08  18:21:16    2019-01-08  18:39:26

First, I created new columns that combine the date and the time into single datetime objects:

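import numpy as np
import pandas as pd

# Cast to str on both sides so the concatenation works even if a column is not a string
df['start'] = df['Startdate'].astype(str) + " " + df['Starthour'].astype(str)
df['start'] = pd.to_datetime(df['start'])
df['end'] = df['Enddate'].astype(str) + " " + df['Endhour'].astype(str)
df['end'] = pd.to_datetime(df['end'])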
Next, compute the gap to the next record, making sure to sort first:

df = df.sort_values(['Matricule','start'])
df['gap_to_next'] = (df['start'].shift(-1) - df['end'])
Handle the boundaries between different Matricule values, where a gap to the next row is meaningless:

cut = df['Matricule'] != df['Matricule'].shift(-1)
df.loc[cut, 'gap_to_next'] = np.nan
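As a possible alternative to shifting the whole column and then blanking the group boundaries, the shift can be done per group. The following is only a sketch on a small hypothetical sample (the four rows are made up for illustration, and the `start` and `end` columns are assumed to be datetimes already); `groupby(...).shift(-1)` leaves `NaT` on the last row of each `Matricule`, so no separate `cut` mask is needed.

```python
import pandas as pd

# Hypothetical two-group sample; column names follow the answer above.
df = pd.DataFrame({
    "Matricule": ["5357", "5357", "aaaa", "aaaa"],
    "start": pd.to_datetime(["2019-01-08 14:21:06", "2019-01-08 15:29:23",
                             "2019-01-08 14:17:51", "2019-01-08 18:21:16"]),
    "end": pd.to_datetime(["2019-01-08 14:34:42", "2019-01-08 15:33:43",
                           "2019-01-08 14:32:10", "2019-01-08 18:39:26"]),
})
df = df.sort_values(["Matricule", "start"])

# Shift within each Matricule group: the last row of every group gets NaT
next_start = df.groupby("Matricule")["start"].shift(-1)
df["gap_to_next"] = next_start - df["end"]

print(df[["Matricule", "gap_to_next"]])
```

Both approaches should give the same gaps; the grouped shift just folds the boundary handling into one step.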
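Define a boolean Series that marks where a new row needs to be inserted. I used your requirement of roughly 30 minutes, but also added a condition that the gap must be less than 1 day, because one case in your sample seems to imply that. Adjust as needed: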

should_insert_next = ( (df['gap_to_next'] > pd.Timedelta(30, 'min')) & (df['gap_to_next'] < pd.Timedelta(24, 'hr')) )
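Use these rows as templates and adjust the times of the inserted rows. It looks like you want each new record to start at the previous end time and to end 30 minutes later: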

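# Copy the rows that need a follow-up row; .copy() avoids modifying df through a view
new_rows = df[should_insert_next].copy()
new_rows['start'] = new_rows['end']
new_rows['end'] = new_rows['start'] + pd.Timedelta(30, 'min')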
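Then rebuild the date and hour columns as strings. If the original date and hour columns are not strings, you can add a step after the following to convert them to whatever type you need: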

new_rows['Startdate'] = new_rows['start'].dt.strftime("%Y-%m-%d")
new_rows['Enddate'] = new_rows['end'].dt.strftime("%Y-%m-%d")
new_rows['Starthour'] = new_rows['start'].dt.strftime("%H:%M:%S")
new_rows['Endhour'] = new_rows['end'].dt.strftime("%H:%M:%S")
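If the original columns hold Python `date` and `time` objects rather than strings, one way to do that conversion is the `.dt.date` and `.dt.time` accessors. This is a minimal sketch that assumes that is the target type; the one-row `new_rows` frame here is a made-up stand-in for the real one.

```python
import pandas as pd

# Hypothetical stand-in for new_rows, with datetime start/end columns
new_rows = pd.DataFrame({
    "start": pd.to_datetime(["2019-01-08 14:34:42"]),
    "end": pd.to_datetime(["2019-01-08 15:04:42"]),
})

# .dt.date / .dt.time yield datetime.date / datetime.time objects (object dtype)
new_rows["Startdate"] = new_rows["start"].dt.date
new_rows["Starthour"] = new_rows["start"].dt.time
new_rows["Enddate"] = new_rows["end"].dt.date
new_rows["Endhour"] = new_rows["end"].dt.time

print(new_rows[["Startdate", "Starthour", "Enddate", "Endhour"]])
```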
Finally, concatenate the old and new rows, then sort and drop the helper columns:

final = pd.concat([df, new_rows])
final = final.sort_values(['Matricule','start'])
final = final.drop(columns=['gap_to_next','start','end'])
final = final.reset_index(drop=True)
This gives:

print(final)
  Matricule   Startdate Starthour     Enddate   Endhour
0      5357  2019-01-08  14:21:06  2019-01-08  14:34:42
1      5357  2019-01-08  14:34:42  2019-01-08  15:04:42
2      5357  2019-01-08  15:29:23  2019-01-08  15:33:43
3      5357  2019-01-08  15:33:43  2019-01-08  16:03:43
4      5357  2019-01-08  19:51:11  2019-01-08  20:02:48
5      5357  2019-03-08  20:05:49  2019-03-08  21:04:52
6      aaaa  2019-01-08  14:17:51  2019-01-08  14:32:10
7      aaaa  2019-01-08  14:32:10  2019-01-08  15:02:10
8      aaaa  2019-01-08  18:21:16  2019-01-08  18:39:26
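For convenience, the steps above can be put together into one self-contained script. This is a sketch under a few assumptions: the sample data is loaded from an inline string with every column read as a string, and `pd.NaT` is used instead of `np.nan` to blank the group boundaries (a small substitution, not what the answer wrote).

```python
import io

import pandas as pd

raw = """Matricule Startdate Starthour Enddate Endhour
5357 2019-01-08 14:21:06 2019-01-08 14:34:42
5357 2019-01-08 15:29:23 2019-01-08 15:33:43
5357 2019-01-08 19:51:11 2019-01-08 20:02:48
5357 2019-03-08 20:05:49 2019-03-08 21:04:52
aaaa 2019-01-08 14:17:51 2019-01-08 14:32:10
aaaa 2019-01-08 18:21:16 2019-01-08 18:39:26
"""
# Assumption: all columns are read as strings
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", dtype=str)

# Combine date and hour into datetime columns
df["start"] = pd.to_datetime(df["Startdate"] + " " + df["Starthour"])
df["end"] = pd.to_datetime(df["Enddate"] + " " + df["Endhour"])

# Gap from each row's end to the next row's start, blanked at group boundaries
df = df.sort_values(["Matricule", "start"])
df["gap_to_next"] = df["start"].shift(-1) - df["end"]
cut = df["Matricule"] != df["Matricule"].shift(-1)
df.loc[cut, "gap_to_next"] = pd.NaT

# Insert only where the gap is more than 30 minutes and less than 24 hours
should_insert_next = (
    (df["gap_to_next"] > pd.Timedelta(minutes=30))
    & (df["gap_to_next"] < pd.Timedelta(hours=24))
)

# New rows start at the previous end and last 30 minutes
new_rows = df[should_insert_next].copy()
new_rows["start"] = new_rows["end"]
new_rows["end"] = new_rows["start"] + pd.Timedelta(minutes=30)
new_rows["Startdate"] = new_rows["start"].dt.strftime("%Y-%m-%d")
new_rows["Enddate"] = new_rows["end"].dt.strftime("%Y-%m-%d")
new_rows["Starthour"] = new_rows["start"].dt.strftime("%H:%M:%S")
new_rows["Endhour"] = new_rows["end"].dt.strftime("%H:%M:%S")

# Merge, restore order, and drop the helper columns
final = pd.concat([df, new_rows])
final = final.sort_values(["Matricule", "start"])
final = final.drop(columns=["gap_to_next", "start", "end"])
final = final.reset_index(drop=True)

print(final)
```

On the six sample rows this should produce the nine rows shown above: three inserted rows, and none after the 20:02:48 record because its gap to the next record is far longer than a day.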

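So you always want to insert a row, and the inserted row's time should be the midpoint between the times of the adjacent rows?

If there are not 30 minutes between the start time of row n and the end time of row n-1, I don't want to add a row. But if there are, I want to add a copy of row n-1, with different times, as you said.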