Python: best way to compare consecutive values (two by two) in a given column
To create a new column, I compare the values of a given DataFrame column two by two (previous row vs. current row). My input df looks like this:
timestamp charging
0 2017-10-15 18:36:46 1
1 2017-10-15 18:41:54 1
2 2017-10-15 18:46:54 1
3 2017-10-15 18:50:35 1
4 2017-10-15 18:54:14 -1
5 2017-10-15 18:57:54 -1
6 2017-10-15 19:02:47 -1
7 2017-10-15 19:11:41 1
8 2017-10-15 19:21:25 1
9 2017-10-15 19:31:04 -1
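To make the snippets in this thread reproducible, here is a minimal sketch that rebuilds the sample input. It assumes `timestamp` is a parsed datetime64 column; the question does not say whether it is stored as datetimes or as strings.

```python
import pandas as pd

# Rebuild the sample input from the question (assumption: timestamps are parsed
# with pd.to_datetime rather than kept as strings).
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-10-15 18:36:46", "2017-10-15 18:41:54", "2017-10-15 18:46:54",
        "2017-10-15 18:50:35", "2017-10-15 18:54:14", "2017-10-15 18:57:54",
        "2017-10-15 19:02:47", "2017-10-15 19:11:41", "2017-10-15 19:21:25",
        "2017-10-15 19:31:04",
    ]),
    "charging": [1, 1, 1, 1, -1, -1, -1, 1, 1, -1],
})
print(df.dtypes)
```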
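I want to create a new column that holds the row's timestamp only where the charging value switches from positive to negative or from negative to positive. Both rows involved in a switch (the last row before it and the first row after it) should get their own timestamp; all other rows should be NaT.
The output should be: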
timestamp charging period start/end time
0 2017-10-15 18:36:46 1 NaT
1 2017-10-15 18:41:54 1 NaT
2 2017-10-15 18:46:54 1 NaT
3 2017-10-15 18:50:35 1 2017-10-15 18:50:35
4 2017-10-15 18:54:14 -1 2017-10-15 18:54:14
5 2017-10-15 18:57:54 -1 NaT
6 2017-10-15 19:02:47 -1 2017-10-15 19:02:47
7 2017-10-15 19:11:41 1 2017-10-15 19:11:41
8 2017-10-15 19:21:25 1 2017-10-15 19:21:25
9 2017-10-15 19:31:04 -1 2017-10-15 19:31:04
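My current approach is not a good one, although it works (it assumes the default 0..n-1 integer index, since it uses ind-1):
import pandas as pd

# Start with an all-NaT column, then fill in the timestamps around each switch
df['period start/end time'] = pd.NaT
for ind in df.index:
    if ind > 0:
        # negative -> positive: stamp the previous row and the current row
        if df.at[ind, 'charging'] > 0 and df.at[ind-1, 'charging'] < 0:
            df.at[ind-1, 'period start/end time'] = df.at[ind-1, 'timestamp']
            df.at[ind, 'period start/end time'] = df.at[ind, 'timestamp']
        # positive -> negative: same treatment
        if df.at[ind, 'charging'] < 0 and df.at[ind-1, 'charging'] > 0:
            df.at[ind-1, 'period start/end time'] = df.at[ind-1, 'timestamp']
            df.at[ind, 'period start/end time'] = df.at[ind, 'timestamp']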
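For comparison, the loop can be translated almost line for line into vectorized pandas. This is a sketch, not taken from either answer; `mark_sign_flips` is a hypothetical helper name, and it assumes the same `timestamp` and `charging` columns as the sample. Because it keeps the loop's `> 0` / `< 0` tests instead of a plain `!=` comparison, it behaves like the loop on other data too: a change from 1 to 2 is not a switch, and a step through 0 is not counted.

```python
import pandas as pd

def mark_sign_flips(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: True on both rows around each +/- switch."""
    cur = df["charging"]
    prev = cur.shift()  # previous row's value (NaN for the first row)
    # Same tests as the loop; comparisons with NaN are False, so row 0 never starts a flip.
    flip = ((cur > 0) & (prev < 0)) | ((cur < 0) & (prev > 0))
    # The loop also stamps row ind-1, i.e. the row just before each flip.
    return flip | flip.shift(-1, fill_value=False)

df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-10-15 18:36:46", "2017-10-15 18:41:54", "2017-10-15 18:46:54",
        "2017-10-15 18:50:35", "2017-10-15 18:54:14", "2017-10-15 18:57:54",
        "2017-10-15 19:02:47", "2017-10-15 19:11:41", "2017-10-15 19:21:25",
        "2017-10-15 19:31:04",
    ]),
    "charging": [1, 1, 1, 1, -1, -1, -1, 1, 1, -1],
})

mask = mark_sign_flips(df)
# where() keeps the timestamp where mask is True and puts NaT elsewhere.
df["period start/end time"] = df["timestamp"].where(mask)
print(df)
```

Using `Series.where` keeps the new column as datetime64, with NaT in the unmarked rows.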
This takes far too long! Is there a faster, better way to do this?
IIUC:
import pandas as pd
mask = df.charging != df.charging.shift().bfill()  # True on the first row after each change
df.loc[mask | mask.shift(-1, fill_value=False), 'new'] = df.timestamp  # plus the row just before it
timestamp charging new
0 2017-10-15 18:36:46 1 NaT
1 2017-10-15 18:41:54 1 NaT
2 2017-10-15 18:46:54 1 NaT
3 2017-10-15 18:50:35 1 2017-10-15 18:50:35
4 2017-10-15 18:54:14 -1 2017-10-15 18:54:14
5 2017-10-15 18:57:54 -1 NaT
6 2017-10-15 19:02:47 -1 2017-10-15 19:02:47
7 2017-10-15 19:11:41 1 2017-10-15 19:11:41
8 2017-10-15 19:21:25 1 2017-10-15 19:21:25
9 2017-10-15 19:31:04 -1 2017-10-15 19:31:04
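The logic of this answer can be checked by printing the intermediate Series for the sample `charging` values. In this sketch the names `prev` and `before` are mine, not the answer's. The `bfill()` matters for row 0: without it, row 0 would be compared with NaN and flagged as a change. `shift(-1, fill_value=False)` gives the same result as `shift(-1).fillna(False)` while keeping the mask boolean.

```python
import pandas as pd

charging = pd.Series([1, 1, 1, 1, -1, -1, -1, 1, 1, -1])

prev = charging.shift().bfill()             # previous value; row 0 borrows row 1's value
mask = charging != prev                     # first row after each change
before = mask.shift(-1, fill_value=False)   # row just before each change
both = mask | before

print(mask[mask].index.tolist())      # [4, 7, 9]
print(before[before].index.tolist())  # [3, 6, 8]
print(both[both].index.tolist())      # [3, 4, 6, 7, 8, 9]
```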
IIUC
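Create a mask:
# change into this row | change out of this row (fillna(0): the last row has no next row)
condition = df.charging.diff().bfill().ne(0) | df.charging.diff().shift(-1).fillna(0).ne(0)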
Then use np.where:
import numpy as np
df['new'] = np.where(condition, df.timestamp, np.datetime64('NaT'))  # NumPy NaT keeps the datetime64 dtype
timestamp charging new
0 2017-10-15 18:36:46 1 NaT
1 2017-10-15 18:41:54 1 NaT
2 2017-10-15 18:46:54 1 NaT
3 2017-10-15 18:50:35 1 2017-10-15 18:50:35
4 2017-10-15 18:54:14 -1 2017-10-15 18:54:14
5 2017-10-15 18:57:54 -1 NaT
6 2017-10-15 19:02:47 -1 2017-10-15 19:02:47
7 2017-10-15 19:11:41 1 2017-10-15 19:11:41
8 2017-10-15 19:21:25 1 2017-10-15 19:21:25
9 2017-10-15 19:31:04 -1 2017-10-15 19:31:04
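One caveat about the second term of the condition: without a `fillna(0)`, `diff().shift(-1)` is NaN on the last row and `NaN != 0` is True, so the last row would always be flagged, even when nothing changes there. The sample data hides this because row 9 is a real switch. A minimal sketch that fills both edges with 0 and uses `Series.where` instead of `np.where` (an alternative of mine, not the answer's code):

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-10-15 18:36:46", "2017-10-15 18:41:54", "2017-10-15 18:46:54",
        "2017-10-15 18:50:35", "2017-10-15 18:54:14", "2017-10-15 18:57:54",
        "2017-10-15 19:02:47", "2017-10-15 19:11:41", "2017-10-15 19:21:25",
        "2017-10-15 19:31:04",
    ]),
    "charging": [1, 1, 1, 1, -1, -1, -1, 1, 1, -1],
})

d = df["charging"].diff()
into = d.fillna(0).ne(0)             # change into this row (row 0 has no predecessor)
out_of = d.shift(-1).fillna(0).ne(0)  # change out of this row (last row has no successor)
df["new"] = df["timestamp"].where(into | out_of)  # NaT where neither holds

# The edge case on a tiny series: the unfilled version flags the last row.
s = pd.Series([1, -1, -1])
print(s.diff().shift(-1).ne(0).tolist())            # [True, False, True]
print(s.diff().shift(-1).fillna(0).ne(0).tolist())  # [True, False, False]
```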
Shouldn't row 8 also have a timestamp? Yes, my mistake.
It took me a while to understand the logic, but it's quite clever! Thanks :)