Python: group by and subtract the first value of one column from the last value of another column
I am trying to add a new column that contains the difference between the last value of one column and the first value of another column, within each group. I am using this command:
df['diff']=df.groupby(['T_Id'])['EndMeterReading'].max()-df['StartMeterReading'].min()
But it fills the new column with NaN.
How can I get the result I want?
Original DataFrame:
+------+-------+--------------+------------+
| D_Id | T_Id | StartReading | EndReading |
+------+-------+--------------+------------+
| 1 | 4716a | 4323.17 | 4324.8 |
| 1 | 4716a | 4324.96 | 4325.34 |
| 1 | 4716a | 4326.47 | 4327.22 |
| 1 | 4716a | 4327.4 | 4328.43 |
| 1 | 4716a | 4328.85 | 4330.73 |
| 1 | 4716b | 4346.65 | 4347.62 |
| 1 | 4716b | 4347.67 | 4349.88 |
| 1 | 4716b | 4351.62 | 4351.83 |
| 1 | 4716b | 4352.88 | 4354.32 |
| 1 | 4716b | 4354.93 | 4355.14 |
| 1 | 4716b | 4355.2 | 4355.82 |
| 1 | 4716b | 4356.91 | 4357.37 |
| 1 | 4716b | 4357.74 | 4358.26 |
| 1 | 4716b | 4359.89 | 4360.46 |
| 1 | 4716b | 4360.61 | 4361.43 |
| 1 | 4716b | 4361.47 | 4362.11 |
| 1 | 4716b | 4362.88 | 4368.49 |
| 1 | 4716b | 4368.94 | 4369.78 |
| 1 | 4716b | 4370.91 | 4371.25 |
| 1 | 4716b | 4372.67 | 4372.77 |
+------+-------+--------------+------------+
Expected output:
+------+-------+--------------+------------+------------------+
| D_Id | T_Id | StartReading | EndReading | Diff |
+------+-------+--------------+------------+------------------+
| 1 | 4716a | 4323.17 | 4324.8 | 7.56 |
| 1 | 4716a | 4324.96 | 4325.34 | 7.56 |
| 1 | 4716a | 4326.47 | 4327.22 | 7.56 |
| 1 | 4716a | 4327.4 | 4328.43 | 7.56 |
| 1 | 4716a | 4328.85 | 4330.73 | 7.56 |
| 1 | 4716b | 4346.65 | 4347.62 | 26.12 |
| 1 | 4716b | 4347.67 | 4349.88 | 26.12 |
| 1 | 4716b | 4351.62 | 4351.83 | 26.12 |
| 1 | 4716b | 4352.88 | 4354.32 | 26.12 |
| 1 | 4716b | 4354.93 | 4355.14 | 26.12 |
| 1 | 4716b | 4355.2 | 4355.82 | 26.12 |
| 1 | 4716b | 4356.91 | 4357.37 | 26.12 |
| 1 | 4716b | 4357.74 | 4358.26 | 26.12 |
| 1 | 4716b | 4359.89 | 4360.46 | 26.12 |
| 1 | 4716b | 4360.61 | 4361.43 | 26.12 |
| 1 | 4716b | 4361.47 | 4362.11 | 26.12 |
| 1 | 4716b | 4362.88 | 4368.49 | 26.12 |
| 1 | 4716b | 4368.94 | 4369.78 | 26.12 |
| 1 | 4716b | 4370.91 | 4371.25 | 26.12 |
| 1 | 4716b | 4372.67 | 4372.77 | 26.12 |
+------+-------+--------------+------------+------------------+
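Use GroupBy.transform with the max and min functions. Each call returns a Series of the same size as the original DataFrame, so the two can be subtracted correctly: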
df['diff']= (df.groupby('T_Id')['EndReading'].transform('max')-
df.groupby('T_Id')['StartReading'].transform('min'))
print (df)
D_Id T_Id StartReading EndReading diff
0 1 4716a 4323.17 4324.80 7.56
1 1 4716a 4324.96 4325.34 7.56
2 1 4716a 4326.47 4327.22 7.56
3 1 4716a 4327.40 4328.43 7.56
4 1 4716a 4328.85 4330.73 7.56
5 1 4716b 4346.65 4347.62 26.12
6 1 4716b 4347.67 4349.88 26.12
7 1 4716b 4351.62 4351.83 26.12
8 1 4716b 4352.88 4354.32 26.12
9 1 4716b 4354.93 4355.14 26.12
10 1 4716b 4355.20 4355.82 26.12
11 1 4716b 4356.91 4357.37 26.12
12 1 4716b 4357.74 4358.26 26.12
13 1 4716b 4359.89 4360.46 26.12
14 1 4716b 4360.61 4361.43 26.12
15 1 4716b 4361.47 4362.11 26.12
16 1 4716b 4362.88 4368.49 26.12
17 1 4716b 4368.94 4369.78 26.12
18 1 4716b 4370.91 4371.25 26.12
19 1 4716b 4372.67 4372.77 26.12
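As background on why the original attempt produced NaN: a plain `groupby(...).max()` returns one value per group, indexed by the `T_Id` labels, and assigning that to a column aligns on index labels that do not exist in the DataFrame's integer index. The sketch below illustrates this on a shortened subset of the question's data; the `sample` frame and the variable names are assumptions made for illustration only.

```python
import pandas as pd

# Shortened subset of the question's data: first and last row of each group.
sample = pd.DataFrame({
    "T_Id": ["4716a", "4716a", "4716b", "4716b"],
    "StartReading": [4323.17, 4328.85, 4346.65, 4372.67],
    "EndReading": [4324.80, 4330.73, 4347.62, 4372.77],
})

# Aggregation: one row per group, indexed by T_Id ('4716a', '4716b').
per_group = sample.groupby("T_Id")["EndReading"].max()

# Column assignment aligns on index labels; the labels 0..3 are not in
# per_group's index, so every row ends up NaN.
misaligned = sample.assign(diff=per_group)["diff"]

# transform broadcasts each group's result back onto the original row index.
aligned = (sample.groupby("T_Id")["EndReading"].transform("max")
           - sample.groupby("T_Id")["StartReading"].transform("min")).round(2)

print(misaligned.isna().all())
print(aligned.tolist())
```

Rounding to two decimals is only there to hide floating-point noise in the subtraction.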
Use groupby to find the first and last values, then merge the result back into the original df:
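# One row per T_Id: first StartReading and last EndReading of each group
df2 = df.groupby('T_Id').agg({'StartReading': 'first', 'EndReading': 'last'}).reset_index()
df2['Diff'] = df2['EndReading'] - df2['StartReading']
# Left-merge the per-group Diff back onto every row of the original df
df = df.merge(df2[['T_Id', 'Diff']], how='left', on='T_Id')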
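The same first/last idea can probably also be written without a merge, using `transform('first')` and `transform('last')`. Note that first/last depend on row order, whereas min/max do not; the two give the same answer here only because the readings increase within each group. A minimal sketch, again on an assumed shortened subset of the data (`sample` and `grouped` are illustrative names):

```python
import pandas as pd

# Shortened subset of the question's data, already in reading order.
sample = pd.DataFrame({
    "T_Id": ["4716a", "4716a", "4716b", "4716b"],
    "StartReading": [4323.17, 4328.85, 4346.65, 4372.67],
    "EndReading": [4324.80, 4330.73, 4347.62, 4372.77],
})

grouped = sample.groupby("T_Id")

# Last EndReading minus first StartReading of each group, repeated on every
# row of that group. Rounded to hide floating-point noise.
sample["Diff"] = (grouped["EndReading"].transform("last")
                  - grouped["StartReading"].transform("first")).round(2)

print(sample)
```

If the rows are not guaranteed to be in reading order, sorting first (for example with `sort_values`) or using the max/min variant would be the safer choice.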
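Can you tell me how to add a new column to the df above that is empty (e.g. 0) in every row except the last occurrence/row of each group, which should be 1?
@M_S_N - Which indices of the new column are filled with 1? Which are filled with 0?
For example, fill 0, 0, 0 for the first occurrences of 4716a, but the last occurrence has 1 in the new column.
@M_S_N so index 0 and 5 are 1 in the new column, and all other values are 0.
@M_S_N - Can you check df['new'] = (~df['T_Id'].duplicated(keep='last')).astype(int)?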
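To see what the suggested expression does, here is a small sketch on a made-up `T_Id` column (the `sample` frame is an assumption for illustration). `duplicated(keep='last')` is False only for the last occurrence of each value, so negating it flags exactly the last row of each group.

```python
import pandas as pd

# Illustrative data: three rows for 4716a followed by two rows for 4716b.
sample = pd.DataFrame({"T_Id": ["4716a", "4716a", "4716a", "4716b", "4716b"]})

# duplicated(keep='last') marks every occurrence except the last as True;
# ~ inverts that, and astype(int) turns True/False into 1/0.
sample["new"] = (~sample["T_Id"].duplicated(keep="last")).astype(int)

print(sample)
```

Because `duplicated` works on values rather than on contiguous runs, this flags the last row of each `T_Id` even if a group's rows are not adjacent.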