Python: how to find a related row in a DataFrame without a loop

python pandas

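In this DataFrame, I am trying to write a function that finds the row whose level is lower than that of the selected row. For example, for part 1_3 the lower-level part is 1_2. For 1_7 it is 1_5; for 1_9 it is 1_7, and so on. I have done this with a for loop, but I would like to know whether there is a more efficient way to do it.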
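For reference, the loop-based approach mentioned here might look like the following sketch. It assumes the frame has the columns product_no, part_no and level, and that "lower-level part" means the most recent earlier row of the same product whose level is exactly one less. The function name, the lower_part column and the sample frame (product 1 only) are made up for illustration.

```python
import pandas as pd

# Illustrative sample: the rows of product 1 from the question's data
df = pd.DataFrame({
    "product_no": [1] * 10,
    "part_no": ["1_1", "1_2", "1_3", "1_4", "1_5",
                "1_6", "1_7", "1_8", "1_9", "1_10"],
    "level": [1, 1, 2, 1, 1, 2, 2, 3, 3, 2],
})


def lower_level_parts_loop(frame):
    """Hypothetical loop baseline: for each row, return the most recent
    earlier part of the same product whose level is one lower, else None."""
    result = []
    last_seen = {}  # (product_no, level) -> most recent part_no at that level
    for product, part, level in zip(frame["product_no"],
                                    frame["part_no"],
                                    frame["level"]):
        result.append(last_seen.get((product, level - 1)))
        last_seen[(product, level)] = part
    return result


df["lower_part"] = lower_level_parts_loop(df)
print(df)
```

A single pass with a dictionary keeps this linear in the number of rows, so it also serves as a simple reference against which a vectorized version can be checked.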

product_no    part_no    level
1              1_1         1
1              1_2         1
1              1_3         2
1              1_4         1
1              1_5         1
1              1_6         2
1              1_7         2
1              1_8         3
1              1_9         3
1              1_10        2
2              2_1         1
2              2_2         1
2              2_3         2
2              2_4         1

There is a loop-free way to do this, but it will make your head spin. By the way, I assume you want to handle each product_no separately, hence the groupby:

import numpy as np
import pandas as pd

def last_part_no(group):
    # One-hot encode the level column: one column per level
    dummies = pd.get_dummies(group['level'])

    # For each level, the index of the most recent row at that level
    idx = dummies.index.to_series()
    last_index = dummies.apply(lambda col: idx.where(col != 0, np.nan).fillna(method='ffill'))
    last_index[0] = np.nan  # helper column for rows where level - 1 == 0

    # For each row, pick the entry in column (level - 1)
    idx = last_index.lookup(last_index.index, group['level'] - 1)
    return pd.DataFrame({
        'last_part_no': group.reindex(idx)['part_no'].values
    }, index=group.index)

df['last_part_no'] = df.groupby('product_no').apply(last_part_no)

Result:

    product_no part_no  level last_part_no
0            1     1_1      1          NaN
1            1     1_2      1          NaN
2            1     1_3      2          1_2
3            1     1_4      1          NaN
4            1     1_5      1          NaN
5            1     1_6      2          1_5
6            1     1_7      2          1_5
7            1     1_8      3          1_7
8            1     1_9      3          1_7
9            1    1_10      2          1_5
10           2     2_1      1          NaN
11           2     2_2      1          NaN
12           2     2_3      2          2_2
13           2     2_4      1          NaN

Here is how it works: groupby splits the DataFrame by product_no and sends each subframe to the last_part_no function:

    product_no part_no  level
0            1     1_1      1
...
-----------------------------
10           2     2_1      1
...

The real work happens in the last_part_no function. Suppose the function is processing the first subframe, which is equivalent to calling:

subframe = df[df['product_no'] == 1]
last_part_no(subframe)

Here is the value of subframe, for your reference:

   product_no part_no  level
0           1     1_1      1
1           1     1_2      1
2           1     1_3      2
3           1     1_4      1
4           1     1_5      1
5           1     1_6      2
6           1     1_7      2
7           1     1_8      3
8           1     1_9      3
9           1    1_10      2

dummies is a one-hot encoded form of the level column:

   1  2  3
0  1  0  0      --> this row is level 1 since the column 1 is "hot"
1  1  0  0
2  0  1  0      --> this row is level 2 since the column 2 is "hot"
3  1  0  0
4  1  0  0
5  0  1  0
6  0  1  0
7  0  0  1      --> this row is level 3 since the column 3 is "hot"
8  0  0  1
9  0  1  0

Next, we take dummies.index and transform it according to the "hotness" of each column: if the row is "hot", keep the index value, otherwise replace it with np.nan. Then we forward-fill those NaNs:

index  1       np.where(...)      fillna(...)
0      1       0                  0             --> as of index 0, last row with level 1 is row 0
1      1       1                  1
2      0       np.nan             1
3      1       3                  3
4      1  ==>  4             ==>  4
5      0       np.nan             4
6      0       np.nan             4
7      0       np.nan             4             --> as of index 7, last row with level 1 is row 4
8      0       np.nan             4
9      0       np.nan             4
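The where-then-forward-fill step illustrated above can be reproduced in isolation. This is a small sketch for the level 1 column only, with variable names chosen for illustration. It uses Series.where (as the function does) and Series.ffill(), which here does the same job as fillna(method='ffill').

```python
import pandas as pd

# Levels of the first subframe (product 1)
level = pd.Series([1, 1, 2, 1, 1, 2, 2, 3, 3, 2])

is_level_1 = level == 1          # the "hot" rows of column 1
idx = level.index.to_series()    # 0, 1, 2, ..., 9

# Keep the index where the row is hot, NaN elsewhere, then forward-fill
last_level_1 = idx.where(is_level_1).ffill()

print(last_level_1.tolist())
# [0.0, 1.0, 1.0, 3.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]
```

The values come back as floats because the intermediate NaN entries force the integer index into a float dtype.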

Repeat this for all 3 columns and your last_index frame looks like this (column 0 is created for convenience; it is all NaN):
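     0    1    2    3
0  NaN  0.0  NaN  NaN
1  NaN  1.0  NaN  NaN
2  NaN  1.0  2.0  NaN
3  NaN  3.0  2.0  NaN
4  NaN  4.0  2.0  NaN
5  NaN  4.0  5.0  NaN
6  NaN  4.0  6.0  NaN
7  NaN  4.0  6.0  7.0  --> as of index 7, last row with level 1 is 4, with level 2 is 6, with level 3 is 7
8  NaN  4.0  6.0  8.0
9  NaN  4.0  9.0  8.0  --> as of index 9, last row with level 1 is 4, with level 2 is 9, with level 3 is 8

Now, let's go back to our level column (i.e. subframe['level']). To find the last part_no, go to level - 1:

    level  level-1
0       1        0
1       1        0
2       2        1
3       1        0
4       1        0
5       2        1
6       2        1
7       3        2
8       3        2
9       2        1

Combine that with the last_index frame and you can find, for each row, the index of the row containing its last_part_no. That is what the lookup call is for:

The row index of last_part_no ...                 idx
                                                  ---
   for row 0 is in row 0, col 0 of last_index --> nan
           1       row 1, col 0               --> nan
           2       row 2, col 1               --> 1
           3       row 3, col 0               --> nan
           4       row 4, col 0               --> nan
           5       row 5, col 1               --> 4
           6       row 6, col 1               --> 4
           7       row 7, col 2               --> 6
           8       row 8, col 2               --> 6
           9       row 9, col 1               --> 4

The final step is to reindex the part_no column in the order specified by idx to obtain last_part_no:

   product_no part_no  level  last_part_no
0           1     1_1      1           nan
1           1     1_2      1           nan
2           1     1_3      2           1_2
3           1     1_4      1           nan
4           1     1_5      1           nan
5           1     1_6      2           1_5
6           1     1_7      2           1_5
7           1     1_8      3           1_7
8           1     1_9      3           1_7
9           1    1_10      2           1_5

Comments:

Your logic appears to be lower_leveled_part = last part_no where level == current level - 1. By that logic, is the lower-level part of 1_10 equal to 1_5?

Yes, that is exactly my case. First of all, wow! I am impressed by this approach and, as you said at the start of your answer, it really did make my head spin. Thank you for taking so much time to explain; without the explanation I would probably have been a bit confused, but now everything is clear. I think that in the last subframe, the last part for row 2 should be 1. If you don't mind my asking, and if it is not too much, could a similar approach be used to solve another, similar problem I am currently facing on the same kind of DataFrame?

My transcription error. Thanks for the correction.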
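On recent pandas versions the function above needs small changes: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, and fillna(method='ffill') has been deprecated in favour of ffill(). The following is a minimal sketch of the same idea without those calls. The helper name last_part_no_modern, the positional NumPy indexing that stands in for lookup, and the use of pd.concat over the groups are assumptions of this sketch rather than part of the original answer. It also assumes that level holds positive integers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product_no": [1] * 10 + [2] * 4,
    "part_no": ["1_1", "1_2", "1_3", "1_4", "1_5", "1_6", "1_7",
                "1_8", "1_9", "1_10", "2_1", "2_2", "2_3", "2_4"],
    "level": [1, 1, 2, 1, 1, 2, 2, 3, 3, 2, 1, 1, 2, 1],
})


def last_part_no_modern(group):
    """Hypothetical variant of last_part_no that avoids DataFrame.lookup."""
    n = len(group)
    levels = group["level"].to_numpy()
    # Position (0..n-1) of each row inside the group
    pos = pd.Series(np.arange(n), index=group.index)

    # One column per level: position of the most recent row at that level
    dummies = pd.get_dummies(group["level"])
    last_pos = dummies.apply(lambda col: pos.where(col != 0).ffill())

    # Make column j correspond to level j; absent levels (and 0) are all NaN
    last_pos = last_pos.reindex(columns=np.arange(levels.max() + 1))

    # Stand-in for lookup: for row i take the entry in column (level - 1)
    picked = last_pos.to_numpy(dtype=float)[np.arange(n), levels - 1]

    # Translate the picked positions back into part numbers
    parts = group["part_no"].to_numpy(dtype=object)
    out = np.full(n, np.nan, dtype=object)
    found = ~np.isnan(picked)
    out[found] = parts[picked[found].astype(int)]
    return pd.Series(out, index=group.index)


pieces = [last_part_no_modern(g) for _, g in df.groupby("product_no")]
df["last_part_no"] = pd.concat(pieces)
print(df)
```

Working with positions inside each group instead of index labels means the final step is plain array indexing, so no reindexing on NaN labels is needed. Concatenating the per-group results is used here because its output shape does not depend on how groupby.apply combines results in a given pandas version.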