Python: Assigning codes to a DataFrame using apply without iterating

python pandas

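I want to encode a numeric DataFrame based on the sequence of numbers in each row. The sequence itself carries meaning that I want to capture. I can solve this with loops, but it is very time-consuming.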

Starting df:

    2017-10-06  2017-10-07  2017-10-08
id
1          1.0        46.0         5.0
2         16.0         1.0         0.0
3         23.0       123.0         0.0
4          1.0         0.0         0.0
5          0.0         0.0         0.0

I created a function that is applied to each column. It needs to know about the previous column, and it assigns a coded string.

The encoded df looks like this:

    2017-10-06  2017-10-07  2017-10-08
id
1       active      active      active
2       active      active  inactive_1
3       active      active  inactive_1
4       active  inactive_1  inactive_1
5   inactive_1  inactive_1  inactive_3

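Currently I iterate over each column once, assigning 'active' so that non-zero values are easy to find, and otherwise assigning a count of zeros: if a zero is found, look at the previous value and add 1, unless the previous value is 'active', in which case start at 1.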

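cols = list(df.columns)
# assign_active(prev_value, value) is the asker's own helper (not shown in the question).
# The loop starts at column 1, so the first column is not encoded here.
for i in range(1, len(cols)):
    test = cols[i]
    prev = cols[i - 1]
    # one row-wise apply over the whole frame, repeated for every column
    df[test] = df.apply(lambda row: assign_active(row[prev], row[test]), axis=1)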
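To make the loop above concrete, here is a minimal runnable sketch of the first pass. The question does not show assign_active, so the helper below is hypothetical, written from the rule described above. The loop also leaves the first column untouched, so the sketch encodes that column separately first (again an assumption about how the asker handled it). It should reproduce the intermediate table.

```python
import pandas as pd

def assign_active(prev, value):
    # Hypothetical helper: 'active' for a non-zero value; for a zero, start a
    # new count at 1 after an 'active' cell, otherwise continue the count.
    if value != 0:
        return 'active'
    return 1 if prev == 'active' else prev + 1

df = pd.DataFrame(
    {'2017-10-06': [1.0, 16.0, 23.0, 1.0, 0.0],
     '2017-10-07': [46.0, 1.0, 123.0, 0.0, 0.0],
     '2017-10-08': [5.0, 0.0, 0.0, 0.0, 0.0]},
    index=pd.Index([1, 2, 3, 4, 5], name='id'))

cols = list(df.columns)
# The loop starts at column 1, so encode the first column on its own.
df[cols[0]] = df[cols[0]].map(lambda v: 'active' if v != 0 else 1)
for i in range(1, len(cols)):
    prev, test = cols[i - 1], cols[i]
    # One full row-wise pass over the frame for every column.
    df[test] = df.apply(lambda row: assign_active(row[prev], row[test]), axis=1)

print(df)
```

Every df.apply(..., axis=1) call walks all rows, so this costs one full pass over the frame per column, which is the cost the question is trying to avoid.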

The intermediate df looks like this:

    2017-10-06  2017-10-07  2017-10-08
id
1       active      active      active
2       active      active           1
3       active      active           1
4       active           1           2
5            1           2           3

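Then I iterate a second time, and anything that is not 'active' gets coded appropriately with the same approach, i.e. iterating over each column and using apply with a function. The function looks at the specific value and assigns the correct code. Note that the code is not simply 'inactive_' plus the count: a count of 2 becomes inactive_1 and there is no inactive_2, so it is not just string handling.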
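If the second pass really is a value-by-value lookup, it needs no loop: DataFrame.replace can apply a dict to every cell at once. This is a sketch under an assumption: the mapping below (code_for_count, a hypothetical name) is read off the two tables in the question (1 -> inactive_1, 2 -> inactive_1, 3 -> inactive_3), and the real rule may depend on more than the count.

```python
import pandas as pd

# Intermediate frame, copied from the question.
inter = pd.DataFrame(
    {'2017-10-06': ['active', 'active', 'active', 'active', 1],
     '2017-10-07': ['active', 'active', 'active', 1, 2],
     '2017-10-08': ['active', 1, 1, 2, 3]},
    index=pd.Index([1, 2, 3, 4, 5], name='id'))

# Hypothetical mapping, inferred from the intermediate and encoded tables.
code_for_count = {1: 'inactive_1', 2: 'inactive_1', 3: 'inactive_3'}

# One call over the whole frame; 'active' cells match no key and are left alone.
encoded = inter.replace(code_for_count)
print(encoded)
```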

I would like to find a way to do this without iterating over every column, let alone doing it twice.

Thanks

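IIUC, define a function that uses np.where. Note that this is very fast.

import numpy as np
import pandas as pd

def foo(s):
    # 'active' where the value is positive, else 'inactive_' + running count of zeros in the row
    codes = np.where(s > 0, 'active', 'inactive_' + s.eq(0).cumsum().astype(str))
    return pd.Series(codes, index=s.index)  # a Series, so apply(axis=1) expands it back into columns

Now call df.apply along axis 1 (row-wise):

df = df.apply(foo, axis=1)
print(df)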
    2017-10-06  2017-10-07  2017-10-08
id                                    
1       active      active      active
2       active      active  inactive_1
3       active      active  inactive_1
4       active  inactive_1  inactive_2
5   inactive_1  inactive_2  inactive_3

Given your intermediate output, this is probably what you want.
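apply(foo, axis=1) still calls foo once per row. A minimal sketch of the same rule evaluated on the whole frame at once, using DataFrame.eq, cumsum(axis=1) and mask. Like foo, it counts all zeros seen so far in the row, so the count does not reset after a non-zero value.

```python
import pandas as pd

df = pd.DataFrame(
    {'2017-10-06': [1.0, 16.0, 23.0, 1.0, 0.0],
     '2017-10-07': [46.0, 1.0, 123.0, 0.0, 0.0],
     '2017-10-08': [5.0, 0.0, 0.0, 0.0, 0.0]},
    index=pd.Index([1, 2, 3, 4, 5], name='id'))

# Running count of zeros along each row (axis=1), for every row at once.
zero_count = df.eq(0).cumsum(axis=1)
codes = 'inactive_' + zero_count.astype(str)
# Overwrite the cells that hold positive values.
out = codes.mask(df.gt(0), 'active')
print(out)
```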

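coldspeed - this solves the problem I described, thanks! My problem has since changed: I need the cumulative sum to reset after another value is seen. For example, if the value at [5, '2017-10-07'] in the starting df were 1, the bottom row of the output df would need to be [inactive_1, active, inactive_1].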
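For the follow-up, one possible approach (a sketch, not something posted in the thread): take the running zero count along each row and subtract the value it had at the most recent non-zero cell, carried forward with ffill(axis=1). What remains is the length of the current run of zeros. The function name encode_runs is hypothetical, and the sketch assumes non-negative values, as in the question.

```python
import pandas as pd

def encode_runs(df):
    # Hypothetical helper: 'active' for values > 0, otherwise 'inactive_<n>',
    # where n counts consecutive zeros and restarts after each non-zero value.
    zeros = df.eq(0)
    total = zeros.cumsum(axis=1)                        # running zero count per row
    # zero count as of the latest non-zero cell (0 if none yet in the row)
    base = total.where(~zeros).ffill(axis=1).fillna(0)
    run = (total - base).astype(int)                    # length of the current zero run
    codes = 'inactive_' + run.astype(str)
    return codes.mask(df.gt(0), 'active')

df = pd.DataFrame(
    {'2017-10-06': [1.0, 16.0, 23.0, 1.0, 0.0],
     '2017-10-07': [46.0, 1.0, 123.0, 0.0, 1.0],   # row 5 set to 1, as in the comment
     '2017-10-08': [5.0, 0.0, 0.0, 0.0, 0.0]},
    index=pd.Index([1, 2, 3, 4, 5], name='id'))

out = encode_runs(df)
print(out)
```

With the comment's change, row 5 comes out as inactive_1, active, inactive_1, while row 4 stays active, inactive_1, inactive_2.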